The distinction between data science, big data and data analytics
Big data is a term that has become ubiquitous; so ubiquitous that there is hardly an industry where it isn't used to describe everything from making a chart in Excel with 100 rows of data to cleaning up data and presenting it graphically.
Based on a new understanding of what these terms mean, I would like to clarify what the differences are.
Big data refers to amounts of data too large to fit into your typical Excel sheet, which tops out at 1,048,576 rows. We are talking about millions of rows or more, typically stored and managed in a distributed file system such as the Hadoop Distributed File System (HDFS).
Data engineering and (pre-)processing are the first steps in working through a data analytics project. In other words, first the data is collected and brought into a usable form, then it is cleaned: duplicates, redundancies and similar problems are removed.
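To make the cleaning step a little more concrete, here is a minimal sketch in plain Python. The function name clean_rows, the field names and the sample records are all made up for illustration, and the sketch assumes every value is hashable (strings, numbers and the like). A real project would more likely use a dedicated library and far more data, but the idea is the same: normalize the values, drop incomplete rows, drop duplicates.

```python
from typing import Dict, Iterable, List, Tuple


def clean_rows(rows: Iterable[Dict], required: Tuple[str, ...] = ()) -> List[Dict]:
    """Normalize string fields, drop incomplete rows, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Trim stray whitespace so "Ann " and "Ann" count as the same value.
        normalized = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in row.items()
        }
        # Skip rows that have no value in any of the required fields.
        if any(normalized.get(field) in (None, "") for field in required):
            continue
        # A sorted tuple of (key, value) pairs is hashable, so it can act
        # as a fingerprint for spotting exact duplicates.
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


# Hypothetical sample records, invented for this example.
raw = [
    {"customer": "Ann ", "city": "Oslo"},
    {"customer": "Ann", "city": "Oslo"},   # duplicate once whitespace is trimmed
    {"customer": "", "city": "Bergen"},    # missing a required field
    {"customer": "Bo", "city": "Bergen"},
]
tidy = clean_rows(raw, required=("customer",))
print(tidy)
```

Of the four sample rows, only two survive: the second is a duplicate of the first after trimming, and the third has no customer name.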
Data science is a discipline in itself that builds on the data prepared in the previous step. It is essentially the skill of extracting insight, knowledge and new learning from data that has already been pre-processed and cleaned.
Data analytics, I would say, is the umbrella term for the whole lifecycle of tasks: it starts with data engineering, continues with data science, and ends with presenting the data in a more insightful form.