Data Science Engineer – Who, What, & Why?
Data science is the study of where information comes from, what it represents and how it can be turned into a valuable resource in the creation of business and IT strategies. Mining large amounts of structured and unstructured data to identify patterns can help an organization rein in costs, increase efficiencies, recognize new market opportunities and increase the organization’s competitive advantage.
Engineering is both a scientific field and a profession: it takes our scientific understanding of the natural world and uses it to invent, design, and build things that solve problems and achieve practical goals. It is, in essence, the act of working artfully to bring something about. In the data context, engineers develop the architecture that analyzes and processes data in the way the organization needs, and they make sure those systems run smoothly.
Data Science Engineers are those who engineer data science
In Brandon Rohrer's words, data science can answer five questions:
- Is this A or B?
- Is this weird?
- How much or how many?
- How is this data organized?
- What should I do next?
A separate family of machine learning methods, called algorithms, answers each of these questions. Think of an algorithm as a recipe and data as the ingredients; the computer does the job of a blender, processing the data as the algorithm directs.
The first type of question is a classification problem, where we predict which of several choices applies. The second is an anomaly detection problem, which flags unusual behavior. The third is a regression problem, where we make numerical predictions. The fourth is a clustering problem, where the algorithm separates data into natural clumps; this category also includes association problems. The fifth question brings in another family of machine learning algorithms called reinforcement learning, which is inspired by observing how the human brain responds to a situation.
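To make the first four question types concrete, here is a minimal Python sketch using only the standard library. The function names, the toy one-dimensional data, and the choice of techniques (nearest mean, a standard-deviation threshold, least squares, two-cluster k-means) are illustrative assumptions rather than anything prescribed by Rohrer; real projects would normally reach for an established library. The fifth question is left out because reinforcement learning needs an environment to interact with, which does not fit in a few lines.

```python
from statistics import mean, stdev


def classify(x, group_a, group_b):
    """Is this A or B? Choose the group whose mean is closer to x."""
    return "A" if abs(x - mean(group_a)) <= abs(x - mean(group_b)) else "B"


def is_weird(x, history, threshold=3.0):
    """Is this weird? Flag x if it lies more than `threshold`
    standard deviations from the mean of past observations."""
    return abs(x - mean(history)) > threshold * stdev(history)


def fit_line(xs, ys):
    """How much or how many? Least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def two_clusters(values, iterations=10):
    """How is this data organized? One-dimensional k-means with k = 2.
    Assumes `values` contains at least two distinct numbers."""
    lo, hi = min(values), max(values)  # initial cluster centers
    for _ in range(iterations):
        near_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        near_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = mean(near_lo), mean(near_hi)
    return sorted(near_lo), sorted(near_hi)
```

In the recipe analogy, each function is a recipe, the lists passed in are the ingredients, and calling the function is the blender doing its work. For example, `classify(2.0, [1, 2, 3], [10, 11, 12])` picks group A because 2.0 is closer to that group's mean.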
According to Tom Mitchell, machine learning is “concerned with the question of how to construct computer programs that automatically improve with experience.” Machine learning is interdisciplinary in nature, and employs techniques from the fields of computer science, statistics, and artificial intelligence, among others. The main artefacts of machine learning research are algorithms which facilitate this automatic improvement from experience, algorithms which can be applied in a variety of diverse fields.
Data science engineers engineer (and sometimes re-engineer) data science. They are responsible for redefining and reinventing it, which they achieve by going 'wide and deep' on data science and related fields. They design, develop, and deploy novel techniques and enhance existing ones to improve accuracy, cost, time, and interpretability across the entire data science cycle, from data ingestion through to business insights.
Data science engineers must have a high degree of research acumen and a workaround mindset
Data engineers are the plumbers and story creators building a data pipeline, while data scientists are the painters and storytellers giving meaning to an otherwise static entity.
Data science engineers are the architects and story improvisers, innovating novel methods and discarding traditional ones. Simply put, data science engineers will bring in a new era of data science and replace the existing ecosystem for exponentially better results.