The Right and the Wrong Way to Do Cross-validation

  You might wonder why we need cross-validation in the first place. Let’s explain that first. The generalization performance of a machine learning algorithm is judged by its prediction capability on independent test data. This assessment is of utmost importance to us. Cross-validation is one such model validation technique...

Re-evaluate your career paths in the age of AI

Research says that AI will automate most of the functions of a Data Scientist by 2020. The explosion of Big Data led to the hottest job of the past five years – Data Scientist. Now experts are speculating that 2017 will be the year of Artificial Intelligence. What does it say about the future of...

Data Science Engineer – Who, What, & Why?

  Data science is the study of where information comes from, what it represents and how it can be turned into a valuable resource in the creation of business and IT strategies. Mining large amounts of structured and unstructured data to identify patterns can help an organization rein in costs, increase efficiencies, recognize new...

Data Science – Let the Data Sing

  The hype is real. But let’s get past it. What exactly is Data Science? And why is it the next big thing? Massive amounts of data are being generated every second. The total amount of data in the world is 4.4 zettabytes. And this is not just internet data. We are talking...

Feature selection using Decision Tree

  One of the key differentiators in any data science problem is the quality of feature selection and feature importance ranking. When a lot of data is available to our model, feature selection becomes unavoidable, both because of computational constraints and because eliminating noisy variables leads to better prediction. Also,...

How to avoid overfitting while training?

Overfitting happens mostly because the model becomes too complex. Such a model gives poor accuracy on unseen data, because it memorizes the noise in the training data. A model is usually fit by achieving the highest accuracy on the training data set. However, its effectiveness is judged by its performance on test data. Overfitting occurs...