Take risks to work on machines rather than work for machines

Big dead fish are found in the ocean, and new fish take over. In the next decade, over half of the Fortune 500 will no longer exist (statistics from the U.S. Bureau of Economic Analysis). Most of the largest companies in the world will be replaced in the next decade. What can save them are reinventions and...

Shrinkage Methods in Linear Regression

Have you ever asked, “Why is Linear Regression giving me such good accuracy on the training set but low accuracy on the test set, in spite of adding all the available features to the model?” This question seems inexplicable to many people but is answered by a concept called...

The Right and the Wrong Way to Do Cross-validation

You might wonder why we need cross-validation in the first place. Let’s explain that first. Normally, the generalization performance of a machine learning algorithm depends on its prediction capability on independent test data. This assessment is of utmost importance to us. Cross-validation is one such model validation technique...

FuncTools in Python

In functional programming we use functions, instead of objects, to produce the required output. The primary difference between the two is that the state of objects changes continuously, whereas functions have no state and make no changes to variables that are not visible. The itertools module is one of the most popular modules...

Re-evaluate your career paths in the age of AI

Research says that AI will automate most of the functions of a Data Scientist by 2020. The explosion of Big Data led to the hottest job of the past five years – Data Scientist. Now experts are predicting that 2017 will be the year of Artificial Intelligence. What does it say about the future of...

Functional Programmers: Why do we call them first-class citizens?

  In programming language design, a first-class citizen (also type, object, entity, or value) in a given programming language is an entity which supports all the operations generally available to other entities. These operations typically include being passed as an argument, returned from a function, and assigned to a variable. In computer science, a...
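As a quick illustration of those three operations, here is a minimal Python sketch in which a function is assigned to a variable, passed as an argument, and returned from another function. The names `square`, `apply_twice`, and `make_adder` are made up for this example and do not come from the article.

```python
from typing import Callable


def square(x: int) -> int:
    """Return x multiplied by itself."""
    return x * x


# 1. Assigned to a variable: f now refers to the same function object.
f = square


# 2. Passed as an argument: apply_twice receives a function and calls it.
def apply_twice(func: Callable[[int], int], value: int) -> int:
    return func(func(value))


# 3. Returned from a function: make_adder builds and returns a new function.
def make_adder(n: int) -> Callable[[int], int]:
    def add(x: int) -> int:
        return x + n
    return add


add_five = make_adder(5)

print(f(3))                    # square called through the variable f
print(apply_twice(square, 3))  # square(square(3))
print(add_five(10))            # 10 + 5
```

In Python every function is an ordinary object, which is why all three operations work without any special syntax.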

Internship Redefined: Why it was needed and how we did it

Many a time, the term internship is ill-defined for both the company and the candidate. The company perceives an intern as a trainee who works for little or no pay in order to gain some narrowly confined work experience. On the other hand, the candidate takes an internship to be an opportunity where he is...

Data Science Engineer – Who, What, & Why?

  Data science is the study of where information comes from, what it represents and how it can be turned into a valuable resource in the creation of business and IT strategies. Mining large amounts of structured and unstructured data to identify patterns can help an organization rein in costs, increase efficiencies, recognize new...

Friend Follower Analysis using Apache Spark GraphX’s PageRank algorithm

GraphX is Apache Spark’s API for graphs and graph-parallel computation. This includes transformation, exploration, and graph computation. Data can be viewed both as graphs and as collections. This use case discusses friend follower analysis using Apache Spark GraphX’s PageRank operator. PageRank measures the importance of each vertex in a graph, by determining which vertices have the...
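To give a feel for the idea, here is a rough sketch of PageRank as a plain-Python power iteration over a tiny follower graph. It is not GraphX code and does not reproduce how GraphX computes or scales its ranks; the `pagerank` function, the damping factor of 0.85, the iteration count, and the sample `follows` data are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Estimate PageRank by power iteration.

    edges: list of (follower, followed) pairs; each pair is a directed
    edge from the follower to the user being followed.
    """
    nodes = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    ranks = {v: 1.0 / n for v in nodes}  # start from a uniform distribution

    for _ in range(iterations):
        # Every vertex receives a small baseline share of rank.
        new_ranks = {v: (1.0 - damping) / n for v in nodes}
        for src in nodes:
            # A vertex with no outgoing edges spreads its rank evenly.
            targets = out_links[src] or nodes
            share = damping * ranks[src] / len(targets)
            for dst in targets:
                new_ranks[dst] += share
        ranks = new_ranks
    return ranks


# Hypothetical follower data: (follower, followed)
follows = [
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "alice"),
    ("dave", "carol"),
]

ranks = pagerank(follows)
most_followed = max(ranks, key=ranks.get)
print(most_followed, ranks)
```

In this toy graph three of the four users follow carol, so she ends up with the highest rank. alice comes second because the one account that follows her is itself highly ranked.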

Apache Spark: Where is it going?

Databricks ran the Apache Spark Survey 2016 this summer to identify how organizations are using Apache Spark. The survey results suggest that Spark’s growth continues across various industries, with people in various functional roles building sophisticated data solutions. The Databricks 2016 survey results reflect answers from 900 distinct organizations and 1,615 respondents, who were predominantly...