What Is Hyperparameter Optimization and Why Is It Important?

A machine learning model consists of various parameters that must be learned from the data. The crux of machine learning is fitting a model to data; this process of fitting the model parameters to existing data is called model training.

Hyperparameters are a different kind of parameter: they cannot be learned directly from the training process and must be set before training begins.

These hyperparameters define higher-level properties of the model, such as its complexity, capacity to learn, rate of convergence, and penalty. Optimal hyperparameters lead to better efficiency, faster convergence, and better results overall.

In short, hyperparameters are knobs that can be tuned to obtain a better statistical learning model.

Hyperparameters are also referred to as meta parameters or free parameters. Hyperparameter optimization is the process of finding the optimal hyperparameters for a machine learning algorithm. This matters because the performance of any machine learning algorithm depends to a large extent on the values of its hyperparameters. Since hyperparameters are set before the algorithm is run, choosing good values is essential: they strongly affect how, and how fast, the algorithm converges.

Parameters are constant or variable terms in a function that determine its specific form. Parameters are part of the machine learning model, and their optimal values are found by the training algorithm itself, whereas hyperparameters are set before the algorithm is run.

For example, the learning rate, the penalty, and C in logistic regression, or the number of estimators and the minimum samples per split in Random Forests and Decision Trees, are all hyperparameters, while the coefficients of the x's in linear and logistic regression are parameters.
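
To make the distinction concrete, here is a minimal sketch using scikit-learn (assuming it is installed): C is a hyperparameter we choose before fitting, while the coefficients are parameters learned from the data.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    # C is a hyperparameter: we set it before training.
    model = LogisticRegression(C=0.5)

    # Fitting learns the parameters (coefficients and intercept) from the data.
    model.fit(X, y)
    print(model.coef_, model.intercept_)
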
When tuning hyperparameters, we need an optimization strategy that finds the best set of values. The question, then, is how to optimize them, i.e., how to find the hyperparameters that give a machine learning algorithm its best performance. Several methods are widely used today, such as Grid Search, Random Search, and Bayesian Search.

Grid Search suggests parameter configurations deterministically: it builds a grid of all possible hyperparameter combinations, trains the algorithm on the dataset with each combination, and keeps the values that perform best.
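
As an illustration, a grid search over two Random Forest hyperparameters might look like the following sketch with scikit-learn's GridSearchCV (the grid values here are arbitrary choices, not recommendations):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    # Every combination in this grid (3 x 3 = 9 candidates) is trained and
    # scored with cross-validation.
    param_grid = {
        "n_estimators": [50, 100, 200],
        "min_samples_split": [2, 5, 10],
    }

    search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
    search.fit(X, y)
    print(search.best_params_)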

Random Search instead samples a fixed number of hyperparameter combinations at random from the space of all possible combinations and evaluates only those on the dataset.
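
A random search over the same space might look like this sketch with scikit-learn's RandomizedSearchCV (n_iter and the sampling distributions are illustrative):

    from scipy.stats import randint
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RandomizedSearchCV

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    # Instead of enumerating every combination, draw n_iter random samples
    # from these distributions and evaluate only those.
    param_distributions = {
        "n_estimators": randint(50, 300),
        "min_samples_split": randint(2, 11),
    }

    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=0),
        param_distributions,
        n_iter=10,
        cv=5,
        random_state=0,
    )
    search.fit(X, y)
    print(search.best_params_)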

Bayesian Search finds good hyperparameter values for a given dataset using Gaussian Processes and Bayes' rule.
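
One way to run a Bayesian search in practice is scikit-optimize's BayesSearchCV, which fits a Gaussian Process surrogate over the search space (a sketch, assuming scikit-optimize is installed; the ranges are illustrative):

    from skopt import BayesSearchCV
    from skopt.space import Integer
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    # The surrogate model proposes each new combination based on the
    # results of the combinations already evaluated.
    search = BayesSearchCV(
        RandomForestClassifier(random_state=0),
        {
            "n_estimators": Integer(50, 300),
            "min_samples_split": Integer(2, 10),
        },
        n_iter=20,
        cv=5,
        random_state=0,
    )
    search.fit(X, y)
    print(search.best_params_)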

Comparing grid and random search, we often find that although grid search gives the best results, it takes a lot of time, while random search gives nearly as good results in much less time. Hence, most people tend to use Random Search, since it delivers the required results faster. An interesting question then arises: why not use Bayesian Search?

Bayesian search uses a Gaussian Process and thus tends to give better results than grid search. Bayesian optimization is an adaptive approach to parameter optimization: it trades off exploring new areas of the parameter space against exploiting historical information, in order to find the parameters that maximize the function faster. However, the Bayesian approach often spends a lot of time in the exploration phase and tends to be a bit slower than its alternatives. Used correctly and smartly, though, Bayesian optimization could be one of the best and most useful optimization techniques of our technocratic era.
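
To see the exploration/exploitation trade-off concretely, here is a minimal sketch of a single Bayesian optimization step: a Gaussian Process is fit to the points evaluated so far, and an expected-improvement score ranks candidates by balancing a high predicted mean (exploitation) against high predictive uncertainty (exploration). The objective f and the candidate grid are hypothetical stand-ins for, say, validation accuracy as a function of one hyperparameter.

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor

    def f(x):
        # Hypothetical objective to maximize.
        return -(x - 2.0) ** 2

    # Points evaluated so far.
    X_seen = np.array([[0.0], [1.0], [4.0]])
    y_seen = f(X_seen).ravel()

    # Surrogate model of the objective.
    gp = GaussianProcessRegressor().fit(X_seen, y_seen)

    # Expected improvement over the best value seen so far: large where the
    # predicted mean is high (exploitation) or the uncertainty is high
    # (exploration).
    candidates = np.linspace(0.0, 5.0, 101).reshape(-1, 1)
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y_seen.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

    # Evaluate the most promising candidate next.
    x_next = candidates[np.argmax(ei)]
    print(x_next)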

Takeaway

The point I find fascinating is: how do we use and exploit Bayesian optimization so that we get better and faster results than Random Search? Another thing that fascinates me is how a technique like Random Search can achieve such good results when it just randomly picks hyperparameter combinations. Should we rely on Random Search? Or should we try to unlock the full potential of Bayesian Search, which is more reliable and accurate? It's up to us to decide!

Interested in learning more about such niche techniques? Check out http://research.busigence.com
