Predicting online buying behavior using deep learning algorithms
A user's interactions with a website can be used to predict whether a buy event will occur, and behavioral signals reveal the user's intent towards different products, which is useful for retargeting ads. In e-commerce, a large amount of user data is available: searches, page views, time spent per item, basket views, ad clicks, ad views, purchase history, and so on. This data can be used to model user behavior and predict whether a buy event will happen.
Popular models for this kind of problem are logistic regression (LR) and decision trees. Neural networks have an advantage over LR because they can capture non-linear relationships between the input features, and their "deeper" architecture has inherently greater modelling strength.
The advantage of probabilistic generative models inspired by deep neural networks is that they can mimic the process of a consumer's purchase behavior and capture latent variables that explain the data.
Many features can be extracted from a user's interaction data with the website. The duration spent on a product is the time difference between the click on that product page and the click on the next page. A product's description is taken as text and converted to a 50-dimensional vector using word2vec. Other product information, such as price, is also used, along with user-specific features like click-to-buy ratio, number of page views, and number of clicks in a session.
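To make this concrete, here is a minimal sketch of how such features could be derived with pandas and gensim. The column names (user_id, item_id, timestamp, event, price) and the helper functions are illustrative assumptions, not the original pipeline.

```python
import numpy as np
import pandas as pd

def build_features(clicks: pd.DataFrame) -> pd.DataFrame:
    """Per-user session features from a click log.

    Assumed (illustrative) columns: user_id, item_id, timestamp (datetime),
    event ('click', 'buy', ...), price.
    """
    clicks = clicks.sort_values(["user_id", "timestamp"])

    # Time spent on a product = gap between the click on its page and the
    # next click by the same user (the "duration" feature described above).
    clicks["dwell_sec"] = (
        clicks.groupby("user_id")["timestamp"].diff(-1).abs().dt.total_seconds()
    )

    per_user = clicks.groupby("user_id").agg(
        n_clicks=("item_id", "size"),
        n_items=("item_id", "nunique"),
        mean_dwell=("dwell_sec", "mean"),
        mean_price=("price", "mean"),
        n_buys=("event", lambda e: (e == "buy").sum()),
    )
    per_user["click_to_buy"] = per_user["n_buys"] / per_user["n_clicks"]
    return per_user

def description_vector(text, wv):
    """Average word2vec vectors for a product description.

    `wv` is an assumed gensim KeyedVectors model with vector_size == 50.
    """
    words = [w for w in text.lower().split() if w in wv]
    return np.mean([wv[w] for w in words], axis=0) if words else np.zeros(wv.vector_size)
```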
Classifiers:
Decision Trees:
- Extremely easy to visualize and interpret: a decision tree can be represented graphically, allowing the user to actually see the structure of the classifier.
- White-box models: by observing a decision tree, one can clearly understand all the intermediate steps of the classification process, such as which variables are used, in what order, etc. This is not true for other methods such as neural networks, whose parameters cannot be directly interpreted.
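As an illustration of this interpretability, the sketch below trains a small decision tree with scikit-learn on placeholder buy/no-buy data and prints its decision rules; the feature names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Placeholder data standing in for session features and buy / no-buy labels.
X, y = make_classification(n_samples=1000, n_features=6, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, class_weight="balanced", random_state=0)
tree.fit(X, y)

# Printing (or plotting) the fitted tree shows exactly which features and
# thresholds drive each decision -- the "white-box" property.
print(export_text(tree, feature_names=[f"f{i}" for i in range(6)]))
```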
Random Forest:
The Random Forest (RF) algorithm creates an ensemble of decision trees using randomization. When an input is to be classified, each tree classifies the input individually. The final classification is then decided by choosing the majority vote over all the trees. The likelihood of a certain input belonging to each class is computed by averaging the probabilities at the leaves of each tree.
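A minimal scikit-learn sketch of this idea on placeholder data with rare buy events; note that scikit-learn's implementation averages the per-tree probabilities and then picks the highest-probability class, the soft-voting variant of the majority vote described above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data; buy events are the rare positive class.
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# 200 trees, each trained on a bootstrap sample with randomized feature splits.
rf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=0)
rf.fit(X_tr, y_tr)

# predict_proba averages the leaf probabilities across trees;
# predict returns the class with the highest averaged probability.
buy_probability = rf.predict_proba(X_te)[:, 1]
```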
Deep Learning Classifiers:
Deep learning is a class of machine learning techniques that models higher-level abstractions of data through many layers of nonlinear processing arranged hierarchically.
Deep Belief Networks:
A DBN is composed of a stack of restricted Boltzmann machines (RBMs). A core component of the DBN is a greedy, layer-by-layer learning algorithm that optimizes the DBN weights. Separately, initializing the weights of an MLP with a correspondingly configured DBN often produces much better results than using random weights (see the sketch after the list below).
DBNs have several advantages:
- Effective use of unlabeled data
- Can be interpreted as probabilistic generative models
- The overfitting problem can be alleviated by generative pre-training.
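The sketch below approximates the greedy, layer-by-layer pre-training with scikit-learn's BernoulliRBM, stacking two RBMs under a logistic layer on placeholder data. A full DBN would additionally fine-tune all the weights jointly with backpropagation, which this pipeline does not do.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Placeholder session features and buy / no-buy labels.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Each RBM is fit unsupervised on the output of the previous layer;
# the logistic layer on top does the supervised classification.
dbn_like = Pipeline([
    ("scale", MinMaxScaler()),  # RBMs expect inputs in [0, 1]
    ("rbm1", BernoulliRBM(n_components=64, learning_rate=0.05, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=32, learning_rate=0.05, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
dbn_like.fit(X, y)
```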
Auto-encoders:
Autoencoders are a representation learning technique that uses unsupervised pre-training to learn good representations of the data, transform it, and reduce the dimensionality of the problem in order to facilitate the supervised learning stage.
Once an autoencoder layer has been trained, a second autoencoder can be trained on the output of the first. This procedure can be repeated to create stacked autoencoder layers of arbitrary depth. It has been shown that each subsequently trained layer learns a better representation of the output of the previous layer.
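A minimal sketch of this layer-by-layer procedure using Keras (assuming TensorFlow is available); the data and layer sizes are placeholders.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(5000, 50).astype("float32")  # placeholder session features

def train_autoencoder(data, n_hidden):
    """Train one autoencoder layer and return its encoder part."""
    inp = keras.Input(shape=(data.shape[1],))
    code = layers.Dense(n_hidden, activation="relu")(inp)
    out = layers.Dense(data.shape[1])(code)  # linear reconstruction
    auto = keras.Model(inp, out)
    auto.compile(optimizer="adam", loss="mse")
    auto.fit(data, data, epochs=10, batch_size=128, verbose=0)
    return keras.Model(inp, code)

# Stack: the second autoencoder is trained on the codes produced by the first.
enc1 = train_autoencoder(X, 32)
enc2 = train_autoencoder(enc1.predict(X, verbose=0), 16)
```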
Take Away
One of the main advantages of DBNs or SdAs (stacked denoising autoencoders) is that all the available data, even if unlabeled, can be used to pre-train the model in an unsupervised, or generative, way.
Neural networks have many hyper-parameters: architectural ones, such as layer sizes and hidden-unit transfer functions; optimization ones, such as learning rates and momentum values; and regularization ones, such as the dropout probability for each layer. Deep neural networks can also have a very large number of parameters, in our case between one and four million weights. Treating validation performance as a noisy black-box function of the hyper-parameters, Bayesian optimization is ideally suited to optimizing it globally.
Deep neural networks require careful tuning of many hyper-parameters, which is one of the hardest and most time-consuming tasks in implementing these solutions.
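As a rough illustration, the sketch below uses scikit-optimize's gp_minimize (assuming the library is installed) to tune a few hyper-parameters of a small scikit-learn MLP on placeholder data; the search space and model are illustrative assumptions, not the configuration used here.

```python
from skopt import gp_minimize
from skopt.space import Integer, Real
from skopt.utils import use_named_args
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Hyper-parameter search space: architecture, optimization, regularization.
space = [
    Integer(32, 256, name="hidden_units"),
    Real(1e-4, 1e-1, prior="log-uniform", name="learning_rate_init"),
    Real(1e-6, 1e-2, prior="log-uniform", name="alpha"),  # L2 penalty
]

@use_named_args(space)
def objective(hidden_units, learning_rate_init, alpha):
    model = MLPClassifier(hidden_layer_sizes=(hidden_units,),
                          learning_rate_init=learning_rate_init,
                          alpha=alpha, max_iter=200, random_state=0)
    # gp_minimize minimizes, so return the negative cross-validated AUC.
    return -cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean()

result = gp_minimize(objective, space, n_calls=25, random_state=0)
print("best hyper-parameters:", result.x)
```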
Interested in knowing more on such niche techniques? Check out http://research.busigence.com