In machine learning, there’s something called the “No Free Lunch” theorem, which says that no single algorithm performs best on every problem. So you need to figure out which algorithm works best for your problem with the data set you have. In today’s blog I will focus on the 10 most commonly used machine learning algorithms. As we are going to cover 10 different algorithms, this post will be a little longer than usual, but be patient: I will try to keep it as simple as possible. So, let’s get started.
1. Linear Regression
Linear regression is a supervised machine learning algorithm. It assumes a linear relationship between the input variables (x) and a single output variable (y), and uses that relationship to predict the output. The representation of linear regression is an equation describing the line that best fits the relationship between the input variables (x) and the output variable (y), found by estimating specific weights for the input variables called coefficients (B). For example: y = B0 + B1 * x. Example: we will consider the same regression example here (figure below). If we have a data set of house prices with respect to house size, the model can predict the unknown price (q) of a house whose size (P) is given.
Some good rules of thumb when using this technique are to remove variables that are very similar (correlated) and to remove noise from your data, if possible. It is a fast and simple technique and a good first algorithm to try.
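To make the equation y = B0 + B1 * x concrete, here is a minimal sketch in Python (with NumPy) that estimates B0 and B1 by ordinary least squares for a single input variable. The house sizes and prices are made-up numbers for illustration, and the function names are hypothetical helpers, not from any library.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Least-squares estimates of B0 (intercept) and B1 (slope) for y = B0 + B1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x.
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means.
    b0 = y_mean - b1 * x_mean
    return b0, b1

def predict(b0, b1, x):
    return b0 + b1 * np.asarray(x, dtype=float)

# Made-up house sizes (square feet) and prices (thousands), for illustration only.
sizes = [650, 785, 1200, 1500, 1850, 2100]
prices = [70, 86, 125, 160, 190, 215]

b0, b1 = fit_simple_linear_regression(sizes, prices)
print(f"price ~ {b0:.2f} + {b1:.4f} * size")
print(predict(b0, b1, [1000]))  # estimated price (q) for an unseen size (P)
```

With more than one input variable you would solve the least-squares problem in matrix form instead (for example with `np.linalg.lstsq`).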
2. Logistic Regression
Logistic regression is like linear regression, but instead of fitting a straight line or hyperplane, it transforms the prediction for the output using a non-linear function called the logistic function, or sigmoid function. The function looks like a big S and maps any real-valued input into the range 0 to 1, so the output can be read as the probability of belonging to a class. For reference, please see the figure below (taken from Wikipedia: https://en.wikipedia.org/wiki/Logistic_regression#/media/File:Exam_pass_logistic_curve.jpeg).
Like linear regression, logistic regression works better when you remove attributes that are unrelated to the output variable, as well as attributes that are very similar (correlated) to each other.
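The sketch below, assuming NumPy, fits a one-variable logistic regression by plain gradient descent on the average log loss, so you can see the sigmoid squashing a linear score into a probability (here the coefficients are named `w` and `b`). The exam data, learning rate and iteration count are arbitrary illustrative choices; real libraries typically use more robust optimizers and regularization.

```python
import numpy as np

def sigmoid(z):
    # The S-shaped logistic function: maps any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(x, y, learning_rate=0.1, n_iters=5000):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w, b = 0.0, 0.0
    for _ in range(n_iters):
        p = sigmoid(w * x + b)  # predicted probability of class 1
        # Gradient of the average log loss with respect to w and b.
        w -= learning_rate * np.mean((p - y) * x)
        b -= learning_rate * np.mean(p - y)
    return w, b

def predict_proba(w, b, x):
    return sigmoid(w * np.asarray(x, dtype=float) + b)

# Made-up data: hours studied (x) and whether the exam was passed (y = 1).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

w, b = fit_logistic_regression(hours, passed)
probs = predict_proba(w, b, [1.0, 4.0])
print(probs)                       # probabilities of passing
print((probs >= 0.5).astype(int))  # class labels using a 0.5 cut-off
```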
3. Linear Discriminant Analysis
Linear Discriminant Analysis (LDA) is a classification technique whose representation consists of statistical properties of your data, calculated for each class. For a single input variable this includes the mean value for each class and the variance calculated across all classes. Predictions are made by calculating a discriminant value for each class and predicting the class with the largest value. The technique assumes that the data has a Gaussian distribution (bell curve), so it is a good idea to remove outliers from your data beforehand. It’s a simple and powerful method for classification predictive modelling problems.
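For a single input variable, one common form of the LDA discriminant for class k, assuming a shared variance sigma^2 and a class prior P(k), is D_k(x) = x * mu_k / sigma^2 - mu_k^2 / (2 * sigma^2) + ln(P(k)). The sketch below (Python with NumPy, made-up data, hypothetical function names) estimates the class means, the pooled variance and the priors, then predicts the class with the largest discriminant.

```python
import numpy as np

def fit_lda_1d(x, y):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)  # sorted class labels
    means = np.array([x[y == c].mean() for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    # Pooled variance: squared deviations of each point from its own class mean,
    # divided by (number of points - number of classes).
    residuals = x - means[np.searchsorted(classes, y)]
    variance = np.sum(residuals ** 2) / (len(x) - len(classes))
    return classes, means, priors, variance

def predict_lda_1d(model, x_new):
    classes, means, priors, variance = model
    x_new = np.asarray(x_new, dtype=float)[:, None]  # shape (n_points, 1)
    # Discriminant value for every (point, class) pair, shape (n_points, n_classes).
    scores = x_new * means / variance - means ** 2 / (2 * variance) + np.log(priors)
    return classes[np.argmax(scores, axis=1)]

# Made-up one-feature data with two classes.
x = [1.0, 1.8, 2.2, 2.9, 3.1, 5.0, 5.6, 6.1, 6.8, 7.4]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

model = fit_lda_1d(x, y)
print(predict_lda_1d(model, [1.5, 6.5]))
```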
4. Classification and Regression Trees or Decision Trees
Decision trees are an important type of algorithm for predictive modelling. The model is represented as a binary tree: each internal node represents a single input variable (x) and a split point on that variable, and each leaf node contains an output value (y) that is used as the prediction. Predictions are made by walking the splits of the tree until arriving at a leaf node and outputting the class value at that leaf node.
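The prediction walk described above can be sketched in a few lines of Python. The tree here is hand-built with hypothetical split points and stored as nested dictionaries (an illustrative assumption); learning the splits from data, for example by greedily minimizing Gini impurity, is not shown.

```python
# A small, hand-built tree (hypothetical splits, not learned from data).
tree = {
    "feature": 0, "threshold": 2.5,          # split on x[0] at 2.5
    "left": {"value": "A"},                  # leaf reached when x[0] <= 2.5
    "right": {
        "feature": 1, "threshold": 1.0,      # split on x[1] at 1.0
        "left": {"value": "B"},
        "right": {"value": "A"},
    },
}

def predict(node, x):
    # Walk the splits until a leaf is reached, then return its value.
    while "value" not in node:
        if x[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["value"]

print(predict(tree, [1.0, 5.0]))  # x[0] <= 2.5 -> "A"
print(predict(tree, [3.0, 0.5]))  # x[0] > 2.5 and x[1] <= 1.0 -> "B"
```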
5. Naïve Bayes Algorithm
Bayes’ theorem states that P(A|B) = P(B|A)P(A)/P(B), where A and B are events and P(A|B) is a conditional probability: the likelihood of event A occurring given that B is true. Likewise, P(B|A) is the likelihood of B occurring given that A is true. P(A) and P(B) are the probabilities of observing A and B independently of each other; these are known as marginal probabilities.
Naive Bayes is called ‘naïve’ because it assumes that each input variable is independent of the others (given the class). This is a strong assumption and unrealistic for real data; nevertheless, the technique is very effective on a wide range of complex problems.
The model consists of two types of probabilities that can be calculated directly from the training data: A. the probability of each class, and B. the conditional probability of each x value given each class. Once calculated, the probability model can be used to make predictions for new data using Bayes’ theorem.
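Here is a minimal sketch of a naive Bayes classifier for categorical inputs in plain Python. It counts class frequencies and per-class value frequencies from made-up weather/car data, then scores each class with log P(class) plus the sum of log P(x_j | class); the P(B) denominator of Bayes’ theorem is dropped because it is the same for every class. The add-one (Laplace) smoothing via `alpha` is an extra assumption, not part of the description above; it avoids zero probabilities for values never seen with a class.

```python
import math
from collections import Counter

def fit_naive_bayes(rows, alpha=1.0):
    """rows: tuples of categorical feature values with the class label last."""
    class_counts = Counter(row[-1] for row in rows)
    n_features = len(rows[0]) - 1
    # value_counts[c][j][v]: how often feature j takes value v within class c
    value_counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    seen_values = [set() for _ in range(n_features)]
    for *features, label in rows:
        for j, v in enumerate(features):
            value_counts[label][j][v] += 1
            seen_values[j].add(v)
    return class_counts, value_counts, seen_values, alpha, len(rows)

def predict_naive_bayes(model, features):
    class_counts, value_counts, seen_values, alpha, n = model
    best_class, best_score = None, -math.inf
    for c, count in class_counts.items():
        score = math.log(count / n)  # log P(class)
        for j, v in enumerate(features):
            # Smoothed log P(x_j = v | class); independence lets us simply add the logs.
            numerator = value_counts[c][j][v] + alpha
            denominator = count + alpha * len(seen_values[j])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Made-up training data: (weather, car, decision).
data = [
    ("sunny", "working", "go-out"), ("rainy", "broken", "stay-home"),
    ("sunny", "working", "go-out"), ("sunny", "working", "go-out"),
    ("sunny", "working", "go-out"), ("rainy", "broken", "stay-home"),
    ("rainy", "broken", "stay-home"), ("sunny", "working", "stay-home"),
    ("sunny", "broken", "stay-home"), ("rainy", "broken", "stay-home"),
]

model = fit_naive_bayes(data)
print(predict_naive_bayes(model, ("sunny", "working")))  # go-out
print(predict_naive_bayes(model, ("rainy", "broken")))   # stay-home
```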
6. K-NN (K-Nearest Neighbors) Algorithm
The K-nearest neighbors algorithm is a simple procedure that stores all available cases and classifies new cases based on a similarity measure (for example, a distance function). It is a simple, easy-to-implement supervised machine learning algorithm that can be used for both classification and regression problems. Predictions are made for a new data point by searching through the entire training set for the K most similar instances (the neighbors) and summarizing the output variable for those K instances: a majority vote for classification, or an average for regression. The idea of distance or closeness can break down in very high dimensions (lots of input variables), which can negatively affect the performance of the algorithm. This is called the curse of dimensionality. It suggests that you should use only those input variables that are most relevant to predicting the output variable.
As this algorithm is frequently used and easy to implement, I will try to explain it with the following diagrams and data set. Suppose we have a data set with two groups, group A (blue) and group B (yellow), as shown in the figure below, and we want to classify the unknown point p1 (red). To do so, the algorithm finds the 4 nearest neighbours of p1 (as k=4) and assigns p1 the label held by the majority of them.
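The steps in the example above can be sketched as a brute-force K-NN classifier in plain Python (3.8+ for `math.dist`). The coordinates below are made-up stand-ins for the figure’s points, with k=4 as in the example and Euclidean distance assumed as the similarity measure.

```python
import math
from collections import Counter

def knn_classify(training_points, query, k=4):
    """training_points: list of ((x, y), label). Returns the majority label of the k nearest."""
    # Sort every stored case by Euclidean distance to the query point.
    by_distance = sorted(training_points, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # Majority vote among the k nearest neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up points: group "A" (blue) and group "B" (yellow).
points = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((2, 3), "A"),
    ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B"), ((3, 3), "B"),
]
p1 = (2.2, 2.2)
print(knn_classify(points, p1, k=4))  # 3 of the 4 nearest are in group A -> "A"
```

Because every prediction scans the whole training set, this brute-force version costs time proportional to the number of stored cases per query.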
7. Learning Vector Quantization (LVQ)
A downside of K-Nearest Neighbors is that you need to hang on to your entire training dataset. The Learning Vector Quantization algorithm (or LVQ for short) is an artificial neural network algorithm that allows you to choose how many training instances to hang onto and learns exactly what those instances should look like.
The representation for LVQ is a collection of codebook vectors. These are selected randomly in the beginning and adapted to best summarize the training dataset over a number of iterations of the learning algorithm. Once learned, the codebook vectors can be used to make predictions just like K-Nearest Neighbors. The most similar neighbor (the best matching codebook vector) is found by calculating the distance between each codebook vector and the new data instance. The class value (or the real value in the case of regression) of the best matching unit is then returned as the prediction.
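The following sketch, assuming NumPy, implements a basic LVQ1-style training loop: codebook vectors start as randomly chosen training instances of each class, and for every training instance the best matching unit is pulled toward it when the classes agree and pushed away otherwise, with a linearly decaying learning rate. The data, learning rate, epoch count and one-codebook-per-class setting are illustrative assumptions.

```python
import numpy as np

def train_lvq(X, y, n_codebooks_per_class=1, learning_rate=0.3, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    codebooks, codebook_labels = [], []
    # Initialize codebook vectors by picking random training instances of each class.
    for c in np.unique(y):
        idx = rng.choice(np.flatnonzero(y == c), size=n_codebooks_per_class, replace=False)
        codebooks.extend(X[idx])
        codebook_labels.extend([c] * n_codebooks_per_class)
    codebooks = np.array(codebooks)
    codebook_labels = np.array(codebook_labels)
    for epoch in range(epochs):
        rate = learning_rate * (1 - epoch / epochs)  # linearly decaying learning rate
        for xi, yi in zip(X, y):
            bmu = np.argmin(np.linalg.norm(codebooks - xi, axis=1))  # best matching unit
            step = rate * (xi - codebooks[bmu])
            # Move toward same-class instances, away from other-class ones.
            codebooks[bmu] += step if codebook_labels[bmu] == yi else -step
    return codebooks, codebook_labels

def predict_lvq(codebooks, codebook_labels, x):
    bmu = np.argmin(np.linalg.norm(codebooks - np.asarray(x, dtype=float), axis=1))
    return codebook_labels[bmu]

# Made-up 2-D data: class 0 near (1, 1), class 1 near (5, 5).
X = [[1, 1], [1.5, 1.2], [0.8, 1.4], [1.1, 0.7],
     [5, 5], [5.4, 4.6], [4.7, 5.3], [5.2, 5.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

codebooks, codebook_labels = train_lvq(X, y)
print(codebooks)
print(predict_lvq(codebooks, codebook_labels, [1.2, 0.8]))
```

Only the codebook vectors are kept after training, which is the memory advantage over K-NN mentioned above.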
8. Support Vector Machine (SVM)
Support Vector Machine (SVM) is a supervised machine learning algorithm. This is another algorithm that can be used for both classification and regression problems, although it is mostly used for classification. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features we have), with the value of each feature being the value of a particular coordinate. Then we perform classification by finding the hyperplane that best separates the two classes. In SVM, the hyperplane is selected to best separate the points in the input variable space by their class, either class 0 or class 1. In two dimensions, you can visualize this as a line; let’s assume that all of our input points can be completely separated by this line. The SVM learning algorithm finds the coefficients that result in the best separation of the classes by the hyperplane.
The distance between the hyperplane and the closest data points is called the margin. The best or optimal hyperplane that can separate the two classes is the one with the largest margin. Only these closest points are relevant in defining the hyperplane and in the construction of the classifier. These points are called the support vectors; they support, or define, the hyperplane. In practice, an optimization algorithm is used to find the values for the coefficients that maximize the margin.
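To show the margin idea in code, here is a sketch (Python with NumPy) of a linear SVM trained by sub-gradient descent on the regularized hinge loss. This is a swapped-in, simplified optimizer chosen for readability; library implementations typically solve the SVM optimization problem with dedicated solvers instead. The data and the values of `lam`, `learning_rate` and `epochs` are made up.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, learning_rate=0.01, epochs=1000):
    """Sub-gradient descent on lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))).

    Labels y must be -1 or +1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        margins = y * (X @ w + b)
        inside = margins < 1  # points misclassified or inside the margin
        grad_w = lam * w - (y[inside][:, None] * X[inside]).sum(axis=0) / n
        grad_b = -y[inside].sum() / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def predict_svm(w, b, X):
    # Which side of the hyperplane w . x + b = 0 each point falls on.
    return np.sign(np.asarray(X, dtype=float) @ w + b)

# Made-up, linearly separable 2-D data.
X = np.array([[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -1], [-2, -3]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])

w, b = train_linear_svm(X, y)
print("coefficients:", w, "intercept:", b)
print(predict_svm(w, b, X))

# Points on or inside the margin (y * (w . x + b) <= 1, with a small tolerance)
# are the support-vector candidates that pin down the hyperplane.
margins = y * (X @ w + b)
print("support vector candidates:", X[margins <= 1 + 1e-3])
```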
9. Bagging and Random Forest
The bootstrap is a powerful statistical method for estimating a quantity, such as a mean, from a data sample. You take lots of samples of your data (drawn with replacement), calculate the mean of each, and then average all of those mean values to get a better estimate of the true mean value.
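A minimal sketch of the bootstrap estimate of a mean, assuming NumPy and a made-up sample of ten numbers: draw many resamples of the same size with replacement, take the mean of each, and average those means. The seed and the number of resamples are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.array([2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 3.0])  # made-up sample

n_resamples = 1000
# Each bootstrap sample has the same size as the data and is drawn with replacement.
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(n_resamples)
])

estimate = boot_means.mean()
print("bootstrap estimate of the mean:", estimate)
print("spread of the bootstrap means:", boot_means.std())
```

The average of the bootstrap means lands very close to the ordinary sample mean; the spread of `boot_means` (its standard deviation) is also commonly used as an estimate of the standard error of the mean.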
Bagging uses the same approach, but to estimate entire statistical models instead, most commonly decision trees. Multiple bootstrap samples of your training data are taken, and a model is constructed for each data sample. When you need to make a prediction for new data, each model makes a prediction and the predictions are averaged (or, for classification, combined by majority vote) to give a better estimate of the true output value.
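Random forest is a tweak on this approach in which the decision trees are created so that, rather than selecting optimal split points, sub-optimal splits are made by introducing randomness: at each split, only a random subset of the input variables is considered. This makes the individual trees less correlated with each other, which tends to improve the combined prediction.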
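The sketch below (Python with NumPy, made-up data, hypothetical helper names) bags depth-one decision trees (stumps): each stump is fit on a bootstrap sample and the ensemble predicts by majority vote. Passing `max_features` mimics the random-forest tweak by letting each stump consider only a random subset of features; since a stump has just one split, choosing the subset per tree is the same as choosing it per split here. Real random forests grow much deeper trees.

```python
import numpy as np

def fit_stump(X, y, feature_ids):
    """Single split (depth-1 tree) with the fewest errors among the given features. Labels 0/1."""
    best = None
    for j in feature_ids:
        for t in np.unique(X[:, j]):
            goes_left = X[:, j] <= t
            for left_label in (0, 1):
                pred = np.where(goes_left, left_label, 1 - left_label)
                errors = np.sum(pred != y)
                if best is None or errors < best[0]:
                    best = (errors, j, t, left_label)
    return best[1:]  # (feature, threshold, label for the left branch)

def predict_stump(stump, X):
    j, t, left_label = stump
    return np.where(X[:, j] <= t, left_label, 1 - left_label)

def fit_bagged_stumps(X, y, n_models=25, max_features=None, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    models = []
    for _ in range(n_models):
        rows = rng.integers(0, n, size=n)  # bootstrap sample: n rows drawn with replacement
        if max_features is None:
            features = np.arange(d)  # plain bagging: every feature is a split candidate
        else:
            # Random-forest-style tweak: only a random subset of features is considered.
            features = rng.choice(d, size=max_features, replace=False)
        models.append(fit_stump(X[rows], y[rows], features))
    return models

def predict_bagged(models, X):
    votes = np.array([predict_stump(m, X) for m in models])  # (n_models, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)  # majority vote

# Made-up 2-D data: class 0 has small values, class 1 has large values.
X = np.array([[1, 2], [2, 1], [1.5, 1.5], [2, 2.5],
              [6, 7], [7, 6], [6.5, 6.5], [7, 7.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

bagged = fit_bagged_stumps(X, y)
forest = fit_bagged_stumps(X, y, max_features=1)
queries = np.array([[1.0, 1.0], [7.0, 7.0]])
print(predict_bagged(bagged, queries), predict_bagged(forest, queries))
```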
10. Boosting and AdaBoost
Boosting is an ensemble technique that attempts to create a strong classifier from a number of weak classifiers. This is done by building a model from the training data, then creating a second model that attempts to correct the errors from the first model. Models are added until the training set is predicted perfectly or a maximum number of models is reached.
AdaBoost is typically used with short decision trees. After the first tree is created, its performance on each training instance is used to weight how much attention the next tree should pay to each training instance. Training data that is hard to predict is given more weight, whereas easy-to-predict instances are given less weight. Models are created sequentially, one after the other, each updating the weights on the training instances that affect the learning performed by the next tree in the sequence. After all the trees are built, predictions are made for new data, and the prediction of each tree is weighted by how accurate it was on the training data.
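Here is a compact sketch of the classic discrete AdaBoost loop on one input variable, assuming NumPy, labels of -1/+1, and decision stumps as the short trees. Each round picks the stump with the lowest weighted error, gives it a vote weight alpha = 0.5 * ln((1 - error) / error), and re-weights the training instances so misclassified ones count more. The six-point data set is made up so that no single stump can classify it perfectly.

```python
import numpy as np

def best_weighted_stump(x, y, w):
    """Threshold/polarity on a 1-D feature with the lowest weighted error. Labels are -1/+1."""
    xs = np.unique(x)
    thresholds = np.concatenate(([xs[0] - 1], (xs[:-1] + xs[1:]) / 2, [xs[-1] + 1]))
    best = None
    for t in thresholds:
        for polarity in (1, -1):
            pred = np.where(polarity * (x - t) > 0, 1, -1)
            err = w[pred != y].sum()  # total weight of misclassified instances
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best

def adaboost(x, y, n_rounds=10):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    w = np.full(len(x), 1 / len(x))  # start with equal weight on every instance
    ensemble = []
    for _ in range(n_rounds):
        err, t, polarity = best_weighted_stump(x, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)   # avoid division by zero and log(0)
        alpha = 0.5 * np.log((1 - err) / err)  # more accurate stumps get a bigger say
        pred = np.where(polarity * (x - t) > 0, 1, -1)
        # Increase weight on misclassified instances, decrease it on correct ones.
        w = w * np.exp(-alpha * y * pred)
        w = w / w.sum()
        ensemble.append((alpha, t, polarity))
    return ensemble

def predict_adaboost(ensemble, x):
    x = np.asarray(x, dtype=float)
    score = sum(alpha * np.where(polarity * (x - t) > 0, 1, -1) for alpha, t, polarity in ensemble)
    return np.where(score >= 0, 1, -1)

# Made-up data: the positive class sits in the middle of the range.
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([-1, -1, 1, 1, -1, -1])

ensemble = adaboost(x, y, n_rounds=3)
print(predict_adaboost(ensemble, x))
```

On this toy data the best single stump gets at most 4 of the 6 points right, while the three-round ensemble classifies all six correctly.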
Congratulations, guys: now you know the 10 most commonly used machine learning algorithms. In the next post I am planning to write about some commonly asked interview questions on machine learning algorithms, so stay tuned; I will share the link soon. And don’t forget to comment below with any suggestions and feedback. Till then, bye, see you soon.