What is SVM? Machine Learning Algorithm Explained
There’s so much development, study, and confusion around machine learning algorithms that we couldn’t skip the topic. Let’s start with what machine learning is. Machine learning is a subset of artificial intelligence (AI) that gives systems the ability to learn and improve from experience without being explicitly programmed. Quite basic, right? Well, not really! A machine learning algorithm like SVM opens up a whole new world of possibilities and has many explored and unexplored dimensions.
Machine Learning Algorithm: SVM (Support vector machine)
Today we’ll be talking about one such machine learning algorithm – SVM (support vector machine).
To begin to understand this, we must know the areas where SVM is currently in use –
- Face detection
- Classification of images
- Text and hypertext categorization
- Bioinformatics
- Geo and Environmental Science
- Handwriting recognition
Yes, SVM has a role to play in all of that.
Now that we know the applications, let’s dive right into the technicalities. Machine learning is commonly divided into three types –
- Supervised – Here, we have the labeled/classified data to train the machines.
- Unsupervised – Here, we do not have labeled/classified data to train the machines.
- Reinforcement – Here, we train the machines by rewarding the right decisions.
What is SVM?
It is a type of supervised machine learning algorithm. Here, the model learns from past labeled input data and predicts the output for new data. Support vector machines are supervised learning models used for classification and regression analysis.
For example – first, you train the machine to recognize what apples look like. After that, using that past data, it can identify apples in new data and give the output.
Why do we need Support Vector Machine?
SVM is a model that can predict the class of unknown data. For example, if we have pre-labeled data of apples and strawberries, we can train our model to identify apples and strawberries. Then, whenever we give it a new, unknown data point, it can classify it as a strawberry or an apple.
That’s SVM in play. It analyses the data and classifies it into one of the two categories based on the labeled data it already has. As per the previous example, it will sort the apples under the apple category and the strawberries under the strawberry category.
But how does the prediction take place?
- Here, we have our Support Vector Machine where we take the labeled sample of data as seen in the first graph.
- Next, we draw a line separating the two categories. This line is called the decision boundary. One side of the decision boundary has apples and the other side has strawberries.
- Now when new data is taken as seen in the third graph, it automatically goes into the group it belongs to – the right or the left side of the decision boundary.
- And depending on which side of the line the unknown sample data goes, we can predict the unknown and classify it under the apple or strawberry category.
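The steps above can be sketched in a few lines of Python. This is only an illustration of "which side of the line does the point fall on", not a trained model: the weights `w`, the offset `b`, and the two features (size and redness on a made-up 0 to 10 scale) are assumptions chosen so that the line separates the two fruits.

```python
# Hypothetical decision boundary: w[0]*size + w[1]*redness + b = 0.
# These numbers are illustrative assumptions, not learned from real data.
w = (1.0, -1.0)
b = 0.0

def decision_value(point):
    """Signed score: positive on one side of the boundary, negative on the other."""
    return w[0] * point[0] + w[1] * point[1] + b

def classify(point):
    """Label a point by the side of the decision boundary it falls on."""
    return "apple" if decision_value(point) >= 0 else "strawberry"

# Two unknown samples, given as (size, redness).
for sample in [(8.0, 3.0), (2.0, 8.0)]:
    print(sample, "->", classify(sample))
```

The sign of the score is all that matters for the prediction. How far the score is from zero becomes important in the next section, where we look at the margin.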
Let’s get into the details
The example above is a simple and clear one.
The key here is –
To figure out whether a new data point is a strawberry or an apple, we need to split our existing data in the best possible manner. That means choosing the decision boundary that separates the two classes with the maximum space between them. The line that achieves this is the one that best splits the data.
In the graph shown here, the blue line splits the two classes in the best possible manner. Why? Because (as shown by the green line) it creates the maximum distance between the two classes. The distance between the closest sample points (the points lying on the dotted lines) and the blue line should be as large as possible. In technical terms, the distance between the support vectors and the hyperplane should be as large as possible.
So, now you know what support vectors are. They are the extreme points of each class, that is, the points closest to the hyperplane. The hyperplane is placed so that it is as far as possible from these extreme points. Unknown data points falling on the left or right side of the hyperplane are classified into the respective categories. The dotted lines shown in the graph hold significance too.
The distance between the blue line and the dotted line on the right side is D+. Here, D+ is the shortest distance to the closest positive point. The distance between the blue line and the dotted line on the left side is D-. Here, D- is the shortest distance to the closest negative point. The sum of D+ and D- is the margin. The hyperplane with the largest margin is the optimal hyperplane, and it is the one used to classify the data.
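To make D+ and D- concrete, here is a minimal sketch that measures them for one candidate hyperplane. The hyperplane (`w`, `b`) and the sample points are made-up values. A real SVM would search over all possible hyperplanes for the one with the largest margin, whereas this code only measures the margin of the one it is given.

```python
import math

# Hypothetical hyperplane w.x + b = 0 and a few labeled points (assumed values).
w = (1.0, -1.0)
b = 0.0
positives = [(8.0, 3.0), (9.0, 2.0), (7.0, 2.0)]  # e.g. apples
negatives = [(2.0, 7.0), (3.0, 9.0), (2.0, 8.0)]  # e.g. strawberries

def distance_to_hyperplane(point):
    """Perpendicular distance |w.x + b| / ||w|| from a point to the hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    return abs(score) / math.hypot(w[0], w[1])

# D+ and D-: shortest distance to the closest positive / negative point.
d_plus = min(distance_to_hyperplane(p) for p in positives)
d_minus = min(distance_to_hyperplane(p) for p in negatives)
margin = d_plus + d_minus

print(f"D+ = {d_plus:.3f}, D- = {d_minus:.3f}, margin = {margin:.3f}")
```

The points that achieve these minimum distances are the support vectors. Moving any other point slightly would leave D+, D-, and the margin unchanged.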
Let’s move to a more complicated example.
The previous example was a simple one because the data sets were nicely segregated in the first place. But what if we have a data set that looks like this –
Here the two classes are mixed together (red points appear on their own, and then again alongside the green points), so there is no clear segregation. In that case, how do we draw a hyperplane?
Here we’ll shift from a 1-dimensional view to the 2-dimensional view of the data.
Wait, how is that possible?
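Through a kernel function. In this example, the kernel function takes the 1-d input and maps it to a 2-d output. More generally, a kernel lets the SVM work in a higher-dimensional space where the classes are easier to separate.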
Now that the data set is converted into 2-dimensional data, it becomes easy to draw a hyperplane and hence segregate the two classes. That is how a Support Vector Machine works. It comes with a whole gamut of advantages and disadvantages too. Let’s have a look.
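As a rough sketch of that idea, the snippet below lifts made-up 1-d data into 2-d with the mapping x -> (x, x^2). The data values, the mapping, and the threshold are all illustrative assumptions, and the mapping is just one simple choice among many.

```python
# Made-up 1-D data: "red" points sit in the middle and "green" points on both
# sides, so no single cut on the x axis separates the two classes.
red = [-1.0, -0.5, 0.0, 0.5, 1.0]
green = [-3.0, -2.5, -2.0, 2.0, 2.5, 3.0]

def feature_map(x):
    """Lift a 1-D value to 2-D as (x, x^2); one illustrative choice of mapping."""
    return (x, x * x)

red_2d = [feature_map(x) for x in red]
green_2d = [feature_map(x) for x in green]

# In the lifted space the horizontal line x2 = 2.5 separates the classes
# (the value 2.5 is picked by eye for this toy data).
threshold = 2.5
red_below = all(x2 < threshold for _, x2 in red_2d)
green_above = all(x2 > threshold for _, x2 in green_2d)
print(red_below, green_above)

def poly_kernel(a, b):
    """Inner product of feature_map(a) and feature_map(b), computed from a and b."""
    return a * b + (a * b) ** 2
```

`poly_kernel` returns the same number as taking the dot product of the two lifted points, but without building them. That shortcut is what makes kernels practical when the lifted space has many dimensions.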
Advantages of Support Vector Machine (SVM)
1. Regularization capabilities: SVM has L2 regularization built in (the same kind of penalty that ridge regression uses). L2 regularization adds the squared magnitude of the coefficients as a penalty term to the loss function. This helps the model generalize well and guards against over-fitting (a modeling error that occurs when a function fits a limited set of data points too closely).
2. Handles non-linear data efficiently: SVM efficiently handles non-linear data (where the classes cannot be separated by a straight line) through kernel functions.
3. Solves both Classification and Regression problems: SVM is used for classification problems while SVR (Support Vector Regression) is used for regression problems.
4. Stability: A slight change in the data does not affect the hyperplane as long as the support vectors stay the same, which makes the SVM model stable.
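The L2 regularization from the first advantage can be written out as a small objective function: a penalty on the size of the coefficients plus the average hinge loss. The sketch below minimizes it with plain per-sample subgradient descent on made-up apple and strawberry data. The data and the hyperparameters `lam`, `lr`, and `epochs` are assumptions, and subgradient descent is used here only because it is short. SVM libraries typically rely on more specialized solvers.

```python
# Toy 2-D data (assumed values): +1 = apple, -1 = strawberry.
X = [(8.0, 2.0), (9.0, 3.0), (8.0, 3.0), (2.0, 8.0), (3.0, 9.0), (2.0, 7.0)]
y = [1, 1, 1, -1, -1, -1]

def score(w, b, x):
    """Signed score w.x + b; its sign is the predicted class."""
    return w[0] * x[0] + w[1] * x[1] + b

def objective(w, b, lam):
    """L2 penalty (lam/2)*||w||^2 plus the average hinge loss max(0, 1 - y*score)."""
    penalty = 0.5 * lam * (w[0] ** 2 + w[1] ** 2)
    hinge = sum(max(0.0, 1.0 - yi * score(w, b, xi)) for xi, yi in zip(X, y))
    return penalty + hinge / len(X)

def train(lam=0.01, lr=0.01, epochs=200):
    """Per-sample subgradient descent; the hyperparameters are illustrative."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # A point "violates" when it is misclassified or inside the margin.
            violates = yi * score(w, b, xi) < 1.0
            for j in range(2):
                grad = lam * w[j] - (yi * xi[j] if violates else 0.0)
                w[j] -= lr * grad
            if violates:
                b += lr * yi
    return w, b

w, b = train()
print("objective before training:", objective([0.0, 0.0], 0.0, 0.01))
print("objective after training: ", objective(w, b, 0.01))
print("new point (9, 2) is an apple:", score(w, b, (9.0, 2.0)) > 0)
```

The penalty term pulls the coefficients toward zero while the hinge term pushes points out of the margin, so `lam` controls the trade-off between a wide margin and fitting every training point.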
Disadvantages of Support Vector Machine (SVM)
1. Choosing an appropriate Kernel function is difficult: Choosing an appropriate Kernel function (to handle the non-linear data) involves complexity. What happens is – when you use a high dimensional kernel function, you might end up generating too many support vectors and that reduces the training speed.
2. Extensive memory requirement: Storing all the support vectors takes a lot of memory, and their number keeps growing with the training dataset size.
3. Long training time: SVM requires a long training time on large datasets.
What you learned here is only a fraction of SVM’s potential. Machine learning is a fascinating field to dive into, and SVM even more so. You can imagine what exploring this field can do for you.