Handling Different Situations in Machine Learning –
While solving machine learning problems we run into many different kinds of problems, and as the saying goes, “the better you are at problem solving, the better Data Scientist you are”.
My aim is to make readers love Machine Learning before diving deep into it. It’s very easy to use difficult words and unnecessary code and leave you feeling overwhelmed, but that won’t help you.
Machine Learning is a beautiful subject; it’s a world in itself. Until you love it, understand what the subject wants from us, and grasp the intuition behind particular topics, you won’t be able to solve real-world problems.
This subject is directly connected to everything you use these days: your phone, your car, and your favourite apps like Instagram, Facebook and Twitter. There are problems all around, and Machine Learning is solving them.
While solving these problems there are particular situations where we can get stuck. This article gives you some intuition about those situations and guides you on how to deal with them.
For Free, Demo classes Call: 8983120543
Registration Link:Click Here!
1. IMBALANCED DATASET VS BALANCED DATASET –
Most of us have heard of balanced and imbalanced datasets. A dataset is called BALANCED when each class has roughly the same number of data points, and IMBALANCED when the classes have very unequal numbers of data points.
Let’s say you are solving a binary classification problem with two classes, positive and negative. To be more precise, take the example of an Amazon food review model that predicts whether a review is positive or negative. Let’s say positive is “1” and negative is “0”.
Now suppose that in the training data the number of positive reviews (n1) is 500 and the number of negative reviews (n2) is 460. This is a case of balanced data because n1 and n2 are almost equal.
Let’s take another example: if n1 is 500 and n2 is 80, this is a case of imbalanced data. It causes a problem because the model sees far more positive reviews than negative ones, so it becomes biased towards predicting “positive” and its results on negative reviews will not be accurate.
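To see why the imbalance matters, here is a small sketch using the counts from the example above. The "lazy" model that always predicts positive is a hypothetical worst case for illustration, not a real trained model.

```python
# Class counts from the example above: 500 positive and 80 negative reviews.
n1, n2 = 500, 80

# True labels: 1 = positive, 0 = negative.
labels = [1] * n1 + [0] * n2

# A hypothetical "lazy" model that predicts positive for every review.
predictions = [1] * len(labels)

# Fraction of reviews the lazy model gets right.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Number of negative reviews the lazy model correctly identifies.
negatives_found = sum(p == 0 and y == 0 for p, y in zip(predictions, labels))

print(f"accuracy: {accuracy:.3f}")  # accuracy: 0.862
print(f"negative reviews caught: {negatives_found} of {n2}")
```

The accuracy looks decent even though the model never finds a single negative review, which is why accuracy alone can be misleading on imbalanced data.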
WHAT TO DO THEN?
There are two simple ways to handle it; which one to choose depends on the particular problem.
1. UNDERSAMPLING –
It is a very simple approach. Let’s understand it with the example above, where n1 = 500 and n2 = 80. If we keep n2 as it is and take 80 random samples from n1, the data becomes balanced: both classes now have 80 points.
DISADVANTAGE –
There is a huge loss of information. In the example above, 420 of the 500 positive samples are thrown away.
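Random undersampling can be sketched in a few lines of plain Python. The function name undersample, the fixed seed and the placeholder review strings are assumptions made for this illustration.

```python
import random


def undersample(majority, minority, seed=0):
    """Randomly keep only as many majority samples as there are minority samples."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    kept = rng.sample(majority, k=len(minority))  # sampling without replacement
    return kept, list(minority)


# Placeholder data matching the example: 500 positive and 80 negative reviews.
positive = [f"pos_review_{i}" for i in range(500)]
negative = [f"neg_review_{i}" for i in range(80)]

pos_balanced, neg_balanced = undersample(positive, negative)

print(len(pos_balanced), len(neg_balanced))  # 80 80
print(len(positive) - len(pos_balanced))     # 420 samples discarded
```

The last line makes the disadvantage visible: most of the majority class never reaches the model.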
2. OVERSAMPLING –
Let’s say n1 is 500 and n2 is 100. If we repeat every point of n2 5 times, we get 500 points. It is again a simple technique: we just place more copies of minority-class points in the dataset.
So simply by repetition we can handle the problem of imbalanced data.
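Oversampling by repetition can be sketched like this. The function name oversample_by_repetition and the placeholder review strings are assumptions for illustration; the remainder handling is an addition so the sketch also works when the class sizes do not divide evenly.

```python
def oversample_by_repetition(majority, minority):
    """Repeat minority samples until there are as many as majority samples."""
    repeats, remainder = divmod(len(majority), len(minority))
    # Whole copies of the minority class, plus a partial copy if needed.
    return list(minority) * repeats + list(minority[:remainder])


# Placeholder data matching the example: 500 positive and 100 negative reviews.
positive = [f"pos_review_{i}" for i in range(500)]
negative = [f"neg_review_{i}" for i in range(100)]

negative_oversampled = oversample_by_repetition(positive, negative)

print(len(negative_oversampled))                       # 500
print(negative_oversampled.count("neg_review_0"))      # 5 copies of each point
```

No information is lost here, but the repeated points are exact copies, so the model does not see anything genuinely new about the minority class.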
So these are two very simple and logical ways to handle imbalanced data.
The next situation in which a data scientist can get stuck is “MULTI-CLASS CLASSIFICATION”.
3. MULTI-CLASS CLASSIFICATION –
A binary classifier is one in which we have only two classes, which we denote by “1” and “0”. For example, if an Amazon food review is positive the class is 1, and if the review is negative the class is 0.
So let’s say that in the dataset D we have data points (xi, yi), where i varies from 1 to n. If yi belongs to {0,1}, it is a binary classification problem.
If yi belongs to {0,1,2,3,4,5…}, it is a multi-class classification problem.
To understand the binary classification problem, think of it like this: suppose we have a black box that has been trained on our training data. Inside that black box is a function F(x) which takes a query point Xq (for example, a review by a new customer on the Amazon website) and predicts whether it is “1” (positive) or “0” (negative).
In multi-class classification, for a query point Xq, F(x) can predict “1”, “2”, “3” and so on.
Let’s assume that the data is linearly separable. For binary classification we have to find one hyperplane that separates the two classes, but in multi-class classification there are more than 2 classes, so one hyperplane won’t solve the problem; we need multiple hyperplanes.
One very common and widely used technique in industry is “ONE VS ALL”. Let me give you the intuition behind it.
Let’s say we have 3 classes: class-1, class-2 and class-3 (please refer to the above image). One hyperplane cannot separate all 3 classes, so instead we build 3 binary classifiers.
First Classifier → Class 1 vs {Class 2 and Class 3}
Second Classifier → Class 2 vs {Class 1 and Class 3}
Third Classifier → Class 3 vs {Class 1 and Class 2}
When all of our binary classifiers are ready, we pass the query point through each of them and pick the class whose classifier gives the most confident “yes”.
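Here is a minimal sketch of One VS All in plain Python. It assumes a simple perceptron as the binary classifier (the article does not name one), uses made-up 2-D points for the three classes, and combines the classifiers by picking the one with the highest score, which is a common way to make the final decision. All function names are hypothetical.

```python
def train_perceptron(points, targets, epochs=50, lr=0.1):
    """Train one binary linear classifier (a hyperplane); targets are +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(points, targets):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if t * score <= 0:  # misclassified: nudge the hyperplane
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b


def train_one_vs_all(points, labels):
    """Train one binary classifier per class: that class vs all the others."""
    classifiers = {}
    for c in sorted(set(labels)):
        targets = [1 if y == c else -1 for y in labels]
        classifiers[c] = train_perceptron(points, targets)
    return classifiers


def predict(classifiers, x):
    """Return the class whose classifier gives the highest score for x."""
    def score(c):
        w, b = classifiers[c]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(classifiers, key=score)


# Made-up, linearly separable 2-D data: three clusters, one per class.
points = [(0, 4), (1, 5), (-1, 5), (0, 6),          # class 1
          (-5, -4), (-4, -5), (-6, -5), (-5, -6),   # class 2
          (5, -4), (4, -5), (6, -5), (5, -6)]       # class 3
labels = [1] * 4 + [2] * 4 + [3] * 4

classifiers = train_one_vs_all(points, labels)
train_predictions = [predict(classifiers, x) for x in points]
print(train_predictions == labels)
print(predict(classifiers, (0, 5)), predict(classifiers, (-5, -5)), predict(classifiers, (5, -5)))
```

With 3 classes we end up with 3 hyperplanes, one per classifier, exactly as in the list above. In practice a library classifier would replace the hand-written perceptron.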
4. OVERFITTING AND UNDERFITTING –
Overfitting and under-fitting are very common problems in Machine Learning. Let’s understand both.
Let’s say we have a model which gives us a Train Accuracy of 99% and a Test Accuracy of 70%. This means the model is overfitting the data; in other words, it is trying to memorise the data rather than understand it. That is why it gives good accuracy during training, but the moment we come to testing it gives bad results.
Now let’s assume a case where the Train Accuracy of the model is 50% and the Test Accuracy is 48%. This means the model is under-fit; in other words, it is not able to capture the relationships in the data at all.
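The two cases above can be turned into a rough rule of thumb. In this sketch the function name diagnose and both thresholds (a 10-point gap and a 70% minimum train accuracy) are assumptions chosen for illustration; sensible values depend on the problem.

```python
def diagnose(train_acc, test_acc, max_gap=0.10, min_train_acc=0.70):
    """Rough label for a model based on its train and test accuracy.

    max_gap and min_train_acc are illustrative thresholds, not standard values.
    """
    if train_acc < min_train_acc:
        # The model cannot even fit the data it was trained on.
        return "underfitting"
    if train_acc - test_acc > max_gap:
        # Good on training data but much worse on unseen data.
        return "overfitting"
    return "reasonable fit"


print(diagnose(0.99, 0.70))  # overfitting
print(diagnose(0.50, 0.48))  # underfitting
```

The key signal for overfitting is the gap between train and test accuracy, while the key signal for under-fitting is that even the train accuracy is low.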
Noisy points and outliers in the training data are one common cause of overfitting. To detect and remove them we have some techniques like
- LOF (Local Outlier Factor)
- IQR (Inter Quartile Range)
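As an illustration of the IQR technique listed above, this sketch flags values that lie far outside the middle 50% of the data. The function name iqr_outliers, the review_lengths data and the multiplier k = 1.5 (a widely used convention) are assumptions for this example.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1                                  # inter quartile range
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up review lengths (in words); 400 is far from the rest.
review_lengths = [42, 45, 47, 50, 51, 53, 55, 58, 60, 400]

print(iqr_outliers(review_lengths))  # [400]
```

Points flagged this way can be inspected and, if they are genuine noise, removed before training.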
To handle the problem of under-fitting, simply giving the model more data of the same kind rarely helps; it usually needs more informative features or a more flexible model. I hope you have understood the basic situations. Thanks for reading.
Author:-
Nishesh Gogia
© Copyright 2021 | Sevenmentor Pvt Ltd.