Classification of Algorithms in Machine Learning
Classification is a supervised learning task in which the goal is to predict the category, or class, of an input data point. The model learns a mapping from input variables to discrete output variables, typically represented as class labels.
Evaluating Classification Models
In machine learning, evaluating the performance of a classification model is crucial to understanding how well it generalizes to unseen data and how effectively it predicts class labels. There are several evaluation metrics and techniques commonly used for assessing classification models:
- Accuracy: Accuracy measures the proportion of correctly classified instances out of the total instances. It’s calculated as the ratio of the number of correct predictions to the total number of predictions.
Accuracy = Number of correct predictions / Total number of predictions
or
Accuracy = (True Positive + True Negative) / (True Positive + False Negative + False Positive + True Negative)
- Precision: Precision measures the proportion of true positive predictions among all positive predictions. It’s the ratio of correctly predicted positive instances to the total predicted positive instances.
Precision = True Positive / (True Positive + False Positive )
- Recall (Sensitivity): Recall measures the proportion of true positive predictions among all actual positive instances. It’s the ratio of correctly predicted positive instances to the total actual positive instances.
Recall = True Positive / (True Positive + False Negative )
- F1 Score: The F1 score is the harmonic mean of precision and recall. It provides a balanced measure of a model’s performance by considering both precision and recall.
F1 Score = 2 * [(Precision * Recall) / (Precision + Recall)]
- Confusion Matrix: A confusion matrix is a table that summarizes the performance of a classification model. It shows the counts of true positive, false positive, true negative, and false negative predictions.
- Receiver Operating Characteristic (ROC) Curve: ROC curves plot the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings, visualizing the trade-off between sensitivity and specificity.
- Area Under the ROC Curve (AUC-ROC): AUC-ROC measures the area under the ROC curve and provides an aggregated measure of the model’s performance across all possible threshold settings. A higher AUC-ROC value indicates better discrimination ability. A short code sketch follows the classification report example below.
- Precision-Recall Curve: Similar to ROC curves, precision-recall curves plot precision against recall at various threshold settings. They are particularly useful for imbalanced datasets.
- Cross-Validation: Cross-validation is a resampling technique used to assess how well a model generalizes to unseen data. Common methods include k-fold cross-validation and stratified k-fold cross-validation.
- Classification Report: A classification report provides a summary of evaluation metrics (such as precision, recall, and F1 score) for each class in the dataset.
from sklearn.metrics import classification_report
# Example true labels and predicted labels
actual_value = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
predicted_labels = [1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1]
# Generate and print the classification report
report = classification_report(actual_value, predicted_labels)
print("Classification Report:\n", report)
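To illustrate the ROC curve and AUC-ROC metrics described above, here is a minimal sketch using scikit-learn; the labels and probability scores below are made-up toy values, not the output of a real model:
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt
# Toy true labels and predicted positive-class probabilities (illustrative values)
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
y_score = [0.9, 0.2, 0.8, 0.6, 0.4, 0.7, 0.3, 0.65, 0.1, 0.5]
# FPR and TPR at every threshold, plus the aggregate AUC
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC-ROC:", roc_auc_score(y_true, y_score))
# Plot the ROC curve
plt.plot(fpr, tpr)
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.show()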
K-Nearest Neighbors (KNN)
- K-Nearest Neighbors (KNN) is a simple and intuitive algorithm used for both classification and regression tasks in machine learning.
- It’s a type of instance-based or lazy learning algorithm because it doesn’t explicitly learn a model during training. Instead, it memorizes the training instances and makes predictions for new instances based on their similarity to existing instances.
- Non-parametric algorithms are a type of machine learning algorithm that doesn’t make assumptions about the data’s underlying distribution.
- It is one of the simplest and most widely used classification algorithms, and it often gives competitive results on new data points; a minimal from-scratch sketch follows this list.
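To make the lazy-learning idea concrete, here is a minimal from-scratch sketch of KNN classification; the helper function and toy data are illustrative, not part of any library:
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the new point to every stored training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among the labels of those k neighbors
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# Toy training data (illustrative values)
X_train = np.array([[1, 1], [2, 2], [8, 8], [9, 9]])
y_train = ["A", "A", "B", "B"]
print(knn_predict(X_train, y_train, np.array([1.5, 1.5])))  # prints "A"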
Euclidean distance
The Euclidean distance is one of the most commonly used distance metrics for measuring the similarity between data points. It calculates the straight-line distance between two points in Euclidean space.
import numpy as np
# Example points in 2D space
data1 = np.array([4, 9])
data2 = np.array([5, 7])
# Calculate Euclidean distance: straight-line distance between the points
euclidean_distance = np.linalg.norm(data1 - data2)
print("Euclidean Distance:", euclidean_distance)
Euclidean Distance: 2.23606797749979
Manhattan distance
- Manhattan distance, also known as city block distance or L1 distance, is another distance metric commonly used in the K-Nearest Neighbors (KNN) algorithm.
- Unlike the Euclidean distance, which calculates the straight-line distance between two points, the Manhattan distance measures the distance between two points as the sum of the absolute differences of their coordinates.
- It represents the distance traveled along axes at right angles.
import numpy as np
# Example points in 2D space
p1 = np.array([4, 9])
p2 = np.array([5, 7])
# Calculate Manhattan distance: sum of absolute coordinate differences
manhattan_distance = np.sum(np.abs(p1 - p2))
print("Manhattan Distance:", manhattan_distance)
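For these points the printed output is the sum of absolute coordinate differences, |4 - 5| + |9 - 7|:
Manhattan Distance: 3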
Minkowski distance
The Minkowski distance is a generalized distance metric that includes both the Manhattan distance and the Euclidean distance as special cases.
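Minkowski Distance = (Σ |xi - yi|^p)^(1/p), where p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.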
import numpy as np
# Example points in 2D space
p1 = np.array([4, 9])
p2 = np.array([5, 7])
# Define the order p for the Minkowski distance
p = 1  # p = 1 gives the Manhattan distance; p = 2 gives the Euclidean distance
# Calculate Minkowski distance
minkowski_distance = np.linalg.norm(p1 - p2, ord=p)
print("Minkowski Distance:", minkowski_distance)
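With p = 1 this reproduces the Manhattan distance from the previous example (np.linalg.norm returns a float):
Minkowski Distance: 3.0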
KNN – Classification Algorithm
# Import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
# Load the CSV file
df = pd.read_csv("C:/Users/Administrator/Desktop/SevenMentor All Data/Data Sci/Machine Learning/Iphonerecords.csv")
df.head()
df.isnull().sum()
df['Purchase Iphone'].value_counts()
# Class balance: value_counts() above shows 257 and 143 records out of 400
257/400
143/400
# Separate features and target
X = df.drop('Purchase Iphone', axis=1)
y = df['Purchase Iphone']
# catconsep is a custom helper imported from the accompanying module.py;
# it splits the feature columns into categorical and continuous lists
from module import catconsep
cat, con = catconsep(X)
cat
con
# preprocessing is another custom helper that prepares the features
# (for example, encoding categorical columns and scaling continuous ones)
from module import preprocessing
Xnew = preprocessing(X)
Xnew
from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest = train_test_split(Xnew, y, test_size=0.2, random_state=21)
xtrain.shape
ytrain.shape
from sklearn.neighbors import KNeighborsClassifier
# Train a KNN classifier with k = 5 and report test-set accuracy
knn_cls = KNeighborsClassifier(n_neighbors=5)
knn_cls.fit(xtrain, ytrain)
knn_cls.score(xtest, ytest)
y_pred = knn_cls.predict(xtest)
y_pred
# Show confusion matrix and classification report
from sklearn.metrics import confusion_matrix, classification_report
confusion_matrix(ytest, y_pred)
print(classification_report(ytest, y_pred))
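Cross-validation, listed among the evaluation techniques earlier, can also be used here instead of relying on a single train/test split. A minimal sketch, assuming Xnew and y from the cells above:
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
# 5-fold cross-validated accuracy for k = 5
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), Xnew, y, cv=5)
print("Mean CV accuracy:", cv_scores.mean())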
# Test-set accuracy for a range of k values
total = []
for i in range(2, 111):
    knn_cls = KNeighborsClassifier(n_neighbors=i)
    knn_cls.fit(xtrain, ytrain)
    score = knn_cls.score(xtest, ytest)
    total.append(score)
total
plt.figure(figsize=(16, 10))
plt.plot(range(2, 111), total)
plt.xticks(range(2, 111))
plt.show()
# Misclassification rate for the same range of k values
err = []
for i in range(2, 111):
    knn_cls = KNeighborsClassifier(n_neighbors=i)
    knn_cls.fit(xtrain, ytrain)
    y_pred = knn_cls.predict(xtest)
    err.append(np.mean(ytest != y_pred))
err
plt.plot(range(2, 111), err)
The k with the lowest error rate on this plot is a good candidate for n_neighbors.
KNN – Regression Algorithm
# Import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
# Load the CSV file
df = pd.read_csv("C:/Users/Administrator/Desktop/SevenMentor All Data/Data Sci/Machine Learning/Bengaluru_House_Data.csv")
df.head()
df.columns
df.describe()
df.info()
df.isna().sum()
# replacer is a custom helper imported from the accompanying module.py
# that fills in the missing values found above
from module import replacer
replacer(df)
df.isna().sum()
# Separate features and target
X = df.drop('price', axis=1)
y = df[['price']]
X
y
from module import preprocessing
Xnew = preprocessing(X)
Xnew
from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest = train_test_split(Xnew, y, test_size=0.3, random_state=21)
xtrain.shape
ytrain.shape
from sklearn.neighbors import KNeighborsRegressor
knn_rs = KNeighborsRegressor(n_neighbors=6)
knn_rs.fit(xtrain, ytrain)
# Compare test and train scores to check for over- or underfitting
knn_rs.score(xtest, ytest)
knn_rs.score(xtrain, ytrain)
# Test-set R^2 score for a range of k values
total = []
for i in range(2, 20):
    knn_rs = KNeighborsRegressor(n_neighbors=i)
    knn_rs.fit(xtrain, ytrain)
    score = knn_rs.score(xtest, ytest)
    total.append(score)
total
plt.figure(figsize=(16, 10))
plt.plot(range(2, 20), total)
plt.xticks(range(2, 20))
plt.show()
from sklearn.neighbors import KNeighborsRegressor
# Refit with a k chosen from the scan above (here k = 10)
knn_rs = KNeighborsRegressor(n_neighbors=10)
knn_rs.fit(xtrain, ytrain)
knn_rs.score(xtest, ytest)
y_pred = knn_rs.predict(xtest)
y_pred
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
# R^2, mean absolute error, and root mean squared error on the test set
r2_score(ytest, y_pred)
mean_absolute_error(ytest, y_pred)
np.sqrt(mean_squared_error(ytest, y_pred))
Author:
Dipak Ghule