Classification of Algorithms in Machine Learning

  • By Dipak Ghule
  • November 25, 2024
  • Machine Learning

Classification is a type of supervised learning task where the goal is to predict the category or class of an input data point. It involves learning a mapping from input variables to discrete output variables, typically represented as class labels.


Evaluating Classification Models

In machine learning, evaluating the performance of a classification model is crucial to understanding how well it generalizes to unseen data and how effectively it predicts class labels. There are several evaluation metrics and techniques commonly used for assessing classification models:


  • Accuracy: Accuracy measures the proportion of correctly classified instances out of the total instances. It’s calculated as the ratio of the number of correct predictions to the total number of predictions.

Accuracy = Number of Correct Predictions / Total Number of Predictions

or

Accuracy = (True Positive + True Negative) / (True Positive + False Positive + False Negative + True Negative)

  • Precision: Precision measures the proportion of true positive predictions among all positive predictions. It’s the ratio of correctly predicted positive instances to the total predicted positive instances.

Precision = True Positive / (True Positive + False Positive)

  • Recall (Sensitivity): Recall measures the proportion of true positive predictions among all actual positive instances. It’s the ratio of correctly predicted positive instances to the total actual positive instances.

Recall = True Positive / (True Positive + False Negative)

  • F1 Score: The F1 score is the harmonic mean of precision and recall. It provides a balanced measure of a model’s performance by considering both precision and recall.

F1 Score = 2 * [(Precision * Recall) / (Precision + Recall)]

  • Confusion Matrix: A confusion matrix is a table that summarizes the performance of a classification model. It shows the counts of true positive, false positive, true negative, and false negative predictions. 

    For a binary problem, the confusion matrix is laid out as:

                         Predicted Positive      Predicted Negative
    Actual Positive      True Positive (TP)      False Negative (FN)
    Actual Negative      False Positive (FP)     True Negative (TN)

  • Receiver Operating Characteristic (ROC) Curve: ROC curves plot the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. It helps visualize the trade-off between sensitivity and specificity.
  • Area Under the ROC Curve (AUC-ROC): AUC-ROC measures the area under the ROC curve and provides an aggregated measure of the model’s performance across all possible threshold settings. A higher AUC-ROC value indicates better discrimination ability of the model (a short sketch follows this list).
  • Precision-Recall Curve: Similar to ROC curves, precision-recall curves plot precision against recall at various threshold settings. They are particularly useful for imbalanced datasets.
  • Cross-Validation: Cross-validation is a resampling technique used to assess how well a model generalizes to unseen data. Common methods include k-fold cross-validation and stratified k-fold cross-validation.
  • Classification Report: A classification report provides a summary of evaluation metrics (such as precision, recall, and F1 score) for each class in the dataset.
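To make the ROC and AUC ideas concrete, here is a minimal scikit-learn sketch; the labels and predicted probabilities below are made-up illustration values, not from a real model:

from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and predicted probabilities for the positive class
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.8, 0.7, 0.4, 0.3, 0.1, 0.6, 0.5, 0.2]

# FPR and TPR at each candidate threshold
fpr, tpr, thresholds = roc_curve(y_true, y_prob)

# Aggregate performance across all thresholds
print("AUC-ROC:", roc_auc_score(y_true, y_prob))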

 

 

from sklearn.metrics import classification_report

# Example true labels and predicted labels
actual_values = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
predicted_labels = [1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1]

# Generate the classification report
report = classification_report(actual_values, predicted_labels)

# Print the classification report
print("Classification Report:\n", report)
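To tie the report back to the formulas above, the same labels can be fed to the individual metric functions; each call below recomputes one of the definitions from the list:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

actual_values = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
predicted_labels = [1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1]

print(confusion_matrix(actual_values, predicted_labels))               # [[TN FP] [FN TP]]
print("Accuracy: ", accuracy_score(actual_values, predicted_labels))   # (TP + TN) / total
print("Precision:", precision_score(actual_values, predicted_labels)) # TP / (TP + FP)
print("Recall:   ", recall_score(actual_values, predicted_labels))    # TP / (TP + FN)
print("F1 Score: ", f1_score(actual_values, predicted_labels))        # harmonic mean of the two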

 

K-Nearest Neighbors (KNN) Algorithm

  • K-Nearest Neighbors (KNN) is a simple and intuitive algorithm used for both classification and regression tasks in machine learning.
  • It’s a type of instance-based or lazy learning algorithm because it doesn’t explicitly learn a model during training. Instead, it memorizes the training instances and makes predictions for new instances based on their similarity to existing instances.
  • It is a non-parametric algorithm, meaning it makes no assumptions about the data’s underlying distribution.
  • It is one of the simplest and most widely used classification algorithms and often gives competitive results on new data points; a minimal from-scratch sketch follows this list.
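The whole idea fits in a few lines. Below is a minimal from-scratch sketch of KNN classification; the toy points and labels are invented for illustration:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k nearest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two well-separated clusters
X_train = np.array([[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([2, 2])))  # expected: 0
print(knn_predict(X_train, y_train, np.array([9, 9])))  # expected: 1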

 

1. Euclidean Distance

The Euclidean distance is one of the most commonly used distance metrics for measuring the similarity between data points. It calculates the straight-line distance between two points in Euclidean space.

Euclidean Distance = sqrt( Σ (xᵢ − yᵢ)² )

 

import numpy as np

# Example points in 2D space
data1 = np.array([4, 9])
data2 = np.array([5, 7])

# Calculate Euclidean distance: the L2 norm of the difference vector
euclidean_distance = np.linalg.norm(data1 - data2)
print("Euclidean Distance:", euclidean_distance)

Output: Euclidean Distance: 2.23606797749979
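This matches the formula: sqrt((4 − 5)² + (9 − 7)²) = sqrt(1 + 4) = sqrt(5) ≈ 2.236.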

 

2. Manhattan Distance

  • Manhattan distance, also known as city block distance or L1 distance, is another distance metric commonly used in the K-Nearest Neighbors (KNN) algorithm.
  • Unlike the Euclidean distance, which calculates the straight-line distance between two points, the Manhattan distance measures the distance between two points as the sum of the absolute differences of their coordinates.
  • It represents the distance traveled along axes at right angles.

Manhattan Distance = Σ |xᵢ − yᵢ|

 

import numpy as np

# Example points in 2D space
p1 = np.array([4, 9])
p2 = np.array([5, 7])

# Calculate Manhattan distance: sum of absolute coordinate differences
manhattan_distance = np.sum(np.abs(p1 - p2))
print("Manhattan Distance:", manhattan_distance)
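Output: Manhattan Distance: 3, since |4 − 5| + |9 − 7| = 1 + 2 = 3.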

 

3. Minkowski Distance

The Minkowski distance is a generalized distance metric that includes both the Manhattan distance and the Euclidean distance as special cases.

Minkowski Distance = ( Σ |xᵢ − yᵢ|ᵖ )^(1/p)

 

import numpy as np

# Example points in 2D space
p1 = np.array([4, 9])
p2 = np.array([5, 7])

# Define the value of p for the Minkowski distance
p = 2  # p = 2 gives the Euclidean distance; p = 1 gives the Manhattan distance

# Calculate Minkowski distance using the order-p vector norm
minkowski_distance = np.linalg.norm(p1 - p2, ord=p)
print("Minkowski Distance:", minkowski_distance)
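Output: with p = 2 this reproduces the Euclidean result (2.23606797749979); with p = 1 it reproduces the Manhattan result (3.0).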

 


 

KNN – Classification Algorithm

# Import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

# Load the CSV file
df = pd.read_csv("C:/Users/Administrator/Desktop/SevenMentor All Data/Data Sci/Machine Learning/Iphonerecords.csv")
df.head()

# Check for missing values
df.isnull().sum()

# Class counts of the target column
df['Purchase Iphone'].value_counts()

# Class proportions: 257/400 ≈ 0.64 and 143/400 ≈ 0.36
257/400
143/400

# Separate features and target
X = df.drop('Purchase Iphone', axis=1)
y = df['Purchase Iphone']

# catconsep is a custom helper from a local module, presumably separating
# categorical and continuous columns
from module import catconsep
cat, con = catconsep(X)
cat
con

# preprocessing is a custom helper, presumably encoding and scaling the features
from module import preprocessing
Xnew = preprocessing(X)
Xnew

from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(Xnew, y, test_size=0.2, random_state=21)
xtrain.shape
ytrain.shape

# Fit a KNN classifier with k = 5 and score it on the test set
from sklearn.neighbors import KNeighborsClassifier
knn_cls = KNeighborsClassifier(n_neighbors=5)
knn_cls.fit(xtrain, ytrain)
knn_cls.score(xtest, ytest)

# Predict on the test set
y_pred = knn_cls.predict(xtest)
y_pred

# Show the confusion matrix and classification report
from sklearn.metrics import confusion_matrix, classification_report
confusion_matrix(ytest, y_pred)
print(classification_report(ytest, y_pred))

# Test accuracy for each k from 2 to 110
total = []
for i in range(2, 111):
    knn_cls = KNeighborsClassifier(n_neighbors=i)
    knn_cls.fit(xtrain, ytrain)
    score = knn_cls.score(xtest, ytest)
    total.append(score)
total

plt.figure(figsize=(16, 10))
plt.plot(range(2, 111), total)
plt.xticks(range(2, 111))
plt.show()

# Misclassification error for each k
err = []
for i in range(2, 111):
    knn_cls = KNeighborsClassifier(n_neighbors=i)
    knn_cls.fit(xtrain, ytrain)
    y_pred = knn_cls.predict(xtest)
    err.append(np.mean(ytest != y_pred))
err

plt.plot(range(2, 111), err)
plt.show()
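One way to read the error plot is to pick the k with the lowest test error. A minimal sketch, assuming the err list computed above:

# k values start at 2, so offset the argmin index accordingly
best_k = int(np.argmin(err)) + 2
print("Best k:", best_k, "with test error:", min(err))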

 

KNN – Regression Algorithm

# Import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

# Load the CSV file
df = pd.read_csv("C:/Users/Administrator/Desktop/SevenMentor All Data/Data Sci/Machine Learning/Bengaluru_House_Data.csv")
df.head()
df.columns
df.describe()
df.info()
df.isna().sum()

# replacer is a custom helper from a local module, presumably imputing missing values
from module import replacer
replacer(df)
df.isna().sum()

# Separate features and target
X = df.drop('price', axis=1)
y = df[['price']]

X
y

# preprocessing is the same custom helper used in the classification example
from module import preprocessing
Xnew = preprocessing(X)
Xnew

from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(Xnew, y, test_size=0.3, random_state=21)
xtrain.shape
ytrain.shape

# Fit a KNN regressor with k = 6 and compare test vs. train scores
from sklearn.neighbors import KNeighborsRegressor
knn_rs = KNeighborsRegressor(n_neighbors=6)
knn_rs.fit(xtrain, ytrain)
knn_rs.score(xtest, ytest)
knn_rs.score(xtrain, ytrain)

# Test R² score for each k from 2 to 19
total = []
for i in range(2, 20):
    knn_reg = KNeighborsRegressor(n_neighbors=i)
    knn_reg.fit(xtrain, ytrain)
    score = knn_reg.score(xtest, ytest)
    total.append(score)
total

plt.figure(figsize=(16, 10))
plt.plot(range(2, 20), total)
plt.xticks(range(2, 20))
plt.show()

# Refit with the chosen k = 10 and evaluate
from sklearn.neighbors import KNeighborsRegressor
knn_rs = KNeighborsRegressor(n_neighbors=10)
knn_rs.fit(xtrain, ytrain)
knn_rs.score(xtest, ytest)

y_pred = knn_rs.predict(xtest)
y_pred

# Regression metrics: R², MAE, and RMSE
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
r2_score(ytest, y_pred)
mean_absolute_error(ytest, y_pred)
np.sqrt(mean_squared_error(ytest, y_pred))
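As noted in the evaluation section, cross-validation gives a more robust estimate than a single train/test split. A minimal sketch, assuming the Xnew and y prepared above:

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# 5-fold cross-validated R² for k = 10
scores = cross_val_score(KNeighborsRegressor(n_neighbors=10), Xnew, y.values.ravel(), cv=5, scoring="r2")
print("Mean R²:", scores.mean())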

 


 

Author:

Dipak Ghule
