Percentile and Quartile in Statistics

  • By Mahesh Kankrale
  • January 27, 2025
  • Data Science

A percentile is a statistical measure that describes the position of a particular value within a dataset relative to the other values in the same dataset. It indicates the percentage of observations that fall below a given value. Some definitions also count observations equal to the value, but the formula used in this article counts only those strictly below it. Percentiles are commonly used in fields such as education, healthcare, and finance.

 

Percentile Formula:

The formula to calculate the percentile (P) of a value (x) in a dataset is:

P = (n / N) x 100

Here,

n = the number of values in the dataset that are less than x.

N = the total number of values in the dataset.

 

Example:

Let’s say you have the following dataset of test scores and you want to know the percentile rank of the score 10:

Dataset {2, 2, 3, 4, 5, 5, 5, 6, 7, 8, 8, 8, 8, 8, 9, 9, 10, 11, 11, 12}

  1. Steps to find the percentile rank:
  • Arrange the scores: First, order all the test scores in ascending order (from lowest to highest).
  • Identify relevant values: Find your score (10) and the total number of values (N = 20).
  • Count lower values: Count how many scores are lower than 10 (there are 16, so n = 16).
  • Apply the formula: Plug the values into the formula:

P = (n / N) x 100 = (16 / 20) x 100 = 80

Therefore, your percentile rank is 80. This means 80% of the class scored lower than 10 and 20% scored higher or the same.
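
The percentile rank calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name percentile_rank is an assumption made for this example, and it counts only values strictly less than x, exactly as the formula does.

```python
def percentile_rank(data, x):
    """Return the percentage of values in data that are strictly less than x."""
    n = sum(1 for value in data if value < x)  # number of values below x
    N = len(data)                              # total number of values
    return 100 * n / N


scores = [2, 2, 3, 4, 5, 5, 5, 6, 7, 8, 8, 8, 8, 8, 9, 9, 10, 11, 11, 12]
print(percentile_rank(scores, 10))  # 80.0
```

Because the count uses a strict "less than" comparison, a score that ties with x is not counted below it.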

 

  2. Steps to find the 25th percentile value:
  • Arrange the scores: First, order all the test scores in ascending order (from lowest to highest).
  • Determine the rank: Compute the rank (position) of the percentile in the ordered data using the formula:

Rank = (P / 100) x (N + 1)

= (25 / 100) x (20 + 1)

= 5.25

  • Find the value at that rank: Since 5.25 is not an integer, we round down and take the 5th value from the sorted dataset. (Interpolating between the 5th and 6th values gives the same result here, because both are 5.)

The 5th value in the sorted dataset is 5.
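
The same steps can be written as a short Python sketch. The function name percentile_value is hypothetical, and the rounding-down rule follows the worked example above. The clamp that keeps the rank between 1 and N is an added safeguard for very small or very large percentiles and is not part of the original steps.

```python
def percentile_value(data, p):
    """Value at the p-th percentile using rank = (p / 100) * (N + 1), rounded down."""
    ordered = sorted(data)
    N = len(ordered)
    rank = p * (N + 1) / 100             # 1-based rank, possibly fractional
    position = int(rank)                 # drop the fractional part, as in the example
    position = max(1, min(position, N))  # assumption: keep the rank inside the data
    return ordered[position - 1]         # convert the 1-based rank to a 0-based index


scores = [2, 2, 3, 4, 5, 5, 5, 6, 7, 8, 8, 8, 8, 8, 9, 9, 10, 11, 11, 12]
print(percentile_value(scores, 25))  # 5
```

Other conventions interpolate between neighbouring values instead of rounding down, so results from statistical packages may differ slightly on other datasets.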

Quartile : 

Quartiles divide a set of ordered data into four equal parts. There are three quartiles, denoted by Q1, Q2, and Q3.

 

  • First Quartile (Q1): The value that separates the lowest 25% of the data from the highest 75%. It’s also known as the 25th percentile.
  • Second Quartile (Q2): This is the median, the middle value in the data set. It separates the lower half of the data from the upper half. Q2 is also sometimes denoted by Q.
  • Third Quartile (Q3): The value that separates the lower 75% of the data from the higher 25%. It’s also known as the 75th percentile.
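
As a quick illustration, the three quartiles can be computed with Python's standard library (Python 3.8 or later). This sketch reuses the 20 test scores from the percentile example. It assumes the "exclusive" method of statistics.quantiles, which places the cut points at ranks based on N + 1 and interpolates when a rank is fractional.

```python
import statistics

scores = [2, 2, 3, 4, 5, 5, 5, 6, 7, 8, 8, 8, 8, 8, 9, 9, 10, 11, 11, 12]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="exclusive")

print(q1, q2, q3)                 # 5.0 8.0 9.0
print(statistics.median(scores))  # 8.0, the same as Q2
```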

 


 

Box Plot:

A box plot, also called a box-and-whisker plot, is a way to visually summarize a set of numerical data. It shows important aspects of the data’s distribution, including its center, spread, and presence of outliers. Here’s a breakdown of the key elements of a box plot:


Elements of Box Plot : 

A boxplot is a standardized way of displaying the dataset based on the five-number summary: the minimum, the maximum, the sample median, and the first and third quartiles.

 

  • Minimum (Q0 or 0th percentile): the lowest data point in the data set excluding any outliers
  • Maximum (Q4 or 100th percentile): the highest data point in the data set excluding any outliers
  • Median (Q2 or 50th percentile): the middle value in the data set
  • First quartile (Q1 or 25th percentile): also known as the lower quartile, q_n(0.25); it is the median of the lower half of the dataset.
  • Third quartile (Q3 or 75th percentile): also known as the upper quartile, q_n(0.75); it is the median of the upper half of the dataset.
  • Interquartile range (IQR): the distance between the upper and lower quartiles.

IQR = Q3 – Q1
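
The elements listed above can be collected with a small Python sketch. The helper name five_number_summary is an assumption made for this example, and the data is the set of 20 test scores from the percentile example. Note that this simple version returns the raw minimum and maximum and does not exclude outliers.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(data, n=4)  # default "exclusive" method
    return min(data), q1, median, q3, max(data)


scores = [2, 2, 3, 4, 5, 5, 5, 6, 7, 8, 8, 8, 8, 8, 9, 9, 10, 11, 11, 12]
minimum, q1, median, q3, maximum = five_number_summary(scores)
iqr = q3 - q1  # interquartile range

print(minimum, q1, median, q3, maximum)  # 2 5.0 8.0 9.0 12
print(iqr)                               # 4.0
```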

 

Example : 

Finding the outlier by using the IQR method

Let’s say you have a dataset. 

Dataset = {1, 2, 2, 2, 3, 3, 4, 5, 5, 5, 6, 6, 6, 6, 7, 8, 8, 9, 27}

  1. Calculate the quartiles:

The first quartile (Q1) is the median of the lower half of the data when ordered from least to greatest. The dataset has N = 19 values, so the rank of Q1 is:

Rank of Q1 = (P / 100) x (N + 1)

Rank of Q1 = (25 / 100) x (19 + 1) = 5

Q1 = 3 (the 5th value in the ordered dataset)

The third quartile (Q3) is the median of the upper half of the data.

Rank of Q3 = (P / 100) x (N + 1)

Rank of Q3 = (75 / 100) x (19 + 1) = 15

Q3 = 7 (the 15th value in the ordered dataset)

  2. Calculate the IQR:

IQR = Q3 – Q1

IQR = 7 – 3 = 4

  3. Find the lower and upper bounds:
  • The lower bound is calculated as Q1 – (1.5 x IQR).

Lower bound = 3 – (1.5 x 4) = -3

  • The upper bound is calculated as Q3 + (1.5 x IQR).

Upper bound = 7 + (1.5 x 4) = 13

  4. Identify outliers:

Any data point that falls below the lower bound or above the upper bound is considered an outlier.

Here the value 27 lies above the upper bound of 13, so 27 is an outlier in our dataset. No value falls below the lower bound of -3.

In conclusion, the IQR method identified 27 as an outlier in the dataset. This suggests that 27 is significantly larger than the rest of the data points.
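
The whole IQR procedure can be sketched in Python as follows. The function names iqr_outliers and value_at are hypothetical, and the quartiles are taken at rank (P / 100) x (N + 1), rounded down, to match the worked example. Libraries that interpolate between values may report slightly different quartiles on other datasets.

```python
def iqr_outliers(data):
    """Return (lower_bound, upper_bound, outliers) using the 1.5 * IQR rule."""
    ordered = sorted(data)
    N = len(ordered)

    def value_at(p):
        # Rank from (p / 100) * (N + 1), rounded down to a whole position.
        position = int(p * (N + 1) / 100)
        position = max(1, min(position, N))  # assumption: keep the rank inside the data
        return ordered[position - 1]

    q1, q3 = value_at(25), value_at(75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [v for v in ordered if v < lower or v > upper]
    return lower, upper, outliers


data = [1, 2, 2, 2, 3, 3, 4, 5, 5, 5, 6, 6, 6, 6, 7, 8, 8, 9, 27]
print(iqr_outliers(data))  # (-3.0, 13.0, [27])
```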

 


© Copyright 2021 | SevenMentor Pvt Ltd.