Outlier Detection
An outlier is an observation that deviates significantly from the other members of the sample in
which it occurs. A related definition states that an outlier is an observation that appears to be
inconsistent with the remainder of the data set. Hawkins defines an outlier as an observation that
deviates so much from the others that it is likely to have been generated by a different mechanism.
Detecting outliers requires domain knowledge to establish what normal behaviour or normal patterns
look like in the data.
The problem of finding patterns that differ from the rest of the data is called outlier
detection. These distinct patterns are termed outliers or anomalies. In practice, outlier detection
means finding such patterns or detecting noise in your data. Noise can appear as attribute noise,
class noise, or a combination of both. Noise should be repaired or eliminated from the data during
the pre-processing phase.
Outliers can be classified into three categories:
For Free, Demo classes Call: 02071171500
Registration Link: Click Here!
- Point Outliers: An individual data instance that is distinct from the other data points in
the dataset is considered a point (or global) outlier. This is the simplest type of outlier.
- Contextual Outliers: A data point that deviates significantly with respect to a particular
context or condition is referred to as a contextual (or conditional) outlier. For a contextual
outlier, the data instances are evaluated by considering two groups of attributes: 1)
Contextual attributes define the context or neighbourhood of an instance. For example,
in time series data, the time of each instance is a contextual attribute that gives its position
in the sequence. 2) Behavioural attributes describe the non-contextual characteristics of an
instance, such as the measured value itself.
- Collective Outliers: A collection or sequence of related data points or records that
deviates significantly from the entire dataset. In a collective outlier, each data point may not
be an outlier on its own, but the points taken together are anomalous.
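The difference between a point outlier and a contextual outlier can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the temperature readings, the season labels, the function names, and the use of a median-based (MAD) modified z-score with a cut-off of 3.5 are all choices made for this example, not something prescribed by the text above.

```python
from statistics import median


def modified_z_scores(values):
    """Robust z-scores based on the median and the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]  # no spread, so nothing can be scored
    return [0.6745 * (v - med) / mad for v in values]


def point_outliers(values, threshold=3.5):
    """Indices of values that are unusual compared with the whole dataset."""
    scores = modified_z_scores(values)
    return [i for i, s in enumerate(scores) if abs(s) > threshold]


def contextual_outliers(values, contexts, threshold=3.5):
    """Indices of values that are unusual within their own context group."""
    by_context = {}
    for i, c in enumerate(contexts):
        by_context.setdefault(c, []).append(i)
    flagged = []
    for indices in by_context.values():
        scores = modified_z_scores([values[i] for i in indices])
        flagged.extend(i for i, s in zip(indices, scores) if abs(s) > threshold)
    return sorted(flagged)


# Hypothetical daily temperatures (degrees Celsius) with their season as context.
temps = [1, 2, 0, 3, 1, 25, 24, 26, 25, 27, 25, 120]
seasons = ["winter"] * 6 + ["summer"] * 6

print("point outliers:", point_outliers(temps))
print("contextual outliers:", contextual_outliers(temps, seasons))
```

With this toy data, the global check should flag only the reading of 120, while the per-season check should also flag the 25 recorded in winter: a value that is ordinary overall but unusual in its context.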
Outlier detection techniques are divided into three groups, based on the availability
of labelled data:
1) When there is no prior knowledge of the data, unsupervised machine learning methods are used
to determine outliers. The assumption is that normal data makes up the most significant portion
of the data, and that outliers, or noise, are distinguishable from this majority.
2) When labelled data is available, both normality and abnormality are modelled.
This approach is supervised machine learning.
3) When some prior knowledge of the data is available, semi-supervised learning is used: the
algorithm is trained on the labelled data, typically examples of normal behaviour, and can then
detect outliers or abnormalities. These semi-supervised outlier detection techniques are used more
often than supervised techniques because the numbers of normal and abnormal labelled examples are
usually imbalanced.
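As a rough illustration of the semi-supervised setting, the sketch below learns a profile from examples labelled as normal only and then checks new readings against it. The class name `NormalProfile`, the sample values, and the "more than k standard deviations from the mean" rule are assumptions made for this example; a real system would choose its model and threshold with domain knowledge.

```python
from statistics import mean, stdev


class NormalProfile:
    """Hypothetical helper that models 'normal' from labelled normal examples only."""

    def __init__(self, k=3.0):
        self.k = k  # how many standard deviations count as abnormal (assumed value)
        self.mu = None
        self.sigma = None

    def fit(self, normal_values):
        # Training uses only data that has been labelled as normal.
        self.mu = mean(normal_values)
        self.sigma = stdev(normal_values)
        return self

    def is_outlier(self, value):
        # A new value is flagged when it lies too far from the learned normal range.
        return abs(value - self.mu) > self.k * self.sigma


# Hypothetical sensor readings that were labelled as normal.
normal_training = [50.1, 49.8, 50.3, 49.9, 50.0, 50.2, 49.7, 50.0]
profile = NormalProfile(k=3.0).fit(normal_training)

# Unlabelled readings arriving later.
new_readings = [50.1, 49.6, 53.0]
flags = [profile.is_outlier(v) for v in new_readings]
print(list(zip(new_readings, flags)))
```

Because no abnormal examples are needed for training, this style of approach sidesteps the class imbalance mentioned above.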
Outlier detection techniques can report their results in one of the following two ways:
The first technique is scores
Scores: Techniques in this category assign each data instance a score quantifying the degree to
which it is considered an outlier. This type of technique generates a ranked list of outliers, from
which a domain expert can analyse the top few outliers or specify a domain-specific threshold to
select the outliers.
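A minimal sketch of a score-based technique follows. It assumes one particular scoring rule, the distance from each point to its k-th nearest neighbour, which is only one of many possible choices; the sample points, `k`, and the threshold value are hypothetical.

```python
import math


def knn_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical 2-D data: a tight cluster plus one distant point.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (5.0, 5.0)]
scores = knn_scores(points, k=2)

# Ranked list: indices ordered from most to least outlying.
ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)

# Option 1: a domain expert inspects the top few entries of the ranking.
top_n = ranked[:1]

# Option 2: a domain-specific threshold selects the outliers (value assumed here).
threshold = 1.0
above_threshold = [i for i, s in enumerate(scores) if s > threshold]

print("ranking:", ranked)
print("top:", top_n, "above threshold:", above_threshold)
```

Returning scores rather than a yes/no answer leaves the final decision, how many points to inspect or where to set the cut-off, to the analyst.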
The second technique is labels
Labels: These techniques assign a binary label to each data point, indicating whether or not it is
an outlier. An example is fault detection. A fault is an abnormal state of a system that may cause
a failure or a malfunction. Fault detection is the identification of an unacceptable deviation of
at least one feature of the system from its expected behaviour. The main aim of a fault detection
and diagnosis system is to detect faults early and diagnose their causes, in order to reduce
maintenance costs and prevent excessive damage to other parts of the system.
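The labelling idea can be sketched as a simple fault check that returns a binary label for each reading. Everything specific here is hypothetical: the feature names, the expected values, and the tolerances are invented for illustration, and a real fault detection system would derive them from the system's specification or from historical data.

```python
# Hypothetical expected operating point and allowed deviation for each feature.
EXPECTED = {"temperature_c": 70.0, "pressure_bar": 2.0, "vibration_mm_s": 1.5}
TOLERANCE = {"temperature_c": 10.0, "pressure_bar": 0.5, "vibration_mm_s": 1.0}


def label_reading(reading):
    """Return (is_fault, deviating_features) for one system reading."""
    deviating = [
        name
        for name, expected in EXPECTED.items()
        if abs(reading[name] - expected) > TOLERANCE[name]
    ]
    # The reading is labelled a fault if any single feature deviates too far.
    return bool(deviating), deviating


readings = [
    {"temperature_c": 72.0, "pressure_bar": 2.1, "vibration_mm_s": 1.4},
    {"temperature_c": 95.0, "pressure_bar": 2.0, "vibration_mm_s": 1.6},
]
labels = [label_reading(r) for r in readings]
for reading, (is_fault, causes) in zip(readings, labels):
    print("FAULT" if is_fault else "ok", causes)
```

Reporting which features deviated alongside the binary label gives a starting point for diagnosing the cause of a fault.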
© Copyright 2021 | Sevenmentor Pvt Ltd.