Correlation and Types of Correlation
Introduction
• In statistics, correlation analysis quantifies the strength
of the association between two numerical variables.
• In other words, correlation is a statistical measure that
indicates the extent to which two or
more variables fluctuate together.
• A correlation coefficient is a statistical measure of the
degree to which changes in the value of one variable
predict changes in the value of another.
www.DuloMix.com 2
Definitions
1) If two variables are inter-related in such a
manner that a change in one variable brings
about a change in the other variable, then this type
of relation between the variables is known as
correlation.
2) If changing the value of one variable
produces, on average, a corresponding change in the
value of the other variable, then
we say the two variables are correlated.
The value of the correlation coefficient
varies from -1 to +1.
A positive correlation indicates the extent to
which variables increase or decrease in parallel.
A negative correlation indicates the extent to
which one variable increases as the other
decreases.
A zero correlation indicates no relation between
variables.
Importance of Correlation:
• Correlation is very important in the fields of psychology and education as
a measure of the relationship between test scores and other measures of
performance.
• With the help of correlation, it is possible to form an idea of the
working capacity of a person.
• It also makes it possible to learn about the various
qualities of an individual.
• Correlation is helpful in providing educational guidance to a student in the
selection of subjects of study.
• It is also useful for understanding unknown variables and economic behavior.
Types of correlation
Correlation is classified on the following bases:
• Degree of correlation: Positive, Negative, No (zero), Perfect
• Number of variables: Simple, Partial, Multiple
• Linearity: Linear, Non Linear
Positive correlation
❑ If an increase in one variable is accompanied by an increase in the other
variable, this is called positive correlation.
EXAMPLE:
The more time you spend running on a treadmill, the more calories you will burn.
Negative correlation
❑ If an increase in one variable is accompanied by a decrease in the other
variable, this is called negative correlation.
EXAMPLE:
If a train increases speed, the length of time to get to the final point decreases.
Perfect correlation
A correlation of positive one indicates a perfect correlation that is
positive, which means that both variables move together in the same direction.
Simple correlation
A simple correlation is one which involves only 2 variables.
EXAMPLE:
Correlation between demand and supply.
Partial Correlation
When 3 or more variables are part of the analysis but only 2 are studied and
the rest are kept constant, it is partial correlation.
EXAMPLE: Correlation between demand and supply, where income is kept constant.
Multiple Correlation
When 3 or more variables are studied simultaneously, it is multiple correlation.
EXAMPLE: Rainfall, production of rice and cost of rice studied simultaneously.
Linear Correlation
If a change in the amount of one variable tends to produce a change in the
amount of the other variable in a constant ratio, it is linear correlation.
EXAMPLE
INCOME: 350 360 370 380
WEIGHT: 30 40 50 60
Non Linear Correlation
If a change in the amount of one variable tends to produce a change in the
amount of the other variable, but not in a constant ratio, it is non linear
correlation.
EXAMPLE
INCOME: 350 360 370 380
WEIGHT: 30 46 59 72
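The "constant changing ratio" idea can be checked numerically. The following is a minimal sketch using the two INCOME/WEIGHT examples; the helper names `change_ratios` and `is_linear` are illustrative and not part of the original slides.

```python
def change_ratios(xs, ys):
    """Ratio of the change in y to the change in x between consecutive points."""
    ratios = []
    for i in range(1, len(xs)):
        ratios.append((ys[i] - ys[i - 1]) / (xs[i] - xs[i - 1]))
    return ratios


def is_linear(xs, ys, tol=1e-9):
    """True when every consecutive change ratio equals the first one."""
    ratios = change_ratios(xs, ys)
    return all(abs(r - ratios[0]) <= tol for r in ratios)


income = [350, 360, 370, 380]
weight_linear = [30, 40, 50, 60]
weight_nonlinear = [30, 46, 59, 72]

linear_ratios = change_ratios(income, weight_linear)        # [1.0, 1.0, 1.0]
nonlinear_ratios = change_ratios(income, weight_nonlinear)  # [1.6, 1.3, 1.3]

print(is_linear(income, weight_linear))     # True
print(is_linear(income, weight_nonlinear))  # False
```

In the first data set every step of 10 in income goes with a step of 10 in weight, so the ratio stays constant; in the second the steps are 16, 13 and 13, so it does not.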
Coefficient of Correlation
When two variables are correlated with each other, it is
important to know the amount or extent of correlation between
them.
The numerical measure of correlation, or degree of
relationship existing between two variables, is called the
coefficient of correlation.
It is denoted by r and it always lies between -1 and 1.
1. When r = 1, it represents perfect direct or positive correlation.
2. When r = -1, it represents perfect inverse or negative
correlation.
3. When r = 0, there is no linear correlation, i.e. it shows an absence of
linear correlation.
4. When the value of r is 0.9 or 0.8 etc., it shows a high degree of
relationship between the variables, and when r is small, say 0.2
or 0.1 etc., it shows a low degree of correlation.
Methods of Measuring Correlation
Scatter Diagram
The scatter diagram method is the simplest
method to ascertain whether two variables are
correlated and, if they are correlated, what
the direction of correlation is, i.e. positive or
negative.
In a scatter diagram one variable is taken along
the x axis and the other along the y axis.
We then plot the pairs of values as points on graph paper
(on the xy plane) and thus get the scattered
points. This is called a scatter diagram.
The way in which the points are scattered on
the xy plane shows the degree and direction of the
correlation between the two variables.
1) Perfect Positive Correlation
• If all the points lie on a straight line in the upward direction, the
correlation is perfect positive.
2) Highly Positive Correlation
• If all the points are very near to a straight line in the upward direction,
the scatter diagram shows a highly positive correlation.
3) Positive Correlation
• If all the points are near to a straight line in the upward direction (but
not very near), the correlation is positive.
4) Perfect Negative Correlation
• If all the points in a scatter diagram lie on a straight line in the
downward direction, the correlation is perfect negative.
5) High Negative Correlation
• If the points are very close to a straight line in the downward direction,
the correlation is high negative.
6) Negative Correlation
• If the points are close to a straight line (not very close) in the downward
direction, the correlation is negative.
7) Zero Correlation
• If the points are widely scattered in the graph, the correlation is said
to be zero.
Correlation Graph
In this method, the individual
values of the two variables are plotted
on a graph sheet, which gives two
different curves on the same sheet.
By examining the properties of the
plotted points, we conclude whether the variables
are correlated or not.
Example: Draw the diagram and examine the correlation between
variables X and Y. The data are given in the following table:
Year  1990  1995  2000  2005  2010
X     5     7     6     6     8
Y     1     4     5     4     7
Solution: First we draw the graph of the two variables.
Merits and Demerits of Graphical Method:
Merits:
a) It is a popular method of examining the relationship
between two variables.
b) It is a very easy method that does not involve any
mathematical calculation.
c) Everyone can easily understand and examine it.
Demerits:
a) We cannot obtain the exact degree of correlation.
b) The graphical method is suitable only for a small
number of data points.
Karl Pearson’s Coefficient of Correlation:
Karl Pearson’s coefficient of correlation is used to
measure the degree of linear relationship between
two variables.
It is also called the product moment correlation coefficient.
It is the most widely used mathematical method of
finding the magnitude of linear correlation.
It gives not only the magnitude of the correlation but
also its direction.
Let (x1, y1), (x2, y2), …, (xn, yn) be n pairs of observations of
two variables X and Y.
The coefficient of correlation (r) between X and Y is
defined by
\[ r = \frac{\operatorname{Cov}(X, Y)}{\sigma_x \cdot \sigma_y} \]
where
\[ \operatorname{Cov}(X, Y) = \text{covariance between } X \text{ and } Y = \frac{1}{n} \sum (X - \bar{X})(Y - \bar{Y}) \]
\(\sigma_x\) = standard deviation of X
\(\sigma_y\) = standard deviation of Y
n = number of pairs of observations (xi, yi)
The above formula can be written as
\[ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n \, \sigma_x \cdot \sigma_y} \]
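If we write \(x = X - \bar{X}\) and \(y = Y - \bar{Y}\), then the formula becomes
\[ r = \frac{\sum xy}{n \, \sigma_x \cdot \sigma_y} \]
This formula is known as the product moment formula of the coefficient of
correlation.
When deviations are taken from assumed means, the formula becomes
\[ r = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{\sqrt{n \sum d_x^2 - (\sum d_x)^2} \, \sqrt{n \sum d_y^2 - (\sum d_y)^2}} \]
where dx = X - A and dy = Y - B,
A and B = assumed means,
n = number of pairs of observations.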
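Example: Compute the correlation coefficient between X
and Y using the following data:
X 2 4 5 6 8 11
Y 18 12 10 8 7 5
Solution:
X      Y      XY     X²     Y²
2      18     36     4      324
4      12     48     16     144
5      10     50     25     100
6      8      48     36     64
8      7      56     64     49
11     5      55     121    25
Total  36     60     293    266    706
(the totals are, in order, ΣX, ΣY, ΣXY, ΣX², ΣY²)
\[ r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{n \sum X^2 - (\sum X)^2} \, \sqrt{n \sum Y^2 - (\sum Y)^2}} = \frac{6(293) - (36)(60)}{\sqrt{6(266) - (36)^2} \, \sqrt{6(706) - (60)^2}} \]
r = -0.9203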
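As a cross-check of the worked example, the sketch below computes r for the same data in two ways: from the raw sums used in the table, and from the definition Cov(X, Y) / (σx · σy) with divisor n. The function names are illustrative only; the two routes should agree.

```python
from math import sqrt

X = [2, 4, 5, 6, 8, 11]
Y = [18, 12, 10, 8, 7, 5]


def pearson_from_sums(xs, ys):
    """r from the raw sums n, sum X, sum Y, sum XY, sum X^2, sum Y^2."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))


def pearson_from_definition(xs, ys):
    """r = Cov(X, Y) / (sigma_x * sigma_y), everything computed with divisor n."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


r_sums = pearson_from_sums(X, Y)
r_def = pearson_from_definition(X, Y)
print(round(r_sums, 4), round(r_def, 4))  # -0.9203 -0.9203
```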
Example: The following data show the temperature (X) and the pulse rate (Y) of
8 patients. Compute the coefficient of correlation between X and Y.
Patient No. 1 2 3 4 5 6 7 8
X 98 97 102 100 99 101 99 101
Y 100 91 63 80 92 70 90 72
Solution: We construct the following table taking A = 95 and B = 80 as assumed
means.
X      Y      dx = X - 95    dy = Y - 80     dxdy   dx²   dy²
98     100    98-95 = 3      100-80 = 20     60     9     400
97     91     97-95 = 2      91-80 = 11      22     4     121
102    63     102-95 = 7     63-80 = -17     -119   49    289
100    80     100-95 = 5     80-80 = 0       0      25    0
99     92     99-95 = 4      92-80 = 12      48     16    144
101    70     101-95 = 6     70-80 = -10     -60    36    100
99     90     99-95 = 4      90-80 = 10      40     16    100
101    72     101-95 = 6     72-80 = -8      -48    36    64
Total         37             18              -57    191   1218
\[ r = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{\sqrt{n \sum d_x^2 - (\sum d_x)^2} \, \sqrt{n \sum d_y^2 - (\sum d_y)^2}} = \frac{8(-57) - (37)(18)}{\sqrt{8(191) - (37)^2} \, \sqrt{8(1218) - (18)^2}} \]
\[ = \frac{-1122}{12.6095 \times 97.0567} \]
= -0.9168
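The assumed means A and B only shift the origin, so any convenient choice should give the same r. The sketch below illustrates this for the temperature and pulse data; `pearson_assumed_mean` is a hypothetical helper name, and the second pair of assumed means (100, 90) is an arbitrary choice made for the comparison, not something taken from the slides.

```python
from math import sqrt

X = [98, 97, 102, 100, 99, 101, 99, 101]  # temperature
Y = [100, 91, 63, 80, 92, 70, 90, 72]     # pulse rate


def pearson_assumed_mean(xs, ys, a, b):
    """r from deviations dx = X - a and dy = Y - b taken from assumed means."""
    n = len(xs)
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dxdy = sum(p * q for p, q in zip(dx, dy))
    s_dx2 = sum(p * p for p in dx)
    s_dy2 = sum(q * q for q in dy)
    numerator = n * s_dxdy - s_dx * s_dy
    denominator = sqrt(n * s_dx2 - s_dx ** 2) * sqrt(n * s_dy2 - s_dy ** 2)
    return numerator / denominator


r_worked = pearson_assumed_mean(X, Y, 95, 80)   # assumed means of the worked example
r_other = pearson_assumed_mean(X, Y, 100, 90)   # arbitrary alternative (assumption)
print(round(r_worked, 4), round(r_other, 4))    # -0.9168 -0.9168
```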
Example: The following table shows data on diastolic blood pressure and cholesterol
levels of 10 randomly selected men. Find the coefficient of correlation between
diastolic blood pressure and cholesterol level.
Person 1 2 3 4 5 6 7 8 9 10
Diastolic B.P. 80 75 90 74 75 110 70 85 88 78
Cholesterol 307 259 341 317 274 416 267 320 274 336
Solution: Let X denote the diastolic blood pressure and Y denote the cholesterol
level of a man. We take dx = X - 85 and dy = Y - 300 and construct the table.
X Y dx = X - 85 dy = Y - 300 dxdy dx² dy²
Ans: r = 0.8088
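The table rows for this example are left to the reader. One way to generate them, sketched here under the stated choices dx = X - 85 and dy = Y - 300, is to build each row, total the columns and substitute the totals into the assumed-mean formula. Variable names are illustrative.

```python
from math import sqrt

bp = [80, 75, 90, 74, 75, 110, 70, 85, 88, 78]
chol = [307, 259, 341, 317, 274, 416, 267, 320, 274, 336]

# One row per person: X, Y, dx, dy, dx*dy, dx^2, dy^2
rows = []
for x, y in zip(bp, chol):
    dx, dy = x - 85, y - 300
    rows.append((x, y, dx, dy, dx * dy, dx ** 2, dy ** 2))

# Column totals for dx, dy, dx*dy, dx^2, dy^2
totals = [sum(row[i] for row in rows) for i in range(2, 7)]
s_dx, s_dy, s_dxdy, s_dx2, s_dy2 = totals

n = len(rows)
r = (n * s_dxdy - s_dx * s_dy) / (
    sqrt(n * s_dx2 - s_dx ** 2) * sqrt(n * s_dy2 - s_dy ** 2)
)

for row in rows:
    print(*row, sep="\t")
print("Total", "", *totals, sep="\t")
print("r =", round(r, 4))  # r = 0.8088
```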
Example: During a laboratory experiment, muscular contractions of frog muscle
were measured against different doses of a given drug. The height of the curve
was considered as the response to the drug. Calculate the correlation coefficient
for the following data.
Serial No.  Dose of the drug  Response to the drug
1           0.3               54
2           0.4               59
3           0.6               60
4           0.8               65
5           0.9               70
Solution: Let X = dose of the drug
Y = response to the drug
We construct the following table taking dx = 10 (X - 0.6)
dy = Y - 60
X      Y     dx    dy    dxdy   dx²   dy²
0.3    54    -3    -6    18     9     36
0.4    59    -2    -1    2      4     1
0.6    60    0     0     0      0     0
0.8    65    2     5     10     4     25
0.9    70    3     10    30     9     100
Total        0     8     60     26    162
\[ r = \frac{5(60) - (0)(8)}{\sqrt{5(26) - (0)^2} \, \sqrt{5(162) - (8)^2}} = \frac{300}{\sqrt{130} \, \sqrt{746}} \]
r = 0.9633
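The coding dx = 10 (X - 0.6) both shifts and rescales the dose values. A short sketch, assuming nothing beyond the data above, shows that r computed from the coded values matches r computed from the raw doses; the `pearson` helper is illustrative.

```python
from math import sqrt

dose = [0.3, 0.4, 0.6, 0.8, 0.9]
response = [54, 59, 60, 65, 70]


def pearson(xs, ys):
    """Product moment correlation from deviations about the actual means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Coded deviations used in the worked table
dx = [10 * (x - 0.6) for x in dose]
dy = [y - 60 for y in response]

r_raw = pearson(dose, response)
r_coded = pearson(dx, dy)
print(round(r_raw, 4), round(r_coded, 4))  # 0.9633 0.9633
```

Because r is unchanged by a change of origin and by multiplication with a positive constant, the coding can be chosen purely to make the arithmetic easier.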
Example: Tablets were weighed and assayed for drug content. The results are
given below. Find the correlation coefficient between the weight of the tablet and
the assay.
Weight 200 205 203 201 195 203 198 200 190 205 207 210
Assay 10.0 10.1 10.0 10.1 9.9 10.1 9.9 10.0 9.6 10.2 10.2 10.3
Solution: Let X = weight
Y = assay
The following table is constructed taking dx = X - 200 and dy = 10 (Y - 10).
X Y dx dy dxdy dx² dy²
r = 0.9588
Example: Diclofenac sodium sustained release tablets were
analyzed in-vitro and in-vivo. The results are summarised in the
following table. Find out whether the two methods of evaluation are
correlated or not.
Amount of drug released (%)
Time in minutes   In-vitro   In-vivo
0                 0          0
30                35.45      20.33
60                36.47      33.65
90                44.91      41.82
120               55.20      50.01
150               62.46      59.78
Let X = amount of drug released in-vitro
Y = amount of drug released in-vivo
n = 6
Amount of drug released (%)
Time in minutes   In-vitro (X)   In-vivo (Y)   XY
0                 0              0             0
30                35.45          20.33
60                36.47          33.65
90                44.91          41.82
120               55.20          50.01
150               62.46          59.78
Total             224.89         205.59        10130.62
r = 0.9973
Spearman’s rank correlation coefficient
The product moment correlation coefficient can be evaluated when both
the variables X and Y are quantitative.
But if one or both of the variables are qualitative, we cannot
use the formula of the product moment correlation coefficient.
In such a situation, we can assign ranks according to the particular
characteristic under consideration and use Spearman’s rank
correlation coefficient. Spearman’s formula for rank correlation is given
by
\[ r = 1 - \frac{6 \sum d^2}{n (n^2 - 1)} \]
where d = difference between the ranks R1 and R2 given by two judges,
n = number of pairs.
The value of r lies between -1 and 1.
If r is positive, the two judges have the same line of thinking.
If r is negative, the two judges have opposite lines of thinking.
Merits
1. It is simpler to understand and easier to
calculate than Karl
Pearson’s method.
2. It is a useful method when the actual
data are not given but only ranks are
given.
3. It is useful for qualitative data such as
beauty, honesty, efficiency etc.
Demerits
1. It cannot be used for grouped
frequency distribution.
2. It is not as accurate as Karl Pearson’s
Coefficient of Correlation.
3. It cannot be used when continuous
series is given.
4. When the number of items is more than 30
and the ranks are not known, this
method consumes more time and
therefore cannot conveniently be used.
Example: A leading company engaged in the production of an antibiotic
drug has called 15 persons for interview, to fill 10 vacancies for salesmen.
The interview board consists of the sales manager and a psychologist. The
ranks given by the two to all the 15 candidates who attended the interview,
according to their serial number in the interview list, are given below. Find
the rank correlation coefficient.
Sr no   Rank given by         Rank given by        d = X - Y   d²
        Sales manager (X)     Psychologist (Y)
1       1                     2                    -1          1
2       3                     3                    0           0
3       2                     1                    1           1
4       4                     5                    -1          1
5       6                     4                    2           4
6       5                     6                    -1          1
7       7                     8                    -1          1
8       9                     7                    2           4
9       8                     9                    -1          1
10      11                    10                   1           1
11      10                    12                   -2          4
12      12                    11                   1           1
13      14                    13                   1           1
14      13                    14                   -1          1
15      15                    15                   0           0
Total                                                          Σd² = 22
\[ r = 1 - \frac{6 \sum d^2}{n (n^2 - 1)} \]
\[ = 1 - \frac{6 (22)}{15 (225 - 1)} \]
\[ = 1 - \frac{132}{3360} \]
= 1 - 0.0393
= 0.961
• The positive value of r indicates that the Sales Manager and the
Psychologist have the same line of thinking.
• Also, the value of r is very near to 1, which indicates that the
judgements given by both are almost the same.
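The same calculation can be sketched in a few lines. The function below applies Spearman’s formula directly to two lists of ranks and assumes there are no tied ranks; its name is illustrative.

```python
manager = [1, 3, 2, 4, 6, 5, 7, 9, 8, 11, 10, 12, 14, 13, 15]
psychologist = [2, 3, 1, 5, 4, 6, 8, 7, 9, 10, 12, 11, 13, 14, 15]


def spearman_from_ranks(r1, r2):
    """r = 1 - 6 * sum(d^2) / (n * (n^2 - 1)); assumes no tied ranks."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


sum_d2 = sum((a - b) ** 2 for a, b in zip(manager, psychologist))
r_s = spearman_from_ranks(manager, psychologist)
print(sum_d2, round(r_s, 3))  # 22 0.961
```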
Example: Sixteen pharmaceutical industries of Gujarat have been
ranked according to the profit in 2007-2008 and the working capital
for the year. Calculate the rank correlation coefficient.
Pharma industry          A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P
Rank (profit)            1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
Rank (working capital)   13 16 14 15 10 12 4  11 5  9  8  3  1  6  7  2
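R1     R2     d = R1 - R2   d²
1      13     -12           144
2      16     -14           196
3      14     -11           121
4      15     -11           121
5      10     -5            25
6      12     -6            36
7      4      3             9
8      11     -3            9
9      5      4             16
10     9      1             1
11     8      3             9
12     3      9             81
13     1      12            144
14     6      8             64
15     7      8             64
16     2      14            196
Total                       Σd² = 1236
\[ r = 1 - \frac{6 \sum d^2}{n (n^2 - 1)} = 1 - \frac{6 (1236)}{16 (256 - 1)} = 1 - \frac{7416}{4080} \]
r = -0.8176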
Example: The competitors in a beauty contest are ranked by three judges in the
following order:
1st judge 1 5 4 8 9 6 10 7 3 2
2nd judge 4 8 7 6 5 9 10 3 2 1
3rd judge 6 7 8 1 5 10 9 2 3 4
Use the rank correlation coefficient to determine which pair of judges has the
nearest approach in judging beauty.
Solution: Let R1, R2 and R3 indicate the ranks given by the three judges.
R1   R2   R3   d1 = R1 - R2   d2 = R1 - R3   d3 = R2 - R3   d1²   d2²   d3²
1    4    6    -3             -5             -2             9     25    4
5    8    7    -3             -2             1              9     4     1
4    7    8    -3             -4             -1             9     16    1
8    6    1    2              7              5              4     49    25
9    5    5    4              4              0              16    16    0
6    9    10   -3             -4             -1             9     16    1
10   10   9    0              1              1              0     1     1
7    3    2    4              5              1              16    25    1
3    2    3    1              0              -1             1     0     1
2    1    4    1              -2             -3             1     4     9
Total                                                       74    156   44
(1) Rank correlation coefficient between the first and second judges:
\[ r_{12} = 1 - \frac{6 \sum d_1^2}{n (n^2 - 1)} = 0.55 \]
(2) Rank correlation coefficient between the first and third judges:
\[ r_{13} = 1 - \frac{6 \sum d_2^2}{n (n^2 - 1)} = 0.05 \]
(3) Rank correlation coefficient between the second and third judges:
\[ r_{23} = 1 - \frac{6 \sum d_3^2}{n (n^2 - 1)} = 0.73 \]
Since r23 has the maximum positive value, we conclude that the second and third
judges have the nearest approach in judging beauty.
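The pairwise comparison can be sketched as a loop over all pairs of judges. The dictionary keys and helper name below are illustrative, and the formula again assumes no tied ranks.

```python
from itertools import combinations

ranks = {
    "judge 1": [1, 5, 4, 8, 9, 6, 10, 7, 3, 2],
    "judge 2": [4, 8, 7, 6, 5, 9, 10, 3, 2, 1],
    "judge 3": [6, 7, 8, 1, 5, 10, 9, 2, 3, 4],
}


def spearman_from_ranks(r1, r2):
    """r = 1 - 6 * sum(d^2) / (n * (n^2 - 1)); assumes no tied ranks."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Rank correlation for every pair of judges
pairwise = {
    (a, b): spearman_from_ranks(ranks[a], ranks[b])
    for a, b in combinations(ranks, 2)
}
closest_pair = max(pairwise, key=pairwise.get)

for pair, value in pairwise.items():
    print(pair, round(value, 2))
print("closest:", closest_pair)  # closest: ('judge 2', 'judge 3')
```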
Multiple Correlation
The coefficient of multiple correlation (R) is a measure of how well a
particular variable can be predicted using a linear function of a set of
other variables.
It is the correlation between the variable’s values and the best predictions
that can be computed linearly from the predictor variables.
The coefficient of multiple correlation ranges between 0.00 and 1.00.
A higher value indicates higher predictability of the dependent variable
from the independent variables.
A value of 1 indicates that the predictions are exactly correct, and a value of 0
indicates that no linear combination of the independent variables is a
better predictor than the fixed mean of the dependent variable.
The coefficient of multiple correlation can also be computed as the square root
of the coefficient of determination, under the particular assumptions that
an intercept is included and the best possible linear predictors are used.
The coefficient of determination itself is defined for more general cases, such
as non-linear prediction and those in which the predicted values have
not been derived from a model-fitting procedure.
R is a scalar value that is defined as “the PCC between the
predicted and the actual values of the dependent variable in
a linear regression model that includes an intercept”, where PCC
is the Pearson correlation coefficient. Since
these regressions involve two or more predictor variables, the
technique is called multiple regression.
The multiple regression equation is presented as:
\[ y = b_1 x_1 + b_2 x_2 + \dots + b_n x_n + c \]
where the b_i (i = 1, 2, …, n) are the regression coefficients, which
represent the rate at which the criterion variable changes
when the corresponding predictor variable changes, and c is the constant term.
For example, the hardness of a tablet depends on
various factors such as the amount of binder, the properties of the drug and
excipient, and the amount of force applied during
compression. Using hardness test data, one can estimate the
relationship among these factors.
Properties of Multiple Correlation
When more than two variables are related to each other, the
value of the coefficient of multiple correlation depends on
the choice of dependent variable: a regression of y on x
and z will in general have a different R than a regression of z on x and
y.
For example, suppose that in a particular sample the
variable z is uncorrelated with both x and y, while x and y are
linearly related to each other. Then a
regression of z on y and x yields R = 0, while a
regression of y on x and z yields a strictly positive R. This
follows since the correlation of y with its best predictor
based on x and z is in all cases at least as large as the
correlation of y with its best predictor based on x alone, and in
this case, with z providing no explanatory power, it is exactly as large.
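This property can be illustrated numerically. The sketch below uses a small made-up data set (an assumption, chosen so that z is uncorrelated with both x and y) and the standard two-predictor identity R² = (r1² + r2² - 2·r1·r2·r12) / (1 - r12²), which is not stated in the slides; the names `pearson` and `multiple_r` are illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Product moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def multiple_r(dep, p1, p2):
    """R for a regression of `dep` on two predictors p1 and p2.

    Uses R^2 = (r1^2 + r2^2 - 2*r1*r2*r12) / (1 - r12^2), which requires
    that the two predictors are not perfectly correlated.
    """
    r1 = pearson(dep, p1)
    r2 = pearson(dep, p2)
    r12 = pearson(p1, p2)
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return sqrt(max(r_squared, 0.0))


# Made-up sample: x and y are linearly related, z is uncorrelated with both.
x = [-1, 1, -1, 1]
y = [-1, 1, -3, 3]
z = [-1, -1, 1, 1]

R_z = multiple_r(z, x, y)  # regression of z on x and y
R_y = multiple_r(y, x, z)  # regression of y on x and z
r_xy = pearson(x, y)       # simple correlation of y with x alone

print(round(R_z, 4), round(R_y, 4), round(r_xy, 4))  # 0.0 0.8944 0.8944
```

With this sample the regression of z on x and y gives R = 0, while the regression of y on x and z gives a positive R equal to the simple correlation between x and y, since z adds no explanatory power.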