Correlation and

Types of Correlation

Introduction

• In statistics, correlation analysis quantifies the strength

of the association between two numerical variables.

• In other words, correlation is a statistical measure that indicates the extent to which two or more variables fluctuate together.

• A correlation coefficient is a statistical measure of the

degree to which changes to the value of one variable

predict change to the value of another.

www.DuloMix.com 2

Definitions

1) If two variables are inter-related in such a manner that a change in one variable brings about a change in the other variable, this type of relation between the variables is known as correlation.

2) If changing the value of one variable makes, on average, a corresponding change in the value of the other variable, we say that the two variables are correlated.

The value of the correlation coefficient varies from -1 to +1.


A positive correlation indicates the extent to

which variables increase or decrease in parallel.

A negative correlation indicates the extent to

which one variable increases as the other

decreases.

A zero correlation indicates no relation between

variables.


Importance of Correlation:

• Correlation is very important in the field of Psychology and Education as

a measure of relationship between test scores and other measures of

performance.

• With the help of correlation, it is possible to have a correct idea of the

working capacity of a person.

• With the help of it, it is also possible to have a knowledge of the various

qualities of an individual.

• Correlation is also helpful and necessary in providing educational guidance to a student in the selection of subjects of study.

• It is also useful for understanding unknown variables and economic behavior.


Types of correlation

Correlation is classified on the following bases:

• Degree of correlation: Positive, Negative, No (zero), Perfect

• Number of variables: Simple, Partial, Multiple

• Linearity: Linear, Non-linear


Positive correlation

❑ If one variable increases and, under its impact, the other variable also increases, this is called positive correlation.

EXAMPLE: The more time you spend running on a treadmill, the more calories you will burn.

Negative correlation

❑ If one variable increases and, under its impact, the other variable decreases, this is called negative correlation.

EXAMPLE: If a train increases speed, the length of time to get to the final point decreases.


Perfect correlation

A correlation of positive one (+1) indicates a perfect positive correlation, which means that both variables move together in the same direction.

Simple correlation

A simple correlation is one which involves only 2 variables.

EXAMPLE: Correlation between demand and supply.


Partial Correlation

When 3 or more variables are part of the analysis but only 2 are studied and the rest are kept constant, it is partial correlation.

EXAMPLE: Correlation between demand and supply, with income kept constant.

Multiple Correlation

When 3 or more variables are studied simultaneously, it is multiple correlation.

EXAMPLE: Rainfall, production of rice and cost of rice studied simultaneously.


Linear Correlation

If a change in the amount of one variable tends to produce a change in the amount of the other variable at a constant ratio, it is linear correlation.

EXAMPLE

INCOME: 350 360 370 380

WEIGHT: 30 40 50 60

Non Linear Correlation

If a change in the amount of one variable tends to produce a change in the amount of the other variable, but not at a constant ratio, it is non linear correlation.

EXAMPLE

INCOME: 350 360 370 380

WEIGHT: 30 46 59 72
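The constant-ratio idea in the two examples above can be checked numerically. The following is a minimal sketch, not part of the original slides; the helper name `change_ratios` is hypothetical and simply divides each step in WEIGHT by the matching step in INCOME.

```python
def change_ratios(x, y):
    """Ratio (change in y) / (change in x) between consecutive observations."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]

# Data copied from the two examples above.
income = [350, 360, 370, 380]
weight_linear = [30, 40, 50, 60]
weight_nonlinear = [30, 46, 59, 72]

ratios_linear = change_ratios(income, weight_linear)
ratios_nonlinear = change_ratios(income, weight_nonlinear)

# Linear correlation: every step has the same ratio.
is_linear = len(set(ratios_linear)) == 1
# Non linear correlation: the ratio differs from step to step.
is_nonlinear = len(set(ratios_nonlinear)) > 1

print("linear ratios:", ratios_linear)
print("non linear ratios:", ratios_nonlinear)
```

In the first series every step of 10 in income goes with a step of 10 in weight, while in the second the steps in weight are 16, 13 and 13.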


Coefficient of Correlation

When two variables are correlated with each other, it is

important to know the amount or extent of correlation between

them.

The numerical measure of correlation, or the degree of relationship existing between two variables, is called the coefficient of correlation. It is denoted by r and always lies between -1 and +1.

1. When r = 1, it represents perfect direct or positive correlation.

2. When r = -1, it represents perfect inverse or negative correlation.

3. When r = 0, there is no linear correlation, i.e. it shows absence of linear correlation.

4. When the value of r is 0.9 or 0.8 etc., it shows a high degree of relationship between the variables, and when r is small, say 0.2 or 0.1 etc., it shows a low degree of correlation.


Methods of Measuring Correlation


Scatter Diagram

The scatter diagram method is the simplest way to ascertain whether two variables are correlated and, if they are, what the direction of the correlation is, i.e. positive or negative.

In a scatter diagram, one variable is taken along the x axis and the other along the y axis. We then plot the pairs of values as points on graph paper (on the xy plane) and thus get the scattered points. This plot is called a scatter diagram.

The way in which the points are scattered on the xy plane shows the degree and direction of the correlation between the two variables.


1) Perfect Positive Correlation

• If all the points lie on a straight line in upward direction, the correlation is perfect positive.

2) Highly Positive Correlation

• If all the points are very near to a straight line in upward direction, the scatter diagram shows highly positive correlation.


3) Positive Correlation

• If all the points are near to a straight line in upward direction (but not very near), the correlation is positive.

4) Perfect Negative Correlation

• If all the points in a scatter diagram lie on a straight line in downward direction, the correlation is perfect negative.


5) High Negative Correlation

• If the points are very close to a straight line in downward direction, the correlation is high negative.

6) Negative Correlation

• If the points are close to a straight line (not very close) in downward direction, the correlation is negative.


7) Zero correlation :

• If the points are widely

scattered in a graph, the

correlation is said to be zero


Correlation Graph

In this method, the individual values of the two variables are plotted on a graph sheet, which gives two different curves on the same sheet.

By examining the properties of the plotted points, we conclude whether the variables are correlated or not.


Example: Draw the diagram and examine the correlation between

variables X and Y. Data are given in the following table :

Year 1990 1995 2000 2005 2010

X 5 7 6 6 8

Y 1 4 5 4 7

Solution : first we draw the graph between variables.


Merits and Demerits of Graphical Method:

Merits :

a) It is a popular method of examining the relationship between two variables.

b) It is the easiest method and involves no mathematical calculation.

c) Everyone can easily understand and examine it.

Demerits :

a) We cannot obtain the exact degree of correlation.

b) The graphical method is suitable only for a small number of observations.


Karl Pearson’s Coefficient of Correlation:

Karl Pearson’s Coefficient of Correlation is used to

measure the degree of linear relationship between

two variables.

It is also called the product moment correlation coefficient.

It is the most widely used mathematical method of finding the magnitude of linear correlation.

It gives not only the magnitude of the correlation but

also its direction.


Let $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ be n pairs of observations of two variables X and Y.

The coefficient of correlation (r) between X and Y is defined by

$$r = \frac{\operatorname{Cov}(X, Y)}{\sigma_x \cdot \sigma_y}$$

where

$\operatorname{Cov}(X, Y)$ = covariance between X and Y = $\frac{1}{n} \sum (X - \bar{X})(Y - \bar{Y})$

$\sigma_x$ = standard deviation of X

$\sigma_y$ = standard deviation of Y

n = number of pairs of observations $(x_i, y_i)$

The above formula can be written as

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n \, \sigma_x \cdot \sigma_y}$$


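Starting from

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n \, \sigma_x \cdot \sigma_y}$$

if we write $X - \bar{X} = x$ and $Y - \bar{Y} = y$, then the formula becomes

$$r = \frac{\sum xy}{n \, \sigma_x \cdot \sigma_y}$$

This formula is known as the product moment formula of the coefficient of correlation.

For computation, the same coefficient can be obtained from deviations taken from assumed means:

$$r = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{\sqrt{n \sum d_x^2 - (\sum d_x)^2} \, \sqrt{n \sum d_y^2 - (\sum d_y)^2}}$$

where $d_x = X - A$ and $d_y = Y - B$

A and B = assumed means

n = number of pairs of observations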

Example : Compute the correlation coefficient between X

and Y using following data:

X 2 4 5 6 8 11

Y 18 12 10 8 7 5

Solution :

X       Y       XY      X²      Y²
2       18      36      4       324
4       12      48      16      144
5       10      50      25      100
6       8       48      36      64
8       7       56      64      49
11      5       55      121     25
Total:
36      60      293     266     706

$$r = \frac{6(293) - (36)(60)}{\sqrt{6(266) - (36)^2} \, \sqrt{6(706) - (60)^2}}$$

r = -0.9203
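The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `pearson_r` is an illustrative choice, and it applies the raw-sums form of the formula with the totals n, ΣX, ΣY, ΣXY, ΣX² and ΣY² from the table.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from the raw sums used in the worked table."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Data from the example above.
X = [2, 4, 5, 6, 8, 11]
Y = [18, 12, 10, 8, 7, 5]

r = pearson_r(X, Y)
print(round(r, 4))  # -0.9203
```

The negative sign reflects that Y falls as X rises, and the magnitude close to 1 indicates a strong linear relationship.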


Example :The following data show the temperature (X) and the pulse rate (Y) of

the 8 patients. Compute the coefficient of correlation between X and Y.

Patient No. 1 2 3 4 5 6 7 8

X 98 97 102 100 99 101 99 101

Y 100 91 63 80 92 70 90 72

Solution : We construct the following table, taking A = 95 and B = 80 as the assumed means.

X       Y       dx = X - 95   dy = Y - 80   dxdy    dx²     dy²
98      100     3             20            60      9       400
97      91      2             11            22      4       121
102     63      7             -17           -119    49      289
100     80      5             0             0       25      0
99      92      4             12            48      16      144
101     70      6             -10           -60     36      100
99      90      4             10            40      16      100
101     72      6             -8            -48     36      64
Total           37            18            -57     191     1218


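$$r = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{\sqrt{n \sum d_x^2 - (\sum d_x)^2} \, \sqrt{n \sum d_y^2 - (\sum d_y)^2}}$$

$$= \frac{8(-57) - (37)(18)}{\sqrt{8(191) - (37)^2} \, \sqrt{8(1218) - (18)^2}}$$

$$= \frac{-1122}{12.6095 \times 97.0567}$$

= -0.9168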
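The assumed-mean calculation for the temperature and pulse data can be verified with a short script. This is a sketch under the same choices as the worked solution (A = 95, B = 80); the function name `pearson_r_assumed_means` is hypothetical.

```python
import math

def pearson_r_assumed_means(xs, ys, a, b):
    """Pearson's r from deviations dx = X - a and dy = Y - b.

    Returns r together with the column totals
    (sum dx, sum dy, sum dx*dy, sum dx^2, sum dy^2).
    """
    n = len(xs)
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    s_dx, s_dy = sum(dx), sum(dy)
    s_dxdy = sum(p * q for p, q in zip(dx, dy))
    s_dx2 = sum(p * p for p in dx)
    s_dy2 = sum(q * q for q in dy)
    numerator = n * s_dxdy - s_dx * s_dy
    denominator = math.sqrt(n * s_dx2 - s_dx ** 2) * math.sqrt(n * s_dy2 - s_dy ** 2)
    return numerator / denominator, (s_dx, s_dy, s_dxdy, s_dx2, s_dy2)

# Temperature (X) and pulse rate (Y) of the 8 patients.
temperature = [98, 97, 102, 100, 99, 101, 99, 101]
pulse = [100, 91, 63, 80, 92, 70, 90, 72]

r, totals = pearson_r_assumed_means(temperature, pulse, a=95, b=80)
print(totals)       # (37, 18, -57, 191, 1218)
print(round(r, 4))  # -0.9168
```

Any other choice of A and B gives the same r, because shifting the origin of either variable does not change the correlation coefficient.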


Example : The following table shows the diastolic blood pressure and cholesterol levels of 10 randomly selected men. Find the coefficient of correlation between diastolic blood pressure and cholesterol level.

Person          1     2     3     4     5     6     7     8     9     10
Diastolic B.P.  80    75    90    74    75    110   70    85    88    78
Cholesterol     307   259   341   317   274   416   267   320   274   336

Solution : Let X denote the diastolic blood pressure and Y denote the cholesterol level of a man. We take dx = X - 85 and dy = Y - 300 and construct the table with the following columns.

X Y dx = X - 85 dy = Y - 300 dxdy dx² dy²

Ans : r = 0.8088
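Only the answer is given for this example, so the working table can be tabulated with a short script. This is a minimal sketch, assuming the assumed means 85 and 300 chosen in the solution; the variable names are illustrative.

```python
import math

# Data from the example above.
bp = [80, 75, 90, 74, 75, 110, 70, 85, 88, 78]
chol = [307, 259, 341, 317, 274, 416, 267, 320, 274, 336]

dx = [x - 85 for x in bp]      # deviations from the assumed mean A = 85
dy = [y - 300 for y in chol]   # deviations from the assumed mean B = 300

# Print the working table row by row.
print("X\tY\tdx\tdy\tdxdy\tdx2\tdy2")
for x, y, p, q in zip(bp, chol, dx, dy):
    print(x, y, p, q, p * q, p * p, q * q, sep="\t")

n = len(bp)
s_dx, s_dy = sum(dx), sum(dy)
s_dxdy = sum(p * q for p, q in zip(dx, dy))
s_dx2 = sum(p * p for p in dx)
s_dy2 = sum(q * q for q in dy)
print("Total", "", s_dx, s_dy, s_dxdy, s_dx2, s_dy2, sep="\t")

r = (n * s_dxdy - s_dx * s_dy) / (
    math.sqrt(n * s_dx2 - s_dx ** 2) * math.sqrt(n * s_dy2 - s_dy ** 2)
)
print(round(r, 4))  # 0.8088
```

The totals come out as Σdx = -25, Σdy = 111, Σdxdy = 3718, Σdx² = 1279 and Σdy² = 21293, which lead to the stated answer.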


Example: During a laboratory experiment, muscular contractions of frog muscle

were measured against different doses of a given drug. The height of the curve

was considered as the response to the drug. Calculate the correlation coefficient

for the following data.

Serial No. Dose of the drug Response to the drug

1 0.3 54

2 0.4 59

3 0.6 60

4 0.8 65

5 0.9 70

Solution: Let X = Dose of the drug

Y = Response to the drug


We construct the following table, taking dx = 10 (X - 0.6) and dy = Y - 60.

X       Y       dx      dy      dxdy    dx²     dy²
0.3     54      -3      -6      18      9       36
0.4     59      -2      -1      2       4       1
0.6     60      0       0       0       0       0
0.8     65      2       5       10      4       25
0.9     70      3       10      30      9       100
Total           0       8       60      26      162

$$r = \frac{300}{\sqrt{130} \, \sqrt{746}}$$

r = 0.9633
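The solution rescales the dose with dx = 10 (X - 0.6) to avoid decimals. That this change of origin and scale leaves r unchanged can be checked numerically. The sketch below is illustrative only; `pearson_r` is a hypothetical helper using the raw-sums formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return (n * sum_xy - sum_x * sum_y) / (
        math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    )

# Data from the example above.
dose = [0.3, 0.4, 0.6, 0.8, 0.9]
response = [54, 59, 60, 65, 70]

# r on the original values.
r_raw = pearson_r(dose, response)

# r on the coded values used in the worked table.
dx = [10 * (x - 0.6) for x in dose]
dy = [y - 60 for y in response]
r_coded = pearson_r(dx, dy)

print(round(r_raw, 4), round(r_coded, 4))
```

Both calls give the same value up to floating-point rounding, which is why the coded table can be used in place of the original data.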


Example: Tablets were weighed and assayed for the drug content. Results are

given below. Find the correlation coefficient between the weight of tablet and

assay.

Weight  200   205   203   201   195   203   198   200   190   205   207   210
Assay   10.0  10.1  10.0  10.1  9.9   10.1  9.9   10.0  9.6   10.2  10.2  10.3

Solution : Let X = weight

Y = assay

The following table is constructed taking dx = X - 200 and dy = 10 (Y - 10), with the columns

X Y dx dy dxdy dx² dy²

r = 0.9588


Example : Diclofenac sodium sustained release tablets were analyzed in-vitro and in-vivo. The results are summarised in the following table. Find out whether the two methods of evaluation are correlated or not.

Amount of Drug released (%)

Time in minutes In -vitro In-vivo

0 0 0

30 35.45 20.33

60 36.47 33.65

90 44.91 41.82

120 55.20 50.01

150 62.46 59.78

Let X= Amount of drug released in-vitro

Y = Amount of drug released in-vivo

N= 6


Amount of Drug released (%)

Time in In –vitro In-vivo XY

minutes (X) (Y)

0 0 0 0

30 35.45 20.33

60 36.47 33.65

90 44.91 41.82

120 55.20 50.01

150 62.46 59.78

Total 224.89 205.59 10130.62

r = 0.9973


Spearman’s rank correlation coefficient

Product moment correlation coefficient can be evaluated when both

the variables X and Y are quantitative.

But if one variable or both the variables are qualitative , we can not

use the formula of product moment correlation coefficient.

In such a situation, we can assign ranks according to the particular characteristic under consideration and use Spearman’s rank correlation coefficient. Spearman’s formula of rank correlation is given by

$$r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

where d = difference between the ranks R1 and R2 given by two judges

n = number of pairs

The value of r lies between -1 and 1.

If r is positive, the two judges have the same line of thinking.

If r is negative, the two judges have opposite lines of thinking.


Merits

1. It is simpler to understand and easy to

calculate as compared to Karl

Pearson’s method.

2. It is a useful method when the actual

data is not given but only ranks are

given.

3. It is useful for qualitative data such as beauty, honesty, efficiency etc.


Demerits

1. It cannot be used for grouped

frequency distribution.

2. It is not as accurate as Karl Pearson’s

Coefficient of Correlation.

3. It cannot be used when continuous

series is given.

4. When the no. of items is more than 30

and if the ranks are not known, this

method consumes more time and

therefore can’t conveniently be used.


Example : A leading company engaged in the production of an antibiotic

drug has called 15 persons for interview, to fill up 10 vacancies of salesman.

The interview board consists of the sales manager and a psychologist. The

ranks given by the two to all the 15 candidates who attended the interview,

according to their serial number in the interview list, are given below. Find

the rank correlation coefficient


Sr no   Rank given by Sales manager (X)   Rank given by Psychologist (Y)   d = X - Y   d²
1       1                                 2                                -1          1
2       3                                 3                                0           0
3       2                                 1                                1           1
4       4                                 5                                -1          1
5       6                                 4                                2           4
6       5                                 6                                -1          1
7       7                                 8                                -1          1
8       9                                 7                                2           4
9       8                                 9                                -1          1
10      11                                10                               1           1
11      10                                12                               -2          4
12      12                                11                               1           1
13      14                                13                               1           1
14      13                                14                               -1          1
15      15                                15                               0           0
Total                                                                                  22


$$r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

$$= 1 - \frac{6(22)}{15(225 - 1)}$$

$$= 1 - \frac{132}{3360}$$

= 1 - 0.0393

= 0.961

• The positive value of r indicates that the Sales Manager and the Psychologist have the same line of thinking.

• Also, the value of r is very near to 1, which indicates that the judgements given by both are almost the same.
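The rank calculation above can be reproduced in a few lines. This is a minimal sketch of Spearman's formula for ranks without ties; the function name `spearman_rho` is an illustrative choice.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rank correlation for untied ranks.

    Returns (r, sum of squared rank differences).
    """
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2

# Ranks given to the 15 candidates, in serial-number order.
sales_manager = [1, 3, 2, 4, 6, 5, 7, 9, 8, 11, 10, 12, 14, 13, 15]
psychologist = [2, 3, 1, 5, 4, 6, 8, 7, 9, 10, 12, 11, 13, 14, 15]

rho, sum_d2 = spearman_rho(sales_manager, psychologist)
print(sum_d2)         # 22
print(round(rho, 3))  # 0.961
```

Note that this simple formula assumes no tied ranks, which holds for the data in this example.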


Example: Sixteen pharmacy industries of Gujarat have been ranked according to their profit in 2007-2008 and their working capital for the year. Calculate the rank correlation coefficient.

Pharma industry          A    B    C    D    E    F    G    H    I    J    K    L    M    N    O    P
Rank (profit)            1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16
Rank (working capital)   13   16   14   15   10   12   4    11   5    9    8    3    1    6    7    2


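R1      R2      d = R1 - R2     d²
1       13      -12             144
2       16      -14             196
3       14      -11             121
4       15      -11             121
5       10      -5              25
6       12      -6              36
7       4       3               9
8       11      -3              9
9       5       4               16
10      9       1               1
11      8       3               9
12      3       9               81
13      1       12              144
14      6       8               64
15      7       8               64
16      2       14              196
Total                           Σd² = 1236

$$r = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6(1236)}{16(256 - 1)} = 1 - \frac{7416}{4080}$$

r = -0.8176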


Example: The competitors in a beauty contest are ranked by three judges in the

following order:

1st judge 1 5 4 8 9 6 10 7 3 2

2nd judge 4 8 7 6 5 9 10 3 2 1

3rd judge 6 7 8 1 5 10 9 2 3 4

Use rank correlation coefficient to discuss which pair of judges has the nearest

approach to beauty.


Solution : Let R1, R2 and R3 indicate the ranks given by the three judges, with d1 = R1 - R2, d2 = R1 - R3 and d3 = R2 - R3.

R1      R2      R3      d1      d2      d3      d1²     d2²     d3²
1       4       6       -3      -5      -2      9       25      4
5       8       7       -3      -2      1       9       4       1
4       7       8       -3      -4      -1      9       16      1
8       6       1       2       7       5       4       49      25
9       5       5       4       4       0       16      16      0
6       9       10      -3      -4      -1      9       16      1
10      10      9       0       1       1       0       1       1
7       3       2       4       5       1       16      25      1
3       2       3       1       0       -1      1       0       1
2       1       4       1       -2      -3      1       4       9
Total                                           74      156     44


(1) Rank correlation coefficient between the first and second judges is

$$r_{12} = 1 - \frac{6 \sum d_1^2}{n(n^2 - 1)} = 0.55$$

(2) Rank correlation coefficient between the first and third judges is

$$r_{13} = 1 - \frac{6 \sum d_2^2}{n(n^2 - 1)} = 0.05$$

(3) Rank correlation coefficient between the second and third judges is

$$r_{23} = 1 - \frac{6 \sum d_3^2}{n(n^2 - 1)} = 0.73$$

Since r23 has maximum positive value, we conclude that the second and third

judges have the nearest approach in judging beauty.
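The comparison of the three pairs of judges can be automated. The sketch below is illustrative; `spearman_rho` is a hypothetical helper for untied ranks, and the pairs are generated with `itertools.combinations` from the standard library.

```python
from itertools import combinations

def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for untied ranks."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks given by each judge to the 10 competitors.
judges = {
    1: [1, 5, 4, 8, 9, 6, 10, 7, 3, 2],
    2: [4, 8, 7, 6, 5, 9, 10, 3, 2, 1],
    3: [6, 7, 8, 1, 5, 10, 9, 2, 3, 4],
}

# Rank correlation for every pair of judges.
results = {
    (a, b): spearman_rho(judges[a], judges[b])
    for a, b in combinations(judges, 2)
}
for pair, rho in results.items():
    print(pair, round(rho, 2))

# The pair with the largest coefficient has the nearest approach.
closest_pair = max(results, key=results.get)
print("closest pair:", closest_pair)  # (2, 3)
```

The same loop extends to any number of judges, since every pair is compared with the same formula.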


Multiple Correlation

The coefficient of multiple correlation (R) is a measure of how well a particular variable can be predicted using a linear function of a set of other variables.

It is the correlation between the variable’s values and the best predictions that can be computed linearly from the predictor variables.

The coefficient of multiple correlation ranges between 0 and 1. A higher value indicates higher predictability of the dependent variable from the independent variables.

A value of 1 indicates that the predictions are exactly correct, and a value of 0 indicates that no linear combination of the independent variables is a better predictor than the fixed mean of the dependent variable.

The coefficient of multiple correlation equals the square root of the coefficient of determination under the particular assumptions that an intercept is included and the best possible linear predictors are used.

The coefficient of determination itself is defined for more general cases, such as non-linear prediction and cases in which the predicted values have not been derived from a model-fitting procedure.


R is a scalar value that is defined as “the PCC between the predicted and the actual values of the dependent variable in a linear regression model that includes an intercept”, where PCC is the Pearson correlation coefficient. Since these regressions involve two or more predictor variables, they are called multiple regressions.

The multiple regression equation is presented as:

$$y = b_1 x_1 + b_2 x_2 + \dots + b_n x_n + c$$

where the $b_i$ (i = 1, 2, ..., n) are the regression coefficients, which represent the amount by which the criterion variable changes when the corresponding predictor variable changes by one unit.

For example, the hardness of a tablet will be dependent on

various factors like amount of binder, properties of drug and

excipient and the amount of force applied during

compression. Using hardness test one can estimate the

appropriate relationship among these factors.


Properties of Multiple Correlation

When more than two variables are related to each other, the value of the coefficient of multiple correlation depends on the choice of dependent variable: a regression of y on x and z will in general have a different R than a regression of z on x and y.

For example, suppose that in a particular sample the variable z is uncorrelated with both x and y, while x and y are linearly related to each other. Then a regression of z on y and x yields R = 0, while a regression of y on x and z yields a strictly positive R. This follows since the correlation of y with its best predictor based on x and z is in all cases at least as large as the correlation of y with its best predictor based on x alone, and in this case, with z providing no explanatory power, it is exactly as large.
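The property described above can be illustrated numerically. The sketch below is not from the original slides. It uses the standard two-predictor formula $R^2_{y.xz} = (r_{yx}^2 + r_{yz}^2 - 2 r_{yx} r_{yz} r_{xz}) / (1 - r_{xz}^2)$, and the small data set is hypothetical, constructed so that z is uncorrelated with both x and y.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    syy = sum(b * b for b in ys)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    )

def multiple_r(r_d1, r_d2, r_12):
    """Multiple correlation of a dependent variable on two predictors.

    r_d1, r_d2: correlations of the dependent variable with each predictor.
    r_12: correlation between the two predictors (must not be +1 or -1).
    """
    r_squared = (r_d1 ** 2 + r_d2 ** 2 - 2 * r_d1 * r_d2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(max(r_squared, 0.0))

# Hypothetical sample: x and y are linearly related, z is uncorrelated with both.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
z = [1, -1, -1, 1]

r_xy = pearson_r(x, y)
r_xz = pearson_r(x, z)
r_yz = pearson_r(y, z)

R_z_on_xy = multiple_r(r_xz, r_yz, r_xy)  # z as dependent variable
R_y_on_xz = multiple_r(r_xy, r_yz, r_xz)  # y as dependent variable

print(round(R_z_on_xy, 4), round(R_y_on_xz, 4))
```

With this data, R for z on x and y is 0, while R for y on x and z equals the simple correlation between x and y, since z adds no explanatory power.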