## Regression: PDF / PPT

REGRESSION

(Curve fitting by the method of least squares, fitting the lines y = a + bx and x = a + by, multiple regression, standard error of regression)

www.DuloMix.com

Regression analysis

• Regression analysis is a predictive modelling technique that investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables. The term “regression” was introduced by Francis Galton.

• If two variables are involved, the variable that forms the basis of the estimation is conventionally called the independent variable, and the variable whose value is to be estimated is called the dependent variable.

• In simple words, regression is a technique for predicting the value of one variable from known values of others.

• This technique is used for forecasting, time series modelling, and investigating cause-and-effect relationships between variables.

• The dependent variable is also known as the explained variable, predictand, response variable, or endogenous variable.

• The independent variable is also known as the explanatory variable, regressor, or exogenous variable.


WHY DO WE NEED REGRESSION ANALYSIS?

• Typically, regression analysis is used for these purposes:

(1) Prediction of the target variable (forecasting).

(2) Modelling the relationships between the dependent variable and the explanatory variable.

(3) Testing of hypotheses.

• Benefits:

1. It indicates the strength of impact of multiple independent variables on a dependent variable.

2. It indicates which relationships between the dependent variable and the independent variables are significant.

These benefits help market researchers, data analysts, and data scientists evaluate candidate variables, eliminate weak ones, and select the best set of variables for building predictive models.


Types of Regression Analysis

Regression analysis is generally classified into two kinds: simple and multiple.

[Figure: Regression analysis branches into simple regression and multiple regression.]


Regression analysis is a statistical method for formulating a mathematical model of the relationship among variables. The model can then be used to predict values of the dependent variable from given values of the independent variable.

Example: The relationship between estriol level and birthweight is best studied through regression; it can be quantified by fitting a regression line.

[Figure: Regression line relating the dependent variable to the independent variable, marking an observed value and the corresponding expected value.]


If

• x = estriol level

• y = birthweight

• The line y = α + βx is the regression line, where α is the intercept and β is the slope of the line.

• The relationship y = α + βx is not expected to hold exactly for every woman. For example, not all women with a given estriol level have babies with identical birthweights. Thus an error term e, which represents the variation in birthweight among all babies of women with a given estriol level x, is introduced into the model. Assume that e follows a normal distribution with mean 0 and variance σ². The full linear regression model then takes the following form: y = α + βx + e

• One interpretation of the regression line is that for a woman with estriol level x, the corresponding birthweight will be normally distributed with mean α + βx and variance σ². If σ² were 0, every point would fall exactly on the regression line, whereas the larger σ² is, the more scatter occurs about the regression line.
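To make the role of σ² concrete, here is a small Python simulation of the model y = α + βx + e. The values of α and β, the x grid, and the random seed are hypothetical, chosen only for illustration; they are not estimated from the estriol data. With σ = 0 every simulated point lies exactly on the line, and with σ > 0 the points scatter about it.

```python
import random

# Hypothetical parameters for illustration only (not estimated from the estriol data).
ALPHA = 20.0  # intercept
BETA = 0.5    # slope

def simulate(xs, sigma, seed=0):
    """Draw y = ALPHA + BETA*x + e for each x, with e ~ Normal(0, sigma**2)."""
    rng = random.Random(seed)
    return [ALPHA + BETA * x + rng.gauss(0.0, sigma) for x in xs]

def deviations(xs, ys):
    """Vertical distance of each point from the true line ALPHA + BETA*x."""
    return [y - (ALPHA + BETA * x) for x, y in zip(xs, ys)]

xs = [8, 10, 12, 14, 16, 18]
exact = deviations(xs, simulate(xs, sigma=0.0))
noisy = deviations(xs, simulate(xs, sigma=2.0))
print(max(abs(d) for d in exact))      # 0.0: with sigma = 0 every point is on the line
print(max(abs(d) for d in noisy) > 0)  # True: with sigma > 0 the points scatter
```

Increasing `sigma` in the second call widens the scatter, which is the effect the next figure illustrates.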


[Figure: The effect of σ² on the goodness of fit of a regression line]


Interpretation of the regression line for different values of β (slope of the line)

Case 1: β > 0

If β is greater than 0, then as x increases, the expected value of y will increase.


Case 2: β < 0

If β is less than 0, then as x increases, the expected value of y will decrease.


Case 3: β = 0

If β is equal to 0, then there is no linear relationship between x and y.


Methods of finding regression lines

1. Scatter diagram method

• In this method, we plot each pair of observations on graph paper to obtain a diagram called a scatter diagram.

• We then draw a straight line through the points of the scatter diagram such that the errors in estimating Y are minimized. This is the line of regression of Y on X.

• Similarly, a straight line drawn through the points of the scatter diagram such that the errors in estimating X are minimized is the line of regression of X on Y.

2. Method of least squares


CURVE FITTING BY THE LEAST SQUARES METHOD

Fitting the line y = a + bx (straight line)

Definition:

• Curve fitting is a method of finding a specific relation connecting the DEPENDENT and INDEPENDENT VARIABLES for given data so that it fits the data as accurately as possible.

• The method of least squares is the most systematic procedure for fitting a unique curve through the given points.

• Using the given ‘x’ and ‘y’ points, we need to find the curve that best fits the data.

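• Each point has an observed value and an expected (fitted) value; their difference is known as the ‘ERROR’. The method of least squares chooses the curve that minimizes the sum of the squared errors.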

Types:

1. y = a + bx or x = a + by (Straight Line)


2. $y = ax^{b}$ (power curve)
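The derivation slides for the least-squares method were images and did not survive extraction. As a hedged reconstruction, the standard normal equations below follow from minimising the sum of squared errors E(a, b). The power curve $y = ax^{b}$ is handled by taking logarithms, which turns it into a straight line in (log x, log y) that can be fitted the same way.

```latex
\begin{aligned}
E(a, b) &= \sum_{i=1}^{n} \left(y_i - a - b x_i\right)^2 \\
\frac{\partial E}{\partial a} = 0 &\;\Rightarrow\; \sum y = na + b \sum x \\
\frac{\partial E}{\partial b} = 0 &\;\Rightarrow\; \sum xy = a \sum x + b \sum x^2 \\
b &= \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}, \qquad a = \bar{y} - b\,\bar{x} \\
x = a + by: &\quad \sum x = na + b \sum y, \qquad \sum xy = a \sum y + b \sum y^2 \\
y = a x^{b}: &\quad \log y = \log a + b \log x
\end{aligned}
```

For the power curve, the fitted intercept of the log-log line is log a, so a is recovered by taking its antilog.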


TO DO:

Find the best values of ‘a’ and ‘b’ so that y = a + bx fits the estriol concentration and birthweight data given in the table, and give the equation of the line.

| Estriol, x (mg/24 hr) | Birthweight, y (g/100) |
|---|---|
| 10 | 25 |
| 9 | 25 |
| 9 | 25 |
| 12 | 27 |
| 14 | 27 |
| 16 | 27 |
| 16 | 24 |
| 14 | 30 |
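A minimal Python sketch of the least-squares computation for this table, using the normal-equation solution b = (nΣxy − ΣxΣy)/(nΣx² − (Σx)²) and a = ȳ − b·x̄. The slides' own worked solution did not survive extraction, so the printed values are computed here rather than quoted from the source; `fit_line` is a helper name introduced for this example.

```python
# Estriol (mg/24 hr) and birthweight (g/100) values from the table above.
x = [10, 9, 9, 12, 14, 16, 16, 14]
y = [25, 25, 25, 27, 27, 27, 24, 30]

def fit_line(xs, ys):
    """Least-squares fit of ys = a + b*xs via the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(v * v for v in xs)
    sxy = sum(u * v for u, v in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = sy / n - b * sx / n  # a = mean(y) - b * mean(x)
    return a, b

a, b = fit_line(x, y)
print(f"y = {a:.3f} + {b:.3f}x")  # y = 23.542 + 0.217x
```

Read this way, each additional mg/24 hr of estriol is associated with an estimated increase of about 0.217 g/100 (roughly 22 g) in birthweight.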


Once these values are obtained and substituted into the equation Y = a + bX, we say that we have fitted the regression equation of Y on X to the given data. In a similar fashion, we can develop the regression equation of X on Y, viz., X = a + bY, treating Y as the independent variable and X as the dependent variable.
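The regression of X on Y uses the same normal equations with the roles of the variables swapped. This sketch (with the hypothetical helper `fit_line`) fits both lines to the estriol data; the two lines differ, but both pass through the point of means (x̄, ȳ) = (12.5, 26.25).

```python
x = [10, 9, 9, 12, 14, 16, 16, 14]    # estriol, mg/24 hr
y = [25, 25, 25, 27, 27, 27, 24, 30]  # birthweight, g/100

def fit_line(indep, dep):
    """Least-squares fit of dep = a + b*indep (normal equations)."""
    n = len(indep)
    s_i, s_d = sum(indep), sum(dep)
    s_ii = sum(v * v for v in indep)
    s_id = sum(u * v for u, v in zip(indep, dep))
    b = (n * s_id - s_i * s_d) / (n * s_ii - s_i * s_i)
    a = (s_d - b * s_i) / n
    return a, b

a_yx, b_yx = fit_line(x, y)  # regression of y on x
a_xy, b_xy = fit_line(y, x)  # regression of x on y (y treated as independent)
print(f"y = {a_yx:.3f} + {b_yx:.3f}x")  # y = 23.542 + 0.217x
print(f"x = {a_xy:.3f} + {b_xy:.3f}y")  # x = -0.882 + 0.510y
```

The two lines coincide only when the correlation between x and y is perfect (r = ±1).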


Multiple Regression

• Multiple regression is the extension of simple linear regression.

[Figure: Simple linear regression relates one independent variable (IV) to one dependent variable (DV), one to one; multiple regression relates several independent variables to one dependent variable, many to one.]



• Two or more independent variables are used to predict or explain the variance in one dependent variable.

• Two problems may arise:

1. Overfitting: caused by adding too many independent variables; they account for more variance in the sample but add nothing useful to the model.

2. Multicollinearity: occurs when some or all of the independent variables are correlated with each other.

Remedies:

1. Increase the sample size. This is a common first step, since standard errors decrease as the sample size increases (all other things being equal).

2. Remove the most intercorrelated variable(s) from the analysis. This remedy is misguided if the variables were included because the theory behind the model calls for them, as ideally they should have been.

3. Transform the variables into the form that best fits the model.
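One common way to check for multicollinearity, not described in the slides, is the variance inflation factor (VIF). With exactly two predictors, VIF = 1/(1 − r₁₂²), where r₁₂ is their correlation. The sketch below computes it for the birthweight and age values used in the systolic blood pressure example in this section; `pearson_r` is a helper written for this example, and rules of thumb for how large a VIF is too large vary between texts.

```python
import math

# Birthweight (oz) and age (days) from the systolic blood pressure example.
x1 = [135, 120, 100, 105, 130]
x2 = [3, 4, 3, 2, 4]

def pearson_r(u, v):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    s_uv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    s_uu = sum((a - mu) ** 2 for a in u)
    s_vv = sum((b - mv) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)

r12 = pearson_r(x1, x2)
# With exactly two predictors, both share the same VIF: 1 / (1 - r12**2).
vif = 1.0 / (1.0 - r12 ** 2)
print(f"r12 = {r12:.3f}, VIF = {vif:.2f}")  # r12 = 0.529, VIF = 1.39
```

A VIF this close to 1 suggests that collinearity between birthweight and age is mild in this small sample.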

In multiple regression, each coefficient is interpreted as the estimated change in ‘y’ corresponding to a one-unit change in that variable when all other variables are held constant.


Find the regression equation for systolic blood pressure (SBP) with two independent variables (birthweight and age of the child), and also calculate the following:

1. Calculate the predicted average SBP of a baby with birthweight 8 lb (128 oz) measured at 3 days of life.

2. Calculate the predicted average SBP of a baby with birthweight 2 lb (32 oz) measured at 5 days of life.

| Birthweight, X1 (oz) | Age, X2 (days) | Systolic BP, Y (mm Hg) |
|---|---|---|
| 135 | 3 | 89 |
| 120 | 4 | 90 |
| 100 | 3 | 83 |
| 105 | 2 | 77 |
| 130 | 4 | 92 |
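A Python sketch of the least-squares fit for this table. It solves the two normal equations in centred form by Cramer's rule; `fit_two_predictors` is a name introduced here, not from the slides. The slope estimates agree with the 0.180 and 4.976 quoted in the interpretation that follows, while the intercept (about 49.005) and the two predictions are computed here. Note that 32 oz lies well outside the observed birthweight range of 100 to 135 oz, so the second prediction is an extrapolation and should be treated with caution.

```python
# Data from the table: birthweight X1 (oz), age X2 (days), systolic BP Y (mm Hg).
x1 = [135, 120, 100, 105, 130]
x2 = [3, 4, 3, 2, 4]
y = [89, 90, 83, 77, 92]

def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = a + b1*x1 + b2*x2 using centred sums."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    # Normal equations, solved by Cramer's rule:
    #   s11*b1 + s12*b2 = s1y
    #   s12*b1 + s22*b2 = s2y
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

a, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"Y = {a:.3f} + {b1:.3f}X1 + {b2:.3f}X2")  # Y = 49.005 + 0.180X1 + 4.976X2

for weight_oz, age_days in [(128, 3), (32, 5)]:
    sbp = a + b1 * weight_oz + b2 * age_days
    print(f"{weight_oz} oz at day {age_days}: {sbp:.2f} mm Hg")
# 128 oz at day 3: 87.01 mm Hg
# 32 oz at day 5: 79.65 mm Hg  (extrapolation: 32 oz is outside 100 to 135 oz)
```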


The regression equation tells us that for a newborn, the average systolic blood pressure increases by an estimated 0.180 mm Hg per ounce of birthweight (with age held constant) and 4.976 mm Hg per day of age (with birthweight held constant).


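Important Properties of Regression Coefficients

1. The regression coefficient is denoted by b.

2. It is expressed in the original units of the data.

3. The regression coefficient of y on x is denoted by $b_{yx}$, and the regression coefficient of x on y is denoted by $b_{xy}$.

4. If one regression coefficient is greater than 1 in absolute value, the other must be less than 1 in absolute value, because their product equals the square of the correlation coefficient, which cannot exceed 1.

5. They are not independent of a change of scale: if x is multiplied by a constant h and y by a constant k, $b_{yx}$ becomes $(k/h)\,b_{yx}$ and $b_{xy}$ becomes $(h/k)\,b_{xy}$.

6. The arithmetic mean of the two regression coefficients is numerically greater than or equal to the correlation coefficient: $\left|\frac{b_{yx} + b_{xy}}{2}\right| \ge |r|$.

7. The geometric mean of the two regression coefficients equals the correlation coefficient in magnitude: $r = \pm\sqrt{b_{yx}\,b_{xy}}$, where r takes the common sign of the two coefficients.

8. If $b_{xy}$ is positive, then $b_{yx}$ is also positive, and vice versa; both have the same sign as the correlation coefficient.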
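These properties can be checked numerically. The sketch below reuses the estriol and birthweight data from the least-squares example, computes both regression coefficients and the correlation coefficient r, and tests properties 5 to 8. The rescaling constants h = 10 and k = 2 are arbitrary values chosen for illustration, and `centred_sums` and `slope` are helpers written for this example.

```python
import math

x = [10, 9, 9, 12, 14, 16, 16, 14]    # estriol, mg/24 hr
y = [25, 25, 25, 27, 27, 27, 24, 30]  # birthweight, g/100

def centred_sums(u, v):
    """Return (S_uv, S_uu, S_vv): centred sums of products and squares."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    s_uv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    s_uu = sum((a - mu) ** 2 for a in u)
    s_vv = sum((b - mv) ** 2 for b in v)
    return s_uv, s_uu, s_vv

def slope(indep, dep):
    """Least-squares slope of dep on indep: S_xy / S_xx."""
    s_id, s_ii, _ = centred_sums(indep, dep)
    return s_id / s_ii

s_xy, s_xx, s_yy = centred_sums(x, y)
r = s_xy / math.sqrt(s_xx * s_yy)  # Pearson correlation coefficient
b_yx, b_xy = slope(x, y), slope(y, x)

gm = math.copysign(math.sqrt(b_yx * b_xy), b_yx)  # property 7: signed geometric mean
am = (b_yx + b_xy) / 2                           # property 6: arithmetic mean
h, k = 10, 2                                     # arbitrary rescaling constants
b_scaled = slope([h * v for v in x], [k * v for v in y])  # property 5

print(f"b_yx = {b_yx:.4f}, b_xy = {b_xy:.4f}, r = {r:.4f}")
print(f"GM = {gm:.4f}, AM = {am:.4f}")
print(f"rescaled b_yx = {b_scaled:.4f}, (k/h) * b_yx = {k / h * b_yx:.4f}")
print("same sign:", (b_yx > 0) == (b_xy > 0))  # property 8
```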


Standard Error of Regression (S)

• The standard error of the regression, also called the standard error of estimate, is a measure of the accuracy of predictions.

• Note: The regression line is the line that minimizes the sum of squared deviations of the predictions (also called the sum of squares error).

• The observed values of (Y, X1, X2) do not all fall on the regression line; they scatter around it.

• The degree of scatter of the observed values about the regression line is measured by the standard deviation of regression. It measures the variation of the observations about the true regression line $Y = \alpha + \beta_1 X_1 + \beta_2 X_2$ and is denoted by $\sigma_{Y.12}$.


• The standard error of the regression (S) represents the average distance that the observed values fall from the regression line.

• Conveniently, it tells you how wrong the regression model is on average, in the units of the response variable.

• Smaller values are better because they indicate that the observations are closer to the fitted line.

• Unlike R-squared, the standard error of the regression can be used to assess the precision of the predictions.

• The standard error of the regression provides an absolute measure of the typical distance between the data points and the regression line. S is in the units of the dependent variable.


Standard Error of estimate for Simple Regression
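The formula slide for this section did not survive extraction. A commonly used form is $S_{yx} = \sqrt{\sum (y - \hat{y})^2 / (n - 2)}$, where ŷ = a + bx is the fitted value; some textbooks divide by n instead of n − 2, so treat the denominator here as an assumption. The Python sketch applies it to the estriol and birthweight data; `se_of_estimate` is a helper name introduced for this example.

```python
import math

x = [10, 9, 9, 12, 14, 16, 16, 14]    # estriol, mg/24 hr
y = [25, 25, 25, 27, 27, 27, 24, 30]  # birthweight, g/100

def se_of_estimate(xs, ys):
    """S = sqrt(SSE / (n - 2)) for the least-squares line ys = a + b*xs.

    The n - 2 denominator is an assumption; some texts divide by n.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((u - mx) ** 2 for u in xs)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((v - (a + b * u)) ** 2 for u, v in zip(xs, ys))  # sum of squared residuals
    return math.sqrt(sse / (n - 2))

s = se_of_estimate(x, y)
print(f"S = {s:.3f}")  # S = 1.944
```

Since birthweight is recorded in units of 100 g, S of about 1.94 means observed birthweights typically lie about 194 g from the fitted line.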


Standard Error of estimate for Multiple Regression
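For two predictors, the analogous estimate is $S_{Y.12} = \sqrt{\sum (Y - \hat{Y})^2 / (n - k - 1)}$ with k = 2 independent variables, giving n − 3 degrees of freedom. As in the simple case, the formula slide is missing here and the n − k − 1 denominator is an assumption (some texts use n). The sketch applies it to the systolic blood pressure data; `se_multiple` is a helper name introduced for this example.

```python
import math

x1 = [135, 120, 100, 105, 130]  # birthweight, oz
x2 = [3, 4, 3, 2, 4]            # age, days
y = [89, 90, 83, 77, 92]        # systolic BP, mm Hg

def se_multiple(x1, x2, y):
    """S = sqrt(SSE / (n - k - 1)) for y = a + b1*x1 + b2*x2, with k = 2.

    The n - k - 1 denominator is an assumption; some texts divide by n.
    """
    n, k = len(y), 2
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # normal equations solved by Cramer's rule
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    sse = sum((w - (a + b1 * u + b2 * v)) ** 2 for u, v, w in zip(x1, x2, y))
    return math.sqrt(sse / (n - k - 1))

s = se_multiple(x1, x2, y)
print(f"S = {s:.3f} mm Hg")  # S = 1.185 mm Hg
```

With only 5 observations and 2 residual degrees of freedom, this estimate should be interpreted cautiously.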
