Simple linear regression from raw data

Let $(x_i, y_i), i=1,2, \cdots , n$ be $n$ pairs of observations.

The simple linear regression model of $Y$ on $X$ is

$$y_i=\beta_0 + \beta_1x_i +e_i$$ where,

  • $y$ is a dependent variable,
  • $x$ is an independent variable,
  • $\beta_0$ is an intercept,
  • $\beta_1$ is the slope,
  • $e$ is the error term.


By the method of least squares, the regression coefficients $\beta_0$ (intercept) and $\beta_1$ (slope) can be estimated as

$\hat{\beta}_1 = \frac{n \sum xy - (\sum x)(\sum y)}{n(\sum x^2) -(\sum x)^2}$

$\hat{\beta}_0 = \overline{y} - \hat{\beta}_1\overline{x}$

where,

  • $\overline{x}=\dfrac{1}{n}\sum_{i=1}^n x_i$ is the sample mean of $X$,
  • $\overline{y}=\dfrac{1}{n}\sum_{i=1}^n y_i$ is the sample mean of $Y$,
  • $n$ is the number of data points.
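
As an illustration, the slope estimate can be computed directly from the raw sums $\sum x$, $\sum y$, $\sum xy$ and $\sum x^2$. The following is a minimal Python sketch, not a reference implementation. The function name `fit_simple_linear_regression` and the sample data are hypothetical, and the intercept is taken as $\overline{y} - \hat{\beta}_1\overline{x}$, the standard least-squares result.

```python
def fit_simple_linear_regression(x, y):
    """Return (b0, b1), the least-squares intercept and slope, from raw data."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        raise ValueError("at least two observations are required")

    # Raw sums that appear in the slope formula.
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)

    # Denominator n * sum(x^2) - (sum x)^2 is zero when all x values are equal.
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b1 = (n * sum_xy - sum_x * sum_y) / denominator
    # Intercept: sample mean of y minus slope times sample mean of x.
    b0 = sum_y / n - b1 * (sum_x / n)
    return b0, b1


# Made-up sample data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_simple_linear_regression(x, y)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
```

For this sample, $\sum x = 15$, $\sum y = 20$, $\sum xy = 66$ and $\sum x^2 = 55$, so the slope is $(5 \cdot 66 - 15 \cdot 20)/(5 \cdot 55 - 15^2) = 30/50 = 0.6$ and the intercept is $4 - 0.6 \cdot 3 = 2.2$.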

Important Results

  • Explained variation $SSR = \sum(\hat{y}-\overline{y})^2$
  • Unexplained variation $SSE = \sum (y-\hat{y})^2$
  • Total variation $SST = \sum (y-\overline{y})^2$
  • Coefficient of determination $R^2 =\dfrac{SSR}{SST}$
  • Standard error of estimate $S_e = \sqrt{\dfrac{\sum(y-\hat{y})^2}{n-2}}=\sqrt{\dfrac{SSE}{n-2}}$
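
The quantities above can be computed once the fitted values $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ are available. The sketch below is one possible way to do this in Python. The function name `regression_summary`, the dictionary keys, and the sample data are hypothetical, and the coefficients passed in are assumed to be the least-squares estimates for the same data.

```python
import math


def regression_summary(x, y, b0, b1):
    """Return SSR, SSE, SST, R^2 and S_e for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n <= 2:
        raise ValueError("more than two observations are required for S_e")

    y_bar = sum(y) / n
    y_hat = [b0 + b1 * xi for xi in x]

    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    if sst == 0:
        raise ValueError("all y values are identical; R^2 is undefined")

    r_squared = ssr / sst
    s_e = math.sqrt(sse / (n - 2))  # standard error of estimate
    return {"SSR": ssr, "SSE": sse, "SST": sst, "R2": r_squared, "Se": s_e}


# Made-up sample data; 2.2 and 0.6 are its least-squares intercept and slope.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
summary = regression_summary(x, y, b0=2.2, b1=0.6)
for name, value in summary.items():
    print(f"{name} = {value:.4f}")
```

For this sample the fitted values are 2.8, 3.4, 4.0, 4.6 and 5.2, giving $SSR = 3.6$, $SSE = 2.4$ and $SST = 6$, so $R^2 = 0.6$ and $S_e = \sqrt{2.4/3} \approx 0.894$. With least-squares coefficients, $SSR + SSE = SST$, which is a useful check on the arithmetic.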

