The most widely used statistical model in the world, and where it goes wrong.
You have data on 100 apartments: square footage and monthly rent. You want to know how much each additional square foot is worth. Plot the data and you see a cloud of points trending upward. Ordinary least squares regression finds the straight line through that cloud that minimizes a specific quantity, and the slope of that line is your estimate of how rent changes per square foot.
For each apartment in your dataset, the model predicts a rent based on its size. The prediction error for each observation is called the residual: the actual rent minus the predicted rent. OLS finds the line that minimizes the sum of squared residuals.
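The objective OLS minimizes can be sketched in a few lines. The square footage and rent figures below are made up for illustration, as is the candidate line rent = 140 + 2.5 × sqft:

```python
# Hypothetical data: square footage and monthly rent for five apartments.
sqft = [500, 650, 800, 950, 1100]
rent = [1400, 1750, 2150, 2500, 2900]

def ssr(intercept, slope):
    """Sum of squared residuals for a candidate line."""
    residuals = [y - (intercept + slope * x) for x, y in zip(sqft, rent)]
    return sum(r ** 2 for r in residuals)

print(ssr(140, 2.5))                    # SSR at one candidate line: 750
print(ssr(140, 2.5) < ssr(140, 3.0))    # a worse slope gives a larger SSR: True
```

OLS is the line whose intercept and slope make this quantity as small as possible; every other line through the same data produces a larger sum of squared residuals.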
Why squared? Two reasons. Squaring makes all residuals positive (so negative and positive errors do not cancel each other out), and it penalizes large errors more than small ones. A single observation that is $500 off contributes 250,000 to the sum of squares; ten observations each $50 off contribute only 25,000 total. OLS is sensitive to outliers precisely because of this squaring.
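The arithmetic behind that comparison, with hypothetical residuals:

```python
# One observation $500 off versus ten observations each $50 off.
one_big = [500]
ten_small = [50] * 10

ssr_big = sum(r ** 2 for r in one_big)      # 500^2 = 250000
ssr_small = sum(r ** 2 for r in ten_small)  # 10 * 50^2 = 25000

print(ssr_big)    # 250000
print(ssr_small)  # 25000
```

Under squared loss, the single large error weighs ten times as much as all ten small errors combined, which is why the fitted line chases outliers.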
The solution to the minimization problem has a closed form. For simple regression with one predictor:
slope = sum of [(xi - x-bar)(yi - y-bar)] / sum of [(xi - x-bar)^2]
This is the covariance of x and y divided by the variance of x. The slope measures how much y is expected to change for a one-unit change in x, based on the linear relationship in the data.
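The closed-form slope is a direct translation of that formula. The apartment data below are made up for illustration:

```python
# Closed-form OLS for one predictor: slope = sum of cross-deviations
# divided by sum of squared x-deviations.
def ols_fit(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    slope = num / den
    intercept = y_bar - slope * x_bar   # the line passes through (x_bar, y_bar)
    return slope, intercept

sqft = [500, 650, 800, 950, 1100]
rent = [1400, 1750, 2150, 2500, 2900]
slope, intercept = ols_fit(sqft, rent)
print(slope)  # dollars of rent per additional square foot: 2.5
```

One consequence of the closed form: the fitted line always passes through the point of means (x-bar, y-bar), which is why the intercept falls out as y-bar minus slope times x-bar.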
In the apartment example, a slope of $2.50 means that apartments with one more square foot rent for an average of $2.50 more per month, holding nothing else constant. If larger apartments also tend to be in better neighborhoods, have more bathrooms, and be on higher floors, then the $2.50 is capturing all of those correlated differences, not just the value of square footage alone.
This is the distinction between a partial effect and a total effect. Adding more predictors to the model allows you to estimate the effect of square footage holding neighborhood, floor, and bathroom count constant. The coefficient changes when you add controls because the controls absorb some of the variation that square footage was previously picking up.
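A sketch of that coefficient shift on synthetic data, where "floor" is a made-up control deliberately constructed to be correlated with square footage (the true partial effect of a square foot is set to $2.00):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
floor = rng.integers(1, 10, size=n).astype(float)
sqft = 400 + 60 * floor + rng.normal(0, 50, n)          # size correlated with floor
rent = 500 + 2.0 * sqft + 80 * floor + rng.normal(0, 100, n)

# Total effect: regress rent on square footage alone.
X1 = np.column_stack([np.ones(n), sqft])
b1, *_ = np.linalg.lstsq(X1, rent, rcond=None)

# Partial effect: add floor as a control.
X2 = np.column_stack([np.ones(n), sqft, floor])
b2, *_ = np.linalg.lstsq(X2, rent, rcond=None)

print(b1[1])  # larger than 2.0: also picks up the correlated floor effect
print(b2[1])  # close to 2.0: floor held constant
```

Without the control, the square-footage coefficient absorbs the floor premium riding along with size; with the control, it recovers something close to the partial effect that was built into the simulation.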
OLS produces unbiased estimates with correct standard errors under specific assumptions. The relationship between the predictor and outcome must be linear. Residuals must have constant variance across all values of the predictor (homoscedasticity). Observations must be independent. And residuals must not be correlated with the predictors (no omitted variable bias).
When residuals fan out as the predictor increases, you have heteroscedasticity and your standard errors are wrong. When observations are clustered (students within schools, patients within hospitals), standard errors must be adjusted. When an omitted variable is correlated with both the predictor and the outcome, your coefficient is biased and no amount of additional data fixes this.
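One standard adjustment for the fan-out case is a heteroscedasticity-robust "sandwich" estimator (HC0), which uses each observation's own squared residual instead of assuming a single residual variance. A sketch on synthetic data where the noise scale grows with x:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1, 10, n)
y = 3.0 * x + rng.normal(0, x ** 2 / 10)   # residual spread grows with x

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# Classical SE: assumes one constant residual variance.
sigma2 = resid @ resid / (n - 2)
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC0 robust SE: (X'X)^-1 (sum of x_i x_i' * e_i^2) (X'X)^-1.
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print(se_classical[1], se_robust[1])   # robust SE differs under fan-out
```

The coefficient estimate itself is unchanged; only its standard error is corrected. For clustered observations the analogous fix is a cluster-robust estimator that sums residual products within each cluster.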
R-squared measures the proportion of variance in the outcome explained by the model. An R-squared of 0.6 means the model accounts for 60% of the variation in rent. It is tempting to interpret higher R-squared as a better model, but R-squared can be high while coefficients are biased, and it will always increase when you add predictors, regardless of whether those predictors are meaningful. Adjusted R-squared penalizes the model for the number of predictors, which makes it more useful for model comparison.
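That mechanical increase is easy to demonstrate: below, a pure-noise predictor is added to a synthetic rent regression, and plain R-squared still rises (or at worst stays flat), while adjusted R-squared applies its penalty. The data are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.uniform(400, 1200, n)                 # square footage
rent = 200 + 2.5 * x + rng.normal(0, 300, n)
junk = rng.normal(size=n)                     # predictor unrelated to rent

def r2_and_adjusted(X, y):
    n_obs = X.shape[0]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    k = X.shape[1] - 1                        # predictors, excluding intercept
    adj = 1 - (1 - r2) * (n_obs - 1) / (n_obs - k - 1)
    return r2, adj

X1 = np.column_stack([np.ones(n), x])
X2 = np.column_stack([np.ones(n), x, junk])
r2_1, adj_1 = r2_and_adjusted(X1, rent)
r2_2, adj_2 = r2_and_adjusted(X2, rent)
print(r2_2 >= r2_1)   # True: adding a predictor never lowers R-squared
```

Adding any predictor can only shrink (or leave unchanged) the sum of squared residuals, so plain R-squared cannot fall; the adjusted version trades that improvement off against the extra degree of freedom consumed.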