October 16, 2024
Linear model with one explanatory variable…
Caution
Note that for the next few examples we will be analyzing GDP per capita on a log scale.
\(\hat{Y} = a + b \times X\)
\(\hat{Y} = 0.13 + 0.12 \times X\)
What is the interpretation of our estimate of \(a\)?
\(a\) is our predicted level of democracy when log GDP per capita is 0, that is, when GDP per capita equals 1 in the variable's original units (since \(\log 1 = 0\)).
What is the interpretation of our estimate of \(b\)?
Model: Democracy = 0.13 + 0.12 × log(Wealth)
Coefficient Interpretation:
In Dollar Terms:
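A hedged worked example of the "dollar terms" reading. It assumes that the wealth variable is the natural log of GDP per capita (the slides do not state the base), and the multiplier \(c\) is a symbol introduced here for illustration. Multiplying GDP per capita by \(c\) changes the predicted democracy score by:

\[\Delta\hat{Y} = 0.12 \times \left[\ln(c \cdot \text{GDPpc}) - \ln(\text{GDPpc})\right] = 0.12 \times \ln(c)\]

\[c = 1.01:\quad 0.12 \times \ln(1.01) \approx 0.0012 \qquad\qquad c = 2:\quad 0.12 \times \ln(2) \approx 0.083\]

Under that assumption, a 1% increase in GDP per capita is associated with a predicted democracy score about 0.0012 higher, and a doubling of GDP per capita with a score about 0.083 higher, whatever the starting level of wealth.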
Is this the causal effect of GDP per capita on liberal democracy?
No! It is only the association…
To identify causality we need other methods (beyond the scope of this course).
An economist is interested in the relationship between years of education and hourly wages. They estimate a linear model with estimates of \(a\) and \(b\) as follows:
\(\hat{Y} = 9 + 1.60 \times \text{YrsEdu}\)
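One way to read this model is to plug in values. The two education levels below (12 and 16 years) are hypothetical, chosen only for illustration:

\[\hat{Y} = 9 + 1.60 \times 12 = 28.2 \qquad\qquad \hat{Y} = 9 + 1.60 \times 16 = 34.6\]

The intercept, 9, is the predicted hourly wage for someone with 0 years of education. The slope says that each additional year of education is associated with a predicted hourly wage 1.60 higher, in whatever units the wage variable uses, so four more years correspond to \(4 \times 1.60 = 6.4\).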
Goal: Estimate the democracy score (\(\hat{Y}_{i}\)) of a country given its level of GDP per capita (\(X_{i}\)).
Or: Estimate the relationship between GDP per capita and democracy.
Call:
lm(formula = lib_dem ~ log_wealth, data = modelData)

Residuals:
     Min       1Q   Median       3Q      Max
-0.57441 -0.14334  0.03911  0.18730  0.37017

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.13051    0.03806   3.429 0.000758 ***
log_wealth   0.12040    0.01471   8.188 5.75e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2233 on 172 degrees of freedom
  (5 observations deleted due to missingness)
Multiple R-squared:  0.2805,    Adjusted R-squared:  0.2763
F-statistic: 67.04 on 1 and 172 DF,  p-value: 5.754e-14
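Output of this kind comes from calling `summary()` on a fitted `lm` object. The sketch below is a minimal, self-contained illustration: the course's `modelData` is not available here, so it simulates a stand-in data frame with the same column names. The simulated values, the seed, and the sample size are assumptions, so the numbers it prints will not match the output above.

```r
# Simulated stand-in for modelData (hypothetical values, not the course data)
set.seed(123)
n <- 174
modelData <- data.frame(log_wealth = runif(n, min = 0, max = 4.5))
modelData$lib_dem <- 0.13 + 0.12 * modelData$log_wealth + rnorm(n, sd = 0.22)

# Fit the linear model with one explanatory variable
fit <- lm(lib_dem ~ log_wealth, data = modelData)

# Print the full regression summary (coefficients, residuals, R-squared)
summary(fit)

# Extract just the estimated intercept (a) and slope (b)
coef(fit)
```

`coef(fit)` returns a named vector, so the slope can be pulled out with `coef(fit)[["log_wealth"]]`.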
In equation form… How do we interpret the model?
\[\widehat{\text{Democracy}}_{i} = 0.13 + 0.12 \times \text{loggdppc}_{i}\]
How do we get the “best” values for the slope and intercept?
Residual for each point is: \(e_i = y_i - \hat{y}_i\)
Least squares regression line minimizes \(\sum_{i = 1}^n e_i^2\).
Why not take absolute value?
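To make the two criteria concrete, here is a small sketch that computes residuals by hand and then summarizes them both ways. It uses R's built-in `cars` data purely as a stand-in, not the democracy data. Squaring penalizes large misses more heavily than small ones and gives a smooth function with a closed-form minimizer. Minimizing the sum of absolute residuals is a different criterion and generally gives a different line.

```r
# Built-in 'cars' data used as a stand-in (speed and stopping distance)
fit <- lm(dist ~ speed, data = cars)

# Residual for each point: observed y minus fitted y
e <- cars$dist - fitted(fit)

# Two ways to summarize the total miss of the line
ssr <- sum(e^2)      # sum of squared residuals (what least squares minimizes)
sar <- sum(abs(e))   # sum of absolute residuals (a different criterion)

c(ssr = ssr, sar = sar)
```

The hand-computed `e` matches `residuals(fit)`, and `ssr` matches `deviance(fit)`, which for an `lm` fit is the residual sum of squares.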
What should the slope and intercept be?
\(\hat{Y} = 0 + 1*X\)
What is the sum of squared residuals?
What is the sum of squared residuals for \(y = 0 + 0*X\)?
What is the sum of squared residuals for \(y = 0 + 2*X\)?
What is the sum of squared residuals for \(y = 0 + (-1)*X\)?
Sum of Squared Residuals as function of possible values of \(b\)
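The exercise can be sketched in a few lines of R. The three points below are hypothetical: they are chosen to lie exactly on \(y = 0 + 1 \times X\) and are not the data plotted on the slides. The helper `ssr()` is also introduced here for illustration.

```r
# Hypothetical points that lie exactly on y = 0 + 1 * x (not the slide's data)
x <- c(1, 2, 3)
y <- c(1, 2, 3)

# Sum of squared residuals for the line y = 0 + b * x (intercept fixed at 0)
ssr <- function(b) sum((y - b * x)^2)

# Evaluate the candidate slopes from the slides
b_values <- c(-1, 0, 1, 2)
ssr_values <- sapply(b_values, ssr)
data.frame(b = b_values, ssr = ssr_values)

# Search a finer grid for the slope with the smallest SSR
grid <- seq(-1, 3, by = 0.25)
best_b <- grid[which.min(sapply(grid, ssr))]
best_b
```

With these points the sum of squared residuals is 56 for \(b = -1\), 14 for \(b = 0\), 0 for \(b = 1\), and 14 for \(b = 2\), so the grid search settles on \(b = 1\).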
When we estimate a least squares regression, the software looks for the line that minimizes the sum of squared residuals.
In the simple example, I set \(a=0\) to make it easier. Searching for the combination of \(a\) and \(b\) that minimizes the sum of squared residuals is more complicated, but the basic idea is the same.
There is a way to solve for this analytically for linear regression (i.e., by doing math…)
– They made us do this in grad school…
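For reference, the standard closed-form solution for a model with one explanatory variable is given below. It comes from setting the partial derivatives of \(\sum_{i=1}^n (y_i - a - b x_i)^2\) with respect to \(a\) and \(b\) to zero. The symbols \(\bar{x}\) and \(\bar{y}\) (the sample means of \(x\) and \(y\)) are introduced here and do not appear in the slides.

\[b = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad a = \bar{y} - b\,\bar{x}\]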