Panel Data Models
January 28, 2026
Learning goals
Panel data tracks the same units over multiple time periods:
Example (CO\(_2\) panel):
- Outcome: CO\(_2\) emissions per capita
- Unit ID: iso_code
- Time ID: year
Tip
Panel data gives us two kinds of variation:
- Between variation: differences across countries
- Within variation: changes within a country over time
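The two kinds of variation can be separated by hand. The following is a minimal sketch on a made-up toy panel (the country codes AAA and BBB and all numbers are hypothetical, not from the course data): it splits each observation into a country-average part and a deviation from that average.

```python
from statistics import mean

# Hypothetical toy panel: CO2 per capita for two countries over three years.
panel = {
    "AAA": [2.0, 2.5, 3.0],
    "BBB": [8.0, 8.5, 9.0],
}

grand_mean = mean(v for series in panel.values() for v in series)
country_means = {c: mean(series) for c, series in panel.items()}

# Between variation: how far each country's average is from the grand mean.
between = {c: m - grand_mean for c, m in country_means.items()}

# Within variation: how far each observation is from its own country's average.
within = {c: [v - country_means[c] for v in series] for c, series in panel.items()}

print("between:", between)
print("within:", within)
```

In this toy panel the between deviations are -3.0 and 3.0, while each country's within deviations are -0.5, 0.0, 0.5: most of the variation is between countries, even though both countries follow the same path over time.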
Pooled OLS can be misleading because it ignores unobserved country differences that don’t change much over time, such as:
These factors can affect both the explanatory variable and the outcome.
➡️ That creates omitted variable bias when we use Pooled OLS.
Suppose the true relationship includes an unobserved country component:
\[ y_{it} = \beta x_{it} + \underbrace{\alpha_i}_{\text{unobserved country trait}} + \epsilon_{it} \]
If we ignore \(\alpha_i\) and estimate Pooled OLS:
\[ y_{it} = \beta x_{it} + u_{it} \quad \text{where} \quad u_{it} = \alpha_i + \epsilon_{it} \]
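⚠️ If \(Cov(\alpha_i, x_{it}) \neq 0\), then \(x_{it}\) is correlated with the error term \(u_{it}\), and the Pooled OLS estimate of \(\beta\) is biased.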
Fixed Effects is designed to solve exactly this problem.
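The size and direction of the bias can be written out. As a sketch, assuming a single regressor, an intercept in the regression, and \(\epsilon_{it}\) uncorrelated with \(x_{it}\) (assumptions added here for illustration), the Pooled OLS slope converges to:

\[ \hat{\beta}_{POLS} \xrightarrow{p} \beta + \frac{Cov(x_{it}, \alpha_i)}{Var(x_{it})} \]

So the estimate is too large when the unobserved country trait is positively correlated with \(x_{it}\), too small when the correlation is negative, and unbiased only when \(Cov(x_{it}, \alpha_i) = 0\).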
Model \[ y_{it} = \beta_0 + \beta_1 x_{it} + \epsilon_{it} \]
Meaning
Uses all variation
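When you estimate POLS, the slope \(\beta_1\) reflects a combination of between variation (differences across countries) and within variation (changes within a country over time).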
So interpretation becomes:
“On average, countries with higher \(x\) have higher (or lower) \(y\), and/or when \(x\) rises over time, \(y\) moves…”
⚠️ That can be fine only if unobserved country traits don’t confound the relationship.
If richer countries also have different institutions, energy systems, or historical industrialization paths, then the Pooled OLS slope partly reflects those differences rather than the effect of \(x\) itself.
Panel methods help because they explicitly control for \(\alpha_i\).
Model \[ y_{it} = \beta x_{it} + \alpha_i + \epsilon_{it} \]
FE answers a within-country question:
“If a country’s \(x\) changes over time, how does \(y\) change within that same country?”
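One standard way to see why FE uses only within-country changes is the within (demeaning) transformation. Writing \(\bar{y}_i\), \(\bar{x}_i\), and \(\bar{\epsilon}_i\) for country \(i\)'s averages over time (notation introduced here for the sketch), averaging the model over time gives:

\[ \bar{y}_i = \beta \bar{x}_i + \alpha_i + \bar{\epsilon}_i \]

Subtracting this from the original equation removes \(\alpha_i\), because it does not vary over time:

\[ y_{it} - \bar{y}_i = \beta (x_{it} - \bar{x}_i) + (\epsilon_{it} - \bar{\epsilon}_i) \]

OLS on the demeaned variables then estimates \(\beta\) from deviations around each country's own average.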
Because each country gets its own intercept \(\alpha_i\), FE controls for all time-invariant country characteristics, whether or not we observe them.
Examples (time-invariant or very slow-moving):
📌 FE is powerful because it controls for a lot without measuring those things.
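A small simulation can illustrate this claim. The sketch below is not the course's estimation code: the data are simulated, and the sample sizes, the true effect of 1.0, and the strength of the correlation between \(x\) and the country trait are all arbitrary assumptions. It compares the Pooled OLS slope with the FE slope obtained by subtracting each country's time average from \(x\) and \(y\).

```python
import random
from statistics import mean

random.seed(0)
BETA = 1.0                      # true within-country effect (assumed for the simulation)
N_COUNTRIES, N_YEARS = 50, 20   # arbitrary panel dimensions

def ols_slope(x, y):
    """Slope from a one-regressor OLS with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

x_all, y_all, x_within, y_within = [], [], [], []
for _ in range(N_COUNTRIES):
    alpha = random.gauss(0, 1)  # unobserved country trait
    # x is built to be correlated with alpha, which is what creates the bias.
    xs = [2 * alpha + random.gauss(0, 1) for _ in range(N_YEARS)]
    ys = [BETA * x + alpha + random.gauss(0, 0.5) for x in xs]
    x_all += xs
    y_all += ys
    # Within transformation: subtract the country's own time average.
    mx_i, my_i = mean(xs), mean(ys)
    x_within += [x - mx_i for x in xs]
    y_within += [y - my_i for y in ys]

beta_pooled = ols_slope(x_all, y_all)       # uses between and within variation
beta_fe = ols_slope(x_within, y_within)     # uses within variation only
print(f"pooled OLS: {beta_pooled:.3f}, FE (within): {beta_fe:.3f}, truth: {BETA}")
```

Under these assumptions the pooled slope should land well above 1.0, while the within slope should be close to it, even though the country trait is never measured.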
FE does not use differences between countries; it identifies \(\beta\) only from changes within each country over time.
So FE is best when:
⚠️ FE cannot estimate effects of variables that do not change over time, e.g.:
landlocked, continent, distance_to_equator
Model \[ y_{it} = \beta x_{it} + \alpha_i + \gamma_t + \epsilon_{it} \]
TWFE answers:
“Within a country, how does \(y\) change with \(x\), after removing global year shocks?”
Year effects capture shocks that hit many countries at once:
If we don’t include \(\gamma_t\), the regression may mistakenly attribute these common shocks to \(x_{it}\).
Year FE helps remove this confounding.
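To illustrate, here is a hedged simulation sketch in which a common year shock moves both \(x\) and \(y\). It relies on a balanced panel, where the two-way FE slope can be computed by subtracting country means and year means and adding back the grand mean. All parameter values and names are assumptions made for this example.

```python
import random
from statistics import mean

random.seed(1)
BETA = 1.0      # true effect (assumed for the simulation)
N, T = 50, 20   # balanced panel: every country observed in every year

alpha = [random.gauss(0, 1) for _ in range(N)]   # country effects
gamma = [random.gauss(0, 1) for _ in range(T)]   # common year shocks
# The year shock enters both x and y, so it confounds the relationship.
x = [[alpha[i] + gamma[t] + random.gauss(0, 1) for t in range(T)] for i in range(N)]
y = [[BETA * x[i][t] + alpha[i] + gamma[t] + random.gauss(0, 0.5) for t in range(T)]
     for i in range(N)]

def demean(z, two_way):
    """Subtract country means; if two_way, also subtract year means and add back the grand mean."""
    row = [mean(r) for r in z]
    col = [mean(z[i][t] for i in range(N)) for t in range(T)]
    grand = mean(row)  # equals the overall mean because the panel is balanced
    if two_way:
        return [z[i][t] - row[i] - col[t] + grand for i in range(N) for t in range(T)]
    return [z[i][t] - row[i] for i in range(N) for t in range(T)]

def slope(xd, yd):
    """OLS slope through the origin (valid because the inputs are already demeaned)."""
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

beta_fe = slope(demean(x, False), demean(y, False))    # country FE only
beta_twfe = slope(demean(x, True), demean(y, True))    # country + year FE
print(f"country FE: {beta_fe:.3f}, two-way FE: {beta_twfe:.3f}, truth: {BETA}")
```

With country FE alone, the slope absorbs part of the year shock and overshoots; adding year effects should bring it back near the true value. The double-demeaning shortcut is exact only for balanced panels.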
Model \[ y_{it} = \beta x_{it} + \alpha_i + \epsilon_{it} \]
Same structure as FE… but different assumption:
\[ Cov(\alpha_i, x_{it}) = 0 \]
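If that assumption holds, RE is consistent and typically more efficient than FE, because it also uses between-country variation.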
If \(Cov(\alpha_i, x_{it}) \neq 0\), then RE is biased.
In many economic settings, this correlation is very plausible:
So FE is often the safer default in applied work.
In panel data, observations inside a country are rarely independent:
That means the residuals may be serially correlated:
\[ Cov(\epsilon_{it}, \epsilon_{is}) \neq 0 \quad \text{for } t \neq s \]
If we ignore this, standard errors are often too small → false “significance”
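Country-clustered SEs allow residuals to be correlated over time within a country and to have different variances across countries, while still assuming independence across countries.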
Interpretation:
“We allow any kind of serial correlation within each country.”
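The mechanics can be sketched by hand for a one-regressor FE model. The example below is an illustration under stated assumptions, not a replacement for a library routine: the data are simulated with AR(1) regressors and errors, all parameter values are arbitrary, and no small-sample correction is applied to the clustered variance.

```python
import math
import random
from statistics import mean

random.seed(2)
N, T, BETA, RHO = 100, 20, 1.0, 0.8   # all values are illustrative assumptions

def ar1(n, rho):
    """Simulate a stationary AR(1) series, so neighbouring years are correlated."""
    z = [random.gauss(0, 1) / math.sqrt(1 - rho ** 2)]
    for _ in range(n - 1):
        z.append(rho * z[-1] + random.gauss(0, 1))
    return z

xd, yd, ids = [], [], []   # demeaned x, demeaned y, country index
for i in range(N):
    xs, es = ar1(T, RHO), ar1(T, RHO)
    ys = [BETA * x + e for x, e in zip(xs, es)]
    mx, my = mean(xs), mean(ys)
    xd += [x - mx for x in xs]
    yd += [y - my for y in ys]
    ids += [i] * T

sxx = sum(x * x for x in xd)
beta_hat = sum(x * y for x, y in zip(xd, yd)) / sxx
resid = [y - beta_hat * x for x, y in zip(xd, yd)]

# Conventional SE: treats all N*T residuals as independent with a common variance.
sigma2 = sum(e * e for e in resid) / (N * T - N - 1)
se_conventional = math.sqrt(sigma2 / sxx)

# Country-clustered SE: sum x*e within each country first, then square the sums,
# so any correlation between years inside a country is retained.
score = [0.0] * N
for i, x, e in zip(ids, xd, resid):
    score[i] += x * e
se_cluster = math.sqrt(sum(s * s for s in score)) / sxx

print(f"beta: {beta_hat:.3f}, conventional SE: {se_conventional:.4f}, clustered SE: {se_cluster:.4f}")
```

Because both \(x\) and the errors are persistent within a country in this setup, the clustered SE should come out larger than the conventional one, which is the "too small" problem described above.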
In R (plm), this is typically implemented using robust VCOV options, e.g. vcovHC() with clustering by group.
vcovHC: variance–covariance, heteroskedasticity-consistent
| Model | Controls for country traits? | Controls for year shocks? | Uses between-country variation? |
|---|---|---|---|
| Pooled OLS | ❌ No | ❌ No | Yes |
| Country FE | Yes (time-invariant) | ❌ No | ❌ No |
| Two-way FE | Yes (time-invariant) | Yes | ❌ No |
| Random Effects | Yes (modeled as random intercept) | Optional | Yes |
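We use panel data models to estimate relationships within units over time, while controlling for time-invariant unit traits (unit fixed effects) and shocks common to all units in a given year (year fixed effects).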
And we use clustered SEs because panel residuals are often correlated within each unit.
Note
Now we add the key EKC idea: a nonlinear relationship between income and pollution.
The Environmental Kuznets Curve (EKC) hypothesis suggests:
As a country becomes richer, pollution first increases,
but after some income level, pollution decreases.
It is often described as an inverted-U relationship between:
- pollution (e.g., CO\(_2\) per capita)
- income (GDP per capita)
A common EKC specification:
\[ \log(CO2pc_{it}) = \beta_1 \log(GDPpc_{it}) + \beta_2 \left[\log(GDPpc_{it})\right]^2 + \alpha_i + \gamma_t + \epsilon_{it} \]
If \(\beta_1 > 0\) and \(\beta_2 < 0\)
emissions rise at first, then eventually fall
→ EKC pattern (inverted-U)
If \(\beta_2 = 0\)
→ relationship is linear
If \(\beta_2 > 0\)
→ U-shape (convex: the income elasticity of emissions rises with income)
The turning point occurs where the slope becomes zero:
\[ \frac{\partial \log(CO2pc)}{\partial \log(GDPpc)} = \beta_1 + 2\beta_2 \log(GDPpc) = 0 \]
So the turning point in log income is:
\[ \log(GDPpc^*) = -\frac{\beta_1}{2\beta_2} \]
Convert back to the income level:
\[ GDPpc^* = \exp\left(-\frac{\beta_1}{2\beta_2}\right) \]
At \(GDPpc^*\), CO\(_2\) per capita stops rising and begins falling (if EKC holds).
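The turning-point formula is easy to evaluate once the coefficients are estimated. The sketch below uses hypothetical coefficients (\(\beta_1 = 4.0\), \(\beta_2 = -0.2\)) chosen only to make the arithmetic clean; they are not estimates from any data, and the function name is made up for this example.

```python
import math

def ekc_turning_point(beta1, beta2):
    """Income level at which the fitted log-log quadratic has zero slope."""
    if beta2 == 0:
        raise ValueError("beta2 = 0: the relationship is linear, so there is no turning point")
    return math.exp(-beta1 / (2 * beta2))

# Hypothetical coefficients, chosen only to illustrate the arithmetic.
beta1, beta2 = 4.0, -0.2
gdp_star = ekc_turning_point(beta1, beta2)
shape = "inverted-U" if beta2 < 0 else "U-shape"
print(f"{shape}, turning point at GDP per capita of about {gdp_star:,.0f}")
```

Here \(-\beta_1 / (2\beta_2) = 10\), so the turning point is \(\exp(10)\), roughly 22,026 in whatever units GDP per capita is measured. It is worth checking whether an estimated turning point falls inside the range of incomes observed in the sample; if it does not, the inverted-U is an extrapolation.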
As income rises, countries often experience:
EKC is the balance of:
EKC may be weaker for CO\(_2\) than for local air pollutants because:
So EKC is a hypothesis to test, not a guaranteed pattern.
EKC tests whether pollution follows an inverted-U with income.
Panel FE/TWFE help make that test more credible by controlling for time-invariant country traits (\(\alpha_i\)) and common year shocks (\(\gamma_t\)).