Basic Two-Way FE DiD with Stargazer Table Output
Alex Marsella
2022-08-07
This tutorial goes over the basics of running an extremely common causal inference model: difference-in-differences with unit and time fixed effects. This will cover:
- Basic plotting of trends for control and treated (using base R).
- Understanding what is needed at minimum to run this kind of model.
- Basic
dplyr
aggregation. - Basic
felm
usage for regressions with fixed effects and clustered standard errors. - Basic
stargazer
usage for reporting results.
I am going to read in the data I use for my job market paper. It is a dataset with 3750 observations: 10 years of data for 375 census tracts. For each tract-year, it has various types of crime rates, population demographics, education levels, etc.
You, the reader, can grab the dataset using this code chunk.
library(curl)
download.file("https://raw.githubusercontent.com/alexmarsella/alexmarsella.github.io/master/assets/aggrework.rds", "aggrework.rds", method="curl")
agg <- readRDS("aggrework.rds")
# I am reading a file in from a separate working directory.
agg <- readRDS("C:/Users/Alex/Dropbox/PROJECTS/philly/aggrework.rds")
In 2014, 8 census tracts fell under a new initiative called the “Promise Zone”, which helps facilitate federal grant money into the area to improve education and reduce crime. Suppose we want to see how violent crime rates are affected in these eight tracts compared to the rest of Philadelphia; this creates a potential setting for a two-way fixed effects difference-in-differences model (TWFE DiD).
The most common setting for a TWFE DiD involves:
- Units
- Time
- A control group (made of units) that are never treated anywhere in time.
- A treated group (made of units) that are uniformly treated at a specific point in time, with at least a few untreated time periods before and after.
The key assumption you will often hear about in discussion of DiD is “parallel trends”. We want to be able to claim that, had the treated units never been treated, they would resemble the untreated units. Parallel trends is the idea that, before treatment occurred, the control units and (eventually) treated units experienced the same trend in the outcome variable.
If treatment is “staggered” (different units at different times) or some units are always treated, things get a little more complicated (requires consideration of some newer techniques that we will not cover in this tutorial).
At its most basic level, parallel trends can be checked visually. I can use dplyr to quickly make a dataset that takes the mean violent crime rate for the treated and control groups on a yearly basis.
library(dplyr)
yearplotdata <- agg %>% #name the data set I am working with, and what I want to read it to
group_by(fullzone,year) %>% #I want to group my data by year and treatment status
summarize(vcap=mean(vcap)) # I want the mean violent crime rate given the grouping provided
head(yearplotdata, n=20) #the data will only be 20 rows, may as well view all 20
## # A tibble: 20 × 3
## # Groups: fullzone [2]
## fullzone year vcap
## <dbl> <dbl> <dbl>
## 1 0 2010 31.5
## 2 0 2011 31.6
## 3 0 2012 30.6
## 4 0 2013 27.7
## 5 0 2014 26.8
## 6 0 2015 27.2
## 7 0 2016 27.3
## 8 0 2017 26.7
## 9 0 2018 26.7
## 10 0 2019 28.1
## 11 1 2010 42.0
## 12 1 2011 40.0
## 13 1 2012 37.7
## 14 1 2013 36.4
## 15 1 2014 33.1
## 16 1 2015 33.6
## 17 1 2016 30.6
## 18 1 2017 31.7
## 19 1 2018 27.4
## 20 1 2019 29.8
I can then use a plotting package like ggplot2
OR simply use base R to make a plot. Here, I will use base R.
#I'll plot year against crime for treated, using plot to create the plot in the first place
plot(yearplotdata$year[yearplotdata$fullzone==1], yearplotdata$vcap[yearplotdata$fullzone==1], type="b", col="blue", ylim=c(25,45))
#then I'll use `lines` to add a line to the plot, this time doing it for untreated.
lines(yearplotdata$year[yearplotdata$fullzone==0], yearplotdata$vcap[yearplotdata$fullzone==0], type="b", col="red")
#finally, I will put a vertical line where the treatment begins
abline(v=2013.5)
From a visual inspection, the pre-trend is likely parallel “enough”. In my actual paper, I confirm the parallel pre-trend using regression techniques (event study) and use some other methods that do not even require a naturally existing parallel pre-trend (synthetic control and synthetic DiD). Assuming the pre-trend is parallel and my model is correctly specified, we can move forward for now.1
A rather easy package to use for fixed effects regressions is called lfe
, which allows use of the command felm
.
My units are tracts, my time periods are years. For this particular exercise, our outcome variable of interest will be the violent crime rate per 1000 tract residents, called vcap
.
The general form of this kind of model involves a variable called post
, which is equal to 1 for any observation occurring after treatment, regardless of the unit. This means that any tract-year observation occurring in 2014 or later has post=1
. fullzone
, our treatment variable, is equal to 1 for any tract that lies fully within the Promise Zone, regardless of the year. Their interaction is only equal to 1 for treated tracts post-treatment. The reason we do not just include the interaction is that we want to control for anything unique to the eventually treated units that does not change across time. We also want to control for any changes in the post-period that are universal to all units. Note that, when we include unit and time fixed effects in a situation like this, Post
and Zone
disappear from our equation, as they are subsumed into the fixed effects. We are then left only with the interaction.
The functional form of this model is \(Crime_{it} = \beta_1Post_{t} + \beta_2Zone_{i} + \beta_3Post_{t} \times Zone_{i} + \alpha_{i} + \gamma_{t} + \mu_{i}\)
where \(\alpha_{i}\) and \(\gamma_{t}\) are unit and time fixed effects. \(Crime_{it}\) is the crime rate for unit i at time t.
The format for coding felm
takes on the form felm(y ~ x1+...+xn| fe1 + fe2 | Instrumental Variable | std. error clustering, data = data)
. Note that, if we choose not to use an instrumental variable, or fixed effects, or std. error clustering, we just put a 0 in that slot. Most of the time, you will be using fixed effects and possibly even standard error clustering, but here we will not be using an instrumental variable. When you have lfe
loaded, you could type ?felm
in your console for the full help menu on how to use it.2
Additionally, note that putting a *
between variables adds them individually AND interacts them, while a :
simply interacts them. Can you see why I would only use a colon in fe2
below? Think about what I said above about fixed effects. Even if I put the asterisk in equation 2, the first two variables would provide NA coefficients.
library(lfe)
library(stargazer)
# in the first equation, I do not use fixed effects, but I cluster my standard errors at the tract level since I believe that my errors might be correlated within tracts
fe1 <- felm(vcap ~ post*fullzone|0|0|tract, data=agg)
#in the second equation, I include both tract and year fixed effects
fe2 <- felm(vcap ~ post:fullzone|tract+year|0|tract , data=agg)
stargazer(fe1,fe2, type="text")
##
## ========================================================
## Dependent variable:
## ------------------------------------
## vcap
## (1) (2)
## --------------------------------------------------------
## post -3.213***
## (0.426)
##
## fullzone 8.680**
## (4.396)
##
## post:fullzone -4.764*** -4.764***
## (1.548) (1.548)
##
## Constant 30.345***
## (1.086)
##
## --------------------------------------------------------
## Observations 3,750 3,750
## R2 0.008 0.886
## Adjusted R2 0.007 0.873
## Residual Std. Error 20.920 (df = 3746) 7.474 (df = 3365)
## ========================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Two things to note from (1) that shed some light on why we cannot just look at a plot to determine causation: the post-period saw a universal decline in crime (AKA across all tracts) and the zone tracts have a fundamentally higher crime rate, regardless of time. If we did not control for universal time trends that affect all tracts (post in (1) and year fixed effects in (2)), we may mistakenly ascribe a universal decline in crime to the Zone. By controlling for time, we can further narrow down the specific effect of the Promise Zone.
That being said, the treatment effect is significant. If our model is specified correctly, then it stands to reason that about 5 violent crimes per thousand residents per tract-year are prevented by the Promise Zone.
Let’s make a prettier stargazer table. We’ll use df=false
to save some horizontal space (no one needs to see your df), we’ll label our variables, we’ll make sure the reader can see which model has TWFE, and we’ll add a little note at the bottom pointing out that we clustered our std. errors at the tract level.
stargazer(fe1,fe2,
add.lines = list(c("TWFE?", "", "X")), title="Difference-in-Differences for Violent Crime", dep.var.labels="V. Crimes per 1k Tract Residents",
align=FALSE, notes="Std. Errors Clustered at tract level", df=FALSE, notes.append=TRUE, font.size = "small" ,
covariate.labels = c("Post","Zone","Post x Zone"), type = "text")
##
## Difference-in-Differences for Violent Crime
## =========================================================
## Dependent variable:
## -------------------------------------
## V. Crimes per 1k Tract Residents
## (1) (2)
## ---------------------------------------------------------
## Post -3.213***
## (0.426)
##
## Zone 8.680**
## (4.396)
##
## Post x Zone -4.764*** -4.764***
## (1.548) (1.548)
##
## Constant 30.345***
## (1.086)
##
## ---------------------------------------------------------
## TWFE? X
## Observations 3,750 3,750
## R2 0.008 0.886
## Adjusted R2 0.007 0.873
## Residual Std. Error 20.920 7.474
## =========================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
## Std. Errors Clustered at tract level
Now we have a well formatted table of the type that you will often see in academic journals. If you fail to use the command type="text"
, you will get a table coded in LaTeX. If you paste that output into a LaTeX processor like Overleaf, you will see an even nicer looking table.
If you are my student in Econ 482, keep in mind that we can only fit so much in a semester. Learning the more advanced techniques that have been developed in the past few years may require you to attend grad school or self-teach. Do not worry too much about trying to stay on the absolute cutting-edge of causal inference unless you want to get a PhD, become a professional data scientist, or dedicate your life to research. You are already way ahead of where I was when I took my senior capstone class!↩︎
Or, you can go to its vignette here: https://www.rdocumentation.org/packages/lfe/versions/2.8-8/topics/felm↩︎