Basic Two-Way FE DiD with Stargazer Table Output

This tutorial goes over the basics of running an extremely common causal inference model: difference-in-differences with unit and time fixed effects. This will cover:

  • Basic plotting of trends for control and treated (using base R).
  • Understanding what is needed at minimum to run this kind of model.
  • Basic dplyr aggregation.
  • Basic felm usage for regressions with fixed effects and clustered standard errors.
  • Basic stargazer usage for reporting results.

I am going to read in the data I use for my job market paper. It is a dataset with 3750 observations: 10 years of data for 375 census tracts. For each tract-year, it has various types of crime rates, population demographics, education levels, etc.

You, the reader, can grab the dataset using this code chunk.

library(curl)
download.file("https://raw.githubusercontent.com/alexmarsella/alexmarsella.github.io/master/assets/aggrework.rds", "aggrework.rds", method="curl")
agg <- readRDS("aggrework.rds")
# I am reading a file in from a separate working directory.
agg <- readRDS("C:/Users/Alex/Dropbox/PROJECTS/philly/aggrework.rds") 

In 2014, 8 census tracts fell under a new initiative called the “Promise Zone”, which helps facilitate federal grant money into the area to improve education and reduce crime. Suppose we want to see how violent crime rates are affected in these eight tracts compared to the rest of Philadelphia; this creates a potential setting for a two-way fixed effects difference-in-differences model (TWFE DiD).

The most common setting for a TWFE DiD involves:

  • Units
  • Time
  • A control group (made of units) that are never treated anywhere in time.
  • A treated group (made of units) that are uniformly treated at a specific point in time, with at least a few untreated time periods before and after.

The key assumption you will often hear about in discussion of DiD is “parallel trends”. We want to be able to claim that, had the treated units never been treated, they would resemble the untreated units. Parallel trends is the idea that, before treatment occurred, the control units and (eventually) treated units experienced the same trend in the outcome variable.

If treatment is “staggered” (different units at different times) or some units are always treated, things get a little more complicated (requires consideration of some newer techniques that we will not cover in this tutorial).

At its most basic level, parallel trends can be checked visually. I can use dplyr to quickly make a dataset that takes the mean violent crime rate for the treated and control groups on a yearly basis.

library(dplyr)
yearplotdata <- agg %>%  #name the data set I am working with, and what I want to read it to 
  group_by(fullzone,year) %>% #I want to group my data by year and treatment status
summarize(vcap=mean(vcap)) # I want the mean violent crime rate given the grouping provided

head(yearplotdata, n=20) #the data will only be 20 rows, may as well view all 20
## # A tibble: 20 × 3
## # Groups:   fullzone [2]
##    fullzone  year  vcap
##       <dbl> <dbl> <dbl>
##  1        0  2010  31.5
##  2        0  2011  31.6
##  3        0  2012  30.6
##  4        0  2013  27.7
##  5        0  2014  26.8
##  6        0  2015  27.2
##  7        0  2016  27.3
##  8        0  2017  26.7
##  9        0  2018  26.7
## 10        0  2019  28.1
## 11        1  2010  42.0
## 12        1  2011  40.0
## 13        1  2012  37.7
## 14        1  2013  36.4
## 15        1  2014  33.1
## 16        1  2015  33.6
## 17        1  2016  30.6
## 18        1  2017  31.7
## 19        1  2018  27.4
## 20        1  2019  29.8

I can then use a plotting package like ggplot2 OR simply use base R to make a plot. Here, I will use base R.

#I'll plot year against crime for treated, using plot to create the plot in the first place
plot(yearplotdata$year[yearplotdata$fullzone==1], yearplotdata$vcap[yearplotdata$fullzone==1], type="b", col="blue", ylim=c(25,45))

#then I'll use `lines` to add a line to the plot, this time doing it for untreated.
lines(yearplotdata$year[yearplotdata$fullzone==0], yearplotdata$vcap[yearplotdata$fullzone==0], type="b", col="red")

#finally, I will put a vertical line where the treatment begins
abline(v=2013.5)

From a visual inspection, the pre-trend is likely parallel “enough”. In my actual paper, I confirm the parallel pre-trend using regression techniques (event study) and use some other methods that do not even require a naturally existing parallel pre-trend (synthetic control and synthetic DiD). Assuming the pre-trend is parallel and my model is correctly specified, we can move forward for now.1

A rather easy package to use for fixed effects regressions is called lfe, which allows use of the command felm.

My units are tracts, my time periods are years. For this particular exercise, our outcome variable of interest will be the violent crime rate per 1000 tract residents, called vcap.

The general form of this kind of model involves a variable called post, which is equal to 1 for any observation occurring after treatment, regardless of the unit. This means that any tract-year observation occurring in 2014 or later has post=1. fullzone, our treatment variable, is equal to 1 for any tract that lies fully within the Promise Zone, regardless of the year. Their interaction is only equal to 1 for treated tracts post-treatment. The reason we do not just include the interaction is that we want to control for anything unique to the eventually treated units that does not change across time. We also want to control for any changes in the post-period that are universal to all units. Note that, when we include unit and time fixed effects in a situation like this, Post and Zone disappear from our equation, as they are subsumed into the fixed effects. We are then left only with the interaction.

The functional form of this model is \(Crime_{it} = \beta_1Post_{t} + \beta_2Zone_{i} + \beta_3Post_{t} \times Zone_{i} + \alpha_{i} + \gamma_{t} + \mu_{i}\)

where \(\alpha_{i}\) and \(\gamma_{t}\) are unit and time fixed effects. \(Crime_{it}\) is the crime rate for unit i at time t.

The format for coding felm takes on the form felm(y ~ x1+...+xn| fe1 + fe2 | Instrumental Variable | std. error clustering, data = data). Note that, if we choose not to use an instrumental variable, or fixed effects, or std. error clustering, we just put a 0 in that slot. Most of the time, you will be using fixed effects and possibly even standard error clustering, but here we will not be using an instrumental variable. When you have lfe loaded, you could type ?felm in your console for the full help menu on how to use it.2

Additionally, note that putting a * between variables adds them individually AND interacts them, while a : simply interacts them. Can you see why I would only use a colon in fe2 below? Think about what I said above about fixed effects. Even if I put the asterisk in equation 2, the first two variables would provide NA coefficients.

library(lfe)
library(stargazer)
# in the first equation, I do not use fixed effects, but I cluster my standard errors at the tract level since I believe that my errors might be correlated within tracts
fe1 <- felm(vcap ~ post*fullzone|0|0|tract, data=agg)

#in the second equation, I include both tract and year fixed effects
fe2 <- felm(vcap ~ post:fullzone|tract+year|0|tract , data=agg)

stargazer(fe1,fe2, type="text")
## 
## ========================================================
##                             Dependent variable:         
##                     ------------------------------------
##                                     vcap                
##                            (1)                (2)       
## --------------------------------------------------------
## post                    -3.213***                       
##                          (0.426)                        
##                                                         
## fullzone                 8.680**                        
##                          (4.396)                        
##                                                         
## post:fullzone           -4.764***          -4.764***    
##                          (1.548)            (1.548)     
##                                                         
## Constant                30.345***                       
##                          (1.086)                        
##                                                         
## --------------------------------------------------------
## Observations              3,750              3,750      
## R2                        0.008              0.886      
## Adjusted R2               0.007              0.873      
## Residual Std. Error 20.920 (df = 3746) 7.474 (df = 3365)
## ========================================================
## Note:                        *p<0.1; **p<0.05; ***p<0.01

Two things to note from (1) that shed some light on why we cannot just look at a plot to determine causation: the post-period saw a universal decline in crime (AKA across all tracts) and the zone tracts have a fundamentally higher crime rate, regardless of time. If we did not control for universal time trends that affect all tracts (post in (1) and year fixed effects in (2)), we may mistakenly ascribe a universal decline in crime to the Zone. By controlling for time, we can further narrow down the specific effect of the Promise Zone.

That being said, the treatment effect is significant. If our model is specified correctly, then it stands to reason that about 5 violent crimes per thousand residents per tract-year are prevented by the Promise Zone.

Let’s make a prettier stargazer table. We’ll use df=false to save some horizontal space (no one needs to see your df), we’ll label our variables, we’ll make sure the reader can see which model has TWFE, and we’ll add a little note at the bottom pointing out that we clustered our std. errors at the tract level.

stargazer(fe1,fe2,
add.lines = list(c("TWFE?", "", "X")), title="Difference-in-Differences for Violent Crime", dep.var.labels="V. Crimes per 1k Tract Residents",
 align=FALSE, notes="Std. Errors Clustered at tract level", df=FALSE, notes.append=TRUE,  font.size = "small" ,
covariate.labels = c("Post","Zone","Post x Zone"), type = "text")
## 
## Difference-in-Differences for Violent Crime
## =========================================================
##                              Dependent variable:         
##                     -------------------------------------
##                       V. Crimes per 1k Tract Residents   
##                             (1)                (2)       
## ---------------------------------------------------------
## Post                     -3.213***                       
##                           (0.426)                        
##                                                          
## Zone                      8.680**                        
##                           (4.396)                        
##                                                          
## Post x Zone              -4.764***          -4.764***    
##                           (1.548)            (1.548)     
##                                                          
## Constant                 30.345***                       
##                           (1.086)                        
##                                                          
## ---------------------------------------------------------
## TWFE?                                           X        
## Observations               3,750              3,750      
## R2                         0.008              0.886      
## Adjusted R2                0.007              0.873      
## Residual Std. Error       20.920              7.474      
## =========================================================
## Note:                         *p<0.1; **p<0.05; ***p<0.01
##                      Std. Errors Clustered at tract level

Now we have a well formatted table of the type that you will often see in academic journals. If you fail to use the command type="text", you will get a table coded in LaTeX. If you paste that output into a LaTeX processor like Overleaf, you will see an even nicer looking table.


  1. If you are my student in Econ 482, keep in mind that we can only fit so much in a semester. Learning the more advanced techniques that have been developed in the past few years may require you to attend grad school or self-teach. Do not worry too much about trying to stay on the absolute cutting-edge of causal inference unless you want to get a PhD, become a professional data scientist, or dedicate your life to research. You are already way ahead of where I was when I took my senior capstone class!↩︎

  2. Or, you can go to its vignette here: https://www.rdocumentation.org/packages/lfe/versions/2.8-8/topics/felm↩︎