Event Study Tutorial
2022-08-15
In this tutorial, we will revisit the Philly Promise Zone crime data from my JMP, which I used in this tutorial here.
Previously, we performed a difference-in-differences analysis with two-way fixed effects. Now, we will turn to a concept called “Event Study”, or alternatively: “Dynamic Difference-in-Differences”. This model estimates a separate regression coefficient on treatment status for each time period. This allows you to measure how the effect of your variable of interest differs across time periods, relative to some base time period (usually the period before treatment begins). In addition, it gives you an informal check of the parallel trends assumption, because it also estimates a "treatment effect" in the periods before treatment begins, where none should exist. An event study is often presented as a plot showing the estimated treatment effect on the treated group at every time period in your data.
This type of model takes on the form
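\(Crime_{it} = \sum_{k=-4}^{-2}\beta_k \times Treat_{i} \times \mathbf{1}[t - 2014 = k] + \sum_{k=0}^{5}\beta_k \times Treat_{i} \times \mathbf{1}[t - 2014 = k] + A_{it} + \alpha_{i} + \gamma_{t} + \mu_{it}\)
\(A\), \(\alpha\), and \(\gamma\) are generalized notation for controls, unit fixed effects, and time fixed effects, respectively, and \(\mu\) is the error term. The indicator \(\mathbf{1}[t - 2014 = k]\) equals one when year \(t\) is \(k\) years from the start of treatment in 2014, and \(k = -1\) (2013) is left out as the base period. In plain English, what this model does is essentially interact the eventually treated units with every time period except the base period. Hopefully, there is no observed treatment effect in the pre-period and an observed one in the post-period.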
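To make the interaction concrete, here is a minimal sketch on a made-up two-tract panel. The data frame toy and its columns are hypothetical and are not the Promise Zone data. It builds one indicator per event time, leaves out the base period, and multiplies each by the treatment indicator, which is roughly what the sums in the equation describe.

```r
# Toy panel (made-up data): one treated and one untreated tract, 2010-2019
toy <- expand.grid(year = 2010:2019, tract = c("A", "B"))
toy$treat <- as.integer(toy$tract == "A")   # tract A is eventually treated
toy$k <- toy$year - 2014                    # event time: years since treatment began

# One column per event time k, excluding the base period k = -1
event_times <- setdiff(sort(unique(toy$k)), -1)
X <- sapply(event_times, function(k) toy$treat * (toy$k == k))
colnames(X) <- paste0("treat_x_k", event_times)

# Each treated row switches on exactly one column, except in the base year 2013
rowSums(X[toy$tract == "A", ])
```

Every row for the untreated tract is all zeros, so the coefficients on these columns are identified only from how the treated tract differs from the untreated one in each period, relative to the base period.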
# I am reading a file in from a separate working directory. The next chunk is for you to load it in from online.
#agg <- readRDS("C:/Users/Alex/Dropbox/PROJECTS/philly/aggrework.rds")
You, the reader, can grab the dataset using this code chunk.
library(curl)
download.file("https://raw.githubusercontent.com/alexmarsella/alexmarsella.github.io/master/assets/aggrework.rds", "aggrework.rds", method="curl")
agg <- readRDS("aggrework.rds")
For this tutorial, we will use feols from the package fixest to perform our event study analysis.
First, we have to take some steps to prepare our data for the way feols performs event studies. We must:
- Create a sequence for our time periods. In this case I have ten years.
- Create a variable defining time period in relation to treatment.
library(fixest)
agg$converted_int <- agg$year-2009
agg$time_to_treat <- ifelse(agg$fullzone==1, agg$converted_int - 5, 0)
This last line of code tells R to make a variable called time_to_treat. For tracts lying fully within the Promise Zone, it is the distance from treatment (which begins in 2014), computed as the year number minus 5; all other tracts are assigned 0.
Simple example: for the year 2010, converted_int==1. Subtracting 5 gives time_to_treat=-4, because 2010 is four years prior to treatment. 2011 is 3 years prior, 2012 is 2 years prior, and so on.
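The mapping can be checked on a standalone vector of years. This sketch does not touch agg; the vector year is made up, and the 2010 to 2019 range is an assumption based on the ten years of data and the year-2009 conversion.

```r
# Standalone check of the event-time arithmetic (made-up vector, not agg)
year <- 2010:2019
converted_int <- year - 2009          # 2010 -> 1, ..., 2019 -> 10
time_to_treat <- converted_int - 5    # 2014, the first treated year, -> 0

data.frame(year, converted_int, time_to_treat)
```

Under that assumption, 2013 maps to -1 and the last year, 2019, maps to 5.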
When performing our feols command, we use ref = -1 to define 2013 as our reference year. Generally speaking, the year before treatment begins is used as the reference year. Keep in mind that year fixed effects account for any 2013-specific universal effects that might bias our results.
mod_twfe <- feols(vcap ~ i(time_to_treat, fullzone, ref = -1) + # treatment x event-time interactions
white + black + hisp + nofathercap + male15to29cap + lessthanhighcap +
highcap + somecollegecap + bachelorcap + pcinc + nowork12cap | # controls
tract + year, # FEs
cluster = ~tract, # Clustered SEs. It is best practice to cluster SEs at the unit level.
data = agg)
summary(mod_twfe)
## OLS estimation, Dep. Var.: vcap
## Observations: 3,750
## Fixed-effects: tract: 375, year: 10
## Standard-errors: Clustered (tract)
## Estimate Std. Error t value Pr(>|t|)
## time_to_treat::-4:fullzone 1.102947 3.690799 0.298837 0.7652305
## time_to_treat::-3:fullzone -0.368614 1.788395 -0.206114 0.8368137
## time_to_treat::-2:fullzone -1.722138 1.718329 -1.002217 0.3168868
## time_to_treat::0:fullzone -2.466749 1.878570 -1.313099 0.1899545
## time_to_treat::1:fullzone -1.974754 3.055750 -0.646242 0.5185191
## time_to_treat::2:fullzone -5.048870 2.775409 -1.819144 0.0696889 .
## time_to_treat::3:fullzone -3.775540 2.693331 -1.401811 0.1618012
## time_to_treat::4:fullzone -7.686962 3.552741 -2.163671 0.0311223 *
## time_to_treat::5:fullzone -6.627231 3.454024 -1.918699 0.0557834 .
## white -2.208869 7.910278 -0.279240 0.7802147
## black 0.174311 9.502404 0.018344 0.9853743
## hisp -6.405225 7.381747 -0.867711 0.3861088
## nofathercap 2.126376 2.601386 0.817401 0.4142198
## male15to29cap -25.661636 8.504484 -3.017424 0.0027238 **
## lessthanhighcap 8.394764 9.433894 0.889851 0.3741177
## highcap 6.181971 8.247591 0.749549 0.4539978
## somecollegecap 6.767389 8.604503 0.786494 0.4320763
## bachelorcap 11.397834 7.304392 1.560408 0.1195092
## pcinc 0.000060 0.000103 0.583678 0.5597886
## nowork12cap -7.370267 7.915379 -0.931133 0.3523857
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 7.02082 Adj. R2: 0.87466
## Within R2: 0.018831
iplot(mod_twfe, #violence event study
xlab = 'Time to treatment',
main = 'Event study: Effect of treatment on violent crime over time')
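The intervals in the plot can be approximated by hand from the summary() output above. This is a rough sketch: it copies the reported estimates and clustered standard errors and uses a normal approximation (1.96 standard errors) for a 95% interval, so it may differ slightly from what iplot draws. The object names k, est, se and ci are made up for illustration.

```r
# Estimates and clustered SEs copied from the summary() output above
k   <- c(-4, -3, -2, 0, 1, 2, 3, 4, 5)   # time_to_treat (reference year -1 is omitted)
est <- c(1.102947, -0.368614, -1.722138, -2.466749, -1.974754,
         -5.048870, -3.775540, -7.686962, -6.627231)
se  <- c(3.690799, 1.788395, 1.718329, 1.878570, 3.055750,
         2.775409, 2.693331, 3.552741, 3.454024)

# Normal-approximation 95% interval: estimate +/- 1.96 * SE
ci <- data.frame(time_to_treat = k,
                 estimate = est,
                 lower = est - 1.96 * se,
                 upper = est + 1.96 * se)
ci$excludes_zero <- ci$lower > 0 | ci$upper < 0
ci
```

Only the interval for time_to_treat = 4 excludes zero, which matches the single coefficient starred at the 5% level in the table.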
The interactions between the pre-treatment years and the Promise Zone are statistically insignificant at conventional levels. This suggests, but does not prove beyond a doubt, parallel pre-trends. Think of the interaction between the treatment variable and each time period as a measure of the treatment effect in that specific time period. If there is no treatment effect before the treatment ostensibly begins, that is a sign of parallel trends.
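Looking at the pre-treatment coefficients one at a time can be complemented with a joint test that they are all zero. The sketch below illustrates the idea on simulated data with base R's lm and anova. It is not the model above: the panel sim, its variable names, and the effect size are all invented, and the F-test here ignores clustering, so treat it as an illustration of the logic only.

```r
set.seed(1)
# Simulated panel (hypothetical): 40 tracts, 10 years, half treated from 2014
sim <- expand.grid(tract = 1:40, year = 2010:2019)
sim$treat <- as.integer(sim$tract <= 20)
sim$k <- sim$year - 2014
# Outcome with parallel pre-trends and a post-treatment effect of -2
sim$y <- 0.5 * sim$tract + 0.3 * (sim$year - 2010) -
  2 * sim$treat * (sim$k >= 0) + rnorm(nrow(sim))

# Treatment-by-event-time dummies; the reference period k = -1 is omitted
for (k in setdiff(-4:5, -1)) {
  sim[[paste0("d", ifelse(k < 0, "m", "p"), abs(k))]] <- sim$treat * (sim$k == k)
}
pre  <- c("dm4", "dm3", "dm2")   # pre-treatment interactions
post <- paste0("dp", 0:5)        # post-treatment interactions
fes  <- c("factor(tract)", "factor(year)")

full       <- lm(reformulate(c(pre, post, fes), "y"), data = sim)
restricted <- lm(reformulate(c(post, fes), "y"), data = sim)

# F-test of H0: all three pre-treatment coefficients are zero
pretrend_test <- anova(restricted, full)
pretrend_test
```

For a fitted feols model, fixest provides a wald() function intended for joint tests of this kind using the model's own clustered variance; its documentation describes how to select the coefficients to test.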
The point estimates generally grow more negative over time, but the effect is only significant at the 5% level in 2018 (4 years post-treatment); 2016 and 2019 are significant only at the 10% level. This suggests that the effect shown in the DiD tutorial is driven mainly by the later years, 2018 in particular. Keep in mind that analyses of results are always context dependent. The over-simplified explanation for why it takes years for crime to be reduced is that it simply takes time for the various programs of the Promise Zone to be fully implemented.