Title: | Two-Stage Difference-in-Differences Following Gardner (2021) |
---|---|
Description: | Estimates Two-way Fixed Effects difference-in-differences/event-study models using the approach proposed by Gardner (2021) <doi:10.48550/arXiv.2207.05943>. To avoid the problems caused by OLS estimation of the Two-way Fixed Effects model, this function first estimates the fixed effects and covariates using untreated observations and then in a second stage, estimates the treatment effects. |
Authors: | Kyle Butts [aut, cre] , John Gardner [aut] , Grant McDermott [ctb] , Laurent Berge [ctb] |
Maintainer: | Kyle Butts <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.0 |
Built: | 2024-11-10 16:36:27 UTC |
Source: | https://github.com/kylebutts/did2s |
State-wide panel data from 2000-2010 that has information on castle-doctrine, the so-called "stand-your-ground" laws that were implemented by 20 states.
castle
castle
A data frame with 550 rows and 5 variables:
state id, unit of observation
time in panel data
log of the number of homicides per capita
year that castle doctrine is passed
0/1 variable for when castle doctrine is active
time relative to castle doctrine being passed into law
Generated using the following call:
did2s::gen_data(panel = c(1990, 2020),
g1 = 2000, g2 = 2010, g3 = 0,
te1 = 2, te2 = 1, te3 = 0,
te_m1 = 0.05, te_m2 = 0.15, te_m3 = 0)
df_het
df_het
A data frame with 31000 rows and 15 variables:
individual in panel data
time in panel data
the year that treatment starts
outcome variable
T/F variable for when treatment is on
year relative to treatment start. Inf = never treated.
year relative to treatment start, but <=-6 and >=6 are binned.
Unit FE
Year FE
Random error component
Static treatment effect = te
Dynamic treatmet effect = te_m
State that unit is in
String name for group
Generated using the following call:
did2s::gen_data(panel = c(1990, 2020),
g1 = 2000, g2 = 2010, g3 = 0,
te1 = 2, te2 = 2, te3 = 0,
te_m1 = 0, te_m2 = 0, te_m3 = 0)
df_hom
df_hom
A data frame with 31000 rows and 15 variables:
individual in panel data
time in panel data
the year that treatment starts
outcome variable
T/F variable for when treatment is on
year relative to treatment start. Inf = never treated.
year relative to treatment start, but <=-6 and >=6 are binned.
Unit FE
Year FE
Random error component
Static treatment effect = te
Dynamic treatmet effect = te_m
String name for group
State that unit is in
Weight from runif()
Calculate two-stage difference-in-differences following Gardner (2021)
did2s( data, yname, first_stage, second_stage, treatment, cluster_var, weights = NULL, bootstrap = FALSE, n_bootstraps = 250, return_bootstrap = FALSE, verbose = TRUE )
did2s( data, yname, first_stage, second_stage, treatment, cluster_var, weights = NULL, bootstrap = FALSE, n_bootstraps = 250, return_bootstrap = FALSE, verbose = TRUE )
data |
The dataframe containing all the variables |
yname |
Outcome variable |
first_stage |
Fixed effects and other covariates you want to residualize
with in first stage.
Formula following |
second_stage |
Second stage, these should be the treatment indicator(s)
(e.g. treatment variable or event-study leads/lags).
Formula following |
treatment |
A variable that = 1 if treated, = 0 otherwise. The first
stage will be estimated for |
cluster_var |
What variable to cluster standard errors. This can be IDs or a higher aggregate level (state for example) |
weights |
Optional. Variable name for regression weights. |
bootstrap |
Optional. Should standard errors be calculated using bootstrap?
Default is |
n_bootstraps |
Optional. How many bootstraps to run.
Default is |
return_bootstrap |
Optional. Logical. Will return each bootstrap second-stage estimate to allow for manual use, e.g. percentile standard errors and empirical confidence intervals. |
verbose |
Optional. Logical. Should information about the two-stage
procedure be printed back to the user?
Default is |
fixest
object with adjusted standard errors
(either by formula or by bootstrap). All the methods from fixest
package
will work, including fixest::esttable
and
fixest::coefplot
Load example dataset which has two treatment groups and homogeneous treatment effects
# Load Example Dataset data("df_hom")
You can run a static TWFE fixed effect model for a simple treatment indicator
static <- did2s(df_hom, yname = "dep_var", treatment = "treat", cluster_var = "state", first_stage = ~ 0 | unit + year, second_stage = ~ i(treat, ref=FALSE)) #> Running Two-stage Difference-in-Differences #> - first stage formula `~ 0 | unit + year` #> - second stage formula `~ i(treat, ref = FALSE)` #> - The indicator variable that denotes when treatment is on is `treat` #> - Standard errors will be clustered by `state` fixest::esttable(static) #> static #> Dependent Var.: dep_var #> #> treat = TRUE 2.005*** (0.0202) #> _______________ _________________ #> S.E. type Custom #> Observations 46,500 #> R2 0.47520 #> Adj. R2 0.47520 #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Or you can use relative-treatment indicators to estimate an event study estimate
es <- did2s(df_hom, yname = "dep_var", treatment = "treat", cluster_var = "state", first_stage = ~ 0 | unit + year, second_stage = ~ i(rel_year, ref=c(-1, Inf))) #> Running Two-stage Difference-in-Differences #> - first stage formula `~ 0 | unit + year` #> - second stage formula `~ i(rel_year, ref = c(-1, Inf))` #> - The indicator variable that denotes when treatment is on is `treat` #> - Standard errors will be clustered by `state` fixest::esttable(es) #> es #> Dependent Var.: dep_var #> #> rel_year = -20 0.0043 (0.0322) #> rel_year = -19 0.0222 (0.0296) #> rel_year = -18 -0.0358 (0.0308) #> rel_year = -17 0.0043 (0.0337) #> rel_year = -16 -0.0186 (0.0353) #> rel_year = -15 -0.0045 (0.0346) #> rel_year = -14 -0.0393 (0.0384) #> rel_year = -13 0.0453 (0.0323) #> rel_year = -12 0.0324 (0.0309) #> rel_year = -11 -0.0245 (0.0349) #> rel_year = -10 -0.0017 (0.0241) #> rel_year = -9 0.0155 (0.0242) #> rel_year = -8 -0.0073 (0.0210) #> rel_year = -7 -0.0513* (0.0202) #> rel_year = -6 0.0269 (0.0237) #> rel_year = -5 0.0136 (0.0237) #> rel_year = -4 0.0381. (0.0223) #> rel_year = -3 -0.0228 (0.0284) #> rel_year = -2 0.0041 (0.0228) #> rel_year = 0 1.971*** (0.0470) #> rel_year = 1 2.050*** (0.0466) #> rel_year = 2 2.033*** (0.0441) #> rel_year = 3 1.966*** (0.0400) #> rel_year = 4 1.965*** (0.0430) #> rel_year = 5 2.030*** (0.0456) #> rel_year = 6 2.040*** (0.0447) #> rel_year = 7 1.995*** (0.0370) #> rel_year = 8 2.019*** (0.0485) #> rel_year = 9 1.955*** (0.0468) #> rel_year = 10 1.950*** (0.0455) #> rel_year = 11 2.117*** (0.0664) #> rel_year = 12 2.132*** (0.0741) #> rel_year = 13 2.019*** (0.0640) #> rel_year = 14 2.013*** (0.0522) #> rel_year = 15 1.961*** (0.0605) #> rel_year = 16 1.916*** (0.0584) #> rel_year = 17 1.938*** (0.0607) #> rel_year = 18 2.070*** (0.0666) #> rel_year = 19 2.066*** (0.0609) #> rel_year = 20 1.964*** (0.0612) #> _______________ _________________ #> S.E. type Custom #> Observations 46,500 #> R2 0.47577 #> Adj. R2 0.47533 #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# plot rel_year coefficients and standard errors fixest::coefplot(es, keep = "rel_year::(.*)")
Here's an example using data from Cheng and Hoekstra (2013)
# Castle Data castle <- haven::read_dta("https://github.com/scunning1975/mixtape/raw/master/castle.dta") did2s( data = castle, yname = "l_homicide", first_stage = ~ 0 | sid + year, second_stage = ~ i(post, ref=0), treatment = "post", cluster_var = "state", weights = "popwt" ) #> Running Two-stage Difference-in-Differences #> - first stage formula `~ 0 | sid + year` #> - second stage formula `~ i(post, ref = 0)` #> - The indicator variable that denotes when treatment is on is `post` #> - Standard errors will be clustered by `state` #> OLS estimation, Dep. Var.: l_homicide #> Observations: 550 #> Weights: weights_vector #> Standard-errors: Custom #> Estimate Std. Error t value Pr(>|t|) #> post::1 0.075142 0.03538 2.12387 0.034127 * #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> RMSE: 263.4 Adj. R2: 0.052465
Uses the estimation procedures recommended from Borusyak, Jaravel, Spiess (2021); Callaway and Sant'Anna (2020); Gardner (2021); Roth and Sant'Anna (2021); Sun and Abraham (2020)
event_study( data, yname, idname, gname, tname, xformla = NULL, weights = NULL, estimator = c("all", "TWFE", "did2s", "did", "impute", "sunab", "staggered") ) plot_event_study(out, separate = TRUE, horizon = NULL)
event_study( data, yname, idname, gname, tname, xformla = NULL, weights = NULL, estimator = c("all", "TWFE", "did2s", "did", "impute", "sunab", "staggered") ) plot_event_study(out, separate = TRUE, horizon = NULL)
data |
The dataframe containing all the variables |
yname |
Variable name for outcome variable |
idname |
Variable name for unique unit id |
gname |
Variable name for unit-specific date of initial treatment (never-treated should be zero or NA) |
tname |
Variable name for calendar period |
xformla |
A formula for the covariates to include in the model.
It should be of the form |
weights |
Variable name for estimation weights. This is used in estimating Y(0) and also augments treatment effect weights |
estimator |
Estimator you would like to use. Use "all" to estimate all. Otherwise see table to know advantages and requirements for each of these. |
out |
Output from |
separate |
Logical. Should the estimators be on separate plots? Default is TRUE. |
horizon |
Numeric. Vector of length 2. First element is min and second element is max of event_time to plot |
event_study
returns a data.frame of point estimates for each estimator
plot_event_study
returns a ggplot object that can be fully customized
out = event_study( data = did2s::df_het, yname = "dep_var", idname = "unit", tname = "year", gname = "g", estimator = "all" ) plot_event_study(out)
out = event_study( data = did2s::df_het, yname = "dep_var", idname = "unit", tname = "year", gname = "g", estimator = "all" ) plot_event_study(out)
Generate TWFE data
gen_data( g1 = 2000, g2 = 2010, g3 = 0, panel = c(1990, 2020), te1 = 2, te2 = 2, te3 = 2, te_m1 = 0, te_m2 = 0, te_m3 = 0, n = 1500 )
gen_data( g1 = 2000, g2 = 2010, g3 = 0, panel = c(1990, 2020), te1 = 2, te2 = 2, te3 = 2, te_m1 = 0, te_m2 = 0, te_m3 = 0, n = 1500 )
g1 |
treatment date for group 1. For no treatment, set g = 0. |
g2 |
treatment date for group 2. For no treatment, set g = 0. |
g3 |
treatment date for group 3. For no treatment, set g = 0. |
panel |
numeric vector of size 2, start and end years for panel |
te1 |
treatment effect for group 1. Will ignore for that group if g = 0. |
te2 |
treatment effect for group 1. Will ignore for that group if g = 0. |
te3 |
treatment effect for group 1. Will ignore for that group if g = 0. |
te_m1 |
treatment effect slope per year |
te_m2 |
treatment effect slope per year |
te_m3 |
treatment effect slope per year |
n |
number of individuals in sample |
Dataframe of generated data
# Homogeneous treatment effect df_hom <- gen_data(panel = c(1990, 2020), g1 = 2000, g2 = 2010, g3 = 0, te1 = 2, te2 = 2, te3 = 0, te_m1 = 0, te_m2 = 0, te_m3 = 0) # Heterogeneous treatment effect df_het <- gen_data(panel = c(1990, 2020), g1 = 2000, g2 = 2010, g3 = 0, te1 = 2, te2 = 1, te3 = 0, te_m1 = 0.05, te_m2 = 0.15, te_m3 = 0)
# Homogeneous treatment effect df_hom <- gen_data(panel = c(1990, 2020), g1 = 2000, g2 = 2010, g3 = 0, te1 = 2, te2 = 2, te3 = 0, te_m1 = 0, te_m2 = 0, te_m3 = 0) # Heterogeneous treatment effect df_het <- gen_data(panel = c(1990, 2020), g1 = 2000, g2 = 2010, g3 = 0, te1 = 2, te2 = 1, te3 = 0, te_m1 = 0.05, te_m2 = 0.15, te_m3 = 0)
a helper function that takes a fixest feols object (likely
from did2s
) that plugs into honest_did
. Note this function assumes
the event study coefficients are using i()
syntax, e.g. i(rel_year)
.
This should also work for a TWFE event-study model estimated by feols
.
get_honestdid_obj_did2s(est, coef_name = "rel_year")
get_honestdid_obj_did2s(est, coef_name = "rel_year")
est |
A |
coef_name |
Character. The name of the event-study relative-year
variable name, from |
A list containing the vector of event-study coefficients beta
, the
variance-covariance matrix of beta
, V
, and a vector of relative years,
event_time
.
a function to compute a sensitivity analysis using the
approach of Rambachan and Roth (2021) when the event study is
estimated using the did2s
package. Note that you should first
use the helper function get_honestdid_obj_did2s
to create the
object, obj
, that you will then pass into this function with
honest_did(obj)
honest_did_did2s( es, e = 0, type = c("smoothness", "relative_magnitude"), method = NULL, bound = "deviation from parallel trends", Mvec = NULL, Mbarvec = NULL, monotonicityDirection = NULL, biasDirection = NULL, alpha = 0.05, parallel = FALSE, gridPoints = 10^3, grid.ub = NA, grid.lb = NA, ... )
honest_did_did2s( es, e = 0, type = c("smoothness", "relative_magnitude"), method = NULL, bound = "deviation from parallel trends", Mvec = NULL, Mbarvec = NULL, monotonicityDirection = NULL, biasDirection = NULL, alpha = 0.05, parallel = FALSE, gridPoints = 10^3, grid.ub = NA, grid.lb = NA, ... )
es |
an object of class |
e |
event time to compute the sensitivity analysis for.
The default value is |
type |
Options are "smoothness" (which conducts a sensitivity analysis allowing for violations of linear trends in pre-treatment periods) or "relative_magnitude" (which conducts a sensitivity analysis based on the relative magnitudes of deviations from parallel trends in pre-treatment periods). |
method |
String that specifies the choice of method for constructing robust confidence intervals. This must be one of "FLCI", "Conditional", "C-F" (conditional FLCI hybrid), or "C-LF" (conditional least-favorable hybrid). Default equals NULL and the function automatically sets method based on the recommendations in Rambachan & Roth (2021) depending on the choice of Delta. If Delta = DeltaSD, default selects the FLCI. If Delta = DeltaSDB or DeltaSDM, default delects the conditional FLCI hybrid. |
bound |
String that specifies the base choice of Delta (to which additional sign and shape restrictions will be incorporated if specified by the user). This must be either "deviation from parallel trends" or "deviation from linear trend". If bound equals "deviation from parallel trends", then the function will select Delta^RM(Mbar) as the base choice of Delta. If bound equals "deviation from linear trends", then the function will select Delta^SDRM as the base choice of Delta. By default, this is set to "deviation from parallel trends". See Section 2.3.1 and 2.3.2 of Rambachan & Roth (2021) for a discussion of these choices of Delta. |
Mvec |
Vector of M values for which the user wishes to construct robust confidence intervals. If NULL, the function constructs a grid of length 10 that starts at M = 0 and ends at M equal to the upper bound constructed from the pre-periods using the function DeltaSD_upperBound_Mpre if number of pre-periods > 1 or the standard deviation of the first pre-period coefficient if number of pre-periods = 1. Default equals null. |
Mbarvec |
Vector of Mbar values for which the user wishes to construct robust confidence intervals. If NULL, the function constructs a grid of length 10 that starts at Mbar = 0 and ends at Mbar = 2. Default equals null. |
monotonicityDirection |
This must be specified if the user wishes to add an additional monotonicity restriction to Delta^SD(M). If "increasing", underlying trend specified to be increasing, delta_t >= delta_t-1. If "decreasing", underlying trend specified to be decreasing delta_t <= delta_t-1. Default equals NULL |
biasDirection |
This must be specified if the user wishes to add an additional bias restriction to Delta^SD(M). If "positive", bias is restricted to be positive, delta >= 0. If "negative", bias is restricted to be negative, delta <= 0. Default equals NULL. |
alpha |
Desired size of the robust confidence sets. Default equals 0.05 (corresponding to 95% confidence interval) |
parallel |
Logical to indicate whether the user would like to construct the robust confidence intervals in parallel. This uses the Foreach package and doParallel package. Default equals FALSE. |
gridPoints |
Number of grid points used for the underlying test inversion. Default equals 1000. User may wish to change the number of grid points for computational reasons. |
grid.ub |
Upper bound of grid used for underlying test inversion. Default sets grid.ub to be equal to twenty times the standard deviation of the estimated target parameter, l_vec * betahat. User may wish to change the upper bound of the grid to suit their application. |
grid.lb |
Lower bound of grid used for underlying test inversion. Default sets grid.lb to be equal to negative twenty times the standard deviation of the estimated target parameter, l_vec * betahat. User may wish to change the lower bound of the grid to suit their application. |
... |
Ignored. |
fixest
object returned in sparse formatThis function creates the left-hand-side or the right-hand-side(s) of a femlm
, feols
or feglm
estimation. Note that this function currently does not accept a formula
sparse_model_matrix( object, data, type = "rhs", na.rm = TRUE, collin.rm = TRUE, combine = TRUE, ... )
sparse_model_matrix( object, data, type = "rhs", na.rm = TRUE, collin.rm = TRUE, combine = TRUE, ... )
object |
A |
data |
If missing (default) then the original data is obtained by evaluating the |
type |
Character vector or one sided formula, default is "rhs". Contains the type of matrix/data.frame to be returned. Possible values are: "lhs", "rhs", "fixef", "iv.rhs1" (1st stage RHS), "iv.rhs2" (2nd stage RHS), "iv.endo" (endogenous vars.), "iv.exo" (exogenous vars), "iv.inst" (instruments). |
na.rm |
Default is |
collin.rm |
Logical scalar, default is |
combine |
Logical scalar, default is |
... |
Not currently used. |
It returns either a single sparse matrix a list of matrices, depending whether combine
is TRUE
or FALSE
. The sparse matrix is of class dgCMatrix
from the Matrix
package.
Laurent Berge, Kyle Butts
See also the main estimation functions femlm
, feols
or feglm
. formula.fixest
, update.fixest
, summary.fixest
, vcov.fixest
.
est = feols(wt ~ i(vs) + hp | cyl, mtcars) sparse_model_matrix(est)
est = feols(wt ~ i(vs) + hp | cyl, mtcars) sparse_model_matrix(est)