Module 09: Fuzzy Regression Discontinuity Design

This module covers fuzzy regression discontinuity design (RDD), in which crossing the cutoff increases the probability that an individual receives a treatment but does not guarantee it. For example, a scholarship awardee may decline the award by choice or for some other reason. In contrast, in a sharp RDD the threshold completely determines treatment. For example, a test score above a certain point automatically places the test taker in a different category.

Idea Source

The real allure of the RD Design is that it allows us to assign the treatment or program to those who most need or deserve it. Thus, the real attractiveness of the design is ethical – we don’t have to deny the program or treatment to participants who might need it as we do in randomized studies. - Web Center for Social Research Methods

Intended Learning Outcomes

By the end of this session, you should be able to:

Identify a research scenario that would be appropriate for a regression discontinuity analysis
Conduct a regression discontinuity analysis and draw appropriate conclusions

Have you used these methods before?

Why Regression Discontinuity Design

Please see An Introduction to Regression Discontinuity Design for additional detail. Information in this section was taken from this resource.

Regression discontinuity methods were first introduced by Thistlethwaite and Campbell in 1960 and came into wide use in economics in the late 1990s. They offer a strong design for “estimating a treatment effect in a non-experimental setting when the treatment is determined by whether an observed ‘assignment’ variable exceeds a known cutoff point.” The treatment effect is estimated by comparing individuals slightly above and slightly below the cutoff point.

Example: Medical benefits begin at a specific age. Do the benefits affect the death rate? Plot death rate against age and compare individuals just below and just above the cutoff of 65 years.

Assumptions are critical: individuals just below and just above the cutoff are similar apart from the treatment, individuals cannot precisely control which side of the cutoff they fall on, and variation in the assignment variable near the cutoff is as good as random.
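
A crude way to probe the second assumption is to count observations in a narrow window on each side of the cutoff; a large imbalance hints that individuals may be sorting around the threshold. The sketch below uses made-up data, and the names `w`, `cutoff`, and `h` (the window half-width) are illustrative choices, not part of this module's analysis. It is a simplified stand-in for formal density tests, not a replacement for them.

```r
# Hypothetical assignment variable centered on a cutoff of 95
set.seed(2)
w <- rnorm(1000, mean = 95, sd = 1)
cutoff <- 95
h <- 0.25  # hypothetical window half-width; try several values

# Keep only observations close to the cutoff and count each side
near <- w[abs(w - cutoff) < h]
n_above <- sum(near >= cutoff)
n_below <- sum(near < cutoff)

# Exact binomial test of an even split within the window
check <- binom.test(n_above, n_above + n_below, p = 0.5)
c(below = n_below, above = n_above)
check$p.value
```

A small p-value would be a warning sign rather than proof of manipulation, and the result depends on the chosen window.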

One disadvantage is possibly low external validity: results may not generalize beyond individuals near the cutoff.

Two major types: sharp and fuzzy. Sharp: the probability of treatment jumps from 0 to 1 at the cutoff. Fuzzy: the probability of treatment increases at the cutoff by less than 1, because treatment also depends on other factors.
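
The two cases can be written compactly. The notation here is an assumption introduced for illustration and does not come from the source: \(W\) is the assignment variable, \(c\) the cutoff, \(D\) the treatment indicator, and \(Y\) the outcome.

```latex
\[
\tau_{\text{sharp}}
  = \lim_{w \downarrow c} E[Y \mid W = w] - \lim_{w \uparrow c} E[Y \mid W = w]
\]
\[
\tau_{\text{fuzzy}}
  = \frac{\lim_{w \downarrow c} E[Y \mid W = w] - \lim_{w \uparrow c} E[Y \mid W = w]}
         {\lim_{w \downarrow c} P(D = 1 \mid W = w) - \lim_{w \uparrow c} P(D = 1 \mid W = w)}
\]
```

In the sharp case the denominator equals 1, so the fuzzy expression reduces to the sharp one.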

Graphs: Plot the average Y value within bins of the X variable. (Choose the bin size to balance smoothness against being able to see the discontinuity.)
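
A minimal sketch of such a binned plot in base R follows. The data are simulated for illustration only (a jump of 2 at a cutoff of 0), and `bin_width` is an arbitrary choice to experiment with.

```r
# Illustrative data, not the module's data set: jump of 2 at cutoff 0
set.seed(1)
x <- runif(500, min = -1, max = 1)
y <- 1 + 0.5 * x + 2 * (x >= 0) + rnorm(500, sd = 0.5)

# Bin edges chosen so that the cutoff is an edge and no bin straddles it
bin_width <- 0.1
breaks <- seq(-1, 1, by = bin_width)
bins <- cut(x, breaks = breaks, include.lowest = TRUE, right = FALSE)

# Average outcome within each bin, plotted at the bin midpoint
bin_mean <- tapply(y, bins, mean)
bin_mid <- head(breaks, -1) + bin_width / 2

plot(bin_mid, bin_mean, pch = 19,
     xlab = "Assignment variable (X)",
     ylab = "Mean of Y within bin")
abline(v = 0, lty = 2)  # dashed line at the cutoff
```

Wider bins give a smoother picture but can blur the jump; narrower bins show the jump but add noise.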


Scenario

Data science students (n = 21) completed a national certification exam that included standard questions and bonus questions. Points from the bonus questions could be applied to the overall score under two conditions: 1) the student answered the bonus questions correctly and 2) the student scored at least 95% on the base questions. Some students scored 95% or higher but did not complete the bonus questions due to time constraints. Note: students who reported a final overall score of 100% or higher were automatically considered for a prestigious internship.

Goal: Estimate the expected number of additional points gained by students who successfully applied bonus points to their base score and thus gained an advantage toward the internship application. You may wish to use the library ‘rddtools’.

Analysis

library(MASS)      # mvrnorm() for correlated normal draws
library(rddtools)  # rdd_data() and rdd_reg_lm()
set.seed(1234)

# Generate sample data: students with a base score below 95 cannot apply bonus
# points; students with a base score of 95 or more apply them with probability 0.9.
# Applying bonus points is therefore only partially determined by the base score
# and the threshold. The simulation draws n = 1000 students, more than the 21 in
# the scenario.

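# set up variables
mu <- c(0, 0)                                 # means before shifting
sigma <- matrix(c(1, 0.7, 0.7, 1), ncol = 2)  # unit variances, correlation 0.7
offset <- 95                                  # cutoff on the base score
n <- 1000                                     # number of simulated students
boost <- 3                                    # true effect of applying bonus points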

#generate data frame
d <- as.data.frame(mvrnorm(n, mu, sigma))

# shift both scores so they are centered on the cutoff
d <- d + offset

#rename data frame columns
colnames(d) <- c("W", "Y")

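# introduce fuzziness: treatment probability is 0 below the cutoff and 0.9 at or above it
d$treatProb <- ifelse(d$W < offset, 0, 0.9)

# draw actual treatment status (1 = bonus points applied) from those probabilities
fuzz <- sapply(X = d$treatProb, FUN = function(x) rbinom(1, 1, prob = x))

# treatment effect: treated students gain `boost` points
d$Y <- d$Y + fuzz * boost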

#check the simulated data set
head(d)
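
Before estimating anything, it can help to confirm that treatment take-up behaves as intended on each side of the cutoff. The sketch below rebuilds a comparable assignment variable in base R so that it runs on its own; the names `w`, `treated`, and `eligible` are illustrative, and the seed differs from the one above, so the counts will not match `d` exactly.

```r
# Stand-alone illustration: base scores centered on a cutoff of 95
set.seed(42)
n <- 1000
cutoff <- 95
w <- rnorm(n, mean = cutoff, sd = 1)

# Treatment is impossible below the cutoff and taken up with probability 0.9 above it
treat_prob <- ifelse(w < cutoff, 0, 0.9)
treated <- rbinom(n, size = 1, prob = treat_prob)

# Cross-tabulate eligibility against actual treatment
eligible <- w >= cutoff
table(eligible, treated)

# Share treated on each side of the cutoff
take_up <- tapply(treated, eligible, mean)
take_up
```

A take-up rate of 0 below the cutoff and somewhat under 1 above it is the signature of a fuzzy design with one-sided non-compliance.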

Plot the simulated data to visualize any patterns.

# plot treated (red) and untreated (blue) students
plot(d$W, d$Y,
     col = c("#00BBFF33", "#FF330033")[factor(fuzz)],
     pch = 20,
     cex = 2,
     xlim = c(offset-2, offset+2),
     ylim = c(offset-3, offset+7),
     xlab = "Base Score",
     ylab = "Final Score")

# add dashed lines at the cutoff (vertical) and at a final score of 100 (horizontal)
abline(v = offset, lty = 2)
abline(h = 100, lty = 2)

Estimate the regression discontinuity.

# estimate the Fuzzy RDD
data <- rdd_data(d$Y, d$W, 
                 cutpoint = offset,
                 z = d$treatProb)

frdd_mod <- rdd_reg_lm(rdd_object = data, 
                       slope = "same")
frdd_mod

## ### RDD regression: parametric ###
##  Polynomial order:  1 
##  Slopes:  same 
##  Number of obs: 1000 (left: 521, right: 479)
## 
##  Coefficient:
##   Estimate Std. Error t value  Pr(>|t|)    
## D  2.85667    0.11277  25.331 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
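
To see where a fuzzy estimate of this kind comes from, the jump in the outcome at the cutoff can be divided by the jump in the treatment rate (a Wald-type ratio). The sketch below does this by hand with two `lm()` fits on freshly simulated data that mimic the setup above; the variable names and seed are assumptions for illustration, and this is not necessarily how `rdd_reg_lm()` computes its estimate internally.

```r
# Stand-alone simulation mimicking the module's setup (illustrative seed and names)
set.seed(42)
n <- 1000
cutoff <- 95
boost <- 3
w <- rnorm(n, mean = cutoff, sd = 1)
y0 <- cutoff + 0.7 * (w - cutoff) + rnorm(n, sd = sqrt(1 - 0.7^2))
treated <- rbinom(n, 1, ifelse(w < cutoff, 0, 0.9))
y <- y0 + boost * treated

above <- as.numeric(w >= cutoff)  # indicator for being at or above the cutoff
x <- w - cutoff                   # centered assignment variable

# Reduced form: jump in the outcome at the cutoff (common slope on both sides)
reduced_form <- lm(y ~ above + x)

# First stage: jump in the probability of treatment at the cutoff
first_stage <- lm(treated ~ above + x)

# Wald-type ratio: outcome jump scaled by the treatment-rate jump
wald <- coef(reduced_form)[["above"]] / coef(first_stage)[["above"]]
wald
```

With a single instrument and the same controls in both regressions, this ratio coincides with the two-stage least squares estimate, so it should land near the simulated boost of 3.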

Plot the estimated fuzzy regression discontinuity model.

# plot estimated FRDD function
plot(frdd_mod, 
     cex = 1, 
     lwd = 0.4,
     xlim = c(offset-2, offset+2),
     ylim = c(offset-3, offset+7),
     xlab = "Base Score",
     ylab = "Final Score")

# add a dashed vertical line at cutoff
abline(v = offset, lty = 2)

Conclusions

The application of bonus points increases the overall score by about 2.9 points (estimate 2.86, standard error 0.11), close to the simulated boost of 3 and significant at \(\alpha = 0.05\). Students scoring \(\geq 95\) who do not apply the bonus points are called ‘no-shows’: they have the opportunity for treatment but do not receive it, by choice or because of other circumstances.

One student received an automatic invitation to be considered for a prestigious data science internship due to their final score in excess of 100 total points.

Review of Intended Learning Outcomes

How do you feel about your ability to:

Identify a research scenario that would be appropriate for a regression discontinuity analysis? When would that be applicable?
Conduct a regression discontinuity analysis and draw appropriate conclusions?
Are you comfortable with these methods?

Data Analysis for Social Scientists