Propensity score matching implicitly weights the matched untreated observations to compute counterfactual outcomes for the treated.
The Stata command -psmatch2- stores these weights in a variable called _weight.
Someone pointed me to an old blog post somewhere on the Internet, which shows that there may be some confusion about what these weights are and where they come from.
K-neighbor matching estimates the counterfactual outcome for a treated observation by averaging the outcomes of its K matches.
This means that every time an untreated observation is matched to a treated observation (and this can happen more than once when matching with replacement), it is used with "weight" 1/K, since one divides by K when averaging.
If one uses a caliper (i.e. excludes matches that are farther away than a maximum distance called a "caliper"), it can happen that some matches involve fewer than K neighbors.
So more generally the weight is not 1/K but rather 1/(number of matches). (-psmatch2- saves the number of matches for a given treated observation in the variable _nn.)
For each control observation, the variable _weight sums these weights over every treated observation for which it is used to construct a counterfactual outcome.
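In symbols, as a sketch: the notation here is mine and not taken from -psmatch2-. Let M(i) be the set of controls matched to treated observation i (so |M(i)| is what _nn holds for i), let D_i be the treatment indicator, Y_j the outcome, and N_T the number of treated observations with at least one match. The weight of control j, and the way it enters the average counterfactual outcome, can then be written as:

```latex
\[
  w_j \;=\; \sum_{i:\,D_i = 1} \frac{\mathbf{1}\{\, j \in M(i) \,\}}{|M(i)|},
  \qquad
  \frac{1}{N_T} \sum_{i:\,D_i = 1} \frac{1}{|M(i)|} \sum_{j \in M(i)} Y_j
  \;=\; \frac{1}{N_T} \sum_{j:\,D_j = 0} w_j \, Y_j .
\]
```

The second equality only swaps the order of summation, which is why summing 1/|M(i)| over the uses of each control gives the right weight.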
Suppose we match each of two treated observations to its two nearest neighbors, subject to a caliper.
The first treated observation may then have two matches and the second only one, as in the following:
_id  _treated  _n1  _n2  _nn
1    1         3    4    2
2    1         3    .    1
3    0         .    .    .
4    0         .    .    .
The matched outcome for the first treated observation is the average of the outcomes of observations 3 and 4, each of which thus has weight 1/2 here.
The matched outcome for the second treated observation is simply the outcome of observation 3, which thus has weight 1.
Note that in each case the weights equal 1/_nn.
Putting this together we can compute how often each matched untreated observation is used to construct the overall average counterfactual outcome by summing their weights:
_id _weight
3 1.5
4 0.5
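The toy example can be checked mechanically. The following is a minimal sketch that types in the four observations from the table by hand (no matching is run) and adds up 1/_nn over every use of each control; the variable names altweight and matchnr are just illustrative.

```stata
clear
input _id _treated _n1 _n2 _nn
1 1 3 4 2
2 1 3 . 1
3 0 . . .
4 0 . . .
end

rename _n* N*                       // _n1 _n2 _nn become N1 N2 Nn
reshape long N, i(_id) j(matchnr)   // one row per (observation, match slot)
drop if missing(N)                  // keep only slots that hold a match
generate altweight = 1 / Nn         // each match counts 1/(number of matches)
collapse (sum) altweight, by(N)     // add up over all uses of each control
list
```

The listing should show 1.5 for observation 3 and 0.5 for observation 4, in line with the table.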
For the example in the blog post above, the following code shows that this indeed gives the weights in the variable _weight:
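webuse cattaneo2, clear
// put the data in a random but reproducible sort order before matching
set seed 795
g x=uniform()
sort x
psmatch2 mbsmoke prenatal1 fbaby mmarried medu fedu mage fage mrace frace, out(bweight) neighbor(5) caliper(.0295236) logit
tab _weight if _treated == 0 // weights of the matched controls
// _n1-_n5 hold the _id of each match, _nn the number of matches
rename _n* N* // otherwise reshape complains
reshape long N, i(_id) j(matchnr)
drop if missing(N) // keep only slots that actually hold a match
g altweight = 1 / Nn
collapse (sum) altweight, by(N) // sum over every use of each control
tab altweight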
The weights in _weight are therefore not specific to -psmatch2-, but follow directly from the definition of a K-neighbor matching estimator (independently of whether one matches on the propensity score or something else).
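One way to see the weights at work is to rebuild the ATT from _weight by hand. This is only a sketch under assumptions that should be checked against the help file of the installed version: that -psmatch2- leaves the ATT in r(att), and that _weight is missing for observations that are not matched (so that treated observations without a match inside the caliper drop out). The scalar names att, y1 and y0 are illustrative.

```stata
webuse cattaneo2, clear
set seed 795
generate x = uniform()
sort x
psmatch2 mbsmoke prenatal1 fbaby mmarried medu fedu mage fage mrace frace, out(bweight) neighbor(5) caliper(.0295236) logit
scalar att = r(att)   // assumed name of the stored ATT; save it before r() is overwritten

// mean outcome of the treated observations that found at least one match
summarize bweight if _treated == 1 & !missing(_weight)
scalar y1 = r(mean)

// _weight-weighted mean outcome of the matched controls
summarize bweight [aweight = _weight] if _treated == 0
scalar y0 = r(mean)

display "psmatch2 ATT: " att "   by hand: " y1 - y0
```

If the assumptions hold, the two numbers should agree up to rounding, since the weighted control mean is exactly the average counterfactual outcome described above.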