Understanding weight calculations in psmatch2

Feb 24, 2023

Propensity score matching implicitly weights the matched untreated (control) observations to compute counterfactual outcomes for the treated.

The Stata command -psmatch2- stores these weights in a variable called _weight.

Someone pointed me to an old blog post somewhere on the Internet, which shows that there may be some confusion about what these weights are and where they come from.

K-neighbor matching estimates the counterfactual outcome for a treated observation by averaging the outcomes of its K matches.

This means that every time an untreated observation is matched to a treated observation (and this can happen more than once when matching with replacement), it is used with “weight” 1/K since one is dividing by K when averaging.

If one uses a caliper (i.e. excludes matches that are farther away than a maximum distance called a “caliper”), it can happen that some matches involve fewer than K neighbors.

So more generally the weight is not 1/K but rather 1/(number of matches).

(-psmatch2- saves the number of matches for a given treated observation in the variable _nn.)

For each control observation, the variable _weight is the sum of these weights over all the times it is used to construct a counterfactual outcome.
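Written out as a formula, this reads as follows. The notation is my own shorthand and not something -psmatch2- defines: T is the set of treated observations with at least one match, M(i) is the set of controls matched to treated observation i (so |M(i)| corresponds to _nn), and Y_j is the outcome of control j.

```latex
w_j = \sum_{i \in T} \frac{\mathbf{1}\{ j \in M(i) \}}{|M(i)|},
\qquad
\frac{1}{|T|} \sum_{i \in T} \frac{1}{|M(i)|} \sum_{j \in M(i)} Y_j
  = \frac{1}{|T|} \sum_{j} w_j \, Y_j .
```

The second equality just swaps the order of summation: the average counterfactual outcome is a weighted sum of control outcomes, with weights w_j.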

So let’s say that we are matching two treated observations to two neighbors with a caliper.

Then we may have that the first treated has two matches and the second treated only one match as in the following:

_id _treated _n1 _n2 _nn
  1        1   3   4   2
  2        1   3   .   1
  3        0   .   .   .
  4        0   .   .   .

The matched outcome for the first treated observation is the average over observations 3 and 4, so each of these has weight 1/2 here.

The matched outcome for the second treated observation is simply the outcome of observation 3, which thus has weight 1.

Note that in each case the weights equal 1/_nn.

Putting this together we can compute how often each matched untreated observation is used to construct the overall average counterfactual outcome by summing their weights:

_id _weight
  3     1.5
  4     0.5
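The small example can be reproduced by hand in Stata. This is only a sketch: the four observations are typed in to mimic the variables that -psmatch2- would create, and the names N, matchnr and altweight are made up for the illustration.

```stata
clear
* Toy data mimicking psmatch2 output (values typed in by hand)
input _id _treated _n1 _n2 _nn
1 1 3 4 2
2 1 3 . 1
3 0 . . .
4 0 . . .
end

* Only treated observations carry match identifiers
keep if _treated == 1

* Rename so that reshape sees the stub N with suffixes 1 and 2
rename (_n1 _n2 _nn) (N1 N2 Nn)
reshape long N, i(_id) j(matchnr)

* Drop empty match slots (second treated obs has only one match)
drop if missing(N)

* Each match is used with weight 1/(number of matches)
generate altweight = 1 / Nn

* Sum the weights over all uses of each control observation
collapse (sum) altweight, by(N)
list N altweight
```

The listing shows observation 3 with weight 1.5 and observation 4 with weight 0.5.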

For the example in the blog post mentioned above, the following code shows that this indeed gives the weights in the variable _weight:

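webuse cattaneo2, clear
set seed 795
g x=uniform()  // random sort order (results can depend on it when there are ties)
sort x
psmatch2 mbsmoke prenatal1 fbaby mmarried medu fedu mage fage mrace frace, out(bweight) neighbor(5) caliper(.0295236) logit
tab _weight if _treated == 0  // weights of the controls as stored by psmatch2
rename _n* N* // otherwise reshape complains
reshape long N, i(_id) j(matchnr)  // one row per observation and match slot
drop if missing(N)  // drop untreated rows and empty match slots
g altweight = 1 / Nn  // each match has weight 1/(number of matches)
collapse (sum) altweight, by(N)  // sum over all uses of each control
tab altweight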
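Instead of comparing the two tabulations by eye, one can also merge the hand-computed weights back onto the stored ones and count discrepancies. The following is a sketch under some assumptions: -psmatch2- is installed (ssc install psmatch2), the whole block is run in one go so that the temporary file name survives, and the names stored, altweight and the 1e-6 tolerance are my own choices.

```stata
webuse cattaneo2, clear
set seed 795
generate x = uniform()
sort x
psmatch2 mbsmoke prenatal1 fbaby mmarried medu fedu mage fage mrace frace, ///
    out(bweight) neighbor(5) caliper(.0295236) logit

* Save the weights psmatch2 stored for the untreated observations
preserve
keep if _treated == 0
keep _id _weight
tempfile stored
save `stored'
restore

* Rebuild the weights by hand from the match identifiers
keep if _treated == 1
keep _id _n1 _n2 _n3 _n4 _n5 _nn
rename (_n1 _n2 _n3 _n4 _n5 _nn) (N1 N2 N3 N4 N5 Nn)
reshape long N, i(_id) j(matchnr)
drop if missing(N)                  // empty match slots
generate double altweight = 1 / Nn
collapse (sum) altweight, by(N)
rename N _id                        // N holds the _id of the matched control

* Compare with the stored weights, control by control
merge 1:1 _id using `stored'
count if _merge == 3 & abs(altweight - _weight) > 1e-6
```

If the hand-computed weights reproduce _weight, the final count is zero. Controls that are never used as a match appear only in the stored file (_merge == 2) and are ignored by the comparison.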

The weights in _weight are therefore not specific to -psmatch2-, but they follow directly from the definition of a K-neighbor matching estimator (independently of whether one matches on the propensity score or something else).