Propensity score matching implicitly weights the matched untreated observations to compute counterfactual outcomes for the treated.

The Stata command `-psmatch2-` stores these weights in a variable called `_weight`.

Someone pointed me to an old blog post somewhere on the Internet, which shows that there may be some confusion about what these weights are and where they come from.

K-neighbor matching estimates the counterfactual outcome for a treated observation by averaging the outcomes of its `K` matches.

This means that every time an untreated observation is matched to a treated observation (and this can happen more than once when matching with replacement), it is used with “weight” `1/K`, since one is dividing by `K` when averaging.

If one uses a caliper (i.e. excludes matches that are farther away than a maximum distance called a “caliper”), it can happen that some treated observations have fewer than `K` neighbors.

So, more generally, the weight is not `1/K` but rather `1/(number of matches)`. (`-psmatch2-` saves the number of matches for a given treated observation in the variable `_nn`.)

The variable `_weight` sums these weights every time a control observation is used to construct a counterfactual outcome.
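In symbols, and purely as notation assumed here (it is not taken from `-psmatch2-` itself): write $M(i)$ for the set of controls matched to treated observation $i$, so that $|M(i)|$ is the value of `_nn` for $i$, and let $N$ be the number of treated observations that have at least one match. The weight of control $j$, and the way the weights reproduce the average counterfactual outcome, can then be sketched as:

$$
w_j = \sum_{i \,:\, j \in M(i)} \frac{1}{|M(i)|},
\qquad
\frac{1}{N} \sum_{i} \frac{1}{|M(i)|} \sum_{j \in M(i)} Y_j
= \frac{1}{N} \sum_{j} w_j \, Y_j .
$$

The second equality only rearranges the double sum: each control's outcome $Y_j$ is collected once, multiplied by the total weight it received across all treated observations.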

So let’s say that we match each of two treated observations to up to two neighbors (`K = 2`) with a caliper.

The first treated observation may then have two matches and the second only one, as in the following:

```
_id _treated _n1 _n2 _nn
1 1 3 4 2
2 1 3 . 1
3 0 . . .
4 0 . . .
```

The matched outcome for the first treated observation is the average over observations 3 and 4, so each of these gets weight 1/2 here.

The matched outcome for the second treated observation is based on observation 3 alone, which therefore gets weight 1.

Note that in each case the weights equal `1/_nn`.

Putting this together we can compute how often each matched untreated observation is used to construct the overall average counterfactual outcome by summing their weights:

```
_id _weight
3 1.5
4 0.5
```
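The toy example can also be checked by hand in Stata. The following is a minimal sketch, not output of `-psmatch2-`: the variable names `id`, `treated`, `nbr1`, `nbr2` and `k` are hypothetical stand-ins for `_id`, `_treated`, `_n1`, `_n2` and `_nn`, and the data are typed in directly from the table above.

```
clear
// toy data: two treated observations (1, 2) and two controls (3, 4)
input id treated nbr1 nbr2 k
1 1 3 4 2
2 1 3 . 1
3 0 . . .
4 0 . . .
end
keep if treated == 1                 // only treated rows carry match information
reshape long nbr, i(id) j(slot)      // one row per (treated observation, match slot)
drop if missing(nbr)                 // discard unused match slots
generate w = 1 / k                   // each match contributes 1/(number of matches)
collapse (sum) weight = w, by(nbr)   // total weight per matched control
list
```

Under these assumptions the listing should show control 3 with weight 1.5 and control 4 with weight 0.5, as in the table.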

For the example in the blog-post above the following code shows that this indeed gives the weights in the variable `_weight`:

```
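webuse cattaneo2, clear
set seed 795
// put the data in random order first (psmatch2 results can depend on the sort order when scores are tied)
g x=uniform()
sort x
psmatch2 mbsmoke prenatal1 fbaby mmarried medu fedu mage fage mrace frace, out(bweight) neighbor(5) caliper(.0295236) logit
// weights as stored by psmatch2
tab _weight
// rebuild the weights by hand from the match identifiers _n1-_n5 and the match count _nn
rename _n* N* // otherwise reshape complains
reshape long N, i(_id) j(matchnr) // one row per observation and match slot
g altweight = 1 / Nn // each match gets weight 1/_nn
collapse (sum) altweight, by(N) // total weight per matched control (N holds the control's _id)
tab altweight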
```

The weights in `_weight` are therefore not specific to `-psmatch2-`, but they follow directly from the definition of a `K`-neighbor matching estimator (independently of whether one matches on the propensity score or something else).