This op-ed was originally published in the Washington Post on April 7, 2022.

The percentages the CDC reports don’t necessarily line up with census data. Here’s why.

Millions of Americans are now eligible for a second covid-19 booster shot. By all accounts, efforts to vaccinate older people in many states have gone well — unbelievably well, in fact. According to official Centers for Disease Control and Prevention (CDC) counts of vaccinations among those above age 65 as compared with census data, 117 percent of those in that demographic in Massachusetts have had at least one shot of a coronavirus vaccine. New Hampshire, not to be outdone, would show that no less than 140 percent of that group are vaccinated.

Remarkably, data for 26 states (including all of New England) and Washington, D.C., would indicate total numbers of vaccinated individuals 65 and older are running above 100 percent. How is it possible that government figures appear to show that more people have gotten vaccinated than in fact exist in that age group?

The CDC seems embarrassed about its own statistics

The CDC, however, does not actually report that any population in any state is more than 100 percent vaccinated. Its reporting on vaccination rates has changed over time, in ways that seem to reflect its discomfort with the underlying statistics. A few months ago, for subpopulations where the number of vaccinations exceeded the size of the group, the CDC would report that “just” 99.9 percent of that group was vaccinated. Thus, for example, for the states shown in the figure below, 99.9 percent of those 65 and older were reported to be vaccinated. Since late 2021, the CDC has reported for these states that 95 percent of this age group is vaccinated, implying that vaccination rates somehow declined by 4.9 percentage points.

So why did the CDC change the way that it presents these numbers — and why don’t they report the actual statistics of above 100 percent? The CDC website gives a hint, noting that the “population coverage metrics are capped at 95%.” This could perhaps be interpreted as the CDC implicitly admitting that they are not reporting the percentages that correspond with the raw data because they would be higher than 100 percent and therefore mathematically impossible.
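The arithmetic behind the cap can be sketched in a few lines of Python. This is only an illustration of the calculation described above, not the CDC's actual code: the function names are hypothetical, and the counts are round numbers chosen to reproduce a 117 percent raw figure, not real state data.

```python
def raw_coverage(first_doses: int, population: int) -> float:
    """Percent of a group with at least one dose, as implied by raw counts."""
    return 100.0 * first_doses / population


def capped_coverage(first_doses: int, population: int, cap: float = 95.0) -> float:
    """The same percentage, but never reported above the cap (95 by default)."""
    return min(raw_coverage(first_doses, population), cap)


# Hypothetical counts: 1,170,000 recorded first doses for 1,000,000 residents.
example_raw = raw_coverage(1_170_000, 1_000_000)        # impossible: above 100
example_capped = capped_coverage(1_170_000, 1_000_000)  # reported as the cap
```

The cap changes only what is displayed. The recorded first-dose count still exceeds the population, so the measurement problem remains in the data.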

The figures on the left are the percentages of those above age 65 in each New England state with at least one covid-19 shot, as currently reported by the CDC. The figures on the right are contemporaneous, and are calculated by dividing the number of people aged 65 and older with at least one dose, as reported by the CDC, by the number of people above age 65 in each state as of 2021, drawn from census data. Figure by Zhen Guo, Northeastern University.

The CDC likely decided that reporting figures of 99.9 percent was too implausible, so they settled on not reporting percentages higher than 95 percent because they wanted people to believe the numbers they were reporting. Or, more charitably, the decision was that the 95 percent was closer to the right answer than 99.9 percent (which surely it is). However, this is as if, when presented with a thermometer that registered 150 degrees for a patient, a doctor notes it on the patient’s chart as 105 degrees. That notation would surely be closer to the right answer than 150 degrees, but the information would be useless for diagnosis, and such a practice would obscure the core problem of the broken instrument.

So what is going on?

Buried deeper in the CDC website is an explanation of why the figures are so weird: Sometimes the data that the CDC has access to fail to link individuals to doses. This means that first doses are overestimated, because second and third doses are attributed as being a first dose for someone else. (See this page, and the detailed footnotes at the bottom.)
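A toy example shows how unlinked doses inflate the count of first doses. It assumes a simplified data model that is not the CDC's: each administered dose is a record carrying a person identifier, or `None` when the link to the individual was lost, and every unlinked dose is counted as a new person. All names and numbers are hypothetical.

```python
from typing import List, Optional


def count_first_doses(dose_records: List[Optional[str]]) -> int:
    """Count 'people with at least one dose' from a list of dose records.

    Linked doses are de-duplicated by person ID. Each unlinked dose (None)
    is treated as a first dose for a previously unseen person.
    """
    linked_people = {pid for pid in dose_records if pid is not None}
    unlinked_doses = sum(1 for pid in dose_records if pid is None)
    return len(linked_people) + unlinked_doses


population = 100  # hypothetical residents aged 65 and older
vaccinated = 90   # people who truly received at least one dose

# First and second doses, all correctly linked to the right person.
linked = [f"person-{i}" for i in range(vaccinated) for _ in range(2)]
# Fifteen booster doses whose link to the individual was lost.
unlinked_boosters: List[Optional[str]] = [None] * 15

true_count = count_first_doses(linked)                          # 90 people
inflated_count = count_first_doses(linked + unlinked_boosters)  # 105 "people"
inflated_pct = 100.0 * inflated_count / population              # above 100 percent
```

In this sketch, true coverage is 90 percent, but the fifteen unlinked boosters push the apparent figure to 105 percent.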

These reporting challenges will only get worse as people line up for a second booster shot. Very likely, the CDC’s underlying figures will soon show that more than 100 percent of those above age 65 across every U.S. state have had at least one shot.

The bigger issue here is that all the data we have on U.S. vaccinations are subject to these distortions. The distortions are especially visible for those above age 65 because vaccination levels among older Americans are much higher than those among younger Americans, resulting in percentages that are impossible. The reported vaccination rates among younger Americans are more plausible, but there is no reason to believe that the data are any better. Aggravating the issue, different states likely have different patterns of errors, making interstate comparisons problematic.

This has major implications for science, politics and policy. These data are used by outlets from the New York Times and The Washington Post to the Mayo Clinic to portray who has been vaccinated across the United States. A recent paper in Nature evaluating various survey methods used the CDC data as the “ground truth.” (However, its findings are based on pre-booster data, which seem to have fewer issues.)

And the data may misinform estimates of how effective vaccines are, and how vulnerable various populations may be to another covid-19 wave. The accuracy of these data is vitally important, in part because policymakers use them to inform life-or-death decisions about vaccination funding and priorities, as well as overall pandemic response measures.

It’s notable that the U.S. government is far better at collecting data in other areas. A lot of attention has been paid to the problem of how to count well in the area of economic statistics. The results are still imperfect, but they are far better than for public health. It is inconceivable, for instance, that our system of economic statistics would yield obviously impossible statistics, such as negative unemployment.

If federal agencies and state governments were willing, some of the problems with vaccination data could be tackled through modest changes such as requiring the provision of a middle name on vaccination cards. That would make it far easier to link individuals to their vaccinations. Even when the data are messy, as they currently are, there are ways to clean them. The transparent application of well-known statistical methods for record linkage could produce much better estimates of vaccination rates than the raw aggregate data the CDC currently provides. When it comes to counting with covid-19, we should be as accurate as we possibly can be — or else we risk losing the lives that hang in the balance.
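As a rough illustration of why a middle name helps, here is a toy exact-match linkage in Python. It is a deliberate simplification and not the probabilistic record-linkage methods statisticians typically use. The field names, the normalization rule, and the sample records are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

LINK_FIELDS = ("first", "middle", "last", "dob")


def link_key(record: Dict[str, str]) -> Tuple[str, ...]:
    """Build a normalized key so trivial formatting differences still match."""
    return tuple(record[field].strip().lower() for field in LINK_FIELDS)


def count_people(dose_records: List[Dict[str, str]]) -> int:
    """Group dose records by link key and count the distinct people."""
    groups = defaultdict(list)
    for record in dose_records:
        groups[link_key(record)].append(record)
    return len(groups)


# Three dose records: two belong to the same person (differing only in
# capitalization and stray whitespace); the third shares first name, last
# name, and birth date but has a different middle name.
doses = [
    {"first": "Ana", "middle": "Maria", "last": "Silva", "dob": "1950-03-02"},
    {"first": "ana ", "middle": "MARIA", "last": "Silva", "dob": "1950-03-02"},
    {"first": "Ana", "middle": "Lucia", "last": "Silva", "dob": "1950-03-02"},
]

people = count_people(doses)  # two distinct people, not three
```

Without the middle name, the three records would either collapse into one person or, if left unlinked, be counted as three. Real linkage would also have to tolerate typos and missing fields, which exact matching does not.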

David Lazer (@davidlazer) is University Distinguished Professor of Political Science and Computer Sciences at Northeastern University and co-director of the COVID States Project.
