At midnight on Monday morning, MIT health economist Jeffrey Harris posted an article claiming that the New York City subway “was a major disseminator—if not the principal transmission vehicle—of coronavirus infection” in a city where the virus has killed upwards of 10,000 people. By Wednesday, the article had started making the rounds among New York media, sparking a debate about how much of the city’s coronavirus crisis can be blamed on the subway. The Metropolitan Transportation Authority, which runs the subway, said the study is “flawed—period.” On Thursday morning, Mayor Bill de Blasio (who does not control the subway; Governor Andrew Cuomo does) tacitly acknowledged the study’s validity but cautioned that it “appears to look at broad data and draw some initial conclusions.”
Several transit and health experts who spoke to Motherboard doubted that the study has much validity.
“This article is a bit confusing to me,” said Joshua Santarpia, an associate professor in pathology and microbiology at the University of Nebraska, who reviewed Harris’s study at Motherboard’s request. “The premise is reasonable, and in general, I agree with the premise that usage of a crowded subway during an outbreak of infectious disease will likely increase individual risk, and potentially contribute to the spread of the disease. However, I don’t see in this manuscript any definitive proof of the direct relationship.”
There are lots of bad studies out there, particularly ones rushed out in the fog of coronavirus confusion. The lesson from this one is that we should be extremely cautious about accepting any single explanation for the coronavirus’s spread. The overwhelming likelihood is that there are many reasons this is happening to us, not one big reason, and this study is a perfect demonstration of the danger of becoming intoxicated with one explanation.
The study makes two basic claims to support its provocative title, “The Subways Seeded the Massive Coronavirus Epidemic in New York City.” First, it uses station entry data to show a correlation between a 65 percent drop in Manhattan subway swipes over the first half of March and reduced infection rates in the borough. Harris presents this as evidence that the subway spread the coronavirus, while conceding that it is “no more than an indicator” and falls far short of establishing causation. Nevertheless, he says, it is enough evidence for him that the subway may be responsible for the virus’s spread.
The second claim overlays the subway map on a map of the city showing infection rates by zip code, with a zoomed-in map of the 7 and M/R lines in Manhattan and Queens. The paper implies that once the two maps are superimposed, the evidence is obvious and overwhelming: the subway is the culprit because various hotspots overlap with certain subway lines.
But the paper never provides any actual statistical evidence to back this up. Alon Levy, a mathematician and transportation expert, wrote a detailed critique that presents the infection-rate map without the subway superimposed. That map shows “clumps and clusters,” especially in the outer boroughs, but bears no resemblance to a subway map. Neighborhoods in Staten Island, the north Bronx, eastern Queens, and the Rockaways have high infection rates, yet in those areas the subway is either lightly used compared to the rest of the city or nonexistent.
In fact, as one looks at the infection rate map, it’s easier to eyeball an inverse correlation with the subway map. All subway lines (except for the G and Staten Island Railway) converge in Manhattan, but Manhattan has the lowest infection rates of any borough.
Harris spins this around as support for his argument that the subway spreads the infection.
“As to the case of Manhattan, I think the evidence is remarkably clear,” he wrote to me in an email. “Most residents of Manhattan have had the resources to self-isolate, including avoiding the subways. The data clearly show that Manhattan subway use declined more rapidly and plummeted to less than 10% of regular peak.”
It’s worth digging into this borough-level argument because it is the only quantitative claim for the subway’s role in spreading coronavirus in the entire paper. And it is an especially hollow one.
There is an easy explanation for why Manhattan subway use declined much more rapidly than subway use in the other boroughs, and it has little to do with the infection rates of Manhattan residents: people stopped commuting. The period of the analysis, March 2 through 16, coincides with when businesses started telling people to work from home and tourists cancelled trips. According to a 2012 NYU study, Manhattan’s daytime population “consists of approximately 1.61 million commuting workers, 1.46 million local residents, 404,000 out-of-town visitors, 374,000 local day-trip visitors, 17,000 hospital patients, and 70,000 commuting students.”
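To put those NYU figures in proportion, here is a minimal Python sketch that totals the quoted categories and computes what share of Manhattan’s daytime population actually lives in Manhattan. The dictionary and function names are illustrative, and lumping every category other than “local residents” together as non-residents is a simplifying assumption (some hospital patients, for instance, are presumably Manhattanites).

```python
# Components of Manhattan's daytime population, as quoted from the
# 2012 NYU study (number of people in each group).
DAYTIME_POPULATION = {
    "commuting workers": 1_610_000,
    "local residents": 1_460_000,
    "out-of-town visitors": 404_000,
    "local day-trip visitors": 374_000,
    "hospital patients": 17_000,
    "commuting students": 70_000,
}


def resident_share(population: dict[str, int]) -> float:
    """Fraction of the daytime population counted as local residents.

    Simplification: every other group is treated as non-resident.
    """
    return population["local residents"] / sum(population.values())


if __name__ == "__main__":
    total = sum(DAYTIME_POPULATION.values())
    share = resident_share(DAYTIME_POPULATION)
    print(f"total daytime population: {total:,}")
    print(f"residents: {share:.1%}, everyone else: {1 - share:.1%}")
```

Run as a script, it reports a daytime population of 3,935,000, of whom about 37 percent are residents. Nearly two in three people in Manhattan during the day came from somewhere else, so a collapse in commuting and tourism would show up first and hardest in Manhattan station entries.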
As Levy pointed out, a commuter who lives in another borough but works in Manhattan counts as a lost station entry both in their home borough and in Manhattan, while a Manhattan resident who also works in Manhattan counts as two lost Manhattan entries. In other words, the larger decline in Manhattan station entries is not evidence that Manhattanites specifically used the subway at a lower rate than other New Yorkers. It is evidence that New Yorkers as a whole used the subway less, whether they lived in Manhattan or commuted there.
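Levy’s double-counting point is easier to see with a toy model. The sketch below is hypothetical throughout: the commuter counts are invented for illustration, and it assumes each person makes one round trip a day, entering the subway at a home-borough station in the morning and at a work-borough station in the evening, with no other trips.

```python
from collections import Counter

# Hypothetical commuters: (home borough, work borough, number of people).
# Invented for illustration; these are not real ridership figures.
COMMUTERS = [
    ("Queens", "Manhattan", 100),
    ("Manhattan", "Manhattan", 100),
]


def station_entries(commuters: list[tuple[str, str, int]]) -> Counter:
    """Daily station entries per borough under the toy round-trip assumption."""
    entries: Counter = Counter()
    for home, work, people in commuters:
        entries[home] += people  # morning: enter a station near home
        entries[work] += people  # evening: enter a station near work
    return entries


if __name__ == "__main__":
    # If every toy commuter starts working from home, all of these entries disappear.
    lost = station_entries(COMMUTERS)
    for borough, count in lost.items():
        print(f"{borough}: {count} lost entries per day")
```

With half the toy commuters living in Queens and half in Manhattan, the model still attributes 300 of the 400 lost daily entries to Manhattan stations and only 100 to Queens. Station entries measure where trips begin, not where riders live.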
The problem, as far as Harris’s study is concerned, is that he links subway station entry data, which reflect where people work and live, to residential zip code infection rates, which reflect only where people live. If we saw a similar pattern in the other boroughs, we might take this analysis more seriously. We don’t. In the other four boroughs, there is no correlation between subway station entries and infection rates.
On top of that, Harris cites a coronavirus “hotspot” in Midtown West (between West 36th Street and West 41st Street from Fifth Avenue to the Hudson River, not including Hudson Yards), along with the fact that the 7 train runs beneath it and into parts of Queens with high infection rates, as evidence of the subway’s culpability. What he doesn’t explain is why that one zip code in Manhattan is a hotspot while the borough’s other subway-dense zip codes are not.
But anyone familiar with that area knows very few people actually live there. The zip code has a population of just 9,687, so it registers as a “hotspot” on a per capita basis even though it has just 157 total cases (a rate of 16.2 per 1,000). It looks bad on a map, but it is a statistical anomaly driven by its very low population. The neighboring zip code to the east has a population of 51,000 and an infection rate of 8.65 per 1,000, much more consistent with other Manhattan zip codes. The 7 train runs beneath that zip code too, as does the Lexington Avenue line, one of the most crowded subway lines in North America, which runs from the Bronx through Manhattan and into central Brooklyn. About 73 fewer cases in the “hotspot” (roughly 84 in total) would give it the same rate as its eastern neighbor, making it not a statistical anomaly but a perfect fit with the surrounding neighborhoods.
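The per capita arithmetic behind the Midtown West “hotspot” can be checked with a few lines of Python. The case counts, populations, and rates are the ones quoted above; the function and constant names are hypothetical, chosen for illustration.

```python
def rate_per_1000(cases: float, population: int) -> float:
    """Infection rate expressed as cases per 1,000 residents."""
    return cases / population * 1000


def cases_at_rate(rate: float, population: int) -> float:
    """Cases a zip code would have if it matched a given rate per 1,000."""
    return rate * population / 1000


# Figures quoted in the article.
HOTSPOT_CASES = 157
HOTSPOT_POPULATION = 9_687
NEIGHBOR_POPULATION = 51_000
NEIGHBOR_RATE = 8.65  # cases per 1,000 in the zip code to the east

if __name__ == "__main__":
    hotspot_rate = rate_per_1000(HOTSPOT_CASES, HOTSPOT_POPULATION)
    expected = cases_at_rate(NEIGHBOR_RATE, HOTSPOT_POPULATION)
    print(f"hotspot rate: {hotspot_rate:.1f} per 1,000")
    print(f"cases at the neighbor's rate: {expected:.1f}")
    print(f"excess cases: {HOTSPOT_CASES - expected:.1f}")
    # How far one additional case moves each zip code's rate.
    print(f"one case shifts the hotspot rate by {rate_per_1000(1, HOTSPOT_POPULATION):.2f}")
    print(f"one case shifts the neighbor rate by {rate_per_1000(1, NEIGHBOR_POPULATION):.2f}")
```

A single case moves the small zip code’s rate by about 0.10 per 1,000, roughly five times as much as in its 51,000-person neighbor, which is why rates in sparsely populated zip codes swing so easily.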
Now, in no way am I denying that some people probably got infected with coronavirus on the subway. Obviously, cramming some 150 people into a metal box with recirculated air for prolonged periods during a pandemic will get some people sick. This is what makes the study so vexing. It takes a very obvious observation about New York City life during a pandemic (most people take the subway; the virus spreads from person to person; therefore, the virus spreads partly between people on the subway) and tries to prove entirely too much from it.
“If I were acting as a peer reviewer for this manuscript,” Santarpia told Motherboard, “I would need several more days to really dig into it to truly analyze it, but there are several initial ‘red flags’ for me: 1. The lack of a straightforward logical development of the data into conclusions. 2. The style of writing makes it difficult to separate the author’s opinions from what he has demonstrated. 3. The lack of statistics that might effectively demonstrate the author’s point.”
One of the oddest but most revealing aspects of Harris’s study is also the one most important to keep in mind as more and more arguments about the source or cause of coronavirus inevitably float into the ether. Although the study is relatively short—17 pages including charts and graphs—Harris references John Snow and the water pump four times.
Snow was a physician in London who published a landmark study in 1855 tracing an 1854 cholera outbreak to a single contaminated water pump on Broad Street. The study, which involved interviewing hundreds of victims’ families, was the founding event of epidemiology, the definitive proof that cholera is transmitted via contaminated water, and one of medical history’s favorite anecdotes. It is a good, clean story about a disease outbreak shut down through hard work, intelligence, determination, and deference to experts. It is, as Harris seems to suggest, an especially appealing story right now.
But it is also, as Harris himself points out, not a good analogue for coronavirus. “We cannot point to a definitive intervention comparable to the removal of the handle on the Broad Street Pump in St. James’s parish,” Harris wrote, although that doesn’t stop him from shoehorning Snow into the article three more times, once in a lament that nobody shut down the 7 train in February. It’s true that shutting down the subway in February would almost certainly have reduced infections, but so would shutting down everything else. To shut down the subway is to prevent people from going anywhere. It’s not at all clear whether people get sick on the trains themselves or at the various places the trains take them, whether schools, offices, restaurants, or concert venues.
“The author clearly has an opinion about the role of the subway in spreading the disease in NY, and it seems as though he has tried to assemble data and anecdotes to support his conclusion, rather than asking the data a question and seeing if the answer is there,” Santarpia said.
There is no single explanation for why things have happened the way they have, and no unifying theory for why New York City is suffering worse than most other places, including Seoul and Tokyo, cities with robust, packed mass transportation systems and very low infection rates. Fully understanding New York City’s outbreak is not going to be a matter of superimposing one map on top of another; it will take standard epidemiological approaches combined with higher-level analyses of government action and coordination (or lack thereof).
I asked Harris about this, and about the problem of international comparisons specifically. Surely, I thought, the fact that other cities with even more crowded and widely used mass transportation have very low infection rates is evidence that mass transportation itself is not the cause of the coronavirus’s uncontrollable spread, as he asserts.
“Those cities and their social and culture systems differ in numerous ways from New York City,” Harris replied. “I would recommend that if you want to rebut the evidence in the manuscript, then you should look at the evidence presented in the manuscript.”
This remark confused me as much as the study as a whole had. If those “social and culture systems” differ, perhaps therein lie the variables that explain the outbreak, such as the widespread availability of masks, testing, and hand sanitizer, combined with rapid government action. Still, I took Harris’s advice and returned to his manuscript for another review, just to make sure I hadn’t missed anything.
To my surprise, I caught something I hadn’t noticed during my first several reads. But it didn’t prove the point Harris wanted it to.
Harris wrote that Snow’s well study “dramatically shut down a cholera outbreak.” But Snow’s own study says otherwise. Snow was right about the water pump, and the local council did remove the pump handle out of an abundance of caution. But, as Snow wrote in his landmark study, infections and deaths had already fallen sharply before the handle was removed, thanks to the “flight of the population, which commenced soon after the outbreak,” not unlike what happened in Manhattan in March. The London cholera outbreak peaked on September 1, 1854; by the time the handle was actually removed a week later, on September 8, the daily death count had plummeted by 81 percent. It’s a small detail (the removal of the pump handle was not the “definitive intervention” Harris thinks it was), but getting the little details right is often what matters most.
For his part, Harris was undeterred by the criticism of his study.
“As a researcher, I believe I have an obligation to put out data when the data point strongly to a conclusion that can have enormous — potentially disastrous — public consequences if ignored,” Harris said. “Should we ignore the data on the relationship between subway ridership and the spread of coronavirus because definitive proof has not been established? And then run the risk that the ‘renormalization’ will end up with a second wave? I did not think so at the time I posted this article at midnight on Monday morning, and I don’t think so now. And that goes double.”
As a New Yorker living under this horrid situation, I would love nothing more than for someone to determine the true cause, the “seed,” of the coronavirus outbreak, so we can remove whatever the equivalent of the pump handle is and all live something approaching normal lives again. If only it were so simple.