
Estimation of Infections Based on Wastewater Data (US)

Methodology:

Scripts and data can be found on GitHub.

Our methodology combines wastewater surveillance data with epidemiological model estimates to show COVID-19’s prevalence across the United States, both nationally and for individual states. We sourced our wastewater data from Biobot Analytics [1], which reports the “effective concentration” of SARS-CoV-2 in sewage. To account for variability in testing rates and accuracy, we estimated daily infections with an approach similar to the one used by Michael Hoerger (2020) [2]. We selected the first five months of 2021, when testing was believed to be robust, and divided Biobot’s wastewater concentrations by IHME’s infection estimates to derive a base conversion factor.
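As a minimal sketch of the division described above, the snippet below derives a conversion factor from a calibration window and then uses it to turn a concentration into an infection estimate. The text does not say how the daily ratios are aggregated, so taking their mean is an assumption here, and the function names and toy numbers are hypothetical rather than taken from the published scripts.

```python
from datetime import date

def base_conversion_factor(wastewater, infections, start, end):
    """Mean of daily wastewater/infection ratios inside [start, end].

    wastewater, infections: dicts mapping datetime.date -> float.
    Averaging the daily ratios is an assumption, not a documented choice.
    """
    ratios = [
        wastewater[d] / infections[d]
        for d in wastewater
        if start <= d <= end and d in infections and infections[d] > 0
    ]
    if not ratios:
        raise ValueError("no overlapping days in the calibration window")
    return sum(ratios) / len(ratios)

def estimate_infections(concentration, factor):
    """Invert the ratio: infections = concentration / factor."""
    return concentration / factor

# Toy values for illustration only (not real Biobot or IHME data).
wastewater = {
    date(2021, 1, 1): 200.0,
    date(2021, 1, 2): 400.0,
    date(2021, 7, 1): 999.0,  # outside the window, ignored
}
infections = {
    date(2021, 1, 1): 100_000.0,
    date(2021, 1, 2): 200_000.0,
    date(2021, 7, 1): 1.0,
}

factor = base_conversion_factor(
    wastewater, infections, date(2021, 1, 1), date(2021, 5, 31)
)
estimate = estimate_infections(300.0, factor)
```

Because the factor is defined as concentration divided by infections, the estimate divides by it; a ratio defined the other way round would be multiplied instead.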

This period exhibited a strong correlation (0.99) between the two data sources, which gives us confidence in the method. However, because SARS-CoV-2 mutations, particularly those associated with the Omicron variant, affect detection, we adjusted our conversion factors from late 2021 onwards. These adjustments, ×1.53 from the onset of Omicron on 12/17/2021 and ×2.28 for subsequent mutations from 08/01/2022, account for potential underestimation of viral concentration caused by genetic changes in the N1 qPCR probe binding region [3]. To avoid abrupt shifts in our estimates, we phased in each adjustment over a 30-day smoothing period.
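One way to picture the phased adjustment is as a multiplier that moves from one level to the next over 30 days. The sketch below assumes a linear ramp that begins on each listed date, and it treats ×2.28 as the new overall level rather than a factor stacked on top of ×1.53; the text specifies neither point, so both are assumptions, and the function name is hypothetical.

```python
from datetime import date

# (effective date, multiplier) pairs as given in the text.
ADJUSTMENTS = [(date(2021, 12, 17), 1.53), (date(2022, 8, 1), 2.28)]
RAMP_DAYS = 30  # length of the smoothing period

def variant_multiplier(day, adjustments=ADJUSTMENTS, ramp_days=RAMP_DAYS):
    """Multiplier for a given day, ramping linearly toward each target.

    Assumes the ramp starts on the effective date and that the ramps
    do not overlap (true for the two dates above).
    """
    level = 1.0
    for start, target in adjustments:
        elapsed = (day - start).days
        if elapsed <= 0:
            break  # this adjustment has not started yet
        if elapsed >= ramp_days:
            level = target  # ramp finished, adopt the new level
        else:
            # part-way through the ramp: interpolate from the old level
            level += (target - level) * elapsed / ramp_days
            break
    return level

before = variant_multiplier(date(2021, 12, 1))
halfway = variant_multiplier(date(2022, 1, 1))    # 15 days into the first ramp
settled = variant_multiplier(date(2022, 3, 1))
second = variant_multiplier(date(2022, 8, 16))    # 15 days into the second ramp
final = variant_multiplier(date(2023, 1, 1))
```

Under these assumptions the multiplier is 1.0 before Omicron, about 1.265 halfway through the first ramp, 1.53 once it has settled, and 2.28 after the second ramp completes.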

Our visualizations therefore offer two views: the raw wastewater concentrations (national, and per state as a weighted average over counties) and our adjusted estimates of new daily infections.
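The state-level figure is described as a weighted average over counties. The sketch below shows that calculation in its simplest form; the text does not say which weights are used, so the weights here (for example, population served) are a stated assumption and the numbers are invented.

```python
def weighted_state_concentration(counties):
    """Weighted mean of county concentrations.

    counties: list of (concentration, weight) pairs. The choice of
    weight is an assumption; the source does not specify it.
    """
    total_weight = sum(weight for _, weight in counties)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(conc * weight for conc, weight in counties) / total_weight

# Two hypothetical counties: the larger one dominates the average.
state_value = weighted_state_concentration([(100.0, 1000), (300.0, 3000)])
```

With these toy inputs the result is 250.0, closer to the more heavily weighted county than the unweighted mean of 200.0.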

Limitations:

Our analysis relied on IHME infection data and Biobot wastewater data. However, for the first 5 months of 2021, wastewater data was available only for the following subset of 10 states: MA, FL, IN, CA, KY, PA, MT, CT, VA, NV. Of these, correlations higher than 0.75 were found in the following 8 states: MA, FL, IN, CA, KY, PA, MT, VA. Infection estimates for a number of states should therefore be interpreted with care, for the following reasons:

(1) The following 42 states did not have (continuous) wastewater data in the first 5 months of 2021, the period used to calculate conversion factors. Instead, national wastewater concentrations were used as an approximation [1].

CO, NJ, OR, DE, IL, WY, AZ, MN, NY, OK, HI, ID, TN, ME, WA, RI, VT, VI, AR, MO, TX, IA, NM, AL, WV, KS, MD, NH, NC, WI, LA, SC, MI, UT, GA, MS, NE, OH, DC, SD, ND, AK

An alternative would have been to estimate the missing wastewater data from geographically close states. However, such an approach would introduce assumptions that might bias our dataset towards the distribution of those neighboring states. We acknowledge that using national wastewater concentrations may not capture state-specific dynamics, but we consider that it still allows meaningful overall trends to be visible in our estimates.
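The fallback described in (1) can be sketched as a simple selection rule: use a state's own series for calibration when it covers the window without large gaps, and the national series otherwise. The text does not define "continuous", so the 7-day maximum gap below is an assumption, as are the function names and toy series.

```python
from datetime import date, timedelta

CAL_START, CAL_END = date(2021, 1, 1), date(2021, 5, 31)

def has_continuous_coverage(series, start=CAL_START, end=CAL_END, max_gap_days=7):
    """True if observations span [start, end] with no gap above max_gap_days.

    The 7-day threshold is an assumption, not taken from the source.
    """
    days = sorted(d for d in series if start <= d <= end)
    if not days:
        return False
    checkpoints = [start] + days + [end]
    return all(
        (later - earlier).days <= max_gap_days
        for earlier, later in zip(checkpoints, checkpoints[1:])
    )

def pick_calibration_series(state_series, national_series):
    """Return (series, label), preferring the state's own data."""
    if has_continuous_coverage(state_series):
        return state_series, "state"
    return national_series, "national"

# Toy series: one weekly and complete, one with a single observation.
weekly = {CAL_START + timedelta(days=7 * k): 100.0 for k in range(22)}
sparse = {date(2021, 3, 1): 100.0}
national = {CAL_START + timedelta(days=7 * k): 150.0 for k in range(22)}

weekly_choice = pick_calibration_series(weekly, national)[1]
sparse_choice = pick_calibration_series(sparse, national)[1]
```

Returning the label alongside the series makes it easy to flag which estimates rest on the national approximation.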

(2) The following 16 states’ IHME infection data and Biobot data showed a correlation of less than 0.75 for the time period used to calculate the conversion factor.

less than 0.75: NJ, CT, IL, NV, GA, VT, AK, ME; less than 0.5: ND, WA, OR, CO; less than 0.25: HI, MN, MI, VI

Out of these 16 states, 14 used the national wastewater average as described in (1). Given the strong correlation in the rest of the dataset, the low correlation likely reflects a difference between the national average and the missing local data. For CT and NV, however, wastewater data was available but the correlation was still below 0.75. We consider this data still valuable, since it could represent dynamics outside the scope of our linear model: a low correlation does not necessarily mean the variables are unrelated and could instead indicate a non-linear relationship. We therefore keep the data in our estimates, since we believe it better represents the complex real-world system.
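To illustrate the correlation check and the point about non-linear relationships, the sketch below computes a Pearson correlation and sorts it into the thresholds used above. Pearson's coefficient is assumed because the text refers to a linear model; the band labels and function names are hypothetical. The second toy example is fully determined by its input yet has zero linear correlation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

def correlation_band(r):
    """Bucket a correlation using the 0.75 / 0.5 / 0.25 thresholds."""
    if r >= 0.75:
        return ">= 0.75"
    if r >= 0.5:
        return "0.5 to 0.75"
    if r >= 0.25:
        return "0.25 to 0.5"
    return "< 0.25"

# A perfectly linear relationship.
linear_r = pearson([1, 2, 3], [2, 4, 6])

# y = x**2 on symmetric inputs: clearly related, but not linearly.
xs = [-2, -1, 0, 1, 2]
quadratic_r = pearson(xs, [x * x for x in xs])
```

Here `linear_r` is 1.0 and `quadratic_r` is 0.0, so the quadratic series would land in the lowest band even though one variable is an exact function of the other.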

Sources:

Biobot Analytics SARS-CoV-2 Effective Concentration (SARS-CoV-2 copies/mL of sewage)

IHME Infection estimates

References:

[1] Biobot web dashboard – COVID-19 Wastewater Monitoring in the U.S. https://biobot.io/data/covid-19. 2020. Accessed on April 2, 2024

[2] Pandemic Mitigation Collaborative – COVID-19 Forecasting Model. https://www.pmc19.com/data/index.php. 2020. Accessed on April 2, 2024

[3] Jianxian Sun, Minqing Ivy Yang, Jiaxi Peng, Ismail Khan, Jhoselyn Jaramillo Lopez, Ronny Chan, Elizabeth Edwards, Hui Peng – Underestimation of SARS-CoV-2 in wastewater due to single or double mutations in the N1 qPCR probe binding region. https://doi.org/10.1101/2024.02.03.24302274. 2024. Preprint

Last reviewed on April 18, 2024
