
Estimation of Infections Based on Wastewater Data (US)

Methodology:

Estimation based on Biobot data (until April 2024)

Scripts and data can be found on GitHub.

Our methodology combines wastewater surveillance data with epidemiological models to estimate COVID-19 prevalence across the United States, nationally and for individual states. We sourced our wastewater data from Biobot Analytics [1], which reports the “effective concentration” of SARS-CoV-2 in sewage. Because testing rates and accuracy vary, we estimate daily infections with an approach similar to one already used by Michael Hoerger (2020) [2]. We selected the first five months of 2021, when testing was believed to be robust, and divided Biobot’s wastewater concentrations by IHME’s infection estimates to derive a base conversion factor.
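The conversion-factor step can be sketched as follows. This is a minimal illustration, not the published script: it assumes the factor is the mean of the daily concentration-to-infection ratios over the calibration window (the text does not state how the ratios are aggregated), and the function names and sample numbers are hypothetical.

```python
from statistics import mean


def base_conversion_factor(concentrations, infections):
    """Mean daily ratio of wastewater concentration to estimated infections.

    Assumption: ratios are averaged day by day over the calibration window.
    Both sequences must cover the same days in the same order.
    """
    if len(concentrations) != len(infections):
        raise ValueError("series must have the same length")
    # Skip days without an infection estimate to avoid dividing by zero.
    ratios = [c / i for c, i in zip(concentrations, infections) if i > 0]
    if not ratios:
        raise ValueError("no usable days in the calibration window")
    return mean(ratios)


def estimate_infections(concentration, factor):
    """Invert the factor: infections = concentration / factor."""
    return concentration / factor


# Hypothetical calibration data (copies/ml and daily infections).
factor = base_conversion_factor([100.0, 200.0, 300.0], [1000.0, 2000.0, 3000.0])
print(estimate_infections(150.0, factor))
```

Once the factor is fixed, any later concentration reading can be turned into an infection estimate by dividing by it.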

This period exhibited a strong correlation (0.99) between the two data sources (IHME and Biobot), which gives us confidence in the method. However, to account for SARS-CoV-2 mutations, particularly those associated with the Omicron variant, we adjusted our conversion factors from the onset of Omicron in December 2021 onwards. These adjustments (×1.53 from the onset of Omicron on 12/17/2021 and ×2.28 for subsequent mutations from 08/01/2022) account for potential underestimation of viral concentration caused by mutations in the N1 region that affect detection [3]. To avoid abrupt shifts in our estimates, we phased in each adjustment over a 30-day smoothing period.
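One way to picture the phased adjustments is a multiplier that ramps towards each new value. The sketch below rests on two assumptions that the text does not confirm: the ramp is linear and starts on the stated date, and 2.28 is the total multiplier relative to the pre-Omicron baseline rather than an extra factor on top of 1.53. The names are hypothetical.

```python
from datetime import date

# (start date, target multiplier) as given in the text above.
ADJUSTMENTS = [(date(2021, 12, 17), 1.53), (date(2022, 8, 1), 2.28)]
RAMP_DAYS = 30  # length of the smoothing period


def variant_multiplier(day, adjustments=ADJUSTMENTS, ramp_days=RAMP_DAYS):
    """Multiplier for `day`, moving linearly to each target over `ramp_days`."""
    multiplier = 1.0
    for start, target in adjustments:
        elapsed = (day - start).days
        if elapsed <= 0:
            break  # this adjustment (and all later ones) has not started yet
        progress = min(elapsed / ramp_days, 1.0)
        # Interpolate between the previous multiplier and the new target.
        multiplier += (target - multiplier) * progress
    return multiplier


for d in (date(2021, 12, 1), date(2022, 1, 1), date(2022, 3, 1), date(2022, 9, 15)):
    print(d, round(variant_multiplier(d), 3))
```

Because the two ramps do not overlap, the first is always complete before the second begins.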

Our visualizations therefore offer two views: the weighted average state and national wastewater concentrations derived from treatment plant data, and our adjusted estimates of new daily infections. The two series are plotted on separate y-axes for comparison.
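The weighted average over treatment plants can be illustrated as below. The text does not say which weights are used; this sketch assumes each plant is weighted by the population it serves, and the function name and numbers are hypothetical.

```python
def weighted_average(values, weights):
    """Weighted mean of plant-level concentrations.

    Assumption: `weights` holds the population served by each plant.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")
    return sum(v * w for v, w in zip(values, weights)) / total


# Two hypothetical plants: the larger plant dominates the average.
print(weighted_average([100.0, 300.0], [30000, 10000]))
```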

Estimation based on NWSS data (from April 2024)

After Biobot’s regional data was no longer available, we switched to source data from the CDC’s National Wastewater Surveillance System (NWSS) [4]. For our infection estimates, we only considered data normalized by flow (in gene copies / capita / day).

As in the previous approach, we derived a conversion factor from wastewater data to infections: we divided the aggregated, weighted and smoothed NWSS wastewater data by the infection estimates previously obtained from Biobot data, and averaged the result over a selected period. We chose the most recent four months for this period because the number of treatment plants contributing to the NWSS dataset has increased over time.
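A rough sketch of this calibration step follows. It assumes a trailing moving average as the smoothing method and a 7-day window, neither of which is specified above, and it takes the calibration period as the last `calibration_days` entries of two aligned daily series. All names are hypothetical.

```python
from statistics import mean


def rolling_mean(values, window=7):
    """Trailing moving average; early points use the values available so far."""
    return [mean(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


def nwss_conversion_factor(nwss, infections, calibration_days, window=7):
    """Mean ratio of smoothed NWSS data to Biobot-based infection estimates.

    Only the most recent `calibration_days` days are used, mirroring the
    choice of a recent period with more contributing treatment plants.
    """
    if len(nwss) != len(infections):
        raise ValueError("series must have the same length")
    smoothed = rolling_mean(nwss, window)
    recent = list(zip(smoothed, infections))[-calibration_days:]
    ratios = [w / i for w, i in recent if i > 0]
    if not ratios:
        raise ValueError("no usable days in the calibration period")
    return mean(ratios)


# Hypothetical aligned daily series.
print(nwss_conversion_factor([2.0, 4.0, 6.0, 8.0], [1.0, 1.0, 1.0, 1.0],
                             calibration_days=2, window=2))
```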

To provide a seamless historical perspective, Biobot’s earlier data (in copies per ml) was converted into the NWSS metric (copies per person) using region-specific conversion factors, allowing for a unified visualization across the entire time period.

Limitations:

Estimation based on Biobot data (until April 2024)

Our analysis relied on IHME infection data and Biobot wastewater data. However, for the first 5 months of 2021, wastewater data was available only for the following 10 states: MA, FL, IN, CA, KY, PA, MT, CT, VA, NV. Of these, only the following 8 states showed a correlation higher than 0.75: MA, FL, IN, CA, KY, PA, MT, VA. Infection estimates for the remaining states should therefore be interpreted with care, for the following reasons:

(1) The following 42 states and territories did not have (continuous) wastewater data in the first 5 months of 2021, the period used to calculate conversion factors. For these, national wastewater concentrations [1] were used as an approximation.

CO, NJ, OR, DE, IL, WY, AZ, MN, NY, OK, HI, ID, TN, ME, WA, RI, VT, VI, AR, MO, TX, IA, NM, AL, WV, KS, MD, NH, NC, WI, LA, SC, MI, UT, GA, MS, NE, OH, DC, SD, ND, AK

An alternative would have been to estimate the missing wastewater data from geographically close states. However, such an approach would introduce assumptions that might bias our dataset towards the distribution of those neighboring states. We acknowledge that using national wastewater concentrations may not capture state-specific dynamics, but we consider that it still allows meaningful overall trends to be visible in our estimates.

(2) For the following 16 states and territories, IHME infection data and Biobot data showed a correlation of less than 0.75 over the period used to calculate the conversion factor.

Less than 0.75: NJ, CT, IL, NV, GA, VT, AK, ME
Less than 0.5: ND, WA, OR, CO
Less than 0.25: HI, MN, MI, VI

Of these 16, 14 used the national wastewater average as described in (1). Given the strong correlation in the rest of the dataset, their low correlation likely reflects a difference between the national average and the missing local data. For CT and NV, wastewater data was available but the correlation was still below 0.75. We consider this data still valuable to include, since it could represent dynamics outside the scope of our linear model: a low correlation does not necessarily mean the variables are unrelated, and may instead indicate a non-linear relationship. We therefore keep the data in our estimates, as we believe it better represents the complex real-world system.
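The correlation check and the point about non-linear relationships can be illustrated with a short sketch. It assumes the reported figure is a Pearson correlation and that the bands listed under (2) are mutually exclusive; the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(vx * vy)


def correlation_band(r):
    """Assign a correlation to one of the bands used in the list above."""
    if r >= 0.75:
        return "at least 0.75"
    if r >= 0.5:
        return "less than 0.75"
    if r >= 0.25:
        return "less than 0.5"
    return "less than 0.25"


# y depends entirely on x (y = x squared), yet the linear correlation is zero.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]
print(pearson(xs, ys), correlation_band(pearson(xs, ys)))
```

The last example shows why a low value alone is not grounds for excluding a state: a linear measure can miss a relationship that is real but non-linear.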

Estimation based on NWSS data (from April 2024)

Because it takes time for all treatment plants to collect and submit data, the most recent two weeks of the visualizations should be interpreted with caution. For instance, the infection estimates for two weeks ago might be based on data from over 1,000 treatment plants, covering approximately one-third of the U.S. population. In contrast, the estimates for the most recent day may rely mostly on repeated data from treatment plants whose last reports were within the last two weeks, so any change in the estimates might come from new data from only a few treatment plants, representing less than 1% of the population.
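To make the reporting-lag effect concrete, the sketch below splits the population behind one day's estimate into plants that reported that day and plants whose last report is carried forward. It is only an illustration: the 14-day carry-forward limit is taken from the two-week window mentioned above, and the data structure and names are hypothetical.

```python
from datetime import date


def coverage_split(day, last_report, population_served, max_age_days=14):
    """Return (fresh, carried): population with a report on `day` versus
    population whose latest report is 1 to `max_age_days` days old."""
    fresh = 0
    carried = 0
    for plant, reported_on in last_report.items():
        age = (day - reported_on).days
        if age == 0:
            fresh += population_served[plant]
        elif 0 < age <= max_age_days:
            carried += population_served[plant]
        # Older reports (and future-dated ones) are not counted at all.
    return fresh, carried


# Hypothetical plants: only plant A has reported on the day in question.
last_report = {"A": date(2024, 8, 15), "B": date(2024, 8, 10), "C": date(2024, 7, 1)}
population = {"A": 1000, "B": 50000, "C": 7000}
print(coverage_split(date(2024, 8, 15), last_report, population))
```

When the fresh share is small, a day-to-day change in the estimate reflects only the few plants that have already reported.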

The visualization includes nationwide data and state-level data for states that met the following criteria during the period used to calculate the conversion factor: a minimum of 3 contributing plants, at least 25% of the population covered, and a correlation of at least 0.6 between NWSS wastewater data and Biobot infection estimates.
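The three inclusion criteria can be expressed as a simple filter. The thresholds come from the paragraph above; the record layout and names are hypothetical, and population coverage is assumed to be stored as a fraction between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class StateCalibration:
    state: str
    plants: int                 # contributing treatment plants
    population_covered: float   # fraction of the state population, 0 to 1
    correlation: float          # NWSS data vs. Biobot-based infection estimates


def is_displayed(s, min_plants=3, min_coverage=0.25, min_correlation=0.6):
    """True if a state meets all three criteria for the calibration period."""
    return (
        s.plants >= min_plants
        and s.population_covered >= min_coverage
        and s.correlation >= min_correlation
    )


# Hypothetical records: the second fails on the number of plants.
candidates = [
    StateCalibration("XX", plants=12, population_covered=0.40, correlation=0.82),
    StateCalibration("YY", plants=2, population_covered=0.90, correlation=0.90),
]
print([s.state for s in candidates if is_displayed(s)])
```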

Sources:

Biobot Analytics SARS-CoV-2 Effective Concentration (SARS-CoV-2 copies / ml sewage)

IHME Infection estimates

NWSS Normalized SARS-CoV-2 Concentration (SARS-CoV-2 copies / capita)

References:

[1] Biobot web dashboard – COVID-19 Wastewater Monitoring in the U.S. https://biobot.io/data/covid-19. 2020. Accessed on April 2, 2024

[2] Pandemic Mitigation Collaborative – COVID-19 Forecasting Model. https://www.pmc19.com/data/index.php. 2020. Accessed on April 2, 2024

[3] Underestimation of SARS-CoV-2 in wastewater due to single or double mutations in the N1 qPCR probe binding region. 2024 Preprint. Jianxian Sun, Minqing Ivy Yang, Jiaxi Peng, Ismail Khan, Jhoselyn Jaramillo Lopez, Ronny Chan, Elizabeth Edwards, Hui Peng, https://doi.org/10.1101/2024.02.03.24302274

[4] NWSS Public SARS-CoV-2 Concentration in Wastewater Data. https://data.cdc.gov/Public-Health-Surveillance/NWSS-Public-SARS-CoV-2-Concentration-in-Wastewater/g653-rqe2/about_data. 2022. Accessed on July 31, 2024

Last reviewed on August 16, 2024
