
Estimation of Infections Based on Wastewater Data

Case counts estimated from wastewater data are rough estimates. No major decisions should be based on these estimates, nor should favorable trends be cause for abandoning one’s precautions.

Methodology:

Scripts and data can be found on GitHub.

To obtain the 3-day averaged per capita wastewater SARS-CoV-2 signal, daily figures are converted to millions of gene copies per day, interpolated (see step 3 of the procedure for estimating daily new cases), and, where necessary, divided by the sewershed’s population.
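The conversion and interpolation described above can be sketched as follows. This is a minimal illustration, not the project's actual script: the function names `interpolate_daily` and `per_capita_signal`, the sample values, and the population are all hypothetical.

```python
from datetime import date, timedelta

def interpolate_daily(samples):
    """Linearly interpolate a {date: value} mapping onto every calendar day."""
    if not samples:
        return {}
    days = sorted(samples)
    filled = {}
    for start, end in zip(days, days[1:]):
        gap = (end - start).days
        for k in range(gap):
            frac = k / gap  # constant rate of change between two reported values
            filled[start + timedelta(days=k)] = (
                samples[start] + frac * (samples[end] - samples[start])
            )
    filled[days[-1]] = samples[days[-1]]
    return filled

def per_capita_signal(load_gc_per_day, population):
    """Convert a load in gene copies/day to million gene copies per person per day."""
    return load_gc_per_day / 1e6 / population

# Hypothetical sewershed: two reported loads three days apart, 100,000 residents.
reported = {date(2023, 1, 1): 2.0e12, date(2023, 1, 4): 5.0e12}
for day, load in sorted(interpolate_daily(reported).items()):
    print(day, round(per_capita_signal(load, 100_000), 2))
```

The 3-day averaging step is left out here; it is the same smoothing that is applied to the infection estimates.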

To obtain the estimate of daily new cases:

  1. The equation displayed in figure S4 of Gerrity et al., 2021[1] was used to approximate the amount of SARS-CoV-2 RNA shed per gram of feces. The resulting figures were converted from log10 gene copies per gram of feces to billion gene copies per gram of feces, and then to billion gene copies per day by multiplying them by 128 g (the median wet fecal mass produced per person per day[2]). Fecal shedding in the 3 days preceding the peak was ignored, because the increase is estimated to be steep, rising by orders of magnitude[3] from day to day; the amount shed the day before the peak would therefore still be small compared with the amount shed on the peak day. Figures obtained using this process are available on this spreadsheet. The day on which shedding peaks is day 0 on the spreadsheet, the day preceding it is day -1, and the day following it is day 1; these labels are used throughout the rest of this section.
  2. Daily SARS-CoV-2 RNA load figures were obtained from country-specific sources (see below).
  3. Daily SARS-CoV-2 RNA load figures were interpolated using Python. Through a process of trial and error, it was found that cubic splines resulted in overfitting, at times plunging the interpolated values into the negative. Therefore, we used linear interpolations for dates where no wastewater signal was reported. This assumes a constant rate of change from one reported value to another.
  4. The number of new infections Infx on any day t (i.e., the number of people for whom RNA shedding peaked on that day) in each sewershed was calculated as follows:
    \( \begin{align*} Infx(t) = \frac{C_{ww}(t) V_{ww}(t) - \sum_{i=1}^{13} Infx(t-i) S(i)}{S_0} \end{align*} \)

    where Cww is the SARS-CoV-2 concentration in gc/L, Vww is the volume of wastewater flow in L, S(i) is the shedding in gc/person on day i, and S0 = S(0) is the shedding on day 0. For any day i:

    \( \begin{align*} S(i) = p \cdot f(i) \approx 101.674e^{-0.806i} \end{align*} \)

    Here, p is a constant (128 g/person) representing the median mass of feces produced by a person in a day[2], and f is the time-dependent fecal viral shedding rate in gc/g. The coefficient 101.674 is expressed in billion gc/person, since 128 g × 10^8.9 gc/g ≈ 1.017 × 10^11 gc, and the exponent follows from 0.35 × ln 10 ≈ 0.806. The rate f is based on the equation from Gerrity et al.[1]:

    \( \text{Assumed SARS-CoV-2 fecal shedding rate}\ (\log_{10}\ \text{gc/gram}) = -0.35 \times \text{Day} + 8.9 \)

    By subtracting the RNA load attributable to infections from the preceding 13 days from the Day 0 load, the formula accounts for the contribution of those earlier infections to the Day 0 wastewater SARS-CoV-2 signal.
    We considered only data from the last 14 days (Day 0 and the 13 days before it), as the exponential decline in shedding did not justify including data from further in the past.
  5. The figures resulting from step 4 were averaged with the figures from the preceding and following days’ calculations. This was done to smooth the output, making trends easier to visualize. From here onward, the resulting smoothed figures will be called the “3-day average”.
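Steps 1, 4 and 5 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the published script: the function names are hypothetical, days before the start of the series are assumed to contribute no infections, negative results are not clipped, and the endpoints of the smoothed series are averaged over the days that exist. The input is a list of already interpolated daily loads (Cww × Vww, in gc/day).

```python
import math

FECAL_MASS_G = 128.0  # median wet fecal mass per person per day [2]
WINDOW = 13           # preceding days whose infections still contribute

def shedding_gc(day):
    """Gene copies shed per person on `day` (0 = peak): 128 g * 10^(8.9 - 0.35*day) gc/g."""
    return FECAL_MASS_G * 10 ** (8.9 - 0.35 * day)

def estimate_infections(daily_load_gc):
    """Step 4: remove the load carried over from earlier infections, divide by S(0)."""
    infections = []
    for t, load in enumerate(daily_load_gc):
        carried = sum(
            infections[t - i] * shedding_gc(i)
            for i in range(1, WINDOW + 1)
            if t - i >= 0  # assumption: no infections before the series starts
        )
        infections.append((load - carried) / shedding_gc(0))
    return infections

def three_day_average(values):
    """Step 5: centered 3-day mean; endpoints use only the days that exist."""
    out = []
    for t in range(len(values)):
        window = values[max(0, t - 1): t + 2]
        out.append(sum(window) / len(window))
    return out

# The constants in S(i) ~ 101.674 * exp(-0.806 * i) follow from the values above.
print(round(shedding_gc(0) / 1e9, 3), round(0.35 * math.log(10), 3))

# Usage with made-up loads (gc/day) for a hypothetical sewershed.
loads = [2.0e12, 3.5e12, 6.0e12, 8.0e12]
print(three_day_average(estimate_infections(loads)))
```

Because each day's estimate depends on the estimates for the preceding days, an error early in the series propagates forward until it leaves the 13-day window.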

Over- or underestimation may result from combining the contribution of pre- or post-peak cases with that of cases during peak RNA shedding.

Uncertainty is also introduced by using average values for daily per capita fecal output and gene copies per gram of feces. If the true value of either figure in a sewershed is higher than the assumed average, overestimation would result, and vice versa if the true value is lower. Diurnal fluctuations in wastewater RNA load may also be a source of uncertainty. If samples are taken at a time of day when few people are awake, or when a substantial portion of a sewershed's population has left to work elsewhere, the daily wastewater SARS-CoV-2 signal would be too low, as it is extrapolated by multiplying per-litre concentrations by the sewershed's flow rate. The reverse may also be true.
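The effect of sampling time can be made concrete with a small calculation. The numbers below are hypothetical and chosen only for illustration; the point is that any bias in the sampled concentration carries over proportionally to the extrapolated daily load.

```python
def daily_load_gc(concentration_gc_per_l, flow_l_per_day):
    """Daily RNA load: per-litre concentration multiplied by the daily flow volume."""
    return concentration_gc_per_l * flow_l_per_day

# Hypothetical values, not taken from any dataset.
flow = 2.0e7                           # litres per day
daily_mean_conc = 5.0e4                # gc/L averaged over the whole day
off_peak_conc = 0.6 * daily_mean_conc  # sample taken when fewer people contribute

ratio = daily_load_gc(off_peak_conc, flow) / daily_load_gc(daily_mean_conc, flow)
print(f"Extrapolated load is {ratio:.0%} of the load implied by the daily mean")
```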

Underestimation would result from the following:

  • Vaccinations lowering the amount of RNA shed per unit feces
  • SARS-CoV-2 infections being above average in the portion of a region not covered by the monitored sewersheds
  • Inaccuracies in E gene detection, especially if mutations in the SARS-CoV-2 genome have affected test sensitivity
  • Loss of RNA during transport

References:

1. Gerrity D, Papp K, Stoker M, Sims A, Frehner W. Early-pandemic wastewater surveillance of SARS-CoV-2 in Southern Nevada: Methodology, occurrence, and incidence/prevalence considerations. Water Research X. 2021;10:100086. doi:10.1016/j.wroa.2020.100086

2. Rose C, Parker A, Jefferson B, Cartmell E. The Characterization of Feces and Urine: A Review of the Literature to Inform Advanced Treatment Technology. Crit Rev Environ Sci Technol. 2015;45(17):1827-1879. doi:10.1080/10643389.2014.1000761

3. Phan T, Brozak S, Pell B, et al. A simple SEIR-V model to estimate COVID-19 prevalence and predict SARS-CoV-2 transmission using wastewater-based surveillance data. Science of The Total Environment. 2023;857:159326. doi:10.1016/j.scitotenv.2022.159326

Country-Specific Information:

Finland

Source: Wastewater data from Finland, provided by THL under CC 4.0. For several regions, official estimates of cases for the catchment areas were available and are included in the graphs for comparison.

Canada

Sources: Metro Vancouver. Testing for the COVID-19 Virus in Wastewater; BCCDC. Wastewater. See detailed methodology regarding the different Health Regions here.

Netherlands

Source: RIVM. Aggregated data for provinces are shown first in the dropdown menu. Loading the data may take a few seconds.

United States (Shown are estimates based on historical data until September 2023)

Source: CDC, Public Health Surveillance, NWSS Public SARS-CoV-2 Concentration in Wastewater Data and NWSS Public SARS-CoV-2 Wastewater Metric Data. Estimates of new infections for states are extrapolated from data for the population served in those states. In several cases, this results in an overdependence on data from a few treatment plants, which is especially impactful and visible for the first Omicron wave. Coverage ratio and population density both add to the uncertainty: regions like the District of Columbia and Hawaii show a population served that significantly exceeds their total population, while estimates for states like Wyoming are still based on data from 11% of the total population. The estimate of new infections can at times be starkly affected by the addition or removal of a single treatment plant in the dataset. Wastewater concentrations per capita also vary significantly between states, with a few consistently reporting less than 100 million gc/capita/day. This leads, for example, to an underestimation of new cases in Michigan, where official case numbers are higher than our estimates. New York data has been aggregated to include data from New York City.
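To show why coverage matters, the sketch below applies one simple reading of "extrapolation": scaling the estimate for the served population by the inverse of the coverage ratio. This proportional scaling is an assumption made for illustration, the function name is hypothetical, and the published scripts may use a different method. All population and infection figures are made up.

```python
def extrapolate_to_state(infections_served, population_served, population_total):
    """Scale an estimate for the served population up (or down) to the whole state."""
    coverage = population_served / population_total
    return infections_served / coverage, coverage

# Hypothetical state with 11% coverage: each estimated infection in the
# served population stands in for roughly nine in the state as a whole.
estimate, coverage = extrapolate_to_state(110, 63_800, 580_000)
print(f"coverage {coverage:.0%}, state estimate {estimate:.0f}")

# Hypothetical region where the population served exceeds the resident
# population (coverage above 100%), so the estimate is scaled down.
estimate, coverage = extrapolate_to_state(500, 1_200_000, 1_000_000)
print(f"coverage {coverage:.0%}, state estimate {estimate:.0f}")
```

With low coverage the scaling factor is large, so adding or removing a single plant changes both the numerator and the coverage ratio, and the state-level estimate can shift sharply.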

Last reviewed on November 30, 2023
