Estimation of Infections Based on Wastewater Data (Finland/Canada/Netherlands)
Case counts estimated from wastewater data are rough estimates. No major decisions should be based on these estimates, nor should favorable trends be cause for abandoning one’s precautions.
Methodology:
Scripts and data can be found on GitHub.
To obtain the 3-day averaged per capita wastewater SARS-CoV-2 signal, daily figures are converted to millions of gene copies per day, interpolated (see point 3 in the description of how the estimates of daily new cases were obtained), and, where necessary, divided by the sewershed’s population.
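The processing described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual script: the function names, the sample values, the population figure, and the choice to smooth before dividing by population are all assumptions made for the example.

```python
from datetime import date, timedelta


def interpolate_daily(samples):
    """Linearly interpolate {date: value} onto every day between the
    first and last reported sample (constant rate of change between reports)."""
    days = sorted(samples)
    out = {}
    for d0, d1 in zip(days, days[1:]):
        span = (d1 - d0).days
        for k in range(span):
            step = (samples[d1] - samples[d0]) * k / span
            out[d0 + timedelta(days=k)] = samples[d0] + step
    out[days[-1]] = samples[days[-1]]
    return out


def centered_3day_average(series):
    """Average each day with the day before and after it.
    Edge days use only the neighbours that exist (an assumption)."""
    one_day = timedelta(days=1)
    out = {}
    for d in sorted(series):
        window = [series[n] for n in (d - one_day, d, d + one_day) if n in series]
        out[d] = sum(window) / len(window)
    return out


def per_capita(series, population):
    """Divide every daily value by the sewershed population."""
    return {d: v / population for d, v in series.items()}


# Hypothetical reported loads in million gene copies per day.
samples = {
    date(2023, 1, 1): 100.0,
    date(2023, 1, 4): 400.0,
    date(2023, 1, 5): 300.0,
}
daily = interpolate_daily(samples)          # fills in 2 and 3 January
smoothed = centered_3day_average(daily)
signal = per_capita(smoothed, population=50_000)
```

Linear interpolation never produces values outside the range of the two neighbouring reports, which is why it cannot dip below zero the way a cubic spline can.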
To obtain the estimates of daily new cases:
- The equation displayed in figure S4 of Gerrity et al., 2021[1] was used to approximate the amount of SARS-CoV-2 RNA shed per gram of feces. The resulting figures were converted from log10 gene copies per gram of feces to billion gene copies per gram of feces, and then to billion gene copies per day by multiplying them by 128 g (the median fecal wet mass produced per person per day[2]). Fecal shedding in the 3 days preceding the peak was ignored, since the increase is estimated to be steep, rising by orders of magnitude[3] from day to day. The amount shed on the day before fecal shedding peaks would therefore still pale in comparison to the amount shed the next day. Figures obtained using the process detailed here are available on this spreadsheet. The day on which shedding peaks is day 0 on the spreadsheet, the day preceding it day -1, and the one following it day 1; they will be referred to as such throughout the rest of this section.
- Daily SARS-CoV-2 RNA load figures were obtained from country-specific sources, see below.
- Daily SARS-CoV-2 RNA load figures were interpolated using Python. Through a process of trial and error, it was found that cubic splines resulted in overfitting, at times plunging the interpolated values into the negative. Therefore, we used linear interpolations for dates where no wastewater signal was reported. This assumes a constant rate of change from one reported value to another.
- The number of new infections Infx on any day t (i.e., the number of people for whom RNA shedding peaked on that day) in each sewershed was calculated as follows:\( \begin{align*} Infx(t) = \frac{C_{ww}(t) V_{ww}(t) - \sum_{i=1}^{13} Infx(t-i) S(i)}{S(0)} \end{align*} \)
where Cww is the SARS-CoV-2 concentration in gc/L, Vww is the daily volume of wastewater flow in L, and S(i) is the shedding in gc per person per day on day i after the shedding peak:\( \begin{align*} S(i) = p \cdot f(i) \approx 101.674 \times 10^{9}\, e^{-0.806i} \end{align*} \)
Here, p is the constant (128 g per person per day) representing the median amount of feces produced by a person in a day[2], and f is the time-dependent viral fecal shedding rate in gc/g, based on the equation from Gerrity et al.[1]:\( \begin{align*} \text{Assumed SARS-CoV-2 fecal shedding rate}\ (\log_{10}\ \text{gc/gram}) = -0.35 \times \text{Day} + 8.9 \end{align*} \)
By subtracting the RNA load attributable to infections from the preceding 13 days from the total load measured on a given day, the formula accounts for those earlier infections’ contribution to that day’s wastewater SARS-CoV-2 signal.
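As a check on the constants, the exponential form of S(i) can be rederived from the two inputs named above, the 128 g daily fecal mass[2] and the Gerrity et al. shedding equation[1]. This is a worked illustration only, with day i counted from the shedding peak:

\( \begin{align*} S(i) &= p \cdot f(i) = 128 \times 10^{8.9 - 0.35i} \\ &= \left(128 \times 10^{8.9}\right) e^{-0.35 \ln(10)\, i} \\ &\approx 1.01674 \times 10^{11}\, e^{-0.806i}\ \text{gc per person per day} \end{align*} \)

That is about 101.674 billion gene copies on day 0. On day 13 the factor \( e^{-0.806 \times 13} \) is roughly \( 3 \times 10^{-5} \), so shedding has fallen to a few million gene copies per person per day.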
We considered only data from the last 14 days, as the exponential decline in shedding did not justify the inclusion of data further in the past.
- The figures resulting from step 4 were averaged with the figures from the preceding and following days’ calculations. This was done to smooth the output, making trends easier to visualize. From here onward, these averaged figures will be called the “3-day average”.
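The calculation in steps 4 and 5 can be sketched in Python as follows. This is an illustrative reading of the formula, not the published script: the function names are invented for the example, infections before the first day of data are assumed to be zero (which inflates the first two weeks of estimates), edge days of the average use only the neighbours that exist, and negative results are left unclamped because the text does not say how they are handled.

```python
FECAL_MASS_G = 128.0   # median wet fecal mass per person per day [2]
LOOKBACK_DAYS = 13     # earlier days whose infections are subtracted


def shedding(i):
    """Gene copies shed per person on day i after the shedding peak,
    from log10(gc/g) = -0.35 * day + 8.9 (Gerrity et al. [1])."""
    return FECAL_MASS_G * 10 ** (8.9 - 0.35 * i)


def estimate_infections(loads):
    """loads[t] is C_ww(t) * V_ww(t) in gc/day for consecutive days.
    Returns the estimated number of infections peaking on each day."""
    infx = []
    for t, load in enumerate(loads):
        # RNA still being shed by infections from the previous 13 days.
        carried = sum(
            infx[t - i] * shedding(i)
            for i in range(1, LOOKBACK_DAYS + 1)
            if t - i >= 0  # assumption: no infections before the data starts
        )
        infx.append((load - carried) / shedding(0))
    return infx


def three_day_average(values):
    """Average each day with the preceding and following day."""
    out = []
    for t in range(len(values)):
        window = values[max(0, t - 1):t + 2]
        out.append(sum(window) / len(window))
    return out


# Round trip on made-up numbers: build loads from known infections,
# then check that the estimator recovers them.
true_infections = [10, 20, 5, 0, 40]
loads = [
    sum(true_infections[t - i] * shedding(i)
        for i in range(min(t, LOOKBACK_DAYS) + 1))
    for t in range(len(true_infections))
]
estimated = estimate_infections(loads)
smoothed = three_day_average(estimated)
```

Because each day's estimate depends on the estimates for the previous 13 days, an error early in the series propagates forward, although its weight shrinks quickly as S(i) decays.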
Over- or underestimation may result from combining the contribution of pre- or post-peak cases with that of cases during peak RNA shedding.
Uncertainty is also introduced by using average values for daily per capita fecal output and gene copies per gram of feces. If the actual value of either figure in a sewershed’s population is higher than the average used here, infections are overestimated; if it is lower, they are underestimated. Diurnal fluctuations in wastewater RNA load may also be a source of uncertainty. If samples are taken at a time of day when few people are awake, or when a substantial portion of a sewershed’s population has left to work elsewhere, the daily wastewater SARS-CoV-2 signal would be too low, as it is extrapolated by multiplying per-litre figures by the flow rate of the sewershed. The reverse may also be true.
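How a biased shedding assumption propagates can be made explicit with a simplified thought experiment. Suppose, hypothetically, that every infected person actually sheds k times the assumed amount S(i) on every day, that shedding beyond day 13 is negligible, and that I(t) is the true number of infections peaking on day t. The symbols k and I(t) are introduced only for this illustration. The measured load is then

\( \begin{align*} C_{ww}(t) V_{ww}(t) = \sum_{i=0}^{13} I(t-i)\, k\, S(i) \end{align*} \)

Substituting this into the estimation formula, and assuming the estimates for earlier days carry the same factor, gives

\( \begin{align*} Infx(t) = k\, I(t) \end{align*} \)

Under these assumptions the relative error of the estimate equals the relative error of the shedding assumption: actual shedding above the assumed value (k > 1) inflates the estimate, and actual shedding below it (k < 1), for example after vaccination, deflates it.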
Underestimation would result from the following:
- Vaccinations lowering the amount of RNA shed per unit feces
- SARS-CoV-2 infections being above average in the portion of a region not covered by the monitored sewersheds
- Inaccuracies in E gene detection, especially if mutations in the SARS-CoV-2 genome have affected test sensitivity
- Loss of RNA during transport.
References:
1. Gerrity D, Papp K, Stoker M, Sims A, Frehner W. Early-pandemic wastewater surveillance of SARS-CoV-2 in Southern Nevada: Methodology, occurrence, and incidence/prevalence considerations. Water Research X. 2021;10:100086. doi:10.1016/j.wroa.2020.100086
2. Rose C, Parker A, Jefferson B, Cartmell E. The Characterization of Feces and Urine: A Review of the Literature to Inform Advanced Treatment Technology. Crit Rev Environ Sci Technol. 2015;45(17):1827-1879. doi:10.1080/10643389.2014.1000761
3. Phan T, Brozak S, Pell B, et al. A simple SEIR-V model to estimate COVID-19 prevalence and predict SARS-CoV-2 transmission using wastewater-based surveillance data. Science of The Total Environment. 2023;857:159326. doi:10.1016/j.scitotenv.2022.159326
Country-Specific Information:
Finland
Source: Wastewater data from Finland, provided by THL under CC 4.0. For several regions, official estimates of cases for the catchment areas were available and are included in the graphs for comparison.
Canada
Sources: Metro Vancouver. Testing for the COVID-19 Virus in Wastewater; BCCDC. Wastewater. See detailed methodology regarding the different Health Regions here.
Netherlands
Source: RIVM. Aggregated data for provinces are shown first in the dropdown menu. Loading of data might need a few seconds.