Predicting Influenza-Like Illness in Kent County, Michigan

Background

Seasonal influenza presents major challenges for public health systems worldwide. As reported by the World Health Organization (2023), approximately one billion cases of seasonal influenza occur each year. During the 2022-2023 season, there were approximately 31 million cases of symptomatic influenza in the United States, with approximately 375,000 cases requiring medical treatment and 21,000 leading to death (CDC, 2023). While influenza affects all populations, older adults, individuals with chronic health conditions, and pregnant persons are among the sub-populations vulnerable to more severe outcomes, including death (CDC, 2023).

Severe influenza seasons pose a risk to healthcare infrastructure in the United States by causing medical surges in emergency departments, resulting in patient crowding and treatment delays (King, et al., 2017). In such emergent situations, public health leaders must make critical decisions in often rapidly shifting environments, with little to no information. While traditional surveillance systems regularly provide updates on flu transmission in the community, the data is inherently delayed due to the time required to test and confirm illness.

Influenza forecasting helps fill these gaps by offering near real-time insight to guide public health intervention and response. Lutz et al. (2019) define a forecast as a “quantitative, probabilistic statement about an unobserved event, outcome, or trend and its surrounding uncertainty, conditional on previously observed data.” Forecasting models often include secondary, real-time digital data streams such as satellite imagery, weather/climate conditions, flight records, internet search trends, social media discourse, and web traffic to predict influenza activity (Lutz et al., 2019). The Centers for Disease Control and Prevention (CDC) supports national flu forecasting through its annual FluSight challenge.

Despite this growing interest in disease forecasting, only a handful of studies assess its applicability in local public health. Models using national or regional data to forecast influenza have been shown to have limited utility in localized outbreaks; Kandula et al. (2017) demonstrated significant improvements in subregional forecasting accuracy with models based on subregional data compared to those extrapolated from national models. Further research and applied development in local public health forecasting and analytics are necessary. This study focuses on the applicability of local-level digital data in local public health influenza forecasting, using Kent County, Michigan as a case study.

Hypothesis: Local-level influenza models that use digital data streams (weather/climate conditions, internet search trends, emergency department visits) produce significantly more accurate forecasts than those relying on historical incidence alone.

Literature Review

Google Search Trends is a widely used tool for gathering online information in healthcare research. Online search behaviors, as reflected in Trends data, often correlate with disease incidence and prevalence. For instance, Mavragani & Ochoa (2018b) used linear regression to describe significant associations between Google Trends data and national and state-level chlamydia and hepatitis incidence, with further associations found between AIDS-related Google search terms and AIDS prevalence in the US (Mavragani & Ochoa, 2018a). Similarly, Husnayain et al. (2019) used a moving average analysis to describe the relationship between lagged search trends and nationally reported dengue incidence in Indonesia. Likewise, Kandula et al. (2017) developed ILI nowcasting models at state and regional levels based on Google search trends data, using both a simple autoregressive integrated moving average (ARIMA) model and a random forest.

Variables related to weather and pollution may also serve as useful early indicators of respiratory disease incidence. For instance, Ku et al. (2022) successfully nowcasted emergency department (ED) visits for respiratory diseases using climatic and air-pollution factors in urban Seoul. Their findings, which illustrated positive associations between atmospheric pressure, carbon monoxide, PM2.5 exposure, and respiratory disease patients, were consistent with prior research by Lu et al. (2020), who demonstrated similar correlations in Beijing.

The literature reveals wide variation in the methods and variables used for ILI forecasting, with a common focus on improving model performance through machine learning. Examples include L. Yang et al. (2023), who deployed a neural network integrating climate/weather data, internet search trends, and social media discourse for influenza prediction, and Athanasiou et al. (2023), who leveraged Twitter data alongside local weather for accurate modeling in Greece.

In situations where communicable disease forecasting results in multiple models with varying predictive accuracies, ensemble-based modeling methods may be employed to harness the strengths of each model. While relatively few studies describe the application of this technique at the local level, notable successes include Lu et al. (2018), who fused multiple models based on search trends, Twitter data, and insurance claims to produce accurate influenza forecasts for the City of Boston. Similarly, Soliman et al. (2019) used several machine learning techniques to construct and fuse influenza forecasting models for Dallas County, Texas, based on Google search trends and local weather.

As emphasized by Kandula et al. (2017), influenza forecasting models built at a local scale, based on local data, tend to predict incidence more accurately than their state-level counterparts. The emerging field of local public health disease forecasting holds promise, with county-specific models showing significant potential to drastically improve health outcomes through timely public health intervention and response.

This paper describes research into the application of county-specific ILI-forecasting methods in Kent County, Michigan, which is a novel study for the state. While the models and specific results presented here may have limited applicability in other counties, the methodologies described in this research may hold generalizable value to surrounding health departments in the state of Michigan.

Kent County, the fourth most populous Michigan county with approximately 655,000 residents, is characterized by a vibrant manufacturing industry centered around the City of Grand Rapids. The ethnic makeup consists of 78% White, 10% Black, and 7% Multiracial residents. At the time of this writing, the Kent County Health Department uses emergency department syndromic surveillance to monitor and assess influenza transmission in the community (J. Payne, personal communication, May 2023). Given the county’s diverse population, industrial profile, and established public health practices, it serves as an excellent candidate for a case study in disease forecasting at the local level.

The rest of this paper evaluates the effectiveness of machine learning models using health-related online data for influenza forecasting in Kent County, closely following the methodology of Soliman et al. (2019). County-specific data, including Google search trends for influenza-related terms, weather-related variables (wind, precipitation, snow, temperature), air quality measures (AQI, CO, ozone, PM10, PM2.5, and NO2), and ED visits, are used as model regressors in a standard SARIMAX model and in machine learning elastic net and random forest models. The presented research findings not only contribute to improved public health prevention and mitigation in the county, but also support further capacity-building in analytics and disease forecasting in local public health moving forward.

Methods

Influenza Population-Based Surveillance Data Collection

ILI case counts, the outcome measure of interest, are obtained from the Michigan Disease Surveillance System (MDSS) for the period from 2005 through 2019. The case definition involves a fever over 100 °F along with sore throat, cough, or both, with no alternative explanation of illness besides ILI. While this case definition may capture other respiratory disease cases, influenza typically drives sudden increases in ILI reporting in the community (Michigan Influenza Sentinel Provider Surveillance, n.d.). Sentinel physicians and laboratories actively participate with the State of Michigan Outpatient Influenza-Like Illness Surveillance Network (ILINet) to report confirmed cases of ILI (Michigan Influenza Sentinel Provider Surveillance, n.d.).

Aggregated ILI-related ED visits from 2005 through 2019 are sourced from the Michigan Syndromic Surveillance System (MSSS), which collects real-time chief complaint data from participating emergency departments and urgent care centers (Utilizing the MSSS, 2023). ILI-related visits are defined as visits with case notes containing the keywords flu, fever, cough, or sore throat, and excluding notes that contain terms such as stomach, shot, imm, tamiflu, vaccin, inject, mist, reflux, fluid, flutter, fluctuat, flushed, or fluent.
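The inclusion and exclusion rule above can be sketched as a simple text filter. This is a minimal illustration rather than the MSSS implementation: the function name `is_ili_visit`, the sample notes, and the use of case-insensitive substring matching are assumptions, since the source does not state how keywords are matched.

```python
# Keyword lists taken from the ILI-related visit definition in the text.
INCLUDE_TERMS = ("flu", "fever", "cough", "sore throat")
EXCLUDE_TERMS = (
    "stomach", "shot", "imm", "tamiflu", "vaccin", "inject", "mist",
    "reflux", "fluid", "flutter", "fluctuat", "flushed", "fluent",
)


def is_ili_visit(chief_complaint: str) -> bool:
    """Return True if a chief-complaint note matches the ILI keyword rule.

    Assumption: matching is a case-insensitive substring search, so the
    exclusion list is what keeps "reflux" or "fluid" from matching "flu".
    """
    note = chief_complaint.lower()
    if any(term in note for term in EXCLUDE_TERMS):
        return False
    return any(term in note for term in INCLUDE_TERMS)


# Hypothetical chief-complaint notes, for illustration only.
notes = [
    "Fever and cough x3 days",
    "Acid reflux, chest discomfort",
    "Requesting flu shot",
    "Sore throat",
    "Ankle injury",
]
ili_flags = [is_ili_visit(n) for n in notes]
```

Checking exclusions before inclusions means a single excluded term rules out the whole note, which is one reasonable reading of the definition; a production filter might instead use word-boundary matching.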

Internet Search Trends Collection

Influenza-related internet search trends are acquired through the Google Extended Trends (GET) Application Programming Interface (API). GET is distinct from the more widely known online Google Trends tool and requires researcher approval from the Google team (Google, n.d.; Trends API, n.d.). Search trends data from the API represent the raw probability (× 10^7) of specific terms being searched in a specific time period and Designated Market Area (DMA), drawn from a representative sample of all searches in that period and location (Getting Started Guide, 2020). For this study, Google category code 419 was used to filter for health-related searches and DMA code US-MI-563 was used to represent Kent County, MI. Weekly trends for flu, fever, cough, and cold were collected from 2005 through 2019.

Meteorological Data Collection

Weather and meteorological data were collected via the National Centers for Environmental Information (NCEI) Access Data Service API, which provides programmatic access to the NCEI Global Historical Climatology Network – Daily dataset (Menne et al., 2012). Daily measurements for observed minimum, maximum, and mean temperatures, alongside daily precipitation and average wind speed, were collected for the years 2005-2019, then aggregated to represent the weekly average of each measurement. The station ID used was GHCND:USW00094860, which represents the Grand Rapids Gerald R. Ford International Airport station.
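The daily-to-weekly aggregation described above might look like the following sketch. The helper names and the sample temperatures are hypothetical, and grouping by Sunday-to-Saturday weeks is an assumption made to mirror epidemiologic weeks; the source does not specify the week boundaries used.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def week_start(day: date) -> date:
    """Return the Sunday that begins the week containing `day`."""
    # date.weekday() is 0 for Monday and 6 for Sunday.
    return day - timedelta(days=(day.weekday() + 1) % 7)


def weekly_means(daily: dict) -> dict:
    """Average daily values within each Sunday-to-Saturday week."""
    buckets = defaultdict(list)
    for day, value in daily.items():
        buckets[week_start(day)].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Made-up daily mean temperatures (degrees C), for illustration only.
daily_tavg = {
    date(2019, 1, 6): -2.0,   # Sunday
    date(2019, 1, 7): 0.0,
    date(2019, 1, 8): 2.0,
    date(2019, 1, 13): 4.0,   # the following Sunday
    date(2019, 1, 14): 6.0,
}
weekly_tavg = weekly_means(daily_tavg)
```

Keying each week by its start date keeps the weather series easy to join with weekly ILI counts and search trends, provided those series use the same week convention.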

Air Quality Data Collection

Air quality data are provided by the Environmental Protection Agency (EPA) Air Quality System (AQS), a data repository for daily measurements of ambient levels of certain pollutants mandated by the Clean Air Act (AQS User Guide, 2022). Daily measurements from 2005 to 2019 for overall air quality index (AQI), carbon monoxide, ozone, PM10, and PM2.5, along with the main contributing pollutant, were collected. Weekly average daily pollutant exposures are calculated by averaging daily numerical measurements, and days with moderate or worse air quality were recorded according to the EPA’s AQI categories:

0-50 - Good
51-100 - Moderate
101-150 - Unhealthy for Sensitive Groups
151-200 - Unhealthy
201-300 - Very Unhealthy
301-500 - Hazardous
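As an illustration of how the categories above could be applied, the sketch below maps a daily AQI value to its category and counts the days in a week with moderate or worse air quality. The function names and the sample week are hypothetical; the breakpoints are those listed above.

```python
# Upper bound of each AQI category, in ascending order.
AQI_BREAKPOINTS = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (500, "Hazardous"),
]


def aqi_category(aqi: float) -> str:
    """Map a daily AQI value to its category label."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper, label in AQI_BREAKPOINTS:
        if aqi <= upper:
            return label
    # Assumption: values above 500 are treated as the top category.
    return "Hazardous"


def days_moderate_or_worse(daily_aqi: list) -> int:
    """Count days whose AQI falls outside the 'Good' range (above 50)."""
    return sum(1 for value in daily_aqi if aqi_category(value) != "Good")


# Made-up AQI values for one week, for illustration only.
week_aqi = [42, 55, 101, 30, 48, 76, 50]
exposed_days = days_moderate_or_worse(week_aqi)
```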

Ethical considerations

The publicly available data included in this research do not meet the Institutional Review Board (IRB) definition of research involving human subjects because they lack individual identifiers and are compiled in aggregate (Human Research Protection Program, 2019). Consequently, this study did not require review and approval from an IRB.

Data Preparation

Several preprocessing steps are applied to the variables in this analysis to ensure model compatibility and interpretability. The continuous variables (total ED visits and average temperature, precipitation, wind speed, and AQI) are scaled with the StandardScaler function in scikit-learn, and the cyclical nature of the variable Epiweek (week of the epidemiologic year) is addressed through a sine-cosine transformation, as described in Stolwijk et al. (1999). Seasonality in the data is examined through a stacked bar chart of monthly continuous variable averages, and a chi-square test of association is performed between the main pollutant variable and the current month.
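The two transformations named above can be sketched in plain Python. This is an illustration under stated assumptions, not the study's code: the function names are hypothetical, a 52-week period is assumed (53-week epidemiologic years are ignored), and `standardize` reproduces by hand the z-score with population standard deviation that scikit-learn's StandardScaler computes.

```python
import math
from statistics import mean, pstdev

WEEKS_PER_YEAR = 52  # assumption: 53-week epidemiologic years are ignored


def encode_epiweek(epiweek: int, period: int = WEEKS_PER_YEAR) -> tuple:
    """Map a week number onto the unit circle as (sin, cos).

    This places week 52 next to week 1, which a raw week number does not.
    """
    angle = 2 * math.pi * epiweek / period
    return math.sin(angle), math.cos(angle)


def standardize(values: list) -> list:
    """Scale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Week 52 and week 1 are one step apart on the circle.
gap_year_end = math.dist(encode_epiweek(52), encode_epiweek(1))
gap_two_weeks = math.dist(encode_epiweek(1), encode_epiweek(3))

# Made-up values, for illustration only.
scaled = standardize([2, 4, 4, 4, 5, 5, 7, 9])
```

In a real pipeline the scaling parameters would be estimated on the training set only and then applied to the validation and test sets, to avoid leaking information from later periods.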

Next, the data is split into training (2005-2015), validation (2016-2017), and testing (2018-2019) sets. Lagged continuous variables (up to 3 weeks) are incorporated as additional regressors to account for potential delayed effects, as was done in Soliman, et al. (2019). Finally, a baseline model is established using seasonal naïve forecasting, which predicts ILI counts based on the average of observed values one and two years prior to the current observation, for use as a reference to determine if the more complex forecasting models are worth implementing.
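The lagged regressors and the seasonal naïve baseline described above could be constructed as in the following sketch. The function names and sample values are hypothetical, and a season length of 52 weeks is assumed so that "one and two years prior" corresponds to lags of 52 and 104 weeks.

```python
SEASON_LENGTH = 52  # assumed number of weekly observations per year


def seasonal_naive(series: list, t: int, season: int = SEASON_LENGTH) -> float:
    """Forecast series[t] as the mean of the values one and two seasons earlier."""
    if t < 2 * season:
        raise ValueError("need at least two full seasons of history")
    return (series[t - season] + series[t - 2 * season]) / 2


def add_lags(values: list, max_lag: int = 3) -> list:
    """Build rows of (current, lag 1, ..., lag max_lag).

    Rows without a full lag history are dropped.
    """
    return [
        tuple(values[t - k] for k in range(max_lag + 1))
        for t in range(max_lag, len(values))
    ]


# Made-up series, for illustration only.
history = list(range(120))
baseline_110 = seasonal_naive(history, 110)   # mean of history[58] and history[6]
lagged_rows = add_lags([10, 20, 30, 40, 50])
```

Because the first `max_lag` rows are dropped, lag features should be built before the training, validation, and testing split so that each set can draw its lagged values from the weeks immediately preceding it.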

Model Development

Three methods of autoregressive forecasting are employed and compared to the seasonal naïve model:

SARIMAX Model: Frequently employed in disease forecasting, the Seasonal autoregressive integrated moving-average with exogenous regressors model is specified as SARIMAX(p, d, q)(P, D, Q, s), where p, d, and q represent the autoregressive, differencing, and moving average orders; P, D, and Q represent their seasonal equivalents; and s represents the length of the season (52 for weekly data) (Kandula & Shaman, 2019). To assess if the SARIMAX modeling technique is suitable for Kent County influenza forecasting, ILI data undergoes autocorrelation analysis.

Elastic Net Regression: An extension of the generalized linear model, elastic net regression handles multicollinearity among a large number of exogenous variables by combining L1 (lasso) and L2 (ridge) penalty terms, which shrink coefficients and can set those of uninformative covariates to zero (Soliman et al., 2019).

Random Forest: An ensemble-based machine learning regression method, random forest averages the output of a “forest” of regression decision trees to arrive at a final result (Kane et al., 2014).

Random forest and elastic net regression algorithms include built-in steps that down-weight or exclude variables with little impact on the dependent variable. Following the methods of Soliman et al. (2019) to improve comparability among the forecasting models, a similar process is manually applied to the SARIMAX model, retaining only the variables with coefficients greater than 0.05.

Training, Validation, and Assessment

The Python libraries scikit-learn and statsmodels are used to build the models, and the skforecast library is used for iterative training and validation. A simulated prospective analysis is used for testing, employing a rolling origin approach to successively refit on actual data from each period before proceeding to the next forecast. In the evaluation stage, model mean absolute errors (MAEs) and root mean squared errors (RMSEs) are recorded and compared for 1-week and 2-week ILI forecasts.
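The rolling origin evaluation described above can be illustrated with a short, library-free sketch. It does not use skforecast and is not the study's code: the function names are hypothetical, the counts are made up, and a persistence forecast (repeating the last observed value) stands in for the SARIMAX, elastic net, and random forest models.

```python
import math


def mae(actual: list, predicted: list) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: list, predicted: list) -> float:
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def rolling_origin(series: list, first_origin: int, horizon: int, fit_predict):
    """At each origin, refit on series[:origin] and forecast `horizon` steps ahead.

    Returns (actuals, forecasts) for the target week origin + horizon - 1.
    """
    actuals, forecasts = [], []
    for origin in range(first_origin, len(series) - horizon + 1):
        observed = series[:origin]  # only data available before the origin
        forecasts.append(fit_predict(observed, horizon))
        actuals.append(series[origin + horizon - 1])
    return actuals, forecasts


def persistence(observed: list, horizon: int) -> float:
    """Stand-in model: repeat the last observed value at any horizon."""
    return observed[-1]


# Made-up weekly counts, for illustration only.
counts = [10, 12, 15, 11, 9, 14, 20, 18]
actual_1wk, pred_1wk = rolling_origin(counts, 4, 1, persistence)
actual_2wk, pred_2wk = rolling_origin(counts, 4, 2, persistence)
mae_1wk, rmse_1wk = mae(actual_1wk, pred_1wk), rmse(actual_1wk, pred_1wk)
mae_2wk = mae(actual_2wk, pred_2wk)
```

Each forecast sees only the observations before its origin, which is what makes the analysis a simulated prospective one; swapping `persistence` for a function that refits a real model would leave the evaluation loop unchanged.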

Results

Descriptives

The continuous exogenous variables, namely ED visits, weather-related variables (wind, precipitation, snow, temperature), air quality measures (AQI, CO, ozone, PM10, PM2.5, and NO2), and Google search trends for flu, cough, and cold, exhibit seasonal patterns, as illustrated in Figure 1. Higher temperatures are observed during the summer months, while colder temperatures and more ILI-related Google searches occur in the winter. Boxplots of ILI-related internet searches, ED visits, and environmental variables in Figure 2 show skewed distributions with notable upper outliers. In particular, Google searches for “flu” exhibit a pronounced spike in Fall 2009, which may be related to the 2009 H1N1 (swine flu) pandemic (Lazer et al., 2014).

The categorical main contributing pollutant variable has seasonal associations similar to those of the continuous variables, as shown in Figure 3. A chi-square test confirms that ozone is more frequently the main pollutant in the summer, while PM2.5 is more frequent in the winter. Substantial multicollinearity is apparent among the continuous model variables. The heatmap in Figure 4 illustrates correlation coefficients among the variables, with the strongest associations between the temperature-related variables and Google search trends.

 

Forecasting models

The SARIMAX, elastic net regression, and random forest models each identify different key variables for influenza-like illness (ILI) forecasting. The SARIMAX model emphasizes epiweek, a moving average term, main pollutant PM2.5, average temperature lagged by 2 weeks, and main pollutant ozone. Elastic net regression emphasizes cases lagged 52 weeks, cases lagged 1 week, epiweek, minimum temperature, and average temperature. The random forest model highlights cases lagged 52 weeks, cases lagged 1 week, epiweek, overall AQI lagged 2 weeks, and cases lagged 2 weeks as key predictors.

Figures 5 and 6 present 1-week and 2-week ILI forecasts, respectively, with corresponding model error measurements in Tables 1 and 2. The random forest outperformed the other models for both 1-week and 2-week ILI forecasts, achieving a mean absolute error (MAE) of 195 and a root mean squared error (RMSE) of 323 for 1-week predictions, and an MAE of 204 and an RMSE of 347 for 2-week predictions. In contrast, the SARIMAX model was the least accurate.

Discussion

This study demonstrates the utility of emergency department visit data, local air quality and meteorological data, and Google search trends, paired with population-based surveillance data, to forecast ILI at the county level. The findings illustrate the importance of timely and geo-specific analyses to inform public health preparedness, healthcare delivery capacity, and response during influenza season.

Because this is one of only a handful of studies examining the application of digital epidemiology disease forecasting methods at a local scale, and the only one in Michigan, comparison with existing literature is limited; nevertheless, its findings are consistent with broader research. The random forest model’s superior accuracy in ILI prediction supports previous findings that random forest modeling techniques are more efficient and accurate than other machine learning techniques (Kandula et al., 2017; Kane et al., 2014; Kumar et al., 2024). The poorer performance of the SARIMAX model also aligns with previous literature (Cheng et al., 2020; Dai & Han, 2023; Lu et al., 2018; Soliman et al., 2019; Xu et al., 2017). These findings suggest that ARIMA-type models are less effective when analyzing digital data from multiple online sources, perhaps due to their inherent assumption of linearity between regressors and target variables.

Taken as a whole, the results of this study show promising potential for the application of digital epidemiology disease forecasting methods in local public health. For Kent County, recommended next steps include the development of a public-facing data dashboard to ensure that epidemiological data and analysis results are shared with community members, public health officials, and healthcare providers. By providing accessible data about disease trends, a Kent County Communicable Disease dashboard has the potential to improve public health preparedness and response.

Conclusion

This study explored the application of influenza forecasting methods in local public health, using diverse online data and traditional surveillance data to model ILI in Kent County, Michigan. SARIMAX, elastic net, and random forest models were evaluated against a baseline model for predictive accuracy; the random forest model emerged as the most accurate, while the SARIMAX model emerged as the least accurate.

Strengths

One strength of this study is the use of the Google Extended Trends API, which differs from the more widely used Google Trends (GT) website. While both tools extract random search samples, their outputs are different. GT scales output relative to the peak result in a given timeframe, which requires researchers to extract the entire period of interest for consistent scaling. In contrast, the GET API provides absolute search probabilities to enable more reliable comparisons across various time periods. This is especially helpful in disease forecasting, where models are regularly updated with new data. Another strength in this research is the focus on influenza forecasting at the local level in Michigan. By focusing on Kent County, this study helps address a growing need for localized disease forecasting strategies, which are more effective than their broader national counterparts due to regional differences in disease patterns.

Limitations

These results illustrate the promising potential of leveraging online health data and machine learning methods for local influenza forecasting. However, it is important to note the impact of the COVID-19 pandemic on the interpretability of this case study. Pandemic-related social distancing and reluctance to visit medical facilities led to near-zero ILI cases reported in Kent County in 2020 (J. Payne, personal communication, June 2023). After the vaccine rolled out and restrictions were lifted, ILI case reports rose. Further, symptomatic overlaps between COVID-19 and influenza complicate public health surveillance for both. While COVID-19 cases may artificially inflate ILI case reporting by masquerading as flu, influenza outbreaks may enhance COVID-19 surveillance by increasing viral testing in medical facilities.

In addition to issues related to the changing post-pandemic landscape, there are also issues with the representativeness and generalizability of these results to all communities in Kent County. The data used to develop and test ILI forecasting in Kent County likely underrepresent specific sub-populations in the community, such as those with low healthcare access or distrust in medical institutions. Moreover, Google search trends may not effectively capture information-seeking behaviors in individuals lacking internet access due to homelessness or other institutional barriers. In addition, machine learning models are complex and require advanced technical expertise; maintenance requirements of forecasting models may place a heavy burden on already resource-constrained health departments.

Implications

As COVID-19 progresses towards endemic status in the United States, some public health officials are developing a unified COVID/Influenza-like Illness (CLI/ILI) surveillance strategy. While the specific forecasting models presented here may not directly apply in the post-pandemic era, their methodologies may be adapted to predict ILI and CLI at the local level. Indeed, emerging literature exploring ILI/CLI syndromic surveillance has already leveraged established forecasting methods based on digital data and CDC-reported information to successfully forecast state and national trends (Ma, et al., 2023).

The ILI forecasting models created here should not be used for forecasting in Kent County due to the aforementioned limitations. Rather, these findings suggest that infectious disease modeling has promising applications in Michigan local public health. This study illustrates the need for investment in analytical capacity, and aligns with the CDC Center for Forecasting and Outbreak Analytics (CFA) 2023-2028 Strategic Plan as follows:

Predict: Enhancing outbreak preparedness through actionable analyses and response-ready modeling tools. Investments in local public health capacities are emphasized to develop and implement these tools effectively at the local level.

Inform: Prioritizing effective communication to leaders with practical decision support products. Increased investment in local public health will enable the creation of locally relevant communication products to guide decision-making.

Innovate: Driving technological innovation by providing additional funding for analytics capacity-building. This allows local public health departments to adopt advanced methods and tools for better outbreak prediction and response.

Advance: Building a world-class analytics organization by attracting a technically skilled workforce and promoting career development and training in advanced analytic methods. This ensures a robust and competent workforce at the local level through local public health investments.

As the first line of defense against public health emergencies, local health departments must continue implementing forecasting and outbreak analytics tools to coordinate with partners for outbreak prediction and response.

References

2023 Annual Report. (n.d.). CDC Center for Forecasting & Outbreak Analytics. Retrieved October 18, 2023, from https://www.cdc.gov/forecast-outbreak-analytics/pdf/cdc-cfa-annual-report-2023.pdf

2023-2028 Strategic Plan. (n.d.). CDC Center for Forecasting & Outbreak Analytics. Retrieved October 18, 2023, from https://www.cdc.gov/forecast-outbreak-analytics/pdf/CFA-Strategic-Plan.pdf

Athanasiou, M., Fragkozidis, G., Zarkogianni, K., & Nikita, K. S. (2023). Long Short-term Memory–Based Prediction of the Spread of Influenza-Like Illness Leveraging Surveillance, Weather, and Twitter Data: Model Development and Validation. Journal of Medical Internet Research, 25, e42519. https://doi.org/10.2196/42519

Caldwell, W. K., Fairchild, G., & Del Valle, S. Y. (2020). Surveilling Influenza Incidence With Centers for Disease Control and Prevention Web Traffic Data: Demonstration Using a Novel Dataset. Journal of Medical Internet Research, 22(7), e14337. https://doi.org/10.2196/14337

Centers for Disease Control and Prevention. (2023). Disease Burden of Flu. Accessed 2/1/2024 from https://www.cdc.gov/flu/about/burden/index.html

Chen, Y., Hou, W., Hou, W., & Dong, J. (2023). Lagging effects and prediction of pollutants and their interaction modifiers on influenza in northeastern China. BMC Public Health, 23(1), 1826. https://doi.org/10.1186/s12889-023-16712-6

Cheng, H. Y., Wu, Y. C., Lin, M. H., Liu, Y. L., Tsai, Y. Y., Wu, J. H., Pan, K. H., Ke, C. J., Chen, C. M., Liu, D. P., Lin, I. F., & Chuang, J. H. (2020). Applying Machine Learning Models with An Ensemble Approach for Accurate Real-Time Influenza Forecasting in Taiwan: Development and Validation Study. Journal of Medical Internet Research, 22(8), e15394. https://doi.org/10.2196/15394

Clemente, L., Lu, F., & Santillana, M. (2019). Improved Real-Time Influenza Surveillance: Using Internet Search Data in Eight Latin American Countries. JMIR Public Health and Surveillance, 5(2), e12214. https://doi.org/10.2196/12214

Dai, S., & Han, L. (2023). Influenza surveillance with Baidu index and attention-based long short-term memory model. PLOS ONE, 18(1), e0280834. https://doi.org/10.1371/journal.pone.0280834

Get help with Trends API [online form]. (n.d.). Google Support. Accessed 2/1/2024 from https://support.google.com/trends/contact/trends_api

Google Trends API – Getting Started Guide. (2020). Google Support. Accessed 2/1/2024 from https://docs.google.com/document/d/1Ybu3gHUHtcSXXzgDJ-m7PPto9tw0QG8A5oOBsFP2jao

Google Trends. (n.d.). Google. Accessed 2/1/2024 from https://trends.google.com/trends/?geo=US

Human Research Protection Program. (2019). HRPP Manual Section 4-3: Determination of Human Subject Research. Michigan State University Office of Research Regulatory Support. Accessed 2/1/2024 from https://hrpp.msu.edu/help/manual/4-3.html

Kandula, S., & Shaman, J. (2019). Near-term forecasts of influenza-like illness: An evaluation of autoregressive time series approaches. Epidemics, 27, 41–51. https://doi.org/10.1016/j.epidem.2019.01.002

Kandula, S., Hsu, D., & Shaman, J. (2017). Subregional Nowcasts of Seasonal Influenza Using Search Trends. Journal of Medical Internet Research, 19(11), e370. https://doi.org/10.2196/jmir.7486

Kane, M. J., Price, N., Scotch, M., & Rabinowitz, P. (2014). Comparison of ARIMA and Random Forest time series models for prediction of avian influenza H5N1 outbreaks. BMC Bioinformatics, 15(1), 276. https://doi.org/10.1186/1471-2105-15-276

Ku, Y., Kwon, S. B., Yoon, J.-H., Mun, S.-K., & Chang, M. (2022). Machine Learning Models for Predicting the Occurrence of Respiratory Diseases Using Climatic and Air-Pollution Factors. Clinical and Experimental Otorhinolaryngology, 15(2), 168–176. https://doi.org/10.21053/ceo.2021.01536

Kumar, R., Maheshwari, S., Sharma, A., Linda, S., Kumar, S., & Chatterjee, I. (2024). Ensemble learning-based early detection of influenza disease. Multimedia Tools and Applications, 83(2), 5723–5743. https://doi.org/10.1007/s11042-023-15848-2

Lu, F. S., Hattab, M. W., Clemente, C. L., Biggerstaff, M., & Santillana, M. (2019). Improved state-level influenza nowcasting in the United States leveraging Internet-based data and network approaches. Nature Communications, 10(1), 147. https://doi.org/10.1038/s41467-018-08082-0

Lu, F. S., Hou, S., Baltrusaitis, K., Shah, M., Leskovec, J., Sosic, R., Hawkins, J., Brownstein, J., Conidi, G., Gunn, J., Gray, J., Zink, A., & Santillana, M. (2018). Accurate Influenza Monitoring and Forecasting Using Novel Internet Data Streams: A Case Study in the Boston Metropolis. JMIR Public Health and Surveillance, 4(1), e4. https://doi.org/10.2196/publichealth.8950

Lu, J., Bu, P., Xia, X., Yao, L., Zhang, Z., & Tan, Y. (2020). A New Deep Learning Algorithm for Detecting the Lag Effect of Fine Particles on Hospital Emergency Visits for Respiratory Diseases. IEEE Access, 8, 145593–145600. https://doi.org/10.1109/ACCESS.2020.3013543

Ma, S., Ning, S., & Yang, S. (2023). Joint COVID-19 and influenza-like illness forecasts in the United States using internet search information. Communications Medicine, 3(1), Article 1. https://doi.org/10.1038/s43856-023-00272-2

Mavragani, A., & Ochoa, G. (2018a). Forecasting AIDS prevalence in the United States using online search traffic data. Journal of Big Data, 5(1), 17. https://doi.org/10.1186/s40537-018-0126-7

Mavragani, A., & Ochoa, G. (2018b). Infoveillance of infectious diseases in USA: STDs, tuberculosis, and hepatitis. Journal of Big Data, 5(1), 30. https://doi.org/10.1186/s40537-018-0140-9

Menne, M. J., Durre, I., Korzeniewski, B., McNeill, S., Thomas, K., Yin, X., Anthony, S., Ray, R., Vose, R. S., Gleason, B. E., & Houston, T. G. (2012). Global Historical Climatology Network – Daily (GHCN-Daily), Version 3, [GHCND:USW00094860]. NOAA National Climatic Data Center. https://doi.org/10.7289/V5D21VHZ. Accessed 1/24/2024.

Menne, M. J., Durre, I., Vose, R. S., Gleason, B. E., & Houston, T. G. (2012). An Overview of the Global Historical Climatology Network-Daily Database. Journal of Atmospheric and Oceanic Technology, 29, 897–910. https://doi.org/10.1175/JTECH-D-11-00103.1

Michigan Influenza Sentinel Provider Surveillance. (n.d.). Michigan Department of Health and Human Services. Accessed 2/1/2024 from https://www.michigan.gov/mdhhs/inside-mdhhs/statisticsreports/communicable-diseases/michigan-influenza-sentinel-provider-surveillance

Utilizing the Michigan Syndromic Surveillance System (MSSS) [PowerPoint slides]. (October 2023). Michigan Department of Health and Human Services. https://www.michigan.gov/mdhhs/-/media/Project/Websites/mdhhs/MDSS_Syndromic/MI-Train-MSSS-Training-Final.pdf

Poirier, C., Lavenu, A., Bertaud, V., Campillo-Gimenez, B., Chazard, E., Cuggia, M., & Bouzillé, G. (2018). Real Time Influenza Monitoring Using Hospital Big Data in Combination with Machine Learning Methods: Comparison Study. JMIR Public Health and Surveillance, 4(4), e11361. https://doi.org/10.2196/11361

Rodrigo, J. A., & Ortiz, J. E. (2023). skforecast (Version 0.11.0) [Computer software]. https://doi.org/10.5281/zenodo.8382788

Rodrigo, J. A., & Ortiz, J. E. (December 2023). Skforecast: time series forecasting with Python and Scikit-learn. Accessed 2/1/2024 from https://cienciadedatos.net/documentos/py27-time-series-forecasting-python-scikitlearn

Rodrigo, J. A., & Ortiz, J. E. (July 2023). Forecasting web traffic with machine learning and Python. Accessed 2/1/2024 from https://cienciadedatos.net/documentos/py37-forecasting-web-traffic-machine-learning.html

Rodrigo, J. A., & Ortiz, J. E. (September 2023). ARIMA and SARIMAX models with Python. Accessed 2/1/2024 from https://cienciadedatos.net/documentos/py51-arima-sarimax-models-python

Santillana, M., Nguyen, A. T., Dredze, M., Paul, M. J., Nsoesie, E. O., & Brownstein, J. S. (2015). Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance. PLOS Computational Biology, 11(10), e1004513. https://doi.org/10.1371/journal.pcbi.1004513

Soliman, M., Lyubchich, V., & Gel, Y. R. (2019). Complementing the power of deep learning with statistical model fusion: Probabilistic forecasting of influenza in Dallas County, Texas, USA. Epidemics, 28, 100345. https://doi.org/10.1016/j.epidem.2019.05.004

World Health Organization. (2023). Influenza (Seasonal). Accessed 2/1/2024 from https://www.who.int/news-room/fact-sheets/detail/influenza-(seasonal)

Xu, Q., Gel, Y. R., Ramirez, L. L. R., Nezafati, K., Zhang, Q., & Tsui, K.-L. (2017). Forecasting influenza in Hong Kong with Google search queries and statistical model fusion. PLOS ONE, 12(5), e0176690. https://doi.org/10.1371/journal.pone.0176690

Yang, J., Fan, G., Zhang, L., Zhang, T., Xu, Y., Feng, L., & Yang, W. (2023). The association between ambient pollutants and influenza transmissibility: A nationwide study involving 30 provinces in China. Influenza and Other Respiratory Viruses, 17(7), e13177. https://doi.org/10.1111/irv.13177

Yang, L., Li, G., Yang, J., Zhang, T., Du, J., Liu, T., Zhang, X., Han, X., Li, W., Ma, L., Feng, L., & Yang, W. (2023). Deep-Learning Model for Influenza Prediction From Multisource Heterogeneous Data in a Megacity: Model Development and Evaluation. Journal of Medical Internet Research, 25, e44238. https://doi.org/10.2196/44238

Yang, W., Olson, D. R., & Shaman, J. (2016). Forecasting Influenza Outbreaks in Boroughs and Neighborhoods of New York City. PLOS Computational Biology, 12(11), e1005201. https://doi.org/10.1371/journal.pcbi.1005201

Yang, Y., Tsao, S. F., Basri, M. A., Chen, H. H., & Butt, Z. A. (2023). Digital Disease Surveillance for Emerging Infectious Diseases: An Early Warning System Using the Internet and Social Media Data for COVID-19 Forecasting in Canada. Studies in Health Technology and Informatics, 302, 861–865. https://doi.org/10.3233/SHTI230290