
How the CDC could use Google, AI, and even Twitter to forecast flu outbreaks

As summer gives way to fall, flu season is about to be upon us. Proper preparation is essential if there are to be enough medical professionals and vaccinations to go around. The Centers for Disease Control and Prevention plays a huge role in making sure practices and hospitals around the country know what to expect.

The CDC needs all the information that it can get to do this important work. Now, machine learning is bringing together a staggering amount of data — comprising everything from retail sales of flu medication to Google searches about symptoms — to create the best possible picture of the spread of the virus, as it happens. If it works, it could make predicting the spread of disease as commonplace as forecasting tomorrow’s thunderstorms.

Forecast face-off

Over the last four years, the CDC has run a forecasting research initiative intended to build better methods of predicting what flu season will bring.

Participants are invited to submit their own forecasting systems, which are judged strictly on their accuracy. Each system needs to forecast when the season will start, when it will peak, how bad it will be at its peak, and how bad it will be one, two, three, and four weeks ahead.


After submitting an initial forecast, participants are asked to submit a new forecast for each of these seven criteria every week through flu season, using newly collected data. Forecasts need to be made for each of the ten regions that make up the U.S.

Once the flu season comes to an end, the forecasts are compared with the actual data that was collected. A total of 28 different systems were submitted to the CDC this year. Two of them were developed by Carnegie Mellon University’s Delphi research group, led by Roni Rosenfeld — and those two projects took both the number one and number two spots in the final ranking.

The CDC currently tracks the flu using a surveillance system. The key difference between surveillance and forecasting is that surveillance only looks at what's happening right now, while forecasting can make a probabilistic statement about what's going to happen in the future. The work being done by the Delphi group, among others, is poised to make a huge impact on the agency's ability to plan for flu season, and the scope of this research goes well beyond the flu.

Sources of infection

There are two main strands to the work the Delphi group is doing in conjunction with the CDC. The first is an improvement to the agency's current surveillance techniques, which Rosenfeld refers to as ‘nowcasting.’ The aim is to make this data available as close to real time as possible, without sacrificing accuracy.

“It takes a while to collate all these numbers, compile them, check them, and publish them,” Rosenfeld explained in a phone call with Digital Trends. “So as a result, when the CDC publishes their surveillance numbers online, they actually refer to the previous week, not the week that we’re in. So, they’re already between one and two weeks old.”


The researchers are supplementing the data that the CDC collects with a range of other sources. They're drawing on Google Trends, statistics on how many people visit the CDC's own online flu resources, and Wikipedia access logs. They're even starting to take tweets about the flu into account, as well as retail sales of flu medication.
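To make the nowcasting idea concrete, here is a minimal sketch under loudly stated assumptions: each week comes with a few auxiliary signals (say, normalized search volume and page views), official flu numbers are known for past weeks but not yet for the current one, and the relationship between the two is roughly linear. The sketch fits a least-squares model on past weeks and applies it to this week's signals. The function names and toy data are hypothetical; this is a generic regression, not the Delphi group's actual system.

```python
# Hypothetical nowcasting sketch: estimate this week's flu activity from
# auxiliary signals, assuming a linear relationship fit by least squares.
# Not the Delphi group's method; names and data are illustrative only.

def solve(a, b):
    """Solve the square linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_nowcast(signals, ili):
    """Fit ili ~ w0 + w1*s1 + w2*s2 + ... via the normal equations."""
    rows = [[1.0] + list(s) for s in signals]  # prepend an intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ili)) for i in range(k)]
    return solve(xtx, xty)


def nowcast(weights, signal):
    """Apply fitted weights to the current week's auxiliary signals."""
    return weights[0] + sum(w * s for w, s in zip(weights[1:], signal))


# Toy data: (search volume, page views) for five past weeks, plus official ILI %.
past_signals = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
past_ili = [1.2, 1.3, 2.2, 2.3, 3.0]
w = fit_nowcast(past_signals, past_ili)
print(round(nowcast(w, (6, 4)), 2))  # prints 3.1
```

A plain linear fit like this is exactly what a news-driven search spike can fool, since it treats every jump in the signals as a jump in illness.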

However, some of these sources don't directly measure how many people are getting the flu. They may instead reflect the level of flu awareness.

“If there’s unusual news coverage of flu — maybe because a celebrity got the flu, or something — you would expect to see that influencing how many people search for flu on Wikipedia, or on Google,” said Rosenfeld. “But it would not influence how many people are hospitalized for flu.” The system is being refined so that false peaks, like the surge in web searches described above, are discounted.


In terms of forecasting, the team is using a combination of three methods that have been developed over the past few years, bringing together models of flu dynamics with time series analysis methodology that’s commonly used by economists.
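The article doesn't detail Delphi's three methods. As a generic illustration of the time-series side only, the sketch below fits a first-order autoregressive model, AR(1), in which next week's value is a linear function of this week's, and rolls it forward to produce the one- to four-week-ahead forecasts the CDC asks for. The function names and the weekly numbers are hypothetical.

```python
# Generic AR(1) sketch (not Delphi's method): y[t] = a + b * y[t-1],
# fit by ordinary least squares, then iterated to forecast 1-4 weeks ahead.

def fit_ar1(series):
    """Fit y[t] = a + b * y[t-1] by ordinary least squares; returns (a, b)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def forecast_ahead(series, horizons=4):
    """Roll the fitted model forward one week at a time, feeding each
    prediction back in as the next input."""
    a, b = fit_ar1(series)
    preds, last = [], series[-1]
    for _ in range(horizons):
        last = a + b * last
        preds.append(last)
    return preds


# Hypothetical weekly ILI percentages during the rising phase of a season.
ili = [1.0, 1.3, 1.63, 1.993, 2.3923]
print([round(p, 2) for p in forecast_ahead(ili)])  # prints [2.83, 3.31, 3.85, 4.43]
```

A purely linear model like this can't anticipate a peak on its own (with a coefficient above 1 it just keeps climbing), which is one reason to pair time-series methods with models of how flu actually spreads.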

The results speak for themselves. According to figures released by the CDC, Delphi’s Epicast system earned a “skill score” of 0.451 and its Stat project scored 0.438, where perfect predictions would have earned 1.00. For comparison, a baseline forecast built from a simple average of previous data would have scored only 0.237.
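The article doesn't say how the skill score is calculated. Assuming it follows the scheme used in the CDC's FluSight challenges, where each forecast is a probability distribution over bins (for example, candidate peak weeks) and skill is the exponentiated average log score, the score is the geometric mean of the probability each forecast gave to what actually happened. The floor of -10 on each log score is likewise an assumption, and the names and data below are hypothetical.

```python
import math

LOG_FLOOR = -10.0  # assumed floor on each log score, so one miss can't yield -infinity


def log_score(bin_probs, observed_bin):
    """Log of the probability the forecast assigned to the bin that occurred."""
    p = bin_probs.get(observed_bin, 0.0)
    return max(math.log(p), LOG_FLOOR) if p > 0 else LOG_FLOOR


def skill_score(forecasts, observations):
    """Exponentiated mean log score: 1.0 only if every forecast put all of
    its probability on the outcome that actually happened."""
    scores = [log_score(f, o) for f, o in zip(forecasts, observations)]
    return math.exp(sum(scores) / len(scores))


# Hypothetical peak-week forecasts for two seasons (bins are week numbers).
forecasts = [{50: 0.5, 51: 0.5}, {2: 0.25, 3: 0.75}]
observed = [50, 2]
print(round(skill_score(forecasts, observed), 3))  # prints 0.354
print(skill_score([{3: 1.0}], [3]))                # prints 1.0
```

Under this reading, a score around 0.45 would mean that, averaged geometrically, a system put a bit under half its probability on the correct outcome.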

That score might not seem like much compared to an ideal of 1.00, but the strength of the Delphi team’s work is easier to see when it’s compared to that of the other groups taking part in the initiative. Typically, when different systems are averaged together, they cover for one another’s weaknesses and score better. Yet even when all 28 submissions were combined into an ensemble forecast, it scored only 0.430, a hair below Delphi Stat on its own and well below Delphi Epicast.
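A toy example shows why an ensemble usually helps and how it can still trail the best single system. It assumes the ensemble is an equal-weight average of each system's bin probabilities (the article doesn't specify the CDC's weighting) and scores with the geometric-mean skill described above. Averaging rescues two models that err in opposite directions, but drags down a model that was already sharper and right. All names and numbers are hypothetical.

```python
import math


def skill(forecasts, observations):
    """Geometric mean of the probability each forecast gave the observed bin
    (assumes every observed bin received nonzero probability)."""
    logs = [math.log(f[o]) for f, o in zip(forecasts, observations)]
    return math.exp(sum(logs) / len(logs))


def ensemble(*models):
    """Equal-weight average of several models' forecasts, one scored forecast at a time."""
    combined = []
    for week_forecasts in zip(*models):
        bins = set().union(*week_forecasts)  # every bin any model mentioned
        n = len(week_forecasts)
        combined.append({b: sum(f.get(b, 0.0) for f in week_forecasts) / n for b in bins})
    return combined


# Case 1: two models that are wrong in opposite directions.
a = [{50: 0.8, 51: 0.2}] * 2
b = [{50: 0.2, 51: 0.8}] * 2
obs1 = [50, 51]
print(f"{skill(a, obs1):.2f} {skill(b, obs1):.2f} {skill(ensemble(a, b), obs1):.2f}")
# prints 0.40 0.40 0.50

# Case 2: one sharp, correct model averaged with a vaguer one.
sharp = [{50: 0.9, 51: 0.1}] * 2
vague = [{50: 0.3, 51: 0.7}] * 2
obs2 = [50, 50]
print(f"{skill(sharp, obs2):.2f} {skill(vague, obs2):.2f} {skill(ensemble(sharp, vague), obs2):.2f}")
# prints 0.90 0.30 0.60
```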

Trickle Down

For the purposes of the CDC’s initiative, the Delphi group is working with the agency’s needs in mind. The CDC’s primary interest in a new forecasting platform is the chance to better time its response to flu season.


The CDC needs to make public announcements about the flu season and launch its vaccination campaigns at just the right time. If either comes too early or too late, it won’t be as effective as it could be.

For now, the CDC is the “main driver” behind the project, according to Rosenfeld. Going forward, he says that he can see the platform being used at state and county levels. Hospitals could even use its forecasting capabilities to help determine what their staffing and equipment needs might be.

Rosenfeld is excited about the prospect of individuals being able to use the forecasts to inform their own behavior. “If you have a mother or a mother-in-law who is 90 years old and wants to go visit their sister in Cleveland, if you know that flu is going to peak in Cleveland two weeks from now, it would be useful to be able to advise her not to go,” he explained. “Because flu can be very deadly for older people.”

It’s important to note that the forecasting isn’t exact: you’re not going to be told definitively whether you will or will not contract the flu virus by setting foot in Cleveland. Rosenfeld compares it to a precipitation forecast, which offers a general idea of where it will rain, and how much, over the coming days and weeks.

The Delphi group is working on influenza forecasting because the need is pressing and data is plentiful, but its platform is capable of much more. The team is already using its technology to look at dengue fever, which kills thousands of people every year, and there are plans to apply the same tools to diseases and conditions including HIV, Ebola, and Zika.

This is a field known as epidemiological forecasting — and it’s blossoming.

Under the Weather

To put the current state of epidemiological forecasting into context, Rosenfeld compares it to weather forecasting, which entered its infancy in the U.S. in the 1860s.

“At the time that it started, people didn’t realize how useful it would be economically and socially, and how much it could progress,” he said of weather forecasting’s early years. “It took many, many years — many, many decades — of development across multiple dimensions.”


Meteorologists had to put infrastructure in place to collect measurements and readings, first around the country, and then around the world. They had to develop new statistical models, and do other mathematical work to put this data to use. New technology was needed to analyze their findings. Weather forecasting was among the first applications for early supercomputers.

“If you compare that to epidemiological forecasting, we’re at the very beginning,” Rosenfeld said. “We do have the computing power, we have a head start in that regard. But we need to develop the theory, and we need to develop the measurements.”

Rosenfeld hopes that the research that’s being done as part of this CDC initiative will demonstrate the broader potential for epidemiological forecasting. “It will take quite a few years to grow, and a significant investment,” he acknowledged. “We’re trying to make the case for it. We’re trying to start the work and show the vital benefits of forecasting.”

Rosenfeld and his team have no small task ahead of them. Just as the benefits of weather forecasting weren’t immediately obvious, it’s difficult to build the necessary infrastructure and theoretical frameworks without the proper backing.

Working with the CDC has helped the Delphi group make major advances in influenza forecasting. The next step is to look at more infectious diseases and continue to improve the forecasts themselves. With any luck, the results will help medical practitioners see the thunderhead of an outbreak before the storm breaks.

Brad Jones
Brad is an English-born writer currently splitting his time between Edinburgh and Pennsylvania. You can find him on Twitter…