Predictive Models: More Data, Better Insights on COVID-19

Shared by  McKenzie Foley  on June 23, 2020

in AI, Data Science, covid-19

In late December 2019, a pneumonia of unknown cause was detected in Wuhan, China. That same illness would later be declared a Public Health Emergency of International Concern by the World Health Organization. The impacts of the COVID-19 pandemic have been felt by everybody, and the ripple effects are still cascading outward.

When times are challenging, it's often difficult to step back and appreciate the positive outcomes. Although the coronavirus has separated us physically, it has also brought us together to build creative, innovative solutions. We have seen everyday citizens, multinational organizations, and governments at every level come together, pouring resources and expertise into understanding the pandemic and mitigating its effects.

Pandemic pivots

Many companies have shifted their subscription and business models to help consumers. As academic institutions have moved online, Zoom and Top Hat have both shifted to a freemium model to make their services accessible to students and teachers.

Ford, Chrysler and Fiat have all repurposed their production capacity, going from churning out cars to producing masks and ventilators. Distilleries like Pernod-Ricard, Bacardi, Tito's and others have converted their operations to produce large volumes of hand sanitizer.

The importance of data

As a leader in data technologies, we knew that data would be crucial to understanding and combatting the pandemic. It has been our mission to ensure that the research community has as much data as possible to build potential solutions for this pandemic. (Read more about Using Data to Combat COVID-19 here). Our team at ThinkData Works understood the need for reliable, accessible, and centralized data on COVID-19, so we built the largest repository and made it all available through Namara.

The data is pulled directly from primary sources, leaving no room for new errors and bias. What’s more, our data constantly updates without your team having to spend valuable time building and maintaining scripts and connectors. Our repository grows every day so that data professionals can build better models with more data to create stronger solutions.

ThinkData Works COVID-19 Data Repository 

What can predictive models do for us?

In an outbreak, governments often use models to predict the spread of disease and to help determine the best policy strategies. COVID-19 is caused by a novel virus, meaning this strain hasn’t been seen before, so the medical community had no direct historical basis for predictions.

Even so, it’s important not to dismiss historical data as obsolete when creating predictive models. For example, Taiwan used data from the 2003 SARS epidemic, which killed 299 Taiwanese, and data from the H1N1 swine flu in 2009, which killed 56. At the time of writing, Taiwan has only 440 confirmed COVID-19 cases and 7 deaths. By leveraging historical data, Taiwan was able to model what was likely to happen in the current pandemic and make policy changes that helped limit the spread of COVID-19. Good models are designed to answer specific questions while still using diverse and unbiased data.
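
To make the idea of predicting spread a little more concrete, here is a minimal sketch of a textbook SIR (susceptible, infected, recovered) model in Python. It is not the model used by Taiwan or by any organization mentioned in this post, and the population size, transmission rates, and recovery rate are illustrative assumptions rather than fitted COVID-19 values. It simply compares the peak number of simultaneous infections under two hypothetical contact rates.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Run a simple daily-step SIR model.

    beta  : assumed transmission rate per day (hypothetical value)
    gamma : assumed recovery rate per day (hypothetical value)
    Returns a list of (susceptible, infected, recovered) tuples, one per day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        # New infections scale with contacts between S and I.
        new_infections = beta * s * i / population
        # A fixed fraction of infected people recover each day.
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


def peak_infected(history):
    """Largest number of people infected at the same time."""
    return max(infected for _, infected, _ in history)


# Two illustrative scenarios: unchanged contact vs. halved contact rate.
unchecked = simulate_sir(1_000_000, 10, beta=0.30, gamma=0.10, days=365)
distanced = simulate_sir(1_000_000, 10, beta=0.15, gamma=0.10, days=365)

print(f"Peak infected, unchecked: {peak_infected(unchecked):,.0f}")
print(f"Peak infected, distanced: {peak_infected(distanced):,.0f}")
```

Real epidemiological models are far more detailed (age structure, geography, hospital capacity, and so on), and their usefulness depends heavily on the quality of the data used to estimate parameters such as beta and gamma.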

Powerful insight in predictive models 

On March 16th, Neil Ferguson’s team from Imperial College London delivered a computer-modelled research paper to UK Prime Minister Boris Johnson. The paper projected that, left unchecked, COVID-19 could have a death toll of 500,000 in Britain alone. Based on this report, the British government adopted its social distancing measures. Soon after, Ferguson’s report influenced the United States, with its projection that over 2 million people there could die.

Predictions from Ferguson's model

On December 30, 2019, BlueDot, a Toronto-based startup that tracks and predicts the spread of infectious diseases, alerted its private sector and government clients about a cluster of “unusual pneumonia.” Using data and AI, BlueDot was able to flag the potential outbreak 3 days before the World Health Organization did.

BlueDot uses flight itineraries, climate conditions, health system data, and even animal and insect population data to build well-rounded and informed models. They're a terrific example of a company taking as much pertinent data as possible and feeding it into a larger system that considers a wide range of factors, not just a select few.

BlueDot Insights application

Source: Bluedot.global

Good models need great data

There are obstacles to using data, and even to making sense of it in the first place. A lack of publishing standards, untraceable data provenance, and data filled with errors and bias are just three examples.

Amy Abernethy, Chief Medical Officer/Chief Scientific Officer & SVP Oncology for Flatiron Health, recognizes the value in data.

“Once the data are readily analyzable, frankly, the majority of the critical clinical questions can be addressed,” she stated in the Stanford Medicine 2018 Health Trends Report.

Broader analysis

The hallmark of science is the open exchange of knowledge. Proprietary black boxes and data withheld for competitive motivations have no place in a global crisis. Only by following the principles of transparency, reproducibility, and validity will it be possible to make accurate predictions.

At ThinkData Works, we have been working with public and private organizations to help them uncover and deliver impactful data to the research community. As the pandemic has evolved, so has the data being generated, which has sparked the emergence of a data-sharing ecosystem. As more data is generated, more models are built, and these models in turn add new data to the ecosystem. This is a promising sign of a better, data-driven future where stronger models can be built and decisions are backed by data.

Technology’s role in policymaking has grown, and will only get bigger. With a novel virus such as COVID-19, we can only look at the data we have now to inform our decisions today, and collect as much high-quality data as possible to inform our decision-making in the future.



ThinkData offers a lot more than COVID-19 data: over 250,000 datasets from more than 75 countries around the world. Browse the Namara Marketplace now or request a consultation with one of our data experts to talk about external data.

