The Roche Data Science Coalition: Collaboration in Crisis

Shared by Lewis Wynne-Jones on June 11, 2020

in Data Science, Open Data, covid-19

In March, ThinkData became a founding member of the Roche Data Science Coalition, a group of public and private organizations working with the global community to develop solutions around the COVID-19 pandemic. Committing to sharing knowledge and public data, the coalition has developed relationships with data providers to break down the silos between the people who have access to useful information and the people who can use it to better understand the crisis. 

At ThinkData, our focus is always on data, and we’ve been thrilled to work with the coalition to find, normalize, and provide access to hundreds of COVID-19 datasets from governments, research institutions, and private enterprises. The COVID-19 data catalogue brings together data from everywhere. From Johns Hopkins and the WHO to Google’s Mobility Trends and The Humanitarian Data Exchange, ThinkData connects to data wherever it resides and however it is formatted.


Glimpse of ThinkData's COVID-19 Data Catalogue

In order to get this data into the hands of researchers, data scientists, and the community at large, the Roche Data Science Coalition launched the United Network for COVID Data Exploration and Research (UNCOVER) Challenge. The UNCOVER Challenge is administered by the Coalition through Kaggle, and aims to solve the challenges faced by global frontline workers, healthcare providers, hospitals, suppliers and policy makers. Since its launch, we have worked with a common goal to bring actionable COVID-19 intelligence to the front line.

As the challenge progresses, members of the coalition evaluate the solutions developed at each stage. We have been impressed week after week with the innovation we’ve seen from participants as they dig into the 12 tasks we put to them:

The original challenge tasks

  • Which populations are at risk of contracting COVID-19?
  • How is the implementation of existing strategies affecting the rates of COVID-19 infection?
  • What is the incidence of infection with coronavirus among cancer patients?
  • Which patient populations pass away from COVID-19?
  • Which populations have contracted COVID-19 and require ventilators?
  • Which populations have contracted COVID-19 who require the ICU?
  • What is the change in turnaround time for routine lab values for oncology patients?
  • Which populations of clinicians are most likely to contract COVID-19?
  • Which populations assessed should stay home and which should see an HCP?
  • Which populations of clinicians and patients require protective equipment?
  • Are hospital resources being diverted from providing oncology care to support the COVID-19 response?
  • How are patterns of care changing for current patients (e.g. cancer patients)?

Sample data visualization (heatmap) from the UNCOVER Challenge

In our most recent evaluation phase, we were excited to see an evolution in the way that participants were framing their submissions. In the first weeks of the pandemic, largely due to the types of data that were available, the submissions were mostly focused on understanding the crisis at ground level. How many new infections were there? What was the rate of increase? The first data sources we found – Johns Hopkins, the European Centre for Disease Prevention and Control, WHO, New York Times, etc. – were primarily focused on providing this kind of information: the what and the where.

But as the crisis continued, and more data became available, participants started layering other information on top of the infection rate data. Novel data sources from third parties about government measures, mobility trends, and behavioural changes started creating unique insight when overlaid with other datasets. Critically, we started to see participants not only thinking about the current crisis we’re facing, but how the lessons learned every day can be applied in the future. These are insights into today’s problems; they may also be solutions for tomorrow’s.

With that in mind, the Coalition has decided to reframe the challenge tasks to help focus efforts towards prediction, risk assessment, and implementation.

The reframed challenge tasks

  • Can we predict the impact of various social interventions (including testing) and public attitudes on the spread of COVID-19 through populations?
    • For example, what if various measures were lifted at different time intervals? What is the expected impact on effective social distance? What is the impact on disease spread (R0) and illness?
  • Can we predict the impact on health infrastructure and resources of the COVID-19 spread?
    • Alternatively, can you predict the severity of illness from COVID-19 in a population based on availability of healthcare resources?
  • What are the risk factors associated with the severity of illness from COVID-19 infection?
    • Consider patient profiles and influencing factors such as lifestyle, social circumstances, co-morbidities, genetics, and viral strain: how do these affect the probability of requiring hospitalization, the ICU, or a ventilator, and the probability of mortality?
    • How does risk vary among healthcare professionals? What is the impact of the availability and implementation of PPE on their risk?
  • How has COVID-19 affected non-COVID-related healthcare availability (e.g. for cancer, cardiovascular disease, and dialysis patients)?
    • How can we restore/maintain effective non-COVID-related healthcare services?
  • Can we predict changes in demand for mental health services and how can we ensure access? (by region, social/economic/demographic factors, etc.)
  • Catch-all EDA and Insights section
    • Can we take any of the above and localize projections to sub-regions within a country (e.g. province, state, county, city, etc.)? Where is there sufficient data available to make this possible?
    • How do we take some of the provided models/solutions and convert them into visually appealing and meaningful insights for communication to a broad audience?

Sample data visualization (trend comparison) from the UNCOVER Challenge

The UNCOVER Challenge doesn’t have an end date. We will continue to provide data and encourage critical examination of every relevant piece of information as long as it can make a difference. In order to do this, we need to find more data, and those who have data will have to continue to make it available. It is our hope – and our goal – to work together not only to discover actionable intelligence ourselves, but to help others do so.


About Roche

Roche is a global pioneer in pharmaceuticals and diagnostics focused on advancing science to improve people's lives. The combined strengths of pharmaceuticals and diagnostics under one roof, combined with a focus on innovation, have made Roche the leader in personalized healthcare - a strategy that aims to provide patients with timely access to their best possible healthcare solution.

Roche is the world's largest biotech company, with truly differentiated medicines in oncology, immunology, infectious diseases, ophthalmology and diseases of the central nervous system. Roche is also the world leader in in vitro diagnostics and tissue-based cancer diagnostics, and a frontrunner in diabetes management.

Founded in 1931, Roche Canada is committed to searching for better ways to prevent, diagnose and treat diseases while making a sustainable contribution to society. The company employs more than 1,200 people across the country through its Pharmaceuticals division in Mississauga, Ontario, and its Diagnostics and Diabetes Care divisions in Laval, Quebec.

Roche aims to improve patient access to medical innovations by working with all relevant stakeholders. Roche Canada is actively involved in local communities through its charitable giving and partnerships with organizations and healthcare institutions that work together to improve the quality of life of Canadians. For more information, please visit www.RocheCanada.com.

