Coronavirus Infection Prediction Project — Research Team Blog Post 1

Fernanda Moreno
Coronavirus Visualization Team
3 min read · May 15, 2020


The Coronavirus Infection Prediction project within the Coronavirus Visualization Team consists of researchers, data analysts, and data visualizers who aim to analyze different parameters of COVID-19 patients and predict a person's likelihood of catching the disease. The end goal of this project is to create a function that takes variables such as age, location, and underlying health conditions as input and outputs the probability of a person catching COVID-19.
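To make that end goal concrete, here is a minimal sketch of what such a function could look like. It assumes a logistic model, which is only one possible choice and not something the team has settled on. The function name, the weights, and the state offsets are all hypothetical placeholders rather than values fitted to any data.

```python
import math

# Hypothetical weights for illustration only; they are NOT fitted to any data.
PLACEHOLDER_WEIGHTS = {
    "intercept": -3.0,
    "age_per_decade": 0.10,
    "per_condition": 0.30,
}

# Hypothetical per-location offsets (for example, a stand-in for density or policy).
PLACEHOLDER_STATE_OFFSET = {"STATE_A": 0.5, "STATE_B": -0.5}


def infection_probability(age, state, conditions):
    """Map age, location, and health conditions to a value in (0, 1).

    Assumes a logistic model: a weighted sum of the inputs is passed
    through the sigmoid function 1 / (1 + exp(-score)).
    """
    score = (
        PLACEHOLDER_WEIGHTS["intercept"]
        + PLACEHOLDER_WEIGHTS["age_per_decade"] * (age / 10)
        + PLACEHOLDER_WEIGHTS["per_condition"] * len(conditions)
        + PLACEHOLDER_STATE_OFFSET.get(state, 0.0)  # unknown locations get no offset
    )
    return 1.0 / (1.0 + math.exp(-score))
```

In a real version, the weights would be estimated from the datasets described below, and the output would only be as trustworthy as that data.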

The Coronavirus Infection Prediction project researchers have spent the first three weeks of the project’s timeline collecting links to publicly available datasets and compiling lists of the restrictions currently in place in all 50 states. For each state, the research team has decided to analyze nine widely accepted risk factors for developing COVID-19 symptoms:

  • State policies
  • Population density
  • 2-meter air temperature
  • Age range
  • Lifestyle choices (BMI, smoking/alcohol habits)
  • Pre-existing conditions (metabolic syndrome, asthma, etc.)
  • Socio-economic background
  • Degree of social distancing
  • Gender

To collect the data, the team discussed building web scrapers to extract data from websites, but ultimately concluded that an automated scraper was not yet necessary at this stage of research and that manually pulling data into a CSV or Excel file would suffice for now.
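As a rough illustration of how a hand-compiled CSV could later be read for analysis, the sketch below parses a small in-memory sample with Python's standard `csv` module. The column names and the rows are invented placeholders, not the team's actual spreadsheet layout or real measurements.

```python
import csv
import io

# Placeholder rows standing in for a hand-compiled file; the column names and
# values are invented for illustration and are not real measurements.
SAMPLE_CSV = """state,population_density,stay_at_home_order
State A,100.5,yes
State B,,no
"""


def load_state_rows(handle):
    """Parse per-state rows, converting types and keeping blank cells as None."""
    rows = []
    for raw in csv.DictReader(handle):
        density = raw["population_density"].strip()
        rows.append({
            "state": raw["state"].strip(),
            # Manually compiled files often have gaps, so blanks become None.
            "population_density": float(density) if density else None,
            "stay_at_home_order": raw["stay_at_home_order"].strip().lower() == "yes",
        })
    return rows


rows = load_state_rows(io.StringIO(SAMPLE_CSV))
```

For a file on disk, `open(path, newline="")` would take the place of `io.StringIO`. Keeping missing values as `None` rather than zero avoids treating a gap in the data as a measurement.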

Below are some screenshots of how the sources are being compiled for each state:

Screenshot contributed by Alicia Loui.
Screenshot contributed by Sanya Garg.

In addition to looking at the publicly available datasets, some members of the team have reached out to state departments and hospitals to obtain data that was not easily accessible, such as the lifestyle choices and pre-existing health conditions of COVID-19 patients. Unfortunately, these requests had little success, either because public health officials were flooded with data requests or because they did not have the type of data we wanted. The good news is that some organizations, such as the COVID Tracking Project, have already compiled data to aid researchers and data analysts, so the researchers of the Coronavirus Infection Prediction project plan to turn to these sources and obtain as much data as they can.

In the next sprint, the research team plans to continue emailing professors and researchers at hospitals to ask for access to time-series data for COVID-19 patients with pre-existing conditions. Additionally, the team will look into more sources with pre-compiled data, such as the CDC, and contact them in an attempt to obtain access to the datasets they used. If obtained, this data should help the team quantify the increased risk of developing COVID-19 symptoms for people with pre-existing conditions.
