Leveraging Data Science for Occupational COVID-19 Risk Assessment

Using a data-driven approach to identify factories with elevated COVID-19 risk by combining public health data with private company data.
Workers with face masks (image from: https://socialeurope.eu/occupational-safety-and-health-a-fundamental-right)

When the COVID-19 pandemic first emerged, the occupational medicine team at a large metallurgical corporation began documenting symptomatic employees. They quickly realized that their rudimentary Excel tracking system could not handle the scale of the unfolding global health crisis. The question was: how could data science help these medical professionals pinpoint high-risk locations, so that they could allocate resources strategically and concentrate their mitigation efforts?


To begin, we worked with the client to define the data required and agree on a format for processing it. We then built a data cleaning pipeline to correct typographical errors and inconsistencies in the private dataset. Public health data was retrieved regularly via an API, cleaned, and stored in the client's database.
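The cleaning step described above can be illustrated with a minimal sketch. It assumes the typos sit in free-text city names and that a list of valid names is available; `KNOWN_CITIES`, `normalize`, and `clean_city` are hypothetical names for illustration, not the project's actual pipeline.

```python
import difflib
import unicodedata

# Hypothetical reference list; the real factory locations are not given in the article.
KNOWN_CITIES = ["Sao Paulo", "Belo Horizonte", "Volta Redonda"]


def normalize(text: str) -> str:
    """Strip accents, collapse repeated whitespace, and lowercase a free-text value."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.split()).lower()


def clean_city(raw: str, known=KNOWN_CITIES, cutoff: float = 0.8):
    """Map a possibly misspelled city name to a known one, or None if no close match."""
    lookup = {normalize(name): name for name in known}
    matches = difflib.get_close_matches(normalize(raw), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None
```

Returning `None` for values without a close match, rather than guessing, leaves ambiguous records for manual review. The `cutoff` of 0.8 is an arbitrary starting point.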

Working with the company's medical team, we devised a risk factor that combined variables such as the number of symptomatic employees per factory, the moving average of new cases in the factory's city, and cumulative deaths, among others. The medical team used this risk factor to guide the allocation of testing kits, the distribution of safety equipment, and even decisions about factory closures.
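One way such a risk factor could be assembled is as a weighted sum of rescaled variables. The article does not give the formula, so everything in this sketch is an assumption: the 7-day window, the min-max rescaling, the weights, and the field names are all hypothetical.

```python
from dataclasses import dataclass


def moving_average(values, window=7):
    """Mean of the last `window` values (or of all values if fewer exist)."""
    recent = values[-window:]
    return sum(recent) / len(recent) if recent else 0.0


def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


@dataclass
class FactoryStats:
    name: str
    symptomatic_per_100: float   # symptomatic employees per 100 workers
    city_new_cases: list         # daily new cases in the factory's city
    city_deaths_per_100k: float  # cumulative city deaths per 100k inhabitants


# Hypothetical weights; in practice these would be agreed with the medical team.
WEIGHTS = (0.5, 0.3, 0.2)


def risk_scores(factories, weights=WEIGHTS):
    """Return {factory name: score in [0, 1]}, higher meaning higher relative risk."""
    columns = [
        min_max([f.symptomatic_per_100 for f in factories]),
        min_max([moving_average(f.city_new_cases) for f in factories]),
        min_max([f.city_deaths_per_100k for f in factories]),
    ]
    return {
        f.name: sum(w * col[i] for w, col in zip(weights, columns))
        for i, f in enumerate(factories)
    }
```

Because each variable is rescaled across factories, the score ranks factories relative to one another. It is not an absolute probability of an outbreak.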

With both private and public data prepared and the new predictive features created, we built a dashboard. It gave the medical team a consolidated view of COVID-19 rates across all factories, letting them compare company and city case rates, assess testing efforts, and review the history of all relevant private and public health data in one place.
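The comparison between company and city case rates only makes sense once both are put on the same scale. The sketch below assumes a per-100,000 rate and a simple ratio; the function names and the choice of denominator (factory headcount versus city population) are illustrative, not taken from the dashboard itself.

```python
def rate_per_100k(cases: int, population: int) -> float:
    """Cases per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population


def rate_ratio(factory_cases, headcount, city_cases, city_population):
    """Factory case rate divided by city case rate; None if the city rate is zero."""
    city_rate = rate_per_100k(city_cases, city_population)
    if city_rate == 0:
        return None
    return rate_per_100k(factory_cases, headcount) / city_rate
```

A ratio above 1 would suggest that a factory has more cases per capita than the surrounding city. With small headcounts the ratio is noisy and should be read with caution.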


We used Python to acquire and preprocess both the private and public datasets, and automated the process to run daily with Airflow so that the information stayed up to date. From these processed datasets, we built a Tableau dashboard that presented the private company data alongside public health statistics to the medical team.

We were greatly helped by the volunteer team at Brasil.io, who aggregated all of Brazil's public COVID-19 data in one accessible place. Thanks to their work, we could retrieve the most current data with a simple daily API call.
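A daily retrieval like the one described might look roughly as follows. This is a sketch under assumptions, not the project's code: the endpoint URL, the token header, the `state` and `city` query parameters, and the paginated `results`/`next` response shape are all assumed here and should be checked against the current Brasil.io API documentation.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint; verify against the Brasil.io API documentation before use.
BASE_URL = "https://api.brasil.io/v1/dataset/covid19/caso/data/"


def http_get_json(url: str, token: str) -> dict:
    """GET a URL with a token header (assumed auth scheme) and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Authorization": f"Token {token}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def city_url(state: str, city: str) -> str:
    """Build the first-page URL for one city (query parameter names are assumed)."""
    return f"{BASE_URL}?{urlencode({'state': state, 'city': city})}"


def iter_rows(first_url: str, fetch):
    """Yield every row, following the 'next' link until the last page."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("results", [])
        url = page.get("next")
```

Passing the fetch function in as an argument keeps the pagination logic testable without network access. In production it could be called as `iter_rows(city_url("MG", "Belo Horizonte"), lambda u: http_get_json(u, token))`, where `token` is a hypothetical API key.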

Get in touch

If you'd like to learn more about my projects or work together, feel free to reach out! You can also connect with me on LinkedIn.
