At the beginning of the COVID-19 pandemic, the occupational medicine team of a metallurgical company started tracking symptomatic employees, but soon realized that a simple Excel spreadsheet would not be enough for what was becoming a global crisis.
How can we use data science to help these doctors identify which locations are most at risk, so they can focus their resources and efforts there?
After agreeing with the client on the format and the specific data we would work with, we created a data cleaning pipeline to fix typos and invalid values found in our private dataset. Public data was acquired daily via API, treated, and stored in our own database.
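To give an idea of what such a cleaning step can look like, here is a minimal sketch using only the Python standard library. The field names (`factory`, `status`, `reported_on`), the accepted status values, and the typo table are all assumptions made for illustration; the real private dataset has its own schema, which is not shown here.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical accepted values; the real schema is not public.
VALID_STATUSES = {"symptomatic", "asymptomatic", "recovered"}

# Hypothetical table of known typos mapped to their corrected value.
STATUS_FIXES = {"simptomatic": "symptomatic", "assymptomatic": "asymptomatic"}


def clean_status(raw: Optional[str]) -> Optional[str]:
    """Normalize case and whitespace, fix known typos, reject unknown values."""
    if raw is None:
        return None
    value = raw.strip().lower()
    value = STATUS_FIXES.get(value, value)
    return value if value in VALID_STATUSES else None


def clean_date(raw: Optional[str]) -> Optional[date]:
    """Parse an ISO date (YYYY-MM-DD); impossible or malformed dates become None."""
    if not raw:
        return None
    try:
        return datetime.strptime(raw.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None


def clean_record(record: dict) -> dict:
    """Return a cleaned copy of one raw row (field names are assumptions)."""
    return {
        "factory": (record.get("factory") or "").strip().title() or None,
        "status": clean_status(record.get("status")),
        "reported_on": clean_date(record.get("reported_on")),
    }
```

Invalid values are turned into `None` rather than guessed, so they can be reviewed later instead of silently entering the dashboard.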
With the help of the company doctors and a statistician, a risk factor was created that takes into consideration features such as the number of symptomatic employees in a factory, the moving average of new cases in the city, and cumulative deaths, to name a few. The doctors used this risk factor to distribute testing kits and safety equipment, and even to decide whether a factory should close.
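The actual formula was defined together with the doctors and the statistician and is not reproduced here. As a rough illustration only, the sketch below combines the three features mentioned above into a single score. The weights, the 7-day window, and the normalization by headcount and city population are assumptions of this example, not the real model.

```python
from typing import Sequence, Tuple


def moving_average(values: Sequence[float], window: int = 7) -> float:
    """Mean of the last `window` values (or of all values if fewer exist)."""
    recent = list(values)[-window:]
    return sum(recent) / len(recent) if recent else 0.0


def risk_score(
    symptomatic_employees: int,
    total_employees: int,
    city_new_cases: Sequence[float],
    city_cumulative_deaths: int,
    city_population: int,
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # hypothetical weights
) -> float:
    """Illustrative weighted sum; higher means more at risk."""
    # Share of the factory workforce that is symptomatic, in percent.
    symptomatic_pct = symptomatic_employees / total_employees * 100
    # City indicators scaled per 100,000 inhabitants so cities are comparable.
    cases_per_100k = moving_average(city_new_cases) / city_population * 100_000
    deaths_per_100k = city_cumulative_deaths / city_population * 100_000
    w_symptomatic, w_cases, w_deaths = weights
    return (
        w_symptomatic * symptomatic_pct
        + w_cases * cases_per_100k
        + w_deaths * deaths_per_100k
    )
```

Scaling the city figures per 100,000 inhabitants is one way to keep a large city from dominating the score just because of its size.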
With both private and public data treated and the new features created, a Tableau dashboard was developed where the medicine team could get a consolidated view of COVID rates in each factory, compare company cases with city cases, analyze the results of their testing efforts, and review all private and public history in a single place.
Private and public data were acquired and treated using Python. This process was automated to run on a daily basis with a cron job. A Tableau dashboard was developed to show both private and public data to the medicine team.
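The daily automation can be sketched as a small Python entry point that cron calls once a day. The step functions below are empty, hypothetical placeholders standing in for the real acquisition and treatment code, and the crontab line in the comment uses made-up paths and an arbitrary time.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("daily_covid_job")

# Example crontab entry (hypothetical paths, runs every day at 06:00):
# 0 6 * * * /usr/bin/python3 /path/to/daily_job.py >> /path/to/daily_job.log 2>&1


def run_pipeline(steps: Sequence[Callable[[], None]]) -> int:
    """Run each step in order; return 0 on success, 1 at the first failure."""
    for step in steps:
        try:
            logger.info("starting %s", step.__name__)
            step()
        except Exception:
            # Log the traceback and stop, so later steps never run on stale data.
            logger.exception("step %s failed", step.__name__)
            return 1
    return 0


def fetch_public_data() -> None:
    """Placeholder: download the day's public data and store it."""


def clean_private_data() -> None:
    """Placeholder: run the cleaning pipeline on the private dataset."""


def refresh_features() -> None:
    """Placeholder: recompute derived features such as the risk factor."""


def main() -> int:
    logging.basicConfig(level=logging.INFO)
    return run_pipeline([fetch_public_data, clean_private_data, refresh_features])
```

A script built this way would end with `raise SystemExit(main())` under the usual `if __name__ == "__main__":` guard, so cron sees a non-zero exit code when any step fails.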
Finally, a huge shoutout to the team of volunteers at Brasil.io who consolidate all public COVID data from Brazil in a single place, making it easy to get new daily data with a simple API call.
If you'd like to learn more about my projects or work together, feel free to reach out! You can also connect with me on LinkedIn.