No Power BI tips and tricks in this article. Just a few words about data management. Pandemic and chaotic data management. If you’re interested in how I prepared data for my COVID-19 report, what it looks like to build a report while data formats are constantly changing, then this article is for you.
I got sick in the middle of January 2020. The entire family got sick right after me. We think it was just flu. But it was the worst flu in my life. Kids had high fever during a couple of days, then no fever at all during a day or two, then high fever again for a day, no fever and high fever again and then they are healthy as always. I still had a terrible cough during a couple of weeks.
So, I was spending my time on a couch with a terrible fever and cough and this is when I found that there is a new virus in China. It was growing fast and I decided that it’s an interesting case to build a Power BI dashboard. If the world is gonna die then at least everyone will be using my dashboard to track the end of the world. There were no good COVID-19 dashboards in January. My dashboard was first published Power BI COVID-19 dashboard (at least I didn’t see other Power BI dashboards at that time and I googled a lot). Actually, when I published it there were no other COVID-19 dashboards at all except JHU CSSEE (the Center for Systems Science and Engineering at Johns Hopkins University) dashboard. But JHU CSEE dashboard was not a Power BI dashboard and it was not an example of the best dashboard design.
It wasn’t like a usual commercial job. Usually a company gives you some data. It can be well prepared data warehouse or really poor and inconsistent data stored in multiple data source, but even if it’s a mess – it’s a relatively stable mess. But COVID-19 data (earlier in January-February) was a fast changing and growing chaos.
Data preparation was the most complicated part of the work. JHU CSSE team was doing a really great job. At the beginning no one else (including WHO) was able to consolidate data daily with no delays. And they shared their data with the entire world. Thank you, JHU CSSE!
But from data management point of view it was a real mess. For example JHU CSSEE shared Google Sheet first. Then they changed format of the sheet. Then they moved data to GitHub and changed data format again. Then they changed data format again. Then they renamed countries (there were no unique IDs, the only ID was a country name and they changed it without any notification). And so on. And it was a real pain to prepare data for the report and to keep the report working. I spent most of the time in Power Query consolidating data and fixing new and new issues.
And there were zero chances to schedule data updates. I tried a few times, but data formats were changing almost every day and I had to click ‘refresh’ manually, verify that nothing is broken and then publish the report. Every day, multiple times per day.
The best data provider – Ministry of Health of Italy. They shared CSV files via GitHub and there were no painful changes of data formats – just new data every day. Also there were some static data files. Everything else was changing very fast and required regular work in Power Query. Later I had to disable some of the data source and hide some report pages (for example, based on BNO News data, Spain Ministry of Health of Spain data) when data providers stopped regular updates or changed data format again when I had no time to keep supporting all the changes. The report refreshes are scheduled now.
The report was growing. I was adding new and new pages (read my previous article URL based navigation for a report shared via ‘Publish to Web’). Different users wanted to see different data and to have different tools. Finally the report consolidated data from multiple Excel, CSV and PDF files, Google Sheets, GitHub repositories, web pages (just HTML to parse).
In the future posts on my blog I’ll be sharing more details about how I prepared and consolidated data from multiple sources and how I built certain report pages. And eventually I’ll share .pbix file. Stay with me.
P.S. Thanks to everyone who were supporting my work by donations, by helping me to find new data sources, by asking questions and answering my questions. I had no chance to answer to every message I got from report users. But it’s a pleasure to know that hundreds of thousands of people used my report to track the pandemic. And if my report helped people to understand the pandemic better and if it saved at least one life then I’m really happy. I hope you are all safe and doing well.
P.P.S. This report allowed me and my family (and a lot of other people) to be ready to the pandemic in advance. I talked with doctors and warned them about what is coming. I understood how useless is case fatality rate (CFR) on early stages of an epidemic and how media (and even some governments) publish misleading information because they don’t understand the subject and they don’t want to understand (or because they just lie). I notified parents in our kindergarten that all kindergartens will be closed (and other apocalyptic things will happen) at the time when almost no one in our country have been expecting for something like this. I bought enough groceries before everyone went to supermarkets, so I had to buy nothing during the panic that started weeks later. I bought masks (not many, just for family needs) in their regular price. I found a bug in Map visual that affected many users and contacted Microsoft. The bug has been fixed by Microsoft. I discovered some Power BI tricks and got an experience of work with fast changing data chaos. My social media experience changed significantly. I got thousands of new Facebook and Twitter followers. I decided that I’ll be switching from being Excel VBA developer with occasional Power BI projects to being full time Power BI developer with occasional Excel VBA work. Despite it costed me a lot of time (no ads and no donations could fully cover the cost of the time required to build such report) it was an interesting and useful project.