How To Be a Good Data Analyst Without Good Data
Musings on data quality
“This is wrong.” — key stakeholder
I dreaded hearing that phrase (and variations of it) when showing dashboards to project stakeholders. I had a lot of anxiety around data analytics projects in the beginning because I didn’t want to fail. (I’m not great at handling failure, and I’m still working on that.)
But once I had more experience as an analyst, my perspective started to change. I wasn’t a bad analyst; rather, I was given bad data.
What can you do?
In this article, I will walk through some of the strategies I employed when the data quality for a project was… less than ideal.
A quick note before I start: a lot of the time, the data is out of your control! Business decisions made years ago (IT reductions, acquisitions, failed ERP transitions, etc.) may have negatively impacted the data quality at your organization. Long-term, sustainable change in data quality needs to be an initiative from leaders higher up than analysts.
But, at the end of the day, you often have to work toward a deliverable anyway. So, how do you make progress?
Automate data cleaning tasks
The term “bad data” can encompass a variety of issues. Let’s talk first about “messy data.”
You might get data that is disorganized, has spelling issues, has the wrong data types… the list goes on. The data isn’t “good,” but you know how to fix it.
After you’ve done some exploratory data analysis, you should be familiar with any issues in your dataset that you need to correct. Once you are ready to clean your data, here are some tips:
- If there is business logic that goes into the cleaning, don’t forget to document it (e.g., you have to change some department labels because you know they are wrong in the system).
- Make sure that your process is sustainable — if it involves creating a mapping file that has to be actively maintained, then…
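
To make the tips above a bit more concrete, here is a minimal sketch of what an automated, documented cleaning step could look like in plain Python. Everything in it is hypothetical and only for illustration: the `department` and `amount` columns, the label fixes in `DEPARTMENT_FIXES`, and the sample rows are assumptions, not data from a real project. The idea is simply that the business logic lives in one commented place and the raw input is never modified.

```python
from typing import Optional

# Hypothetical business rule: the source system contains misspelled or
# outdated department labels. Keeping the fixes (and the reason for them)
# in one place documents the logic next to the code that applies it.
DEPARTMENT_FIXES = {
    "mktg": "Marketing",
    "marketting": "Marketing",
    "hr": "Human Resources",
}


def clean_department(raw: str) -> str:
    """Trim whitespace and apply the documented label fixes.

    Labels without a known fix are only trimmed, so nothing is guessed.
    """
    trimmed = raw.strip()
    return DEPARTMENT_FIXES.get(trimmed.lower(), trimmed)


def clean_amount(raw: str) -> Optional[float]:
    """Convert a text amount such as ' $1,200.50 ' to a float.

    Returns None when the value cannot be parsed, so bad entries stay
    visible instead of silently becoming zero.
    """
    text = raw.strip().replace("$", "").replace(",", "")
    try:
        return float(text)
    except ValueError:
        return None


def clean_rows(rows):
    """Return cleaned copies of the rows, leaving the raw input untouched."""
    return [
        {
            "department": clean_department(row["department"]),
            "amount": clean_amount(row["amount"]),
        }
        for row in rows
    ]


# Made-up sample rows showing spelling issues and wrong data types.
raw_rows = [
    {"department": " mktg ", "amount": "$1,200.50"},
    {"department": "Marketting", "amount": "300"},
    {"department": "finance", "amount": "n/a"},
]
cleaned = clean_rows(raw_rows)
```

Because the fixes sit in a single dictionary, adding a new label correction is a one-line change, and anyone reviewing the dashboard later can see exactly which values were altered and why.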