Harsh Truths about Data Analysis

In layman's terms, data analysis is the skill of using Excel very well. To be more precise, it is the skill of getting, transforming, and presenting structured data to inform decisions.

Whether you are assessing your company's growth, controlling your inventory, sales, and factory performance, navigating industry and media trends, modelling stock prices, or tracking your personal finances, data analysis gives you good information that influences your next move.

I have never aspired to be a data analyst. However, from one internship to the next, I have always encountered some task that requires data analysis skills. As it turns out, I am not alone. All my overachieving friends have had the same experience. It seems that data analysis is now an important skill in the workplace. So, if you haven't learnt it, learn it now, or you will be left behind at work.

The process of learning data analysis was fun. Along the way, I picked up some unlovely yet important lessons firsthand that might elude a beginner. I decided to write them down to solidify my knowledge. Here they are:


1. 80% of the Work is Just Collecting Data

When I was working on a water audit project, I was excited to dig into the numbers and find patterns. But very quickly, I realized that most of my time wasn't spent analyzing data. It was spent collecting it. I had to make sure our workers were consistently collecting the right data for us, clean messy spreadsheets, chase missing information, and figure out where the data even came from in the first place.

Whether you are monitoring factory plant conditions, analyzing sales pipeline data, or predicting a company's financial health for the next quarter, you need to understand the reality of data analysis: before you can even start working on insights, be prepared to spend A LOT of your time getting the right data. Transformation and analysis are the easy part. Collection is the hardest part, and it is the work you are really paid for as a data analyst.

As a consequence, if you have any control over how raw data is generated and stored, take control of it as soon as you can. This will save you a lot of tedious data collection work later.
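One way to control raw data at the source is to check each record the moment it is entered, rather than cleaning it up months later. Below is a minimal Python sketch of that idea. The field names (site, reading_date, litres) and the rules are hypothetical, made up for illustration; they are not the actual schema from my water audit project.

```python
from datetime import date

# Hypothetical schema for one meter reading; adjust to your own data.
REQUIRED_FIELDS = ("site", "reading_date", "litres")


def _text(row: dict, field: str) -> str:
    """Return the field as stripped text, treating None/missing as empty."""
    value = row.get(field)
    return "" if value is None else str(value).strip()


def validate_reading(row: dict) -> list[str]:
    """Return a list of problems found in one raw record (empty list = OK)."""
    problems = []

    # Every required field must be filled in.
    for field in REQUIRED_FIELDS:
        if not _text(row, field):
            problems.append(f"missing {field}")

    # Dates must be ISO formatted (YYYY-MM-DD) so every sheet agrees.
    raw_date = _text(row, "reading_date")
    if raw_date:
        try:
            date.fromisoformat(raw_date)
        except ValueError:
            problems.append(f"bad date: {raw_date!r}")

    # Volumes must be non-negative numbers.
    raw_litres = _text(row, "litres")
    if raw_litres:
        try:
            if float(raw_litres) < 0:
                problems.append("negative litres")
        except ValueError:
            problems.append(f"litres not a number: {raw_litres!r}")

    return problems
```

Run something like this on every new row (or do the equivalent with Excel's data validation rules) and bad records get rejected while the person who entered them still remembers what they meant.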

2. Clarify Your Purpose

As my problems grew more complex, I learnt more data analysis software: Python, Power Query, and Power BI. Whenever I learn a new tool or a new trick, I get the itch to apply it to the work at hand. It was exciting to use these new tools, and I started making fancy dashboards and complex data models.

By the time I was done, I had forgotten why I chose to use those dashboards and graphs in the first place. When I took the time to think through what I was trying to achieve, I realized that all the work I had done was useless.

I learnt this the hard way: don’t get caught up in using tools for the sake of using them.

Be intentional about the tools and techniques you choose to transform and present your results. Now, before I create anything, I always clarify:

  • Who is this for?
  • What problem am I solving?
  • What’s the simplest way to solve the problem?

If you are still itching to apply your new skills, understand that this is normal: your brain wants you to truly soak in the new information. Carve out time for playful practice to solidify what you have learnt.

3. There’s No Such Thing as Perfect Data

Data is collected at a point in time, by a person, through a technology. Each of these three components limits the accuracy and relevance of the data to your work. And in real life, these limits are present in all kinds of data. This is important to know when you are working on your own project, but it is even more important when obtaining information from others.

The flaws from time and technology are self-evident to most people. Those from people? Not really. We are easily tricked by other people. To counter that, I hold the principle that behind every dataset there is an average employee: a consultant or researcher who cuts corners, manipulates data to make it look a certain way, or uses ChatGPT to hallucinate information that is distracting or, worse, misleading. Maybe they just want to churn out SEO content on schedule at the expense of getting the facts 100% verified, maybe they want you to think of them in a certain way, maybe it is just a marketing or sales tactic, or maybe they just don't give a damn.

During presentations, people use various tricks to persuade, even when some of the data is full of shit. It is a prevalent practice in data analysis and consultancy work because it is effective. If you happen to work in this industry, your seniors will be frank with you about this. Once, my ex-boss asked me to add more graphs to show how much work we had done on a project, even though they added practically nothing important. Experiences like this give you a healthy dose of skepticism when others use similar tactics on you.

Therefore, if you read industry reports, company presentations, or government statistics, always question where the data comes from. I’ve learned to do extra research, dig into documentation, and ask:

  • How was the data collected?
  • What assumptions were made when collecting this data?
  • What’s missing?
  • Does this data actually reflect reality, or is it just convenient for the person presenting it?

Data always has gaps, errors, and biases. Your job as a data analyst isn't to find perfect data, but to understand its flaws.
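The "What's missing?" question can be partly answered by a quick profile before any analysis starts. Here is a rough Python sketch, assuming the data has already been loaded as a list of dictionaries (for example with csv.DictReader). The function name profile_rows and the sample columns are hypothetical, chosen only for illustration.

```python
from collections import Counter


def profile_rows(rows: list[dict]) -> dict:
    """Count blank cells per column and exact duplicate rows."""
    missing = Counter()
    seen = Counter()

    for row in rows:
        # A cell counts as missing if it is None or only whitespace.
        for column, value in row.items():
            if value is None or str(value).strip() == "":
                missing[column] += 1
        # Build a hashable fingerprint of the row to spot exact duplicates.
        seen[tuple(sorted((k, str(v)) for k, v in row.items()))] += 1

    # Each fingerprint seen n times contributes n - 1 duplicate rows.
    duplicates = sum(count - 1 for count in seen.values() if count > 1)

    return {
        "rows": len(rows),
        "missing": dict(missing),
        "duplicate_rows": duplicates,
    }


sample = [
    {"region": "North", "sales": "10"},
    {"region": "North", "sales": "10"},  # exact duplicate of the first row
    {"region": "", "sales": None},       # both cells missing
]
report = profile_rows(sample)
```

A profile like this only surfaces the mechanical flaws. It cannot tell you whether the numbers were collected honestly, so the questions above still need a human to answer them.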

4. Don’t Let Data Teach You Everything

Before doing any sort of data analysis work, take a step back and ask yourself: what are the underlying mechanics of the domain you are analyzing? If you are not clear, learn about that first.

Data is a narrow, highly condensed form of language. It leaves out the nuances, interdependencies, and human irrationalities that fill the real world. It doesn't tell the full story. As I like to say, don't let data teach you everything. If you try to understand the world through data first, you will have a limited understanding of how the world works.

Why? Because in real life, data is not just structured data that exists as strings and integers. Data can also come from anecdotes, personal observation, practical experience, and a strong understanding of the underlying principles. This makes the world a lot more complex and richer than the numbers on a spreadsheet.

I've realized that sometimes the best way to understand a problem is to talk to people, observe real-world operations, and think beyond the numbers. Since data can be read differently by people with different self-interests, expertise, industry backgrounds, and biases, never rely on a single source of knowledge. No data should replace critical thinking and real-world experience.

5. Excel is King

In one of my internships (before the water audit project), I saw people struggling with Excel. They were slow, inefficient, and making a lot of manual errors. That experience pushed me to take an Excel course on Coursera (I highly recommend it, by the way). What surprised me is that I have used most of what I learnt from that course in almost every data analysis project I have encountered since.

Yes, tools like SQL, Power BI and Python are great, but Excel is still the go-to tool for business. It’s fast, flexible, and covers 90% of what you need in day-to-day data work.

However, this doesn't mean that you should not learn other software. Excel has its limitations (like poor processing power and a lack of built-in auditability), which is why it is wise to expand the number of tools in your toolbox over time.

That being said, you should learn Excel deeply before hopping off to learn the others. Excel is the most accessible data analysis tool and programming language in the world. And as this blog astutely pointed out, Excel is the inspiration for much of the data analysis software that came after it. Mastering Excel allows you to master all the others more quickly.


6. Don’t Overcomplicate—Data Analysis is Simpler Than You Think

In my current internship, the company marketed itself as an AI company that predicts solar energy generation and cost savings. That sounded exciting, and it’s part of the reason I applied.

But when I joined, my seniors told me the AI software had flopped. The AI software that was marketed on the internet was too expensive to build to the desired performance. The founder later realized that they could build the same software with OpenStreetMap. Even so, when I looked at the database and workflow myself, I realized that their cost saving calculations didn't even use Python libraries, which could have improved the predictive performance dramatically.

Which leads to the point:

“An idiot admires complexity,
a genius admires simplicity”

Terry A. Davis

Not to throw too much shade on my current company (it turned out that the calculator was a good lead magnet, so the effort was not entirely wasted), but it is still important to understand that most data analysis problems don't need AI. You don't need a chainsaw to cut a carrot just because the world is hyped about chainsaws. A highly productive data analyst knows the humble power of a kitchen knife and uses it when it is the right tool.

Many people think data analysis is all about complex data models and machine learning, but most of the time, getting high-quality data, using simple Excel tools, or just going outside to see how the real world works is all you need. Don't obsess over learning too many advanced tools and complicated data transformation methods. The technical tools you use in data analysis can only help you so much.

Don’t fall into the trap of thinking complexity = better results. That is how stupid people think.

Final Thoughts

In conclusion, data analysis isn't about using the fanciest tools, chasing AI trends, or making complicated transformation models. It's about understanding the real world behind the data, asking the right questions, and getting the right source of data. The tools exist only to serve those goals.

