Part 1: Why Data Collection is the Foundation for Building Successful Systems
You've organised your files. You've set up permissions. You can find anything in seconds. But none of that matters if the data inside those files is a mess.
Most teams collect data. Few collect it in a way that's actually useful.
This post is about understanding the difference and why it matters more now than ever before.
The problem with how we've been doing it
Open any shared spreadsheet in your organisation. Look at how data is entered.
You'll probably find:
- Dates written as “15th Jan”, “15/01/24”, “January 15”, and “2024-01-15” all in the same column.
- Names sometimes as “John Smith”, sometimes as “Smith, John”, sometimes as just “John”.
- Amounts with currency symbols in some rows and without in others.
- Empty rows where someone deleted data. Half-filled rows where someone got interrupted.
- Comments stuffed into cells meant for numbers.
This happens because we've been collecting data for humans to read. Someone opens the sheet, scans through it, and makes sense of it. The human brain is excellent at interpreting messy data.
But that's not what we need any more.
Data needs to be readable by programs, not just people
When data is consistent, programs can work with it. When it's inconsistent, they break.
Think about a simple task: calculating total sales for January. If your dates are formatted ten different ways, a program can't filter by month without extensive cleaning first. If your amounts sometimes include currency symbols and sometimes don't, calculations fail.
Now multiply this across every report, every analysis, every automated process you want to build.
Messy data doesn't just slow you down. It stops you entirely.
AI is coming for your spreadsheets
This isn't a problem for the future. It's a problem for right now.
AI agents are already being built to learn from company data, access systems, and process information automatically. Within the next few years, you'll likely have AI tools that can:
- Pull insights from your spreadsheets without you asking.
- Generate reports by reading your data directly.
- Automate workflows based on what's in your systems.
- Answer questions by analysing your files.
But here's the catch: AI is only as good as the data it reads.
Feed it messy, inconsistent data, and you get wrong answers, failed automations, and wasted time fixing errors. Feed it clean, structured data, and you get reliable insights and processes that actually work.
The organisations that prepare their data now will have a significant advantage. Those that won't will spend years cleaning up before they can benefit.
What's next?
You understand why clean data matters. But how did we get into this mess in the first place? In Part 2, we'll look at exactly why spreadsheets fail at data collection and what to do about it.