Part 2: Why Spreadsheets Fail at Data Collection and What to Do About It
In Part 1, we talked about why data needs to be readable by programs, not just people. Now let's look at why the way most teams collect data is broken and how to fix it.
Why spreadsheets fail at data collection
Spreadsheets are flexible. That's their strength and their weakness.
When you let people type directly into a spreadsheet, you're trusting them to:
- Use the exact format you need.
- Spell things correctly every time.
- Only enter valid options.
- Not accidentally delete someone else's data.
- Not modify historical records they shouldn't touch.
This trust is misplaced. Not because people are careless, but because people are human.
Someone is in a hurry and types "Jan" instead of "January". Someone copies a row and forgets to update one cell. Someone thinks they're helping by adding a note in a data column. Someone's finger slips and deletes a row without noticing.
Every one of these small mistakes compounds. After six months, your data is unreliable. After a year, it's unusable for anything automated.
The four problems with manual data entry
1. Inconsistency
Ten people will format the same information ten different ways. Without strict rules enforced by the system, consistency is impossible.
2. Invalid entries
When someone can type anything, they will type anything. Typos, wrong values, text in number fields, numbers in date fields. The possibilities for error are endless.
3. Unwanted modifications
Spreadsheets don't protect historical data well. Someone can edit a record from three months ago, and you might never know. Someone can delete rows, and unless you're checking version history, the data is simply gone.
4. No validation
A spreadsheet accepts whatever you give it. It won't stop you from entering a date that doesn't exist, an email address without an @ symbol, or a negative number where only positives make sense.
The solution: Controlled data entry
The answer isn't to train people better or write more documentation. People will still make mistakes.
The answer is to control how data enters your system in the first place.
Instead of letting people type into a spreadsheet directly, you give them a form. The form:
- Forces specific formats. Dates are picked from a calendar, not typed. Options are selected from a list, not invented.
- Validates entries. Required fields must be filled. Email fields must contain valid emails. Numbers must be within acceptable ranges.
- Prevents modification. Once data is submitted, it goes into the sheet. The person who submitted it can't go back and change it without proper access.
- Ensures consistency. Everyone uses the same fields, the same options, the same structure. No variation.
The data that reaches your spreadsheet is clean from the start. No cleaning required. Ready for programs to read. Ready for AI to process.
Google Forms solves this
Google Forms is the simplest way to implement controlled data entry.
You create a form with the fields you need. You define what type of data each field accepts. You set which fields are required. You specify the options for dropdown lists and multiple choice questions.
When someone fills out the form, their responses go directly into a Google Sheet. Every response follows the same structure. Every field contains the type of data it's supposed to contain.
No one can type directly into the sheet. No one can modify historical records through the form. No one can enter invalid data because the form won't let them.
It's not a perfect solution for every situation, but for most data collection needs, it's exactly what you need.
What changes when you get this right
When your data collection is clean, everything downstream becomes easier.
Reports generate themselves. When data is consistent, building reports is straightforward. Filter by date, sum the amounts, group by category. No cleaning, no fixing, no manual adjustments.
Automations actually work. Want to send an email when a certain condition is met? Want to move data between systems? Want to trigger workflows based on submissions? All possible when data is reliable.
Analysis becomes meaningful. You can trust the numbers. Trends are real trends, not artefacts of inconsistent entry. Comparisons are valid because like is compared with like.
AI can help you. When you eventually connect AI tools to your data, they'll work correctly from day one. No garbage in, no garbage out.
The mindset shift
Stop thinking about data entry as something people do.
Start thinking about data entry as something the system controls.
Your job is to design the system. Define what data you need. Define what format it should be in. Define what values are acceptable. Then build a form that enforces all of this automatically.
The people entering data don't need to think about formats or consistency. The form handles it. They just answer the questions.
This shift feels restrictive at first. People are used to the freedom of typing whatever they want. But that freedom is exactly what creates the mess.
Constraints aren't limitations. They're what make reliable systems possible.
What's next?
You understand why data collection matters and why forms are the solution. But before you create a single form, you need to answer two questions: What data is actually worth collecting? And how do you get people to collect it consistently? Get these wrong and even the best-designed form will fail. In the next post, we'll tackle the first question.