Data Engineering·5 minutes read

How Data Warehouses Were Created?

In short, because Spreadsheets Spiraled Outta Control

Linas Kapočius

Solutions Architect at Corgineering.com

April 13, 2025

Spreadsheets like VisiCalc (1979) and Lotus 1-2-3 (1983) were hailed as miracle tools. No more waiting for the IT guy to print reports, suddenly, everyone could be an analyst. And they did. A little too hard. Ever tried to find a specific number in a spreadsheet with 47 tabs, each named something like final_FINAL_v3_THISONE.xlsx? Yeah, me too, so you can get the idea.

It Wasn’t the Spreadsheets (At First)

The original data chaos actually started before spreadsheets were common. In the pre Enterprise Resource Planning days, most business systems were siloed—HR, finance, sales, you name it—all running on their own. To report on anything meaningful, you had to extract data from each system, often manually. These extracts were pulled at different times, using different rules, and then stitched together. The result? Data quality issues. And to make matters worse, people were running these reports directly against transactional databases—systems that were supposed to be optimized for speed and reliability, not analytics. The reporting load slow them down considerably.

This wasn’t just annoying—it was a real operational risk. Businesses were making decisions based on inconsistent, delayed, or outright incorrect data.

Birth of the Data Warehouse

The problem was so painful for the businesses, so around the late 1980s, a few forward-thinking folks—most famously Bill Inmon—proposed a better way: a data warehouse.

A centralized system where you could bring together data from all those wild systems (CRM, ERP, sales tools, whatever) and structure it in a way that made sense. Think of it like this: your company is a library, but instead of all the books being filed under “M” for “Maybe relevant,” you finally hire a librarian. That librarian (the data warehouse) organizes everything so you can actually find stuff.

Then along came Ralph Kimball, who said, “Cool idea, Bill, but let’s also make this data easier for business folks to use.” He pushed the idea of dimensional modeling—facts, dimensions, star schemas. Basically, he turned the warehouse into something mere mortals (read: non-engineers) could query without summoning the SQL gods.

When Everyone's an Analyst and No One's Aligned

By the late ’00s, every department had its own spreadsheet empire. That made things even worse. Finance had one version of “the truth,” Sales had another, and Marketing were inventing their own metrics. People would walk into meetings with totally different numbers for the same KPI. The spreadsheet party had turned into a data chaos rave. There was no lineage, no source of truth—just lots of tab-switching and passive-aggressive email threads. It wasn’t just annoying—it was a risk. Businesses were making big calls on bad data.

Was the First Data Warehouse... a Database?

Yes—but not just a database.

A data warehouse used a database under the hood, but it wasn’t the same thing as your everyday transactional database (like the one your app uses to store users and passwords). It was designed for a very different purpose.

OLTP vs. OLAP

In the world of databases, there are two big styles:

OLTP (Online Transaction Processing): Fast reads/writes for lots of tiny operations. Think: your banking app logging every coffee you shouldn't have bought.
OLAP (Online Analytical Processing): Optimized for big, slow, juicy questions like “What were our top five products in Q3 by region?”

The first data warehouses took existing relational databases (usually something like IBM DB2, Teradata, or Oracle) and tuned them to do OLAP-style querying. But it wasn’t just the tech—it was the design philosophy that changed.

Bill Inmon's definition, which still kinda holds up, was:

"A subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions."

In plain English:

Subject-oriented: Grouped by topics like customers, sales, products—not raw transactions.
Integrated: Pulled from multiple source systems (not just one app).
Nonvolatile: Doesn’t change once it’s in (no updates, just append).
Time-variant: Stores historical data, not just the latest state.

So while the first data warehouses ran on databases, they weren’t just repurposed MySQL tables. They were structured, modeled, and tuned specifically for analytics.

From Basement Servers to Cloud Beasts

Fast forward to today: data warehouses have gone full glow-up.

Instead of managing physical servers in your office’s weirdest closet, now we’ve got Snowflake, BigQuery, and Redshift—cloud-native beasts that scale like mad and can crunch petabytes without breaking a sweat.

They’re not just for storing stuff anymore—they power dashboards, real-time analytics, machine learning pipelines, and late-night debugging sessions fueled by sheer spite and caffeine.

Also, let’s be honest: it’s very satisfying to run a SELECT * (please, don't do this if you value money and/or resources 😅) on billions of rows and get results in seconds. Magic. Literal wizardry.

References:

Inmon, W. H. (1992). Building the Data Warehouse. Wiley.
Kimball, R., & Ross, M. (2013). The Data Warehouse Toolkit. Wiley.
Foote, K. D. (2023). “A Brief History of the Data Warehouse.” DATAVERSITY.

This article is part of our Data Engineering series. Check out our other articles.