Big Data – Blog by YY
Database v.s. Data warehouse
Questions being popped out in BIG DATA field might be, “what is the difference between database and data warehouse,」 or “if I already had databases, do I really need a data warehouse? “
Actually, these questions are fair ones. The reason is the two may appear to be similar; however, there are important differences in itself, like the architecture, technology or even the usage patterns.
In this post, I will provide simpler and generic explanation between the two.
As usual, which one to choose is totally based on the need from organizations/companies/clients.
“Databases,” as a repository of information used as a backup data storage for applications/services, are usually structured, meaning that data being ingested into must be in accordance with database schema.
“Data warehouse,” a particular type of database, emphasizing on specific applications: storing, filtering, retrieving and analyzing huge volumes of information. This application imposes a different set of constraints and leads to a completely different architecture and usage patterns. Besides, some people might contribute Data warehouse as a collection of Databases.
The following is the differences in between.
- Generally speaking, in the same organizations, Data warehouses are much bigger than the Databases usually store active data, what I mean “active data” is the so-called “hot data”; for performance reasons, older records can be purged and moved to backups, or the so-called “cold data,”. Data warehouses are used to store much older historical records; it’s also common to use data warehouses to store additional information that is bought or captured elsewhere to complement the information that is generated and stored by the internal database system.
- Access patterns are different. Database records are often retrieved and updated one by one; data warehouses are nearly always accessed by reporting engines that work on entire datasets at a time to generate aggregates and other analytical information. Hence, we can say, Data warehouses, are mainly for analytics purposes and Databases, are for normal storage usage. Databases are frequently updated, sometimes only a field or record at a time; data warehouses aren’t updated very frequently.
Databases are usually normalized, meaning a lot of work is done to guarantee that there’s unique copy of any given information, which is important for performance and consistency reasons. But it’s common to store different versions of the same information on a data warehouse, using different structures to compose and access the information. In other words, data warehouses are more irregular, partly by design, as they need to be able to work with different sources of information.