-->

What is a data warehouse?

2020-02-19 03:38发布

问题:

I was asked by a customer what the term "data warehouse" really means.

I thought about ETL, details of the data model, differences to NoSQL, Clouds, 'normal' DBMS, MDM (Master Data Management) etc. but wasn't able to describe the term in a few words to him... (In fact I did some talking and left him un-illuminated.)

How can "data warehouse" described in 1-3 (or a bit more) sentences?

回答1:

For non technical guys the best is to describe it as "Huge ammount of data stored in a specialized computer system. Data is usually related to some specific domain and whole system is designed to be fast and optimized for some special tasks. Data stored in data warehouses are mostly used for analysis or in decision making processes."

Not sure if this is enough:) There is a lot of references to this subject in the Internet but if somebody asked me for a quick definition I would use something similiar to that I wrote above.



回答2:

From wiki:

A data warehouse is a repository of an organization's electronically stored data, designed to facilitate reporting and analysis 1.

This definition of the data warehouse focuses on data storage. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata.



回答3:

At least theoretically, the idea of a data warehouse is to provide a consolidated view of data from a variety of existing systems, which are generally considered impractical to rewrite to consolidate the data directly. Therefore, the data warehouse collects data from those existing systems, and provides (at least the illusion of) all the data being in one place, so it can be queried in one way.

The primary intent is (usually) to allow correlation between data from existing systems. For example you can compare how much time your sales person spent with customer X (that's stored in one system) to how much customer X bought (stored in a second system) and how happy customer X is with what they're receiving (stored, of course, in a third system).

From a practical viewpoint, it often means the customer's ideas are somewhere between poorly defined and completely insane. The cost and schedule are next to impossible to even guess at, and a solid estimate is clearly impossible. Delivering what he really wants is almost certainly impossible, and figuring out something that will be useful is going to take enough time and work, that your first step is to make what you're doing sound sufficiently technical that he won't have a heart attack when he gets an inkling of the cost and/or schedule.



回答4:

A data warehouse is an attempt to make disparate systems appear to be homogeneous, regardless to the underlying technology or storage mechanism.

You could get into the 'why' of data warehousing, but thats a different question.



回答5:

Wow, I was doing some research. This is a really good answer I came across:

Data warehousing ... is the reproduced version of data transactions that are particularly structuralized and built for inquiry, analysis and reporting. In a very simple definition, the term data warehousing refers to the process of systematically collecting data that are stored in an organized manner so they can be accessed and retrieved for future reporting and document analysis

It's from "Data Warehouse 100 Success Secrets" by Richard Martin



回答6:

A data warehouse is a database, data load and reporting system designed to aggregate data from multiple sources and present it in a manner which is easy to extract and report on. From a practical standpoint, the benefits of a successful data warehouse project are:

  • Statistical and financial reporting - data warehouses make it easy to work with data in aggregate and get useful analysis from it, particularly where you have 65537 or more rows of data.

  • Data safety - the data is well behaved and doesn't have traps for young players. Ad-hoc reporting systems can be used by inexperienced users with a low risk of producing invalid results in the reports without noticing.

  • Transparency - the business can see and identify issues with data in the underlying systems. A data warehouse can be a good tool to drive data quality work.

  • Empowerment - end-user reporting tools should support the majority of management information requirements with only a minimal set requiring a bespoke report to be developed by a technical specialist.



回答7:

KISS...

A data warehouse is a repository of data related to a given organisation and its activities. This data will allow analysis and reporting of the performance of the organisation across various pertinent dimensions such as time, structure, streams of activity. These dimensions can be combined and results aggregated through relevant hierarchies.



回答8:

From a practical viewpoint: businesses change, environments change, what was an important question yesterday, may not be today and most probably won't be tomorrow. This is especially true when dealing with questions at the CEO level.

If you can't foresee what the questions will be, your only option is to provide the means to answer any question quickly. That's what data warehouses attempt or purport to do. Where the data comes from, and from how many disparate systems, is immaterial.

Many data warehouses fail in the "answer any question quickly" because their technology requires you to turn the available raw data "inside out" (making cubes) to ensure the "quickly". And defining those cubes restricts the variety of questions that can be answered.



回答9:

A data warehouse is a relational database that is designed for query and business analysis rather than for transaction processing.It contains historical data derived from transaction data. This historical data is used by the business analysts to understand about the business in detail.

A data warehouse should have the following characteristics:

  1. Subject oriented:

A data warehouse helps in analyzing the data. For example, to know about a company's sales, a data warehouse needs to build on sales data. Using this data warehouse we can find the last year sales. This ability to define a data warehouse by subject (sales) makes it a subject oriented.

  1. Integrated:

Bringing data from different sources and putting them in to a consistent format. This includes resolving the units of measures, naming conflicts etc.

  1. Non volatile:

Once the data enters into the data warehouse, the data should not be updated.

  1. Time variant:

To analyze the business, analysts need large amounts of data. So, the data warehouse should contain historical data.



回答10:

From what I know Data warehouse is nothing but a relational database which is designed for query and analysis. It usually contains history data derived from transaction data.

As per William Inmon, Data Warehouse definition is :

Data Warehouse is a subject-oriented, integrated, nonvolatile and time-variant collection of data in support of management's decisions.

And the above definition is logical and perfect if we think rationally and can be understood from here



回答11:

In simple terms...

A data warehouse is a way to control the items in a real warehouse that contains control location, stock, movement, reporting, auditing, anything about "real items" at a "real warehouse".

I hope it's more simple.