data warehouse

A strategic collection of data that supports all types of decision-making
A data warehouse (abbreviated DW or DWH) is a strategic collection of data that supports decision-making at all levels of an enterprise. It is a single data store created for analytical reporting and decision support. For enterprises that need business intelligence, it provides guidance for improving business processes and for monitoring and controlling time, cost, and quality. [1]
Chinese name: data warehouse
Foreign name: Data Warehouse
Abbreviation: DW
Proposed by: Bill Inmon

development history

A data warehouse is a structured data environment that serves as the data source for decision support systems (DSS) and online analytical applications. Data warehousing studies and solves the problem of obtaining information from databases. A data warehouse is subject-oriented, integrated, relatively stable (non-volatile), and time-variant.
The data warehouse was proposed in 1990 by Bill Inmon, known as the father of data warehousing. Its main function is to take the large volume of data that an organization's online transaction processing (OLTP) systems have accumulated over the years and, using the data storage architecture particular to data warehouse theory, systematically analyze and organize it. This facilitates analysis methods such as online analytical processing (OLAP) and data mining, and in turn supports the creation of decision support systems (DSS) and executive information systems (EIS). It helps decision-makers quickly and effectively extract valuable information from large amounts of data, make decisions, respond quickly to changes in the external environment, and build business intelligence (BI).
The definition that Bill Inmon put forward in his book "Building the Data Warehouse", published in 1991, is widely accepted: a data warehouse is a subject-oriented, integrated, relatively stable (non-volatile), and time-variant collection of data used to support management decision-making.

characteristic

1. The data warehouse is subject-oriented. Data in an operational database is organized around transaction-processing tasks, whereas data in the data warehouse is organized by subject area. A subject is a key aspect that users care about when making decisions with the data warehouse. One subject is usually related to multiple operational information systems.
2. The data warehouse is integrated. Its data comes from scattered operational data. The required data is extracted from the original sources, processed, and integrated, and enters the data warehouse only after it has been unified.
The data in the data warehouse is extracted from the original scattered databases and then systematically processed, summarized, and organized. Inconsistencies in the source data must be eliminated so that the information in the data warehouse is consistent, global information about the entire enterprise.
The data in the data warehouse is mainly used for enterprise decision analysis, and the operations involved are mainly queries. Once data enters the data warehouse, it is generally retained for a long time. In other words, the data warehouse sees many query operations but few modifications and deletions, and usually needs only regular loading and refreshing.
The data in the data warehouse usually contains historical information. The system records the enterprise's information from a certain point in the past (such as the point when the data warehouse was first put into use) through each stage up to the present. With this information, the development history and future trends of the enterprise can be quantitatively analyzed and predicted.
3. The data warehouse is non-volatile (not updated in place). It mainly provides data for decision analysis, and the operations involved are mainly queries.
4. The data warehouse is time-variant. Traditional relational database systems are better suited to processing formatted data and meet the needs of commercial transaction processing well. In the data warehouse, stable data is stored in read-only form, and a stored record does not change over time.
5. Summarized. Operational data is mapped into a format usable for decision-making.
6. Large capacity. Collections of time-series data are usually very large.
7. Denormalized. Data warehouse data can be, and often is, redundant.
8. Metadata. Data that describes the data is stored.
9. Data sources. Data comes from internal and external non-integrated operational systems.
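Three of the characteristics listed above (integrated, non-volatile, time-variant) can be illustrated with a minimal Python sketch. The two source systems, their field names, and the gender codes are hypothetical and chosen only for illustration; a real warehouse would use a database rather than an in-memory list.

```python
from datetime import date

# Hypothetical extracts from two operational systems that encode the same
# facts differently (field names and codes are illustrative assumptions).
crm_rows = [{"cust_id": 1, "sex": "M"}, {"cust_id": 2, "sex": "F"}]
billing_rows = [{"customer": 3, "gender": "male"}]

GENDER_CODES = {"M": "male", "F": "female", "male": "male", "female": "female"}


def integrate(crm, billing):
    """Integrated: map both sources onto one consistent representation."""
    unified = [
        {"customer_id": r["cust_id"], "gender": GENDER_CODES[r["sex"]]} for r in crm
    ]
    unified += [
        {"customer_id": r["customer"], "gender": GENDER_CODES[r["gender"]]}
        for r in billing
    ]
    return unified


class Warehouse:
    """Non-volatile: rows are appended with a snapshot date and never edited."""

    def __init__(self):
        self._rows = []

    def load(self, rows, snapshot_date):
        for row in rows:
            self._rows.append({**row, "snapshot_date": snapshot_date})

    def query(self, as_of):
        """Time-variant: return everything loaded up to a given date."""
        return [r for r in self._rows if r["snapshot_date"] <= as_of]


warehouse = Warehouse()
warehouse.load(integrate(crm_rows, billing_rows), date(2024, 1, 1))
warehouse.load(integrate([{"cust_id": 4, "sex": "F"}], []), date(2024, 2, 1))
```

Calling `warehouse.query(date(2024, 1, 15))` returns only the rows from the first load, while a later date also returns the second load; earlier rows are never modified.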
A data warehouse is not simply a "large database". It arises, once a large number of databases already exist, from the need to further exploit data resources and to support decision-making. The purpose of building a data warehouse solution is to provide a foundation for front-end query and analysis. Because of its greater redundancy, it also requires a large amount of storage. To better serve front-end applications, data warehouses often have the following characteristics:
1. Sufficiently high efficiency. The analysis data in a data warehouse is generally organized by day, week, month, quarter, year, and so on. Data on a daily cycle places the highest demands on efficiency: customers must be able to see the analysis of yesterday's data within 24 hours, or even 12 hours. Because some enterprises generate a large amount of data every day, poorly designed data warehouses often run into problems. Delivering data after a delay of 1-3 days is clearly not acceptable.
2. Data quality. The information provided by the data warehouse must be accurate. However, the data warehouse process usually consists of multiple steps, including data cleaning, loading, querying, and presentation, and a complex architecture has even more layers. Dirty data in a data source or insufficiently rigorous code can therefore distort the data. When customers see wrong information, they may make wrong analyses and decisions, causing losses rather than benefits.
3. Scalability. The architecture of some large data warehouse systems is complex because it takes into account the scalability needed over the next 3-5 years. This way, the system will not need a costly rebuild in the near future and can operate stably. Scalability is mainly reflected in the soundness of the data modeling: adding intermediate layers to the data warehouse solution gives the massive data flow enough buffering, so that the system does not stop running when the data volume grows much larger.
As the introduction above shows, data warehouse technology can bring to life the data that enterprises have accumulated over many years. It not only manages this massive data for enterprises but also mines its potential value, and has thus become one of the highlights of the operation and maintenance systems of communication enterprises.
Broadly speaking, a decision support system based on a data warehouse consists of three components: data warehouse technology, online analytical processing technology, and data mining technology. Data warehouse technology is the core of the system. The following articles in this series focus on data warehouse technology, introducing the main technologies of the modern data warehouse and the main steps of data processing, and discussing how to use these technologies in a communication operation and maintenance system.
4. Subject-oriented
Data in an operational database is organized around transaction-processing tasks, and the individual business systems are separate from one another. Data in the data warehouse, by contrast, is organized by subject area. "Subject" is an abstract concept that stands in contrast to the application orientation of traditional databases. It is an abstraction that synthesizes, classifies, analyzes, and uses the data in enterprise information systems at a higher level. Each subject corresponds to a macro-level field of analysis. The data warehouse excludes data that is useless for decision-making and provides a concise view of a specific subject.

purpose

In an environment of information technology and data intelligence, the data warehouse provides many cost-effective computing resources across software and hardware, Internet and enterprise intranet solutions, and databases. It can store a large amount of data for analysis and allows the use of multiple data access technologies.
Open system technology makes the cost of analyzing large amounts of data more reasonable, and hardware solutions have matured. The main technologies used in data warehouse applications are as follows:
Parallelism
The computing hardware environment, operating system environment, database management system, all related database operations, query tools and techniques, applications, and other areas can all benefit from the latest advances in parallelism.
Partitioning
Partitioning makes it easier to support large tables and indexes, while improving data management and query performance.
Data compression
Data compression reduces the cost of the disk systems usually required to store large amounts of data in a data warehouse environment. Newer data compression techniques have also eliminated the negative impact of compressed data on query performance. [1]
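As a rough illustration of partitioning and compression, the following Python sketch groups hypothetical fact rows by month so that a query reads only the matching partition, and compresses a repetitive block of data with the standard `zlib` module. The row layout and the sample data are assumptions made for this example and do not reflect any particular product.

```python
import zlib
from collections import defaultdict

# Hypothetical fact rows: (month, amount).
rows = [("2024-01", 10.0), ("2024-01", 5.0), ("2024-02", 7.5), ("2024-03", 2.5)]


def partition_by_month(rows):
    """Group rows into one partition per month."""
    partitions = defaultdict(list)
    for month, amount in rows:
        partitions[month].append(amount)
    return dict(partitions)


def total_for_month(partitions, month):
    """Read only the matching partition instead of scanning every row."""
    return sum(partitions.get(month, []))


partitions = partition_by_month(rows)
january_total = total_for_month(partitions, "2024-01")

# Compression: repetitive warehouse data tends to compress well.
raw = ("2024-01,store_001,widget,10.00\n" * 1000).encode("utf-8")
compressed = zlib.compress(raw)
restored = zlib.decompress(compressed)
```

Here `len(compressed)` is far smaller than `len(raw)`, and `restored` is byte-for-byte identical to the original, which is the trade-off the text describes: less disk space in exchange for decompression work at query time.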

technological development

From database to data warehouse
Enterprise data processing can be roughly divided into two categories. The first is operational processing, also known as online transaction processing (OLTP): the routine online operation of the database for specific business tasks, which usually queries and modifies a few records at a time. The second is analytical processing, which generally analyzes historical data on certain subjects to support management decisions.
They have different characteristics, mainly reflected in the following aspects.
1. Processing performance
Daily business involves frequent, simple data access, so the performance requirements for operational processing are relatively high, and the database must respond within a very short time.
2. Data integration
The operational processing of an enterprise is usually decentralized, and the application-oriented nature of traditional databases makes data integration difficult.
3. Data update
Operational processing consists mainly of atomic transactions, and data is updated frequently, which requires concurrency control and recovery mechanisms.
4. Data timeliness
Operational processing mainly serves daily business operations.
5. Data aggregation
Operational processing systems usually provide only simple statistical functions.
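The contrast between the two kinds of processing can be sketched with Python's built-in `sqlite3` module. The `orders` table and its contents are hypothetical. The first statement is a short transaction that touches a single record, as in operational processing; the second is a read-only aggregation over historical data, as in analytical processing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, year, amount) VALUES (?, ?, ?)",
    [
        ("north", 2023, 100.0),
        ("north", 2024, 150.0),
        ("south", 2023, 80.0),
        ("south", 2024, 120.0),
    ],
)

# Operational (OLTP-style) processing: a short transaction on one record.
with conn:
    conn.execute("UPDATE orders SET amount = amount + 10 WHERE id = ?", (1,))

# Analytical processing: read-only aggregation across the historical data.
yearly_totals = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
```

In practice the two workloads usually run on separate systems, so that long analytical scans do not slow down the short transactions that daily business depends on.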
Databases are widely used in information technology. Almost every part of society maintains databases holding data closely related to our lives. As a branch of database technology, the concept of the data warehouse emerged much more recently than that of the database. In the early 1990s, the well-known American information engineering expert Dr. William H. Inmon put forward a statement of the data warehouse concept: "A data warehouse is usually a subject-oriented, integrated, time-variant collection of data, in which the information itself is relatively stable. It is used to support the management decision-making process."
A subject here is a key aspect that users care about when making decisions with the data warehouse, such as revenue, customers, or sales channels. Subject-oriented means that the information in the data warehouse is organized by subject rather than by business function, as it is in business support systems.
Integrated means that the information in the data warehouse is not simply extracted from the various business systems but goes through a series of processing, organizing, and summarizing steps. The information in the data warehouse is therefore consistent, global information about the entire enterprise.
Time-variant means that the information in the data warehouse does not reflect only the current state of the enterprise but records information from a certain point in the past through each stage up to the present.
database security
Computer attacks, insider violations, and various regulatory requirements are prompting organizations to seek new ways to protect their enterprise and customer data in commercial database systems.
You can take eight steps to protect your data warehouse and achieve compliance with key regulations.
1. Discovery
Use discovery tools to discover changes in sensitive data.
2. Vulnerability and configuration evaluation
Evaluate database configurations to ensure that they do not have security vulnerabilities. This includes verifying the way the database is installed on the operating system (such as checking the file permissions of the database configuration file and executable program), and verifying the internal configuration options of the database itself (such as how many login failures lead to account locking, or what permissions are assigned to key tables).
3. Strengthen protection
Based on the vulnerability assessment, remove all functions and options that are not used.
4. Change audit
Change audit tools reinforce the hardened security configuration. These tools compare configuration snapshots (at both the operating system and database levels) and issue warnings immediately when a change occurs that may affect database security.
5. Database activity monitoring (DAM)
Monitor database activity in real time to detect intrusion and misuse promptly and limit information exposure.
6. Audit
Secure, non-repudiable audit trails must be generated and maintained for all database activities that affect security status, data integrity, or the viewing of sensitive data.
7. Authentication, access control and authorization management
Users must be authenticated to ensure that each user is fully accountable, and access to data must be restricted by managing privileges.
8. Encryption
Encryption renders sensitive data unreadable, so that an attacker cannot gain unauthorized access to the data from outside the database.
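Step 4 (change audit) can be sketched as a comparison of two configuration snapshots. The setting names and values below are hypothetical and do not correspond to any specific database product; the sketch shows only the idea of warning on every difference from a trusted baseline.

```python
def diff_snapshots(baseline, current):
    """Return a warning line for every setting that differs between snapshots."""
    warnings = []
    for key in sorted(set(baseline) | set(current)):
        old, new = baseline.get(key), current.get(key)
        if old != new:
            warnings.append(f"{key}: {old!r} -> {new!r}")
    return warnings


# Hypothetical snapshots covering OS-level and database-level settings.
baseline = {
    "max_failed_logins": 3,
    "config_file_mode": "0600",
    "public_can_read_salaries": False,
}
current = {
    "max_failed_logins": 10,
    "config_file_mode": "0600",
    "public_can_read_salaries": True,
}

alerts = diff_snapshots(baseline, current)
```

With these inputs, `alerts` contains one line for the relaxed lockout threshold and one for the widened table permission, while the unchanged file mode produces no warning.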
How to respond to monitoring needs
Enterprises are paying increasing attention to data as their core asset. Illegal access, data tampering, and data theft can cause huge losses. As the core carrier of data, the database makes security all the more important.
When dealing with database security, enterprises often face the following major challenges: databases are maliciously accessed, attacked, or even robbed of data, and these malicious operations are not detected in time; and the details of how data users access the database are unknown, so data security cannot be managed reliably.
Information security also raises audit issues. Compliance and audit requirements around the world are becoming increasingly strict, and penalties for failing to meet them are not uncommon. The mandatory requirements of the US Sarbanes-Oxley Act led to the delisting of Brilliance China Automotive Holdings Limited, China's first overseas listed company, from the New York Stock Exchange on July 5, 2007.
The Chinese government has also made considerable efforts to strengthen compliance and audit requirements for information security. For example, to strengthen the information technology risk management of commercial banks, the CBRC issued the Guidelines on Information Technology Risk Management of Commercial Banks. Five ministries and commissions (the Ministry of Finance, the CSRC, the National Audit Office, the CBRC, and the CIRC) jointly issued the "Basic Norms for Enterprise Internal Control", known as the "China Sarbanes-Oxley Act" (hereinafter referred to as the "C-SOX Act").
Faced with compliance and audit requirements, enterprises often encounter the following challenges:
·Failure to achieve continuous audit
Audits mainly examine database and application system logs, which are very large. Database administrators (DBAs) and information security auditors can only analyze them after the fact, and the analysis takes a long time, so continuous audit cannot be achieved.
·Audit is not standardized
The content and form of audits are determined mainly by the requirements of external auditors and by internal security management considerations. The quality of audit work depends largely on the experience and skills of the DBAs and information security auditors, so it cannot effectively become a company standard or reliably meet external audit requirements.
·The database administrator's rights and responsibilities are not fully separated, which undermines audit effectiveness
Both database management and the collection of raw audit data are actually performed by the DBA, which blurs the DBA's rights and responsibilities. The DBA cannot objectively audit his or her own work. Even when an information security auditor role exists, some of the evidence that role relies on comes from the DBA's preliminary audit, so the effectiveness and reliability of the audit are questionable.
·Incomplete audit
Manual audits face massive logs, so it is impossible to audit all data in detail, and the audit report may not achieve 100% visibility.
To meet enterprises' needs for information security, compliance, and audit, IBM launched the "CARS" enterprise information architecture, which addresses four aspects: "Compliance", "Availability", "Retention", and "Security". In addition, the IBM Guardium database security, compliance, audit, and monitoring solution specifically addresses and strengthens "compliance" and "information security".
Guardium's database security, compliance, audit, and monitoring solution is delivered as an integrated software and hardware appliance. It is intended to enhance database security, satisfy and simplify audit work, improve performance, and simplify installation and deployment. It aims to prevent damage to the database, malicious access, and data theft, and to help enterprises: determine where key sensitive data resides and who is using it; control access to data in the database and monitor privileged users; enforce security policies; check for weak points and vulnerabilities and prevent changes to database configuration; meet compliance and audit requirements; simplify and automate internal and external audit and compliance processes; improve operational efficiency; and reduce the complexity of managing security.

Main cases


Agrofert

Agrofert, an agricultural, food, and chemical group, found that as the enterprise grew rapidly, its subsidiaries were running more than 160 different systems. Unified reporting was difficult, and support and licensing costs were rising. Expanding the infrastructure every time a new system was purchased was clearly not a scalable strategy. Agrofert deployed SAP ERP applications as shared services for some of its subsidiaries, with the aim of gradually extending them to the entire enterprise. These applications are centrally managed on IBM Power Systems servers at two locations. The company migrated from a mixed database environment (including Oracle and Microsoft SQL Server) to IBM DB2 as its standard database, and deployed a centralized storage system for key business data. After the migration, the local systems were no longer needed, which greatly reduces management, support, and licensing costs. IBM DB2 reduces license fees, simplifies management, and reduces the need for employee training. Consolidated storage helps reduce costs, and IBM DB2 deep compression reduces overall storage requirements. The total cost is estimated to be reduced by 20%.

Disneyland

Disney earns one billion dollars in merchandise sales revenue every year, and building an ERP system to handle this information is extremely challenging.
The new centralized ERP system is designed to handle merchandise management, inventory management, and related business processes. Disney also wanted to balance financial and business intelligence (BI) reporting with business analysis systems, which meant building a new data warehouse. The products Disney used in this project include SAS analysis software and Teradata data warehouse technology. The new centralized ERP, data warehouse, and analysis systems are helping Disney better manage inventory, analyze sales, and forecast merchandise demand in specific areas. [2]

Structural design

A data warehouse has the power to change a business. It can help companies understand customer behavior, predict sales trends, and determine the profitability of a group of customers or products. However, implementing a data warehouse is a long and risky process. According to an online survey released by DM Review, 51% of respondents believe that the number one obstacle to creating a data warehouse is the lack of accurate data. The most important issue is that not all data can be updated in real time.
Six guiding principles can help enterprises implement a data warehouse plan quickly and evaluate their progress:
·Simplify requirements collection and design.
Companies often find it difficult to determine which data is important, which leaves them unable to use valuable unstructured information to drive key business processes. The organization should check whether IT managers have a deep understanding of the business plan and the information needed to support it. For example, where is the source data? What transformation is needed to make it available to critical applications?
·Support collaboration between business and IT users.
Incomplete, outdated, or inaccurate data leads to a lack of trusted information. Check whether the company has a business glossary that users can view, collaborate on, and adjust according to their shared business perspective.
·Avoid costly low-level errors and rework.
Clarify whether the company has an implementation strategy that includes a well-defined data model, and whether applications provide the information needed.
·Identify matching information and create a single view.
Multiple versions of the same fact can cause problems in managing users, products, and partnerships, and increase the risk of non-compliance.
·Transform and publish using the fastest and most scalable method.
Determine whether the company has an automated process that can use parallel processing and reuse the results of previous transformations. Can the company's systems deliver data to users and applications on demand and in a timely manner?
·Expand information accessibility through information services.
Clarify whether the enterprise can truly treat information as a shared asset. Can IT experts maintain these assets and make them available to authorized users? Can the information be delivered to the right place and context at the right time? [3]
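The principle of identifying matching information and creating a single view can be sketched as a simple record-matching step. This minimal example assumes that a normalized email address is a good enough match key; the record fields and source names are hypothetical, and real matching usually needs fuzzier rules.

```python
def normalize_key(record):
    """Build a match key from a trimmed, lower-cased email address."""
    return record["email"].strip().lower()


def single_view(records):
    """Merge records that share a match key into one view per customer."""
    merged = {}
    for record in records:
        key = normalize_key(record)
        view = merged.setdefault(key, {"email": key, "sources": []})
        view["sources"].append(record["source"])
        # Keep the first non-empty name seen for this customer.
        if record.get("name") and "name" not in view:
            view["name"] = record["name"]
    return list(merged.values())


# Hypothetical records for the same people held in different systems.
records = [
    {"source": "crm", "email": "Ann@Example.com ", "name": "Ann Lee"},
    {"source": "billing", "email": "ann@example.com", "name": ""},
    {"source": "support", "email": "bob@example.com", "name": "Bob Ray"},
]

customers = single_view(records)
```

The three input records collapse into two customers, and each merged view records which source systems contributed to it, so conflicting versions of the same fact can be traced back.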

Implementation mode

A data warehouse is a process rather than a project.
The data warehouse system is an information delivery platform. It obtains data from business processing systems, organizes it mainly using the star schema and the snowflake schema, and provides users with various means of acquiring information and knowledge from the data.
In terms of functional structure, a data warehouse system should include at least three parts: data acquisition, data storage, and data access.
An enterprise data warehouse is built on the enterprise's existing business systems and its large accumulation of business data. The data warehouse is not a static concept. Information plays its role and has meaning only when it is delivered in time to the users who need it, so that they can make decisions that improve business operations. The fundamental task of the data warehouse is to organize, summarize, and restructure information and provide it in time to the relevant management decision-makers. From an industry perspective, therefore, building a data warehouse is both a project and a process.
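The star schema mentioned above can be sketched with Python's built-in `sqlite3` module: one central fact table holds the measures, and each row points to the surrounding dimension tables. All table names, column names, and sample rows here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity INTEGER,
        revenue REAL
    );
    INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO dim_product VALUES (1, 'widget', 'hardware'), (2, 'manual', 'books');
    INSERT INTO fact_sales VALUES
        (20240101, 1, 2, 20.0), (20240101, 2, 1, 5.0), (20240201, 1, 3, 30.0);
    """
)

# A typical star-schema query: join the fact table to a dimension and aggregate.
revenue_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
```

A snowflake schema would normalize the dimensions further, for example by moving `category` out of `dim_product` into its own table that `dim_product` references.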

Architecture


data source

Data sources are the foundation of the data warehouse system and the origin of all its data. They usually include information internal and external to the enterprise. Internal information includes the various business processing data and document data stored in RDBMSs. External information includes laws and regulations, market information, competitor information, and so on.
Data storage and management
This is the core of the whole data warehouse system. The real key to a data warehouse is the storage and management of data. The way a data warehouse is organized and managed determines how it differs from a traditional database, and also determines how it presents data externally. Deciding which products and technologies to use to build the core of the data warehouse requires starting from the technical characteristics of the data warehouse. Data from existing business systems is extracted, cleaned, effectively integrated, and organized by subject. By the scope they cover, data warehouses can be divided into enterprise-level data warehouses and department-level data warehouses (usually called data marts).
OLAP server
The data needed for analysis is effectively integrated and organized according to a multidimensional model, so that it can be analyzed from multiple angles and at multiple levels and trends can be found. Implementations fall into three types: ROLAP (relational OLAP), MOLAP (multidimensional OLAP), and HOLAP (hybrid OLAP). In ROLAP, both base data and aggregate data are stored in an RDBMS. In MOLAP, both base data and aggregate data are stored in a multidimensional database. In HOLAP, base data is stored in the RDBMS and aggregate data is stored in a multidimensional database.
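The difference between aggregating at query time and storing aggregates in advance can be sketched in a few lines of Python. This is a simplified illustration and not how any particular OLAP server is implemented: `rolap_total` computes a total from base rows on demand, while `build_cube` precomputes every combination of the two hypothetical dimensions (region and year), in the spirit of a multidimensional store.

```python
from collections import defaultdict
from itertools import product

# Hypothetical base facts: (region, year, amount).
facts = [("north", 2023, 100.0), ("north", 2024, 150.0), ("south", 2023, 80.0)]

ALL = "*"  # marker meaning "aggregated over this dimension"


def build_cube(facts):
    """Precompute sums for every combination of (region or ALL, year or ALL)."""
    cube = defaultdict(float)
    for region, year, amount in facts:
        for r, y in product((region, ALL), (year, ALL)):
            cube[(r, y)] += amount
    return dict(cube)


def rolap_total(facts, region=ALL, year=ALL):
    """Aggregate from the base rows at query time."""
    return sum(
        amount
        for r, y, amount in facts
        if region in (ALL, r) and year in (ALL, y)
    )


cube = build_cube(facts)
north_all_years = cube[("north", ALL)]
```

Both paths give the same answers; the cube trades extra storage and load-time work for constant-time lookups. A hybrid arrangement would keep `facts` in a relational store and only the aggregated cells in the cube.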

Front end tools

Front-end tools mainly include various reporting tools, query tools, data analysis tools, data mining tools, and application development tools built on the data warehouse or data marts. Data analysis tools mainly work against the OLAP server, while reporting tools and data mining tools mainly work against the data warehouse.

form


Data extraction tool

A data extraction tool takes data out of its various storage forms, applies the necessary transformation and cleanup, and then stores it in the data warehouse. The ability to access many different kinds of data storage is the key capability of a data extraction tool, which should be able to generate COBOL programs, MVS job control language (JCL), UNIX scripts, and SQL statements to access different data. Data conversion includes: deleting data fields that are not meaningful for decision applications; converting to unified data names and definitions; calculating statistics and derived data; assigning default values to missing data; and unifying different data definition methods.
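The conversion steps listed above can be sketched as a small Python transformation function. The field names, rename map, and default values are hypothetical and would normally come from the warehouse's mapping rules.

```python
# Hypothetical conversion rules (illustrative assumptions, not a real mapping).
DROPPED = {"operator_terminal"}  # operational detail with no value for decisions
RENAMES = {"cust_no": "customer_id", "qty": "quantity", "area": "region"}
DEFAULTS = {"region": "unknown", "quantity": 0}


def transform(record):
    """Apply the conversion steps to one source record."""
    out = {}
    for name, value in record.items():
        if name in DROPPED:
            continue  # delete fields that are meaningless for decision support
        out[RENAMES.get(name, name)] = value  # unify data names
    for name, default in DEFAULTS.items():
        if out.get(name) is None:
            out[name] = default  # assign defaults to missing data
    out["line_total"] = out["quantity"] * out["unit_price"]  # derived data
    return out


source_rows = [
    {"cust_no": 7, "qty": 3, "unit_price": 2.5, "area": None, "operator_terminal": "T-12"},
    {"cust_no": 8, "qty": None, "unit_price": 4.0, "area": "east", "operator_terminal": "T-03"},
]

loaded = [transform(row) for row in source_rows]
```

After the transformation, every row uses the same field names, contains no operational-only fields, has no missing values in the defaulted fields, and carries a derived `line_total`.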

database

The database is the core of the entire data warehouse environment: the place where data is stored and the component that supports data retrieval. Compared with an operational database, its outstanding characteristics are support for massive data and fast retrieval.

metadata

Metadata is data that describes the structure of the data in the data warehouse and how it was built. It can be divided by purpose into two categories: technical metadata and business metadata.
Technical metadata is used by data warehouse designers and administrators to develop and manage the data warehouse day to day. It includes: data source information; descriptions of data conversions; definitions of the objects and data structures in the data warehouse; the rules used for data cleaning and data updates; mappings from source data to target data; user access rights; and data backup history, data import history, information release history, and so on.
Business metadata describes the data in the data warehouse from a business perspective. It includes descriptions of business subjects and of the data, queries, and reports they contain.
Metadata provides an information directory for accessing the data warehouse. This directory comprehensively describes what data is in the data warehouse, how it was obtained, and how to access it. It is the center of data warehouse operation and maintenance. The data warehouse server uses it to store and update data, and users use it to understand and access data.
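A metadata directory of this kind can be sketched as a small Python catalog that stores technical and business metadata side by side. The class names, table, and column entries are hypothetical and serve only to show the idea.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMeta:
    name: str
    source: str          # technical metadata: where the value comes from
    transformation: str  # technical metadata: how it is derived
    description: str     # business metadata: what it means to users


@dataclass
class TableMeta:
    name: str
    subject: str  # business subject the table belongs to
    columns: list = field(default_factory=list)


# Hypothetical catalog with a single table.
catalog = {
    "fact_sales": TableMeta(
        name="fact_sales",
        subject="sales",
        columns=[
            ColumnMeta("revenue", "orders.amount", "sum per day", "Revenue in local currency"),
            ColumnMeta("quantity", "orders.qty", "sum per day", "Units sold"),
        ],
    )
}


def describe(catalog, table, column):
    """Answer: what is this data, where did it come from, how was it derived?"""
    for col in catalog[table].columns:
        if col.name == column:
            return f"{table}.{column}: {col.description} (from {col.source}, {col.transformation})"
    raise KeyError(f"{table}.{column} is not in the catalog")


summary = describe(catalog, "fact_sales", "revenue")
```

The same catalog serves both audiences described above: load jobs read `source` and `transformation`, while users read `subject` and `description`.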

data mart

A data mart is a portion of data separated from the data warehouse for a specific application purpose or scope. It can also be called departmental data or subject-area data. When implementing a data warehouse, it is often possible to start with the data mart of one department and later combine several data marts into a complete data warehouse. Note that when implementing different data marts, field definitions with the same meaning must be compatible, so that the later implementation of the data warehouse does not run into major difficulties.
In a report on data mart products by the well-known international firm Gartner, the agile business intelligence products in the first quadrant include QlikView, Tableau, and SpotView, all of which are data mart products based on fully in-memory computing and which challenge the traditional business intelligence giants in the area of big data. Domestic BI products started late; well-known agile business intelligence products include PowerBI, Z-Suite of Yonghong Technology, SmartBI, and the FineBI business intelligence software. Among them, the Z-Data Mart of Yonghong Technology is a data mart product based on hot in-memory computing. DEAN Information in China is also a system integrator of data mart products.

Data warehouse management

Data warehouse management covers: security and privilege management; tracking data updates; data quality checks; managing and updating metadata; auditing and reporting on the use and status of the data warehouse; deleting data; copying, splitting, and distributing data; backup and recovery; and storage management.

Information release system

Send the data in the data warehouse or other relevant data to different places or users. Web based information publishing system is the most effective way to deal with multi-user access.

Access Tools

Access tools provide the means for users to access the data warehouse. They include data query and reporting tools; application development tools; executive information system (EIS) tools; online analytical processing (OLAP) tools; and data mining tools.

data model

Unlike in a typical online transaction processing (OLTP) system, data model design is the foundation of data warehouse design. At present, the two mainstream theories are the normalized approach and the dimensional approach to data model design. Data models can be divided into logical and physical data models. The logical data model describes the relationships among business-related data. It is essentially a structural design independent of the database and is usually designed with the normalized approach. The main idea is to formulate the subject area model from the high-level perspective of the enterprise's business domain and then work down step by step to entities and attributes. The design does not consider which database management system will be adopted later, nor does it need to consider analysis performance. The physical data model depends on the database management system and is the data architecture built on that system, so data types, space, and performance must be considered in its design. Whether the physical data model should be normalized or dimensional is often debated, but in practice the right consideration for an enterprise building a data warehouse is not to adhere rigidly to theory but to match business needs as closely as possible.
Building a data warehouse is not only a matter of applying information tools and technology. Planning and implementation also require an in-depth understanding of industry knowledge, marketing management, market positioning, strategic planning, and other related business matters. Only then can the value of the data warehouse and of the analysis tools built on it be fully realized and the organization's competitiveness improved.

Design steps

1) Select an appropriate subject (the problem area to be addressed)
2) Clearly define what the fact table represents
3) Identify and confirm the dimensions
4) Select the facts
5) Calculate derived data and store it in the fact table
6) Complete the dimension tables
7) Collect the data for the database
8) Refresh the dimension tables as required
9) Determine the query priorities and query modes
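
Steps 2 to 5 and step 9 can be illustrated with a small dimensional design. The following is a minimal sketch, not a definitive implementation: it assumes a hypothetical retail sales subject, and the table and column names (`dim_product`, `dim_date`, `fact_sales`, `revenue`) are invented for illustration and do not come from this article. It uses Python's built-in `sqlite3` module with an in-memory database.

```python
import sqlite3


def build_sales_star(conn):
    """Create a tiny star schema for a hypothetical sales subject."""
    conn.executescript("""
        -- Step 3: dimensions describe the context of each fact.
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            category   TEXT NOT NULL
        );
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY,
            day     TEXT NOT NULL,
            month   TEXT NOT NULL
        );
        -- Steps 2 and 4: one fact row per product per day, holding numeric facts.
        -- Step 5: revenue is a derived value stored next to the raw facts.
        CREATE TABLE fact_sales (
            product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
            date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
            quantity   INTEGER NOT NULL,
            unit_price REAL    NOT NULL,
            revenue    REAL    NOT NULL
        );
    """)


def load_sale(conn, product_id, date_id, quantity, unit_price):
    """Insert one fact row, computing the derived revenue at load time."""
    conn.execute(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
        (product_id, date_id, quantity, unit_price, quantity * unit_price),
    )


def revenue_by_category_and_month(conn):
    """Step 9: a typical query joins the fact table to its dimensions."""
    return conn.execute("""
        SELECT p.category, d.month, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.month
        ORDER BY p.category, d.month
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_sales_star(conn)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Espresso beans", "Coffee"), (2, "Green tea", "Tea")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20240115, "2024-01-15", "2024-01"), (20240203, "2024-02-03", "2024-02")],
)
load_sale(conn, 1, 20240115, 3, 10.0)
load_sale(conn, 1, 20240203, 2, 10.0)
load_sale(conn, 2, 20240115, 5, 4.0)
report = revenue_by_category_and_month(conn)
```

Storing `revenue` in the fact table trades some redundancy for simpler and cheaper analytical queries, which is the usual motivation for step 5.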
Hardware platform: The disk capacity of a data warehouse is usually 2 to 3 times that of the operational database. Mainframes usually offer more reliable performance and stability and are easier to integrate with legacy systems. PC servers and UNIX servers are more flexible, easier to operate, and able to support dynamically generated query requests. Questions to consider when selecting a hardware platform: Does it provide parallel I/O throughput? How well does it support multiple CPUs?
Data warehouse DBMS: Consider its capacity to store large volumes of data, its query performance, and how well it supports parallel processing.
Network structure: Implementing the data warehouse generates heavy data traffic on the affected network segments, so the network structure may need to be improved.

Modeling division

Data modeling for a data warehouse can be roughly divided into four stages:
1. Business modeling. This part of the modeling work mainly includes the following:
  • Divide up the business of the whole organization, define in general terms the work of each part, and clarify the relationships between business departments according to the departmental division.
  • Understand in depth the specific business processes within each business department and formalize them as procedures.
  • Propose ways to modify and improve the work processes of the business departments and formalize them as procedures.
  • Define the scope of data modeling, and the goals and phases of the entire data warehouse project.
2. Domain concept modeling. This part of the modeling work mainly includes the following:
  • Extract key business concepts and abstract them.
  • Group the business concepts and aggregate similar groups along the main lines of the business.
  • Refine the concept groups, clarify the business processes within each group, and abstract them.
  • Clarify the relationships between concept groups to form a complete domain concept model.
3. Logical modeling. This part of the modeling work mainly includes the following:
  • Turn business concepts into entities and consider their specific attributes.
  • Turn events into entities and consider their attributes.
  • Turn descriptive information into entities and consider its attributes.
4. Physical modeling. This part of the modeling work mainly includes the following:
  • Make the technical adjustments required by the specific physical platform.
  • Make adjustments based on the model's performance on the specific platform.
  • Make adjustments based on management needs and the specific platform.
  • Generate the final execution scripts and refine them.

Establishment steps


step

1) Collect and analyze business requirements
2) Establish the data model and the physical design of the data warehouse
3) Define the data sources
4) Select the data warehouse technology and platform
5) Extract, cleanse, and transform data from the operational databases into the data warehouse
6) Select access and reporting tools
7) Select database connectivity software
8) Select data analysis and data presentation software
9) Update the data warehouse
[Figure: Data warehouse value curve]

Data conversion tool

1) Able to read data from a variety of data sources
2) Support for flat files, indexed files, and legacy DBMSs
3) Able to integrate data from different types of data sources as input
4) A standardized data access interface
5) Preferably able to read data from a data dictionary
6) The code generated by the tool must be maintainable in the development environment
7) Able to extract only the data that meets specified conditions, and only specified parts of the source data
8) Able to convert data types and character sets during extraction
9) Able to calculate and generate derived fields during extraction
10) Able to be called automatically by the data warehouse management system for scheduled data extraction, or able to produce flat files as output
11) The viability and product support capability of the software supplier must be carefully evaluated
Major data extraction tool suppliers: Prism Solutions; Carleton (PASSPORT); Information Builders Inc. (EDA/SQL); SAS Institute Inc.
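
Requirements 7 to 9 in the list above (selective extraction, type conversion, and derived fields) can be sketched in a few lines. This is only an illustration under stated assumptions: the flat-file layout, the column names, and the `extract_orders` function are hypothetical and do not describe any of the commercial tools named above. It uses only the Python standard library.

```python
import csv
import io
from datetime import datetime

# Hypothetical flat-file export from an operational system.
SOURCE = """order_id,order_date,region,quantity,unit_price
1001,15/01/2024,north,3,10.50
1002,16/01/2024,south,1,99.00
1003,17/01/2024,north,2,4.25
"""


def extract_orders(flat_file, region):
    """Read a flat file, keep one region, convert types, and derive a total."""
    rows = []
    for record in csv.DictReader(flat_file):
        # Requirement 7: extract only the rows that meet a condition.
        if record["region"] != region:
            continue
        # Requirement 8: convert text fields to proper data types.
        quantity = int(record["quantity"])
        unit_price = float(record["unit_price"])
        order_date = datetime.strptime(record["order_date"], "%d/%m/%Y").date()
        rows.append({
            "order_id": int(record["order_id"]),
            "order_date": order_date.isoformat(),
            "quantity": quantity,
            "unit_price": unit_price,
            # Requirement 9: compute a derived field during extraction.
            "total": round(quantity * unit_price, 2),
        })
    return rows


north_orders = extract_orders(io.StringIO(SOURCE), "north")
```

Here `north_orders` holds only the two orders from the `north` region, with dates rewritten in ISO format and a `total` field that was not present in the source file.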

key problem

General questions (not strictly technical or cultural, but very important) include but are not limited to the following:
What kinds of analysis do business users want to perform?
Does the data you are currently collecting support those analyses?
Where is the data?
How clean is the data?
Are there multiple data sources for similar data?
What structure is most suitable for the core data warehouse (for example, dimensional or relational)?
Technical questions include but are not limited to the following:
How much data will flow through your network? Can the network handle it?
How much disk space is required?
Will you use solid-state or virtualized storage?

benefit

Every company has its own data. Many companies store large amounts of data in their computer systems, recording a great deal of operational and customer information generated during purchasing, sales, and production. These data are usually stored in many different places.
With a data warehouse, the enterprise stores all the information it collects in a single place: the data warehouse. The data in the warehouse is organized in a defined way, which makes the information easy to access and valuable to use.
Specialized software tools have been developed to partly automate data warehouse processes, helping enterprises import data into the data warehouse and make use of the data already stored there.
Data warehouses have brought major changes to organizations. Establishing a data warehouse introduces some new workflows, and other processes change as a result.
Data warehouses give enterprises "data-based knowledge". They are mainly used to evaluate market strategies, find new market opportunities, control inventory, review production methods, and define customer groups.
Through the data warehouse, an enterprise can establish its own data model, which is valuable for production and sales, cost control, and the allocation of revenue and expenditure. This can reduce costs and improve economic efficiency. The data warehouse can also be used to analyze the relationship between the enterprise's human resources and its basic data, for example through regression analysis, to make the best use of human resources, and to support performance evaluation, making enterprise management more systematic and reasonable. By organizing the enterprise's data in a specific way, the data warehouse generates new business knowledge and brings new perspectives to the operation of the enterprise.

Pre development

The idea of building a data warehouse had already been put forward in the early days of computing. The term "data warehouse" was first proposed by Bill Inmon in 1990, who described it as follows: a data warehouse is a collection of data specially designed and established to support enterprise decision-making.
Enterprises establish data warehouses because existing forms of data storage can no longer meet the needs of information analysis. One of the core ideas of data warehouse theory is that transactional data and decision support data have different processing-performance characteristics.
Enterprises collect data in the course of their transactional operations. As the business runs, transactional data is generated continuously as orders are placed and sales are recorded. Transactional databases must therefore be optimized for bringing data in.
Processing decision support data typically involves questions such as: What kinds of customers buy what kinds of products? How much do sales change after a promotion? How much do sales change after a price change or a store relocation? In a given period, which products sell especially well compared with others? Which customers increased their purchases? Which customers cut their purchases?
A transactional database can answer these questions, but the answers it gives are often unsatisfactory. Analysis and transaction processing compete for limited computing resources: new information is best added while the transactional database is otherwise idle, and when the system is answering a series of analytical questions, its efficiency in processing new data drops sharply. Another problem is that transactional data is always changing. Decision support processing requires relatively stable data, so that questions can be answered consistently over time.
The data warehouse solution separates decision support processing from transactional processing. At regular intervals (usually every night or every weekend), data is imported from the transactional database into the decision support database, that is, the "data warehouse". The data warehouse organizes data by "subject", according to the business questions the enterprise needs to answer, which makes it an effective way to organize data for analysis.
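
The periodic import described above can be sketched as an incremental batch load. This is a simplified illustration, not a description of any particular product: the `Warehouse` class, the `periodic_load` function, and the use of an increasing transaction `id` as a watermark are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Warehouse:
    """Hypothetical in-memory stand-in for a decision support database."""
    by_subject: dict = field(default_factory=dict)
    last_loaded_id: int = 0  # watermark: highest transaction id already loaded


def periodic_load(transactions, warehouse):
    """Copy transactions not yet loaded into the warehouse, grouped by subject.

    Meant to be run on a schedule (for example nightly), so that analytical
    queries read a stable copy instead of the live transactional data.
    """
    new_rows = [t for t in transactions if t["id"] > warehouse.last_loaded_id]
    for row in new_rows:
        # Store a copy so later changes in the source do not alter history.
        warehouse.by_subject.setdefault(row["subject"], []).append(dict(row))
    if new_rows:
        warehouse.last_loaded_id = max(t["id"] for t in new_rows)
    return len(new_rows)


# Hypothetical transactional records.
oltp = [
    {"id": 1, "subject": "sales", "amount": 120.0},
    {"id": 2, "subject": "inventory", "amount": -4.0},
]
dw = Warehouse()
first_batch = periodic_load(oltp, dw)   # loads both existing rows
oltp.append({"id": 3, "subject": "sales", "amount": 75.0})
second_batch = periodic_load(oltp, dw)  # loads only the new row
```

Because each load copies rows rather than referencing them, data already in the warehouse stays stable even when the transactional source keeps changing.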

market analysis

[Figure: Basic architecture of a data warehouse]
A data mart is a decision support database aimed at a single department or project team within the enterprise. Some expert consultants describe building data marts as one step in the overall process of building a data warehouse. First, a data warehouse is created to store all the information of the enterprise in an organized, consistent, and unchanging format. Data marts are then created to give different departments the information they need. The data warehouse holds all the detailed information, while the data in a data mart is summarized according to the specific needs of its users.
Other experts believe that building a data mart does not require building a data warehouse first. In this model, data is transferred directly from the transactional databases to the data mart. A company may establish several data marts that have no connection with one another.
Creating a data mart without first building a data warehouse is cheaper and faster, because its scale is easier to manage.
The drawback of the second view is that it does not achieve the primary purpose of a data warehouse: unifying all of the enterprise's data in a consistent format. The data in existing transaction processing systems is often inconsistent and redundant. If a company-wide data warehouse is established first, the organization obtains a unified knowledge base about its activities and customers. If independent data marts are established first, many of the advantages of a data warehouse can still be realized, but the enterprise remains far from storing its data consistently.
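
The first approach, in which a data mart is summarized from the detailed data held in the warehouse, can be sketched as a simple aggregation. This is a minimal illustration under stated assumptions: the `WAREHOUSE_SALES` rows, the field names, and the `build_data_mart` function are hypothetical.

```python
from collections import defaultdict

# Hypothetical detailed rows held in the enterprise-wide warehouse.
WAREHOUSE_SALES = [
    {"region": "north", "product": "tea", "month": "2024-01", "amount": 40.0},
    {"region": "north", "product": "tea", "month": "2024-01", "amount": 10.0},
    {"region": "north", "product": "coffee", "month": "2024-02", "amount": 25.0},
    {"region": "south", "product": "tea", "month": "2024-01", "amount": 5.0},
]


def build_data_mart(detail_rows, region):
    """Summarize warehouse detail into monthly product totals for one region."""
    totals = defaultdict(float)
    for row in detail_rows:
        if row["region"] == region:
            totals[(row["month"], row["product"])] += row["amount"]
    # Sort by (month, product) so the summary has a predictable order.
    return dict(sorted(totals.items()))


north_mart = build_data_mart(WAREHOUSE_SALES, "north")
```

Every mart built this way draws on the same consistent detail rows, which is the advantage claimed for building the warehouse first.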

Relationship content

Connection between databases and data warehouses:
The data warehouse did not emerge to replace the database. Most data warehouses are still managed with a relational database management system. Databases and data warehouses can be said to complement each other. [4]
Differences between databases and data warehouses:
1. The starting point is different: a database is designed around transactions; a data warehouse is designed around subjects.
2. The data stored is different: a database generally stores online transaction data; a data warehouse generally stores historical data.
3. The design rules are different: database design tries to avoid redundancy and generally follows the rules of normalization; data warehouse design intentionally introduces redundancy and uses denormalized structures.
4. The functions provided are different: a database is designed to capture data; a data warehouse is designed to analyze data.
5. The basic elements are different: an operational database is built from normalized tables of entities and transactions, whereas a dimensional data warehouse is built from fact tables and dimension tables.
6. The capacity is different: a database typically holds far less data than a data warehouse.
7. The users served are different: a database is designed for efficient transaction processing and serves the staff who handle day-to-day business; a data warehouse is designed to analyze data for decision-making and serves the enterprise's senior decision makers.

Representative works

Sybase - IQ
Oracle - Oracle Database / Oracle Exadata
Teradata - Teradata
IBM - Red Brick
Netezza - Netezza TwinFin
NEC - InfoFrame DWH Appliance
Microsoft - Microsoft SQL Server
Pivotal - Greenplum