The Integrated Data Hub: The Next Generation Data Warehouse (Amsterdam Version)

IDH, by Dario MANGANO, on 22 May 2013


Transcript

The Next Generation Data Warehouse: The Integrated Data Hub

Agenda: Introduction to Data Vault; IDH Important Concepts

[Figure: IDH architecture diagram showing Source Systems, Staging Layer, HiST Layer, Core DWH, Presentation Layer, Access Layer and Target Systems, together with a Service Layer and a Metadata Layer; labeled components include the Orchestrator, Business Rules Engine, MDM Engine, Archiving, Data Lineage and Impact Analysis.]

[Figure: Data Vault]

[Figure: IDH concepts: Hub, Logical Unit of Work, Services, Metadata Driven Automation, Data Lineage, Message Oriented Layer, Data Canonicalization.]

The Data Warehouse Schism

Inmon and Kimball

The New Data Warehousing’s Challenges

The Message Oriented Layer

Another important concept of the IDH is the ability of the Data Hub to communicate in all directions using canonical data embedded in messages.

Metadata Driven Automation

This concept removes the need for a traditional ETL engine. The idea is to describe the transformations you want applied to the data and let the orchestrator service do the rest.

Business Ownership and Natural Language

Another important concept is that every business rule is described in natural language by its owner, normally the business. These business rules must be historized so that the EDW can be reloaded with a correct picture of the past. The Metadata Driven Automation service manages the application of these business rules.

The Staging Layer

Here you will mainly find a copy of the source system tables and a delta load of the data needed to refresh the EDW. Small transformations can occur in a second phase of the staging layer.

The Core Data Warehouse Layer

This is where data integration and decoupling from the source systems are achieved. The data model is mainly business oriented. Historization and data lineage are also performed here.

The Presentation Layer

Analytical: This is where you will find the data marts designed to answer business questions.

Operational: This is where you will find detailed, integrated information for operational reporting.

The Access Layer

This layer is mainly composed of a corporate semantic layer.

The Metadata Layer

This is where we record every action performed on and with the data flowing through the IDH. The main purpose of this layer is to enable full data lineage and impact analysis.

The Service Layer

This is one of the main concepts behind the IDH. Here you will find the MDM engine, the Business Rules Engine, the orchestration services, and so on.

The HiST Layer

Here you will find a complete historization of the staging layer, which provides a valid source from which to reload the data warehouse.

The Logical Unit of Work

This is the largest piece of code an ETL developer is allowed to write. It must be designed with three main concepts in mind: restartability, parallelism, and orchestration. A Logical Unit of Work only moves data from the border of one layer to the border of the following layer. All other (parasitic) work has to be performed by a service.

This is the IDH ;-)
Simple, isn't it? :-) Soon on AMAZON.COM!

May I have your questions?

THANK YOU!

The Integrated Data Hub: The Next Generation Data Warehouse

The Integrated Data Hub Defined

The Integrated Data Hub is a hub of integrated data. It is a hub because it is the place through which all enterprise data that must be validated by corporate business rules has to pass. It is also a hub because the IDH communicates with its surrounding environment in every direction: west with the source systems, north with the Service Layer, east with the target systems (mainly Business Intelligence, but also other operational systems), and south with the Metadata Layer. It is integrated because the main purpose of the IDH is to give users a single version of the truth: all data related to a business concept is found in a single place. To stay close to Inmon’s definition of the Enterprise Data Warehouse, we could add to integration the properties of being non-volatile, time-variant, and subject-oriented. The Integrated Data Hub is a reference architecture for data integration, but it is also a set of concepts, a development framework, best practices, and other technical considerations. This data integration reference architecture is built from distinct layers, each with its own single purpose.

Beyond the architectural considerations, the IDH concept includes a set of guidelines and best practices to ensure that the architecture is built to meet the future challenges of data integration. This architecture is most relevant when:

Your EDWH has to be fault-resistant.
Your EDWH must not be impacted by changes in the surrounding environment, particularly when new sources are added frequently or existing ones change often.
Business rules are subject to change over time.
Auditability is required.
The structure of incoming data is not always known.
Historization and time slices are required.
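The Message Oriented Layer slide says the hub communicates in all directions using canonical data embedded in messages. The presentation does not define a message format. The sketch below is therefore only an assumption: a hypothetical JSON envelope (`CanonicalMessage`) plus hypothetical per-source field mappings (`FIELD_MAPS`). It shows how two sources describing the same customer end up with an identical canonical payload.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

# Hypothetical mappings from source-specific field names to canonical names.
FIELD_MAPS = {
    "crm": {"cust_no": "customer_id", "nm": "name", "ctry": "country"},
    "erp": {"CUSTOMER_ID": "customer_id", "CUSTOMER_NAME": "name", "COUNTRY_CODE": "country"},
}


@dataclass
class CanonicalMessage:
    entity: str    # business concept carried by the message, e.g. "customer"
    source: str    # system that emitted the record
    payload: dict  # the record, expressed with canonical field names only
    sent_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        # Serialized form placed on the message bus (format is an assumption).
        return json.dumps(asdict(self), sort_keys=True)


def canonicalize(source: str, entity: str, record: dict) -> CanonicalMessage:
    """Rename source fields to canonical names; reject fields with no mapping."""
    mapping = FIELD_MAPS[source]
    unknown = set(record) - set(mapping)
    if unknown:
        raise ValueError(f"unmapped fields from {source}: {sorted(unknown)}")
    return CanonicalMessage(entity, source, {mapping[k]: v for k, v in record.items()})


crm = canonicalize("crm", "customer", {"cust_no": 42, "nm": "Acme", "ctry": "NL"})
erp = canonicalize("erp", "customer", {"CUSTOMER_ID": 42, "CUSTOMER_NAME": "Acme", "COUNTRY_CODE": "NL"})
print(crm.payload == erp.payload)  # True: both sources yield the same canonical record
```

Rejecting unmapped fields, rather than passing them through, is one possible design choice. It keeps non-canonical data out of the hub.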
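The Metadata Driven Automation slide describes declaring the transformations and letting the orchestrator apply them, instead of hand-coding ETL. Below is a minimal sketch of that idea, assuming a hypothetical metadata vocabulary (`rename`, `cast`, `upper`, `filter_eq`) and a hypothetical generic engine (`orchestrate`). It is not the author's implementation.

```python
from typing import Any, Callable

# Hypothetical transformation metadata: an ordered list of declarative steps.
# In the IDH idea this would be stored in the metadata layer, not in code.
MAPPING_SPEC = [
    {"op": "rename", "from": "ctry", "to": "country"},
    {"op": "cast", "column": "amount", "type": "float"},
    {"op": "upper", "column": "country"},
    {"op": "filter_eq", "column": "status", "value": "ACTIVE"},
]

CASTS: dict[str, Callable[[Any], Any]] = {"int": int, "float": float, "str": str}


def apply_step(rows: list[dict], step: dict) -> list[dict]:
    """Interpret one metadata step; every branch returns new dicts (no mutation)."""
    op = step["op"]
    if op == "rename":
        return [{(step["to"] if k == step["from"] else k): v for k, v in r.items()} for r in rows]
    if op == "cast":
        caster = CASTS[step["type"]]
        return [{**r, step["column"]: caster(r[step["column"]])} for r in rows]
    if op == "upper":
        return [{**r, step["column"]: str(r[step["column"]]).upper()} for r in rows]
    if op == "filter_eq":
        return [r for r in rows if r.get(step["column"]) == step["value"]]
    raise ValueError(f"unsupported operation: {op}")


def orchestrate(rows: list[dict], spec: list[dict]) -> list[dict]:
    """Generic engine: the same code runs any spec, so changing a
    transformation means changing metadata, not code."""
    for step in spec:
        rows = apply_step(rows, step)
    return rows


source_rows = [
    {"ctry": "nl", "amount": "10.5", "status": "ACTIVE"},
    {"ctry": "be", "amount": "3", "status": "CLOSED"},
]
print(orchestrate(source_rows, MAPPING_SPEC))
# [{'country': 'NL', 'amount': 10.5, 'status': 'ACTIVE'}]
```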
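The Business Ownership and Natural Language slide requires business rules to be historized so the EDW can be reloaded with a correct picture of the past. One way to model this, offered as an assumption, is to keep each rule version with its owner's natural-language text, a validity interval, and an executable check, and to select the version valid on a given date. The rule `KEY_CUSTOMER` and its thresholds are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass(frozen=True)
class RuleVersion:
    rule_id: str
    description: str           # natural-language text written by the business owner
    owner: str
    valid_from: date           # inclusive
    valid_to: Optional[date]   # exclusive; None means the version is still in force
    apply: Callable[[dict], bool]


# Hypothetical rule whose definition changed on 2013-01-01.
RULES = [
    RuleVersion("KEY_CUSTOMER", "A key customer has a yearly revenue of at least 1M EUR.",
                "Sales", date(2010, 1, 1), date(2013, 1, 1),
                lambda c: c["revenue"] >= 1_000_000),
    RuleVersion("KEY_CUSTOMER", "A key customer has a yearly revenue of at least 500k EUR.",
                "Sales", date(2013, 1, 1), None,
                lambda c: c["revenue"] >= 500_000),
]


def rule_as_of(rule_id: str, as_of: date) -> RuleVersion:
    """Return the version of a rule that was valid on `as_of`."""
    for r in RULES:
        if r.rule_id == rule_id and r.valid_from <= as_of and (r.valid_to is None or as_of < r.valid_to):
            return r
    raise LookupError(f"no version of {rule_id} valid on {as_of}")


customer = {"id": 7, "revenue": 750_000}
print(rule_as_of("KEY_CUSTOMER", date(2012, 6, 30)).apply(customer))  # False (old rule)
print(rule_as_of("KEY_CUSTOMER", date(2013, 6, 30)).apply(customer))  # True (current rule)
```

When the EDW is reloaded for a past period, each record is evaluated with the rule version valid at that time, not with today's rule.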
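The Staging Layer slide mentions a delta load of the data needed to refresh the EDW. The presentation does not say how the delta is detected. A common approach, assumed here, compares the current source extract against row hashes kept from the previous load and classifies each business key as inserted, updated, or deleted. `row_hash` and `compute_delta` are hypothetical helpers.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    # Stable content hash; sort_keys makes column order irrelevant.
    return hashlib.sha256(json.dumps(row, sort_keys=True, default=str).encode("utf-8")).hexdigest()


def compute_delta(previous_hashes: dict, current_rows: dict) -> dict:
    """previous_hashes: business key -> hash stored at the last load.
    current_rows: business key -> row from today's source extract."""
    inserts = {k: r for k, r in current_rows.items() if k not in previous_hashes}
    updates = {k: r for k, r in current_rows.items()
               if k in previous_hashes and previous_hashes[k] != row_hash(r)}
    deletes = sorted(k for k in previous_hashes if k not in current_rows)
    return {"insert": inserts, "update": updates, "delete": deletes}


previous = {
    1: row_hash({"name": "Acme", "country": "NL"}),
    2: row_hash({"name": "Beta", "country": "BE"}),
    4: row_hash({"name": "Delta", "country": "LU"}),
}
current = {
    1: {"country": "NL", "name": "Acme"},    # unchanged (column order differs)
    2: {"name": "Beta", "country": "FR"},    # changed
    3: {"name": "Gamma", "country": "DE"},   # new
}
print(compute_delta(previous, current))
```

Change data capture or source timestamps are alternatives when full extracts are too large to compare.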
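The Metadata Layer slide says every action on the data is recorded to enable full data lineage and impact analysis. If each recorded action is treated as an edge from a source dataset to a target dataset, lineage becomes an upstream graph walk and impact analysis a downstream one. This is a minimal sketch under that assumption. The class `LineageLog`, the process names and the dataset names are hypothetical.

```python
from collections import defaultdict, deque


class LineageLog:
    """Hypothetical metadata-layer store: one entry per recorded data movement."""

    def __init__(self) -> None:
        self.actions: list[tuple[str, str, str]] = []   # (process, source, target)
        self.downstream = defaultdict(set)
        self.upstream = defaultdict(set)

    def record(self, process: str, source: str, target: str) -> None:
        self.actions.append((process, source, target))
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    @staticmethod
    def _walk(graph, start: str) -> set:
        # Breadth-first traversal collecting every reachable dataset.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def impact(self, dataset: str) -> set:
        """Everything that would be affected if `dataset` changed."""
        return self._walk(self.downstream, dataset)

    def lineage(self, dataset: str) -> set:
        """Everything `dataset` was derived from."""
        return self._walk(self.upstream, dataset)


log = LineageLog()
log.record("load_stg_customer", "crm.customer", "stg.customer")
log.record("hist_customer", "stg.customer", "hist.customer")
log.record("load_core_customer", "stg.customer", "core.customer")
log.record("load_stg_orders", "erp.orders", "stg.orders")
log.record("load_core_orders", "stg.orders", "core.orders")
log.record("build_sales_mart", "core.customer", "mart.sales_by_country")
log.record("build_sales_mart", "core.orders", "mart.sales_by_country")
print(sorted(log.impact("stg.customer")))
# ['core.customer', 'hist.customer', 'mart.sales_by_country']
```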
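The HiST Layer slide describes a complete historization of the staging layer that serves as a valid source for reloading the data warehouse. The sketch below assumes one possible design: an append-only store of staging records stamped with a load timestamp, with deletions recorded as tombstones. From it, the staging image as it stood at any past moment can be rebuilt. `HistLayer` and `snapshot_as_of` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class HistRecord:
    key: int
    load_ts: datetime
    row: Optional[dict]   # None is a tombstone: the key disappeared from staging


class HistLayer:
    """Append-only: records are never updated or deleted once written."""

    def __init__(self) -> None:
        self._records: list[HistRecord] = []

    def append(self, key: int, load_ts: datetime, row: Optional[dict]) -> None:
        self._records.append(HistRecord(key, load_ts, row))

    def snapshot_as_of(self, as_of: datetime) -> dict:
        """Rebuild the staging image as it was at `as_of`, e.g. for a reload."""
        latest: dict = {}
        # sorted() is stable, so records with equal timestamps keep append order.
        for rec in sorted(self._records, key=lambda r: r.load_ts):
            if rec.load_ts > as_of:
                break
            latest[rec.key] = rec.row
        return {k: v for k, v in latest.items() if v is not None}


hist = HistLayer()
hist.append(1, datetime(2013, 5, 1), {"country": "NL"})
hist.append(2, datetime(2013, 5, 2), {"country": "DE"})
hist.append(1, datetime(2013, 5, 10), {"country": "BE"})
hist.append(2, datetime(2013, 5, 12), None)   # key 2 vanished from the source
print(hist.snapshot_as_of(datetime(2013, 5, 11)))
# {1: {'country': 'BE'}, 2: {'country': 'DE'}}
```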
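The Logical Unit of Work slide asks for code that only moves data from one layer border to the next and is designed for restartability, parallelism, and orchestration. The sketch below is one hypothetical way to combine those properties. Independent partitions run in a thread pool. Each partition that succeeds is checkpointed to a JSON file, and a rerun skips checkpointed partitions. The caller, standing in for an orchestrator, decides when `run()` executes. `LogicalUnitOfWork` and its `move` callback are assumptions. The design relies on `move` being idempotent (for example, delete-then-insert per partition), because a partition that finished but was not yet checkpointed when a failure occurred will run again.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


class LogicalUnitOfWork:
    """Hypothetical LUW: pure data movement between two layer borders."""

    def __init__(self, name: str, checkpoint_path: str,
                 move: Callable[[str], int], max_workers: int = 4) -> None:
        self.name = name
        self.checkpoint_path = checkpoint_path
        self.move = move              # copies one partition, returns rows moved
        self.max_workers = max_workers

    def _load_done(self) -> set:
        if not os.path.exists(self.checkpoint_path):
            return set()
        with open(self.checkpoint_path, encoding="utf-8") as f:
            return set(json.load(f))

    def _save_done(self, done: set) -> None:
        tmp = self.checkpoint_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(sorted(done), f)
        os.replace(tmp, self.checkpoint_path)   # atomic swap: no half-written checkpoint

    def run(self, partitions: Iterable[str]) -> dict:
        done = self._load_done()
        todo = [p for p in partitions if p not in done]   # restartability: skip finished work
        moved = {}
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:   # parallelism
            futures = {p: pool.submit(self.move, p) for p in todo}
            for p, fut in futures.items():
                moved[p] = fut.result()   # re-raises a partition's error, aborting the run
                done.add(p)
                self._save_done(done)     # checkpoint after each successful partition
        return moved
```

A failed run raises the partition's exception. Rerunning the same LUW with the same checkpoint path resumes from the first partition that was not checkpointed.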