Send the link below via email or IMCopy
Present to your audienceStart remote presentation
- Invited audience members will follow you as you navigate and present
- People invited to a presentation do not need a Prezi account
- This link expires 10 minutes after you close the presentation
- A maximum of 30 users can follow your presentation
- Learn more about this feature in our knowledge base article
Do you really want to delete this prezi?
Neither you, nor the coeditors you shared it with will be able to recover it again.
Make your likes visible on Facebook?
Connect your Facebook account to Prezi and let your likes appear on your timeline.
You can change this under Settings & Account at any time.
From conceptual model towards a Data Vault
Transcript of From conceptual model towards a Data Vault
According to Oxford Dictionary:
, especially a mathematical one, of a system or process, to assist calculations and predictions
The Conceptual Model
Using fact oriented modeling to create a conceptual model that tells you which information is required
of the Universe of Discourse ...
information is important
validated by domain experts
as being THEIR Universe of Discourse (not that of the modeler!)
We use FCO-IM
100% Communication Oriented
Do not model
but model the
of the domain experts
(nothing less and nothing more!!)
Only possible when using the
language of the domain expert
100% Supported by CASE-tools
Single point of definition of metadata
The Fact Oriented Modeling of an Universe of Discourse
The Data Vault Structure
"The optimal approach for creating the Enterprise Data Warehouse in the DW2.0 framework"
The Data Vault approach
Advantages of Data Vault
A typical Data Vault
Building the Information Frame
Building a framework that ensures:
Full understanding of the Universe of Discourse
Full automation (CASE support) for all structuring
Model to model
Fully controlled Metadata Management
==> easy to
How the Data Vault fits in the Information Management Frame
Looking at the DV Architecture we can have other columns in the Information Management Frame
The automated Transformation
Model to model transformations from Fact Oriented models towards Data Vault have been created by:
Marco Engelbart, Dineke Romeijn (HAN University)
Mark Zwijsen (Atos)
Helping to bridge the
Having the Corporate Fact Base as container
for all semantics, we can use the Corporate Fact Base to find our way in any (Data Vault) structure derived from the conceptual schema
Combine the Strengths
+ Conceptual modeling ensures
+ You know exactly which information you will obtain in the BI solution
+ Data Vault Methodology ensures
+ An optimal way to get that information to the end users by using the Enterprise Data Warehouse
Use full strength of
Model 2 Model transformation
Keep all conceptual information
Gain all benefits of the structure you're transforming towards
Fully automated (given the input of analysts)
End to end solution starting from the the conceptual model via the Data Vault EDW towards the dimensional Starschemes
Typical DV architecture
Consequences for the Information Management Frame (IMF)
Pattern: fact types between two or more HUBs will become LINK
Pattern: gather all labeltypes that will be grouped into the associated Hub or Link
What's the value?
through any structure
Easy way to
from one structure to another
Simple form of
From conceptual model
towards a Data Vault
Structuring is not ...
information is required
Hence, can not be validated by any domain expert
Structuring is ...
the required information should be stored
To find the right structure is not easy!
A model -- like Erasmus Medical Center -- containing of 1200 fact types will have at least 2 possible different structures
Book "About Fact and Things" by Peter Alons, soon to be published in English.
Courses available via HAN University
Information Management Frame
from model to structure
Business Intelligence Consultant
Atos SI Projects
+31 6 511 843 12
ANSI Model Layers
Bringing them together
Typical Structures Found
See also Peter Alons' Presentation
Hub, link, satellite structure
Standards for system architecture
Big Data / NoSQL
Source: XR Magazine, September 2011, author: Frank Habersken
Using a Raw and Business Vault
The two of them together form the Enterprise Data Warehouse (Linstedt)
The Raw Vault
Mainly Source Driven
As per Data Vault standard:
Structure based on Business Keys
Contains Structural Constraints
Dependencies (foreign keys / links)
The Business (Rule) Vault
Non structural Business Rules
All about presenting the "real" Business Rules to convert the Raw information into the Business Domain and hence crossing the Semantic Gap
Source: Defining The Business Data Vault, author: Dan Linstedt
The Mathematical Vault
Completely Source Driven
"Raw raw" ==> Staging Vault ==> "I can create (generate) a DV in 8 hours" Vault
Not based on business keys, rather based on (likely artificial) primary keys in the source
100% derivable from the source (so still 100% of the semantic gap)
Operational, Corporate Fact Base, Data Warehouse, Enduser
Why do they fit that well??
The conceptual model will in itself find
the business keys that are the cornerstone of the Data Vault Modeling Technique
The EDW-DV can't be placed before the corporate fact base as the corporate fact base is based on business keys, the operational column isn't
The modeling cycle in practice
According to us:
A model tells us
information is relevant to the Universe of Discourse
Completing the Frame
Model to model transformations
(or model to structure transformations?!)
Checks and balances
This 'mapping' possibility is available for all kind of structures. It is particularly important when treating generic models
and 'off the shelf' business models
Using mappings in the Frame
Pattern: All object types played by labeltypes after Reducing
See article " Semantic Modeling & Generic Structures", Peter Alons and Rob Arntz (chapter 7 of "About Facts and Things")
Some Data Vault links
Book "Fully Communication Oriented Information Modeling": Bakema, Zwart , Van der Lek
Upcoming book Engelbart, Zwart, Hoppenbrouwers
Courses by HAN University
Modeling versus Structuring
Fact Oriented Modeling
Using the Data Vault structure
Building up the Information Management Frame
Model to Model transformations
Bridging the Semantic Gap
Wrap up (combine strengths)
Lots of structural aspects (metadata) that allows for instance audit ability and traceability
Can this structure be validated by a domain expert? When is it necessary?
Pattern: group groupable fact types that are played by a HUB and non HUB. Check again if fact types arose between two HUBs`. These fact types will also become part of the link
Time can be
taken into this
transaction link as degenerated
fields or being separated into a new hub Time
We have an answer!
From: Super Charge Your Data Warehouse, Dan Linstedt, 2011
All required information, in the form of fact stating sentences, is present in the Conceptual Model. Business keys, but also all other non-key attributes and relationships are present in the Conceptual model. That's no surprise: we modeled the Universe of Discourse
Choices in Structuring
Degenerating attributes in Transaction Links
Splitting Satellites based on Originating Source systems
Splitting Satellites based on Frequency of Change
Number of Hubs based upon "Hub"-factor
We found Hubs
(and business keys!)
Enrich the data vault structure with all meta data to ensure complete audit ability and traceability 100% of the time
"The amount of blood used in the blood transfusion started on 7-4-2006 on 11:23:17 hrs for patient 3427839 was 500 ml."
"The amount of blood used in the blood transfusion started on <13.15: Date> on <13.16: Time> hrs for patient <13.17.3: Patient number> was <14.18: Nr of ml> ml."
SELECT "The amount of blood used in the blood transfusion started on <13.15:
> on <13.16:
> hrs for patient <13.17.3:
> was <14.18:
FROM BLOOD_TRANSFUSION_LINK, btl,
PATIENT_HUB ph, TIME_HUB th,
PATIENT_SAT ps, BLOOD_TRANSFUSION_SAT bts
WHERE btl.PATIENT_HUB_SEQ = ph.PATIENT_HUB_SEQ
AND btl.TIME_HUB_SEQ = th.TIME_HUB_SEQ
AND bts.BLOOD_TRANSFUSION_LNK_SEQ = btl.BLOOD_TRANSFUSION_LNK_SEQ
AND bts.BLOOD_TRANSFUSION_LOAD_END_DATETIME is null /* Actual = Yes */
AND ps.PATIENT_HUB_SEQ = ph.PATIENT_HUB_SEQ
AND ps.PATIENT_LOAD_END_DATETIME is null /* Actual = Yes */
One more example
"Patient 3427839 has first name Pieter."
"Patient <4.3: Patient number> has first name <2: First name>."
SELECT "Patient <4.3: Patient number> has first name <2: First name>."
FROM PATIENT_HUB ph,
WHERE ps.PATIENT_HUB_SEQ = ph.PATIENT_HUB_SEQ
AND ps.PATIENT_LOAD_END_DATETIME is null
What would happen if we want to combine information across facts?
For instance: our interest is in the nr of ml blood per first name ...
(Sounds interesting isn't it?)
The Conceptual Model contains every building block that is required
So let's create great structures!
What would we obtain if we look for the word "models" on Google??
Placing the Mathematical Vault in the IMF
, Corporate Fact Base, Data Warehouse, Enduser
, Corporate Fact Base, Data Warehouse, Enduser
Placing the Data Vault EDW in IMF
Operational, Corporate Fact Base,
From the conceptual model we can obtain the path in which the information is available ... the
for this inquiry. In this case Amount of blood in Transfusion and Patient First name linked via the objects Transfusion and Patient. The resulting query would become a join of the 2 previous obtained queries joined on role number 3 (Patient number).