Loading presentation...

Present Remotely

Send the link below via email or IM

Copy

Present to your audience

Start remote presentation

  • Invited audience members will follow you as you navigate and present
  • People invited to a presentation do not need a Prezi account
  • This link expires 10 minutes after you close the presentation
  • A maximum of 30 users can follow your presentation
  • Learn more about this feature in our knowledge base article

Do you really want to delete this prezi?

Neither you, nor the coeditors you shared it with will be able to recover it again.

DeleteCancel

Make your likes visible on Facebook?

Connect your Facebook account to Prezi and let your likes appear on your timeline.
You can change this under Settings & Account at any time.

No, thanks

From conceptual model towards a Data Vault

Presentation held for the Data Modeling Zone 2013 presentation, Septenmber 24 regarding the combination of conceptual modeling using FCO-IM and the Data Vault approach for Data Warehousing
by

Rob Arntz

on 9 March 2015

Comments (0)

Please log in to add your comment.

Report abuse

Transcript of From conceptual model towards a Data Vault

Modeling Versus Structuring
Models?
According to Oxford Dictionary:

simplified
description
, especially a mathematical one, of a system or process, to assist calculations and predictions
Structuring
The Conceptual Model
Using fact oriented modeling to create a conceptual model that tells you which information is required
Conceptual Modeling
Making a
model
of the Universe of Discourse ...
Emphasis on
which
information is important
Must be
validated by domain experts
as being THEIR Universe of Discourse (not that of the modeler!)
We use FCO-IM
100% Communication Oriented
Do not model
reality,
but model the
communication
of the domain experts
100% Conceptual
Model
all

communication
(nothing less and nothing more!!)
100% Validation
Only possible when using the
language of the domain expert
100% Supported by CASE-tools
Single point of definition of metadata
The Fact Oriented Modeling of an Universe of Discourse
The Data Vault Structure
"The optimal approach for creating the Enterprise Data Warehouse in the DW2.0 framework"
The Data Vault approach
Advantages of Data Vault
Source: www.danlinstedt.com
A typical Data Vault
Structure
Building the Information Frame
Building a framework that ensures:
Full understanding of the Universe of Discourse
Full automation (CASE support) for all structuring
==>
Model to model
transformations!
Fully controlled Metadata Management
==> easy to
maintain discipline

How the Data Vault fits in the Information Management Frame
Looking at the DV Architecture we can have other columns in the Information Management Frame
The automated Transformation
Model to model transformations from Fact Oriented models towards Data Vault have been created by:
Marco Engelbart, Dineke Romeijn (HAN University)
Mark Zwijsen (Atos)
myself
...
Helping to bridge the
Semantic Gap
Having the Corporate Fact Base as container
for all semantics, we can use the Corporate Fact Base to find our way in any (Data Vault) structure derived from the conceptual schema
Conclusions
Combine the Strengths
+ Conceptual modeling ensures
+ You know exactly which information you will obtain in the BI solution
+ Data Vault Methodology ensures
+ An optimal way to get that information to the end users by using the Enterprise Data Warehouse
Use full strength of
Model 2 Model transformation
Keep all conceptual information
Gain all benefits of the structure you're transforming towards
Fully automated (given the input of analysts)
End to end solution starting from the the conceptual model via the Data Vault EDW towards the dimensional Starschemes
Further reading
Typical DV architecture
Consequences for the Information Management Frame (IMF)
Link
Pattern: fact types between two or more HUBs will become LINK
Satellite
Pattern: gather all labeltypes that will be grouped into the associated Hub or Link
An Example
What's the value?
Adding
semantics
to
any
structure
Enables
navigation
through any structure
Easy way to
derive ETL
from one structure to another
Simple form of
Data Virtualization
From conceptual model
towards a Data Vault


Models?
Structuring is not ...
Telling
which
information is required
Hence, can not be validated by any domain expert
Structuring is ...
All about
how
the required information should be stored
To find the right structure is not easy!
A model -- like Erasmus Medical Center -- containing of 1200 fact types will have at least 2 possible different structures
1200
Book "About Fact and Things" by Peter Alons, soon to be published in English.
Courses available via HAN University
Information Management Frame
from model to structure
1. Communicate
2. Specify
4. Validate
3. Model
Contact
Rob Arntz
Business Intelligence Consultant
Atos SI Projects
rob.arntz@atos.net
+31 6 511 843 12
ANSI Model Layers
www.casetalk.com/community
@CaseTalk_dev
CaseTalk
CaseTalk
Bringing them together
Typical Structures Found
See also Peter Alons' Presentation
1. Communicate
2. Specify
3. Model
4. Validate
Modeling
Hub, link, satellite structure
Architecture
Standards for system architecture
Methodology
Project Management
Agile
Implementation aspects
ETL (generation)
Big Data / NoSQL
Source: XR Magazine, September 2011, author: Frank Habersken
Using a Raw and Business Vault
The two of them together form the Enterprise Data Warehouse (Linstedt)
The Raw Vault
Mainly Source Driven
As per Data Vault standard:
Structure based on Business Keys
Business Oriented
Contains Structural Constraints
Uniqueness (keys)
Value (domains)
Dependencies (foreign keys / links)
The Business (Rule) Vault
Non structural Business Rules
All about presenting the "real" Business Rules to convert the Raw information into the Business Domain and hence crossing the Semantic Gap
Source: Defining The Business Data Vault, author: Dan Linstedt
The Mathematical Vault
Completely Source Driven
"Raw raw" ==> Staging Vault ==> "I can create (generate) a DV in 8 hours" Vault
Not based on business keys, rather based on (likely artificial) primary keys in the source
100% derivable from the source (so still 100% of the semantic gap)
Further reading:
http://prudenza.typepad.com/dwh/2011/02/dv-the-case-against-a-raw-data-vault.html
Existing Columns:
Operational, Corporate Fact Base, Data Warehouse, Enduser


Why do they fit that well??
The conceptual model will in itself find
the business keys that are the cornerstone of the Data Vault Modeling Technique
http://www.linkedin.com/pub/rob-arntz
The EDW-DV can't be placed before the corporate fact base as the corporate fact base is based on business keys, the operational column isn't
The modeling cycle in practice
According to us:

A model tells us
which
information is relevant to the Universe of Discourse
Algorithmic Transformations
Completing the Frame
Model to model transformations
(or model to structure transformations?!)
ETL Flow

Metadata Flow
Checks and balances
Validation and
Source Analysis
Conceptual
Logical
Physical
Technical
This 'mapping' possibility is available for all kind of structures. It is particularly important when treating generic models
and 'off the shelf' business models
Using mappings in the Frame
Hubs
Pattern: All object types played by labeltypes after Reducing
Algorithms
Toolset
See article " Semantic Modeling & Generic Structures", Peter Alons and Rob Arntz (chapter 7 of "About Facts and Things")
Some Data Vault links
http://danlinstedt.com/
http://learndatavault.com/
datavaultdirectory
http://dm-unseen.blogspot.de/
http://www.b-eye- network.com/blogs/damhof/
On FCO-IM
Book "Fully Communication Oriented Information Modeling": Bakema, Zwart , Van der Lek
Upcoming book Engelbart, Zwart, Hoppenbrouwers
Courses by HAN University
Agenda
Modeling versus Structuring
Fact Oriented Modeling
Using the Data Vault structure
Building up the Information Management Frame
Model to Model transformations
Bridging the Semantic Gap
Wrap up (combine strengths)
Lots of structural aspects (metadata) that allows for instance audit ability and traceability
Can this structure be validated by a domain expert? When is it necessary?
HUB
HUB
HUB
HUB
HUB
HUB
LINK
LINK
LINK
Pattern: group groupable fact types that are played by a HUB and non HUB. Check again if fact types arose between two HUBs`. These fact types will also become part of the link
LINK
HUB
HUB
HUB
LINK
LINK
LINK
LINK
SAT
SAT
Date and
Time can be
taken into this
transaction link as degenerated
fields or being separated into a new hub Time
We have an answer!
From: Super Charge Your Data Warehouse, Dan Linstedt, 2011

All required information, in the form of fact stating sentences, is present in the Conceptual Model. Business keys, but also all other non-key attributes and relationships are present in the Conceptual model. That's no surprise: we modeled the Universe of Discourse

Choices in Structuring
Degenerating attributes in Transaction Links
Splitting Satellites based on Originating Source systems
Splitting Satellites based on Frequency of Change
Number of Hubs based upon "Hub"-factor
We found Hubs
(and business keys!)
Links added
Satellites Included
The resulting
Data Vault
Enrich the data vault structure with all meta data to ensure complete audit ability and traceability 100% of the time
``
"The amount of blood used in the blood transfusion started on 7-4-2006 on 11:23:17 hrs for patient 3427839 was 500 ml."
"The amount of blood used in the blood transfusion started on <13.15: Date> on <13.16: Time> hrs for patient <13.17.3: Patient number> was <14.18: Nr of ml> ml."
SELECT "The amount of blood used in the blood transfusion started on <13.15:
th.DATE
> on <13.16:
th.TIME
> hrs for patient <13.17.3:
ph.PATIENT_NUMBER
> was <14.18:
bts.NR_OF_ML
> ml."
FROM BLOOD_TRANSFUSION_LINK, btl,
PATIENT_HUB ph, TIME_HUB th,
PATIENT_SAT ps, BLOOD_TRANSFUSION_SAT bts
WHERE btl.PATIENT_HUB_SEQ = ph.PATIENT_HUB_SEQ
AND btl.TIME_HUB_SEQ = th.TIME_HUB_SEQ
AND bts.BLOOD_TRANSFUSION_LNK_SEQ = btl.BLOOD_TRANSFUSION_LNK_SEQ
AND bts.BLOOD_TRANSFUSION_LOAD_END_DATETIME is null /* Actual = Yes */
AND ps.PATIENT_HUB_SEQ = ph.PATIENT_HUB_SEQ
AND ps.PATIENT_LOAD_END_DATETIME is null /* Actual = Yes */
One more example
"Patient 3427839 has first name Pieter."
"Patient <4.3: Patient number> has first name <2: First name>."
SELECT "Patient <4.3: Patient number> has first name <2: First name>."
FROM PATIENT_HUB ph,
PATIENT_SAT ps
WHERE ps.PATIENT_HUB_SEQ = ph.PATIENT_HUB_SEQ
AND ps.PATIENT_LOAD_END_DATETIME is null
What would happen if we want to combine information across facts?
For instance: our interest is in the nr of ml blood per first name ...
(Sounds interesting isn't it?)



The Conceptual Model contains every building block that is required
So let's create great structures!
What would we obtain if we look for the word "models" on Google??
Placing the Mathematical Vault in the IMF
Operational,
Mathematical Vault
, Corporate Fact Base, Data Warehouse, Enduser
Or even:
Mathematical Vault
, Corporate Fact Base, Data Warehouse, Enduser
Placing the Data Vault EDW in IMF
Operational, Corporate Fact Base,
EDW
, Enduser
From the conceptual model we can obtain the path in which the information is available ... the
relevant facts
for this inquiry. In this case Amount of blood in Transfusion and Patient First name linked via the objects Transfusion and Patient. The resulting query would become a join of the 2 previous obtained queries joined on role number 3 (Patient number).
Full transcript