Information Server for Data Quality

by Miguel Ortiz on 8 January 2013

Transcript of Information Server for Data Quality

Data In => Clean Information Out

IBM Data Quality Process
Access & Discover: Basic assessment report for any data.
Validate: Business-focused data quality dashboard.
Cleanse & Enrich: Cleansed data for any use case, anywhere.
Master: Consistent & cleansed data across the organization.
Monitor/Track: Continuously consistent & cleansed data.
Define: Objectives.

IBM InfoSphere Information Server Data Quality
Intro: Introduction and Use Cases
Process: Process Overview and Demo
Best Practices: Best Practices and Discussion Topics

Data Quality Best Practices Mind Map
Define, Monitor, Master, Cleanse, Validate, Understand
Analyze: Information Analyzer
Monitor: Data Quality Console
Cleanse: QualityStage

Don’t try to do everything
Focus on what adds value
Consider correction cost before measuring
Zero defects?

It’s All About The Business
Typically difficult to get funding for IT-driven initiatives
Solve a business problem
Not all of them
Best Practices – Information Analyzer

Snapshot versus operational
Information Analyzer identifies issues
Control flow in the Data Rule stage in DataStage/QualityStage
Best Practices – Information Analyzer

Rule sets
Collection of rules for the same data
Reads the data once and applies each rule in the rule set
Works best if all rules are coded ‘positively’
Best Practices – Information Analyzer

Data Rules
Use predefined rules
Approximately 200 rules
In categories
Code ‘positively’
The rule should identify what is valid
Best Practices – Information Analyzer

Establish an initial understanding
‘Profiling’
16k-20k random sample is usually good enough
<20m just use full volume
Use Data Classification
Provides focus
Identifiers: uniqueness, duplicates, nulls
Codes: nulls
Indicators: valid
Extended data class
Spanish NIF
Canadian SIN
Master Card number
…and many more
Drive what and how for data validation rules
Best Practices – Information Analyzer

Frequencies on full volume
Match Designer Database
32k page size
UTF8
Use the Match Wizard
Match specification
Complete jobs
US only, but can be used to learn
Best Practices – QualityStage

Establish an initial understanding (continued)
Zero unhandled data?
Best Practices – QualityStage

Establish an initial understanding (continued)
Use the Standardization Rules Designer to repair
Best Practices – QualityStage

Deploying QualityStage processes as Web Services
Use single-node configuration file
Best Practices – QualityStage

Use the Match Wizard (continued)
Match specification
Complete jobs
US only, but can be used to learn
Best Practices – QualityStage

Establish an initial understanding (continued)
Identify ‘unhandled data’
Best Practices – QualityStage

Establish an initial understanding
Focus on free-form text
Let Information Analyzer handle single-domain fields
‘Domain Validation’
Best Practices – QualityStage

Deploying QualityStage processes as Web Services (continued)
Keep applications simple to start
Best Practices – QualityStage

‘When’
Consolidate multiple projects
Best Practices – Data Quality Console

‘Who’
Production projects only
Best Practices – Data Quality Console

Performance considerations (continued)
Do not show ALL exception records
Use saved searches
Best Practices – Data Quality Console

Performance considerations
Only log when necessary
Best Practices – Data Quality Console

‘What’
The items that impact your business
Only record exceptions
Non-exceptions rarely help
Identifying information
Best Practices – Data Quality Console

Miguel A. Ortiz Jr.
maortiz@us.ibm.com

Data Quality Console
Consolidate, track & report on data quality exceptions & metrics across Information Server.

Business
Gain & manage business perspective about information and align with IT:
Leading technology for business-friendly access & pre-packaged terms.
Time-saving Industry Models for warehouses in key industries.

Process
Guide projects with best practices to achieve goals with reduced risk:
Unique capability to architect information projects with embedded methodology that can be tracked.

Analyze and Validate
Use source system analysis to understand your issues:
Automated discovery of critical data and hidden data relationships.
Govern your data with sophisticated data validation rules and metrics.

Assess and Discover
Discover data structures and understand your lineage to manage compliance:
Unique capability for discovering business objects.
Assess data in multiple dimensions such as consistency, completeness, redundancy and validity.

Cleanse & Enrich
Investigate, standardize:
Most comprehensive & customizable solution.
Enable business users to investigate & standardize any data.

Master
Match and survive data:
Most comprehensive & customizable solution.
Supports probabilistic (fuzzy) and deterministic matching.
Integration with Master Data Management (MDM) and Product Information Management (PIM) solutions.

Operations Console
A single view of all activity and status at a job, engine and OS resources level across Information Server.
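
The rule-set guidance earlier in this transcript (read the data once, apply each rule in the set, code rules ‘positively’) and the Data Quality Console advice to record only exceptions, with identifying information, can be sketched in a few lines of Python. This is a minimal illustration only, not Information Analyzer rule syntax or any IBM API: the Rule class, the apply_rule_set function, the field names and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping

Record = Mapping[str, object]


@dataclass(frozen=True)
class Rule:
    """A positively coded rule: is_valid returns True when the record passes."""
    name: str
    is_valid: Callable[[Record], bool]


def apply_rule_set(records: Iterable[Record], rules: list, id_field: str = "id") -> list:
    """Read each record once, apply every rule, and keep only the exceptions."""
    exceptions = []
    for record in records:  # single pass over the data
        for rule in rules:
            if not rule.is_valid(record):
                # Record identifying information only, not the whole record.
                exceptions.append({"rule": rule.name, id_field: record.get(id_field)})
    return exceptions


# Hypothetical rules and sample data, for illustration only.
rules = [
    Rule("customer_id_present", lambda r: r.get("id") is not None),
    Rule("status_in_domain", lambda r: r.get("status") in {"A", "I"}),
]
records = [
    {"id": 1, "status": "A"},
    {"id": 2, "status": "X"},
    {"id": None, "status": "I"},
]
exceptions = apply_rule_set(records, rules)
```

Because every rule states what is valid, a failed rule always means the same thing (an exception), which keeps the combined result of a rule set easy to interpret. Records that pass every rule leave no trace, in line with the advice that non-exceptions rarely help.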