Send the link below via email or IMCopy
Present to your audienceStart remote presentation
- Invited audience members will follow you as you navigate and present
- People invited to a presentation do not need a Prezi account
- This link expires 10 minutes after you close the presentation
- A maximum of 30 users can follow your presentation
- Learn more about this feature in our knowledge base article
UMSL DW 101
Transcript of UMSL DW 101
warehouse programming? What's special about data
warehouse technology? Complex queries
Unpredictable workload Integrated access to data
Time variant tracking of historical activity Subject oriented
Non-volatile Why is a data warehouse different? OLTP
Access a few records at a time
Read and write activity
Fast, small inserts and updates
Thousands of concurrent users
Application managed logic
Database is just a database Data Warehouse
Queries scan lots of data
Results are typically summary
Business questions as queries
Lots of rank, sort, filter, join
Find and use new relationships
Database is the application Complex query criteria:
Who are the patients that are visiting our new Walmart clinics? Have we seen them before? Are they following up with a referal? Are they showing up for their referal appointment? How does our revenue and outcomes from them compare with revenue from patients who come in through the in-hospital clinics versus efree standing clinics? Classic algorithm:
1. Find all patients who went to Walmart
2. For each patient:
--a. Lookup their activity for prior visits
--b. Lookup their follow-up referals
--c. Lookup their follow-up activity Sometimes better algorithm:
1. Scan through all activity
--a. Store "first activity" in memory
--b. Store "first Walmart visit" in memory
--c. Store "follow-ups" in memory
2. Summarize Value-based Compression Shared-Nothing Parallelism Column-Store Workload Management Range Partitioning Hash Partitioning OLAP Data Warehouse Appliances Non-Relational Principles:
Minimize the amount of data being retreived.
Distribute the work as widely as possible.
Access each row only once.
Alternate "window" aggregation functions.
Case/When statements in place of aggregation.
Techniques to avoid self-joins on large tables.
Techniques to avoid scanning larget tables more than once.
Recursive SQL. Other Important Things to Know Where Do You Learn This? Internships
Database theory classes
Creativity, trial and error How does parallelism and workload mangement help? by Paul Boal
Sept. 28, 2010 UMSL Programming Club