Introducing 

Prezi AI.

Your new presentation assistant.

Refine, enhance, and tailor your content, source relevant images, and edit visuals quicker than ever before.

Loading…
Transcript

OLTP

  • Access a few records at a time
  • Read and write activity
  • Fast, small inserts and updates
  • Thousands of concurrent users
  • Application managed logic
  • Database is just a database

Data Warehouse

  • Queries scan lots of data
  • Results are typically summary
  • Business questions as queries
  • Lots of rank, sort, filter, join
  • Find and use new relationships
  • Database is the application

Why is a data warehouse different?

Subject oriented

Non-volatile

Integrated access to data

Time variant tracking of historical activity

Complex queries

Unpredictable workload

What is a data warehouse?

UMSL Programming Club

Data Warehousing 101

by Paul Boal

Sept. 28, 2010

What's special about data

warehouse technology?

OLAP

What's challenging about data

warehouse programming?

Complex query criteria:

Who are the patients that are visiting our new Walmart clinics? Have we seen them before? Are they following up with a referal? Are they showing up for their referal appointment? How does our revenue and outcomes from them compare with revenue from patients who come in through the in-hospital clinics versus efree standing clinics?

Data Warehouse Appliances

Hash Partitioning

Column-Store

Shared-Nothing Parallelism

Range Partitioning

Workload Management

Classic algorithm:

1. Find all patients who went to Walmart

2. For each patient:

--a. Lookup their activity for prior visits

--b. Lookup their follow-up referals

--c. Lookup their follow-up activity

Value-based Compression

Non-Relational

Sometimes better algorithm:

1. Scan through all activity

--a. Store "first activity" in memory

--b. Store "first Walmart visit" in memory

--c. Store "follow-ups" in memory

2. Summarize

Other Important Things to Know

Principles:

  • Minimize the amount of data being retreived.
  • Distribute the work as widely as possible.
  • Access each row only once.

Techniques:

  • Alternate "window" aggregation functions.
  • Case/When statements in place of aggregation.
  • Techniques to avoid self-joins on large tables.
  • Techniques to avoid scanning larget tables more than once.
  • Recursive SQL.

How does parallelism and workload mangement help?

Where Do You Learn This?

  • Internships
  • Database theory classes
  • Experience
  • Creativity, trial and error
Learn more about creating dynamic, engaging presentations with Prezi