Apache Kafka + Apache Avro
Schema Evolution on Streaming Data:
Decoupling High Volume Data Production and Consumption
Scott Carey
scottcarey@apache.org

High Throughput BigData Pub/Sub
- Operate on message batches
- Compress at the batch level
- Broker does not track delivery
- Broker retains data on its own terms
  - Typically time or space bound
  - Removes oldest data first
- Consumers pull from the broker
  - Streaming or batch
  - Track their own progress (sketched below)
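A minimal sketch of that pull model with the Kafka Java consumer client: broker-side offset commits are disabled and the consumer checkpoints its own position. The broker address, the topic name "coffee", and the loadSavedOffset/saveOffset/process helpers are illustrative assumptions, not part of the talk:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class PullingConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "coffee-readers");
            // The broker does not track delivery; disable broker-side offset commits.
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                TopicPartition partition = new TopicPartition("coffee", 0);
                consumer.assign(List.of(partition));
                consumer.seek(partition, loadSavedOffset()); // resume from our own checkpoint

                while (true) {
                    // Pull: the consumer asks for data; nothing is pushed to it.
                    ConsumerRecords<byte[], byte[]> batch = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<byte[], byte[]> record : batch) {
                        process(record.value());
                        saveOffset(record.offset() + 1); // the consumer tracks its own progress
                    }
                }
            }
        }

        static long loadSavedOffset() { return 0L; } // hypothetical checkpoint store
        static void saveOffset(long offset) {}       // hypothetical checkpoint store
        static void process(byte[] payload) {}       // application logic
    }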
Kafka: Scalable Design
- Data organized by Topic
- Topics can have multiple partitions
- Reads are delivered in consistent order per partition (sketch below)
- Partitions let each tier (brokers, producers, consumers) scale horizontally
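Per-partition ordering is what makes keyed writes useful: records with the same key hash to the same partition, so a consumer reads each key's records in production order. A minimal sketch with the Kafka Java producer client; the topic "coffee" and the key "machine-42" are illustrative assumptions:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KeyedProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // All three records share the key "machine-42", so they land in the
                // same partition of "coffee" and are read back in this order.
                for (int i = 0; i < 3; i++) {
                    producer.send(new ProducerRecord<>("coffee", "machine-42", "brew #" + i));
                }
            }
        }
    }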
Producer

Produce data with a schema (Avro IDL):

    record Coffee {
        union { null, string } brand = null;
        float ounces;
        boolean caffeinated = true;
    }

or with a newer, compatible revision:

    record Coffee {
        string brand = "";
        float ounces;
        boolean caffeinated = true;
    }

Acquire a Schema ID from the Schema Repository:
- Request an ID for the schema
- Response: the Schema ID

Send all messages tagged with the ID:
- Each message carries the Schema ID followed by the message payload (sketched below)
- The Kafka Broker stores the tagged messages in their topic (Topic 1, Topic 2, Topic 3, Topic 4)
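Putting the flow together, a sketch of a tagging Producer using Avro's generic API and the Kafka producer client. The [4-byte ID][Avro binary] framing is one possible reading of "tagged with ID"; registerSchema stands in for the repository's REST call, and the broker address and topic name are illustrative:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.util.Properties;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class TaggedProducer {
        // The evolved Coffee schema from the slide, in Avro JSON schema syntax.
        static final Schema SCHEMA = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Coffee\",\"fields\":["
            + "{\"name\":\"brand\",\"type\":\"string\",\"default\":\"\"},"
            + "{\"name\":\"ounces\",\"type\":\"float\"},"
            + "{\"name\":\"caffeinated\",\"type\":\"boolean\",\"default\":true}]}");

        public static void main(String[] args) throws IOException {
            int schemaId = registerSchema(SCHEMA); // one repository round trip, then reuse

            GenericRecord coffee = new GenericData.Record(SCHEMA);
            coffee.put("brand", "Example Roasters");
            coffee.put("ounces", 12.0f);
            coffee.put("caffeinated", true);

            // Tag the Avro payload with the schema ID: [4-byte ID][Avro binary body].
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(ByteBuffer.allocate(4).putInt(schemaId).array());
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(SCHEMA).write(coffee, encoder);
            encoder.flush();

            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("coffee", out.toByteArray()));
            }
        }

        // Hypothetical stand-in for the repository's REST call: POST schema, get ID.
        static int registerSchema(Schema schema) { return 1; }
    }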
Avro Data Serialization
- Expressive schemas
- Efficient, compact binary serialization
  - Fields are not tagged: more compact, potentially faster (byte-count sketch below)
  - Schema as written must be known at read time
- Code generation is optional
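The untagged encoding is easy to see with the generic API (no code generation): serialize one Coffee record and count the bytes. A sketch reusing the producer's Coffee schema; the field values are arbitrary:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;

    public class UntaggedBytes {
        public static void main(String[] args) throws IOException {
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Coffee\",\"fields\":["
                + "{\"name\":\"brand\",\"type\":\"string\",\"default\":\"\"},"
                + "{\"name\":\"ounces\",\"type\":\"float\"},"
                + "{\"name\":\"caffeinated\",\"type\":\"boolean\",\"default\":true}]}");

            GenericRecord coffee = new GenericData.Record(schema);
            coffee.put("brand", "Mocha");
            coffee.put("ounces", 12.0f);
            coffee.put("caffeinated", true);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(schema).write(coffee, encoder);
            encoder.flush();

            // No field names or tags in the output: 1 length byte + 5 UTF-8 bytes
            // for the string, 4 bytes for the float, 1 byte for the boolean = 11.
            System.out.println(out.toByteArray().length + " bytes");
        }
    }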
Schema Resolution
Read data written with schema A using a compatible schema B (sketched below):
- Ignore data in A not specified in B
- Apply default values to fields in B not present in A
- Promote types (e.g. int to long, float to double)
- Reorder fields
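A sketch of resolution with the Avro Java API: GenericDatumReader takes both the writer's and the reader's schema and applies these rules during decoding. The two Coffee variants here are illustrative:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.Decoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.io.EncoderFactory;

    public class Resolution {
        static final Schema WRITER = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Coffee\",\"fields\":["
            + "{\"name\":\"brand\",\"type\":\"string\",\"default\":\"\"},"
            + "{\"name\":\"ounces\",\"type\":\"float\"}]}");

        // Reader drops `brand`, widens `ounces` to double, adds a defaulted field.
        static final Schema READER = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Coffee\",\"fields\":["
            + "{\"name\":\"ounces\",\"type\":\"double\"},"
            + "{\"name\":\"countryOfOrigin\",\"type\":\"string\",\"default\":\"\"}]}");

        public static void main(String[] args) throws IOException {
            GenericRecord written = new GenericData.Record(WRITER);
            written.put("brand", "Mocha");
            written.put("ounces", 12.0f);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(WRITER).write(written, enc);
            enc.flush();

            // The reader resolves between the two schemas: `brand` is skipped,
            // `ounces` is promoted float -> double, `countryOfOrigin` gets its default.
            Decoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
            GenericRecord read =
                new GenericDatumReader<GenericRecord>(WRITER, READER).read(null, dec);
            System.out.println(read); // {"ounces": 12.0, "countryOfOrigin": ""}
        }
    }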
Avro Schema Repository
- Maps Schemas to Schema IDs
- Validates Schema Compatibility (toy sketch below)
- REST interface
- See AVRO-1124
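The repository's two duties, mapping and validation, can be sketched without the REST layer. This toy in-memory version hands out sequential IDs and rejects any schema that cannot read data written with a previously registered one (one possible validation rule), using Avro's SchemaCompatibility helper; the real AVRO-1124 service is not modeled here:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.avro.Schema;
    import org.apache.avro.SchemaCompatibility;
    import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;

    public class InMemorySchemaRepo {
        private final List<Schema> byId = new ArrayList<>();

        public synchronized int register(Schema candidate) {
            // Reject the candidate unless it can read data written with every
            // schema registered so far (backward compatibility, one rule choice).
            for (int id = 0; id < byId.size(); id++) {
                SchemaCompatibilityType type = SchemaCompatibility
                    .checkReaderWriterCompatibility(candidate, byId.get(id))
                    .getType();
                if (type != SchemaCompatibilityType.COMPATIBLE) {
                    throw new IllegalArgumentException(
                        "candidate cannot read data written with schema id " + id);
                }
            }
            byId.add(candidate);
            return byId.size() - 1; // the new schema's ID
        }

        public synchronized Schema lookup(int id) {
            return byId.get(id);
        }
    }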
Consumer

Consume data with schema:

    record Coffee {
        float ounces;
        boolean caffeinated = true;
        string countryOfOrigin = "";
    }

For each record from Kafka:
- Read the Schema ID tag
- Look up the schema in the Repository (cacheable)
- Generate an Avro schema reader for each reader-writer schema pair, as needed
- Read the payload with the appropriate reader (sketched below)

Result: every record is resolved to the Consumer's schema, no matter which schema version the Producer wrote it with.
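Those four steps as a sketch. It assumes the [4-byte ID][Avro payload] framing from the producer sketch; schemaById is any repository lookup (for instance the toy repository's lookup method above), and the cache holds one resolving reader per writer schema:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.IntFunction;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.Decoder;
    import org.apache.avro.io.DecoderFactory;

    public class TaggedDecoder {
        private final IntFunction<Schema> schemaById; // repository lookup, e.g. repo::lookup
        private final Schema readerSchema;            // the Consumer's own schema
        // One resolving reader per writer schema ID, built lazily and reused.
        private final Map<Integer, GenericDatumReader<GenericRecord>> readers = new HashMap<>();

        public TaggedDecoder(IntFunction<Schema> schemaById, Schema readerSchema) {
            this.schemaById = schemaById;
            this.readerSchema = readerSchema;
        }

        public GenericRecord decode(byte[] message) throws IOException {
            int schemaId = ByteBuffer.wrap(message).getInt(); // 1. read the Schema ID tag
            GenericDatumReader<GenericRecord> reader = readers.computeIfAbsent(
                schemaId, // 2. look up the writer schema (cached); 3. build the pair once
                id -> new GenericDatumReader<>(schemaById.apply(id), readerSchema));
            // 4. read the payload (everything after the 4-byte tag) with that reader.
            Decoder dec = DecoderFactory.get().binaryDecoder(message, 4, message.length - 4, null);
            return reader.read(null, dec);
        }
    }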
Consumers Decoupled from Producers
Compatible schema changes decouple Producers from Consumers:
- As new Producers come online or are upgraded, they can evolve the schema.
- Consumers do not have to change at the same time.
- Likewise, Consumers can change the schema they use to interpret the data with no Producer changes.
- Invalid schema evolution is caught by the Repository Server's validation.
- Choose your own schema and validation rules.