
High-Throughput Big Data Pub/Sub

  • Operate on message batches
  • Compress at the batch level
  • Broker does not track delivery
  • Broker retains data on its own terms
      • Typically time or space bound
      • Removes oldest data first
  • Consumers pull from the broker
      • Streaming or batch
      • Track their own progress (see the consumer sketch below)
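A minimal sketch of this pull model with the Kafka Java client; the topic and group names are illustrative, not from the talk. The consumer polls the broker for batches and commits its own offsets, since the broker does not track delivery.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PullingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "coffee-readers");   // each group tracks its own offsets
        props.put("enable.auto.commit", "false");  // we commit progress ourselves
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("coffee-events"));
            while (true) {
                // Pull a batch from the broker; the broker does not push,
                // and it never records what this consumer has seen.
                ConsumerRecords<String, byte[]> batch = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, byte[]> record : batch) {
                    process(record.value());
                }
                consumer.commitSync(); // the consumer records its own progress
            }
        }
    }

    static void process(byte[] payload) { /* application logic */ }
}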

Schema Evolution on Streaming Data

Decoupling High Volume Data Production and Consumption

Apache Kafka + Apache Avro

Scott Carey
scottcarey@apache.org

Kafka: Scalable Design

  • Data is organized by topic
  • Topics can have multiple partitions
  • Reads are delivered in a consistent order per partition
  • Partitions scale horizontally at each level: producers, brokers, and consumers (see the keyed-producer sketch below)
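A sketch of how production maps onto partitions, using the Kafka Java client; the topic name and keys are made up for illustration. Records that share a key hash to the same partition, which is what makes the per-partition ordering guarantee useful.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PartitionedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                // Same key -> same partition -> consumers see these in produce order.
                String key = "machine-" + (i % 3);
                producer.send(new ProducerRecord<>("coffee-events", key, payload(i)));
            }
        }
    }

    static byte[] payload(int i) { return ("brew-" + i).getBytes(); }
}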

Producer

Produce Data with Schema:

record Coffee {
    union { null, string } brand = null;
    float ounces;
    boolean caffeinated = true;
}

Acquire a Schema ID from the Schema Repo. Request an ID for:

record Coffee {
    string brand = "";
    float ounces;
    boolean caffeinated = true;
}

Response: "1"

With id = "1", send all messages tagged with the ID (sketched below):

[ Schema ID | Message Payload ]

Kafka Broker (Topic 1, Topic 2, Topic 3, Topic 4)
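The tag-and-send step might look like the following sketch. The 4-byte big-endian ID prefix is an assumed framing (the talk does not pin one down), and the schema ID is hard-coded here where a real producer would obtain it from the repository.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class TaggedProducer {
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Coffee\",\"fields\":["
      + "{\"name\":\"brand\",\"type\":\"string\",\"default\":\"\"},"
      + "{\"name\":\"ounces\",\"type\":\"float\"},"
      + "{\"name\":\"caffeinated\",\"type\":\"boolean\",\"default\":true}]}");

    static byte[] encodeTagged(int schemaId, GenericRecord record) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(ByteBuffer.allocate(4).putInt(schemaId).array()); // schema ID tag
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(record, enc);
        enc.flush();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        int schemaId = 1; // in practice, returned by the Schema Repository
        GenericRecord cup = new GenericData.Record(SCHEMA);
        cup.put("brand", "Bluebird");
        cup.put("ounces", 12.0f);
        cup.put("caffeinated", true);
        byte[] message = encodeTagged(schemaId, cup);
        // message is now [ Schema ID | Avro binary payload ], ready for Kafka
    }
}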

Avro Data Serialization

  • Expressive schemas
  • Efficient, compact binary serialization (see the sketch below)
  • Fields are not tagged
      • More compact
      • Potentially faster
  • Schema as written must be known at read time
  • Code generation is optional
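A small sketch of how compact the untagged encoding is, using Avro's Java API: a two-field record serializes to five bytes, with no field names or tags anywhere in the output.

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class CompactEncoding {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Coffee\",\"fields\":["
          + "{\"name\":\"ounces\",\"type\":\"float\"},"
          + "{\"name\":\"caffeinated\",\"type\":\"boolean\"}]}");

        GenericRecord cup = new GenericData.Record(schema);
        cup.put("ounces", 12.0f);
        cup.put("caffeinated", true);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(cup, enc);
        enc.flush();

        // Prints "5 bytes": 4 for the float, 1 for the boolean.
        // Because nothing in the output identifies the fields, the writer's
        // schema must be available again at read time.
        System.out.println(out.size() + " bytes");
    }
}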

Schema Resolution

Read data written with schema A using compatible schema B.

  • Ignore data in A not specified in B
  • Apply default values to fields in B not present in A
  • Promote types (e.g., int to long, float to double)
  • Reorder fields (matched by name, not position; see the sketch below)
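A sketch of these rules with Avro's Java API: data written with schema A (a string brand plus a float ounces) is read with schema B, which drops brand, promotes ounces to double, and adds countryOfOrigin with a default. The field values are invented for illustration.

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class ResolutionDemo {
    static Schema parse(String json) { return new Schema.Parser().parse(json); }

    public static void main(String[] args) throws Exception {
        Schema writer = parse("{\"type\":\"record\",\"name\":\"Coffee\",\"fields\":["
            + "{\"name\":\"brand\",\"type\":\"string\"},"
            + "{\"name\":\"ounces\",\"type\":\"float\"}]}");
        Schema reader = parse("{\"type\":\"record\",\"name\":\"Coffee\",\"fields\":["
            + "{\"name\":\"ounces\",\"type\":\"double\"},"
            + "{\"name\":\"countryOfOrigin\",\"type\":\"string\",\"default\":\"\"}]}");

        GenericRecord original = new GenericData.Record(writer);
        original.put("brand", "Bluebird"); // present in A, ignored by B
        original.put("ounces", 12.0f);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writer).write(original, enc);
        enc.flush();

        // Resolve: skip "brand", promote float -> double, fill in the default.
        GenericDatumReader<GenericRecord> resolver =
            new GenericDatumReader<>(writer, reader);
        GenericRecord result = resolver.read(null,
            DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
        System.out.println(result); // {"ounces": 12.0, "countryOfOrigin": ""}
    }
}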

Avro Schema Repository

  • Maps Schemas to Schema IDs
  • Validates Schema Compatibility
  • REST interface
  • See AVRO-1124 (a minimal sketch of the core mapping follows)
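The repository's core mapping can be sketched in a few lines. This is not the AVRO-1124 API; the class and method names here (register, lookup) are assumptions, and a real service would sit behind the REST interface and persist its state.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.avro.Schema;

// In-memory stand-in for the Schema Repository's core behavior.
public class InMemorySchemaRepo {
    private final Map<String, Integer> idsBySchema = new ConcurrentHashMap<>();
    private final Map<Integer, Schema> schemasById = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger(1);

    // Returns the existing ID for this schema, or assigns a new one.
    public synchronized int register(Schema schema) {
        return idsBySchema.computeIfAbsent(schema.toString(), canonical -> {
            // A real repository would validate compatibility against the
            // topic's existing schemas before accepting a new one.
            int id = nextId.getAndIncrement();
            schemasById.put(id, schema);
            return id;
        });
    }

    // Looks up a schema by ID; consumers can cache this result.
    public Schema lookup(int id) {
        return schemasById.get(id);
    }
}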

Consumer

Consume Data with Schema:

record Coffee {
    float ounces;
    boolean caffeinated = true;
    string countryOfOrigin = "";
}

For each record from Kafka:

  • Read the schema ID tag
  • Look up the schema in the Repository (cacheable)
  • Generate an Avro schema reader for each reader-writer schema pair, as needed
  • Read the payload with the appropriate reader (see the decode sketch below)
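Putting those four steps together, a sketch of the consumer-side decode path. The 4-byte ID framing and the InMemorySchemaRepo stand-in match the earlier sketches and are assumptions, not the talk's code; the reader cache gives one resolver per writer schema ever seen.

import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;

public class TaggedConsumer {
    private final Schema readerSchema;     // the consumer's own Coffee schema
    private final InMemorySchemaRepo repo; // stand-in for the repository client
    private final Map<Integer, GenericDatumReader<GenericRecord>> readers =
        new ConcurrentHashMap<>();         // one resolver per writer schema seen

    TaggedConsumer(Schema readerSchema, InMemorySchemaRepo repo) {
        this.readerSchema = readerSchema;
        this.repo = repo;
    }

    GenericRecord decode(byte[] message) throws Exception {
        int schemaId = ByteBuffer.wrap(message).getInt(); // 1. read the schema ID tag
        GenericDatumReader<GenericRecord> reader = readers.computeIfAbsent(
            schemaId,                                     // 2-3. look up + build, once per ID
            id -> new GenericDatumReader<>(repo.lookup(id), readerSchema));
        return reader.read(null, DecoderFactory.get()     // 4. read the payload
            .binaryDecoder(message, 4, message.length - 4, null));
    }
}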

Result: Consumers Decoupled from Producers

Compatible schema changes decouple Producers from Consumers.

  • As new Producers come online or are upgraded, they can evolve the schema.
  • Consumers do not have to change at the same time.
  • Likewise, Consumers can change the schema they use to interpret the data with no producer changes.
  • Invalid schema evolution is caught by Repository Server validation (see the compatibility check below).
  • Choose your own schema compatibility and validation rules.
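Avro ships a built-in compatibility checker that a repository server could apply (or extend with its own rules) at registration time. A quick sketch of catching an invalid evolution, adding a required field without a default; the schemas are invented for illustration:

import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;

public class CompatibilityCheck {
    public static void main(String[] args) {
        Schema v1 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Coffee\",\"fields\":["
          + "{\"name\":\"ounces\",\"type\":\"float\"}]}");
        // v2 adds a field WITHOUT a default, so old data cannot be read with it.
        Schema v2 = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Coffee\",\"fields\":["
          + "{\"name\":\"ounces\",\"type\":\"float\"},"
          + "{\"name\":\"brand\",\"type\":\"string\"}]}");

        SchemaCompatibilityType result = SchemaCompatibility
            .checkReaderWriterCompatibility(v2, v1) // can a v2 reader read v1 data?
            .getType();
        System.out.println(result); // INCOMPATIBLE: "brand" has no default value
    }
}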

