Cloudera Impala

An introduction to Impala, Cloudera's open-source, real-time, interactive SQL query engine for Apache Hadoop.

James Kinley

on 5 November 2012


Transcript of Cloudera Impala

Outline: Beyond batch, What's wrong with batch?, impalad, The benefits of Impala, What is Impala, Impala's components, Demo

Beyond Batch

Open source, real-time

  • Use Hive for SQL
  • Load data every 90 minutes or less
  • Use HBase for real-time data access
  • Need faster queries
  • Move data into RDBMS for interactive SQL
  • Want single platform

Figures shown on the slide: 67%, 51%, 54%, 78%, 62%, 71%

What is Impala

  • Fast, interactive ad-hoc queries for Apache Hadoop
  • Use HiveQL to directly query data in HDFS & HBase
  • Hadoop, Hive, Dremel

Benefits of Impala

  • Rapid return of information
  • Query data as it's being ingested
  • No MapReduce = low latency
  • Uses its own daemons to query data directly

Impala, Hive, and MapReduce

  • HiveQL
  • Doesn't replace Hive or MR
  • Subset of SQL-92
  • 1 line of Impala query = 100s of lines of Java
  • Familiar and unified platform for batch and real-time queries
  • We still need batch

Impala features

  • Language similar to HiveQL
  • Supports HDFS & HBase
  • Compressed text, sequence, Avro
  • Uses the same metadata, ODBC, and Hue Beeswax as Hive
  • Kerberos authentication
  • No SPOF

Current limitations

  • No SerDes
  • No UDFs

Raises the performance bar whilst retaining the user experience

Impala state store

  • Coordinates information about all instances of impalad
  • Used to find data so the daemons can be used to respond to queries

impalad

  • Runs on all datanodes
  • Responds to queries from the Impala shell
  • Schedules tasks for optimal execution
  • Updates the Impala state store

Impala shell

  • Issues queries
  • Performs admin tasks
  • Queries passed via ODBC

Trevni

  • Columnar binary storage format

Impala vs Dremel

  • Distributed scalable aggregation algorithms
  • User decides on flexibility vs pure performance
  • Impala + Trevni = extra awesome!

Demo

Thanks! @jrkinley kinley@cloudera.com
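The transcript describes a state store that tracks every impalad instance and daemons that schedule work close to the data. As a rough illustration only, the toy Python model below mimics that division of labour. It is not Impala's actual scheduler or API: the class names, the `register`, `daemons_holding` and `plan` methods, the block ids, and the least-loaded tie-breaking rule are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """Toy stand-in for the state store: remembers which daemon holds which blocks."""
    members: dict = field(default_factory=dict)  # daemon name -> set of local block ids

    def register(self, name, local_blocks):
        self.members[name] = set(local_blocks)

    def daemons_holding(self, block):
        # Sorted so that the result is deterministic.
        return sorted(n for n, blocks in self.members.items() if block in blocks)


@dataclass
class Impalad:
    """Toy daemon: registers with the state store and plans work for a query."""
    name: str
    store: StateStore
    local_blocks: set

    def __post_init__(self):
        # Each daemon reports itself to the state store when it starts.
        self.store.register(self.name, self.local_blocks)

    def plan(self, blocks):
        """Assign each block to the least-loaded daemon that holds it locally."""
        load = {name: 0 for name in self.store.members}
        assignment = {}
        for block in blocks:
            candidates = self.store.daemons_holding(block)
            if not candidates:
                raise LookupError(f"no daemon holds block {block!r}")
            chosen = min(candidates, key=lambda n: (load[n], n))
            assignment[block] = chosen
            load[chosen] += 1
        return assignment


# Hypothetical two-node cluster; block "b2" is replicated on both nodes.
store = StateStore()
a = Impalad("node-a", store, {"b1", "b2"})
b = Impalad("node-b", store, {"b2", "b3"})
plan = a.plan(["b1", "b2", "b3"])
print(plan)
```

Any daemon can act as the planner here because all of them read the same membership view from the state store, which is the point the slides make about impalad responding to queries on every datanode.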