Transcript of Cloudera Impala
What is Impala
- Data in: HDFS & HBase
- Hadoop, Hive, Dremel

Benefits of Impala
- Rapid return of information
- Query data as it's being ingested
- No MapReduce = low latency
- Uses its own daemons to query data directly

Impala, Hive, and MapReduce
- HiveQL
- Doesn't replace Hive or MR
- Subset of SQL92
- 1 line Impala query = 100s of lines of Java
- Familiar and unified platform for batch and real-time queries
- We still need batch

Impala features
- Language similar to HiveQL
- Supports HDFS & HBase
- Compressed text, sequence, Avro
- Uses same metadata, ODBC, Hue Beeswax as Hive
- Kerberos authentication
- No SPOF

Current limitations
- No SerDes
- No UDFs

Raises the performance bar whilst retaining the user experience

Impala state store
- Coordinates information about all instances of impalad
- Used to find data so the daemons can be used to respond to queries

impalad
- Runs on all datanodes
- Responds to queries from Impala shell
- Schedules tasks for optimal execution
- Updates Impala state store

Impala shell
- Issues queries
- Performs admin tasks
- Queries passed via ODBC

Trevni
- Columnar binary storage format

Impala vs Dremel
- Distributed scalable aggregation algorithms
- User decides on flexibility vs pure performance
- Impala + Trevni = extra awesome!

Demo

Thanks!
@jrkinley email@example.com
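
The state store and daemon roles described in the slides can be pictured with a toy model. The sketch below is an illustration only, not Impala's actual scheduler: the `StateStore` class, the `plan_query` function, the host names, and the block ids are all hypothetical, and the least-loaded placement rule is an assumption standing in for "schedules tasks for optimal execution".

```python
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """Toy registry of live daemons (hypothetical, for illustration only)."""
    daemons: dict = field(default_factory=dict)  # host -> set of local block ids

    def register(self, host, blocks):
        # A daemon reports which data blocks are local to its datanode.
        self.daemons[host] = set(blocks)

    def unregister(self, host):
        # Drop a daemon that is no longer available.
        self.daemons.pop(host, None)

    def hosts_for(self, block):
        # Find the daemons that hold a given block locally.
        return sorted(h for h, blocks in self.daemons.items() if block in blocks)


def plan_query(store, blocks):
    """Assign each block to a daemon that holds it, preferring the least loaded."""
    load = {host: 0 for host in store.daemons}
    plan = {}
    for block in blocks:
        candidates = store.hosts_for(block)
        if not candidates:
            raise LookupError(f"no live daemon holds block {block!r}")
        # Assumed placement rule: least-loaded candidate, ties broken by host name.
        chosen = min(candidates, key=lambda h: (load[h], h))
        plan[block] = chosen
        load[chosen] += 1
    return plan


store = StateStore()
store.register("node1", ["b1", "b2"])
store.register("node2", ["b2", "b3"])
plan = plan_query(store, ["b1", "b2", "b3"])
print(plan)
```

Because every daemon registers with the store, any of them can look up where data lives and hand work to a daemon that holds it locally, which is the sense in which the state store is "used to find data".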