Pig, the Good Parts

A short introduction to Hadoop Pig
by Zoltan C. Toth on 23 May 2016

Transcript of Pig, the Good Parts

Pig: the Good Parts
Zoltán Tóth
Alternatives
Pig
Fine tuning

Basic commands
more Pig
A simple Pig script
Hive
Pig
RHadoop
Hadoop
SQL-like interface
Structured logging?
"Everybody knows SQL"
Looks nice

but you need to know R
native
Works on unstructured data
very simple logic
Extend it if you want to go deep
a pipeline-like system for data
input -> a -> b -> c -> output
FROM -> WHERE -> SELECT -> DISTINCT -> INTO
example script
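The FROM, WHERE, SELECT, DISTINCT, INTO stages map roughly onto one Pig statement each. The following is only a minimal sketch, not the script from the slides: the input path, the output path, and the field names (user, url, bytes) are all hypothetical.

```pig
-- FROM: load a tab-separated file (hypothetical path and schema)
logs = LOAD 'input/access_log' USING PigStorage('\t')
       AS (user:chararray, url:chararray, bytes:long);

-- WHERE: keep only the rows of interest
big = FILTER logs BY bytes > 1024;

-- SELECT: project the columns we need
pairs = FOREACH big GENERATE user, url;

-- DISTINCT: drop duplicate rows
uniq = DISTINCT pairs;

-- INTO: write the result (hypothetical output directory)
STORE uniq INTO 'output/uniq_pairs';
```

Each statement names an intermediate relation, which is what gives a Pig script its pipeline shape.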
in practice
LOAD
STORE
DUMP
FOREACH
DISTINCT
LIMIT
ORDER BY
UNION
JOIN
left, right
inner, outer
cross join
GROUP - FOREACH
Nested Foreach
UDFs
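As an illustration of GROUP followed by FOREACH, and of a nested FOREACH, here is a small sketch. The relation name, path, and fields are assumptions made for the example, not taken from the talk.

```pig
-- Hypothetical input: one row per page visit
visits = LOAD 'input/visits' USING PigStorage('\t')
         AS (user:chararray, url:chararray);

-- GROUP collects all rows with the same key into a bag
by_user = GROUP visits BY user;

-- GROUP - FOREACH: one output row per group
counts = FOREACH by_user GENERATE group AS user, COUNT(visits) AS n_visits;

-- Nested FOREACH: run operators on each group's bag before generating
uniq_urls = FOREACH by_user {
    urls = visits.url;
    distinct_urls = DISTINCT urls;
    GENERATE group AS user, COUNT(distinct_urls) AS n_urls;
};

DUMP counts;
DUMP uniq_urls;
```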
It just works *
*not
Thanks!
Reduce side
Replicated
Skewed
Merge
Select your Joins wisely
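The four join strategies listed here are selected with the USING clause of JOIN. A sketch under assumed inputs (the paths, relation names, and fields are hypothetical):

```pig
-- Hypothetical inputs: a large event log and a small lookup table
events = LOAD 'input/events' AS (user_id:long, action:chararray);
users  = LOAD 'input/users'  AS (user_id:long, country:chararray);

-- Default: reduce-side join, both inputs are shuffled by key
j_default = JOIN events BY user_id, users BY user_id;

-- Replicated: every relation after the first must fit in memory
j_repl = JOIN events BY user_id, users BY user_id USING 'replicated';

-- Skewed: for keys with very uneven numbers of rows
j_skew = JOIN events BY user_id, users BY user_id USING 'skewed';

-- Merge: assumes both inputs are already sorted on the join key
j_merge = JOIN events BY user_id, users BY user_id USING 'merge';
```

Which one is appropriate depends on the size, skew, and sort order of the data, so the choice is left to the script author.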
Manually set the number of reducers
PARALLEL
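A short sketch of both ways to set the reducer count. The value 20 and the relation names are arbitrary assumptions chosen for illustration.

```pig
-- Script-wide default for operators that start a reduce phase
SET default_parallel 20;

visits = LOAD 'input/visits' AS (user:chararray, url:chararray);

-- Per-statement override: PARALLEL applies to reduce-side operators
-- such as GROUP, JOIN, DISTINCT and ORDER BY
by_user = GROUP visits BY user PARALLEL 20;
```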
DEBUG
describe
illustrate
explain
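The three debugging commands can be tried on any alias. A minimal sketch, assuming a hypothetical visits relation:

```pig
visits  = LOAD 'input/visits' AS (user:chararray, url:chararray);
by_user = GROUP visits BY user;

DESCRIBE by_user;    -- prints the schema of the alias
ILLUSTRATE by_user;  -- runs the pipeline on a small sample, showing each step
EXPLAIN by_user;     -- prints the logical, physical and MapReduce plans
```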
Best practices
Project early and often
Filter early and often
Select JOIN method manually
Use the PARALLEL feature
SET default_parallel;
Don't make any errors.
Runtime error reporting is very poor
https://www.linkedin.com/in/zoltanctoth