Data Bootcamp - Team Intro, working with data

on 24 February 2015

Transcript of Data Bootcamp - Team Intro, working with data

stable data infrastructure
accurate business metrics
data warehouse
self-service tools
to transform data
from logs to charts
charts
insights
Data Team
Goals
Our responsibilities
Hadoop
tools for analyzing data on s3
store logs on s3
#Infrastructure
dedicated data-related machines
Redshift
#Data
framework for automated data transformation
reporting tools
(Chartio)
Flowkeeper
etl.prezi.com
hadoopclient.prezi.com
Your responsibilities
Write your own job, test it, deploy it
Keep it healthy
Create charts, reports, etc.
Help your team understand them!
Produce nice and tidy (structured) logs
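A minimal sketch of what "nice and tidy (structured) logs" could look like in practice, assuming one JSON object per line. The JSON format, the field names, and the event name are illustrative assumptions; the slides do not specify the format the pipeline expects.

```python
import json
from datetime import datetime, timezone


def structured_log_line(event, **fields):
    """Return one log line as a JSON object (assumed format, not a Prezi spec)."""
    record = {
        # UTC timestamp in ISO 8601 form so lines can be sorted and parsed later
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event,
    }
    record.update(fields)
    # sort_keys keeps the field order stable from line to line
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    # "prezi_opened", user_id and platform are hypothetical example values
    print(structured_log_line("prezi_opened", user_id=42, platform="web"))
```

Keeping every line machine-parseable is what makes later sorting, cleaning and loading steps straightforward.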
Data @ Prezi
Working with data
s3://dataservice-logs/category/category-{$date}_000{$hour}
Logs
unstructured*, textual data
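The raw log location above can be built programmatically. This is a sketch only: it assumes that "category" in the path is a placeholder for the log category name, and it passes the date and hour through as strings because the slides do not state their exact format.

```python
# Template copied from the slide; {category} being a placeholder is an assumption.
RAW_LOG_TEMPLATE = "s3://dataservice-logs/{category}/{category}-{date}_000{hour}"


def raw_log_path(category, date, hour):
    """Fill the raw log path template with caller-supplied strings.

    The date and hour formats are not given in the source, so they are
    substituted verbatim rather than formatted here.
    """
    return RAW_LOG_TEMPLATE.format(category=category, date=date, hour=hour)
```

The resulting string could then be handed to one of the S3 tools the data team provides.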
sorted, cleaned, 'structured'
Logbox
https://github.com/prezi/logbox
s3://dataservice-logs/sorted_log/
Redshift
load to
data warehouse
Amazon's data warehouse solution
based on PostgreSQL
Quite fast, but expensive
Pig
process further
create charts, dashboards*
*Tamas Imre (MX) can help
SQL-like language for creating MapReduce programs for Hadoop
http://pig.apache.org/
slow, but deals with any data
Automation
Flowkeeper
hourly or daily jobs
ETL
old data pipeline framework
Don't put any new jobs here!
proper dependency handling
https://github.com/prezi/flowkeeper-user-jobs
https://github.com/prezi/etl
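To illustrate what "proper dependency handling" means for scheduled jobs, here is a generic sketch that orders jobs so each one runs after the jobs it depends on. It uses Python's standard `graphlib` module (Python 3.9+) and is not Flowkeeper's API; the job names are hypothetical.

```python
from graphlib import TopologicalSorter


def run_order(dependencies):
    """Return job names ordered so that every job follows its dependencies.

    `dependencies` maps a job name to the set of jobs it depends on.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical jobs: sort the logs, load them, then build a report.
jobs = {
    "sorted_log": set(),
    "load_to_redshift": {"sorted_log"},
    "daily_report": {"load_to_redshift"},
}

if __name__ == "__main__":
    print(run_order(jobs))
```

A scheduler with this kind of ordering never starts a report before the data it reads has been loaded.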
Q&A
You can reach us:
datasupport@prezi.com

core-data @ HipChat
s3cat, catlog, s3cmd, s3tac, piggrep
http://wiki.prezi.com/index.php?title=Main_Page#Data
https://github.com/prezi/flowkeeper/tree/master/docs
Read our projects' tutorial
(ask if something is unclear)
#DataInfrastructure
#MetricsTeam
https://github.com/prezi/flowkeeper-core-jobs/blob/master/jobs/logbox/user-rules.template.json
Sorted logs
e.g. https://chartio.com/prezi/online-marketing-chargebacks/
https://github.com/prezi/flowkeeper/tree/master/docs
How to start it?
Workflow:

write your job,
test it on etl.prezi.com
push to
if Jenkins is green, deploy with
https://missioncontrol.prezi.com/
#Tooling
https://analytics.prezi.com/
but you can use it as documentation
use
https://flowtracker.prezi.com
Data Infrastructure
Metrics (MX)
Business model
"Prezilians can get answers to quantifiable questions within an hour."
#BusinessModel
(mainly Data Infrastructure)
Director of Data
Importance of being data-driven:
https://honey.is/home/#group/45567/posts