Manta Platform Overview

by Brad Warnick on 7 May 2014


Transcript of Manta Platform Overview

client-side MVC
AngularJS
Web Framework
presentation
(Twitter) Bootstrap 3.0
server-side MVC
Node.js / Express / HoganJS
Gateway API
async data collection / workflow management
single point authorization/authentication
standardize API format (w/o being forced to change internal services yet)
NO direct database or internal API access from the web application (server-side or client-side)
will be able to support mobile apps / 3rd parties
...
Members
Products
Search
Company
Internal Web Services
Data Stores
MongoDB
PostgreSQL
why?
responsive (mobile first)
good cross-browser compatibility
great developer adoption
much smaller than jQuery/jQueryUI
not a proprietary component library
easy to digest component library (developers and designers)
less/recess with CSS linter/tests!
what are we replacing?
proprietary component library
AND jQueryUI
AND lots and lots of custom CSS
+ other new
Base AMI
Ubuntu 12.04
popular development platform
Vagrant and AWS images available from Canonical
two year refresh cycle
five years' worth of updates
free
Framework Selection Process
Identify Choices and Criteria
Evaluate Based on Criteria
Implement Simple App
Node.js / Express
Ruby on Rails
Scala Play
Eliminate 1 Candidate and Build Again
Node.js / Express
Ruby on Rails
why?
client-side MVC for interactive components / apps
super fast development
more flexible than alternatives (e.g. EmberJS)
backed by Google
best practices
lineman for developer productivity
TDD using jasmine and protractor
isolated testing (mock all dependencies)
what are we replacing?
simple, logicless templates
addresses the SEO and API data scraping concerns that come with using only AngularJS
created by Twitter for speed
forces developers to take complexities out of the template
very easy to learn
why?
proprietary MVC framework/router written in Perl
big learning curve for new devs
we have to fix, maintain, and enhance it
why?
developer flexibility (can work throughout the web stack)
developer productivity
world class pedigree (V8)
adoption trajectory
winner of the selection process mentioned earlier
best practices
TDD using jasmine
Project/Team Goals
Simple
Elegant
Fast Delivery
Stability
Speed
What are we solving for?
Web Framework (Responsive design)
Complete the Services Layer
Enable best practices - TDD and CI
Infrastructure in the Cloud
Consistent API standards
Performance
Incremental migration
Node.js
Java
MongoDB
User Story
Acceptance Criteria
FAILING
E2E Test
Base
AMI Family Tree
vagrant up nodejs --provider=aws
FAILING
UNIT Test
Create EC2 Instance
vagrant provision
Provision Instance
rake spec:nodejs
Run Tests
Refactor
vagrant create-ami
Bake AMI
Baking AMIs
Node.js
Node.js
Node.js
Autoscaling
PASSING
UNIT Test
PASSING
E2E Test
CODE
Node.js
API/Services
Java
MongoDB
Standards
JSON objects
Security
API/Service Gateway
Authentication
Authorization
Rate Limiting
Data Limiting
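A minimal sketch of how the gateway's rate-limiting concern might work. The slides do not name an algorithm, so the token bucket below is an assumption, and `TokenBucket`, `capacity`, and `refill_per_sec` are hypothetical names used only for illustration.

```python
class TokenBucket:
    """Hypothetical per-client limiter (token bucket is an assumed technique)."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = float(capacity)              # maximum burst, in requests
        self.refill_per_sec = float(refill_per_sec)  # sustained requests per second
        self.tokens = float(capacity)                # start with a full bucket
        self.last_seen = 0.0                         # time of the previous request

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) may pass."""
        elapsed = max(0.0, now - self.last_seen)
        # Refill for the time that has passed, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last_seen = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # spend one token on this request
            return True
        return False
```

In a gateway, one bucket would typically be kept per authenticated client, so the limit is enforced at the same single point as authentication and authorization.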
Node.js
Java
MongoDB
Best Practices
CloudFormation
AWS resource orchestration service
declarative
updateable
composable
duplicatable
API Versioning
RESTful
Standard Return codes
JSON Objects
Documentation
Languages
Java
Node.js
Autoscaling and Self-healing
Autoscaling groups solve both
self-healing means replacing failed instances
autoscaling means adding instances based on demand
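The two behaviors above can be seen as one reconciliation step, sketched below. This is not the AWS API. `plan_group` and its parameters are hypothetical, and the "requests per second per instance" sizing rule is an assumption made only to keep the example concrete.

```python
import math


def plan_group(instances, load_rps, rps_per_instance, min_size, max_size):
    """Decide what a group should do next.

    instances: dict of instance id -> healthy flag (bool).
    Returns (ids_to_replace, number_to_launch).
    """
    # Self-healing: every failed instance is scheduled for replacement.
    failed = sorted(i for i, healthy in instances.items() if not healthy)
    healthy_count = len(instances) - len(failed)

    # Autoscaling: size the group from demand, clamped to the group limits.
    desired = math.ceil(load_rps / rps_per_instance)
    desired = max(min_size, min(max_size, desired))

    # Launch enough instances to reach the desired healthy count.
    # Scale-in (removing surplus healthy instances) is left out of this sketch.
    to_launch = max(0, desired - healthy_count)
    return failed, to_launch
```

For example, with one healthy and one failed instance and a load that needs three instances, the plan is to replace the failed instance and launch two.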
Datadog
metrics collection SaaS
also handles alerting
simple api for collecting app metrics
dashboard builder
programmable api
Log Centralization
Beaver/Elasticsearch/Logstash/Kibana
most popular open-source stack for logging
excellent text search features
alerting
Refactor
Steps
User story is created by Product Owner, Dev, & QA
Write a failing E2E test
Write a failing Unit test
Code and refactor components until Unit test passes
Code and refactor until E2E test passes
Code & tests can be checked into CI
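A small sketch of the unit-test steps above, written with Python's `unittest` rather than the Jasmine tooling named in the slides. The test is written first and fails until `company_title` exists. `company_title`, `get_company`, and the sample record are hypothetical, and the gateway dependency is mocked so the test stays isolated.

```python
import unittest
from unittest.mock import Mock


def company_title(gateway, company_id):
    """Code written to make the test below pass (hypothetical function)."""
    company = gateway.get_company(company_id)
    return f"{company['name']} - {company['city']}"


class CompanyTitleTest(unittest.TestCase):
    def test_title_combines_name_and_city(self):
        gateway = Mock()  # stands in for the real gateway client
        gateway.get_company.return_value = {"name": "Acme", "city": "Columbus"}

        self.assertEqual(company_title(gateway, 42), "Acme - Columbus")
        # The unit under test must ask the gateway for exactly this company.
        gateway.get_company.assert_called_once_with(42)
```

Saved as a module, this can be run with `python -m unittest`; the E2E test for the same user story would exercise the real gateway instead of a mock.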
write code to
address unit tests
existing data stores - out of scope
what are we replacing?
mod_perl & Apache
not attractive
not async (problem for SOA)
why?
industry standard web application framework for Node.js
minimal, fast, flexible
Template Toolkit
too powerful - devs are tempted to put too much in the view... business logic, formatting, data normalization, etc.
what are we replacing?
what are we replacing?
hand-rigged jQuery + Mustache + JS
TDD Workflow
Client-side Performance
tests (TDD/CI) that perform static analysis on the page to check that the page is built to perform well
e.g. use YSlow to analyze usage of sprites, optimized CSS, correct placement of JS, etc.
tests to do in-browser performance analysis
e.g. use WebPageTest to look at waterfall and render time
Single point of failure (SPOF) tests to see how the site degrades when some resources fail
e.g. use the WebPageTest SPOF feature for this
CI runs (above) can produce graphs to show trends
gateway
Java
[Java]
[Node.js]
[Node.js]
[Node.js]
[Java]
Infrastructure
TDD / CI in Services
Java
JUnit/RSpec – unit and integration tests
Jenkins/Travis CI
Node.js
Jasmine – unit and integration tests
Jenkins/Travis CI
Migration Status
Q4 2013
Infrastructure foundation
TDD / CI systems & best practices
Primary services in place (gateway)
Unclaimed company page ready to launch
Staged for incremental rollout
Q1 2014
Extensive AdSense testing to achieve $ parity
Company & related pages
Site search
Q2 2014
Member pages
Megabrowse (industry / geo browse)
Payments
Member Service
Repo
Travis CI
Test suite focuses on business concerns, problem areas, and key changes
Runs on Manta infrastructure (AWS)
Set up with controlled, realistic data
Pulls all repos from GitHub (services, API, web apps, etc.)
Executes "true" E2E tests
CI - Continuous Integration
Jenkins CI
Unit Tests
E2E Tests
Code
Migration Status
Manta Technology Team
Organization / Project Staffing
(Web framework project)
Oksana Shmaliy - GM
Engineers / Leads - 15
QA team - 4
Project Manager - 1
Product Manager - 1
Consultant - 1
(Maintain current site)
Mark Harris - GM
Engineers / Lead - 6
(New product initiatives)
Engineers - 2
(Support all initiatives)
Engineers / Lead - 7
IT Support - 1
Manta Platform Overview
Data Systems and Challenges
Q2 - 2014
Agenda
8:30 Welcome & Introductions
Manta overview
9:00 Technology stack / migration project
11:00 Data environment & challenges
12:00 Lunch
Data Model(s)
1:30 ETL – Data Quality
Manta Site Search (Solr)
2:30 Batch Data Processes
4:00 Future data directions & opportunities
5:00 Wrap-up & next steps
5:30 Tony to airport


Manta New - 22
Manta Current - 7
Manta Labs - 2
Systems Engineering - 8
Company Service
Repo
Unit Tests
E2E Tests
Code
Company Page
Repo
Unit Tests
E2E Tests
Code
Member Page
Repo
Unit Tests
E2E Tests
Code
Platform Migration Goals - Q3 13
Move Manta off the legacy homegrown Perl app to a modern open-source [simple/elegant] web framework
Move while operating site to maintain traffic & revenue
Solve significant current “technical debt”
Attract engineering talent in the future
Optimize for performance and maintainability
Principles
Meet the date. Our mission has a delivery date. Hit it.
Everything that we move will be easier to maintain and enhance (less dev/design time).
The stuff that makes money will monetize at very nearly the same rate or better. Key metrics will be neutral to positive.
We will be proud of the quality of the stuff we move to the new platform. We can say "Hey, Mom! Look at this!"
Increase performance - faster!
Goals
If it isn't one of these,
why are we doing it???
CDN origin server switched to point at new web stack
Routing rules in the new framework handle requests for pages that have been moved
Requests for pages that have not been moved fall through to the current web stack
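The fall-through rule above can be sketched as a single routing decision. The URL prefixes and the names `MIGRATED_PREFIXES` and `choose_stack` are hypothetical; the slides do not say how migrated pages are identified.

```python
# Hypothetical URL prefixes for pages already moved to the new web stack.
MIGRATED_PREFIXES = ("/c/", "/search")


def choose_stack(path, migrated_prefixes=MIGRATED_PREFIXES):
    """Return "new" for migrated pages, otherwise fall through to "legacy"."""
    for prefix in migrated_prefixes:
        if path.startswith(prefix):
            return "new"
    # No routing rule matched, so the current web stack serves the request.
    return "legacy"
```

Under this scheme, migrating another page type only means adding a prefix, which fits the incremental rollout described in the status slides.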
Status (cont.)
cluster
reproduce entire environment
AWS
Next Platform Priority - Data Layer
Near term tactical objectives (2014)
Solve technical debt
Improve quality
Simplify & reduce maintenance
Reduce number of database technologies
Migrate remaining systems to cloud
Faster update cycles of licensed company data (D&B etc.)
Better support for business requirements
Long term strategic objectives
Additional data partners (company data etc.)
Exploit user event data for better experience & business value
(30M uniques per month)
Personalized experience per user segment
(business owners, consumers, prospectors)
Open & for-fee APIs
Other business opportunities
Databases
PostgreSQL (data center)
Company data (adds / claims / unclaimed)
Transactions / subscriptions
Ad Targeting
SEO
MongoDB (data center / cloud)
Members
Products
Connections
Realtime analytics / tracking
MySQL (cloud)
Manta Connect (phpBB forum)
Oracle
ETL
Reporting
Email marketing integration / Exact Target
Search
Improve search relevance
Index additional fields
Real time indexing of data
Synchronizing bulk data updates with SOLR indexes
We have additional data that can be used to infer additional attributes about the companies.
Specific document models can be tagged with these attributes to boost search results & increase relevance
Use a document processing framework for entity & metadata extraction and allow flexible mappings to related document models
Query pre-processing & query templating to identify user type, normalization and query expansion
SOLR Cloud and overall architecture upgrade
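As an illustration of the query pre-processing item above, here is a minimal sketch of normalization followed by query expansion. The synonym table and the names `SYNONYMS` and `preprocess_query` are hypothetical, and no Solr API is involved; the output is just a list of terms that a query template could consume.

```python
import re

# Hypothetical expansion table; real mappings would come from search data.
SYNONYMS = {"atty": ["attorney", "lawyer"], "dr": ["doctor"]}


def preprocess_query(raw, synonyms=SYNONYMS):
    """Normalize a raw user query and expand known abbreviations."""
    # Normalization: lowercase and replace punctuation with spaces.
    normalized = re.sub(r"[^a-z0-9\s]", " ", raw.lower())
    expanded = []
    for term in normalized.split():
        expanded.append(term)
        # Expansion: add any synonyms right after the original term.
        expanded.extend(synonyms.get(term, []))
    return expanded
```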
ETL & Data Quality
Company data updates arrive periodically from different vendors
(D&B monthly, TMLC quarterly)
Vendors don’t provide a delta so full data needs to be matched, merged and de-duped based on data licensing rules
Industry data combined from 2 vendor specific extended SIC formats
Mediocre vendor data quality; requires name standardization, address standardization, etc.
Better data cleansing, validation and standardization to eliminate duplicates
Current ETL process is batch oriented, very time consuming and requires lots of resources, manual testing and full data table swaps resulting in a multi-week event
Lack of flexibility to add data from new data sources
Need to set up a data pipeline where data can be added/merged from different data sources in a simplified & automated manner.
Data sources can provide new data, updates, bulk claims on behalf of partners, deletes etc.
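Because vendors deliver full files rather than deltas, one building block of such a pipeline is deriving the delta by comparing snapshots. The sketch below assumes each record has a stable vendor id and arrives as a dictionary. `snapshot_delta` is a hypothetical name, and the matching, merging, and licensing rules mentioned above are out of scope.

```python
def snapshot_delta(previous, current):
    """Compare two full vendor snapshots keyed by record id.

    previous, current: dict of record id -> record dict.
    Returns (added_ids, updated_ids, deleted_ids), each sorted.
    """
    added = sorted(k for k in current if k not in previous)
    deleted = sorted(k for k in previous if k not in current)
    # A record counts as updated when its id survives but its content changed.
    updated = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return added, updated, deleted
```

Applying only the added, updated, and deleted records would avoid the full table swaps described above, assuming ids stay stable between deliveries.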
Data Model
Main entities
Company
Member
Products
Connections (business owners)
Subscriptions / Transactions
Current
Most of the data structures are flattened and stored in flat tables
Need to model data in a more structured way to satisfy different data consumers
Consumers
Application
SOLR / Search
ETL
Batch processes (3rd parties, reporting, etc.)
Batch Processing
Email Marketing
SOLR updates
ETL
Subscription Sales & Renewals
Industry and Geo Hierarchy re-categorization
Site maps
SEO Indexing
Future Data Driven Opportunities
Additional data partners (company data etc.)
Exploit user event data for better experience & business value
(30M uniques per month)
Personalized experience per user segment
(business owners, consumers, prospectors)
Open & for-fee APIs
Other business opportunities?