4th Wave: Disruptive Innovation in the Localization Space

by espell ltd. on 23 January 2014


Transcript of 4th Wave: Disruptive Innovation in the Localization Space

4TH WAVE
DISRUPTION
EYE-WATERING AMOUNT OF DATA
www
1.0

2.0

3.0
VOLUME - VARIETY - VELOCITY
LUXURY > COMMODITY > UTILITY
SERIAL > DISTRIBUTED
QUALITY DISENTANGLEMENT
APPLICATIONS ALREADY HERE
STRONG BUSINESS CASE
SYNTAX TO SEMANTICS
TECHNOLOGIES
SMT
SMT variants
RBMT
hybrids
representation
ontology mapping
NLP
phrase-based
n-gram based
pivoting
rule-based MT
prevalence
SMT augmenting
ontology-based
processes
applications
perception & value
idea in the 1940s
first SMT systems in the early 1990s
in production deployment for a decade
statistical MT
model
mathematically describes relationships
lexical semantics, weak in syntax
de facto standard
noisy channel approach, probability distributions
Moses: open source baseline
statistical properties of n-gram sequences
word-to-word aligned corpus
more efficient for closely related target and source languages
intermediary language if volume is low
small narrow-domain corpora or smaller portions of large corpora
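The noisy-channel approach listed above picks, for a source sentence f, the target sentence e that maximises P(e) · P(f | e), where P(e) is usually an n-gram language model and P(f | e) comes from a model trained on an aligned corpus. The Python sketch below scores a few hand-written German-to-English candidates under that rule. The probability tables, the monotone one-to-one word alignment, and the candidate list are all invented for illustration; a real decoder such as Moses uses learned phrase tables and a beam search rather than a fixed candidate list.

```python
import math

# Toy bigram language model P(e_i | e_{i-1}); all values are invented.
BIGRAM_LM = {
    ("<s>", "the"): 0.6, ("the", "house"): 0.3, ("house", "is"): 0.4,
    ("is", "small"): 0.2, ("is", "little"): 0.05,
    ("small", "</s>"): 0.5, ("little", "</s>"): 0.5,
    ("<s>", "house"): 0.05, ("house", "the"): 0.01, ("the", "is"): 0.01,
}
# Toy lexical translation table P(f_word | e_word); also invented.
TRANSLATION = {
    ("das", "the"): 0.7, ("haus", "house"): 0.8, ("ist", "is"): 0.9,
    ("klein", "small"): 0.6, ("klein", "little"): 0.3,
}
FLOOR = 1e-6  # smoothing floor for unseen events

def lm_logprob(words):
    """log P(e) under the bigram model, with sentence boundary markers."""
    tokens = ["<s>"] + words + ["</s>"]
    return sum(math.log(BIGRAM_LM.get(pair, FLOOR)) for pair in zip(tokens, tokens[1:]))

def tm_logprob(source, target):
    """log P(f | e), assuming a monotone one-to-one word alignment."""
    if len(source) != len(target):
        return math.log(FLOOR)
    return sum(math.log(TRANSLATION.get((f, e), FLOOR)) for f, e in zip(source, target))

def noisy_channel_decode(source, candidates):
    """Return the candidate e maximising log P(e) + log P(f | e)."""
    return max(candidates, key=lambda e: lm_logprob(e) + tm_logprob(source, e))

source = "das haus ist klein".split()
candidates = [
    "the house is small".split(),
    "the house is little".split(),
    "house the is small".split(),
]
print(" ".join(noisy_channel_decode(source, candidates)))  # the house is small
```

The language model penalises the scrambled word order, while the translation model prefers "small" over "little" for "klein"; the product of the two selects the fluent, adequate candidate.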
solutions since the early 1970s
equivalence, abstract intermediary language
structural semantics performance
formalized rules
SMT: significant advantage in quality and operational costs
pure RBMT is very rare
hybrid systems
NLP-specific problems, procedural content generation
predefining certain morphological properties
improving phrasal coverage
contextual rules
syntax-based reordering (high word order disparity)
syntactically lexicalized phrase-based SMT
Google R&D
experimental
PIGGYBACK TIMELINE (Klossner)
1.0 users find data
2.0 users find each other
3.0 data find each other
4.0 data create their own Facebook page, restrict friends
5.0 data decide they can work without humans, create their own language
6.0 human users realize that they can no longer find data unless invited by data
7.0 data get cheaper cell phone rates
8.0 data hoard all the good YouTube videos, leaving human users with access to bad '80s music videos only
9.0 data create and maintain their own blogs, which are more popular than human blogs
10.0 all episodes of Battlestar Galactica will now be shown from the Cylons' point of view
~2.5 quintillion bytes of data a day
estimated 2.8 ZB created in 2012
IDC (2011): global data volumes double every two years
the location of the world’s data is shifting
2013: emerging markets account for 36%
US – 32%
Europe – 19%
China – 13%
India – 4%
rest of the world – 32%
2020: set to increase to 62%
estimated 40 ZB of data by 2020
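The growth figures above can be cross-checked with a back-of-the-envelope calculation. Assuming the IDC doubling rate (every two years) applies to the 2.8 ZB estimate for 2012, four doublings bring the 2020 figure to 44.8 ZB, in the same range as the 40 ZB estimate. The helper name `projected_volume` is hypothetical; the snippet only repeats this arithmetic and is not IDC's model.

```python
def projected_volume(base_zb: float, base_year: int, target_year: int,
                     doubling_period_years: float = 2.0) -> float:
    """Project a data volume (in ZB) assuming it doubles every doubling_period_years."""
    doublings = (target_year - base_year) / doubling_period_years
    return base_zb * 2 ** doublings

# 2.8 ZB in 2012, doubling every two years: 4 doublings by 2020.
print(projected_volume(2.8, 2012, 2020))  # 44.8
```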
CERN
targeted marketing
railway “talking windows”
querying very large distributed aggregations of loosely-structured data
Google Hummingbird, Amazon S3, ...
e-commerce
quantitative risk analysis
health-care systems
fraud detection
monitoring & analysis
MIT Senseable City Lab
Birmingham hyper-local weather forecasting
transportation grids, city planning
Bamberg city visualization
IBM/Cần Thơ, Vietnam
big data ROI, Wikibon 2013
analytics-driven decision making
correlation mining
niche groups
sentiment analysis
operational and business intelligence infrastructures
Google and Facebook “rejected established translation tools and implied workflows” and “invented new models and their own tools for MT, TM and translation management, largely bypassing the translation profession. They harnessed linguistic and product expertise rather than translation skills, then relied on user feedback to improve output quality.”

Joanna Drugan: Quality in Professional Translation
12 languages reach 80%
13 languages reach 90%
2011: $36.5 trillion
2013: $44.6 trillion
only 33% is addressable in English as a native tongue
20 languages reach 80%
the long tail goes further
2012
2015
economic potential of online communication
FOCAL CHANGES
DISINTERMEDIATION
Facebook: 75 officially sanctioned languages in 2 years
Google: 10x more translated words than the entire professional translation workforce
IBM: global workforce to customize machine translation engines
Prezi: organic growth, evangelists in Spain, Korea, Portugal, Japan
Valve: 2% of the entire worldwide traffic, Steam Translation Platform
TM, terminology > data
document, project > corpus
translators, vendors > network
segment matching > statistical
PROCESSES
atomic
large number of steps
a lot of waiting time
waterfall management
serial, static
>
asynchronous
distributed
caters for velocity
agile
dynamic, “free lunch” of ideas
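The contrast between serial, static processing and asynchronous, distributed processing can be illustrated with Python's `asyncio`. In the sketch below, `translate_segment` is a hypothetical stand-in for any per-segment step (an MT call, a reviewer's turnaround) and simply sleeps to simulate waiting time. The serial variant waits for each segment in turn; the distributed variant keeps all segments in flight at once, so total time approaches that of the slowest single segment rather than the sum.

```python
import asyncio
import time

async def translate_segment(segment: str) -> str:
    """Hypothetical per-segment step; the sleep simulates waiting time."""
    await asyncio.sleep(0.1)
    return segment.upper()  # placeholder for a real translation

async def serial(segments):
    # Waterfall style: each segment starts only after the previous one finishes.
    return [await translate_segment(s) for s in segments]

async def distributed(segments):
    # Asynchronous style: all segments run concurrently; gather preserves input order.
    return await asyncio.gather(*(translate_segment(s) for s in segments))

def timed(coro_fn, segments):
    start = time.perf_counter()
    result = asyncio.run(coro_fn(segments))
    return result, time.perf_counter() - start

segments = ["segment one", "segment two", "segment three", "segment four"]
for fn in (serial, distributed):
    result, elapsed = timed(fn, segments)
    print(f"{fn.__name__}: {elapsed:.2f}s -> {result}")
```

With four segments the serial run takes roughly four times as long as the distributed one, while both return identical results in the same order.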
EVALUATION
differentiation: domain, text type & communication channel
end of the pass/fail era: equivalence or adherence does not translate into quality
LISA / EN 15038 becoming deprecated
quality: not the key differentiating factor
STANDARDS
TAUS DQF
ISO 17100
GOOD ENOUGH
buyer perception is changing
high performance supply chains
incremental adaptation
semantic correspondence
element matching models
lexical clouds
declarative, formal representation
standards
W3C converging on standards for publishing web ontologies (OWL)
OWL (Web Ontology Language): expresses semantic meaning through classes, properties and relationships
RDF (Resource Description Framework): expresses data as subject-predicate-object triples
establishing logical correspondences even in loosely structured data
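RDF's data model is a set of subject-predicate-object statements, and OWL adds vocabulary such as `owl:sameAs` for stating that two identifiers denote the same thing. The sketch below keeps triples as plain Python tuples, answers wildcard queries (roughly what a single SPARQL triple pattern does), and follows one `owl:sameAs` link to merge facts from two loosely structured sources. The `ex:` and `shop:` prefixes and all data are invented; a real project would use an RDF library and full IRIs.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Hypothetical triples from two sources; "ex:" and "shop:" are invented prefixes.
TRIPLES: set[Triple] = {
    ("ex:Budapest", "ex:capitalOf", "ex:Hungary"),
    ("ex:Budapest", "rdfs:label", "Budapest"),
    ("shop:city42", "shop:name", "Budapest"),
    ("shop:city42", "shop:deliveryDays", "2"),
    # Logical correspondence: both identifiers denote the same resource.
    ("ex:Budapest", "owl:sameAs", "shop:city42"),
}

def match(s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None):
    """Return triples matching the pattern, sorted; None acts as a wildcard."""
    return sorted(t for t in TRIPLES
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

def describe(subject: str):
    """Collect (predicate, object) pairs for a subject and its owl:sameAs aliases.

    Only outgoing sameAs links are followed here, for brevity.
    """
    aliases = {subject} | {o for _, _, o in match(subject, "owl:sameAs")}
    return sorted((p, o) for alias in aliases for _, p, o in match(alias)
                  if p != "owl:sameAs")

print(describe("ex:Budapest"))
```

The query on `ex:Budapest` also returns the delivery data recorded under `shop:city42`, because the `owl:sameAs` triple links the two identifiers.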
no commercial, mature NLP product
not quite there yet
element based
strings
morphology mapping
edit distance
n-grams: number of common substrings
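The two element-based string measures named above can be sketched in a few lines of Python: Levenshtein edit distance (minimum number of single-character insertions, deletions and substitutions) and an n-gram measure based on common substrings. Scoring the n-gram overlap with the Dice coefficient over character trigrams is an assumption made for this illustration; other normalisations are equally common.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (free if equal)
        prev = curr
    return prev[-1]

def char_ngrams(s: str, n: int = 3) -> set[str]:
    """Set of all contiguous character n-grams of s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over character n-grams: 2 * |common| / (|A| + |B|)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

print(edit_distance("colour", "color"))                 # 1
print(round(ngram_similarity("PostalCode", "PostCode"), 2))  # 0.57
```

"PostalCode" and "PostCode" share 4 of their 8 and 6 trigrams respectively, giving 8/14 ≈ 0.57, which flags them as a likely schema match despite different lengths.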
structure based
tree/graph-based
ontology mapping and schema matching
useful substitutes/imitations
schemas
ontologies
typically relational or XML-based
taxonomic structure
morphology analysis
distributed ontologies are still an issue
relationship assumption:
matched elements should have related elements
normalization
tokenization
expansion
elimination & lemmatization
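The normalization steps listed above (tokenization, expansion, elimination, lemmatization) can be chained into a small pipeline that brings schema labels into a comparable form before matching. In this sketch the abbreviation table and stop-word list are hypothetical, and `lemmatize` is a crude plural-stripping stand-in for a real dictionary-based lemmatizer such as one built on WordNet.

```python
import re

# Hypothetical abbreviation and stop-word lists, for illustration only.
EXPANSIONS = {"qty": "quantity", "addr": "address", "no": "number"}
STOP_WORDS = {"the", "of", "a", "an", "and"}

def tokenize(label: str) -> list[str]:
    """Split camelCase, snake_case and punctuation into lower-case tokens."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", spaced)]

def lemmatize(token: str) -> str:
    """Crude plural stripping; a stand-in for a real lemmatizer."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token

def normalize(label: str) -> list[str]:
    tokens = tokenize(label)                             # tokenization
    tokens = [EXPANSIONS.get(t, t) for t in tokens]      # expansion
    tokens = [t for t in tokens if t not in STOP_WORDS]  # elimination
    return [lemmatize(t) for t in tokens]                # lemmatization

print(normalize("orderQty"), normalize("Quantities_of_Orders"))
# ['order', 'quantity'] ['quantity', 'order']
```

After normalization the two differently written labels reduce to the same set of tokens, so an element-based matcher can treat them as equivalent.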
background knowledge to provide axioms
Google Knowledge Graph, WordNet
Wikipedia mining
topology assumption:
super- and subclasses of elements are more likely to be related
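The relationship and topology assumptions can be turned into a simple scoring rule: a candidate pair scores higher when its parents match as well. The sketch below is one minimal way to do this, not a specific published algorithm. It blends own-label similarity with parent-label similarity using a weight `alpha`, and uses a tiny hypothetical synonym list in place of background knowledge from a resource such as WordNet. The taxonomies, identifiers and weight are all invented for illustration.

```python
# Hypothetical taxonomies: element id -> (label, parent id or None).
SOURCE = {
    "s1": ("Vehicle", None),
    "s2": ("Car", "s1"),
    "s3": ("Jaguar", "s2"),
}
TARGET = {
    "t1": ("Animal", None),
    "t2": ("Big cat", "t1"),
    "t3": ("Jaguar", "t2"),
    "t4": ("Vehicle", None),
    "t5": ("Automobile", "t4"),
    "t6": ("Jaguar", "t5"),
}
# Hypothetical background knowledge: synonym pairs.
SYNONYMS = {frozenset({"car", "automobile"})}

def label_sim(a: str, b: str) -> float:
    """1.0 for identical or synonymous labels, otherwise 0.0."""
    a, b = a.lower(), b.lower()
    return 1.0 if a == b or frozenset({a, b}) in SYNONYMS else 0.0

def parent_label(tax, elem):
    parent = tax[elem][1]
    return tax[parent][0] if parent else None

def score(s: str, t: str, alpha: float = 0.5) -> float:
    """alpha * own-label similarity + (1 - alpha) * parent-label similarity."""
    own = label_sim(SOURCE[s][0], TARGET[t][0])
    ps, pt = parent_label(SOURCE, s), parent_label(TARGET, t)
    parents = label_sim(ps, pt) if ps and pt else 0.0
    return alpha * own + (1 - alpha) * parents

def best_match(s: str) -> str:
    return max(TARGET, key=lambda t: score(s, t))

print(best_match("s3"))  # t6: the "Jaguar" under "Automobile", not the big cat
```

Both target "Jaguar" elements match the label exactly, so element-based matching alone cannot decide; the parent comparison ("Car" vs. "Automobile") resolves the ambiguity.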
similar to translating into a very basic structureless language (RBMT)
doesn't rely on artificial intelligence but on user intelligence
daniel.szucs@espell.com

www.espell.com

labsblog.espell.com

THE ELUSIVE POINT
BIG LANGUAGE
content annotation, representation, conceptualization
challenges
deriving meaning
semantic web
XML abstraction, semantic markup
multilingual mapping
explicit formal specification of relations
semantic alignment to integrate heterogeneous resources
benchmarking
Google Hummingbird (Aug 2013)
no longer profitable to game search
semantic markup confers an advantage
ROI (CMO survey, February 2013)
revenue-per-customer metric on social media fell from 17% (2010) to 9% (2013)
syntax and semantics enrich language models
custom selection of training data
style and language use detection / enforcement
automatic domain adaptation
MT-DRIVEN VELOCITY, PRESENCE AND ANALYTICS
differentiation: business-critical to good enough
fluency and understandability more critical than accuracy
edit distance is not the best measure
diminishing gap between MT and post-editing (PE)
sentiment analysis via intermediary language
customized MT by communication channel, feedback and frequency
hybrids: SMT, RBMT, ontologies
assessment and quality levels
integration
DEPARTURES
FUTURESHOCK
INTELLIGENT CONTENT
DATA AS STRATEGIC ASSET
web today
markup defines syntax, not semantics
search engines weigh keywords and links
no annotated abstraction layer
sentiment analysis
granular global insight
product data, cost data, geological data
relationship, risk and behavior models
CUSTOMERS
INTEGRATION
indispensable to capture value & be competitive
analytics-driven decision making
homogeneous processes
conditional content distribution
database integration & linking instead of fragmented data retrieval & supply chains
index information in products and services
SUPPLIERS
HIGH PERFORMANCE
GOOD ENOUGH
focus on understandability
MT-driven
fully automated
real-time to extreme volume
HIGH VELOCITY
AGILE
HIGH VALUE
focus on speed and quality
rapid turnaround times
ubiquitous, incremental
automated & controlled
focus on expertise and quality
classic supply chains
monolithic projects
TRANSLATION TO STRATEGY
differentiators
commodity to utility
targets
cost to solution
revenue on bundled packages
localization to information systems
SERVICE MODELS