Linked Data & the Semantic Web

GMA IS meeting - June 24-28 2013

Andy Siegel

on 27 June 2013


Transcript of Linked Data & the Semantic Web


Andy Siegel
Dave Fritsche, Jun Dong

& Use Cases

Take Aways
Number of Links
Number of Nodes
Semantic Web
Benefits & Value

Discussion & Questions?
History of the Semantic Web

Provide a brief introduction to:
semantically linked data and
the semantic web

Highlight some healthcare & life science examples

Attempt to address questions such as:
What is linked data?
What are semantic web technologies?
How would I use them? (Why should I care?)
How are semantic applications built?
How do they differ from traditional apps?
What are some of the potential benefits?
Semantic Search
Architecture of the Semantic Web

the global disease alert surveillance system
Discovery & Development
R&D knowledge management
Informatics asset management
Target assessment
Assay lifecycle management
CRO data exchange
Compound in-sourcing
R&D data curation
Lot genealogy

Regulatory submission preparation
Trial site evaluation
Clinical data harmonization & analysis

Manufacturing, Finance, Sales, Marketing, IT
Market intelligence
Sales forecasting
Supply chain metrics management
Departmental budgeting
IT asset management
HR insights
Publish Structured Product Label data

Linked Data
As of May 2013, the English version of the DBpedia knowledge base describes 3.77 million things, of which 2.35 million are classified in a consistent ontology, including:

764,000 persons
573,000 places (including 387,000 populated places)
333,000 creative works (including 112,000 music albums, 72,000 films and 18,000 video games)
192,000 organizations (including 45,000 companies and 42,000 educational institutions)
202,000 species and
5,500 diseases
The full DBpedia data set features:
labels and abstracts for 10.3 million unique things in up to 111 languages;
8.0 million links to images;
24.4 million HTML links to external web pages;
27.2 million data links into external RDF data sets;
55.8 million links to Wikipedia categories.

The data set consists of 1.89 billion pieces of information (RDF triples):
400 million extracted from the English edition of Wikipedia
1.46 billion extracted from other language editions
Bio2RDF is an open-source project that uses Semantic Web technologies to build and provide the largest network of Linked Data for the Life Sciences.
Figure: The Intelligence is in the Connections
Axes: Connections between Information; Connections between people
Eras: PC Era (1980 - 1990), Web 1.0 (1990 - 2000), Web 2.0 (2000 - 2010), Web 3.0 (2010 - 2020), Web 4.0 (2020 - 2030)
Labels: The PC, The Internet, The Web, Social Web, Semantic Web, Intelligent Web, Web OS, File Systems, File Servers, Keyword Search, Directory Portals, Social Networking, Social Media Sharing, Office 2.0, Semantic Search, Semantic Databases, Distributed Search, Intelligent personal agents
source: http://www.slideshare.net/syawal/nova-spivack-semantic-web-talk
Figure: Beyond the Limits of Keyword Search
Axes: Productivity of Search; Amount of Data
Eras: PC Era (1980 - 1990), Web 1.0 (1990 - 2000), Web 2.0 (2000 - 2010), Web 3.0 (2010 - 2020), Web 4.0 (2020 - 2030)
Labels: The Desktop, The World Wide Web, The Social Web, The Semantic Web, The Intelligent Web, File Systems, File Servers, Keyword Search, Natural Language Search, Semantic Search
source: http://www.slideshare.net/syawal/nova-spivack-semantic-web-talk
Scientific data makes up a significant portion of the current Linked Data Web.
There is information on proteins and genes, pathways, sequences, chemistry, genetics, drugs, …

World Wide Web : Web pages :: The Semantic Web : Data
The Web is the Database!
Used to build & drive web sites and enterprise applications for:
Data integration
Business intelligence
Large knowledgebases

These technologies enable us to build capabilities and solutions that were not previously possible, practical, or feasible.
Semantic Web
Web of Data
Giant Global Graph
Data Web
Web 3.0
Linked Data Web
Semantic Data Web
Enterprise Information Web
“Semantic technologies” generally refers to a broad spectrum of techniques for finding a signal in large or complex data sources

Semantic Web standards tend to be particularly effective tools for implementing:
Data mining
Entity extraction
Semantic search
Unstructured text mining / NLP
AI / expert systems / machine learning

Semantic Web technologies are scalable, flexible, and adaptive to future changes
Adoption in Pharma to the best of our knowledge…

Biogen Idec
Boehringer Ingelheim
Eli Lilly
Johnson & Johnson
Q: What is Linked Data?

A: In essence, it is the shift in practice from publishing data in human-readable HTML documents to publishing it in machine-readable documents & data structures.

With linked data, machines become a lot smarter and can do more of the knowledge work which we now rely on humans to do.
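As a rough illustration of what "machine readable" can mean here, the sketch below stores two facts as (subject, predicate, object) triples and writes them out as N-Triples-style lines. The example.org URIs, the `treats` and `label` predicates, and the `to_ntriples` helper are all hypothetical names chosen for this sketch; it does not escape special characters in literals the way a real serializer would.

```python
# Hypothetical facts; the example.org URIs are placeholders, not real identifiers.
EX = "http://example.org/"

triples = [
    (EX + "aspirin", EX + "treats", EX + "headache"),
    (EX + "aspirin", EX + "label", "Aspirin"),
]

def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples-style lines."""
    lines = []
    for s, p, o in triples:
        # Objects that look like URIs go in angle brackets; anything else is
        # written as a plain quoted literal (no escaping in this sketch).
        obj = f"<{o}>" if o.startswith("http://") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return lines

for line in to_ntriples(triples):
    print(line)
```

Each line is a self-describing statement that a program can parse without knowing anything about page layout, which is the contrast with HTML that the answer above draws.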
Wikipedia is a collaboratively edited, multilingual, free Internet encyclopedia supported by the non-profit Wikimedia Foundation.

Wikipedia's 30 million articles in 286 languages, including over 4.2 million in the English Wikipedia, are written collaboratively by volunteers around the world.
DBpedia is a project aiming to extract structured content from the information created as part of the Wikipedia project.

DBpedia allows users to query relationships and properties associated with Wikipedia resources, including links to other related datasets.
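Querying "relationships and properties" usually means matching triple patterns, which is roughly what a SPARQL query does against an endpoint. The sketch below imitates that idea over a small in-memory list; the `ex:` names and the `match` helper are assumptions made for illustration, not DBpedia identifiers or a DBpedia API.

```python
# Hypothetical triples; the "ex:" names are placeholders, not real DBpedia identifiers.
TRIPLES = [
    ("ex:Aspirin", "ex:type", "ex:Drug"),
    ("ex:Aspirin", "ex:treats", "ex:Headache"),
    ("ex:Ibuprofen", "ex:type", "ex:Drug"),
    ("ex:Ibuprofen", "ex:treats", "ex:Inflammation"),
]

def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# Which things are classified as drugs?
drugs = [s for s, _, _ in match(TRIPLES, p="ex:type", o="ex:Drug")]

# What is known about aspirin?
aspirin_facts = match(TRIPLES, s="ex:Aspirin")
```

A real query language adds joins across several patterns, filters, and remote execution, but the wildcard-in-a-triple idea is the same.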

DBpedia has been described by Tim Berners-Lee as one of the more famous parts of the Linked Data project.
A family of technology standards that ‘play nice together’, including:
Standard data model
Standard schema/ontology language
Standard query language

Semantic Web standards
Some Differentiators and Benefits of Semantic Technology
Where Can Semantic Infrastructure Add Value?
Empower collaboration and data sharing:
Across the Sanofi R&D community and
With external partners, vendors, regulatory agencies & other 3rd parties (research & healthcare ecosystem)
Ensure data from diverse areas of research and development is both accessible and combinable so the data can be analyzed in an integrated fashion
Simplify integration of business applications
Foster insight and innovation by reducing the administrative burden associated with data management
Increase the ability to clearly define, use and re-use content
Ensure consistent information across business applications
Permanently increase our data quality
Reduce cost by avoiding duplicate updates and errors
The meaning travels with the data
Handles diverse data from varying structured & unstructured sources
Users and applications deal only with a unified view of information, without worrying about where that information came from
Because the data coming from unstructured text is unpredictable, Semantic Web technologies are a particularly effective way to collect & integrate NLP results
Semantics is extremely flexible: the RDF data model evolves as new data shows up and is connected with existing data
You can run analyses that span any data connected to the semantic fabric, regardless of how the concepts are connected
Cooperation without Coordination:
Upfront coordination regarding table schemas and relational structures is not required... Change is cheap!
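One way to picture "cooperation without coordination" is a minimal sketch in which two groups publish triples independently and the data is combined with a plain set union. The source names, the `ex:` identifiers, and the `diseases_for` helper are hypothetical; the only thing the two sources share is the identifier for the target.

```python
# Two hypothetical sources that were never coordinated with each other.
assay_data = {
    ("ex:Compound42", "ex:inhibits", "ex:TargetA"),
}
literature_data = {
    ("ex:TargetA", "ex:associatedWith", "ex:DiseaseX"),
}

# Merging is a set union: no schema migration and no table redesign.
graph = assay_data | literature_data

def diseases_for(compound, triples):
    """Follow compound -> target -> disease across whatever sources are in the graph."""
    targets = {o for s, p, o in triples if s == compound and p == "ex:inhibits"}
    return {o for s, p, o in triples if s in targets and p == "ex:associatedWith"}

print(diseases_for("ex:Compound42", graph))
```

The question can only be answered on the merged graph; neither source alone contains the full path, and neither had to change its structure to make the merge possible.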
The semantic model is a conceptual model.
It avoids using relational constructs, IDs, keys, etc. in favor of concepts and relationships expressed/expressible in human language.
This is also reflected in software that is built with Semantic Web data.
Semantic technologies are non-destructive overlay technologies and often use a metamodel (a level of abstraction on top of the information) that sits between the user (either human or machine) and the information to provide a level of structure that adds meaning.
This diagram shows some of the information available and how it's linked together. Nodes are sized according to their quantity of data, and links are sized according to the quantity of links.

Bio2RDF Release 2 (Jan 2013) Features:
1 billion triples across 19 updated datasets