Semantic Search

by

Jennifer Guzman

on 12 August 2013


Transcript of Semantic Search

Semantic Search - A new way to get search results
What is Semantics?
Semantics is the study of meaning.
It focuses on the relationship between signifiers (words, phrases, signs and symbols) and what they stand for.

Syntax: how we say something (the specific letters, words, punctuation, etc.).

Semantics: the meaning behind what we say.
Semantic Web
It helps computers "understand" the meaning behind a web page.
The Semantic Web is about things: people, places, events, movies, music, organizations, any concept.
It also lets computers know how these things are related to one another.

What is going on now with results on Search Engines?
Google can fairly easily match any query across web documents that co-reference any page and return results for that "entity" without relying on keyword string matches in HTML tags and anchor text.

Entity-based SEO and Knowledge Graph results attempt to guess user intent through localization, personalization, and entity disambiguation.

What is an Entity?
People, places and concepts that make up the underlying meaning of a web resource: images, videos, and individual pieces of data.
RDF Framework
RDFa in more detail: Alice's blog examples
<html>
<head> ... </head>
<body> ...
<h2>The Trouble with Bob</h2>
<p>Date: 2011-09-10</p>
...
</body>
</html>

Example
I love technology.

I <3 technology.

Different syntax, same meaning.

This is very easy to understand for us humans.
Syntax and Semantics in Websites
Syntax used to create websites: HTML (HyperText Markup Language).

Computers understand this and the "tags" used tell them how to display a page.

Computers don't understand the meaning behind a page, only the syntax.
Definition: Semantic Search
"Semantic search is a search or a question or an action that produces meaningful results, even when the retrieved items contain none of the query terms, or the search involves no query text at all."
How did we get to this new type of search?
Mobile is the driving force behind the semantic search revolution.
How does Semantic Search work?
It considers the context of the search, location, intent, variations of words, synonyms, generalized and specialized queries, concept matching and natural language queries to provide relevant search results.
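As a toy illustration of synonym and concept matching, the Python sketch below maps words to shared concept labels and returns documents whose concepts cover those of the query, even when they contain none of the query's words. The concept table, the labels and the sample documents are all invented for this example; a real search engine combines many more signals (location, intent, personalization).

```python
# Hypothetical word-to-concept table: different words, same meaning.
CONCEPTS = {
    "car": "concept:automobile",
    "automobile": "concept:automobile",
    "cheap": "concept:low-price",
    "affordable": "concept:low-price",
    "inexpensive": "concept:low-price",
}

# Invented sample documents.
DOCUMENTS = [
    "Affordable automobile deals this weekend",
    "Cheap flights to Guatemala",
    "Luxury car showroom",
]


def concepts_in(text):
    """Return the set of known concepts mentioned in a piece of text."""
    return {CONCEPTS[word] for word in text.lower().split() if word in CONCEPTS}


def semantic_match(query, documents):
    """Return documents whose concepts include every concept in the query."""
    wanted = concepts_in(query)
    return [doc for doc in documents if wanted and wanted <= concepts_in(doc)]


print(semantic_match("cheap car", DOCUMENTS))
```

The only match is the first document, which contains neither "cheap" nor "car": the match happens on meaning, not on keyword strings.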
2 major forms of search:
Navigational search: the user is using the search engine as a navigation tool to navigate to a particular intended document.
Research search (the kind of search that semantic search applies to): the user provides the search engine with a phrase which is intended to denote an object about which the user is trying to gather/research information.

There is no particular document which the user knows about and is trying to get to. Rather, the user is trying to locate a number of documents which together will provide the desired information.
How does this affect search results? A resource no longer needs to be associated with keywords to be useful to, and used by, search engines.
Why is this important? Because when Google receives a user query, it is not trying to match the query's keywords. Instead, informed by the context of the query whenever possible, it tries to understand the meaning underlying the query and then returns information about the entities it has identified.
How are entities stored?
"Things not strings"
As unique, web-accessible identifiers that take the form of URIs (uniform resource identifiers): we can think of these as URLs.
Major strength: URIs allow search engines to disambiguate entities and collapse references to them.
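A rough Python sketch of what "disambiguate and collapse" could look like: different text mentions resolve to one identifier, while a similar-looking mention of a different thing resolves to another. The lookup table and the example.org URIs are hypothetical, not identifiers used by any real search engine.

```python
from collections import defaultdict

# Hypothetical mention-to-URI table (example.org URIs are placeholders).
ENTITY_URIS = {
    "guatemala": "http://example.org/entity/guatemala",
    "republic of guatemala": "http://example.org/entity/guatemala",
    "guatemala city": "http://example.org/entity/guatemala-city",
}


def resolve(mention):
    """Map a text mention to its entity URI, or None if it is unknown."""
    return ENTITY_URIS.get(" ".join(mention.lower().split()))


def collapse(mentions):
    """Group mentions by the entity they refer to."""
    groups = defaultdict(list)
    for mention in mentions:
        uri = resolve(mention)
        if uri is not None:
            groups[uri].append(mention)
    return dict(groups)


print(collapse(["Guatemala", "Republic of Guatemala", "Guatemala City"]))
```

Two strings collapse into the country's URI, and the city keeps its own: "things, not strings".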
Search is changing because the lives of search engines' users are changing: they now carry their computers with them.
The triple
Mechanism by which a relationship between any two entities may be expressed.

It consists of a subject, a predicate and an object.

Triples allow us not only to describe any relationship between any two things, but also to map relationships between other statements about those things.

"Jenny lives in Guatemala"
Because entities in the semantic web have unique identifiers, there's no ambiguity about what Jenny or what Guatemala is being referenced.

And information about those entities from other triples (such as Jenny knows Gaby, Gaby knows Christian, Christian lives in Venezuela, Christian knows Maria, Maria lives in Brazil, Gaby lives in Guatemala) enables queries like "who does Gaby know who also lives in Guatemala" to be answered easily and accurately. Treating "knows" as mutual, the correct answer is "Jenny" :)
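The query above can be sketched in a few lines of Python. This is a minimal illustration, assuming plain names stand in for the unique URIs a real system would use and that "knows" holds in both directions; the function names are made up for the example.

```python
# The triples from the text, as (subject, predicate, object) tuples.
TRIPLES = [
    ("Jenny", "lives in", "Guatemala"),
    ("Jenny", "knows", "Gaby"),
    ("Gaby", "knows", "Christian"),
    ("Christian", "lives in", "Venezuela"),
    ("Christian", "knows", "Maria"),
    ("Maria", "lives in", "Brazil"),
    ("Gaby", "lives in", "Guatemala"),
]


def acquaintances(person, triples):
    """Everyone linked to `person` by "knows", in either direction."""
    found = set()
    for subject, predicate, obj in triples:
        if predicate != "knows":
            continue
        if subject == person:
            found.add(obj)
        elif obj == person:
            found.add(subject)
    return found


def residents(place, triples):
    """Everyone who "lives in" the given place."""
    return {s for s, p, o in triples if p == "lives in" and o == place}


def known_and_living_in(person, place, triples):
    """Who does `person` know who also lives in `place`?"""
    return sorted(acquaintances(person, triples) & residents(place, triples))


print(known_and_living_in("Gaby", "Guatemala", TRIPLES))  # ['Jenny']
```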
3 things used in RDF
1. Subject: what we're describing

2. Predicate: attribute of the thing we're describing

3. Object: thing we're referring to with the predicate
RDF stands for Resource Description Framework.

It is used by the semantic web to encode and link triples (in technical terms, it is a graph data model that uses URIs).
- It can describe any concept, relationship or thing (entity).

Graph: a representation of objects that are connected by links
- Each "node" is a subject
- Each subject has information
- Each subject can be related to another subject
RDF is a concept, not a syntax, but it can be written in several notations
We can describe anything with this structure

RDF uses URIs to specify subjects and predicates; objects can be URIs or literal values

Example:
Natalia likes cookies
Subject = Natalia
Predicate = likes
Object = cookies
N3 Notation:
We can use N3 to express RDF
N3 is a syntax to describe RDF to humans.

RDFa syntax is used to describe RDF to computers.

Example
N3 Notation:
<#natalia> <pref:likes> <#cookies> .
- The angle brackets indicate that the value is a URI

- The period ends the statement or triple

- The more triples there are the more we know about the subject
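To make the notation concrete, here is a small Python sketch that writes and reads the one-line form shown above. It only handles triples whose three parts are all URIs in angle brackets; it is not a full N3 parser, and the function names are invented for this example.

```python
import re

# Matches exactly: <subject> <predicate> <object> .
_TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.$")


def to_n3(subject, predicate, obj):
    """Write one triple: each URI in angle brackets, a period at the end."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def parse_n3_line(line):
    """Read one URI-only triple back into a (subject, predicate, object) tuple."""
    match = _TRIPLE.match(line.strip())
    if match is None:
        raise ValueError(f"not a simple URI-only triple: {line!r}")
    return match.groups()


line = to_n3("#natalia", "pref:likes", "#cookies")
print(line)
print(parse_n3_line(line))
```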

Neither N3 nor RDFa has any inherent meaning until paired with a vocabulary.

A vocabulary defines what the triples actually mean. It allows the computer to understand when we're talking about a specific concept.

Popular vocabulary: FOAF (Friend of a friend)
Contains concepts to identify people and represent the relationships between those people.
- The about attribute sets the subject
- The property attribute sets the predicate

<body xmlns:foaf="http://xmlns.com/foaf/0.1/">
<span about="#mary" property="foaf:name">
Mary Smith
</span>
</body>
Now that the browser knows there's something with the name "Mary Smith" on the page, we need to let it know that Mary is also a person.
We can use the "typeof" attribute (called "instanceof" in early RDFa drafts), which specifies what class of thing the subject falls into:
<body xmlns:foaf="http://xmlns.com/foaf/0.1/">
<span about="#mary" typeof="foaf:Person" property="foaf:name">
Mary Smith
</span>
</body>
Create a relationship
<body xmlns:foaf="http://xmlns.com/foaf/0.1/">
<span about="#mary" typeof="foaf:Person" property="foaf:name">
Mary Smith
</span>
<span about="#john" typeof="foaf:Person" property="foaf:name">
John Smith
</span>
<span about="#mary" rel="foaf:knows" resource="#john"></span>
</body>
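To see what a machine could read out of markup like this, here is a simplified Python sketch built on the standard library's html.parser. It is not a conforming RDFa processor: it only looks at elements that carry an about attribute, accepts either the typeof or the older instanceof spelling for the class, does not expand prefixes, and uses the label "rdf:type" and the class name TripleExtractor purely for illustration.

```python
from html.parser import HTMLParser

MARKUP = """
<span about="#mary" typeof="foaf:Person" property="foaf:name">Mary Smith</span>
<span about="#john" typeof="foaf:Person" property="foaf:name">John Smith</span>
<span about="#mary" rel="foaf:knows" resource="#john"></span>
"""


class TripleExtractor(HTMLParser):
    """Collect (subject, predicate, object) triples from simple RDFa markup."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._pending = None  # (subject, predicate, tag, text chunks)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        subject = a.get("about")
        if subject is None:
            return
        cls = a.get("typeof") or a.get("instanceof")
        if cls:
            self.triples.append((subject, "rdf:type", cls))
        if a.get("rel") and a.get("resource"):
            self.triples.append((subject, a["rel"], a["resource"]))
        if a.get("property"):
            # The object is the element's text, known only at the end tag.
            self._pending = (subject, a["property"], tag, [])

    def handle_data(self, data):
        if self._pending is not None:
            self._pending[3].append(data)

    def handle_endtag(self, tag):
        if self._pending is not None and tag == self._pending[2]:
            subject, predicate, _, chunks = self._pending
            text = " ".join("".join(chunks).split())
            self.triples.append((subject, predicate, text))
            self._pending = None


def extract_triples(markup):
    parser = TripleExtractor()
    parser.feed(markup)
    parser.close()
    return parser.triples


for triple in extract_triples(MARKUP):
    print(triple)
```

Each attribute contributes one statement: the class gives "is a Person", property gives the name, and rel with resource links Mary to John.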

In a plain HTML page, information such as a post's title and date is aimed at humans only; computers need sophisticated methods to extract it. By using RDFa, we can annotate the page to make the structured data clear to computers.
<html>
<head> ... </head>
<body> ...
<h2 property="http://purl.org/dc/terms/title">The Trouble with Bob</h2>
<p>Date: <span property="http://purl.org/dc/terms/created">2011-09-10</span></p>
...
</body>
</html>
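Once the page is annotated, reading the data no longer needs anything sophisticated. The Python sketch below uses the standard library's html.parser to collect each property attribute together with the text it wraps. It is a simplified illustration, not an RDFa processor: nested annotated elements are ignored, and the PropertyExtractor class and the trimmed-down page are assumptions made for the example.

```python
from html.parser import HTMLParser

PAGE = """
<html>
<body>
<h2 property="http://purl.org/dc/terms/title">The Trouble with Bob</h2>
<p>Date: <span property="http://purl.org/dc/terms/created">2011-09-10</span></p>
</body>
</html>
"""


class PropertyExtractor(HTMLParser):
    """Collect (property URL, text) pairs from annotated elements."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._open = None  # (tag, property, text chunks)

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop is not None and self._open is None:
            self._open = (tag, prop, [])

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            _, prop, chunks = self._open
            self.found.append((prop, "".join(chunks).strip()))
            self._open = None


def extract_properties(html):
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.found


for prop, value in extract_properties(PAGE):
    print(prop, "=", value)
```

Note that the unannotated text "Date:" is skipped: only the span marked with a property is treated as data.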

It is worth emphasizing that RDFa uses URLs to identify just about everything. This is why, instead of just using properties like title or created, we use http://purl.org/dc/terms/title and http://purl.org/dc/terms/created.
The reason behind this design decision is rooted in data portability, consistency, and information sharing. Using URLs removes the possibility for ambiguities in terminology.
Without ensuring that there is no ambiguity, the term "title" might mean anything.

When each vocabulary term is a URL, a detailed explanation of the vocabulary term is just one click away. It allows humans or machines to follow the link to find out what a particular vocabulary term means.

By using a URL to identify a particular term, for example http://purl.org/dc/terms/created, both humans and machines can understand that the URL unambiguously refers to the "Date of creation of the resource", such as a web page.
If the URLs are the same, the vocabulary terms mean the same thing.
Example # 2
<p>All content on this site is licensed under
<a href="http://creativecommons.org/licenses/by/3.0/">
a Creative Commons License</a>.
©2011 Alice Birpemswick.</p>

A human clearly understands this sentence, in particular the meaning of the link with respect to the current document: it indicates the document's license, the conditions under which the page's contents are distributed.

Unfortunately, when Bob visits Alice's blog, his browser sees only a plain link that could just as well point to one of Alice's friends or to her CV. For Bob's browser to understand that this link actually points to the document's licensing terms, Alice needs to add some flavor, some indication of what kind of link this is.
Example # 2, with RDFa
<p>All content on this site is licensed under
<a property="http://creativecommons.org/ns#license"
href="http://creativecommons.org/licenses/by/3.0/">
a Creative Commons License</a>. ©2011 Alice Birpemswick.</p>

With this small update, the visitor's browser will now understand that this link has a flavor: it indicates the blog's license.
Example # 3: Use of "vocab"
<div vocab="http://xmlns.com/foaf/0.1/" typeof="Person">
  <p>
    <span property="name">Alice Birpemswick</span>,
    Email: <a property="mbox" href="mailto:alice@example.com">alice@example.com</a>,
    Phone: <a property="phone" href="tel:+1-617-555-7332">+1 617.555.7332</a>
  </p>
  <ul>
    <li property="knows" typeof="Person">
      <a property="homepage" href="http://example.com/bob/"><span property="name">Bob</span></a>
    </li>
    <li property="knows" typeof="Person">
      <a property="homepage" href="http://example.com/eve/"><span property="name">Eve</span></a>
    </li>
  </ul>
</div>
Example # 4: Use of prefixes
<html>
<head> ... </head>
<body prefix="dc: http://purl.org/dc/terms/ schema: http://schema.org/">
<div resource="/alice/posts/trouble_with_bob" typeof="schema:BlogPosting">
<h2 property="dc:title">The trouble with Bob</h2> ...
<h3 property="dc:creator" resource="#me">Alice</h3>
<div property="schema:articleBody">
<p>The trouble with Bob is that he takes much better photos than I do:</p>
</div> ...
</div>
</body>
</html>
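Prefixes are only shorthand: a term such as dc:title still stands for a full URL. The Python sketch below shows one way the expansion could work, assuming the prefix attribute is a whitespace-separated list of "name: URL" pairs as in the example above; the function names are made up for illustration.

```python
def parse_prefixes(prefix_attr):
    """Turn 'dc: http://... schema: http://...' into {'dc': ..., 'schema': ...}."""
    tokens = prefix_attr.split()
    names, uris = tokens[0::2], tokens[1::2]
    return {name[:-1]: uri for name, uri in zip(names, uris) if name.endswith(":")}


def expand(term, prefixes):
    """Replace a declared prefix with its URL; leave anything else untouched."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term


PREFIXES = parse_prefixes("dc: http://purl.org/dc/terms/ schema: http://schema.org/")

print(expand("dc:title", PREFIXES))             # http://purl.org/dc/terms/title
print(expand("schema:BlogPosting", PREFIXES))   # http://schema.org/BlogPosting
```

A term that is already a full URL passes through unchanged, because "http" is not one of the declared prefixes.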
Some popular vocabularies
Dublin Core

Schema.org

FOAF
Steps to follow:
1. Research the different vocabularies available in more depth
2. Determine which vocabularies are useful for each of our clients
3. Start implementing RDFa in our clients' sites :)