
Natural Tech

on 9 August 2017


Transcript of SentiLecto

Once you understand language, you're ready to express ideas in your own words.
Representation over SVO slots
The entities involved in each clause are syntactically mapped onto an abstract representation: SVO (subject-verb-object) slots.
Anaphora Resolution &
How many times does this text refer to Juana della Carbonara and Gonzalo Juárez?
Negation Scope
As you may know, these two utterances are completely different: they deny different facts.
Me gusta que Riquelme no haya ganado el campeonato. (I like that Riquelme did not win the championship.)
No me gusta que Riquelme haya ganado el campeonato. (I do not like that Riquelme won the championship.)
Fact Mining
Once it separates real facts from hypothetical facts, SentiLecto can also identify and normalize different references to the same fact. Look at the example:
State-of-the-art technology in NLP, NLU & NLG.

[Diagram: raw text → NLU engine → abstract representation: SVO (subject-verb-object) slots]

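1. El defensor despejó la pelota hacia el córner. (The defender cleared the ball toward the corner.)
2. El juez Ramírez duda de las intenciones de los abogados. (Judge Ramírez doubts the lawyers' intentions.)
3. Durante el 2015, Bou se convirtió en el ídolo de Racing. (During 2015, Bou became Racing's idol.)
4. Las nuevas políticas macroeconómicas fueron esbozadas por el ministro Bullrich. (The new macroeconomic policies were outlined by Minister Bullrich.)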
Look at how a passive voice clause is represented in its active form. That's a very useful tool for merging facts!
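To make the passive-to-active idea concrete, here is a minimal sketch in Python. It is not SentiLecto's code: the `SVO` class and the `to_active` helper are hypothetical names, and the parse of the clause is supplied by hand rather than produced by a parser.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SVO:
    """One clause mapped onto subject-verb-object slots (hypothetical structure)."""
    subject: Optional[str]
    verb: str  # lemma of the main verb
    obj: Optional[str]


def to_active(patient: str, verb_lemma: str, agent: Optional[str]) -> SVO:
    """Map a passive clause onto active-voice slots.

    The grammatical subject of a passive clause is the patient, so it goes
    into the object slot; the "por ..." agent, if any, becomes the subject.
    """
    return SVO(subject=agent, verb=verb_lemma, obj=patient)


# Example 4 (passive voice), with the parse written by hand as an assumption.
passive = to_active(
    patient="las nuevas políticas macroeconómicas",
    verb_lemma="esbozar",
    agent="el ministro Bullrich",
)

# The same fact as it would be stated in active voice.
active = SVO("el ministro Bullrich", "esbozar", "las nuevas políticas macroeconómicas")

print(passive == active)  # True: both clauses land in the same slots
```

Because both forms end up in identical slots, two reports of the same event can be compared with a plain equality check.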
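Gentil y amable como pocas, la Excma. Dra. Juana della Carbonara saludó al arq Gonzalo F. Juárez Jr en las oficinas de la gobernación, ante la mirada del propio gobernador, dejándolo perplejo. María Fernández se cruzó con ella y le preguntó por él. Juana y Gonzalo parecen una pareja disfuncional cuando discuten, no? Aunque los admiro a ambos por igual, supuse que ella lo iba a hostigar.
(Kind and gracious as few are, the Hon. Dr. Juana della Carbonara greeted the architect Gonzalo F. Juárez Jr. at the governor's offices, under the gaze of the governor himself, leaving him perplexed. María Fernández ran into her and asked her about him. Juana and Gonzalo look like a dysfunctional couple when they argue, don't they? Although I admire them both equally, I assumed she was going to harass him.)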
Now, take a look at SentiLecto's Named Entity Extraction output:
Fact Mining
SentiLecto can identify whether or not an utterance is a real fact. The Globals Operator's output indicates whether the utterance mapped onto the following SVO slots is not a real fact but a hypothetical situation that didn't happen:
Understanding natural language the way native speakers do.
Why is it important to know about language and syntax?
5. Se introdujeron reformas a la Ley de Indemnizaciones. (Reforms to the Compensation Law were introduced.)
Likewise, an impersonal clause is represented in its active form.
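1. Afortunadamente, el servicio de Telefónica mejoró. (Fortunately, Telefónica's service improved.)
2. ¿Mejoró el servicio de Telefónica? (Did Telefónica's service improve?)
3. El servicio de Telefónica no mejoró en los últimos años. (Telefónica's service has not improved in recent years.)
4. Tal vez el servicio de Telefónica mejore. (Perhaps Telefónica's service will improve.)
5. Si el servicio de Telefónica mejora, contrataré sus servicios. (If Telefónica's service improves, I will sign up for its services.)
6. Exijo que el servicio de Telefónica mejore. (I demand that Telefónica's service improve.)
7. El servicio de Telefónica mejorará durante el próximo cuatrimestre. (Telefónica's service will improve over the next four months.)
8. Espero que el servicio de Telefónica mejore pronto. (I hope Telefónica's service improves soon.)

If none of the flags is checked, the utterance is a real fact.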
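The flag idea can be sketched in a few lines of Python. The slides do not show the real flag names, so every field of `GlobalFlags` below is a hypothetical label, and the pairing of flags with the numbered examples is an assumption. The only rule taken from the presentation is the last one: no flag checked means a real fact.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GlobalFlags:
    """Hypothetical flag set; the real operator's flag names are not shown."""
    interrogative: bool = False  # assumed for example 2 (a question)
    negated: bool = False        # assumed for example 3 ("no mejoró")
    possible: bool = False       # assumed for example 4 ("tal vez")
    conditional: bool = False    # assumed for example 5 ("si ...")
    demanded: bool = False       # assumed for example 6 ("exijo que")
    future: bool = False         # assumed for example 7 ("mejorará")
    desired: bool = False        # assumed for example 8 ("espero que")

    def is_real_fact(self) -> bool:
        # A real fact is an utterance with no flag checked.
        return not any(getattr(self, f.name) for f in fields(self))


print(GlobalFlags().is_real_fact())              # True, as in example 1
print(GlobalFlags(negated=True).is_real_fact())  # False, as in example 3
```

Keeping the flags separate from the SVO slots means the same slots can describe both "the service improved" and "the service did not improve", with only the flags telling them apart.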

La candidatura de Macri fue apoyada por la UCR. (Macri's candidacy was supported by the UCR.)
La UCR respaldó la candidatura de Mauricio Macri. (The UCR backed Mauricio Macri's candidacy.)
Named entity recognition & classification
Entity Based Sentiment Analysis
I hope I've convinced you of SentiLecto's ability to understand natural language as a native speaker does.
Rewriting &

How does the Google Panda algorithm rank your site?
What are you talking about?
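Durante la reunión celebrada en Bruselas, el funcionario estadounidense, Antony Blinken, habló ante el Consejo de Derechos Humanos de Naciones Unidas y reclamó la inmediata liberación del venezolano Antonio Capriles. (During the meeting held in Brussels, the US official Antony Blinken spoke before the United Nations Human Rights Council and demanded the immediate release of the Venezuelan Antonio Capriles.)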
It's not enough simply to know whether a sentence is positive or negative. You should know which entity is affected by the opinions in the discourse. That's simple once we know each sentence's syntactic structure.
These two sentences are negative, but they're not negative in the same way. In the first one, the negativity affects María. In the other one, it affects Juan.

Juan odia a María. (Juan hates María.)

Juan difamó a María. (Juan defamed María.)
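El Ministerio de Salud mejoró las instalaciones de los hospitales nacionales. (The Ministry of Health improved the facilities of the national hospitals.)

El Ministerio de Salud erradicó a la vinchuca. (The Ministry of Health eradicated the kissing bug.)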
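Once the SVO slots are known, routing a verb's polarity to the right entity can be sketched as a lookup. This is an illustration, not SentiLecto's method: `VERB_POLARITY` and `entity_sentiment` are hypothetical names, the polarity values are assumptions chosen to match the examples, and the slots are filled by hand.

```python
from typing import Dict

# Hypothetical lexicon: for each verb lemma, the polarity each slot receives.
VERB_POLARITY: Dict[str, Dict[str, int]] = {
    "odiar": {"object": -1},                     # the hated entity is affected
    "difamar": {"subject": -1},                  # the defamer is judged negatively
    "mejorar": {"subject": +1},                  # the improver is judged positively
    "erradicar": {"subject": +1, "object": -1},  # a judgement about two entities
}


def entity_sentiment(subject: str, verb_lemma: str, obj: str) -> Dict[str, int]:
    """Return the polarity assigned to each entity in one SVO clause."""
    slots = {"subject": subject, "object": obj}
    effects = VERB_POLARITY.get(verb_lemma, {})
    return {slots[slot]: polarity for slot, polarity in effects.items()}


print(entity_sentiment("Juan", "odiar", "María"))    # {'María': -1}
print(entity_sentiment("Juan", "difamar", "María"))  # {'Juan': -1}
print(entity_sentiment("El Ministerio de Salud", "erradicar", "la vinchuca"))
```

A sentence-level classifier would label all of these simply as positive or negative; the slot-based lookup is what lets the polarity land on a specific entity.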
And sometimes we can express a judgement about two entities simultaneously:
When you understand language, it's easy to say the same things in different ways.
news merge
SentiLecto's ability to identify and normalize different references to the same facts makes it possible to identify diverse coverage of the same news. A module called Tecnonews does this job.
SentiLecto can easily recognize if the entities are persons, organizations or places!
SentiLecto's output:
Google Panda is a filter that prevents low-quality sites or pages from ranking well in the search engine results page.
Low value content (duplicate, overlapping, or redundant articles) can cause the algorithm to slap down your entire site even if a great deal of your content is valuable.
It mainly uses two statistical methods to identify low-quality content: n-gram models & information gain. The first one can recognize sites that duplicate content. The second one identifies sites that are not bringing additional value.
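To give a feel for the n-gram side of this, here is a minimal Python sketch of word n-gram overlap between two texts. It is not Google's algorithm, whose details are not public; `word_ngrams` and `ngram_overlap` are hypothetical helper names, and Jaccard similarity over word trigrams is simply one common way to measure duplicated wording.

```python
from typing import Set, Tuple


def word_ngrams(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in a text (lowercased, split on whitespace)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity between the n-gram sets of two texts (0.0 to 1.0)."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0  # texts shorter than n words share nothing measurable
    return len(grams_a & grams_b) / len(grams_a | grams_b)


original = "La UCR respaldó la candidatura de Mauricio Macri"
copied = "La UCR respaldó la candidatura de Mauricio Macri"
rewritten = "La candidatura de Macri fue apoyada por la UCR"

print(ngram_overlap(original, copied))     # 1.0: verbatim duplicate
print(ngram_overlap(original, rewritten))  # much lower: same fact, new wording
```

A rewritten sentence that states the same fact shares very few trigrams with its source, which is why rewriting from an abstract representation differs from copying.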
In order to prevent Panda from blacklisting you, you have to really focus on high quality content.
That means content that includes additional information and is free of plagiarism.
entretenimientobit.com is a Turing-test-based blog that employs SentiLecto's engine and the Tecnonews algorithm. Take a look at its Google Analytics statistics:
SentiLecto can recognize the negation scope of these kinds of sentences:
Negation Scope
But saying "no" is not the only way to deny something...
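Nadie respondió las preguntas del profesor. (Nobody answered the teacher's questions.)
El presidente desmintió que haya usado fondos públicos para gastos personales. (The president denied having used public funds for personal expenses.)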
A word-based approach that ignores syntax can only count occurrences. But it's possible to say the opposite using exactly the same words...
Microsoft compró Oracle y sorprendió a todos. (Microsoft bought Oracle and surprised everybody.)
Oracle compró Microsoft y sorprendió a todos. (Oracle bought Microsoft and surprised everybody.)
Instead, a syntax-based approach gives you access to a true representation of the facts expressed. In this way, you would easily know who surprised everybody.
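The point is easy to check with a few lines of Python. The word counts come straight from the two sentences; the SVO tuples are written by hand as an assumption about what a parser would return.

```python
from collections import Counter

a = "Microsoft compró Oracle y sorprendió a todos"
b = "Oracle compró Microsoft y sorprendió a todos"

# Word-based view: count occurrences and ignore order.
bag_a = Counter(a.lower().split())
bag_b = Counter(b.lower().split())
print(bag_a == bag_b)  # True: the two sentences are indistinguishable

# Syntax-based view: (subject, verb lemma, object), written by hand here.
svo_a = ("Microsoft", "comprar", "Oracle")
svo_b = ("Oracle", "comprar", "Microsoft")
print(svo_a == svo_b)  # False: the buyer and the bought company differ
```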
Try to imagine this algorithm acting over large corpora!
copyscape.com is a useful tool for detecting plagiarism on the Internet. It shows only a 31% match between our text and its sources. The article mixes three different sources.
almashopping.com and noticias.com are two of our clients. Our content appears on Google's first page, just under Google's news services!
Still skeptical?
Architecture Overview
Fact Rewriting
Perhaps you think that the entities' order of appearance could be a good hint for solving this kind of problem. Maybe in English, but in Romance languages (like Spanish, Portuguese and many others) constituent order is much freer.
In order to answer "Who did what to whom", full parsing is the best approach.
Full parsing also allows you (we will show it immediately) to identify non-explicit references to entities (which could be persons, companies, countries, organizations). That's impossible with a traditional approach.
Besides, a syntax-based approach can handle passive voice sentences and subordinate clauses. SentiLecto can tell you who said what and who created what in this example:
Bill Gates dijo que Apple fue creada por azar por Steve Jobs. (Bill Gates said that Apple was created by chance by Steve Jobs.)
Previously, I showed you two of our most important algorithms: named entity recognition and fact mining. If we combine their power, we can merge two facts into a complex sentence.