Adzuna Job Salary Prediction

by Dayna Karns on 7 December 2013


Natural Language Processing
Natural language processing (NLP) is the field concerned with how computers interact with and accommodate human languages.

The hard part of natural language processing is getting the computer to understand human language.

Because almost all of our data was text, we had to find and learn a program that would let us break these words down and match job descriptions to job titles in order to predict the salary.
This was tricky because there are so many such programs and we did not know which one to look for.

Unstructured Data
Information that does not fit in a traditional row-column database
Includes many types of business documents, such as presentations, webpages, and word-processing documents
Unstructured data is hard to analyze, but many businesses believe the information it contains could help them succeed in a competitive environment

Project Objective
To build a prediction engine for the salary of any UK job listing
To help employers and jobseekers find the market worth of different positions
Only about half of the job ads listed on the website publicly display a salary

Adzuna's Goal
About Adzuna
Search engine for job, property, and car ads
Based in the UK
Started in 2011 and is run by only 12 people
Uses social networks and market statistics to provide information to users, making search results more efficient and productive for the user

Tokenization
After digging deeper into natural language processing, we learned about tokenization.
Tokenization is an NLP step in which streams of text are broken up into words or phrases called tokens.
The tokens are then used as input to further processing, such as parsing or text mining.
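A minimal sketch of tokenization in Python, assuming a simple regular-expression definition of a word token. This is not the program used in the project; the `tokenize` and `ngrams` helpers and the sample job ad are hypothetical. The second helper shows how consecutive words can be joined into phrase tokens (bigrams).

```python
import re

# Assumption: a token is a run of lowercase letters/digits, optionally with
# one internal apostrophe (so "you'll" stays one token). Real tokenizers
# also handle hyphens, currency amounts, URLs, and so on.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)?")

def tokenize(text):
    """Lowercase the text and return its word tokens in order."""
    return TOKEN_RE.findall(text.lower())

def ngrams(tokens, n=2):
    """Join each run of n consecutive tokens into a single phrase token."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

words = tokenize("Senior Data Analyst needed; you'll work with SQL.")
print(words)
# ['senior', 'data', 'analyst', 'needed', "you'll", 'work', 'with', 'sql']
print(ngrams(words)[:3])
# ['senior data', 'data analyst', 'analyst needed']
```

Phrase tokens such as "data analyst" keep together words that mean something different as a pair, which matters when job titles are multi-word.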

Parsing
Parsing breaks the text of a phrase down into its basic parts of speech and describes the form and function of each word within the sentence.
To us, parsing seemed to focus more on the grammar of phrases than on relating phrases to specific words, namely job descriptions to job titles.
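To illustrate what a parser produces, here is a toy recursive-descent parser in Python. It is a sketch under strong assumptions: the three-rule grammar, the hand-written `LEXICON`, and the `parse` helper are all hypothetical, and the sentences are taken from the "Our Language" examples. The output is a set of grammatical roles (subject, verb, object), which shows why parsing alone does not link a job description to a job title.

```python
# Hypothetical toy lexicon: maps each known word to its part of speech.
LEXICON = {
    "jack": "PROPN", "he": "PRON",
    "lit": "VERB", "forgot": "VERB",
    "a": "DET", "his": "DET", "the": "DET",
    "candle": "NOUN", "wallet": "NOUN", "match": "NOUN",
}

def tag(words):
    """Attach a part-of-speech tag to each word (UNK if not in the lexicon)."""
    return [(w, LEXICON.get(w.lower(), "UNK")) for w in words]

def parse(sentence):
    """Parse a sentence with the toy grammar
         S  -> NP VP
         NP -> PROPN | PRON | DET NOUN
         VP -> VERB NP
    and return its grammatical roles. Raises ValueError if it does not fit."""
    tokens = tag(sentence.rstrip(".").split())
    pos = 0

    def noun_phrase():
        nonlocal pos
        if pos < len(tokens) and tokens[pos][1] in ("PROPN", "PRON"):
            pos += 1
            return tokens[pos - 1][0]
        if (pos + 1 < len(tokens) and tokens[pos][1] == "DET"
                and tokens[pos + 1][1] == "NOUN"):
            pos += 2
            return f"{tokens[pos - 2][0]} {tokens[pos - 1][0]}"
        raise ValueError(f"expected a noun phrase at token {pos}")

    subject = noun_phrase()
    if pos >= len(tokens) or tokens[pos][1] != "VERB":
        raise ValueError(f"expected a verb at token {pos}")
    verb = tokens[pos][0]
    pos += 1
    obj = noun_phrase()
    if pos != len(tokens):
        raise ValueError("unexpected words after the object")
    return {"subject": subject, "verb": verb, "object": obj}

print(parse("He lit a candle."))
# {'subject': 'He', 'verb': 'lit', 'object': 'a candle'}
print(parse("Jack forgot his wallet."))
# {'subject': 'Jack', 'verb': 'forgot', 'object': 'his wallet'}
```

Note that the parser can tell that "He" is the subject, but not that "He" refers to Jack; resolving that takes further processing beyond the grammar.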

Text Mining
Text mining is another form of natural language processing that we looked into. It is the process of obtaining high-quality information from text.
Text mining attempts to structure text, find patterns within the data, and then interpret the output.
Here, "high quality" refers to how relevant the extracted information is.
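One common text-mining technique for scoring relevance is TF-IDF (term frequency times inverse document frequency); it is offered here as an illustration, not as the method used in the project. A term gets a high weight when it is frequent in one document but rare across the collection. The sample job ads and the `tf_idf` helper are hypothetical, and each ad is assumed to be non-empty and already tokenized by splitting on whitespace.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document.

    tf  = occurrences of the term in the doc / number of tokens in the doc
    idf = log(number of docs / number of docs containing the term)
    """
    n_docs = len(docs)
    # Count each term at most once per document to get document frequency.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (c / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, c in counts.items()
        })
    return weights

ads = [
    "nurse required for night shifts".split(),
    "java developer required for fintech team".split(),
    "senior java developer required".split(),
]
weights = tf_idf(ads)
print(max(weights[2], key=weights[2].get))  # senior
print(weights[2]["required"])               # 0.0
```

"required" appears in every ad, so its inverse document frequency is log(1) = 0: a word found everywhere says nothing about any particular ad, while "senior" is the most distinctive term of the third ad.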

Bag of Words Model
In a bag of words model text is represented as a bag, or multi-set, of its words.
Bag of words is commonly used in document classification, where the frequency of each word serves as a feature for training a classifier.
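A minimal bag-of-words sketch in Python, using `collections.Counter` as the multiset. To connect it to the goal of matching job descriptions with job titles, it also scores candidate titles by cosine similarity; that scoring step is an assumed, swapped-in technique rather than something described in the presentation, and the description, titles, and helper names are hypothetical.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Represent text as a multiset of its lowercase words; word order is discarded."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 = no shared words)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

description = "staff nurse needed to cover night shifts on a busy ward"
titles = ["staff nurse", "java developer", "night porter"]

bag = bag_of_words(description)
scores = {t: cosine_similarity(bag, bag_of_words(t)) for t in titles}
print(max(scores, key=scores.get))  # staff nurse
```

Because the bag ignores order, "nurse staff" and "staff nurse" get identical vectors; that loss of word order is the usual trade-off of this model, and phrase tokens (bigrams) are one way to recover some of it.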

Our Language
Did the clerk put the bananas on the shelf?
Yes.
The ice cream in the freezer?
Yes.

Jack took out a match.
He lit a candle.

Jack forgot his wallet.
Sam did too.
To, too, or two?

Siri
Google Glass
IBM Watson