South Park Speech Prediction
Vectorization using TfidfVectorizer:
GridSearchCV found the highest accuracy with the following parameters:
Modeling used: MultinomialNB
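A minimal sketch of how TfidfVectorizer, MultinomialNB and GridSearchCV can be combined, assuming a scikit-learn Pipeline. The toy lines, the speaker labels and the small max_features grid are illustrative assumptions, not the project's data or its actual search grid.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the transcript lines and speakers.
lines = [
    "please pass the soup and the bread",
    "this soup needs more salt",
    "dinner is ready come and eat",
    "who wants bread with dinner",
    "the team won the match today",
    "pass the ball to the striker",
    "the match starts after training",
    "our team needs a new striker",
]
speakers = ["speaker_a"] * 4 + ["speaker_b"] * 4

# Vectorize with TF-IDF, then classify with multinomial naive Bayes.
pipeline = Pipeline([
    ("vect", TfidfVectorizer()),
    ("clf", MultinomialNB()),
])

# Search over the vocabulary size; the values are tiny because the toy
# vocabulary is tiny (assumption, not the grid used in the project).
param_grid = {"vect__max_features": [5, 10, 20]}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(lines, speakers)

best_max_features = search.best_params_["vect__max_features"]
print(best_max_features, round(search.best_score_, 3))
```

Wrapping the vectorizer and the classifier in one pipeline means the vocabulary is refit inside each cross-validation fold, so the score is not inflated by words seen only in the validation lines.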
Data set description
Determining character speech prediction accuracy
Findings and other applications
Results Presentation format
Who is talking is in question
Cleaning our data set
The main challenge in this data set is the amount of "irrelevant" data:
3. Support characters
50% of the dialogue lines are spoken by only 15 characters.
The other 50% of the dialogue lines are spoken by 3935 characters
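The 50% split can be checked by counting lines per character and accumulating from the most frequent speaker down. A small sketch, assuming the speakers are available as a plain list; the function name and the toy list are hypothetical.

```python
from collections import Counter

def characters_covering(speakers, share=0.5):
    """Smallest number of top speakers whose lines reach `share` of all lines."""
    counts = Counter(speakers)
    total = sum(counts.values())
    running = 0
    for rank, (_, n_lines) in enumerate(counts.most_common(), start=1):
        running += n_lines
        if running >= share * total:
            return rank
    return len(counts)

# Hypothetical toy list: one entry per dialogue line.
toy_speakers = ["a"] * 4 + ["b"] * 3 + ["c", "d", "e"]
print(characters_covering(toy_speakers))
```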
Character Dialogue Prediction
new_text = ["Well, I guess we'll have to roshambo for it. I'll kick you in the nuts as hard as I can, then you kick me square in the nuts as hard as you can..."]
new_text_transform = vect.transform(new_text)
Here new_text is the phrase entered.
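The step after vect.transform(new_text) is to pass the transformed phrase to the fitted classifier. A hedged sketch with toy training data; `model`, the training lines and the speaker labels are assumptions introduced for illustration, while `vect`, `new_text` and `new_text_transform` follow the names on the slide.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training data standing in for the cleaned transcripts.
train_lines = [
    "please pass the soup and the bread",
    "this soup needs more salt",
    "the team won the match today",
    "pass the ball to the striker",
]
train_speakers = ["speaker_a", "speaker_a", "speaker_b", "speaker_b"]

# Fit the vectorizer on the training lines only, then fit the classifier.
vect = TfidfVectorizer()
X_train = vect.fit_transform(train_lines)
model = MultinomialNB().fit(X_train, train_speakers)

# Transform the new phrase with the already-fitted vocabulary, then predict.
new_text = ["I would like more soup and bread"]
new_text_transform = vect.transform(new_text)
predicted = model.predict(new_text_transform)[0]

# Per-speaker probabilities show how close the call was.
probabilities = dict(zip(model.classes_, model.predict_proba(new_text_transform)[0]))
print(predicted, probabilities)
```

Note that the new phrase goes through transform, not fit_transform, so it is scored against the vocabulary learned from the training lines.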
Using a data set that includes the transcripts of 18 seasons of South Park, determine how accurately we can predict which character is most likely speaking, given a random word or set of words.
By Armando Galeana, General Assembly DS-SF-31
Predicted accuracy applying MultinomialNB and cross validation to the characters who speak 50% of the dialogue lines is about 33.3%
It is hard to get a much higher accuracy score because the "Spam & Ham" words per character are very similar, except for the top 4 characters
70896 dialogue lines
3950 unique characters
What would success look like?
Accuracy score must be higher than the percentage of lines of the main character
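The bar described here is the majority-class baseline: always guessing the most frequent speaker scores exactly that speaker's share of the lines. A minimal sketch; the function name and the toy labels are hypothetical.

```python
from collections import Counter

def majority_baseline(speakers):
    """Return the most frequent speaker and the accuracy of always guessing them."""
    counts = Counter(speakers)
    top_speaker, top_count = counts.most_common(1)[0]
    return top_speaker, top_count / len(speakers)

# Hypothetical toy labels: one entry per dialogue line.
toy_speakers = ["a", "a", "a", "b", "b"]
top_speaker, baseline_accuracy = majority_baseline(toy_speakers)
print(top_speaker, baseline_accuracy)
```

A model is only useful for this task if its accuracy on unseen lines is above baseline_accuracy.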
max_features = 850
MultinomialNB accuracy 0.405300702664
Accuracy applying cross validation 0.3333783092
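The two figures differ in how they are estimated. A sketch of both, assuming the first is accuracy on a single held-out split (the slides do not say) and the second is the mean over cross-validation folds. The toy data, split size and fold count are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: six lines per speaker.
lines = [
    "please pass the soup and the bread",
    "this soup needs more salt",
    "dinner is ready come and eat",
    "who wants bread with dinner",
    "the bread is still warm",
    "add salt to the dinner",
    "the team won the match today",
    "pass the ball to the striker",
    "the match starts after training",
    "our team needs a new striker",
    "training was hard today",
    "the striker missed the ball",
]
speakers = ["speaker_a"] * 6 + ["speaker_b"] * 6

# max_features=850 is the value reported on the slides.
model = make_pipeline(TfidfVectorizer(max_features=850), MultinomialNB())

# Estimate 1: accuracy on one held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    lines, speakers, test_size=0.25, stratify=speakers, random_state=0
)
holdout_accuracy = model.fit(X_train, y_train).score(X_test, y_test)

# Estimate 2: mean accuracy over stratified cross-validation folds.
cv_scores = cross_val_score(model, lines, speakers, cv=3, scoring="accuracy")
cv_accuracy = cv_scores.mean()

print(holdout_accuracy, cv_accuracy)
```

A single split can land on an easy or hard subset, so the cross-validated mean is usually the more conservative number to report.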
Main character's percentage of lines
Future of primary healthcare?
Now, imagine that instead of having characters you have user profiles, and instead of episode dialogues you have patient-doctor conversations...
Could you determine the success of a new treatment based on spoken symptoms?
Can you predict what may be the most likely issues of a new patient?