
Data Mining Presentation Template


Data mining presentation

Transcript: The Ethics of Corporate Data Mining
James D'Souza, 4/28/2020
Premise: Corporations collect personal data for multiple reasons.
Why (the benefits): It is easier to target advertisements. Knowing what a person prefers helps improve your own algorithm for maximum user attention. Information gained can be sold to help mitigate outlying costs. Knowing the user base more intimately can help guide business decisions.
Counter argument (reasons why it's a bad idea, if you really think about it): We do not know who receives our data. Google and Microsoft applications come with the computer. After "reading" a 60-page user agreement...
How much bank we talking? Stakeholders:
Facebook: net worth $86.2 billion. Stuff Facebook collects: data about your browsing history; data about the apps you visit and your activity within those apps; the advertisers who uploaded your contact information to Facebook more than two months earlier; ads that you interacted with more than two months prior; age, employer, relationship status, likes and location, expected net worth; facial biometrics, medical conditions. Based on contacts, it can assume which social bubbles you are in. The data of an average American is worth between $0.20 and $0.40. Assume the international mean is $0.15: $0.15 x 1.62 billion users is $243 million.
Google: net worth $133.3 billion. Stuff Google collects: data about your browsing history; data about the apps you visit and your activity within those apps; the advertisers who uploaded your contact information to Google, based on how long it has been since you cleared your history; ads that you interacted with; age, employer, relationship status, likes and location, expected net worth; medical conditions, purchase history; watch times and engagement with advertisements; ambient noise. Google does not sell personal data. They manage it all in house. By optimizing their algorithm, they enable the wanted products to be advertised in a one-stop shop.
Bing (Microsoft): net worth $1 trillion. Stuff MS collects: data about your browsing history; data about the apps you visit and your activity within those apps; the advertisers who uploaded your contact information to Microsoft; media ads that you interacted with more than two months prior; age, employer, relationship status, likes and location, expected net worth; facial biometrics, medical conditions; watch times and engagement with advertisements. Microsoft does not sell personal data. They manage it all in house. By optimizing their algorithm, they enable the wanted products to be advertised in a one-stop shop.
Governments: All chatter gets stored in some facility in New Mexico. I heard it on the internet once. The US government buys bulk data to survey possible security threats. China mass-censors the internet and has set up social credit scores that are affected by internet history, among other things. Russia and China both use bot accounts to influence foreign policy; these accounts benefit from the echo-chamber environment created by the internet.
Types of monetization: advertisements, sponsorships, subscription-based, crowd-sourcing, donations, data mining.
SWOT, Advertisements (Strengths/Weaknesses, Opportunities/Threats): Money from a source that is not your user base. Not effective unless data mining makes targeted advertisements possible. You do not have to seek out patrons. Disgruntled viewership. Minimal intrusion into viewers' finances. The user base gets deals on products they might be interested in. Lauren Ingraham incident. Improves relations with a corporate entity via networking. Trickle-down economics failures. Minimal interaction with sponsors.
SWOT, Sponsors (Strengths/Weaknesses, Opportunities/Threats): Money from a source that is not your user base. Free merchandise. No regular paycheck unless the collaboration is a constant thing. False positive reviews. The user base gets deals on products they might be interested in. Improves relations with a corporate entity via networking. Economic sustainability of sponsors. Downward spirals faster than advertisements.
Raid is a turn-based RPG done right. In case you’ve been living under a rock and haven’t heard, Raid is a badass mobile game that changes everything. The game is crazy popular, with almost 15 million downloads in the last 6 months. Raid is an epic dark fantasy done right: a hero-collecting turn-based game with over 400 champions to collect and customize. In Raid you can get knights, orcs, undead and more. Raid with friends in a clan and claim glory in the PvP arena. Some other cool features are multi-battle auto mode: set battles to run in auto mode while you do something else. Spend less time grinding and more time developing your team and finding the fun stuff. They also have weekly tournaments and events, such as fighting in the arena, running special dungeons, or leveling up your heroes. There’s always a way to compete and win extra prizes every week. The game is growing In
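The Facebook estimate earlier in this transcript ($0.15 per user across 1.62 billion users) is simple enough to check directly. A minimal sketch of that arithmetic, using only the transcript's own figures; the names are illustrative, and working in integer cents is just a choice to avoid floating-point rounding:

```python
# Figures come from the transcript; the names are illustrative only.
ASSUMED_VALUE_PER_USER_CENTS = 15   # the transcript's assumed international mean of $0.15
FACEBOOK_USERS = 1_620_000_000      # 1.62 billion users


def total_data_value_dollars(value_per_user_cents: int, users: int) -> int:
    """Total value in whole dollars, computed in integer cents to avoid float rounding."""
    return value_per_user_cents * users // 100


total = total_data_value_dollars(ASSUMED_VALUE_PER_USER_CENTS, FACEBOOK_USERS)
print(f"${total:,}")  # prints $243,000,000
```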

Data Mining Presentation

Transcript: Model 3. Ensemble as a plus: Bagging/Boosting 2. Mean Integrated Squared Error Problem Description: Predict a user’s star rating for a business, rounded to the nearest half-star. Basic Ideas: 1. Overall learning: learn from the user-business review pool. 2. Targeted learning: learn each user individually based on review history. Future Problem 1. How others’ choices will affect a user’s choice 2. How we can learn a user’s interests by digging into similar users’ interests Result Data: Yelp Academic Datasets. Split into: 80% training data + 20% testing data Result so far 1. Should we ignore the content about a business? 2. Discontinuous Linear Regression on users with a lot of reviews. Recommendation system based on Yelp 1. More than 40,000 users’ reviews 2. More than 10,000 businesses’ profiles 3. Sparse matrix with 215,000 non-zero entries (0.05%). 4. More than 7700 users have more than 40 reviews (20%). c is the treatment cut-off, D is a binary variable that equals 1 when X ≥ c, and h is the bandwidth Future Work: 3. Comparing different evaluation criteria. 2. Regularization on loss function Error may come from: 1. Model is too simple 2. Without any regularization 3. Gray Sheep/Black Sheep 3. Ranked evaluation metric by Heckerman et al. (1998) Learn from the crowd Thank you! 1. Mean Squared Error 1. Loss function 1. Adding Log-based relevance model into Collaborative Filtering 2. Perform Linear Regression Discontinuity Design Memory-based: Pearson Correlation; Vector Similarity Model-based: Log-based Collaborative Filtering (2006) Data Mining 2. Feature selection: Matrix Factorization such as Singular Value Decomposition Evaluation Learn From the Crowd & Recommendation System Based on Yelp 1. Learn from others: collaborative filtering 1. Build more complex model Data 3. Cross-validation 2. Personalized recommendation: regression discontinuity design for each user Chen Liu, Ruize Lu, Weizhe Ni
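The transcript names memory-based collaborative filtering with Pearson correlation, half-star rounding, and mean squared error, but gives no implementation. The following is a minimal sketch of how those pieces could fit together, not the presenters' code. The toy ratings, the function names, the mean-centered weighting scheme, and the 1 to 5 star clipping range are all assumptions made for illustration.

```python
from math import floor, sqrt


def pearson(a, b):
    """Pearson correlation over the businesses both users have rated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[k] for k in common) / len(common)
    mean_b = sum(b[k] for k in common) / len(common)
    cov = sum((a[k] - mean_a) * (b[k] - mean_b) for k in common)
    var_a = sum((a[k] - mean_a) ** 2 for k in common)
    var_b = sum((b[k] - mean_b) ** 2 for k in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict(ratings, user, business):
    """User's mean rating plus the similarity-weighted deviations of other raters."""
    user_mean = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or business not in theirs:
            continue
        weight = pearson(ratings[user], theirs)
        other_mean = sum(theirs.values()) / len(theirs)
        num += weight * (theirs[business] - other_mean)
        den += abs(weight)
    return user_mean if den == 0 else user_mean + num / den


def round_half_star(x):
    """Clip to the assumed 1-5 star range, then round to the nearest half star."""
    clipped = min(5.0, max(1.0, x))
    return floor(clipped * 2 + 0.5) / 2


def mean_squared_error(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Toy ratings invented for illustration (not taken from the Yelp dataset).
ratings = {
    "u1": {"b1": 5.0, "b2": 3.0, "b3": 4.0},
    "u2": {"b1": 4.0, "b2": 2.0, "b3": 3.0, "b4": 4.0},
    "u3": {"b1": 2.0, "b2": 5.0, "b4": 1.0},
}
print(round_half_star(predict(ratings, "u1", "b4")))  # prints 5.0
```

In this toy case u2 agrees perfectly with u1 and rated b4 above their own average, while u3 disagrees perfectly and rated it below theirs, so both push the prediction above u1's mean of 4.0 and the result is clipped to 5.0.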

Data Mining Presentation

Transcript: Tomato Leaf Disease Detection with CNN Merge Model and Digital Image Processing
Submitted By: Asfia Rahman (ID: 19201103096), Tazmul Hassan (ID: 19201103083), Farzana Akter Nipa (ID: 19201103084), Md. Rakibul Islam (ID: 19201103097), Mustain Murtaza Taib (ID: 18193103003, INTAKE-42)
Submitted To: Amir-Ul-Haque Bhuiyan, Assistant Professor, Department of Computer Science & Engineering
Introduction: Bangladesh, an agriculture-rich country, faces challenges in identifying and addressing plant diseases caused by viruses, fungi, and bacteria. This study proposes a convolutional neural network-based approach for identifying and categorizing tomato leaf diseases. The study aims to improve the quality, quantity, and productivity of tomato crops by addressing these issues. The proposed merge model shows outstanding performance in detecting various tomato leaf diseases.
Different Kinds of Tomato Leaf Disease: images of leaves.
Objectives: Collect and preprocess a diverse dataset of tomato leaf images. Develop a Merge CNN model (MobileNet-V2 and ResNet-50) for accurate identification of tomato leaf diseases. Train and evaluate the model's performance on the dataset. Compare the model's performance with existing methods.
Motivation: Existing methods for disease identification and management are often inefficient and prone to errors. Tomato leaf diseases pose significant challenges to tomato crop productivity in Bangladesh. By improving disease detection and intervention, the model aims to enhance the quality, quantity, and productivity of tomato crops. The model will provide a reliable tool for farmers and agricultural professionals to identify and manage tomato leaf diseases.
Methodology: Pre-trained models: VGG-16, ResNet-50. Work flow.
Result: Training accuracy: 0.994. Validation accuracy: 0.9670.
Limitations: The model's performance may be affected by the quality and quantity of the training data used. The model's performance may be affected by environmental factors such as lighting and camera angle. The model's computational complexity may limit its practical application in real-time scenarios.
Conclusion: The use of a CNN merge model and digital image processing has shown great promise in detecting tomato leaf diseases with high accuracy and efficiency, offering a practical solution to enhance crop productivity and sustainability in Bangladesh.
Future Work: Collecting more diverse and extensive datasets to improve the model's accuracy and generalizability. Developing a user-friendly interface for the model to make it accessible to farmers and other stakeholders in the agriculture industry.
Thank You

Data Mining Presentation

Transcript: Housing Prices using Advanced Regression Techniques
Presented by: Xiaoxia Liu, Faiz Nassur, Mrunal Bokil, Chetana Kamble, Vineeta Agarwal
Data Mining Project
Introduction: Predict residential house prices in Ames, Iowa. Target variable: house sale price. Independent variables: 79 explanatory variables. Data dimensions: 80 columns including the dependent variable (categorical and continuous variables) and 2930 rows, split into 70% training and 30% test observations.
Data Exploration: Several variables show skewed distributions. Missing values constituted up to 40%.
Feature Correlation: Correlation graphs for all features and for important features (threshold > 0.5). Sale price variation with housing data variables. Main attributes.
Feature Engineering, existing features: Some numerical features were actually categorical, e.g. MSSubclass and MonthSold; these were converted to ordinal.
Feature Engineering, creating new features: e.g. YearBuilt and Yearsold combined into age at time of sale.
Encoding the categorical features: String format to numeric, e.g. FoundationType = slab/stone/wood. Created dummy variables (retained the ordinal variables).
Data Preprocessing: Missing value imputation (mean for continuous, mode for categorical). Log transformation of skewed continuous variables. Standardized the continuous data after the train/test split.
Model Selection: Six methods: 1) Support Vector Regression 2) Linear Regression 3) Random Forest 4) Gradient Boosting Machine 5) LASSO 6) Ridge
Model RMSE Comparison. Final Model: GBM
Future Scope: A neural-network-based model for real estate price estimation. Actual survey questionnaire data. Implement other feature scaling techniques.
Thank You!
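The preprocessing steps listed in this transcript (mean and mode imputation, log transformation, standardization after the split, RMSE for comparison) can be sketched in plain Python. This is an illustrative sketch, not the presenters' pipeline: the function names and toy values are invented, `log1p` is used in place of a plain log as an assumption so that zeros are handled, and the log is applied to the one example column rather than only to columns tested for skew. The point it illustrates is that imputation and scaling statistics are learned from the training rows only and then reused on the test rows.

```python
import math
from statistics import mean, mode, pstdev


def fit_numeric(train_column):
    """Learn the imputation value and scaling statistics from training rows only."""
    observed = [v for v in train_column if v is not None]
    fill = mean(observed)  # mean imputation for continuous data
    logged = [math.log1p(fill if v is None else v) for v in train_column]
    sigma = pstdev(logged)
    return {"fill": fill, "mu": mean(logged), "sigma": sigma if sigma > 0 else 1.0}


def transform_numeric(column, params):
    """Impute, log-transform, then standardize using the training statistics."""
    return [
        (math.log1p(params["fill"] if v is None else v) - params["mu"]) / params["sigma"]
        for v in column
    ]


def impute_categorical(train_column, column):
    """Fill missing categories with the most frequent training category (mode)."""
    fill = mode([v for v in train_column if v is not None])
    return [fill if v is None else v for v in column]


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Toy values invented for illustration (not taken from the Ames data).
train_area = [1000.0, None, 3000.0, 2000.0]
test_area = [None, 1500.0]
params = fit_numeric(train_area)
train_scaled = transform_numeric(train_area, params)
test_scaled = transform_numeric(test_area, params)  # reuses training statistics

train_foundation = ["slab", None, "stone", "slab"]
print(impute_categorical(train_foundation, ["wood", None]))  # prints ['wood', 'slab']
```

Fitting on the training column and only transforming the test column keeps test information out of the learned statistics, which is the usual reason for standardizing after the split.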
