
Data Mining Presentation Template


Data mining presentation

Transcript: The Ethics of Corporate Data Mining
James D'Souza, 4/28/2020

PREMISE
Corporations collect personal data for multiple reasons.

WHY: The Benefits
- It is easier to target advertisers
- Knowing what a person prefers helps better your own algorithm for maximum user attention
- Information gained can be sold to help mitigate outlying costs
- Knowing the user base more intimately can help lead to business decisions

Counter Argument: Reasons Why It's a Bad Idea (If You Really Think About It)
- We do not know who receives our data
- Google and Microsoft applications come with the computer
- After "reading" a 60-page user agreement...

HOW MUCH BANK WE TALKING: Stakeholders

FACEBOOK
Facebook net worth: $86.2 billion
Stuff Facebook collects:
- your browsing history
- the apps you visit and your activity within those apps
- the advertisers who uploaded your contact information to Facebook more than two months earlier
- ads that you interacted with more than two months prior
- age, employer, relationship status, likes and location, expected net worth
- facial biometrics, medical conditions
Based on contacts, it can assume the social bubbles you are in.
The data of an average American is worth between $0.20 and $0.40. Assume the international mean is $0.15: $0.15 x 1.62 billion users is $243 million.

GOOGLE
Google net worth: $133.3 billion
Stuff Google collects:
- your browsing history
- the apps you visit and your activity within those apps
- the advertisers who uploaded your contact information to Google, based on how long you've gone since you've cleared your history
- ads that you interacted with
- age, employer, relationship status, likes and location, expected net worth
- medical conditions, purchase history
- watch times and engagement with advertisements
- ambient noise
Google does not sell personal data. They manage it all in house. By optimizing their algorithm, they enable the products wanted to be advertised in a one-stop shop.

BING
Microsoft net worth: $1 trillion
Stuff MS collects:
- your browsing history
- the apps you visit and your activity within those apps
- the advertisers who uploaded your contact information to Microsoft
- media ads that you interacted with more than two months prior
- age, employer, relationship status, likes and location, expected net worth
- facial biometrics, medical conditions
- watch times and engagement with advertisements
Microsoft does not sell personal data. They manage it all in house. By optimizing their algorithm, they enable the products wanted to be advertised in a one-stop shop.

Governments
- All chatter gets stored in some facility in New Mexico. I heard it on the internet once.
- The US government buys bulk data to survey possible security threats.
- China mass-censors the internet and has set up social credit scores that internet history, among other things, affects.
- Russia and China both use bot accounts to influence foreign policy, benefiting from the echo-chamber environment created by the internet.

Forms of Monetization
Types of monetization: advertisements, sponsorships, subscription-based, crowd-sourcing, donations, data mining.

SWOT: Advertisements
Strengths/Weaknesses:
- Money from a source that is not your user base
- Not effective unless data mining makes targeted advertisements
- You do not have to seek out patrons
Opportunities/Threats:
- Disgruntled viewership
- Minimal intrusion into viewers' finances
- User base gets deals on products they might be interested in
- Lauren Ingraham Incident
- Improves relations with a corporate entity via networking
- Trickle-down economics failures
- Minimal interaction with sponsors

SWOT: Sponsors
Strengths/Weaknesses:
- Money from a source that is not your user base
Opportunities/Threats:
- Free merchandise
- No regular paycheck unless the collaboration is a constant thing
- False positive reviews
- User base gets deals on products they might be interested in
- Improves relations with a corporate entity via networking
- Economic sustainability of sponsors
- Downward spirals faster than advertisements

Raid is a turn-based RPG done right. In case you've been living under a rock and haven't heard, Raid is a badass mobile game that changes everything. The game is crazy popular, with almost 15 million downloads in the last 6 months. Raid is an epic dark fantasy done right: a hero-collecting, turn-based game with over 400 champions to collect and customize. In Raid you can get knights, orcs, undead and more. Raid with friends in a clan and claim glory in the PvP arena. Some other cool features are multi-battle auto mode: set battles to run in auto mode while you do something else. Spend less time grinding and more time developing your team and finding the fun stuff. They also have weekly tournaments and events, such as fighting in the arena, running special dungeons, or leveling up your heroes. There's always a way to compete and win extra prizes every week. The game is growing In
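The Facebook estimate earlier in this transcript ($.15 x 1.62 billion users) can be checked with a few lines of arithmetic. This is only a sketch: the user count and the per-user value are the transcript's own figures, while the function name and the choice to work in whole cents are illustrative assumptions.

```python
# Back-of-the-envelope check of the transcript's Facebook estimate.
# Both figures come from the transcript; working in whole cents avoids
# floating-point rounding.

USERS = 1_620_000_000          # 1.62 billion users
INTERNATIONAL_MEAN_CENTS = 15  # assumed international mean of $0.15 per user


def total_value_dollars(users: int, cents_per_user: int) -> int:
    """Return the total value of a user base in whole dollars (hypothetical helper)."""
    return users * cents_per_user // 100


if __name__ == "__main__":
    total = total_value_dollars(USERS, INTERNATIONAL_MEAN_CENTS)
    print(f"${total:,}")  # $243,000,000
```

The result matches the $243 million stated in the transcript.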

Data Mining Presentation

Transcript:
Model
3. Ensemble as a plus: Bagging/Boosting
2. Mean Integrated Squared Error
Problem Description: Predict a user's star rating on a business, rounded to half-star.
Basic Ideas:
1. Overall learning: learn from the user-business review pool.
2. Targeted learning: learn each user individually based on review history.
Future Problem
1. How others' choices will affect a user's choice
2. How can we learn a user's interest by digging into similar users' interests
Result
Data: Yelp Academic Datasets. Split into: 80% training data + 20% testing data
Result so far
1. Should we ignore the content about a business?
2. Discontinuous Linear Regression on users with a lot of reviews.
Recommendation system based on Yelp
1. More than 40,000 users' reviews
2. More than 10,000 businesses' profiles
3. Sparse matrix with 215,000 non-zero entries (0.05%).
4. More than 7,700 users have more than 40 reviews (20%).
c is the treatment cut-off, D is a binary variable that equals 1 when X ≥ c, and h is the bandwidth
Future Work:
3. Comparing different evaluation criteria.
2. Regularization on loss function
Error may come from:
1. Model is too simple
2. Without any regularization
3. Gray Sheep/Black Sheep
3. Ranked evaluation metric by Heckerman et al. (1998)
Learn from the crowd
Thank you!
1. Mean Squared Error
1. Loss function
1. Adding Log-based relevance model into Collaborative Filtering
2. Perform Linear Regression Discontinuity Design
Memory-based: Pearson Correlation; Vector Similarity
Model-based: Log-based Collaborative Filtering (2006)
Data Mining
2. Feature selection: Matrix Factorization such as Singular Value Decomposition
Evaluation
Learn From the Crowd & Recommendation System Based on Yelp
1. Learn from others: collaborative filtering
1. Build more complex model
Data
3. Cross-validation
2. Personalized recommendation: regression discontinuity design for each user
Chen Liu, Ruize Lu, Weizhe Ni
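The memory-based approach named in this transcript (Pearson correlation between users, scored with mean squared error) could look roughly like the sketch below. It is not the presenters' implementation: the toy ratings, the function names, the mean-centred weighting scheme, and the 1-to-5 star clamp are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {business: stars}); not Yelp data.
ratings = {
    "alice": {"cafe": 4.0, "diner": 2.0, "bistro": 5.0},
    "bob": {"cafe": 5.0, "diner": 1.0, "bistro": 4.5, "pub": 4.0},
    "carol": {"cafe": 2.0, "diner": 5.0, "pub": 1.5},
}


def mean_rating(user):
    """Average of all stars the user has given."""
    vals = list(ratings[user].values())
    return sum(vals) / len(vals)


def pearson(u, v):
    """Pearson correlation over the businesses both users have rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0
    mu = sum(ratings[u][b] for b in common) / len(common)
    mv = sum(ratings[v][b] for b in common) / len(common)
    num = sum((ratings[u][b] - mu) * (ratings[v][b] - mv) for b in common)
    den = sqrt(sum((ratings[u][b] - mu) ** 2 for b in common)) * sqrt(
        sum((ratings[v][b] - mv) ** 2 for b in common)
    )
    return num / den if den else 0.0


def predict(user, business):
    """Similarity-weighted, mean-centred prediction, rounded to a half-star."""
    num = den = 0.0
    for other in ratings:
        if other == user or business not in ratings[other]:
            continue
        w = pearson(user, other)
        num += w * (ratings[other][business] - mean_rating(other))
        den += abs(w)
    raw = mean_rating(user) + (num / den if den else 0.0)
    half_star = round(raw * 2) / 2
    return min(5.0, max(1.0, half_star))  # assumed 1-5 star scale


def mse(pairs):
    """Mean squared error over (predicted, actual) pairs."""
    return sum((p - a) ** 2 for p, a in pairs) / len(pairs)


if __name__ == "__main__":
    print(predict("alice", "pub"))
    print(mse([(4.5, 4.0), (3.0, 3.0)]))
```

Mean-centring each neighbour's rating is one common way to compensate for users who rate systematically high or low; a vector-similarity weight could be swapped in for `pearson` without changing `predict`.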

Data mining

Transcript:
Salam Almahdi, Hussain Alqatari, Iveta Guneva, Wasan Alameer

Background
Data Mining
• The process of collecting large quantities of data and then summarizing and analyzing it to produce previously unknown useful information
Data Warehousing
• A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes (1).

Data Mining History
• The term itself was introduced relatively recently (in the 1990s), but its roots are traced back along three family lines: Classical Statistics, Artificial Intelligence, Machine Learning

Statistics vs. Data Mining
• Statistics is the part of data mining whose job is to differentiate between random noise and significant findings. It also helps to estimate the probability of outcomes (2).
• Data mining is the entire process of data analysis. It includes statistics, forecasting, and operations research (2).

How does data mining work?
• Define the Problem
• Data Gathering
• Model Building & Evaluation
• Knowledge Deployment

The Main Data Mining Tasks
1. Association (Market Basket Analysis): a technique, used in large retail chains, which studies every purchase made by customers to find out which sales are most commonly made together
2. Sequence (or Path Analysis): consisting of patterns where one event leads to another event (such as the birth of a child and purchasing diapers)
3. Classification: examining the features of a newly presented object and assigning it to one of a predefined set of classes
4. Clustering (called Segmentation): dividing a population into a number of subgroups or clusters
5. Forecasting (known as Predictive Analytics): discovering patterns in data that can lead to reasonable predictions about the future (3)

Some of the data mining techniques
• Artificial neural networks (neural networks)
• Decision trees
• Nearest neighbor

Example of data mining application
There are many applications: Oracle, i2 Technologies, SAP, Emptoris, Ariba
Oracle
• 98% of Fortune 500 companies use Oracle
• Oracle generated nearly 40% more revenue in 2011 than in 2010 (Yahoo Finance)

Business impacts (3)
How does data mining impact business?
The organization can seek many competitive advantages by using data mining, such as:
• Strong relationship with customers
• Time efficiency
• High-quality decision making
Customers:
• What they need and the right time they need it.
• Learn how to serve them better
• Distinguish between profitable and unprofitable customers
• Which customers are likely to switch to an alternative supplier in the near future
Time efficiency:
• Short time of activities
High-quality decision making:
• Make confident decisions based on fact, not feeling or guessing.
• Better predictions about future outcomes and scenarios (3)
Example:
• Finding profitable information
• Managers make decisions in a short time

Issues related to data mining
Privacy and Security:
• Data mining is a threat to an individual's privacy
• Companies should inform customers about how they will use any data collected from them.
Mining Complex Knowledge from Complex Data:
• Not all data are independent and identically distributed.
• Integration of data mining and knowledge inference.
Network Setting & Cost:
• Labor, installation, maintenance

Citation
1. "What is a Data Warehouse?" W.H. Inmon, Prism, Volume 1, Number 1, 1995.
2. Berson, Alex, Stephen Smith, and Kurt Thearling. Building Data Mining Applications for CRM. New York: McGraw-Hill, 2000. Print.
3. Lange, Kathy. "Differences Between Statistics and Data Mining." Information Management Dec. 2006. Web. 30 Sept. 2011. <http://www.information-management.com/issues/20061201/1069947-1.html>.
4. Noton, Adriana. "Data Mining and Its Impact on Business." Web. 1 Oct. 2011.
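The Association (Market Basket Analysis) task described in this transcript, finding which sales are most commonly made together, can be illustrated by counting item pairs across baskets. This is a minimal sketch: the transactions and function names are invented for illustration, and support and confidence are the usual association-rule measures rather than anything the slides specify.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets; not data from the presentation.
transactions = [
    {"diapers", "wipes", "milk"},
    {"diapers", "wipes"},
    {"milk", "bread"},
    {"diapers", "milk", "bread"},
    {"wipes", "diapers", "bread"},
]


def pair_counts(baskets):
    """Count how often each unordered pair of items is bought together."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def count_containing(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Fraction of baskets that contain the whole itemset."""
    return count_containing(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    return count_containing(antecedent | consequent, baskets) / count_containing(
        antecedent, baskets
    )


if __name__ == "__main__":
    print(pair_counts(transactions).most_common(1))
    print(support({"diapers", "wipes"}, transactions))
    print(confidence({"diapers"}, {"wipes"}, transactions))
```

Counting every pair is fine for a handful of baskets; at retail scale an algorithm that prunes rare itemsets first (such as Apriori) would typically be used instead.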

Data Mining Presentation

Transcript: Housing Prices using Advanced Regression Techniques
Data Mining Project
Presented by: Xiaoxia Liu, Faiz Nassur, Mrunal Bokil, Chetana Kamble, Vineeta Agarwal

Introduction
- To predict the residential house price in Ames, Iowa
- Target variable: House Sale Price
- Independent variables: 79 explanatory variables to predict house price
- Data dimensions: 80 columns including the dependent variable (categorical and continuous variables), 2930 rows
- 30% test observations and 70% training

Data Exploration
- Variables showing skewed distribution
- Missing values: missing values constituted up to 40%

Feature Correlation
- Feature correlation graph: all features; important features (threshold > 0.5)
- Sale price variation with housing data variables
- Main attributes

Feature Engineering: existing features
- Numerical features that were actually categorical, e.g. MSSubclass, MonthSold: converted to ordinal
Feature Engineering: creating new features
- e.g. YearBuilt and Yearsold = age at time of sale
Encoding the categorical features
- String format to numeric, e.g. FoundationType = slab/stone/wood
- Created dummy variables (retained the ordinal variables)

Data Preprocessing
- Missing value imputation: mean for continuous, mode for categorical
- Log transformation of skewed continuous variables
- Standardized the continuous data after the test/training split

Model Selection
Six methods: 1) Support Vector Regression 2) Linear Regression 3) Random Forest 4) Gradient Boosting Machine 5) LASSO 6) Ridge
Model RMSE comparison
Final model: GBM

Future Scope
- A neural-network-based model for real estate price estimation
- Actual survey questionnaire data
- Implement other feature scaling techniques

Thank You!

References
https://ac.els-cdn.com/S2352146514002300/1-s2.0-S2352146514002300-main.pdf?_tid=1ca03840-4959-4027-99d2-9983dc442274&acdnat=1528033355_532392cc92e4b7196378469b6c817c23
https://www.hindawi.com/journals/aaa/2014/648047/
https://arxiv.org/ftp/arxiv/papers/1403/1403.2877.pdf
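The preprocessing steps listed in this transcript (mean/mode imputation, log transformation, standardization after the split) and the RMSE metric can be sketched with the standard library alone. The helper names and sample values are hypothetical, and two details are assumptions rather than something the slides state: `log1p` is used for the log transform, and the scaling statistics are taken from the training portion only.

```python
import math
from statistics import mean, mode, pstdev


def impute_mean(values):
    """Fill missing continuous values (None) with the mean of the observed ones."""
    fill = mean([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def impute_mode(values):
    """Fill missing categorical values (None) with the most frequent category."""
    fill = mode([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def log_transform(values):
    """Compress right-skewed, non-negative values; log1p keeps zeros finite (assumption)."""
    return [math.log1p(v) for v in values]


def standardize(train, test):
    """Scale both splits with the training mean and standard deviation (assumption)."""
    mu, sigma = mean(train), pstdev(train)
    sigma = sigma or 1.0  # guard against a constant column

    def scale(xs):
        return [(x - mu) / sigma for x in xs]

    return scale(train), scale(test)


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(mean([(p - a) ** 2 for p, a in zip(predicted, actual)]))


if __name__ == "__main__":
    lot_area = impute_mean([8450.0, None, 11250.0])         # hypothetical values
    foundation = impute_mode(["slab", None, "slab", "wood"])
    train, test = standardize(log_transform(lot_area[:2]), log_transform(lot_area[2:]))
    print(foundation, train, test)
    print(rmse([1.0, 2.0], [1.0, 4.0]))
```

Fitting the scaler on the training rows alone keeps information from the test observations out of the model inputs, which is one reading of "after the split" on the slide.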
