Best Free Powerpoint Template For Data Mining

Data mining

Transcript: Data Warehousing • A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes (1). How does data mining work? Some of the data mining techniques Oracle • 98% of Fortune 500 companies use Oracle • Oracle generated nearly 40% more revenue in 2011 than in 2010 (Yahoo Finance) Issues related to data mining Network Setting & Cost Privacy and Security Classical Statistics 5. Forecasting (known as Predictive Analytics) 3. Classification Time efficiency Data Mining 2. Sequence (or Path Analysis) i2 technologies examining the features of a newly presented object and assigning it to one of a predefined set of classes Not all data are independent and identically distributed. Integration of data mining and knowledge inference. Make confident decisions based on fact, not feeling or guessing. Better predictions about future outcomes and scenarios (3) Data Mining History Strong relationship with customers Time efficiency High-quality decision making - Data mining is a threat to an individual's privacy - Companies should inform customers about how they will use any data collected from them. 1. "What is a Data Warehouse?" W.H. Inmon, Prism, Volume 1, Number 1, 1995. Define the Problem dividing a population into a number of subgroups or clusters The process of collecting large quantities of data and then summarizing and analyzing it to produce previously unknown useful information Data Gathering Background There are many applications Data Mining 4. Clustering (called Segmentation) Machine Learning 2. Berson, Alex, Stephen Smith, and Kurt Thearling. Building Data Mining Applications for CRM. New York: McGraw-Hill, 2000. Print.
SAP Citation • A technique, used in large retail chains, which studies every purchase made by customers to find out which sales are most commonly made together Business impacts (3) Example of data mining application Artificial neural networks (neural networks): Mining Complex Knowledge from Complex Data: • The term itself was introduced relatively recently (in the 1990s), but its roots are traced back along three family lines: Salam Almahdi Example: - Finding profitable information - Managers make decisions in a short time The organization can seek many competitive advantages by using data mining, such as: Statistics vs. Data Mining How does data mining impact business? 3. Lange, Kathy. "Differences Between Statistics and Data Mining." Information Management Dec. 2006. Web. 30 Sept. 2011. <http://www.information-management.com/issues/20061201/1069947-1.html>. Model Building & Evaluation Hussain Alqatari High-quality decision making: Emptoris 4. Noton, Adriana. "Data Mining and Its Impact on Business." Web. 1 Oct. 2011. Short time of activities • What they need and the right time they need it. • Learn how to serve them better consisting of patterns where one event leads to another event (such as the birth of a child and purchasing diapers) Ariba Knowledge Deployment Artificial Intelligence • Statistics is the part of data mining whose job is to differentiate between random noise and significant findings. It also helps to estimate the probability of outcomes (2). • Data mining is the entire process of data analysis. It includes statistics, forecasting, and operations research (2). Iveta Guneva (3) discovering patterns in data that can lead to reasonable predictions about the future Distinguish between profitable and unprofitable customers Which customers are likely to switch to an alternative supplier in the near future • Decision trees: Wasan Alameer Labor Installation maintenance The Main Data Mining Tasks Nearest neighbor: Customers 1. Association (Market Basket Analysis)

Data Mining

Transcript: Data mining in computer science is the process of discovering interesting and useful patterns and relationships in large volumes of data. Data mining commonly involves four classes of tasks: * Clustering - is the task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data. * Classification - is the task of generalizing known structure to apply to new data. For example, an email program might attempt to classify an email as legitimate or spam. Common algorithms include decision tree learning, nearest neighbor, naive Bayesian classification, neural networks and support vector machines. * Regression - Attempts to find a function which models the data with the least error. * Association rule learning - Searches for relationships between variables. For example a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis. For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they only bought a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered information in various ways to increase revenue. For example, they could move the beer display closer to the diaper display. And, they could make sure beer and diapers were sold at full price on Thursdays. 
Data mining is primarily used today by companies with a strong consumer focus - retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. And, it enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detailed transactional data. DATA MINING Mathematical algorithms, equations, and clustering of data are the backbone of its functioning. COMMERCE AND BUSINESS Data is generated, for example, when a MasterCard is swiped.
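
The market basket example in this transcript (products frequently bought together) can be illustrated with a minimal Python sketch. The baskets below are invented for illustration and are not data from the presentation; `pair_counts` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, made up for illustration only.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"milk", "bread"},
    {"diapers", "milk"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

Counting pairs this way only scales to small item catalogs; it is meant to show the idea of co-occurrence, not a production approach.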

Data Mining

Transcript: Each time a user issues a query, we get ads in positions 1, 2 or 3. The depth is at most 3 and it is >= position. These 3 rows (or fewer) constitute a search session. Approach #2 Managing sample data: There is too much information and queries take hours, so we will use sample data: Each sample will be cleaned (e.g. duplicate removal or too many missing values) Sample validation: Is this a good sample? Is it random enough? (How can we make sure of this?) Does it have many different values to work with? At this point, we don't believe that we should disregard any of the columns we have, but it is an important point to consider later. "Who do we work for?" If we use different perspectives to look at the information, we hope to find interesting and meaningful information for different stakeholders. Q & A It also brings up several questions: Is it useful to identify sessions? Should we stick to this data as given and start working with it as it is? Understanding our Data What is this? Companies might be interested in: Ads that are popular among a specific age or gender group. Keywords, titles or descriptions with a high click rate. Approach #1 Things to verify (preprocess) Actual Data: Setting (depth, position) may vary in each search session. Click registry. Example: Our user clicked the ad from the second time until the 4th time he issued this query; the 5th time, he did not click. Data mining Final Project Introduction soso.com may be interested in: Keywords that trigger unexpected ads in good positions. Two ads that are not from related items or companies, but appear in the same query. Different queries that trigger the same ads. Data verification and cleansing Rising number of impressions each time the ad was shown. Different types of "treasure" for different types of people...
If this is a log, then we should be able to verify it like this: For User=1, Query=4 and AdiD=6 (with the same title and description), we will retrieve this log: Miners: Dereck Davis Feng-Ren Tsai Gibril Lowe Intan Maghfirah Ruben Berrios Omar The sessions are broken up and then aggregated from different perspectives. So it is hard to rebuild the original session records.
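
The constraint stated in this transcript (depth is at most 3 and is >= position) lends itself to a simple verification pass during preprocessing. The Python sketch below assumes a hypothetical row layout with `depth` and `position` keys; the actual column names and file format of the project's log are not given in the transcript.

```python
# Hypothetical log rows; field names and values are assumptions for illustration.
rows = [
    {"user": 1, "query": 4, "ad": 6, "depth": 3, "position": 1},
    {"user": 1, "query": 4, "ad": 7, "depth": 2, "position": 3},  # position > depth
    {"user": 2, "query": 9, "ad": 6, "depth": 4, "position": 1},  # depth > 3
    {"user": 1, "query": 4, "ad": 6, "depth": 3, "position": 1},  # exact duplicate
]

def is_valid_row(depth, position):
    """Check the stated rule: 1 <= position <= depth <= 3."""
    return 1 <= position <= depth <= 3

def find_invalid(rows):
    """Return the rows that break the depth/position rule."""
    return [r for r in rows if not is_valid_row(r["depth"], r["position"])]

def drop_duplicates(rows):
    """Remove exact duplicate rows while keeping the first occurrence."""
    seen = set()
    unique = []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

invalid = find_invalid(rows)
cleaned = [r for r in drop_duplicates(rows) if r not in invalid]
print(len(invalid), len(cleaned))
```

Whether exact duplicates should really be dropped depends on whether the log aggregates impressions, which the transcript leaves open.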

powerpoint template

Transcript: Nobody knows babies like we do! Quality products. Good customer service. Every kid really loves this store. BABYLOU ABOUT US About Us BabyLou was established in 2004. It has been more than a decade since we started, and we have ensured that we take care of every need and want of every child and infant under one roof, true to the caption “NOBODY KNOWS BABIES LIKE WE DO”. Our benchmark is to provide 100% customer service and satisfaction and continue to deliver the same with a wide range of toys, garments and baby products. Play and Create We Are Best 01 02 03 Block games Building blocks help kids to use their brains. PLAY TO LEARN in Cruising Adventures Our Discoveries Enjoy a sunny vacation aboard a luxury yacht with the LEGO® Creator 3in1 31083 Cruising Adventures set. This ship has all the comforts you need, including a well-equipped cabin and a toilet. Sail away to a sunny bay and take the cool water scooter to the beach. Build a sandcastle, enjoy a picnic, go surfing or check out the cute sea creatures before you head back to the yacht for a spot of fishing. Escape into the mountains Disney Little Princes in Also available for your Babies..... Also... Out of The World… Our responsibility BABYLOU…. Our Responsibility All children have the right to fun, creative and engaging play experiences. Play is essential because when children play, they learn. As a provider of play experiences, we must ensure that our behaviour and actions are responsible towards all children and towards our stakeholders, society and the environment. We are committed to continue earning the trust our stakeholders place in us, and we are always inspired by children to be the best we can be. Innovate for children We aim to inspire children through our unique playful learning experiences and to play an active role in making a global difference on product safety while being dedicated promoters of responsibility towards children.

Data Mining

Transcript: As is common in association rule mining, given a set of item sets the algorithm attempts to find subsets which are common to at least a minimum number of the item sets. Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found. Association rules are usually required to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time. The purpose of the Apriori Algorithm is to find associations between different sets of data. Each set of data has a number of items and is called a transaction. The output of Apriori is a set of rules that tell us how often items are contained in sets of data. Apriori is the best-known algorithm to mine association rules. It uses a breadth-first search strategy to count the support of item sets and uses a candidate generation function which exploits the downward closure property of support. Presentation on Apriori Algorithm Association rule generation is usually split up into two separate steps: 1. First, minimum support is applied to find all frequent item sets in a database. 2. Second, these frequent item sets and the minimum confidence constraint are used to form rules. Association Rules In the data mining context, association rule learning is a popular and well-researched method for discovering relations between variables in large databases. Many algorithms for generating association rules have been presented over time. Some well-known algorithms are Apriori and FP-Growth. By Dheeraj Reddy Jonnalagadda (800752576) Sravani Reddy Burri (800736309) It is used for discovering strong rules in databases using different measures Apriori Algorithm
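
The two steps described in this transcript (find frequent item sets by minimum support, then form rules by minimum confidence) can be sketched in Python as follows. This is a minimal illustration, not the presenters' implementation: the function names, the use of an absolute support count, and the sample transactions are all assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {item set: support count} for all frequent item sets.

    min_support is an absolute count of transactions (an assumption here).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Candidate generation: join frequent (k-1)-sets into k-sets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Downward closure: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Test the surviving candidates against the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

def rules(frequent, min_confidence):
    """Form rules lhs -> rhs with confidence = support(all) / support(lhs)."""
    out = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    out.append((lhs, itemset - lhs, confidence))
    return out

# Hypothetical transactions, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
freq = apriori(transactions, min_support=3)
strong = rules(freq, min_confidence=0.9)
```

With these sample transactions, `freq` holds the four single items and four pairs, and `strong` keeps only the rule beer -> diapers, since every basket containing beer also contains diapers.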

Data Mining

Transcript: Objective Brainstorming Process Results Statistics Conclusion To reduce the number of condition groups by grouping them in a meaningful way Variable Clustering Cancer Predictive Model Renal

10 Clusters                    R-squared with
                             ------------------
                             Own       Next      1-R**2
Cluster      Variable        Cluster   Closest   Ratio
----------------------------------------------------------
Cluster 1    PCG_CANCRA      0.0563    0.0010    0.9447
             PCG_CANCRB      0.2709    0.0068    0.7342
             PCG_CANCRM      0.0660    0.0002    0.9341
             PCG_GYNECA      0.2839    0.0015    0.7172
             PCG_ODaBNCA     0.4854    0.0138    0.5218
----------------------------------------------------------
Cluster 2    PCG_AMI         0.5290    0.0127    0.4771
             PCG_CHF         0.2255    0.0137    0.7853
             PCG_HEART2      0.1228    0.0060    0.8825
             PCG_PERVALV     0.0501    0.0014    0.9513
             PCG_ROAMI       0.4926    0.0176    0.5165
----------------------------------------------------------
Cluster 3    PCG_APPCHOL     0.4355    0.0014    0.5653
             PCG_GIBLEED     0.4974    0.0183    0.5120
             PCG_LIVERDZ     0.0906    0.0020    0.9113
             PCG_PNCRDZ      0.1987    0.0009    0.8020
----------------------------------------------------------
Cluster 4    PCG_CATAST      0.1343    0.0030    0.8683
             PCG_COPD        0.5212    0.0115    0.4843
             PCG_PNEUM       0.3398    0.0055    0.6638
             PCG_RESPR4      0.3373    0.0094    0.6690
----------------------------------------------------------
Cluster 5    PCG_HIPFX       0.0367    0.0019    0.9651
             PCG_MISCL1      0.2258    0.0032    0.7766
             PCG_MISS        0.3842    0.0104    0.6223
             PCG_RENAL1      0.2953    0.0009    0.7054
             PCG_RENAL2      0.2765    0.0042    0.7265
             PCG_SEPSIS      0.0353    0.0019    0.9665
----------------------------------------------------------
Cluster 6    PCG_HEMTOL      0.1082    0.0036    0.8950
             PCG_INFEC4      0.1709    0.0038    0.8323
             PCG_MISCL5      0.2101    0.0100    0.7978
             PCG_MSC2a3      0.2560    0.0166    0.7565
             PCG_NEUMENT     0.2975    0.0233    0.7193
             PCG_SKNAUT      0.2508    0.0112    0.7577
----------------------------------------------------------
Cluster 7    PCG_GYNEC1      0.1114    0.0036    0.8918
             PCG_RENAL3      0.5059    0.0077    0.4979
             PCG_UTI         0.5755    0.0073    0.4276
----------------------------------------------------------
Cluster 8    PCG_HEART4      0.1804    0.0104    0.8282
             PCG_METAB1      0.0915    0.0009    0.9093
             PCG_METAB3      0.3510    0.0288    0.6683
             PCG_MISCHRT     0.3373    0.0144    0.6723
             PCG_PRGNCY      0.0331    0.0008    0.9677
             PCG_SEIZURE     0.1410    0.0058    0.8641
             PCG_STROKE      0.1310    0.0029    0.8715
----------------------------------------------------------
Cluster 9    PCG_ARTHSPIN    0.4062    0.0248    0.6089
             PCG_FXDISLC     0.3174    0.0016    0.6837
             PCG_TRAUMA      0.4766    0.0050    0.5260
----------------------------------------------------------
Cluster 10   PCG_FLaELEC     0.1849    0.0067    0.8206
             PCG_GIOBSENT    0.5385    0.0143    0.4682
             PCG_PERINTL     0.3700    0.0001    0.6301

ANTHROPATHY Cluster by PCG percentages Model on normalized data Use member statistics as predictors Variable Clustering Percentage Miscellaneous By: Results Gynecology Ten Clusters Agenda Objectives Antropathy Pulmonary Suggestions Member Statistics Understanding Condition Groups Metabolism Count How? PULMONARY Heart Disease PREG/STROKE Risk Factor Variable clustering does not necessarily lead to good predictors. Association is biased toward more frequent PCGs. Manual grouping has an inherent issue of subjectivity. 3- Rename and recode Primary Condition Groups MISC Member Statistics Decision Tree Conclusion Hospital 4- Predictive Model Process HEART DISEASE Association Two-stage model Clusters were not significant predictors 1- Choose Best Approach Why? BOWEL_DISEASE Manual Grouping 2- Create Variable Clustering Questions? GYNECOLOGY Stomach Disorder Thank you Bowel Disease CANCER STOMACH DISORDER RENAL Annie Nath Hadi Kassaee Kevin Benaim Ricky Hooton Toan Dinh Brainstorm
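
The "1-R**2 Ratio" column in the variable clustering output of this transcript appears to follow the usual definition for this kind of report: (1 - R-squared with own cluster) / (1 - R-squared with next closest cluster), where smaller values indicate a variable that fits its own cluster well. The Python sketch below assumes that definition and checks it against two rows from the table; `one_minus_r2_ratio` is a hypothetical helper name.

```python
def one_minus_r2_ratio(r2_own, r2_next):
    """Assumed definition: (1 - R^2 own cluster) / (1 - R^2 next closest)."""
    return (1.0 - r2_own) / (1.0 - r2_next)

# Values copied from the PCG_UTI and PCG_AMI rows of the table.
uti = round(one_minus_r2_ratio(0.5755, 0.0073), 4)
ami = round(one_minus_r2_ratio(0.5290, 0.0127), 4)
print(uti, ami)
```

Both results match the reported ratios for those rows (0.4276 and 0.4771). Other rows can differ in the last digit because the table shows R-squared values already rounded to four decimals.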