
Data Mining Employed by IBM

Elizabeth Owens

on 9 November 2013

Transcript of Data Mining Employed by IBM

What is Data Mining?
Method to analyze large amounts of data to "discover" useful information which is then converted into knowledge to make decisions (Palace, 1996)
Think of Data Mining This Way
Imagine you are searching through the mountains in hopes of finding a valuable ore
What Can Be "Discovered"?
Knowledge about internal & external relationships
So what really is data mining?
It is a way to discover valuable knowledge
Computer tools and software search through large amounts of data to find meaning for the user
It can predict future trends, behavior, and growth
Can also find anomalies, relationships, and exceptions that might go unnoticed
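As a rough illustration of finding anomalies that might go unnoticed, the sketch below flags values that sit far from the average of a data set. The z-score rule, the threshold of 2.0, and the sales figures are all assumptions made for this example; they do not come from the presentation or from any particular data mining tool.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mu) / sigma > threshold]

# Hypothetical daily sales figures; the spike at index 6 is made up.
daily_sales = [102, 98, 105, 97, 101, 99, 240, 103, 100, 96]
print(find_anomalies(daily_sales))
```

A simple rule like this only works for a single numeric column; real data mining tools typically look at many variables at once.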
How is This Knowledge Applied?
Using purchase history data of customers to determine which products should be recommended
Use demographic data for specific direct advertisements (Microsoft)
Determine buying habits based on product displays through use of data
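The purchase-history idea above can be sketched as a simple count of which products are bought together. This is a minimal example under stated assumptions: the baskets are invented, and the function names `co_purchase_counts` and `recommend` are hypothetical, not taken from any product mentioned in the presentation.

```python
from collections import Counter
from itertools import combinations

def co_purchase_counts(baskets):
    """Count how often each pair of products appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) gives each pair a single canonical ordering
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
    return counts

def recommend(product, baskets, top_n=2):
    """Return the products most often bought together with `product`."""
    scores = Counter()
    for (a, b), n in co_purchase_counts(baskets).items():
        if a == product:
            scores[b] += n
        elif b == product:
            scores[a] += n
    return [item for item, _ in scores.most_common(top_n)]

# Hypothetical purchase histories, one list per customer visit.
baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
]
print(recommend("bread", baskets, top_n=1))
```

Counting pairs is the simplest possible approach; it ignores how popular each product is overall, which a real recommender would likely correct for.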
Data Mining & Pfizer
Elizabeth Owens
Then what is the difference between data, information, & knowledge?
Data: Numbers, text, facts, etc. that are unorganized and have little meaning or value on their own (Palace, 1996) (Difference)

Information: Data that has been formatted in some way to assign meaning (Difference)

Knowledge: Patterns, relationships, and correlations among the information (Palace, 1996)
With data mining, a computer is basically searching through mounds of data in order to find the valuable ore (patterns, trends, etc.)
Internal: Price, product placement, quantity, turnaround

External: Customer demographics, buying habits, economic indicators, sales forecasts
Real Company Example: Pfizer
Multinational pharmaceutical corporation
Discovers, develops, and manufactures medicine for both humans and animals
Extensive listing of both prescription and over the counter medicine
Examples include Zoloft, Lipitor, Chantix, and Enbrel
Why Pfizer Uses Data Mining
They are taking data from previous clinical trials
They want to "milk as much out of the data as possible"
Other companies rarely go back and look at past trial data
They want to look back at the data to find patterns of any sort
How They Are Using the Information
Example: Look at data across many trials for drug interactions or safety issues
To improve on future clinical trials
To minimize risk
To look for other uses of an already approved drug
Help with bridging trials (e.g. showing that a U.S.-approved drug would be effective in China)
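To make the idea of looking across many trials more concrete, the sketch below pools records from several trials and computes an adverse-event rate for each drug combination. It is only an illustration: the record layout, the drug names, and the function `pooled_event_rates` are hypothetical, and nothing here is presented as Pfizer's actual method.

```python
from collections import defaultdict

def pooled_event_rates(trials):
    """Pool records from several trials and return the adverse-event
    rate for each drug combination."""
    totals = defaultdict(lambda: [0, 0])  # combination -> [events, patients]
    for trial in trials:
        for record in trial:
            combo = frozenset(record["drugs"])  # order of drugs is irrelevant
            totals[combo][0] += int(record["adverse_event"])
            totals[combo][1] += 1
    return {combo: events / patients
            for combo, (events, patients) in totals.items()}

# Hypothetical patient records from two past trials.
trial_1 = [
    {"drugs": ["drug_a"], "adverse_event": False},
    {"drugs": ["drug_a", "drug_b"], "adverse_event": True},
]
trial_2 = [
    {"drugs": ["drug_b", "drug_a"], "adverse_event": False},
    {"drugs": ["drug_a"], "adverse_event": False},
    {"drugs": ["drug_a", "drug_b"], "adverse_event": True},
]

rates = pooled_event_rates([trial_1, trial_2])
for combo, rate in rates.items():
    print(sorted(combo), round(rate, 2))
```

Pooling matters here because a combination that appears only once or twice in any single trial may show a pattern only when the trials are viewed together.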
Advantages of Data Mining
Can perform more effective clinical trials: Knowledge obtained by mining data from previous trials can be used to improve future trials
Increasing revenue: Finding other uses of a drug through data mining means that more people would be potential users, and sales could increase
Improve safety: By going back and looking for safety issues with drug combinations, Pfizer could then warn people about those safety concerns
Disadvantages of Data Mining
Information may not lead to results: While beneficial knowledge may be obtained, that does not mean it will improve the company's finances or efficiency
Could be costly: Employing knowledgeable staff and data mining analysis tools, such as Insightful Miner, could be expensive
The discovered information may be misinterpreted
"Knowledge Discovery in Databases" (Kriegel et al., 2007)
Future of Data Mining
1. Improved ways to pre-process the data: formatting the data for mining is a time-consuming process, and only 10% of the time is actually spent mining
2. Data mining methods will become more user friendly
3. Develop methods to process more complex data
4. Continue to be used by companies to better understand internal and external factors
Data Mining Challenges
Preparing the input data
Data must be properly reorganized and formatted, missing data must be accounted for, and the information must be defined. This is very time consuming and must be done properly (Kriegel et al., 2007)
A unifying framework of data mining does not exist - instead only processes for individual problems (Wang & Wu, 2006)
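The input-preparation step described above can be sketched as a small cleaning routine. This is a minimal example, assuming records arrive as dictionaries with inconsistent field names and some missing numeric values; filling gaps with the median is one common choice made for this illustration, not a method named in the presentation, and the sample records are invented.

```python
from statistics import median

def clean_records(records, numeric_field):
    """Normalize field names, parse one numeric field, and fill
    missing values with the median of the known values."""
    cleaned = []
    for rec in records:
        # Make field names consistent: trim whitespace, lower-case
        row = {key.strip().lower(): value for key, value in rec.items()}
        try:
            row[numeric_field] = float(row.get(numeric_field))
        except (TypeError, ValueError):
            row[numeric_field] = None  # missing or unparseable
        cleaned.append(row)

    known = [r[numeric_field] for r in cleaned if r[numeric_field] is not None]
    fill = median(known) if known else None
    for r in cleaned:
        if r[numeric_field] is None:
            r[numeric_field] = fill
    return cleaned

# Hypothetical raw records with inconsistent keys and gaps.
raw = [
    {"Patient ": "A", "Dose": "10"},
    {"patient": "B", "dose": ""},
    {"PATIENT": "C", "DOSE": "30"},
    {"patient": "D", "dose": None},
]
result = clean_records(raw, "dose")
print(result)
```

Even in this toy case most of the code deals with formatting rather than analysis, which matches the point that preparation takes far more time than the mining itself.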
Challenges Cont.
Developing ways to mine complex data (Wang & Wu, 2006)
Mining a continuous stream of data (Wang & Wu, 2006)
Dealing with non-ideal data (e.g. missing pieces, samples that are not representative, etc.) (Hinman, 2013)
Dealing with large sets of data (Hinman, 2013)
Dealing with data from multiple sources and multiple formats (Hinman, 2013)
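One reason a continuous stream of data is hard to mine is that the full data set is never available at once. The sketch below shows one well-known way around this for simple statistics: Welford's online algorithm, which updates the mean and variance one value at a time without storing the stream. The class name `RunningStats` and the sample values are assumptions made for this example.

```python
class RunningStats:
    """Running mean and sample variance using Welford's online algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new value from the stream into the statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance of the values seen so far."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0

# Hypothetical stream of readings, processed one at a time.
stats = RunningStats()
for reading in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(reading)
print(stats.count, stats.mean, stats.variance)
```

Only three numbers are kept in memory no matter how long the stream runs, which is the property that makes this style of algorithm suitable for streaming data.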
Factors That Enable Data Mining
Tools exist to assist with data mining
The company already has the data - the data just needs to be formatted and mined
Companies want to succeed and data mining gives them internal and external insight to help with just that
Several approaches exist to mining data (e.g. pattern recognition, statistics, software, etc.)
3 Final Points
1. Being able to effectively mine data can give a company a competitive advantage. They can "discover" key information about their customers and internal/external factors.
2. Data mining has great potential as more software is developed to assist with this process and to capture the full benefit.
3. Unlike some other technology capabilities, every company has data, and thus has the ability to mine data.
source: (Kriegel et al., 2007)
source: (Pfizer)
source: (Palace, 1996)
source: (Alexander)
source: (Palace, 1996)
source: (Palace, 1996)
source: (Pfizer Data)
source: (Pfizer Data)
Scholarly Journal References
Wang, Q., & Wu, X. (2006). 10 challenging problems in data mining research. International Journal of Information Technology & Decision Making, 5(4), 597-604. Retrieved from http://cs.uvm.edu/~icdm/10Problems/10Problems-06.pdf

Kriegel, H., Borgwardt, K. M., Kröger, P., Pryakhin, A., Schubert, M., & Zimek, A. (2007, March 23). Future trends in data mining. Data Mining and Knowledge Discovery, 15(1), 87-97. doi:10.1007/s10618-007-0067-9

Alexander, D. (n.d.). Data Mining. Retrieved November 8, 2013, from

Data Mining Helps You Make Better Decisions (n.d.). In Microsoft
Business. Retrieved November 8, 2013, from http://

Difference Between Knowledge and Information (n.d.). In Difference
Between. Retrieved November 8, 2013

Hinman, H. (2013, July 23). 9 Data Mining Challenges From Data
Scientists Like You. In Salford Systems. Retrieved November 8, 2013,
from http://1.salford-systems.com/blog/bid/305673/9-Data-

In Pfizer. Retrieved November 8, 2013, from http://www.pfizer.com/

Palace, B. (1996). In Data Mining. Retrieved November 8,
2013, from http://www.anderson.ucla.edu/faculty/

Pfizer Data Mining Focuses on Clinical Trials (2006, February
23). In Bio IT World. Retrieved November 8, 2013

Pintér, G., Madeira, H., Vieira, M., Majzik, I., & Pataricza, A.
(2005). A Data Mining Approach to Identify Key Factors in
Dependability Experiments. In Dependable Computing -
EDCC 5 (pp. 263-280). N.p.: Springer Berlin Heidelberg.
Retrieved November 8, 2013.
