Multi-Core, Many-Core, GPU, HPC: A New Perspective for Data Mining

In this presentation I shall talk about recent developments in computer hardware and their implications for data mining research.

by Nagasuri Bala Venkateswarlu on 26 March 2011

Transcript of Multi-Core, Many-Core, GPU, HPC: A New Perspective for Data Mining

Multi-Core, Many-Core, GPU, HPC: A New Perspective for Data Mining
Professor in Computer Science
AITAM, Tekkali, AP, India

My Talk: What Do I Cover?
The Status of Today's Computer Architectures
How Data Mining Problems Benefit

http://videolectures.net/kdd08_han_mmrfid/

Thanks A Lot

Evidently, Data Mining Problems are:
Classification
Clustering
Forecasting and Prediction
Pattern Completion
Estimation

HPC in Biology
BlueGene Project
BlueBrain Project

Challenges with respect to Data Mining
New parallel data mining algorithms, especially in pattern mining, classification, clustering
Multicore CPU data mining
GPU data mining
Cluster-based or grid-based distributed data mining
Cloud data mining
Peer-to-peer data mining
Performance analysis of parallel data mining algorithms
Theoretical foundations of parallel data mining
Cache-aware / Cache-oblivious data mining algorithms
Parallel data-mining over streams
Parallel implementations of data mining algorithms with programming languages well adapted to parallelism, for example functional languages
Applications using parallel data mining in either an industrial or a scientific context
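
As one illustration of the "Multicore CPU data mining" item in the list above, here is a minimal Python sketch of k-means in which the assignment step is split into chunks and handed to a pool of workers. The function names (assign_chunk, parallel_assign, update_centroids, kmeans) and the chunking scheme are assumptions made for this example, not taken from the talk. A thread pool keeps the sketch self-contained; in CPython the global interpreter lock limits the speedup of pure-Python threads, so a process pool or a native library would likely be needed to actually use several cores.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist  # Euclidean distance, Python 3.8+


def assign_chunk(chunk, centroids):
    """Return the index of the nearest centroid for each point in the chunk."""
    return [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in chunk]


def parallel_assign(points, centroids, workers=4):
    """Split the points into chunks and label each chunk in its own worker."""
    size = max(1, -(-len(points) // workers))  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(assign_chunk, chunks, [centroids] * len(chunks)))
    return [label for part in parts for label in part]


def update_centroids(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    new = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            new.append(old)  # an empty cluster keeps its previous centroid
            continue
        new.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new


def kmeans(points, centroids, iterations=10, workers=4):
    """Run a fixed number of k-means iterations; return centroids and labels."""
    for _ in range(iterations):
        labels = parallel_assign(points, centroids, workers)
        centroids = update_centroids(points, labels, centroids)
    return centroids, parallel_assign(points, centroids, workers)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(data, [(0, 0), (10, 10)]))
```

Only the assignment step is parallelized here because each point can be labelled independently of the others; the centroid update is a reduction and is left sequential for simplicity.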

Prominent Problems with DM
Size
Dimensionality
Includes non-numerical data
Imprecise information

How Special Computers Will Solve DM Problems
First, they can cope with the size and dimensionality of the data.
DM algorithms are iterative and settle for sub-optimal solutions. With special machines, we can reach better solutions.
Some DM algorithms are online, so special computers can meet online requirements.

Will there be anything like the Y2K rush?

Graph Mining

The Fourth Paradigm is dedicated to and reflects the vision of the late Jim Gray of Microsoft Research, who envisioned “a world of scholarly resources—text, databases, and any other associated materials—that were seamlessly navigable and interoperable.”

Sailing on 0's and 1's

If one were to choose an index that best represents a nation's military capacity, its high performance computing power would win as the most comprehensive measure. More and more countries view supercomputing technology as a symbol of national military power.

China's Tianhe-1A
Site: National Supercomputing Center in Tianjin
System Family: NUDT MPP
System Model: NUDT YH MPP
Computer: NUDT TH MPP, X5670 2.93 GHz 6C, NVIDIA GPU, FT-1000 8C
Vendor: NUDT
Application area: Research
Main Memory: 229376 GB
Installation Year: 2010
Operating System: Linux
Interconnect: Proprietary
Processor: Intel EM64T Xeon X56xx (Westmere-EP) 2930 MHz (11.72 GFlops)
Cores: 186368
Rmax (GFlops): 2566000
Rpeak (GFlops): 4701000
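
The Rmax and Rpeak figures listed for Tianhe-1A can be related with a little arithmetic. The sketch below computes the Linpack efficiency (Rmax divided by Rpeak) and checks the 11.72 GFlops per-core figure quoted for the Xeon, assuming 4 double-precision floating-point operations per cycle per core. That per-cycle figure and the function names are assumptions of this example, not stated in the slides.

```python
def linpack_efficiency(rmax_gflops, rpeak_gflops):
    """Fraction of theoretical peak actually sustained on the Linpack benchmark."""
    return rmax_gflops / rpeak_gflops


def core_peak_gflops(clock_ghz, flops_per_cycle=4):
    """Theoretical peak of one core; flops_per_cycle=4 is an assumption."""
    return clock_ghz * flops_per_cycle


if __name__ == "__main__":
    # Figures as quoted in the slides for Tianhe-1A.
    eff = linpack_efficiency(2_566_000, 4_701_000)
    print(f"Linpack efficiency: {eff:.1%}")
    print(f"Rmax in PFlops: {2_566_000 / 1_000_000:.3f}")
    print(f"Xeon X5670 per-core peak: {core_peak_gflops(2.93):.2f} GFlops")
```

With the quoted figures the efficiency comes out at roughly 0.546, so a little over half of the theoretical peak is sustained on the benchmark.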

Indian HPC Facilities Listed in the Top 500 as of Nov 2010

Computational Research Laboratories, Tata Sons: 48
Indian Institute of Tropical Meteorology: 137
CDAC: 299
Govt: 491

Other Facilities
Institute of Mathematics
IIT Kanpur
DRDO-Hyderabad
TIFR
IISc, Bangalore
IIT-Delhi

What applications can use the 128 cores expected in 2013?

Over the same time period, real-time and archival data will increase as fast as or faster than computing power.
Internet data fetched to local PC or stored in “cloud”
Surveillance, Environmental monitors, Instruments such as LHC at CERN, High throughput screening in bio- and chemo-informatics

Results of Simulations
Intel's RMS (Recognition, Mining, Synthesis) analysis suggests that gaming and generalized decision support (data mining) are ways of using these cycles.

The Landscape of Parallel Computing Research: A View from Berkeley
Composition of an application: seven dwarfs

Some More Challenging Problems That May Evolve
Livestock Managing and Monitoring
E-science
M-Science
Sensor Networks

Some of Our Work
S.N. Tirumala Rao, E.V. Prasad and N.B. Venkateswarlu, “Parallelization of Data Mining Algorithms along with Memory Mapped Files on Dual-Core Processors”, accepted in IJCSNS, 2009.
S.N. Tirumala Rao, E.V. Prasad, N.B. Venkateswarlu and G. Samba Siva Rao, “Performance Evaluation of Memory Mapped Files on Dual Core Processors Using Large Data Mining Data Sets”, International Journal of Systems and Technologies, Vol. 2, No. 1, pp. 137-148, 2009.
S.N. Tirumala Rao, E.V. Prasad and N.B. Venkateswarlu, “Performance Evaluation of Memory Mapped Files with Data Mining Algorithms”, International Journal of Information Technology & Knowledge Management, Vol. II, Issue II, pp. 365-370, Dec. 2009.
S.N. Tirumala Rao, E.V. Prasad and N.B. Venkateswarlu, “A Critical Performance Study of Memory Mapping on Multi-Core Processors: An Experiment with k-means Algorithm with Large Data Mining Data Sets”, IJFCA journal (International Journal on Futuristic Computer Applications), number 9, March 2010. www.ijcaonline.org/archives/number9/211-358.
N.B. Venkateswarlu and R.D. Boyle, “New segmentation techniques for document image analysis”, Image and Vision Computing, pp. 573-583, 1995.