CESWP Mind Map

Mind map for CESWP project
Cybera TechRadar

CESWP
CESWP Goal: To incorporate modeling and simulation tools into the Canadian Space Sciences Data Portal (CSSDP)*.
Why? To make it easy for space scientists to run simulations and models on the data. It must be easy to use and flexible.
Virtual Organization:
Patrick Mann, Cybera
Robert Rankin, University of Alberta (Principal Investigator)
Hans De Sterck, Waterloo
How? 
Move CSSDP to the cloud (i.e. virtualize it using cloud technologies)*
Use Amazon Web Services interfaces
Use MapReduce and Hadoop to do parallel computation for modeling and simulations.
What is MapReduce?
"A programming model and implementation for processing and generating large data sets"
What is Hadoop?
"A framework for running applications on large clusters built of commodity hardware."
Uses Map/Reduce paradigm
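To make the MapReduce idea concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop and not CESWP code: the station names, the readings, and the function names are all made up for illustration. It only shows the map, shuffle, and reduce steps that a framework like Hadoop would spread across a cluster.

```python
from collections import defaultdict

# Hypothetical sample records: (station id, field reading). Not real data.
READINGS = [
    ("station_a", 12.0),
    ("station_b", 7.5),
    ("station_a", 15.0),
    ("station_b", 9.5),
    ("station_c", 3.0),
]


def map_phase(record):
    """Map: emit (key, value) pairs for one input record."""
    station, value = record
    yield station, value


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, max(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


peak_by_station = run_job(READINGS)
print(peak_by_station)
```

Because each map call looks at one record and each reduce call looks at one key, both phases can be split across many machines, which is what makes the model attractive for large data sets.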
Success? "Measured by broad acceptance and use by the space sciences community."
*CSSDP is the first generation CANARIE Network Enabled Platforms project (NEP). CESWP is the second generation NEP project. The projects overlap (CSSDP Oct'08-Dec'10, CESWP Oct'09-Jun'11).
*NB: Move CSSDP platform to cloud, not data. Data stays at the sites.**
**Q: Why does data stay at sites? Why not put it in the cloud?
Simulation Tools (used by Space Physicists)
IDL
MATLAB
Tecplot
High Performance FORTRAN
C, C++
*IDL is expensive and not quite as good as MATLAB. Licenses are available for up to 10 concurrent users on the portal, so licensing issues need to be considered.
**Tecplot is like IDL, but better for visualization.
Enabling Technologies
KVM virtual machines
Eucalyptus (Amazon EC2-style virtualization)
Ubuntu Linux
MapReduce/Hadoop
CANARIE/Cybera Cyberinfrastructure
Web Services
AJAX
Scope of work
Milestone 1: Cloud Simulation Foundation: Virtualization
Milestone 2: Initial Research Cloud Platform
Milestone 3: Cloud-Enabled Simulations Phase 1: No MapReduce
Milestone 4: Geographically Distributed Cloud Platform
Milestone 5: Simulation Distributed Cloud
Milestone 6: International Cloud Platform
Milestone 7: Cloud-Enabled Simulations Phase 2: MapReduce
CESWP stands for...
Cloud-Enabled Space Weather Platform
...actually, it's the Cloud-Enabled Space Weather Modeling and Data Assimilation Platform.
Let's look at the parts of this fancy title...
What is Space Weather?
Solar (and other) radiation "blows" over the Earth.
Earth's magnetosphere deflects much of it...
Some follows the north and south polar flux lines and strikes atmospheric gases, ionizing them and causing the aurora (the Aurora Borealis in the north).
How do space scientists study space weather?
They collect data.
They look for patterns in the data.
They combine data sources to try to detect complex events.
They create models and run simulations to predict behaviour and test hypotheses.
Once developed, a simulation can be used repeatedly (by different scientists) with different data and parameters.
Simulations can often be run in parallel and/or on High Performance Computers (HPC) to speed up execution times.
What does "Cloud-Enabled" mean?
Cloud Computing means taking stuff off physical machines that are near you, and putting it on virtual machines that you access through the Internet. In principle, you don't have to know (physically) where the machines are (or where the software or the data is, for that matter). It all just works. Magic.
Why move stuff to the cloud?
1. Simplify IT management. The physical infrastructure required can be rented and managed by a vendor (e.g. Amazon Elastic Compute Cloud, a.k.a. EC2).
2. Make it really easy for scientists. They don't have to worry about what kinds of machines are running where. All they need is a well-designed web-based user interface, and they can get their job done.
3. Let scientists get on with their work instead of purchasing and running complex and expensive hardware.
4. Run simulations in parallel on the most appropriate hardware available (to the group).
Project Details
1. Move the portal system into a virtualized environment in preparation for movement to an infrastructure-as-a-service cloud computing environment.
2. Install and configure several cloud computing nodes (hardware and software) at the University of Alberta, and move the virtualized system into the cloud environment.
3. Move existing smaller simulations (custom code or run in commercial tools like IDL or MATLAB) into the new cloud environment and integrate them with the virtualized system. Multiple instances of simulations can be run; however, input data will not be parallelized across instances in a single run via a MapReduce-style algorithm.
4. Add nodes at the University of Waterloo and the University of New Brunswick to create a cross-Canada cloud infrastructure using the CANARIE network to transport data and virtual machines.
5. Test and validate the smaller analytics and simulations in the geographically distributed environment. The expected result is an increased ability to use the high-speed network to reduce computation times for numerically intensive research efforts.
6. Add international nodes at partner sites in the US (Colorado) and China (Beijing) to the research cloud.
7. Adapt one or more simulations capable of multi-processing via MapReduce or a similar method to the cloud environment, ensuring geographic sensitivity.
[NB: This is a timeboxed / proof-of-concept activity due to budget constraints, and its result may not be deployed into the production system.]
Simulations
Types of Simulation
1. Simulations run many times with varying parameters
Suitable for cloud with no code changes.
2. Simulations with partially parallelized code
Potentially suitable for cloud, with some code changes.
3. Simulations with highly parallelized code (specifically for grids or clusters)
Not suitable for cloud, but can still be scheduled on specialized HPCs through portal.
Out of Scope: "Embarrassingly parallel" pattern recognition on data sets (falls into first category). Will be included if time permits.
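The first simulation type (the same code run many times with varying parameters) can be sketched as a parameter sweep. This is only an illustration under stated assumptions: `run_simulation` is a hypothetical stand-in for a real model, the parameter names and values are invented, and a local thread pool stands in for what would be separate virtual machine instances in the cloud.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_simulation(solar_wind_speed, density):
    """Hypothetical stand-in for a model run; returns a toy pressure-like value."""
    return density * solar_wind_speed ** 2


def sweep(speeds, densities, max_workers=4):
    """Run the unchanged simulation once per parameter combination."""
    grid = list(product(speeds, densities))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each call is independent, so no simulation code changes are needed.
        results = pool.map(lambda params: run_simulation(*params), grid)
        return dict(zip(grid, results))


results = sweep(speeds=[400.0, 600.0], densities=[5.0, 10.0])
for params, value in sorted(results.items()):
    print(params, value)
```

The runs never talk to each other, which is why this category needs no code changes to move to the cloud. The second and third categories share data between workers during a run, so they depend on how the code was parallelized.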
Michael Hesse, NASA Goddard Space Flight Center
Raymond Walker, UCLA
William Liu, Canadian Space Agency
Aaron Ridley, University of Michigan
Moritz Heimpel, University of Alberta
Rob Simmonds, Grid Research Centre, University of Calgary
**A (why data stays at the sites): Because CSSDP doesn't own the data; it just federates it and accesses it when requests are made.
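The federation answer (CSSDP does not own the data and only accesses it at the sites when a request is made) can be pictured with a small sketch. Everything here is hypothetical: the class name, the dataset id, and the fetch function are invented for illustration and do not describe the real CSSDP interfaces.

```python
class FederatedCatalog:
    """Knows where each dataset lives; holds no copy of the data itself."""

    def __init__(self):
        self._fetchers = {}

    def register(self, dataset_id, fetcher):
        """Record how to reach a dataset at its home site."""
        self._fetchers[dataset_id] = fetcher

    def list_datasets(self):
        """List known dataset ids without touching any site."""
        return sorted(self._fetchers)

    def fetch(self, dataset_id):
        """Contact the owning site only when a request is made."""
        return self._fetchers[dataset_id]()


site_calls = []


def fetch_from_site_a():
    """Hypothetical stand-in for a request to a remote data site."""
    site_calls.append("site_a")
    return [1.0, 2.0, 3.0]


catalog = FederatedCatalog()
catalog.register("magnetometer/site_a", fetch_from_site_a)

print(catalog.list_datasets())  # browsing the catalog moves no data
data = catalog.fetch("magnetometer/site_a")  # data moves only on request
print(data, site_calls)
```

In this picture, moving the platform to the cloud means moving the catalog and the portal, while each fetch still goes back to the site that owns the data.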
