CESWP Resource Management Strategy
Analogy: a parking lot with a fixed amount of space. Monthly parkers (e.g. core cloud users) are guaranteed a space when they need it, while casual users can still park on demand when space is free, with no guarantee. Amazon EC2 provides three types of instances:
On-Demand Instances: request instances when you need them, with no guarantee that you will get them.
Reserved Instances: pay up front for a 1- or 3-year term in exchange for a guarantee that you will get the resources whenever you ask for them.
Spot Instances: bid for instances. If you are the high bidder, you get the instance(s), but they can be terminated without warning if Amazon needs them.
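The parking-lot analogy behind these three instance types can be illustrated with a toy fixed-capacity pool. This is a hypothetical sketch, not how EC2 actually schedules capacity: the `Pool` class, its method names, the rule that on-demand requests may only use unreserved slots, and the choice to evict the most recently started spot instance are all assumptions, and spot bidding itself is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Toy fixed-capacity pool illustrating the parking-lot analogy.

    Hypothetical model, not EC2's real scheduler: reserved slots are held back
    for their owners, on-demand requests may only use unreserved slots, and spot
    instances fill any idle slot but are evicted when a guaranteed request needs it.
    """
    capacity: int
    reservations: dict = field(default_factory=dict)  # user -> guaranteed slots
    running: list = field(default_factory=list)       # (user, kind) pairs
    evicted: list = field(default_factory=list)       # spot instances terminated

    def __post_init__(self):
        # Guarantees are only meaningful if they fit in the pool.
        if sum(self.reservations.values()) > self.capacity:
            raise ValueError("reservations exceed capacity")

    def _count(self, kind, user=None):
        return sum(1 for u, k in self.running
                   if k == kind and (user is None or u == user))

    def _take_slot(self, user, kind):
        # Use a free slot if there is one, otherwise terminate the most recently
        # started spot instance (one always exists when a guaranteed request is
        # admitted, because reserved + on-demand usage stays below capacity).
        if len(self.running) >= self.capacity:
            victim = next(i for i in reversed(range(len(self.running)))
                          if self.running[i][1] == "spot")
            self.evicted.append(self.running.pop(victim))
        self.running.append((user, kind))

    def request(self, user, kind):
        """Return True if the instance starts, False if it is refused."""
        if kind == "reserved":
            # Guaranteed, but only up to the user's paid-for reservation.
            if self._count("reserved", user) >= self.reservations.get(user, 0):
                return False
        elif kind == "on_demand":
            # Best effort: only slots that nobody has reserved are available.
            unreserved = self.capacity - sum(self.reservations.values())
            if self._count("on_demand") >= unreserved:
                return False
        elif kind == "spot":
            # Spot never evicts anything; it only uses idle slots.
            if len(self.running) >= self.capacity:
                return False
            self.running.append((user, kind))
            return True
        else:
            raise ValueError(f"unknown instance kind: {kind}")
        self._take_slot(user, kind)
        return True


pool = Pool(capacity=4, reservations={"core": 2})
for _ in range(3):
    pool.request("casual", "spot")          # idle slots go to spot instances
print(pool.request("guest", "on_demand"))   # True: 2 slots are unreserved
print(pool.request("core", "reserved"))     # True: evicts a spot instance
print(pool.request("core", "reserved"))     # True: evicts another one
print(pool.request("core", "reserved"))     # False: reservation used up
print(len(pool.evicted))                    # 2
```

The design choice worth noting is that reserved capacity is never handed out as on-demand capacity, which is what makes the "monthly parker" guarantee hold; idle reserved slots are only lent to spot instances, which can be reclaimed.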
You can have a maximum of 20 On-Demand or Reserved Instances and a maximum of 100 Spot Instances. If you want more, you need to make a request that includes a description of your intended use.
Wikipedia: use an optimistic strategy. Vetting articles before allowing them to be published failed; publishing articles first and then vetting and cleaning them up succeeded. The analogy is to keep the rules for cloud use very loose initially and deal with abuses on a case-by-case basis. If necessary, tighten the rules based on experience.
YAGNI ("You Ain't Gonna Need It"): an agile software development principle that you shouldn't develop software you think you may need; develop only what you know you need. The analogy is to keep the rules for cloud governance as simple as possible to start with, covering only the things you know you absolutely must do.
Observation: Is it possible to tie the resource management strategy to the type of use (and the associated lifetime for that use)? Should different percentages of the cloud be allocated to each type of use?
The "Develop" use case needs one VM with a long lifetime.
The "Collaboration" use case needs one VM with a medium lifetime.
The "Run" use case needs one to many VMs with a short lifetime.
The "Analyze/Visualize" use case needs one (or possibly many) VMs with a short-to-medium lifetime.
The "Archive" use case needs long-lived storage for the machine image and data.
Q: Will CESWP support automated pre-emption of VMs?
A: No.
Recommendations (from Hans De Sterck, Rob Simmonds, and Patrick Mann):
Restrict the number of resources available to each user (à la Amazon), but make the limits as large as possible to encourage use!
Consider using "maximum processing seconds" (MPS) as a combined measure of both quantity and duration. For example, a given MPS could resolve to a large number of cores for a short duration or a few cores for a longer duration, depending on a user's needs.
Consider setting limits based on the total resource pool (e.g. a user can't take more than 50% of the available resources).
If you want more resources, make a request (e.g. fill out a form, à la Amazon).
Since it is a relatively small community (i.e. the Space Physics community), consider letting everyone see which resources are being used by whom.
The community can then be more or less self-regulating.
e.g. If I need resources but none are available, and I see that someone has a lot of them, I can contact that person and ask them to free some up. (Suggested by Robert Rankin.)
Have a referee who can be contacted in case of problems.
If a resource request exceeds the maximum (or what is available), allow the user to burst out to a commercial cloud (e.g. Amazon). But don't run the job as a hybrid: run either the whole job in the CESWP cloud or the whole job in the Amazon cloud.
Monitor usage (who, what, how long, what for, how much, etc.).
Other notes/questions:
Q: How big will the cloud be?
A: Small: 16-24 cores per zone for 3 zones; 80 cores for one zone; 128 cores for one zone. Average 2 GB RAM per core. But it can grow over time.
Q: How many users?
A: 6-10 for the first few months. Over time it will grow to support the Space Physics research community.
Q: Does it really make sense to provide VMs for development (the "Develop" use case)? These are long-lasting. Why wouldn't a researcher just use the machine on their desk?
A: We don't want to prejudge the best uses for the cloud. We want to start by allowing all use cases that may make life easier for scientists, and then winnow out the ones that don't make sense or aren't used in practice. There is also an argument that a development environment in the cloud lets a scientist work from anywhere (e.g. at a conference, visiting colleagues, at home) without caring where the development machine is. But we will keep the concern in mind, in case the project scope allows and if it seems important.
Keep the rules as simple as possible. Only add as much complexity as is absolutely necessary.
CESWP Cloud Resource Management Strategy
Recommendation: Consider checkpointing and quiescing long-running instances that are not in use (e.g. in the "Develop" use case) to free up cloud resources. We will keep this in mind, but it is out of scope.
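The resource-limit recommendations in this transcript (an MPS budget, a per-user cap relative to the total pool, and bursting whole jobs out to a commercial cloud) can be combined into one admission check. This is a minimal sketch under stated assumptions: MPS is counted here as core-seconds (cores multiplied by wall-clock seconds), the per-user cap is 50% of the pool, and `Job`, `place_job`, the budget, and the 128-core pool used in the example are hypothetical illustrations rather than CESWP's actual policy.

```python
from dataclasses import dataclass


@dataclass
class Job:
    user: str
    cores: int
    seconds: int

    @property
    def core_seconds(self):
        # Hypothetical MPS accounting: quantity (cores) times duration (seconds).
        return self.cores * self.seconds


def place_job(job, mps_left, pool_cores, cores_in_use, user_cores, max_share=0.5):
    """Return "ceswp" or "commercial" for the whole job; jobs are never split.

    mps_left:     the user's remaining MPS budget, in core-seconds (assumed unit)
    pool_cores:   total cores in the CESWP cloud
    cores_in_use: cores currently allocated to all users
    user_cores:   cores currently allocated to this user
    max_share:    largest fraction of the pool one user may hold (assumed 50%)
    """
    within_budget = job.core_seconds <= mps_left
    within_share = user_cores + job.cores <= max_share * pool_cores
    fits_now = cores_in_use + job.cores <= pool_cores
    if within_budget and within_share and fits_now:
        return "ceswp"
    # Over a limit or over what is available: burst out, but as a whole job,
    # never as a hybrid run split across CESWP and the commercial cloud.
    return "commercial"


BUDGET = 64 * 3600                                   # hypothetical: 230,400 core-seconds
wide = Job("alice", cores=64, seconds=3600)          # many cores, short run
narrow = Job("alice", cores=4, seconds=16 * 3600)    # few cores, long run
print(wide.core_seconds == narrow.core_seconds)      # True: same MPS either way
print(place_job(wide, BUDGET, 128, 0, 0))            # ceswp
print(place_job(Job("alice", 80, 600), BUDGET, 128, 0, 0))  # commercial: over 50% share
print(place_job(narrow, BUDGET, 128, 126, 0))        # commercial: pool is full
```

Returning a single destination for the whole job is what enforces the "no hybrid" rule; a real deployment would also need the usage monitoring and the request-for-more process described above, which this sketch leaves out.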