By Lars Vilhuber
Social Science Gateway to TeraGrid @ Cornell University

What?
  Usual social science data management and analysis tools not available on TeraGrid (TG) itself: SAS, SPSS, Stata
  Also provides standard TG tools: R, compilers, etc.
  Graphical desktop access to SSG compute nodes
  High-speed (10 Gbps) access to TeraGrid and thus to thousands of compute nodes and/or data
  Local data storage (base allocation includes some storage; additional storage from Cornell storage cloud at cost)

TeraGrid
U.S. Census RDC

What?
  More information

How?
  Request TG allocation
  Request SSG account
  Start using SSG
  Ready for TG?

What else?
  Other Cornell resources
  Cornell TG Matlab cluster
  Additional storage from Cornell storage cloud at cost

SSG = VirtualRDC
  VirtualRDC replicates the environment of the U.S. Census RDC network (Linux OS, GUI)
  Provides zero-obs datasets replicating data structure of RDC confidential data
  Provides (where available) synthetic versions of confidential datasets, including early-release access

Who?
  Researcher has existing theory/model/data
  Project requires substantially more resources than typically locally available
  Researcher is comfortable with the traditional data tools (SAS, SPSS)
  Less comfortable with TeraGrid tools

Iterate! Until finished

TeraGrid? What is that?

You have this great idea...
  But it requires substantial compute resources, and your real expertise is with SAS/SPSS/Stata (the usual social science tools)

Now you've received your TG allocation and your TG credentials...
  Now you have TG access
  Access to SSG and its tools

Now you need to transfer your data to the SSG
  Log on to the graphical desktop, "pull" or "push" your data to the SSG, and start using SAS/SPSS/Stata

Your data has been prepped on the SSG, your model has been tested, now you push your model to the TeraGrid for the "real" computation...
  Oops... something not quite right... back to the SSG...
  After several iterations between data prep stage and estimation stage (the usual...), you're done!

U.S. Census Research Data Centers (RDC)
  The Census RDC network provides secure and controlled access to a large number of very detailed (and thus confidential) datasets on the U.S. population and economy.