The Open Connectome Project
Community Access to High-Resolution Turbulence Simulations and Massive Graphs
The Open Connectome Project: Community Access to EM Brain Images, Automated Annotations, and Neural Graphs

I/O Challenges for Massive Graphs
  Current computing paradigms:
    supercomputing
    cloud batch analysis engines
    custom hardware (Cray XMT)
  All require in-memory or semi-external memory representations of graphs.
  n-body: 10N particles, 1M halos, 1k snapshots
  500M FB users
  1B phone calls
  We reject in-memory graphs as undemocratic!

What's Hard about Graphs?
  Too big for in-memory. Can't do I/O reasonably.
  small and shrinking diameter
  no good cuts
  no locality
  bad I/O lower bounds
  Source: http://sixdegrees.hu/last.fm/

  I/O-friendly (sort-like) algorithms:
    list ranking
    connected components
    minimum spanning forest
    maximal matching
  I/O-unfriendly (traversal) algorithms:
    breadth-first search
    shortest paths
    transitive closure
    diameter
    and anything else you really want to know

I/O Tricks Don't Work
  Partition and parallelize (no good partitions)
  Localize and cache (no locality)
  Stream data (no natural orderings)

Can we build data-intensive analysis for graphs?
  human dMRI
  human brain: 10^11 vertices, 10^15 edges
  Brains aren't social networks.
    spatial properties
    good cuts?
    locality?

Final Thoughts
  Data-intensive services have revolutionized public access to high-resolution scientific data:
    Astrophysics (surveys and n-body)
    Turbulence/MHD/Fluids
    Environmental (observational and models)
    Genomics, biology, and bioinformatics
  Data in graphs and networks represent the next frontier (with fundamental obstacles).

Open Connectome Science Goals
  Describe the brain as a graph of neurons and neural connections.
  Create a data-intensive Web service for:
    (1) analysis (statistical inference) on brain graphs
    (2) queries that correlate graph and spatial (image) data
  Fill in the knowledge gap between neurons (~10) and functional regions (10 ) and use functional knowledge to implement neural circuits and algorithms in silico.

Science Questions (from Jo. Vogelstein)
  How does the brain compute/store information so efficiently?
  How many neuron "cell types" are there?
  Which models of neural computation underlie various cognitive phenomena?
  How can we predict/modify human behavior?

The Good (the data)
  3x3x40 nm pixels
  135,200 x 119,600 images
  ~1154 slices
  15 TB

The Bad (annotations)
  Human brain has 10^11 neurons and 10^15 connections.
  Teams of poorly paid undergraduates have annotated 15!
  The 15 TB of raw data contain only ~300 neurons.

The Ugly (the data)
  Noisy/transparent/anisotropic data makes automated annotation difficult.

An Automated Annotation Platform
  Provide raw image data (2-d subslices and 3-d subvolumes) via a RESTful Web service.
  Scientist implements algorithm on local computer and creates annotations.
  Upload annotations to Open Connectome to visualize.
  We capture all annotations to assemble knowledge about the brain structure!
  Algorithmic crowd sourcing?

3-d Annotations
  Build a client-side GPU-enabled viewer to show arbitrary slices of a sub-volume and the outlines of annotations.
    3-d volume annotation representation
    fast intersection/visibility techniques
  Not my job. (M. Chuang and M. Kazhdan, CS/JHU)

Image from cover of Nature vol 471, # 7337
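As a rough sanity check on the image dimensions quoted above, the arithmetic can be sketched in a few lines of Python. The bit depth is an assumption (1 byte per pixel, uncompressed); the slides do not state how the 15 TB figure was derived, so the result should only be read as an order-of-magnitude comparison.

```python
# Back-of-envelope size of the EM image stack described in the slides.
# ASSUMPTION (not in the slides): 8-bit grayscale, 1 byte per pixel, no compression.
WIDTH_PX = 135_200
HEIGHT_PX = 119_600
SLICES = 1_154           # the slides say "~1154"
BYTES_PER_PIXEL = 1      # assumed

# Pixel spacing from the slides: 3 x 3 nm in-plane, 40 nm between slices.
NM_X, NM_Y, NM_Z = 3, 3, 40

pixels_per_slice = WIDTH_PX * HEIGHT_PX
total_bytes = pixels_per_slice * SLICES * BYTES_PER_PIXEL

# Physical extent of the volume in micrometres (1 um = 1000 nm).
extent_um = (
    WIDTH_PX * NM_X / 1000,
    HEIGHT_PX * NM_Y / 1000,
    SLICES * NM_Z / 1000,
)

print(f"pixels per slice: {pixels_per_slice:,}")
print(f"uncompressed size: {total_bytes / 1e12:.1f} TB")
print(f"extent (um): {extent_um[0]:.1f} x {extent_um[1]:.1f} x {extent_um[2]:.2f}")
```

Under these assumptions the stack comes to roughly 18.7 TB, the same order of magnitude as the 15 TB on the slide, covering a block of tissue about 406 x 359 x 46 micrometres. The anisotropy the slides complain about is visible in the spacing itself: each pixel is more than ten times deeper than it is wide.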
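To make the contrast between scan-like and traversal algorithms concrete, here is a minimal sketch of connected components computed by repeated sequential passes over an edge list. It is an illustration only, not the method used by the project: it assumes the semi-external setting named in the slides (one label per vertex fits in memory, the edge list is read front to back), and the function name is hypothetical. It is also not the sort-based external-memory algorithm from the literature and may need many passes on long paths.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def connected_components_by_scans(num_vertices: int, edges: List[Edge]) -> List[int]:
    """Label every vertex with the smallest vertex id in its component.

    Semi-external sketch: `labels` (one int per vertex) is held in memory,
    while `edges` is only ever read sequentially, start to end. Passes
    repeat until a full scan changes no label.
    """
    labels = list(range(num_vertices))
    changed = True
    while changed:
        changed = False
        for u, v in edges:  # one sequential scan of the edge list
            low = min(labels[u], labels[v])
            if labels[u] != low:
                labels[u] = low
                changed = True
            if labels[v] != low:
                labels[v] = low
                changed = True
    return labels


if __name__ == "__main__":
    # Two components {0, 1, 2} and {3, 4}, plus the isolated vertex 5.
    example_edges = [(0, 1), (1, 2), (3, 4)]
    print(connected_components_by_scans(6, example_edges))
```

The point of the sketch is the access pattern: every pass touches the edges in storage order, so the disk sees large sequential reads. A breadth-first search over the same graph would instead follow each vertex's neighbors wherever they happen to live, which is the random access that the slides describe as I/O unfriendly.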
Presentation by Randal Burns