Research Storage at UNSW for Questnet - working


Luc Betbeder

on 30 April 2014

Transcript of Research Storage at UNSW for Questnet - working

Building a long-term Research Data Storage Service for UNSW.

Luc Betbeder
Current State
Strategic Investments
Building the Service
speaking for the whole team...
Large amounts of different kinds of research data being stored
current state
The grey person icons represent different kinds of research activity taking place on campus.
Digital Microscopy using Aperio ScanScope.

Creates super high resolution image files.
Slides service.
Average file size: 3 GB
Directory size: 3.5 TB
and many fit-for-purpose local specialist systems are being used for analysis and computation...
Computer cluster: Leonardi - 3000 nodes
e.g. models of diesel spray and combustion in engines

Storage capacity: no long-term storage...

1 x RAID5 (3 x 2TB) + 1 HS (2TB) :
/home (3TB) + /share/apps (750GB)

1 x RAID50 (5 x 5 x 2TB) + 1 HS (2TB) :
/share/scratch (37TB)
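
The capacity figures above can be sanity-checked with a short calculation. This is a minimal sketch: it assumes "5 x 5 x 2TB" means five RAID 5 groups of five 2 TB disks each, and that hot spares (HS) add no usable space. The function names are illustrative only.

```python
def raid5_usable_tb(disks: int, disk_tb: float) -> float:
    """Usable capacity of one RAID 5 group: one disk's worth of space holds parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * disk_tb


def raid50_usable_tb(groups: int, disks_per_group: int, disk_tb: float) -> float:
    """RAID 50 stripes across several RAID 5 groups, so their usable capacities add up."""
    return groups * raid5_usable_tb(disks_per_group, disk_tb)


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


# Assumed reading of the slide: 3 x 2 TB for /home and /share/apps,
# and 5 groups of 5 x 2 TB for /share/scratch.
home_array_tb = raid5_usable_tb(3, 2.0)
scratch_array_tb = raid50_usable_tb(5, 5, 2.0)

print(f"RAID5:  {home_array_tb} TB = {tb_to_tib(home_array_tb):.2f} TiB")
print(f"RAID50: {scratch_array_tb} TB = {tb_to_tib(scratch_array_tb):.2f} TiB")
```

Under these assumptions the arrays come to 4 TB and 40 TB before filesystem overhead, which is close to (though not exactly) the 3.75 TB and 37 TB quoted on the slide once binary units are taken into account.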

UNSW IT Hosted and Fac IT moderated
Disk - NetApp H/W used
Model and Service Work well
Cost is a barrier for smaller groups = $1K/TB

But is being used:
Sci - 380 TB
Med - 116 TB
Eng - 58 TB
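
As a rough illustration of what these figures add up to, the sketch below totals the usage by faculty and applies the $1K/TB rate. The slide does not say whether that rate is one-off or recurring, so the dollar amounts are only indicative.

```python
# Usage figures and rate as quoted on the slide.
usage_tb = {"Sci": 380, "Med": 116, "Eng": 58}
COST_PER_TB = 1_000  # dollars per TB; billing period not stated in the talk

total_tb = sum(usage_tb.values())
total_cost = total_tb * COST_PER_TB

for faculty, tb in usage_tb.items():
    share = tb / total_tb
    print(f"{faculty}: {tb} TB ({share:.0%} of total), ${tb * COST_PER_TB:,}")

print(f"Total: {total_tb} TB, ${total_cost:,}")
```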
Existing Research Storage Service
Forced to delete files and/or store them on external USB HDD.
Strategic Investments
USB Drives in cupboard in Mech.ENG
From IT: Vishal Sehgal, Amany Nuseibeh, Chris Will,
Sergey Sashin, Seri Charoensri, Berhard Semtner,
Dusan Munizaba, Jim Leeper, Denise Black... (and yes comms too... Greg Sawyer and team). And the key business stakeholders: Barbara Chmielewski, Prof Mark Hoffman, Greg Leslie, Grainne Moran, Maude Frances and all the wonderful participants in the Advisory Groups and Pilot sites. Thank you.
Strategic Investment Planning (3 yr plan)

> process:
Align IT investment to UNSW "Business Domains"
(ie: Research, Academic, Other)

> method:
In-domain prioritising > estimating > voting

> outcome:
Multi-year / multi-stream investment for research storage (to support "Research Practice", meet policy obligations, reduce risk, and deliver on "Providing an excellent research environment, with cutting-edge facilities and equipment.")
Long-Term Storage
A long-term "accessible archive" with metadata capability to support research practice at UNSW.

> No direct charge to researchers or project.

> Principles: large, functional, cheap, extensible, supportable, aligned, secure.

NOT a store for computation or analysis.

Interface / Portal
Devices and Bookings
Enhanced Metadata
Active Storage
A portal for accessing and using the UNSW long-term store and other data storage services (e.g. RDSI node(s), vendor/cloud).

Linked to other UNSW systems (library, research projects, data warehouse, authentication etc.)

It is policy-driven. You create a data plan to use the store.

We would like to link our research device booking system to the portal.

And eventually to provide tools to enable direct ingestion of research device data, device metadata and booking system metadata into the store.
Over time we want to provide the Store with additional metadata, integration, search and collaboration functionality.

This may include federated search (searching across stores) and improving the ingest metadata capabilities of the store.
A self-service capability for researchers and research projects to access "active storage" (for compute) via the portal.

Active storage costs (unlike long-term storage) would be charged back to the projects.
Starting the work.

> strategy and architecture:
Roadmap / vision document created and aligned with other streams of work taking place at UNSW and elsewhere. (Primarily Library and RDSI/Intersect)

> governance:
Boards and Business Advisory Groups established.

Installation and configuration
Testing: July-Aug
Piloting: Sept-Dec

Long-Term Storage
Interface / Portal
Devices (Bookings)
Enhanced Metadata
Active Storage
Architecture and Solution Design
Build: Aug-Dec
Release: 2014

Architecture and Solution Design

Long-Term Storage
Interface / Portal
Devices (Bookings)
Enhanced Metadata
Active Storage
Run the service.
Expand capacity / capability.

> space allocation
Mix of existing targeted high-risk and high-value projects (100) and all new projects (1000).

> support
Mix of project-supported on-boarding for the messy existing projects and self-service data-plan-driven-through-the-portal.

Smarter Data
Smart Devices
> system tests then pilots

> ingest via Web / Script / NFS

> leveraging local IT support
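
A scripted ingest of the kind listed above would typically start by describing the files to be sent. The sketch below only builds a metadata manifest (path, size, checksum) for a directory. It does not call LiveArc or any other storage API, and the field names and `project_id` parameter are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte files are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, project_id: str) -> list[dict]:
    """Walk `root` and return one metadata record per regular file."""
    entries = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        entries.append({
            "project_id": project_id,  # hypothetical field, not from the talk
            "relative_path": path.relative_to(root).as_posix(),
            "size_bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return entries
```

A manifest like this could be handed to whichever ingest route is used (web, script or NFS) so that checksums can be verified after transfer.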

UNSW Storage Service
Local System and Store
Local Support
2013 Pilot Phase
is being built...
NFS (Test)
> procurement process.
2014 Service Established
UNSW Long Term Research Data Storage Service
LiveArc Web
LiveArc Script
> on-boarding of high-risk and high-value projects.

> new projects fill in a data plan and use the interface

> allocations and provisioning behind-the-scenes

> links to RDSI Node (Intersect)

> support functions established

UNSW Storage Service
Local System and Store
NFS (Test)
LiveArc Web
LiveArc Script
Local Support
Leverage existing Systems and Services at UNSW
IT Support
Data Warehouse
IT Systems
Authentication, Monitoring, Backup, Security, Networks....
Data Feed from INFOed (Research Management System), HR, Student, Org data...
Library Systems
Create Research Data Plan
Request UNSW and RDSI Storage
Automagic Provisioning
Manual Provisioning
Vendor Cloud
Allocation Model and Process not visible to end-user.
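
The flow from data plan to provisioning could be sketched as below. Everything here is an assumption made for illustration: the `DataPlan` fields, the 10 TB threshold, and the rule that existing projects go through manual on-boarding are not taken from the UNSW design.

```python
from dataclasses import dataclass

AUTO_LIMIT_TB = 10.0  # hypothetical threshold, not from the talk


@dataclass
class DataPlan:
    project_id: str
    requested_tb: float
    existing_project: bool  # True for legacy projects needing assisted on-boarding


def provisioning_route(plan: DataPlan) -> str:
    """Return 'automatic' or 'manual' for a submitted data plan."""
    if plan.requested_tb <= 0:
        raise ValueError("requested_tb must be positive")
    # Assumed rule: large requests and existing projects get a human in the loop.
    if plan.existing_project or plan.requested_tb > AUTO_LIMIT_TB:
        return "manual"
    return "automatic"


print(provisioning_route(DataPlan("new-small", 2.0, existing_project=False)))
print(provisioning_route(DataPlan("legacy", 2.0, existing_project=True)))
```

The researcher would only see the data plan form and the resulting allocation; the routing decision stays behind the scenes.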
How big is our data problem?
Evaluation process based on these design principles.
> Very strong vendor responses.
Object-Based Storage Devices (OSD)
Hierarchical Storage Management (HSM)
Expanding current Storage System
Oracle SL3000 with 700 Slots / 4 Drives
SGI IS5500 storage array.
0.5 PB
1.0 PB
HSM... Disk and Tape
via SGI
SGI LiveArc = Arcitecta MediaFlux
Ingest with MetaData
Script / API
Web Client (Java)
+Growing (to 3PB)
+Protecting (second copy)
+Staging (dev, UAT)
> Grow the store (3PB)

> Second Copy options (3PB)

> Pre-Prod environments

2013 Extend Phase
UNSW Storage Service - Questions for 2013
Primary Store
Second Copy
On Prem?
2 or 3 Env?

VMs good enough?
Reporting for Governance.
Improving use of Meta-data tools.
Scripting support.
Using RDSI or other stores
Challenges and Next Steps:

Linking to UNSW Storage at Intersect
Accessing RDSI storage at Intersect

Work on Automation and Connectivity
including Authentication.

Putting public Vendor cloud or UNSW private cloud options into the Interface. "DIY-user-pays-compute"
Four Pilot Groups
Mark Wainwright Analytical Centre (Research Division)

Climate Change Research Centre (SCI)

Australian Wetlands & Rivers Centre (SCI)

Leonardi HPC (Mechanical Engineering)
Pilot Groups
via Interface
Manage Access / Groups
Is the timing and price right to use a vendor cloud as the "Second-Copy" for the store?

Building the Service