The Life of a Data Point at the HII

by Ken Young on 30 April 2014

Transcript of The Life of a Data Point at the HII

The Life of a Data Point at the University of South Florida Health Informatics Institute

Screening
Recruitment
Sources
Marketing
Social Media
Partners
Forms
Labs
Vocabularies
Participants
Institutions
Data Storage

Security
Data Formats
Data Collection

Data Processing

Structuring
Cleaning
Data Warehousing

Policies
Data Transformation
Techniques
Private Cloud
Download
RESTful API
tranSMART
Exploration
Analysis
Publication
Tools
PlinQ
Metaphlan
Impact
World's largest grant-funded Type 1 Diabetes Data Coordinating Center
Steven W. Fiske, M.A.
Kenneth G. Young II, M.Ed.
Unstructured
Semi-Structured
Structured
Information Technology team comprises about 50 employees
Software Engineers
Computer Engineers
Solutions Architects
Database Engineers
Statistical Programmers
Quality Assurance Analysts
Business Analysts
What does IT do in an epidemiological research environment?
Holistically involved from data collection to analysis
Technology plays a vital role in the conduct and operation of scientific research
Other marketing approaches
Social media, partners, and campaigns are used to recruit participants
Contact registries
Campaigns
Various systems used to store data
Oracle
Hadoop (Big Data)
SQL Server
File System
90% of the world's data was created in the last few years
Structured Data
Semi-structured Data
Unstructured Data
1TB of information is stored during each trading session on Wall Street
Forms
80% of the world's data is unstructured
Data formats vary
Accelerometer
Images
Video
Data is in a more readable form and easily accessible for planning and executing clinical research
Consolidates data from a variety of sources to present a unified view of the data
Consists of clinical, laboratory, operational, and financial data
The emerging field of data science is revolutionizing how we explore data
Data scientists assist in the planning, collection, transformation, analysis, and reporting of clinical trial data and in communicating the results
Your Name
Sponsor a Rack!
E-mail
Medical image archives are increasing by 20-40% annually
30MB X-RAY
150MB MRI
1GB 3D CT SCAN
120MB MAMMOGRAM
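To get a feel for what 20-40% annual growth means for a medical image archive, the compounding can be sketched as below. This is only an illustration: the function name `projected_size`, the 100 TB starting size, and the five-year horizon are assumptions, not figures from the presentation.

```python
def projected_size(initial_tb: float, annual_growth: float, years: int) -> float:
    """Compound an archive size forward by a fixed annual growth rate."""
    return initial_tb * (1 + annual_growth) ** years


# Hypothetical 100 TB archive, projected 5 years at the low and high growth rates.
for rate in (0.20, 0.40):
    size = projected_size(100.0, rate, 5)
    print(f"{rate:.0%} per year -> {size:.1f} TB after 5 years")
```

Under these assumptions the archive roughly grows 2.5-fold at the low rate and more than 5-fold at the high rate.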
Software Engineers
Computer Engineers
Statistical Programmers
Database Administrators
Quality Assurance
Data Scientists
Solutions Architects
Business Analysts
Dr. Doe
Constantly evolving to meet rapid changes in technology
Improves data integrity
Removes data structure complexity for researchers
Provide a suite of options to access and download curated data.
Secure interfaces give external collaborators the ability to run analyses on our high-performance computing cluster (Big Data)
Develop systems to improve data quality through automated data cleaning
Program systems to aggregate data from various sources
Data sharing policies to comply with established standards in the field
Patient portal
7 billion = World population
6 billion = People with cellphones
Electronic case report forms
Specimen/lab system
In 1979 a 250MB hard drive weighed 550 lbs
In 2014 a 16GB microSD card weighs 4/10 of a gram
Cost of storage has decreased, but amount of data has increased
504 TB Hadoop Cluster
1 TB = 1,048,576 MB
336 core processing units
30 nodes
1,792 GB RAM
High-density storage servers
Horizontally scaling compute and storage system
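As a rough check on the cluster figures above, the totals can be converted and averaged per node. This is a back-of-the-envelope sketch: it uses the binary conversion stated on the slide and assumes resources are spread evenly across the 30 nodes, which the presentation does not state.

```python
MB_PER_TB = 1024 * 1024  # binary units: 1 TB = 1,048,576 MB, as on the slide

# Figures taken from the slide.
cluster_tb = 504
nodes = 30
cores = 336
ram_gb = 1792

cluster_mb = cluster_tb * MB_PER_TB

# Per-node averages (assumes an even split, which may not hold in practice).
storage_per_node_tb = cluster_tb / nodes
cores_per_node = cores / nodes
ram_per_node_gb = ram_gb / nodes

print(f"Total storage: {cluster_mb:,} MB")
print(f"Average per node: {storage_per_node_tb:.1f} TB, "
      f"{cores_per_node:.1f} cores, {ram_per_node_gb:.1f} GB RAM")
```

The non-integer core count per node suggests the nodes are not all identical, so the averages are indicative only.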
Online screening
Google Analytics
Recruitment can be a challenging part of a clinical research study
For example, over 100,000 participants were screened for the TEDDY study to reach a sample size of fewer than 10,000
500,000+ clinical research study participants
187+ million participant responses to clinical research questions
Secure web-based system to facilitate the collection of clinical research data
Adverse Event system
Numerous large-scale, international, multi-institute clinical research trials
Globalization/localization
Internal and external researchers need access to transformed data
Data sets can be large and complex
High-performance computing cluster
Grid computing
Graph databases
Data mining
Data values must be verified for integrity
Validation rules must be applied to data
A date of birth should not be April 29, 3014
Data should be run through a series of data checking operations
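The date-of-birth example above can be sketched as a simple validation rule. This is a minimal illustration, not the institute's actual cleaning system: the function name `validate_date_of_birth` and the 120-year age limit are assumptions made for the example.

```python
from datetime import date
from typing import List, Optional

MAX_AGE_YEARS = 120  # assumed upper bound, for illustration only


def validate_date_of_birth(dob: date, today: Optional[date] = None) -> List[str]:
    """Return a list of validation errors; an empty list means the value passed."""
    today = today or date.today()
    errors = []
    if dob > today:
        errors.append("date of birth is in the future")
    elif today.year - dob.year > MAX_AGE_YEARS:
        errors.append("date of birth implies an implausible age")
    return errors


# The example from the slide: April 29, 3014 should be rejected.
print(validate_date_of_birth(date(3014, 4, 29), today=date(2014, 4, 30)))
# A plausible value passes with no errors.
print(validate_date_of_birth(date(1979, 4, 29), today=date(2014, 4, 30)))
```

Returning a list of errors rather than raising on the first failure lets a series of checks run over the same record and report everything at once.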
Leverage tools to migrate and transfer data
The Data Warehouse
Data must be combined from various sources and transformed into a readable format
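Combining records from several sources into one view per participant might look like the sketch below. Everything here is hypothetical: the field names, the sample records, and the `unify` helper are invented for illustration and do not describe the institute's warehouse.

```python
from collections import defaultdict

# Hypothetical records from two source systems, keyed by participant ID.
form_records = [
    {"participant_id": "P001", "date_of_birth": "2010-03-14"},
    {"participant_id": "P002", "date_of_birth": "2011-07-02"},
]
lab_records = [
    {"participant_id": "P001", "glucose_mg_dl": 92},
    {"participant_id": "P003", "glucose_mg_dl": 101},
]


def unify(*sources):
    """Merge records from every source into one row per participant."""
    unified = defaultdict(dict)
    for source in sources:
        for record in source:
            unified[record["participant_id"]].update(record)
    return dict(unified)


warehouse = unify(form_records, lab_records)
for participant_id in sorted(warehouse):
    print(warehouse[participant_id])
```

Participants who appear in only one source still get a row, with the missing fields simply absent; a real warehouse would also need rules for conflicting values.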
R
Interoperable with statistical programs
Technologies and tools provide researchers with advanced capabilities
Researchers must perform advanced statistical analysis and predictive analytics on large amounts of data
SAS
R
Crucial component of our institute