Data Management at UWM

Policies, procedures, and resources for managing your data at the University of Wisconsin-Milwaukee.
by Brad Houston on 28 August 2013

Transcript of Data Management at UWM

Policies, Procedures, and Resources
Data Management at UWM
Why do I need to manage data?
Somebody usually requires that you do so as a condition of funding.
Both NIH and NSF have specific data management requirements.
Similar requirements are not far off for other government funding agencies.
Even non-government grantors want proof that you're spending their money appropriately!
It's also good practice for research in general.
You can find the specific data you need more easily.
You're more easily able to share data with lab-mates, colleagues, or external researchers.
You make sure that data is available for future research (including by you!)
Depending on how much data you have, managing it can seem overwhelming.
But there are resources available to you!
What is meant by "data" in this context?
Research Data: The Basics
In general, there are three sets of policies to be aware of:
Research Data Policies to Know
Organizing your Data
Helping yourself: Managing your own data
Information Security
Us Helping You: UWM Resources for Data Management
OMB Circular A-110 defines 4 types of data:
Observational
Examples: Sensor data, telemetry, survey data, sample data, neuroimages.
Experimental
Examples: gene sequences, chromatograms, toroid magnetic field data.
Simulation
Examples: climate models, economic models.
Derived or compiled
Examples: text and data mining, compiled database, 3D models, data gathered from public documents.
And I have to manage ALL of that?!
Well... Yes and No.
This data is generally created directly by you, so you're more responsible for its management in general. It is more often irreplaceable.
This data may be reproducible, so describing the steps used to produce it may be more important.
This data is almost always extensive and difficult to reproduce. Document your models and metadata.
The data itself is elsewhere; you are responsible for maintaining the procedures for deriving your results.
What do I need to do to manage data?
(This question will be answered more fully elsewhere. But to get you started:)
What data are you collecting or making?
Can it be recreated? How much would that cost?
How much of it? How fast is it growing? Does it change?
What file format(s)?
What’s your infrastructure for data collection and storage like?
How do you find it, or find what you’re looking for in it?
How easy is it to get new people up to speed? Or share data with others?
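A few of the questions above (how much data, which file formats) can be answered with a quick inventory of the folder where your data lives. The following is a minimal sketch, not a UWM tool: the function name `inventory_by_format` is made up for illustration, and it assumes your data sits in ordinary files under a single folder.

```python
from collections import defaultdict
from pathlib import Path


def inventory_by_format(root):
    """Summarize files under `root` by extension: file count and total bytes."""
    summary = defaultdict(lambda: {"files": 0, "bytes": 0})
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Treat ".CSV" and ".csv" as the same format.
            ext = path.suffix.lower() or "(no extension)"
            summary[ext]["files"] += 1
            summary[ext]["bytes"] += path.stat().st_size
    return dict(summary)
```

Calling `inventory_by_format` on your own data folder returns a dictionary keyed by extension. Running it again a month later and comparing the totals gives a rough answer to "How fast is it growing?"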
Who are the audiences for your data?
You (including Future You), your lab colleagues (including future ones), your PIs
Disciplinary colleagues, at your institution or at others
Colleagues in allied disciplines
The world!
What are your obligations to others?
Funder requirements
Confidentiality issues
IP questions
How do you and your lab get from where you are to where you need to be?
Document, document, document all decisions and all processes!
Secret sauce: the more you strategize upfront, the less angst and panic later.
“Make it up as you go along” is very bad practice!
But the best-laid plans go agley... so be flexible.
And watch your field! Best practices are still in flux.
Raw data is included in this definition.
Not included in this definition:
Preliminary analyses
Drafts of scientific papers
Plans for future research
Peer reviews or communications with colleagues
Physical objects, such as gel samples
You should also be thinking about your Data Management Plan (a document attached to your application that indicates to the grantor how you will manage your data).
Many agencies require this; it's a good idea to have one even if they don't.
All submitted plans must include, at minimum:
Expected Data: types, physical/electronic collections, materials to be produced
Standards for data and metadata format and content
Policies for access and sharing, including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, etc.
Policies and provisions for re-use, re-distribution, and the production of derivatives
Plans for archiving data, samples, and other research products, and for preservation of access to them
Other agencies may have additional requirements, but these are a good starting point.
Federal Policies
Agency Policies
UW-System and UWM Policies
These may or may not apply to you depending on the type of research you're doing.
OMB Circular A-110
NSF: Data Management Plans
NIH: Public Access Policy
Outside Activities Reporting
IRB and/or Animal Care programs
Work with non-university partners
Special Cases?
Research Grant Initiative Awards
Inventions and other Intellectual Property
What if I leave UWM?
Data Security
Data Preservation
File format considerations
Storage media considerations
Data Plans: External Resources
Data Sharing
Graduate School Research Portal
UWM Libraries
UWM Cyberinfrastructure
The least you need to know:
Research data may be subject to Freedom of Information Act (FOIA) requests and disclosure
Data should be retained for a minimum of three years after the close of the project
If your research is paid for in part or whole by federal funds*, this applies to you!
*If your granting agency has an "N" somewhere in it (NIH, NSF, NEH, etc.) that's a good indicator, but don't rely on this-- check it yourself!
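The three-year minimum mentioned above is simple date arithmetic, and it helps to record the resulting date alongside the project. Here is a small sketch; the function name `earliest_disposal_date` is hypothetical, and the three-year default is only the minimum stated above. Your grantor or the university may require a longer period, so check before disposing of anything.

```python
from datetime import date


def earliest_disposal_date(project_close, years=3):
    """Return the first date on which the minimum retention period has elapsed."""
    try:
        return project_close.replace(year=project_close.year + years)
    except ValueError:
        # project_close was 29 February and the target year is not a leap year,
        # so roll forward to 1 March rather than shortening the period.
        return project_close.replace(
            year=project_close.year + years, month=3, day=1
        )
```

For example, `earliest_disposal_date(date(2013, 8, 28))` gives 28 August 2016.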
The least you need to know:
You must inventory and classify your data by information security level
You must meet certain minimum security requirements
The security of your data will be constantly monitored
Note that you are only obligated to provide the raw data-- any interpretation thereof is the requestor's responsibility.
(The good news: FISMA only applies if you are
for the Feds, not for merely taking research money.)
Though making sure your data security is robust is probably not a bad idea in general.
These requirements are by far the most stringent, so get specialist help to make sure you're doing them right.
If you are working with medical/health records, this will apply to you.
The least you need to know:
"Privacy Rule" puts limits on disclosure of protected health information (PHI)
"Security Rule" outlines security requirements for electronic PHI
Rule of thumb re: disclosure under Privacy Rule: Don't disclose PHI to anyone other than the patient unless you get written permission from the patient to do so.
For the security rule, follow basic information security safeguards:
Restrict user permissions to those who need the data for their jobs
Keep physical data storage media in a secure area
Ensure all technical safeguards are in place to prevent a network data breach
For MUCH more information:
This will apply to you if you are working with data pertaining to students.
The least you need to know:
You need to obtain written permission from students and/or their guardians to use student data
(Possible exception: if you're doing your research on behalf of UWM itself)
You need to have safeguards in place to prevent accidental disclosure of data
Aggregate or statistical student data (which does not contain info on specific students) is not covered
Again, much more on this at
NSF requires you to submit a data plan of up to 2 pages as part of your grant application.
Your application won't be accepted without it!
Once your article is accepted for publication, you must submit the final peer-reviewed manuscript to PubMed Central.
NIH wants you to have a data plan as well!
NSF and NIH are only the first grant organizations to require a data plan.
Other grantors will require them sooner or later, so get practice in now!
You should be filling one of these out anyway to disclose any potential conflicts of interest, but grant funding ensures that you will have something to declare.
UWM emails Faculty/Staff when it's time to do these, so be on the lookout!
Research involving animal subjects requires submission of an Animal Care/Use Protocol.
You must take (and pass!) the online Animal Care Program certification.
More info:
Research involving human subjects must be approved by the Institutional Review Board.
In addition to a general research protocol, you will usually need signed consent forms from subjects.
You must take (and pass!) an online training program.
More information:
If your IRB project will last for more than 1 year, be prepared for a review that shows you remain in compliance.
Determine whether you or your partner will hold the "official copy" of data.
Figure this out before you start your research!
Be careful to keep university and non-university funds distinct. Keep them in separate accounts.
More info:
Be sure you are in compliance with all relevant federal, grantor, or university-level policies.
Be prepared to report quarterly to the RGI administration about your progress, in addition to any grantor reports.
You will assign intellectual property from an RGI grant to the University for "protection and development as the University deems appropriate".
You must report any inventions or discoveries as a result of research to the university.
You may do this via the UWM Inventors Portal.
Report inventions BEFORE sharing data or publicly disclosing.
If working with an outside partner, the Office of Technology Transfer will negotiate research agreements and intellectual property assignment.
UWM's Open Research policy prohibits research with restrictions on openness or intellectual freedom--be aware before accepting funds!
This policy may be waived by the VC for research under special circumstances.
In most cases, your data will follow you to your new home.
(This is not an automatic process, however!)
Work with the Office of Sponsored Programs to negotiate a contract on transfer of funds and infrastructure.
(I'm not going to go into a ton of detail here, because that's what the rest of the Boot Camp is for.)
If you do nothing else here*, make sure your data is organized so you (or someone else) can find it later!
*Please don't do "nothing else" here... it's all important, but this has the most effect on your workflow.
Consider the following:
Metadata: How you describe the data as a whole.
May include author info, data subject, project info, etc.
Ontologies: How your data items relate to each other through a hierarchical framework
Usually includes a "controlled vocabulary" to make browsing easier
Tags: "Assigned vocabulary" to make data findable by terms that may or may not appear in the metadata
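One lightweight way to put metadata and tags into practice is a small "sidecar" file saved next to each data file. The sketch below is an illustration under assumptions, not a standard: the field names, the `.meta.json` suffix, and both function names are invented here, and your discipline or repository may already prescribe a metadata schema that you should use instead.

```python
import json
from pathlib import Path


def write_metadata_sidecar(data_file, author, subject, project, tags):
    """Write a JSON metadata record next to `data_file` and return its path."""
    data_path = Path(data_file)
    record = {
        "file": data_path.name,
        "author": author,
        "subject": subject,
        "project": project,
        # Store each tag once, in a predictable order.
        "tags": sorted(set(tags)),
    }
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


def find_by_tag(folder, tag):
    """Return names of data files in `folder` whose sidecar lists `tag`."""
    matches = []
    for sidecar in sorted(Path(folder).glob("*.meta.json")):
        record = json.loads(sidecar.read_text(encoding="utf-8"))
        if tag in record.get("tags", []):
            matches.append(record["file"])
    return matches
```

Because the sidecars are plain JSON text, they stay readable even without this script, which matters when someone else (or Future You) has to find the data later.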
You should, at a minimum, familiarize yourself with your granting agency's requirements for writing/submitting data management plans.
A number of tools exist to help you format your data plan according to agency requirements.
A Google search for "Data Management Plan Templates" will turn up many of these.
DMPTool includes not just templates, but customizable forms for a number of different granting agencies.
Be aware of your responsibilities to maintain data securely.
If it falls under any of the federal statutes I mentioned earlier, assume that extra security is required.
Security at home: make sure that only trusted employees have access.
Security abroad: Keep your computer with sensitive data off the network!
If you NEED to have data on the network, get info security in on this. Don't try it yourself.
In general, keep secure data out of cloud services (Dropbox, Google Docs, etc.)
(There are certain exceptions to this rule which will be discussed in the data security session.)
Provision for this should be in your data plan!
Email or PantherFile
Will I be able to expose my work to a large audience?
What happens when I'm not personally available to share my data?
Journal Repositories
How long is the journal committed to keeping my research data?
Will keeping my data with this journal give me the audience I want?
Discipline Repositories
How long is the repository committed to keeping my research data?
Does my article/data meet any metadata requirements the repository might have?
A (partial) list of repositories is available at
Save your files in non-proprietary formats, if possible.
Examples: TXT for written documents; CSV for spreadsheets; TIFF for images
If non-proprietary formats are not an option, make sure you save the application WITH the data.
This applies to software you wrote specifically for this data as well.
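As an illustration of the non-proprietary-format advice above, tabular data can be written to and read back from CSV with nothing but Python's standard `csv` module. This is a minimal sketch: the function names `save_as_csv` and `load_csv` are made up for this example. Note that CSV stores everything as text, so numbers come back as strings and need converting when you reload them.

```python
import csv


def save_as_csv(path, header, rows):
    """Write a header row and data rows to a UTF-8 CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def load_csv(path):
    """Read a CSV file back as a list of rows; every value is a string."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))
```

Values containing commas are quoted automatically on the way out and unquoted on the way in, so the file can still be opened in any spreadsheet program or text editor.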
Any physical media you can readily get your hands on is probably NOT going to work for long-term storage.
Can you still read one of these, for example?
Still, media such as DVDs, flash drives, external hard drives, etc. may be useful in the short term.
(Provided, of course, you don't walk away with the media and lose your data.)
For longer-term storage (5 years plus), consult an info professional.
The Libraries run an Institutional Repository (IR) which may provide a safe space for storing and sharing your data.
If you have a relatively small amount of discrete, downloadable files, the IR may be an appropriate resource for storing your data.
You will need to have the IR coordinator at the libraries set up a collection and/or give you permission to post to an existing one.
UWM Info Security maintains a list of data classifications at
If your data is anything other than "unclassified", you will probably require Info Security help with encryption, firewalls, etc.
Researcher Central provides most of the information you will need to apply for and administer research grants.
The UWM Data Plan site discusses managing and sharing research data in particular.
Any special IT needs for your research should be directed here.
Cyberinfrastructure also manages the High-Performance Computing cluster:
Thank You
Brad Houston, University Records Officer
This presentation available on-line at
UWM Records Management also offers consults on writing your data management plan.
Take advantage of this service-- it can help you get the "lay of the land" in terms of what is possible for storage, sharing, etc.
In the medium term, your departmental drive may be an option...
It's distributed computing, so in theory it's a better backup solution. Having said that...
Check with your Unit IT before embarking on this plan.
Storage space is not infinite!
What kind of security can they provide on a dept. server?

Consults available for:
Writing computer technology-related portions of grant proposals.
Determining what computing resources should be included in your grant proposals.
Locating computing services to support your research.
