
Preservation policies define how to manage digital assets in a repository to avert the risk of content loss. They specify, amongst other things, data storage requirements, preservation actions, and responsibilities. A preservation policy specifies digital preservation goals to ensure that:

  • Digital content is within the physical control of the repository;
  • Digital content can be uniquely and persistently identified and retrieved in the future;
  • All information is available so that digital content can be understood by its designated user community;
  • Significant characteristics of the digital assets are preserved even as data carriers or physical representations change;
  • Physical media are cared for;
  • Digital objects remain renderable or executable;
  • Digital objects remain whole and unimpaired and it is clear how all the parts relate to each other; and
  • Digital objects are what they purport to be.

Why should we care?

Preservation Risks

Digital Preservation

Tutorial

Preservation Risks

Research Data

PREMIS

Sustainable Digital Preservation

Why should we care?

Websites

Preservation Approaches

OAIS

Video/Audio

Case Studies

Stick figures by Randy Borum (http://community.articulate.com/members/RandyBorum/default.aspx)

Digital Preservation Tutorial

By Nir Sherwinter

Agenda:

Outdated media

  • Why should we care?
  • Preservation Risks
  • Preservation Approaches
  • What is OAIS?
  • What is PREMIS?
  • What is METS?
  • How to get to a sustainable digital preservation solution

Outdated formats, applications and systems

Organizational failure

Now, let's get practical...

Over time, all kinds of digital media become outdated. Technology is driven by innovation, which unfortunately means media stay relevant only briefly before becoming obsolete. Data stored on obsolete media becomes effectively useless if the appropriate hardware is not available to read it. This is particularly difficult to manage where data is stored over long periods. Ideally, long-term data storage would be technology independent, but this is not practical. A Cornell University website has documented the lifespan of various storage media, with floppy disks lasting a whopping five years.

As hardware becomes obsolete, so do file formats and the software that interprets them. A good example is WordPerfect: try to find a computer today that can read a WordPerfect document properly. Fortunately, systems and formats do not usually become obsolete at quite as rapid a pace as hardware.

Viking Lander data

This is a massive threat to long-term digital storage of any kind. Technology is dynamic not only in its innovations but also in its market: vendor changes and competition kill off what once seemed to be very strong players. For this reason it would be folly to rely too heavily on any one vendor, system, or sponsoring organization, because they change, and often change quickly. Digital assets that need to be preserved long term must be protected from the failure of any one organization. Unfortunately, this is easily said but hard to plan for in such a dynamic environment.

NASA's early space records are suffering a similar fate, as Joe Miller recently discovered. The University of Southern California neurobiologist couldn't read magnetic tapes from the 1976 Viking landings on Mars. With the data in an unknown format, he had to track down printouts and hire students to retype everything.

"All the programmers had died or left NASA," Miller said. "It was hopeless to try to go back to the original tapes."

http://www.cbsnews.com/stories/2003/01/21/tech/main537308.shtml

Loss of context

Massive storage failures

Why Traditional Storage Systems Don't Help Us Save Stuff Forever?

Baker, M., Keeton, K., Martin, S. June 27, 2005

HP Laboratories Palo Alto

Basically no matter how much money you spend on the system housing your data there are still many ways in which it can fall over and create opportunities for data to be lost. This may be from hardware/software failure or an act of war. The longer you try to store data the more likely this will occur.

How to get to a sustainable digital preservation solution

Some data can be related, and this relationship can be vital to data interpretation. A good example is the Rosetta Stone, discovered in Rashid, Egypt. The stone is engraved with the same text in three scripts (hieroglyphic, Demotic, and ancient Greek), and until it provided a "key" to what the hieroglyphic symbols meant, no one was able to read such inscriptions. It took the French scholar Jean-François Champollion fourteen years to decipher the inscription. Can you imagine taking that long to decipher each document on your PC because someone had forgotten to preserve the relationship between the document and its key? It would be like trying to assemble Ikea furniture without instructions: a complete waste of time. Unfortunately, if this relationship is not identified and preserved when information is first stored, it is unlikely ever to be recovered. The longer the data is kept without this relationship, the less likely it is ever to be resolved.

It's important to understand that...

Berkeley, How Much Info?

Proportion of original, unique publishing (2003)

We are living in a digital era:

Mistaken erasure

Sometimes people accidentally delete things and if it's the only copy, then it's gone. On the other hand sometimes people think that they no longer need a piece of data and delete it on purpose only to find that it was in fact useful. The longer you try to store data the more likely this will occur.

A University of California, Berkeley study from more than 10 years ago (How Much Info?, 2003) showed that 92% of original, unique material was born digital, meaning the original copy was digital. Paper publications were only 0.01% (for every 10,000 publications, only 1 was on paper).

Intentional attacks

Unfortunately in the world we live in there are some people who intentionally destroy or damage digital assets for a variety of reasons. As much of the information is currently located in open access repositories accessible via the internet it is also vulnerable to attack. This is a threat to both long and short term storage.

Using the LIFE project cost model, we can create a complete checklist:

Bit rot

Lack of resources

No affordable digital storage is completely reliable over a long period of time. For example, some CDs have recently been shown to have a life span of only 2 years, which could cause significant problems for anyone relying on them. Other media, such as magnetic tape, also suffer various types of bit rot. The worst thing about this threat is that it often goes undetected until it's too late to recover the material. You would very nearly have to employ someone to check all your media all the time to minimise data losses, which would make most of these media too expensive to seriously consider in a preservation project. Bit rot is inevitable with any storage medium over a period of time.

Many institutions simply do not have the resources, usually financial, to consider digital preservation. These strategies are often overlooked as low priority and are likely to remain so until a major data loss scares people into action.

L = Complete lifecycle cost over time 0 to T

Aq = Acquisition

I = Ingest

M = Metadata

Ac = Access

S = Storage

P = Preservation
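
The elements above can be combined into a single figure. The following is a minimal sketch, assuming the lifecycle cost L is simply the sum of the six element costs over the chosen period. The class name, field names, and figures are invented for illustration and are not taken from the LIFE project.

```python
from dataclasses import dataclass, fields


@dataclass
class LifecycleCost:
    """Cost of each lifecycle element over the period 0 to T (same currency)."""
    acquisition: float   # Aq
    ingest: float        # I
    metadata: float      # M
    access: float        # Ac
    storage: float       # S
    preservation: float  # P

    def total(self) -> float:
        # L = Aq + I + M + Ac + S + P (assumed here to be a plain sum)
        return sum(getattr(self, f.name) for f in fields(self))


# Invented figures for a small collection kept over the period 0 to T.
example = LifecycleCost(
    acquisition=1000.0, ingest=500.0, metadata=750.0,
    access=250.0, storage=2000.0, preservation=1500.0,
)
print(example.total())
```

Listing each element as its own field also works as the checklist: a cost that has not been estimated cannot be silently left out.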

Thank You!

And always make sure that Mr. Bean won't get access to your valuable assets!

Preservation Approaches

http://www.records.nsw.gov.au/recordkeeping/topics/digital-recordkeeping/digital-records-preservation-discussion-paper/approaches-to-digital-records-preservation

OAIS, PREMIS and METS

Bitstream Preservation

Encapsulation

In the encapsulation approach, records are packaged as a bitstream with metadata enabling a user in the future to display them. The leading example of this approach is the Victorian Electronic Records Strategy (VERS), the digital preservation program of ADRI member the Public Record Office Victoria.

In the VERS approach, record content is accepted in formats including text files, PDF, PDF/A, JPEG, TIFF and MPEG, encapsulated using an XML 'wrapper' containing a standard set of metadata elements and authenticated using a digital signature. Each record that is 'encapsulated' can contain multiple documents that together form a record.
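
The general idea of an XML wrapper can be sketched as follows. This is only an illustration of encapsulation, not the VERS format: the element names (`record`, `metadata`, `document`, and so on) are hypothetical, and a plain SHA-256 digest stands in for the digital signature.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def encapsulate(title: str, creator: str, documents: dict[str, bytes]) -> str:
    """Wrap one record (one or more documents) in a single XML string."""
    record = ET.Element("record")
    meta = ET.SubElement(record, "metadata")
    ET.SubElement(meta, "title").text = title
    ET.SubElement(meta, "creator").text = creator

    for name, content in documents.items():
        doc = ET.SubElement(record, "document", {"name": name})
        # Base64 keeps arbitrary bytes safe inside XML text.
        ET.SubElement(doc, "content").text = base64.b64encode(content).decode("ascii")
        # Assumption: a digest stands in for a real digital signature.
        ET.SubElement(doc, "sha256").text = hashlib.sha256(content).hexdigest()

    return ET.tostring(record, encoding="unicode")


wrapped = encapsulate(
    "Council minutes", "Records Office",
    {"minutes.txt": b"Meeting opened at 9am.", "agenda.txt": b"1. Budget"},
)
print(wrapped)
```

Because content and metadata travel in one wrapper, a future user who finds the file also finds the information needed to interpret it.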

Bitstream preservation can be used as a foundation for other preservation strategies but is not adequate on its own for ensuring long-term accessibility and authenticity. It involves simply storing the binary code (1s and 0s) that comprises a digital object, bearing in mind that the object will not be reproducible without the original combination of hardware and software that created it.

Bitstream preservation:

  • Advantage: Having the opportunity to go back to the 'original' record in this form to carry out different preservation techniques in the future.
  • Disadvantage: Is not suitable as a preservation strategy on its own.

Encapsulation:

  • Advantage: Content and contextual information kept together to minimise risk of loss.
  • Disadvantage: Can be 'records-centric' - not as effective for recording contextual information about people, organisations and functions.
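
Keeping the bits is only useful if you can tell that they have not changed. A minimal sketch of such a check, assuming a SHA-256 digest is recorded when the object is first stored; the function names and file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, recorded_digest: str) -> bool:
    """True if the stored bits still match the digest recorded at ingest."""
    return sha256_of(path) == recorded_digest


with tempfile.TemporaryDirectory() as folder:
    stored = Path(folder) / "object.bin"
    stored.write_bytes(b"\x00\x01 original bits")
    recorded = sha256_of(stored)  # taken once, when the object is stored

    ok_before = is_intact(stored, recorded)
    stored.write_bytes(b"\x00\x01 original bitz")  # simulate silent corruption
    ok_after = is_intact(stored, recorded)

print(ok_before, ok_after)
```

Run periodically over every stored object, a check like this turns undetected bit rot into a detected fault while another copy may still exist.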

Preservation Approaches

PREMIS (PREservation Metadata: Implementation Strategies) is an international working group concerned with developing metadata for use in digital preservation.

Definition

Emulation

Migration

Digital repositories are computer systems that ingest, store, manage, preserve, and provide access to digital content for the long-term. This requires them to go beyond simple file or bitstream preservation. They must focus on preserving the information and not just the current file-based representation of this information. It is the actual information content of a document, data-set, or sound or video recording that should be preserved, not the Microsoft Word file, the Excel spreadsheet, or the QuickTime movie. The latter represent the information content in a specific file format that will become obsolete in the future.

The Metadata Encoding and Transmission Standard (METS) is a metadata standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium. The standard is maintained in the Network Development and MARC Standards Office of the Library of Congress, and is being developed as an initiative of the Digital Library Federation.

An Open Archival Information System (or OAIS) is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community.

  • Reference Model for an Open Archival Information System (OAIS)
  • Development led by the Consultative Committee for Space Data Systems (CCSDS)
  • Issued as CCSDS Recommendation (Blue Book) 650.0-B-1 (January 2002)
  • Also adopted as: ISO 14721:2003

Migration on request involves preserving the bitstream of the record and developing a tool capable of reproducing the intellectual content of the record in a different format. The tool must be developed before the record's original format becomes obsolete. Migration is then performed only when a record is requested.
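
The shape of this approach can be sketched in a few lines. This is a toy model under stated assumptions: the `Archive` class, the converter registry, and the format labels are all hypothetical, and re-encoding text stands in for a real format migration tool.

```python
from typing import Callable

# Hypothetical registry: (stored format, requested format) -> migration tool.
Converter = Callable[[bytes], bytes]
CONVERTERS: dict[tuple[str, str], Converter] = {}


def register(source: str, target: str, tool: Converter) -> None:
    CONVERTERS[(source, target)] = tool


def latin1_to_utf8(data: bytes) -> bytes:
    """Toy migration tool: re-encode Latin-1 text as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


class Archive:
    """Keeps the original bitstream; converts only when a record is requested."""

    def __init__(self) -> None:
        self._store: dict[str, tuple[str, bytes]] = {}

    def ingest(self, record_id: str, fmt: str, data: bytes) -> None:
        self._store[record_id] = (fmt, data)

    def request(self, record_id: str, wanted: str) -> bytes:
        fmt, data = self._store[record_id]
        if fmt == wanted:
            return data
        # Raises KeyError if no tool was built for this format pair.
        return CONVERTERS[(fmt, wanted)](data)


register("text/latin-1", "text/utf-8", latin1_to_utf8)
archive = Archive()
archive.ingest("rec-1", "text/latin-1", "caf\u00e9".encode("latin-1"))
migrated = archive.request("rec-1", "text/utf-8")
print(migrated)
```

The stored bytes are never overwritten, so a better tool written later can always be applied to the original.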

Emulation is the replication of the functionality of an obsolete system. According to van der Hoeven, "Emulation does not focus on the digital object, but on the hard- and software environment in which the object is rendered. It aims at (re)creating the environment in which the digital object was originally created." Examples include emulating an Atari 2600 on a Windows system or emulating WordPerfect 1.0 on a Macintosh. Emulators may be built for applications, operating systems, or hardware platforms. Emulation has been a popular strategy for retaining the functionality of old video game systems, such as with the MAME project. The feasibility of emulation as a catch-all solution has been debated in the academic community.

What is PREMIS?

What is METS?

Converting to a different format may cause the record to lose authenticity if essential characteristics are affected.

What is OAIS?

  • Data formats which are open standards or which have published codes allow records to be reconstructed if applications are lost.

  • Tools for converting records to XML formats are now available as open source software.

Still relatively untested in digital records preservation.

All of these preservation functions depend on the availability of preservation metadata—information that describes the digital content in the repository to ensure its long-term accessibility. While the Open Archival Information System (OAIS) reference model defines a framework with a common vocabulary and provides a functional and information model for the preservation community, it does not define which specific metadata should be collected or how it should be implemented in order to support preservation goals.

Concepts

Has the potential to be more effective for preservation of databases and multimedia.

Descriptive metadata

OAIS environment:

  • Producer provides the information
  • Management sets overall policy (not the day-to-day operations)
  • Consumer finds and acquires preserved information of interest
  • Designated Community is the set of Consumers who should be able to understand the preserved information.

METS Profiles in use:

Now we can understand the diagram:

Describes the intellectual entity through properties such as author and title, and supports discovery and delivery of digital content. It may also provide an historic context, by, for example, specifying which print-based material was the original source for a digital derivative (source provenance).

Some examples of a digital file’s potential semantic units would include:

  • the program on which the file was created
  • the version of that program
  • the operating system on which that program ran
  • who created the file
  • the rights associated with the file
  • when the file was ingested into the preservation system
  • dates the file was validated
  • and so on.
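
The semantic units listed above could be recorded per file roughly as follows. This is a sketch only: the field names are illustrative labels for the listed items, not the element names of the PREMIS Data Dictionary, and all values are invented.

```python
from dataclasses import dataclass, field, asdict
from datetime import date


@dataclass
class FilePreservationRecord:
    """One record per file; each field mirrors an item in the list above."""
    creating_application: str
    application_version: str
    operating_system: str
    creator: str
    rights: str
    ingest_date: date
    validation_dates: list[date] = field(default_factory=list)


record = FilePreservationRecord(
    creating_application="ExampleWriter",  # invented values throughout
    application_version="2.1",
    operating_system="ExampleOS 10",
    creator="J. Archivist",
    rights="Open access",
    ingest_date=date(2012, 3, 1),
)
record.validation_dates.append(date(2013, 3, 1))
print(asdict(record))
```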

Structural metadata

Information is any type of knowledge that can be exchanged, and is expressed by some type of data.

For example: The information in a book is typically expressed by characters (the data) which, when combined with a knowledge of the language used (the Knowledge Base), are converted to more meaningful information. If the recipient does not know the language, then the book needs to be accompanied by a dictionary and grammar (i.e., Representation Information) in a form that is understandable using the recipient’s Knowledge Base.

Captures physical structural relationships, such as which image is embedded within which website, as well as logical structural relationships, such as which page follows which in a digitized book.

  • Musical Score (may be a score, score and parts, or a set of parts only)
  • Print Material (books, pamphlets, etc.)
  • Music Manuscript (score or sketches)
  • Recorded Event (audio or video)
  • PDF Document
  • Bibliographic Record
  • Photograph
  • Compact Disc
  • Collection

Technical metadata for physical files

In order for this Information Object to be successfully preserved, it is critical for an OAIS to clearly identify and understand the Data Object and its associated Representation Information.

- For digital information, this means the OAIS must clearly identify the bits and the Representation Information that applies to those bits.

- The OAIS must understand the Knowledge Base of its Designated Community to understand the minimum Representation Information that must be maintained.

Includes technical information that applies to any file type, such as information about the software and hardware on which the digital object can be rendered or executed, or checksums and digital signatures to ensure fixity and authenticity. It also includes content type-specific technical information, such as image width for an image or elapsed time for an audio file.

Administrative metadata

The unit of exchange between an OAIS and its surrounding environment is an Information Package.

An Information Package is a conceptual container of two types of information:

- Content Information and

- Preservation Description Information (PDI).

The resulting package is viewed as being discoverable by virtue of the Descriptive Information.

Includes provenance information of who has cared for the digital object and what preservation actions have been performed on it, as well as rights and permission information that specifies, for example, access to the digital object, including which preservation actions are permissible.

Information Package variants:

- Submission Information Package (SIP)

- Archival Information Package (AIP)

- Dissemination Information Package (DIP)

Packages will need to vary depending upon their role, for example:

Imaging and e-journal projects often differentiate between their well-managed (and described) "master" files and the derived versions (thumbnails, JPEG files, PDFs) made available through the Web
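
How a package might change as it moves from submission to archive to dissemination can be sketched as below. This is a simplified model, not part of the OAIS standard: the class, the two functions, the file names, and the identifier are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationPackage:
    kind: str                  # "SIP", "AIP" or "DIP"
    content: dict[str, bytes]  # Content Information: file name -> bytes
    pdi: dict[str, str]        # Preservation Description Information


def to_aip(sip: InformationPackage, ingest_note: str) -> InformationPackage:
    """Archive a submission: keep all content, extend the PDI."""
    pdi = {**sip.pdi, "provenance": ingest_note}
    return InformationPackage("AIP", dict(sip.content), pdi)


def to_dip(aip: InformationPackage, access_files: set[str]) -> InformationPackage:
    """Disseminate only the derived access copies, not the masters."""
    content = {n: b for n, b in aip.content.items() if n in access_files}
    return InformationPackage("DIP", content, dict(aip.pdi))


sip = InformationPackage(
    "SIP",
    {"page1_master.tif": b"master bytes", "page1_thumb.jpg": b"thumb bytes"},
    {"reference": "hypothetical-id-001"},
)
aip = to_aip(sip, "Ingested from producer")
dip = to_dip(aip, {"page1_thumb.jpg"})
print(aip.kind, sorted(aip.content))
print(dip.kind, sorted(dip.content))
```

The archive keeps the master and the derivative; the consumer receives only the derivative, matching the master/derived split described above.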

http://easydigitalpreservation.wordpress.com/2009/10/30/premis-for-preservation-metadata/

http://www.loc.gov/standards/premis/FE_Dappert_Enders_MetadataStds_isqv22no2.pdf

http://www.dlib.org/dlib/september10/vermaaten/09vermaaten.html

So, which approach should be taken?

A combination of all!

Let's learn from DigiMan!

OAIS defines the preservation model and components

PREMIS defines the metadata elements that OAIS requires

METS is used to hold the elements defined by PREMIS

In a standard system:

An AIP, as defined by OAIS, holds the metadata elements defined by PREMIS, using a METS container.
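
The nesting can be illustrated with a small sketch that places a PREMIS object description inside a METS administrative metadata section. It is deliberately incomplete and has not been validated against either schema: a real METS document needs further sections (a structural map, for example), and the identifier and the choice of PREMIS version 3 are assumptions.

```python
import hashlib
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
PREMIS = "http://www.loc.gov/premis/v3"  # assumption: PREMIS version 3
ET.register_namespace("mets", METS)
ET.register_namespace("premis", PREMIS)


def m(tag: str) -> str:
    return f"{{{METS}}}{tag}"


def p(tag: str) -> str:
    return f"{{{PREMIS}}}{tag}"


def premis_object(identifier: str, data: bytes) -> ET.Element:
    """Describe one file with a few PREMIS object semantic units."""
    obj = ET.Element(p("object"))
    ident = ET.SubElement(obj, p("objectIdentifier"))
    ET.SubElement(ident, p("objectIdentifierType")).text = "local"
    ET.SubElement(ident, p("objectIdentifierValue")).text = identifier
    chars = ET.SubElement(obj, p("objectCharacteristics"))
    fixity = ET.SubElement(chars, p("fixity"))
    ET.SubElement(fixity, p("messageDigestAlgorithm")).text = "SHA-256"
    ET.SubElement(fixity, p("messageDigest")).text = hashlib.sha256(data).hexdigest()
    ET.SubElement(chars, p("size")).text = str(len(data))
    return obj


def mets_container(identifier: str, data: bytes) -> ET.Element:
    """Place the PREMIS description inside a METS administrative section."""
    mets = ET.Element(m("mets"))
    amd = ET.SubElement(mets, m("amdSec"))
    tech = ET.SubElement(amd, m("techMD"), {"ID": "tech-1"})
    wrap = ET.SubElement(tech, m("mdWrap"), {"MDTYPE": "PREMIS"})
    ET.SubElement(wrap, m("xmlData")).append(premis_object(identifier, data))
    return mets


# "hypothetical-id-001" and the three-byte payload are placeholders.
aip_metadata = mets_container("hypothetical-id-001", b"abc")
print(ET.tostring(aip_metadata, encoding="unicode"))
```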
