Last updated by Sara Mannheimer, Brandon Watson, and Natalie Bond on 30 July 2019

Table of Contents

1. Overview of digital collections      

2. Content scope for digital collections

3. Preservation masters and access copies

4. Preservation masters procedure

5. Storage and backup

6. Roles and responsibilities

7. Staffing and training

8. Sustainability and financial planning 

Appendix A. Identification of content

Appendix B. Born-digital accession record template

Appendix C. Certificate of gift

Appendix D. Digital preservation levels - decision flowchart (draft)

Appendix E. National Digital Stewardship Alliance levels of digital preservation

Appendix F. OAIS reference model definitions

Appendix G. Review history

Appendix H. Collection-level JSON readme file template

1. Overview of digital collections

The digital collections at MSU Library (henceforth “The Library”) currently comprise both born-digital and digitized archival materials, scholarly publications in ScholarWorks, and audio files made available in the Acoustic Atlas. Please see ScholarWorks Preservation and Migration Policy for detailed information about ScholarWorks content. Please see the Montana State University Library Digital Preservation Policy for general policy guidelines. For detailed content profiles, please see Appendix A.

2. Content scope for digital collections

The Library’s content scope is outlined in the MSU Library Collection Development Policy. http://www.lib.montana.edu/collections/cdpolicy.html 

2.1 Types of digital content

The procedures in this document apply to digitized content and born-digital content. Born-digital content includes content from ScholarWorks and Acoustic Atlas (text files and sound files), special born-digital projects such as Angling Oral Histories, and born-digital archival acquisitions.

3. Preservation masters and access copies

For all digitized materials in the Library’s collection, the high-quality digital file will be considered the preservation master, and a lower-quality access copy will be created for online access via the Library website. This policy will be reevaluated as the collection grows.

Especially for at-risk analog materials, the high-quality digitized file will be considered the master. For example, for VHS tapes, the digitized AVI file is considered the master.

Some materials may not be subject to full digital preservation treatment. For born-digital materials, we strongly suggest that donors/creators provide materials in a supported format (see Table 1). If materials are donated in other formats, we cannot guarantee full digital preservation treatment—only bit-level preservation. The Library also reserves the right to migrate formats if necessary. A format review and migration for supported formats is conducted every five years, in years ending in 5 or 0. Format review and migration history are listed in Appendix G.

 

Table 1. Supported formats for digital preservation

Non-proprietary, openly documented formats are preferred. For a full list of preferred formats, please see Library of Congress preferred digital formats. https://www.loc.gov/preservation/resources/rfs/

| Type of content | Filetype | Notes |
| --- | --- | --- |
| Structured data | XML, JSON, CSV | |
| Moving images | MOV, MPEG, AVI, MP4 | |
| Sounds | WAV, FLAC, AIFF, MP3 | |
| Still images | TIFF, JPEG 2000, PDF, PNG | |
| Tabular data | CSV, TSV | |
| Text | XML, PDF/A, HTML | |
| Web archive | WARC | |
| Compressed/archived formats | TAR, GZIP, ZIP | Files should only be compressed and/or archived when it is necessary due to large file size or the need to gather files together in a particular directory structure in order for them to be understood. |
| Common proprietary formats | We can also currently support MS Office formats such as XLS, DOC, and PPT | This policy may be re-evaluated in the future |

 Please see Appendix D for more information on levels of digital preservation at MSU Library.
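As an illustration of how the supported-format rule could be applied at intake, the following minimal Python sketch sorts a file into "eligible for full preservation" or "bit-level preservation only" by its extension. The extension list is an assumption: Table 1 names formats, not file extensions, and the proprietary formats in the last row are left out. The function name is hypothetical.

```python
from pathlib import Path

# Assumed extensions for the formats named in Table 1 (the table itself
# lists formats, not extensions, so this mapping is illustrative only).
SUPPORTED_EXTENSIONS = {
    ".xml", ".json", ".csv", ".tsv",           # structured and tabular data
    ".mov", ".mpeg", ".mpg", ".avi", ".mp4",   # moving images
    ".wav", ".flac", ".aiff", ".aif", ".mp3",  # sounds
    ".tif", ".tiff", ".jp2", ".pdf", ".png",   # still images and PDF text
    ".html", ".htm",                           # text
    ".warc",                                   # web archive
    ".tar", ".gz", ".zip",                     # compressed/archived formats
}


def preservation_treatment(filename: str) -> str:
    """Return the treatment a file could receive, judged by extension only."""
    extension = Path(filename).suffix.lower()
    if extension in SUPPORTED_EXTENSIONS:
        return "eligible for full preservation"
    return "bit-level preservation only"
```

For example, `preservation_treatment("scan_0001.TIF")` falls in the supported group, while a file in a legacy word-processor format would receive bit-level preservation only. A real check would inspect file contents rather than trusting the extension.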

4. Preservation masters procedure

Master storage. When a collection is fully scanned and "complete," the collection is set to read-only and becomes a master collection. Masters are stored in the “DP12” and “DP13” folders in the local “DP” drive. Folders DP10 and DP11 have been allocated for future masters storage.

Inventories. An inventory of masters folders is stored alongside the masters, in the root directory of the DP13 folder. This inventory is updated annually in time for the October quarterly Digital Preservation Group meeting. Inventory update history is listed in Appendix G. 
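A masters inventory of this kind could be generated with a short script. The sketch below is an assumption about what such an inventory might record (folder name, file count, total size); the actual inventory format used by the Library is not specified here, and the function names and CSV columns are hypothetical.

```python
import csv
from pathlib import Path


def build_inventory(masters_root: Path) -> list[dict]:
    """List each top-level folder with its file count and total size in bytes."""
    rows = []
    for folder in sorted(p for p in masters_root.iterdir() if p.is_dir()):
        files = [f for f in folder.rglob("*") if f.is_file()]
        rows.append({
            "folder": folder.name,
            "file_count": len(files),
            "total_bytes": sum(f.stat().st_size for f in files),
        })
    return rows


def write_inventory(rows: list[dict], out_path: Path) -> None:
    """Write the inventory rows to a CSV file with a header line."""
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["folder", "file_count", "total_bytes"]
        )
        writer.writeheader()
        writer.writerows(rows)
```

Calling `build_inventory` on a masters folder and passing the result to `write_inventory` would produce a file that can be compared from one annual review to the next.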

4.1 Masters procedure tasks by role

Archivist/Curator tasks

Accession record. The archivist or curator creates an accession record for all collections, including digital collections. The record includes a unique collection identifier, which is either an accession number for University records or a collection number for manuscript collections. For accession record templates, please see Appendix B.

 

Digital Operations Manager tasks

Collection-level readme. See 4.2 Technical metadata.

 

App Developer tasks

Database backup. See 4.2 Technical metadata.

 

Systems Administrator tasks

Compression. Systems Administrators individually compress files in DP12 and DP13. When supplying high-quality TIFFs to patrons, library employees either unzip the file or send the compressed file to patrons.

Read-only permissions. The Systems Administrator moves masters from the “changes” folder to DP12 and DP13 and sets them to read-only.

Long term storage. Systems Administrators copy files from DP12 and DP13 to TACC cloud storage (see 5.2 TACC cloud backup).

Checksums. Fixity of metadata and files is checked using checksums, which are created with MD5 Deep.
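To illustrate the fixity check, here is a minimal Python sketch that records MD5 checksums for a folder and later reports any file that is missing or has changed. It uses Python's standard hashlib as a stand-in and is not the Library's actual workflow; the function names and the manifest structure are assumptions.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its MD5 checksum."""
    return {
        path.relative_to(root).as_posix(): md5_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose checksum changed."""
    failures = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or md5_of_file(target) != expected:
            failures.append(relative_path)
    return failures
```

A manifest built when masters are transferred can be stored with the collection and re-verified later; an empty list from `verify_manifest` means every recorded file still matches its checksum.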

 

4.2. Technical metadata

Database backup

A backup of the digital collections SQL database, which contains descriptive metadata, is stored with the masters, in a folder named “database-backup.” There will be one database backup file for each collection. Objects in the database correspond to the unique URLs of the materials.

A backup of the special collections accessions database (CSV exported from ProCite) is stored with the masters, in a folder named “masters-special-collections-procite-backup.”

 

Collection-level readme files

Digital Operations Manager creates a collection-level JSON readme for each digital collection in DP12 and DP13. The readme includes the title of the collection, date accessioned, date digitized, date moved to DP12/DP13, scope and content note, whether there are sensitive or restricted files in the collection, and any other special information. For a complete readme template, see Appendix H.

 

4.3. File naming conventions [last updated August 2018]

***These naming conventions apply to future collections. All legacy collections will keep their existing naming schema.

 

MSS Collections:

0000-0000-000-00000 [Col#-Box-Folder-Item]

0000-0000-000-img00000 [Col#-Box-Folder-Image]

 

Accessions:

00000-0000-000-00000 [Accn#-Box-Folder-Item]

00000-0000-000-img00000 [Accn#-Box-Folder-Image]

 

Born Digital:

0000-0000-0000-00000 [Col#-Series-Folder-Item]

 

Trout Art:

0000-0000-0000-00000 [Col#-Series-Folder-Item]

 

Trout Oral: [use lower-case only]

smith-bob-2018-05-10 [Last name-First name-Date of Interview (year-month-day)]

 

apple-jane-bert-2018-06-04 [Two people interviewed as a couple with a shared last name: Last name-First name-First name-Date of Interview (year-month-day)]

 

ford-betty-reagan-nancy-2018-08-17 [Two or more people with different last names: Last name-First name-Last name-First name-Date of Interview (year-month-day)]

 

***Future oral programs: assign a collection number and use born digital naming scheme
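The naming conventions above can be checked mechanically. The sketch below expresses each pattern as a regular expression and reports which convention a file name (without its extension) matches. The pattern labels and function name are hypothetical, and the Trout Oral pattern assumes names contain only lower-case letters a to z.

```python
import re
from typing import Optional

# One pattern per convention; Trout Art shares the born-digital pattern.
NAMING_PATTERNS = {
    "mss_item": r"\d{4}-\d{4}-\d{3}-\d{5}",
    "mss_image": r"\d{4}-\d{4}-\d{3}-img\d{5}",
    "accession_item": r"\d{5}-\d{4}-\d{3}-\d{5}",
    "accession_image": r"\d{5}-\d{4}-\d{3}-img\d{5}",
    "born_digital": r"\d{4}-\d{4}-\d{4}-\d{5}",
    # Two or more lower-case name parts followed by year-month-day.
    "trout_oral": r"[a-z]+(?:-[a-z]+)+-\d{4}-\d{2}-\d{2}",
}


def classify_name(stem: str) -> Optional[str]:
    """Return the label of the convention the name follows, or None."""
    for label, pattern in NAMING_PATTERNS.items():
        if re.fullmatch(pattern, stem):
            return label
    return None
```

For instance, `classify_name("smith-bob-2018-05-10")` matches the Trout Oral pattern, while the same name with capital letters matches nothing. The check covers only the shape of the name, not whether the date or the collection number is valid.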

4.4. Sensitive or nonpublic data

Most digital materials in the Library's collections are freely available, with some exceptions:

  • Electronic theses and dissertations may be embargoed for a limited time.
  • Some materials in Special Collections have restricted use policies. For example, the Aubrey Haynes Papers are restricted to one computer in the reading room.

5. Storage and backup

5.1. Tape backup

  • All Library data are onsite, and all data are backed up monthly. Each month, one backup copy of the monthly data is sent offsite to a location 0.75 miles from MSU.
  • Any changes to the monthly backup copies are backed up nightly (onsite), and every 2 weeks (offsite).
  • Twice yearly (June and October), tape backups of all Library data are sent to an Iron Mountain storage vault in Cincinnati, OH.

5.2. TACC cloud backup of digital masters

  • Yearly (February), a full copy of all digital masters in DP12 and DP13 is uploaded to the TACC cloud backup service.
  • All files are compressed, using TAR to maintain permissions.
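As a sketch of how a masters folder could be packaged before upload, the following Python example uses the standard tarfile module, which records each file's permission bits in the archive. TAR on its own only bundles files, so the gzip compression applied here is an assumption, as are the function names; the upload to TACC is not shown.

```python
import tarfile
from pathlib import Path


def archive_masters(source_dir: Path, archive_path: Path) -> None:
    """Bundle a folder into a gzip-compressed TAR archive.

    The archive should be written outside source_dir so that it does not
    try to include itself.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=source_dir.name)


def archived_modes(archive_path: Path) -> dict[str, int]:
    """Return the permission bits stored for each member of the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return {member.name: member.mode & 0o777 for member in tar.getmembers()}
```

Reading the modes back with `archived_modes` gives a quick confirmation that read-only masters are still recorded as read-only inside the archive.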

5.3. Disaster recovery

In the unlikely event of complete destruction of the onsite data (e.g. earthquake, fire, flood, human error in file handling), the recovery process is as follows:

  • acquire internet enabled facility
  • purchase and install common hardware (the hardware used at MSU Library is readily available)
  • recover systems and data from offsite backup(s).

We estimate that the disaster recovery procedure could take up to a week, barring any problems obtaining access to a properly networked facility. This scenario assumes MSU would recover its network domain and DNS, so that hostname changes would be unnecessary.

 

6. Roles and responsibilities

Permissions

DP drives are accessible to multiple employees in the following departments: Digital Library Initiatives (DLI); Cataloging, Access, & Technical Services (CATS); and Special Collections & Archival Informatics (SCAI).

  • DP12 and DP13 are the only storage locations for all preservation masters. Folders DP10 and DP11 are allocated for future masters storage.
  • File permissions are set to “read only” at the point of transfer to DP12/DP13.
  • SCAI employees may access the “read only” preservation masters for patron reuse and reproduction. These requests generally occur once or twice a month.
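Setting files to read-only at the point of transfer could be scripted along the following lines. This is a minimal sketch that assumes POSIX-style permission bits; the DP drive may use a different permission model in practice, and the function name is hypothetical.

```python
import os
import stat
from pathlib import Path

# Write permission for owner, group, and others.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(root: Path) -> int:
    """Clear the write bits on every file under root; return the file count.

    Directories are left unchanged so that new collections can still be added.
    """
    changed = 0
    for path in root.rglob("*"):
        if path.is_file():
            current_mode = path.stat().st_mode
            os.chmod(path, current_mode & ~WRITE_BITS)
            changed += 1
    return changed
```

Running `make_read_only` on a newly transferred collection folder leaves read and execute permissions as they were and removes only the ability to modify the files.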

 

Responsibilities

  • The MSU Library Digital Preservation Group oversees digital preservation in the Library. The Digital Preservation Group meets quarterly in January, April, July, and October to review current local practices and evolving digital preservation best practices.
  • The MSU Library Digital Preservation Working Group develops and implements concrete actions toward digital preservation goals. The Digital Preservation Working Group meets every other week.
  • The Digital Operations Manager provides staffing oversight.
  • The Systems Administrators oversee storage and backup.
  • For specific tasks in the preservation masters procedure, please see 4.1 Masters procedure tasks by role.

7. Staffing and training

These digital preservation procedures will be used in new staff training to support standardized digital preservation practices in the Library.

 

8. Sustainability and financial planning

While there are no funds dedicated to digital preservation at MSU Library, Library Administration has agreed to support digital preservation activities. With the exception of offsite backups, most of the tools used to implement digital preservation at MSU Library are free and open source.

  

Appendix A. Identification of content

Last updated: April 2018

 

Content type = Main area or content stream

Description = high level descriptive information about content type

Acquisition = how content types are created and/or acquired by the Library

Size = current total data size of content type

Complexity level = designation related to variety / range of file formats that are typically included in content type

Current management / storage = how / where digital objects within content type are currently stored and managed. Locally vs. vendor / hosted

Rights = how rights are captured / transferred to Library

Value = designation related to value and preservation commitment for content type

Priority = numerical assignment of priority for preservation activities (1=highest priority; 3=lower priority)

  

| Content type | Description | Acquisition | Size | Complexity level | Current management / storage | Rights | Value | Priority |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Digitized content | Digital content created through digitization of analog materials by Digital Production Unit | Created internally by Library staff | 5 TB | Medium – Primarily consists of image and document file formats | Internal Library storage network, preservation masters procedure, backups both in the library and offsite. | Licenses for reuse are obtained by MSU when possible. Users have responsibility to obtain permission for reuse beyond “fair use” | Long-term | 1 |
| Institutional repository content | Open Access scholarly work (including publications and data) created primarily by MSU faculty and graduate students | Created externally and acquired by Library staff or submitted through IR platform | 500 GB | High – We have preferred formats, but we accept all file formats | Internal Library storage network, backups both in the library and offsite, some preservation actions (provenance metadata and checksums) through DSpace. | For OA copies of scholarly papers: Scholarly Communication Librarian ensures that we have the right to repost. For ETDs: students agree to ScholarWorks posting when submitting thesis | Long-term | 2 |
| Acoustic Atlas | Born-digital sound recordings acquired from researchers, primarily Jeff Rice | Created externally and transferred to the library (Molly Arrandale is project manager) | 1 TB | Medium – all sound files | Internal Library storage network, backups both in the library and offsite | | Long-term | 1 |
| Born-digital content in SCAI | Born-digital materials acquired from individuals/orgs. Archival collections, oral history collections, and special collections materials | Created externally and acquired by SCAI (or in the case of Angling Oral Histories, created by SCAI) | 2.5 TB | High – Wide range of file formats including high amount of legacy and obsolete formats | For materials created externally: only storage, no management or preservation of any kind. For Angling Oral Histories: full preservation | Transferred via Certificate of Gift agreement with Donor (see Appendix C). Angling Oral History: created with agreement from interviewee | Long-term | 1 |
| Web Archives content (trial beginning in 2018) | Web content created by University departments, units, and affiliated entities | Will be created externally, acquired through automatic web harvesting via Archive-It service | 0 GB | High – Wide range of file formats and complex relationships between digital objects | External storage through vendor hosted solution (Archive-It) | No explicit transfer of rights from creators? This will be re-evaluated following the creation of a TRAILS-wide web archiving ethical framework | Long-term | 2 |
| Licensed electronic resource (permanent acquisitions) | Electronic resources purchased with perpetual access | Purchased from vendors | | Low – vendors typically have a standard file format, however metadata are often lacking | Disaster backup through Portico | Managed individually through license agreement | Near-term | 3 |
| Licensed electronic resource (subscription) | Electronic resources purchased on a subscription basis | Purchased from vendors | | Low – most subscriptions are preserved through Portico | Portico | Managed individually through license agreement | Near-term | 3 |

 

Appendix B. Born-digital accession record template

To be determined [in progress as of July 2019]

Appendix C. Certificate of gift

 

I/We_________________________________________of______________________________

                                            (name)                                                                   (city, state)

 

convey to the Montana State University Library the following:

 

 

This is an unrestricted gift that transfers to the Montana State University Library all legal title, copyright and literary property rights insofar as I/we hold them unless exceptions or restrictions are specifically noted below:

 

I/We agree that any materials described above that are determined to be inappropriate to the Special Collections or general library collection shall be disposed of by the library as it sees fit or the items be returned to me/us if I/we expressly state this below:

 

 

Signature ____________________________________________Date___________________

 

 

Signature ____________________________________________Date___________________

 

 

Title (Organizations or Businesses)_______________________________________________

 

 

Witness______________________________________________Date___________________

 

 

 

The gift described is gratefully accepted by Montana State University Library by

 

 

Signature ______________________________________________Date__________________

 

 

Printed Name____________________________________ Title _________________________

 

Appendix D. Digital preservation levels - decision flowchart (draft)

Is this a unique collection of “enduring value” that was not commercially mass produced?

  • No —> do not archive content
  • Yes —> go to next question

Do you have permission from the copyright holder (if necessary) to archive this digital collection?

  • No —> do not archive content
  • Yes —> go to next question

Are any objects in this collection available in another Trusted Digital Repository?

  • Yes —> which repository? _____________________. If collection is in another Trusted Digital Repository, Do not archive content.
  • No —> go to next question

Is this collection available in a Trusted Print Repository or is there a hard copy available that will be kept long-term?

  • No —> full preservation
  • Yes —> go to next question

Is the hard copy deteriorating or in poor condition or on a near-obsolete format or media?

  • Yes —> full preservation
  • No —> go to next question

If the content is born digital, is it in a supported file format?

  • Yes —> discuss with SCAI and DLI whether the content warrants preservation
  • No —> bit-level preservation
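The flowchart can also be read as a sequence of yes/no checks. The Python sketch below encodes the questions in the order given above and returns the outcome as a string. The class, field, and function names are hypothetical, and because the flowchart is a draft this is an illustration, not a decision tool.

```python
from dataclasses import dataclass


@dataclass
class CollectionAnswers:
    """Yes/no answers to the flowchart questions, in order."""
    enduring_value: bool               # unique, not commercially mass produced
    has_permission: bool               # copyright permission, if necessary
    in_other_trusted_repository: bool  # held by another Trusted Digital Repository
    hard_copy_kept_long_term: bool     # Trusted Print Repository or kept hard copy
    hard_copy_at_risk: bool            # deteriorating or near-obsolete media
    supported_format: bool             # born-digital content in a supported format


def preservation_decision(answers: CollectionAnswers) -> str:
    """Walk the questions in order and return the first outcome reached."""
    if not answers.enduring_value:
        return "do not archive content"
    if not answers.has_permission:
        return "do not archive content"
    if answers.in_other_trusted_repository:
        return "do not archive content"
    if not answers.hard_copy_kept_long_term:
        return "full preservation"
    if answers.hard_copy_at_risk:
        return "full preservation"
    if answers.supported_format:
        return "discuss with SCAI and DLI"
    return "bit-level preservation"
```

For example, a unique collection with permission, no copy elsewhere, and no hard copy kept long-term resolves to full preservation at the fourth question, regardless of the remaining answers.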

 

 Appendix E. National Digital Stewardship Alliance levels of digital preservation

 

| | Level 1 (Protect your data) | Level 2 (Know your data) | Level 3 (Monitor your data) | Level 4 (Repair your data) |
| --- | --- | --- | --- | --- |
| Storage and Geographic Location | Two complete copies that are not collocated. For data on heterogeneous media (optical discs, hard drives, etc.) get the content off of the medium and into your storage system. | At least three complete copies. At least one copy in a different geographic location. Document your storage system(s) and storage media and what you need to use them. | At least one copy in a geographic location with a different disaster threat. Obsolescence monitoring process for your storage system(s) and media. | At least three copies in geographic locations with different disaster threats. Have a comprehensive plan in place that will keep files and metadata on currently accessible media or systems. |
| File Fixity and Data Integrity | Check file fixity on ingest if it has been provided with the content. Create fixity info if it wasn't provided with the content. | Check fixity on all ingests. Use write-blockers when working with original media. Virus-check high risk content. | Check fixity of content at fixed intervals. Maintain logs of fixity info; supply audit on demand. Ability to detect corrupt data. Virus-check all content. | Check fixity of all content in response to specific events or activities. Ability to replace/repair corrupted data. Ensure no one person has write access to all copies. |
| Information Security | Identify who has read, write, move, and delete authorization to individual files. Restrict who has those authorizations to individual files. | Document access restrictions for content. | Maintain logs of who performed what actions on files, including deletions and preservation actions. | Perform audit of logs. |
| Metadata | Inventory of content and its storage location. Ensure backup and non-collocation of inventory. | Store administrative metadata. Store transformative metadata and log events. | Store standard technical and descriptive metadata. | Store standard preservation metadata. |
| File Formats | When you can give input into the creation of digital files, encourage use of a limited set of known open formats and codecs. | Inventory of file formats in use. | Monitor file format obsolescence issues. | Perform format migrations, emulation and similar activities as needed. |

 

Appendix F. OAIS reference model definitions

These procedures follow the best practices defined by the OAIS Reference Model.[1]

This model defines three information packages: Submission Information Package (SIP), Dissemination Information Package (DIP), and Archival Information Package (AIP).

 

OAIS Reference Model

SIP, DIP, and AIP contents at Montana State University

SIP – physical object to be scanned, or in rare cases a born-digital object.

 

DIP – PDF, JPG, or other derivative file format, as well as descriptive metadata [at MSU Library, these metadata are documented in a SQL database].

 

AIP – TIFF or other master file format, collection-level readme, and SQL database snapshot.

 

 

Appendix G. Review history

Format review and migration (Review is conducted every 5 years, in years ending with 5 or 0)

| Name and title of responsible party | Review date | Resulting format migration actions | Notes |
| --- | --- | --- | --- |
| | 2020 | | |

Masters inventory (Review is conducted annually, in time for the October quarterly Digital Preservation Group meeting)

| Name and title of responsible party | Review date | Notes |
| --- | --- | --- |
| Brandon Watson, Digital Operations Manager | Summer 2018 | Inventoried masters and relocated any masters that were not in the designated masters folders. |

 Appendix H. Collection-level JSON readme file template

 

{
  "default_locale": "en",
  "lang": "en-US",
  "dir": "ltr",
  "name": "ADD-COLLECTION-TITLE",
  "description": "ADD-BRIEF-DESCRIPTION",
  "categories": ["books", "education"],
  "owner": {
    "name": "ADD-COLLECTION-OWNER-HERE",
    "contact": ""
  },
  "display": "standalone",
  "orientation": "any",
  "start_url": "./index.html",
  "scope": "/~jason/",
  "filetypes": "",
  "date_digitized": ""
}
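A readme based on this template could be filled in programmatically. The minimal Python sketch below copies the template's keys into a dictionary, replaces the placeholder values, and serializes the result as valid JSON. The function name and its parameters are hypothetical; only keys that appear in the template are used.

```python
import copy
import json

# Keys and default values copied from the template in this appendix.
README_TEMPLATE = {
    "default_locale": "en",
    "lang": "en-US",
    "dir": "ltr",
    "name": "ADD-COLLECTION-TITLE",
    "description": "ADD-BRIEF-DESCRIPTION",
    "categories": ["books", "education"],
    "owner": {"name": "ADD-COLLECTION-OWNER-HERE", "contact": ""},
    "display": "standalone",
    "orientation": "any",
    "start_url": "./index.html",
    "scope": "/~jason/",
    "filetypes": "",
    "date_digitized": "",
}


def fill_readme(title, description, owner, contact="",
                filetypes="", date_digitized=""):
    """Return a JSON string with the template placeholders filled in."""
    readme = copy.deepcopy(README_TEMPLATE)  # leave the template untouched
    readme["name"] = title
    readme["description"] = description
    readme["owner"]["name"] = owner
    readme["owner"]["contact"] = contact
    readme["filetypes"] = filetypes
    readme["date_digitized"] = date_digitized
    return json.dumps(readme, indent=2)
```

Because the output comes from `json.dumps`, it is always syntactically valid JSON, which avoids hand-editing mistakes such as stray trailing commas.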

 

[1] Consultative Committee for Space Data Systems (CCSDS). (June 2012). Reference Model for an Open Archival Information System (OAIS): Magenta Book. https://public.ccsds.org/pubs/650x0m2.pdf