Montana State University Digital Preservation Procedures
Last updated by Sara Mannheimer on 21 October 2021
Table of Contents
1. Overview of digital collections
2. Content scope for digital collections
3. Preservation masters and access copies
4. Preservation masters procedure
5. Storage and backup
6. Roles and responsibilities
7. Staffing and training
8. Sustainability and financial planning
Appendix A. Identification of content
Appendix B. Born-digital accession record template
Appendix C. Certificate of gift
Appendix D. Digital preservation levels - decision flowchart (draft)
Appendix E. National Digital Stewardship Alliance levels of digital preservation
Appendix F. OAIS reference model definitions
Appendix G. Review history
Appendix H. Collection-level JSON readme file template
1. Overview of digital collections
The digital collections at MSU Library (henceforth “the Library”) currently comprise both born-digital and digitized archival materials, scholarly publications in ScholarWorks, and audio files made available in the Acoustic Atlas. Please see the ScholarWorks Preservation and Migration Policy for detailed information about ScholarWorks content. Please see the Montana State University Library Digital Preservation Policy for general policy guidelines. For detailed content profiles, please see Appendix A.
2. Content scope for digital collections
The Library’s content scope is outlined in the MSU Library Collection Development Policy. http://www.lib.montana.edu/collections/cdpolicy.html
2.1 Types of digital content
The procedures in this document apply to digitized content and born-digital content. Born-digital content includes content from ScholarWorks and Acoustic Atlas (text files and sound files), special born-digital projects such as Angling Oral Histories, and born-digital archival acquisitions.
3. Preservation masters and access copies
For all digitized materials in the Library’s collection, the high-quality digital file will be considered the preservation master, and a lower-quality access copy will be created for online access via the Library website. This policy will be reevaluated as the collection grows.
Especially for at-risk analog materials, the high-quality digitized file will be considered the master. For example, for VHS tapes, the digitized AVI file is considered the master.
Some materials may not be subject to full digital preservation treatment. For born-digital materials, we strongly suggest that donors/creators provide materials in a supported format (see Table 1). If materials are donated in other formats, we cannot guarantee full digital preservation treatment—only bit-level preservation. The Library also reserves the right to migrate formats if necessary. A format review and migration for supported formats is conducted every five years, in years ending in 5 or 0. Format review and migration history are listed in Appendix G.
Table 1. Supported formats for digital preservation
Non-proprietary, openly documented formats are preferred. For a full list of preferred formats, please see Library of Congress preferred digital formats. https://www.loc.gov/preservation/resources/rfs/.
Type of content | Filetype | Notes |
Structured data | XML, JSON, CSV | |
Moving images | MOV, MPEG, AVI, MP4 | |
Sounds | WAV, FLAC, AIFF, MP3 | |
Still images | TIFF, JPEG 2000, PDF, PNG | |
Tabular data | CSV, TSV | |
Text | XML, PDF/A, HTML | |
Web archive | WARC | |
Compressed/archived formats | TAR, GZIP, ZIP | Files should only be compressed and/or archived when it is necessary due to large file size or the need to gather files together in a particular directory structure in order for them to be understood. |
Common proprietary formats | We can also currently support MS Office formats such as XLS, DOC, and PPT | This policy may be re-evaluated in the future |
Please see Appendix D for more information on levels of digital preservation at MSU Library.
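The open formats in Table 1 can be expressed as a simple lookup for screening incoming files. The following is a minimal sketch, not a tool the Library uses: the file extensions, the names SUPPORTED_EXTENSIONS, content_types_for, and is_supported are all assumptions made for illustration, since Table 1 lists format names rather than extensions. The proprietary formats row is left out.

```python
from pathlib import PurePath

# Open formats from Table 1, keyed by content type. The extensions are
# assumptions chosen for illustration; Table 1 names formats, not extensions.
# Note: an extension check cannot tell PDF/A apart from ordinary PDF.
SUPPORTED_EXTENSIONS = {
    "Structured data": {".xml", ".json", ".csv"},
    "Moving images": {".mov", ".mpeg", ".mpg", ".avi", ".mp4"},
    "Sounds": {".wav", ".flac", ".aiff", ".aif", ".mp3"},
    "Still images": {".tif", ".tiff", ".jp2", ".pdf", ".png"},
    "Tabular data": {".csv", ".tsv"},
    "Text": {".xml", ".pdf", ".html", ".htm"},
    "Web archive": {".warc"},
    "Compressed/archived formats": {".tar", ".gz", ".zip"},
}


def content_types_for(filename: str) -> list[str]:
    """Return the Table 1 content types whose extensions match the file."""
    ext = PurePath(filename).suffix.lower()
    return [ctype for ctype, exts in SUPPORTED_EXTENSIONS.items() if ext in exts]


def is_supported(filename: str) -> bool:
    """True if the file extension maps to at least one Table 1 content type."""
    return bool(content_types_for(filename))
```

A file that fails this check would, under the policy above, be a candidate for bit-level preservation only. An extension match is only a first screen; it does not confirm that the file is valid in that format.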
4. Preservation masters procedure
Master storage. When a collection is fully scanned and "complete," the collection is set to read-only and becomes a master collection. Masters are stored in the “DP12” and “DP13” folders in the local “DP” drive. Folders DP10 and DP11 have been allocated for future masters storage.
Inventories. An inventory of masters folders is stored alongside the masters, in the root directory of the DP13 folder. This inventory is updated annually in time for the October quarterly Digital Preservation Group meeting. Inventory update history is listed in Appendix G.
4.1 Masters procedure tasks by role
Archivist/Curator tasks
Accession record. The archivist or curator creates an accession record for all collections, including digital collections. Each record includes a unique collection identifier, which is either an accession number for University records or a collection number for manuscript collections. For accession record templates, please see Appendix B.
Digital Operations Manager tasks
Collection-level readme. See 4.2 Technical metadata.
App Developer tasks
Database backup. See 4.2 Technical metadata.
Systems Administrator tasks
Compression. Systems Administrators individually compress files in DP12 and DP13. When supplying high-quality TIFFs to patrons, library employees either unzip the file or send the compressed file to patrons.
Read-only permissions. Systems Administrator moves masters from the “changes” folder to DP12 and DP13 and sets them to read-only.
Long-term storage. Systems Administrators copy files from DP12 and DP13 to TACC cloud storage (see 5.2 TACC cloud backup).
Checksums. Fixity of metadata and files is checked using checksums, which are created with md5deep.
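The fixity check described above can be illustrated with a short sketch. It uses Python's hashlib in place of the checksum tool named above, so it is an illustration of the technique rather than the Library's actual workflow. The function names md5_of, build_manifest, and changed_files are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its MD5 checksum."""
    return {
        p.relative_to(root).as_posix(): md5_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that were added, removed, or altered since the manifest."""
    current = build_manifest(root)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

A manifest built when masters are moved into storage can be saved alongside them and compared at later intervals. An empty result from changed_files indicates that no file has changed.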
4.2. Technical metadata
Database backup
A backup of the digital collections SQL database, which contains descriptive metadata, is stored with the masters in a folder named “database-backup.” There is one database backup file for each collection. Objects in the database correspond to the unique URLs of the materials.
A backup of the special collections accessions database (CSV exported from ProCite) is stored with the masters, in a folder named “masters-special-collections-procite-backup.”
Collection-level readme files
Digital Operations Manager creates a collection-level JSON readme for each digital collection in DP12 and DP13. The readme includes the title of the collection, date accessioned, date digitized, date moved to DP12/DP13, scope and content note, whether there are sensitive or restricted files in the collection, and any other special information. For a complete readme template, see Appendix H.
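As a rough illustration, the readme fields listed above could be assembled and serialized as follows. This is a sketch only: the key names and the build_readme function are hypothetical and were chosen to mirror the list in the preceding paragraph. The template in Appendix H remains the reference for the actual file.

```python
import json


def build_readme(title, date_accessioned, date_digitized, date_moved_to_masters,
                 scope_and_content, has_restricted_files, special_information=""):
    """Serialize the collection-level readme fields as a JSON string.

    All key names are hypothetical; they follow the field list in the
    procedure text, not the Appendix H template.
    """
    record = {
        "title": title,
        "date_accessioned": date_accessioned,
        "date_digitized": date_digitized,
        "date_moved_to_masters": date_moved_to_masters,
        "scope_and_content": scope_and_content,
        "has_restricted_files": has_restricted_files,
        "special_information": special_information,
    }
    return json.dumps(record, indent=2, ensure_ascii=False)
```

Writing the file through a JSON serializer, rather than editing it by hand, helps avoid syntax errors such as trailing commas.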
4.3. File naming conventions [last updated August 2018]
***These naming conventions apply to future collections. All legacy collections will keep their existing naming schema.
MSS Collections:
0000-0000-000-00000 [Col#-Box-Folder-Item]
0000-0000-000-img00000 [Col#-Box-Folder-Image]
Accessions:
00000-0000-000-00000 [Accn#-Box-Folder-Item]
00000-0000-000-img00000 [Accn#-Box-Folder-Image]
Born Digital:
0000-0000-0000-00000 [Col#-Series-Folder-Item]
Trout Art:
0000-0000-0000-00000 [Col#-Series-Folder-Item]
Trout Oral: [use lower-case only]
smith-bob-2018-05-10 [Last name-First name-Date of Interview (year-month-day)]
apple-jane-bert-2018-06-04 [If 2 people are interviewed together as a couple, use Last name-Both first names-Date of Interview (year-month-day)]
ford-betty-reagan-nancy-2018-08-17 [If 2 or more people with different last names are interviewed, use Last name-First name for each person, followed by Date of Interview (year-month-day)]
***Future oral programs: assign a collection number and use born digital naming scheme
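The naming conventions above can be checked mechanically. The sketch below encodes each pattern as a regular expression and reports which conventions a file name (without its extension) satisfies. The pattern labels and the classify function are hypothetical names introduced for illustration. The Trout Oral pattern checks only the shape of the date, not whether it is a real calendar date.

```python
import re

# One regular expression per naming convention. Digit counts follow the
# zero-filled templates in section 4.3; the labels are hypothetical.
PATTERNS = {
    "mss_item": r"\d{4}-\d{4}-\d{3}-\d{5}",          # Col#-Box-Folder-Item
    "mss_image": r"\d{4}-\d{4}-\d{3}-img\d{5}",      # Col#-Box-Folder-Image
    "accession_item": r"\d{5}-\d{4}-\d{3}-\d{5}",    # Accn#-Box-Folder-Item
    "accession_image": r"\d{5}-\d{4}-\d{3}-img\d{5}",
    "born_digital": r"\d{4}-\d{4}-\d{4}-\d{5}",      # also used for Trout Art
    # Lower-case name parts followed by year-month-day.
    "trout_oral": r"[a-z]+(?:-[a-z]+)+-\d{4}-\d{2}-\d{2}",
}


def classify(stem: str) -> list[str]:
    """Return the labels of every naming convention the name fully matches."""
    return [label for label, pattern in PATTERNS.items()
            if re.fullmatch(pattern, stem)]
```

An empty list means the name follows none of the conventions, which is expected for legacy collections that keep their existing naming schema.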
4.4. Sensitive or nonpublic data
Most digital materials in the Library's collections are freely available, with some exceptions:
- Electronic theses and dissertations may be embargoed for a limited time.
- Some materials in Special Collections have restricted use policies. For example, the Aubrey Haynes Papers are restricted to one computer in the reading room.
5. Storage and backup
5.1. Tape backup
- All Library data are onsite, and all data are backed up monthly. Each month, one backup copy of the monthly data is sent offsite to a location 0.75 miles from MSU.
- Any changes to the monthly backup copies are backed up nightly (onsite), and every 2 weeks (offsite).
- Twice yearly (June and October), tape backups of all Library data are sent to an Iron Mountain storage vault in Cincinnati, OH.
5.2. TACC cloud backup of digital masters
- Yearly (February), a full copy of all digital masters in DP12 and DP13 is uploaded to the TACC cloud backup service.
- All files are compressed, using TAR to maintain permissions.
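The TAR step can be sketched with Python's standard tarfile module, which records each file's permission bits in the archive. This is an illustration under assumptions, not the Library's actual backup script: the function name archive_masters is hypothetical, gzip compression is an assumed choice, and the upload to TACC is not shown because the transfer method is not specified here.

```python
import tarfile
from pathlib import Path


def archive_masters(source_dir: Path, archive_path: Path) -> list[str]:
    """Create a gzip-compressed TAR of source_dir and return its member names.

    tarfile stores each member's mode (permission bits), so read-only
    settings on the masters are carried into the archive.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths in the archive relative to the folder name.
        tar.add(str(source_dir), arcname=source_dir.name)
    # Re-open the archive to confirm it is readable and list its contents.
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

Comparing the returned member names against the masters inventory is one way to confirm that the archive is complete before it is uploaded.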
5.3. Disaster recovery
In the unlikely event of complete destruction of the onsite data (e.g., earthquake, fire, flood, or human error in file handling), the recovery process is as follows:
- acquire internet enabled facility
- purchase and install common hardware (the hardware used at MSU Library is readily available)
- recover systems and data from offsite backup(s).
We estimate that the disaster recovery procedure could take up to a week, barring any problems obtaining access to a properly networked facility. This scenario assumes MSU would recover its network domain and DNS, so that hostname changes would be unnecessary.
6. Roles and responsibilities
Permissions
DP drives are accessible to multiple employees in the following departments: Digital Library Initiatives (DLI); Cataloging, Access, & Technical Services (CATS); and Archives and Special Collections (ASC).
- DP12 and DP13 are the only storage locations for all preservation masters. Folders DP10 and DP11 are allocated for future masters storage.
- File permissions are set to “read only” at the point of transfer to DP12/DP13.
- ASC employees may access the “read only” preservation masters for patron reuse and reproduction. These requests generally occur once or twice a month.
Responsibilities
- The Data Librarian conducts annual reviews of digital preservation policies and procedures.
- The Digital Operations Manager provides staffing oversight.
- The Systems Administrators oversee storage and backup.
- For specific tasks in the preservation masters procedure, please see 4.1 Masters procedure tasks by role.
7. Staffing and training
These digital preservation procedures will be used in new staff training to support standardized digital preservation practices in the Library.
8. Sustainability and financial planning
While there are no funds dedicated to digital preservation at MSU Library, Library Administration has agreed to support digital preservation activities. With the exception of offsite backups, most of the tools used to implement digital preservation at MSU Library are free and open source.
Appendix A. Identification of content
Last updated: April 2018
Content type = Main area or content stream
Description = high level descriptive information about content type
Acquisition = how content types are created and/or acquired by the Library
Size = current total data size of content type
Complexity level = designation related to variety / range of file formats that are typically included in content type
Current management / storage = how / where digital objects within content type are currently stored and managed. Locally vs. vendor / hosted
Rights = how rights are captured / transferred to Library
Value = designation related to value and preservation commitment for content type
Priority = numerical assignment of priority for preservation activities (1=highest priority; 3=lower priority)
Content type | Description | Acquisition | Size | Complexity level | Current management / storage | Rights | Value | Priority |
Digitized content | Digital content created through digitization of analog materials by Digital Production Unit | Created internally by Library staff | 5 TB | Medium – Primarily consists of image and document file formats | Internal Library storage network, preservation masters procedure, backups both in the library and offsite. | Licenses for reuse are obtained by MSU when possible. Users have responsibility to obtain permission for reuse beyond “fair use” | Long-term | 1 |
Institutional repository content | Open Access scholarly work (including publications and data) created primarily by MSU faculty and graduate students | Created externally and acquired by Library staff or submitted through IR platform | 500 GB | High – We have preferred formats, but we accept all file formats | Internal Library storage network, backups both in the library and offsite, some preservation actions (provenance metadata and checksums) through DSpace. | For OA copies of scholarly papers: Scholarly Communication Librarian ensures that we have the right to repost. For ETDs: students agree to SW posting when submitting thesis | Long-term | 2 |
Acoustic Atlas | Born-digital sound recordings acquired from researchers, primarily Jeff Rice | Created externally and transferred to the library (Molly Arrandale is project manager) | 1 TB | Medium – all sound files | Internal Library storage network, backups both in the library and offsite | | Long-term | 1 |
Born-digital content in ASC | Born-digital materials acquired from individuals/orgs. Archival collections, oral history collections, and special collections materials | Created externally and acquired by ASC (or in the case of Angling Oral Histories, created by ASC) | 2.5 TB | High – Wide range of file formats including high amount of legacy and obsolete formats | For materials created externally: only storage, no management or preservation of any kind. For Angling Oral Histories: full preservation | Transferred via Certificate of Gift agreement with Donor (see Appendix C). Angling Oral History: created with agreement from interviewee | Long-term | 1 |
Web Archives content (trial beginning in 2018) | Web content created by University departments, units, and affiliated entities | Will be created externally, acquired through automatic web harvesting via Archive-It service | 0 GB | High – Wide range of file formats and complex relationships between digital objects | External storage through vendor hosted solution (Archive-It) | No explicit transfer of rights from creators? This will be re-evaluated following the creation of a TRAILS-wide web archiving ethical framework | Long-term | 2 |
Licensed electronic resource (permanent acquisitions) | Electronic resources purchased with perpetual access | Purchased from vendors | | Low – vendors typically have a standard file format, however metadata are often lacking | Disaster backup through Portico | Managed individually through license agreement | Near-term | 3 |
Licensed electronic resource (subscription) | Electronic resources purchased on a subscription basis | Purchased from vendors | | Low – most subscriptions are preserved through Portico | Portico | Managed individually through license agreement | Near-term | 3 |
Appendix B. Born-digital accession record template
To be determined [in progress as of July 2019]
Appendix C. Certificate of gift
I/We_________________________________________of______________________________
(name) (city, state)
convey to the Montana State University Library the following:
This is an unrestricted gift that transfers to the Montana State University Library all legal title, copyright and literary property rights insofar as I/we hold them unless exceptions or restrictions are specifically noted below:
I/We agree that any materials described above that are determined to be inappropriate to the Special Collections or general library collection shall be disposed of by the library as it sees fit or the items be returned to me/us if I/we expressly state this below:
Signature ____________________________________________Date___________________
Signature ____________________________________________Date___________________
Title (Organizations or Businesses)_______________________________________________
Witness______________________________________________Date___________________
The gift described is gratefully accepted by Montana State University Library by
Signature ______________________________________________Date__________________
Printed Name____________________________________ Title _________________________
Appendix D. Digital preservation levels - decision flowchart (draft)
Is this a unique collection of “enduring value” that was not commercially mass produced?
Questions to help assess "enduring value":
- Does the collection clearly align with our collection development policy?
- Does the collection relate to existing collections?
- Does the collection have high research value?
- Does the research value justify the time, effort, and resources that would be used to process and preserve it?
- No —> do not archive content
- Yes —> go to next question
Do you have permission from the copyright holder (if necessary) to archive this digital collection?
- No —> do not archive content
- Yes —> go to next question
Are any objects in this collection available in another Trusted Digital Repository?
- Yes —> which repository? _____________________. If collection is in another Trusted Digital Repository, Do not archive content.
- No —> go to next question
Is this collection available in a Trusted Print Repository or is there a hard copy available that will be kept long-term?
- No —> full preservation
- Yes —> go to next question
Is the hard copy deteriorating or in poor condition or on a near-obsolete format or media?
- Yes —> full preservation
- No —> go to next question
If the content is born digital, is it in a supported file format?
- Yes —> discuss with ASC and DLI whether the content warrants preservation
- No —> bit-level preservation
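The draft flowchart above can be read as a sequence of yes/no checks. The following sketch encodes that sequence as a function, purely as an illustration of the decision order. The Assessment class, its field names, and the returned strings are hypothetical, and because the flowchart is a draft, the logic may change.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Answers to the flowchart questions, in order (names are hypothetical)."""
    enduring_value: bool              # unique, not commercially mass produced
    has_permission: bool              # copyright permission, if necessary
    in_other_trusted_digital_repository: bool
    hard_copy_kept_long_term: bool    # or held in a Trusted Print Repository
    hard_copy_at_risk: bool           # deteriorating or near-obsolete media
    supported_format: bool            # see Table 1


def preservation_level(a: Assessment) -> str:
    """Walk the draft flowchart from top to bottom and return the outcome."""
    if not a.enduring_value:
        return "do not archive"
    if not a.has_permission:
        return "do not archive"
    if a.in_other_trusted_digital_repository:
        return "do not archive"
    if not a.hard_copy_kept_long_term:
        return "full preservation"
    if a.hard_copy_at_risk:
        return "full preservation"
    if a.supported_format:
        return "discuss with ASC and DLI"
    return "bit-level preservation"
```

The early returns mirror the flowchart: each "No" or "Yes" branch that ends a path returns immediately, and only collections that pass every earlier question reach the file format check.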
Appendix E. National Digital Stewardship Alliance levels of digital preservation
| Level 1 (Protect your data) | Level 2 (Know your data) | Level 3 (Monitor your data) | Level 4 (Repair your data) |
Storage and Geographic Location | - Two complete copies that are not collocated - For data on heterogeneous media (optical discs, hard drives, etc.) get the content off of the medium and into your storage system | - At least three complete copies - At least one copy in a different geographic location - Document your storage system(s) and storage media and what you need to use them | - At least one copy in a geographic location with a different disaster threat - Obsolescence monitoring process for your storage system(s) and media | - At least three copies in geographic locations with different disaster threats - Have a comprehensive plan in place that will keep files and metadata on currently accessible media or systems |
File Fixity and Data Integrity | - Check file fixity on ingest if it has been provided with the content - Create fixity info if it wasn't provided with the content | - Check fixity on all ingests - Use write-blockers when working with original media - Virus-check high risk content | - Check fixity of content at fixed intervals - Maintain logs of fixity info; supply audit on demand - Ability to detect corrupt data - Virus-check all content | - Check fixity of all content in response to specific events or activities - Ability to replace/repair corrupted data - Ensure no one person has write access to all copies |
Information Security | - Identify who has read, write, move, and delete authorization to individual files - Restrict who has those authorizations to individual files | - Document access restrictions for content | - Maintain logs of who performed what actions on files, including deletions and preservation actions | - Perform audit of logs |
Metadata | - Inventory of content and its storage location - Ensure backup and non-collocation of inventory | - Store administrative metadata - Store transformative metadata and log events | - Store standard technical and descriptive metadata | - Store standard preservation metadata |
File Formats | - When you can give input into the creation of digital files, encourage use of a limited set of known open formats and codecs | - Inventory of file formats in use | - Monitor file format obsolescence issues | - Perform format migrations, emulation and similar activities as needed |
Appendix F. OAIS reference model definitions
These procedures follow the best practices defined by the OAIS Reference Model.[1]
This model defines three information packages: Submission Information Package (SIP), Dissemination Information Package (DIP), and Archival Information Package (AIP).
SIP, DIP, and AIP contents at Montana State University
SIP – physical object to be scanned, or in rare cases a born-digital object.
DIP – PDF, JPG, or other derivative file format, as well as descriptive metadata [at MSU Library, these metadata are documented in a SQL database].
AIP – TIFF or other master file format, collection-level readme, and SQL database snapshot.
Appendix G. Review history
Format review and migration (Review is conducted every 5 years, in years ending with 5 or 0)
Name and title of responsible party | Review date | Resulting format migration actions | Notes |
| 2020 | | |
| | | |
| | | |
Masters inventory (Review is conducted annually, during the summer)
Name and title of responsible party | Review date | Notes |
Brandon Watson, Digital Operations Manager | Summer 2018 | Brandon inventoried the masters and relocated any masters that were not in the designated masters folders. |
| | |
| | |
Appendix H. Collection-level JSON readme file template
{
  "default_locale": "en",
  "lang": "en-US",
  "dir": "ltr",
  "name": "ADD-COLLECTION-TITLE",
  "description": "ADD-BRIEF-DESCRIPTION",
  "categories": ["books", "education"],
  "owner": {
    "name": "ADD-COLLECTION-OWNER-HERE",
    "contact": ""
  },
  "display": "standalone",
  "orientation": "any",
  "start_url": "./index.html",
  "scope": "/~jason/",
  "filetypes": "",
  "date_digitized": ""
}
[1] Consultative Committee for Space Data Systems (CCSDS). (June 2012). Reference Model for an Open Archival Information System (OAIS): Magenta Book. https://public.ccsds.org/pubs/650x0m2.pdf