Created by Shannon Smith in Spring 2019, with guidance from Jason Clark and Sara Mannheimer. 
Revised by Sara Mannheimer, Jodi Allison-Bunnell, and Brandon Watson in Fall 2020.

Last updated by Sara Mannheimer on December 18, 2020

Table of Contents

Overview of web archiving

Scope of the MSU web archives

Prioritizing content for web archiving

Web archiving procedures

Takedown policy

Staff and training

Acknowledgements

Appendix A: Web archiving request and opt-out opportunity

Appendix B: Web archiving opt-out form

Appendix C: Designing websites for archivability

Overview of web archiving    

The purpose of the web archiving initiative at Montana State University (MSU) Library is to capture, preserve, and provide access to websites documenting the history and culture of MSU and other entities related to the Library’s collecting areas. The Library aims to collect and preserve selected web content at a particular point in time (snapshots) or over a period of time (e.g., daily, monthly, or quarterly). MSU Library's web archive collections are hosted and stored at the Internet Archive data centers. Content archived as part of MSU Library collections supports the Library’s strategic plan objective 2.3: Expand, diversify, and adapt our collections and services.

Please see the MSU Library Collection Development Policy, MSU Archives and Special Collections Collection Development Policy, MSU Archives and Special Collections scope, and Archive-It’s Storage and Preservation Policy for related policy guidelines.

[A note on the relationship of the Archives and Special Collections collection development policy and the Archives and Special Collections scope: The former is a 2010 document that has not been comprehensively updated. The latter is a set of categories developed for the 2019 website redesign that complement, but do not change, the collection development policy.]

Scope of the MSU web archives

MSU Library’s web archiving program especially focuses on:

  • Documenting the administrative functions of MSU;
  • Documenting activities of the MSU community.

MSU Library also captures some websites as a complement to or component of manuscript collections held in MSU Archives and Special Collections.

As of 2020, the MSU Archives and Special Collections (ASC) web archive scope includes websites that:

  • Relate to MSU history, culture, and operations
  • Relate to MSU student life
  • Relate to extraordinary events at MSU and in Montana
  • Were created by or for individuals or organizations that are the creators or collectors of manuscript collections, and that complement the main collection

For content identified outside the MSU domain (montana.edu), the Library will contact the content creators to provide an opportunity for them to opt out of inclusion in the MSU Library web archives.  
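The domain check described above can be sketched in a few lines of Python. This is only an illustration, not part of the Library's actual workflow: the function name `needs_opt_out_contact` is hypothetical, and the sketch assumes that any host equal to montana.edu or ending in .montana.edu counts as inside the MSU domain.

```python
from urllib.parse import urlsplit

MSU_DOMAIN = "montana.edu"  # the MSU domain named in this policy


def needs_opt_out_contact(seed_url: str) -> bool:
    """Return True when the seed is hosted outside the MSU domain.

    Hypothetical helper: URLs without a scheme have no parseable host
    and are conservatively treated as outside the MSU domain.
    """
    host = (urlsplit(seed_url).hostname or "").lower()
    inside_msu = host == MSU_DOMAIN or host.endswith("." + MSU_DOMAIN)
    return not inside_msu
```

Matching on the leading dot keeps look-alike hosts such as notmontana.edu from being treated as MSU content.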

At the request of content creators, and depending on capacity, the Library may also be able to support self-documentation and storage of web content.

A list of all seed URLs is available via the Montana State University Web Archive.

Prioritizing content for web archiving

The MSU Library selects websites for its permanent collections that rank high on the following list of criteria:

  • Informational, administrative, or artifactual value. Does the website address a gap or missing perspective? Does it overlap with websites that other institutions in Montana are collecting? How unique is the information provided by the website?
  • Likely use or need. How relevant is the website to library collection development initiatives, research, and teaching? Does the website complement an existing collection? Does it have high likely use?
  • Risk. Is the website at a high level of risk of loss? Priority is given to older or temporary websites that might be shut down soon, websites dependent on obsolete technology, and websites that change often.
  • Rights. Priority is given to Montana State University content and content that can be made available through fair use.
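
One way to picture ranking candidates against the criteria above is a simple additive score. The sketch below is an assumption for illustration only: this policy does not define a numeric scale or weights, so the 0 to 3 scale, the equal weighting, and the names `Candidate`, `priority`, and `rank` are all hypothetical.

```python
from dataclasses import dataclass

# The four criteria listed in this section (short names are hypothetical).
CRITERIA = ("value", "likely_use", "risk", "rights")


@dataclass
class Candidate:
    url: str
    scores: dict  # criterion -> 0 (low) to 3 (high); hypothetical scale


def priority(candidate: Candidate) -> int:
    """Sum the scores across all criteria; missing criteria count as 0."""
    return sum(candidate.scores.get(name, 0) for name in CRITERIA)


def rank(candidates):
    """Return candidates ordered from highest to lowest priority."""
    return sorted(candidates, key=priority, reverse=True)
```

In practice a reviewer's judgment would likely matter more than any total; the score is just a way to make comparisons between candidate sites explicit.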

Web archiving procedures

Tools

The Library maintains an organization subscription to Archive-It. Archive-It serves as the primary tool through which the library will capture web content. Conifer is used to capture static web content such as digital scholarship projects or web-based theses and dissertations.

Duplication with the Internet Archive

Subscribing to Archive-It provides added value to our digital collections through high-quality capture, metadata, and searchability for website collections. Please see below for more detail on how Archive-It differs from the general Internet Archive. (Adapted from Archive-It documentation.)

  • Curation, scoping, and management.
    • Archive-It allows us to:
      • Create focused and topical collections
      • Control how deep and how often a site is crawled
      • Exclude content from being crawled
      • Override robots.txt exclusions
      • Catalog our content with metadata at the collection, seed, and document level.
  • Institutional attribution
    • Archive-It collections attribute archived web pages to a specific collection and the organization that captured them.
  • Full text search
    • Archive-It collections are full-text searchable, with advanced search options. Full text search is not available for general Internet Archive collections.
  • Social media capture
    • Archive-It uses a crawler add-on called Umbra to capture social media sites such as Flickr, Twitter, Instagram, Vimeo and Facebook. Umbra is not used in the general Internet Archive collections, and therefore many social media sites are not represented therein.
  • Technical support
    • Archive-It provides extensive training documentation, user forums, and technical support throughout the process to help with scoping and other issues.
  • Data access
    • Archive-It partners may retrieve a back-up copy of their data at any time, an option that is not available for content collected as part of the general Internet Archive.
    • By default, websites that are captured with Archive-It and made public in our Archive-It collection also appear in the general Internet Archive within 24 hours. However, Archive-It supports trial and training content as well as restricted content, and such content remains inaccessible in the general Internet Archive. 

Duplication within our MSU Archive-It collection

Archive-It does not archive duplicate content and only captures new data if it is not represented elsewhere in our collection.

  • Archive-It does not capture the same page twice for the same account.
  • If we ask Archive-It to crawl a website that is already in our collections as a sub-domain, the new crawl acts as an access point that gives users more information on a specific page and allows users to directly access that page. However, the particular "document" (i.e. the page) is only archived once, so it doesn't double-dip into our data budget.
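
The effect on the data budget can be illustrated with a small sketch. It assumes that duplicates are detected by comparing a hash of each page's content for a given URL; this document does not describe Archive-It's actual mechanism, and the class name `CaptureStore` and its methods are hypothetical.

```python
import hashlib


class CaptureStore:
    """Toy model of an account that stores each unchanged page only once."""

    def __init__(self):
        self._seen = set()     # (url, content digest) pairs already stored
        self.stored_bytes = 0  # running total counted against the data budget

    def capture(self, url: str, body: bytes) -> bool:
        """Store the page if it is new; return True only when data was added."""
        key = (url, hashlib.sha256(body).hexdigest())
        if key in self._seen:
            return False  # duplicate: nothing added to the budget
        self._seen.add(key)
        self.stored_bytes += len(body)
        return True
```

Under this model, crawling the same unchanged page from two overlapping seeds adds to the stored total only once, while a changed version of the page is stored as new data.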

Permissions

Web archiving falls under the purview of the Archives and Special Collections (ASC) department. Permissions to access the Archive-It management interface are currently limited to the Data Librarian and Head of ASC. In the future, these permissions may be extended to ASC staff and students.

Roles and responsibilities

Routine: As of 2020, we crawl sites every 30 days. After assessing rates of change, some sites may be crawled less often and some more often.

Staffing: 

Data Librarian

  • Initial set-up
  • Set up new URLs
  • Test crawls
  • Provide quality assurance and metadata workflows for Archives Technician to assign to student employees 

Archives Technician

  • Direct student employees' web archiving work

Digital Operations Manager

  • Email archiving

Head of Archives and Special Collections

  • Re-assess collecting quarterly
  • Update and maintain finding aids with archived web content 

Metadata

The Data Librarian oversees metadata. Student employees create bulk metadata. In the future, the CATS department may advise on metadata practice.

Current metadata fields used in the MSU web archives are:

  • Title
  • Creator
  • Collector (Montana State University Library)
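
For bulk metadata work, the three fields above could be assembled into a spreadsheet-style file. The sketch below is a minimal illustration, not Archive-It's upload format: the `URL` column, the input dictionary keys, and the function name `build_metadata_csv` are assumptions added for the example.

```python
import csv
import io

COLLECTOR = "Montana State University Library"  # fixed value from the field list
FIELDS = ["URL", "Title", "Creator", "Collector"]  # "URL" is an assumed key column


def build_metadata_csv(seeds):
    """Return CSV text with one row per seed and the Collector filled in."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for seed in seeds:
        writer.writerow({
            "URL": seed["url"],
            "Title": seed["title"],
            "Creator": seed["creator"],
            "Collector": COLLECTOR,
        })
    return buffer.getvalue()
```

Filling in the Collector value automatically keeps it consistent across every record that student employees create.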

Access

Preservation

 

Takedown policy 

Content within official Montana State University websites is predominantly considered public record. 

For websites captured outside of the university domain, the Library will provide an opt-out opportunity to organizations or individuals whose websites are selected for archiving. The Library will actively work to ensure compliance with copyright laws. 

The Library acknowledges that organizations and individuals as content creators of websites have agency over their born-digital content. If you believe the Library may have harvested your web content in error, or that maintaining your content in our archive does not adequately reflect your organization, please contact Archives and Special Collections. Content related to your organization can either be removed from our collection entirely, or treated as sensitive or nonpublic data per Montana State University Library Digital Preservation Procedures section 4.4.

 

Staff training

This web archiving procedure document will be used in new staff training to support web archiving practices in the Library. Current local practices will be reviewed periodically to assess alignment with current best practices in the field.

 

Acknowledgements

Parts of this document are adapted from web archiving policies and procedures at Columbia University, North Carolina State University, George Washington University, and the Library of Congress.

Our policies are also informed by the following article: Christensen, M. K., & Maches, T. (2020). Web archiving: Policy and practice. Journal of Digital Media Management, 8(3), 201-214. Retrieved from https://escholarship.org/uc/item/3wc5t8nm

 

Appendix A: Web archiving request and opt-out opportunity

Hello [Insert Contact Name], 

As part of our mission to collect and preserve the history of Montana and the immediate geographical region, Montana State University Library archives a selection of websites. I am reaching out to seek your permission to archive the website [Insert website name and URL] for inclusion in our web archive collections.  

Archiving your website will include an initial capture as well as ongoing quarterly or semi-annual captures of the site. Website captures are completed using the Internet Archive’s Heritrix web crawler, and generally last for a few days. Once a crawl is complete, the crawler no longer interacts with your server. We capture websites at a slow rate so as not to interfere with access to your website. 

You may request that we stop archiving or take down your archived website at any time. 

If you do not wish to include your website in the collections at Montana State University Library, please opt out using this form. Thank you in advance for your consideration.

 

Regards,

[Name & contact]

 

Appendix B: Web archiving opt-out form

To exclude your website or your organization's website from the web archiving collections at Montana State University Library, please complete the form below. If you represent an organization, please enter the name of the organization.

 

Appendix C: Designing websites for archivability

https://www.loc.gov/programs/web-archiving/for-site-owners/creating-preservable-websites/

https://library.columbia.edu/bts/web_resources_collection/guidelines_for_preservable_websites.html