Web Archiving Policies and Procedures
Created by Shannon Smith in Spring 2019, with guidance from Jason Clark and Sara Mannheimer.
Revised by Sara Mannheimer, Jodi Allison-Bunnell, and Brandon Watson in Fall 2020.
Last updated by Sara Mannheimer on December 18, 2020
The purpose of the web archiving initiative at Montana State University (MSU) Library is to capture, preserve, and provide access to websites documenting the history and culture of MSU and other entities related to the Library’s collecting areas. The Library aims to collect and preserve selected web content at a particular point in time (snapshots) or over a period of time (e.g., daily, monthly, or quarterly). MSU Library's web archive collections are hosted and stored at the Internet Archive data centers. Content archived as part of MSU Library collections supports the Library’s strategic plan objective 2.3: Expand, diversify, and adapt our collections and services.
Please see the MSU Library Collection Development Policy, the MSU Archives and Special Collections Collection Development Policy, the MSU Archives and Special Collections scope, and Archive-It’s Storage and Preservation Policy for related policy guidelines.
[A note on the relationship of the Archives and Special Collections collection development policy and the Archives and Special Collections scope: The former is a 2010 document that has not been comprehensively updated. The latter is a set of categories developed for the 2019 website redesign that complement, but do not change, the collection development policy.]
MSU Library’s web archiving program especially focuses on:
- Documenting the administrative functions of MSU;
- Documenting activities of the MSU community.
MSU Library also captures some websites as a complement to or component of manuscript collections held in MSU Archives and Special Collections.
As of 2020, MSU ASC web archive scope includes websites relating to:
- MSU history, culture, and operations
- MSU student life
- Extraordinary events at MSU and in Montana
- Individuals or organizations that are the creators or collectors of manuscript collections, where the website was created by or for them and complements the main collection
For content identified outside the MSU domain (montana.edu), the Library will contact the content creators to provide an opportunity for them to opt out of inclusion in the MSU Library web archives.
At the request of content creators, and depending on capacity, the Library may also be able to support self-documentation and storage of web content.
A list of all seed URLs is available via the Montana State University Web Archive.
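The distinction between content inside and outside the MSU domain can be checked mechanically. The following is a minimal sketch, assuming seed URLs are available as plain strings; the function names and the example seed list are hypothetical and are not part of the Library's actual workflow.

```python
from urllib.parse import urlsplit

MSU_DOMAIN = "montana.edu"  # the MSU domain named in this policy


def is_msu_domain(url: str) -> bool:
    """Return True when the URL's host is montana.edu or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return host == MSU_DOMAIN or host.endswith("." + MSU_DOMAIN)


def seeds_needing_opt_out_contact(seed_urls):
    """Return the seeds outside the MSU domain, whose creators should be contacted."""
    return [url for url in seed_urls if not is_msu_domain(url)]


# Hypothetical seed list, for illustration only
example_seeds = [
    "https://www.montana.edu/",
    "https://www.lib.montana.edu/archives/",
    "https://example.org/community-project/",
    "https://notmontana.edu.example.com/",
]

for url in seeds_needing_opt_out_contact(example_seeds):
    print("Contact content creator about:", url)
```

The check compares the full host name rather than searching for the string "montana.edu" anywhere in the URL, so a look-alike host such as the last example is still treated as outside the MSU domain.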
The MSU Library selects websites for its permanent collections that rank high on the following criteria:
- Informational, administrative, or artifactual value. Does the website address a gap or missing perspective? Does it overlap with websites that other institutions in Montana are collecting? How unique is the information provided by the website?
- Likely use or need. How relevant is the website to library collection development initiatives, research, and teaching? Does the website complement an existing collection? Does it have high likely use?
- Risk. Is the website at a high level of risk of loss? Priority is given to older or temporary websites that might be shut down soon, websites dependent on obsolete technology, and websites that change often.
- Rights. Priority is given to Montana State University content and content that can be made available through fair use.
The Library maintains an organizational subscription to Archive-It, which serves as the primary tool through which the Library captures web content. Conifer is used to capture static web content such as digital scholarship projects or web-based theses and dissertations.
Duplication with the Internet Archive
Subscribing to Archive-It adds value to our digital collections through high-quality capture, metadata, and searchability for website collections. Please see below for more detail on how Archive-It differs from the general Internet Archive. (Adapted from Archive-It documentation.)
- Curation, scoping, and management.
- Archive-It allows us to:
- Create focused and topical collections
- Control how deep and how often a site is crawled
- Exclude content from being crawled
- Override robots.txt exclusions
- Catalog our content with metadata at the collection, seed, and document level.
- Institutional attribution
- Archive-It collections attribute archived web pages to a specific collection and the organization that captured it.
- Full text search
- Archive-It collections are full-text searchable, with advanced search options. Full text search is not available for general Internet Archive collections.
- Social media capture
- Archive-It uses a crawler add-on called Umbra to capture social media sites such as Flickr, Twitter, Instagram, Vimeo and Facebook. Umbra is not used in the general Internet Archive collections, and therefore many social media sites are not represented therein.
- Technical support
- Archive-It provides extensive training documentation, user forums, and technical support throughout the process to help with scoping and other issues.
- Data access
- Archive-It partners may retrieve a back-up copy of their data at any time, which is not available for content collected as part of the general Internet Archive.
- By default, websites that are captured with Archive-It and made public in our Archive-It collection also appear in the general Internet Archive within 24 hours. However, Archive-It supports trial and training content as well as restricted content, and such content remains inaccessible in the general Internet Archive.
Duplication within our MSU Archive-It collection
Archive-It does not archive duplicate content; it captures new data only if that data is not already represented elsewhere in our collection.
- Archive-It does not capture the same page twice for the same account.
- If we ask Archive-It to crawl a website that is already in our collections as a sub-domain, the new crawl acts as an access point that gives users more information on a specific page and allows users to directly access that page. However, the particular "document" (i.e. the page) is only archived once, so it doesn't double-dip into our data budget.
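The effect described above, where a page can be reached through more than one crawl but is stored only once, can be illustrated with a small sketch. It assumes duplicates are detected by comparing a digest of the page content. That mechanism, the class name, and the byte counter standing in for the data budget are illustrative assumptions, not a description of Archive-It's internals.

```python
import hashlib


class DedupStore:
    """Toy store that keeps one copy of each distinct page body."""

    def __init__(self):
        self.documents = {}      # content digest -> page body (stored once)
        self.access_points = {}  # (crawl name, URL) -> content digest
        self.bytes_stored = 0    # stand-in for the data budget

    def capture(self, crawl: str, url: str, body: bytes) -> bool:
        """Record a capture; return True only if new data was stored."""
        digest = hashlib.sha256(body).hexdigest()
        is_new = digest not in self.documents
        if is_new:
            self.documents[digest] = body
            self.bytes_stored += len(body)
        # Every crawl gets an access point, even when no new data is stored.
        self.access_points[(crawl, url)] = digest
        return is_new


store = DedupStore()
page = b"<html>Library news</html>"  # hypothetical page body
first = store.capture("site-crawl", "https://example.org/news", page)
second = store.capture("subdomain-crawl", "https://example.org/news", page)
print(first, second, store.bytes_stored, len(store.access_points))
```

In this sketch the second capture adds an access point but no stored bytes, which mirrors the statement that a repeated document does not count twice against the data budget.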
Web archiving falls under the purview of the Archives and Special Collections (ASC) department. Permissions to access the Archive-It management interface are currently limited to the Data Librarian and Head of ASC. In the future, these permissions may be extended to ASC staff and students.
Roles and responsibilities
Routine: As of 2020, we crawl sites every 30 days. After assessing rates of change, some sites may be crawled less often and some more often.
- Initial set-up
- Set up new URLs
- Test crawls
- Provide quality assurance and metadata workflows for Archives Technician to assign to student employees
- Direct student employees' web archiving work
Digital Operations Manager
- Email archiving
Head of Archives and Special Collections
- Re-assess collecting quarterly
- Update and maintain finding aids with archived web content
The Data Librarian oversees metadata. Student employees create bulk metadata. In the future, the CATS department may advise on metadata practice.
Current metadata fields used in the MSU web archives are:
- Collector (Montana State University Library)
- Access to websites archived using Archive-It is provided through the Archive-It web interface.
- Access to websites archived using Conifer is provided through MSU ScholarWorks or the Filmmaking Archive of MSU Science and Natural History MFA Program.
- For websites archived using Archive-It: please see the Archive-It Storage and Preservation Policy.
- For websites archived using Conifer: please see MSU Library Digital Preservation Policy and Procedures.
Content within official Montana State University websites is predominantly considered public record.
For websites captured outside of the university domain, the Library will provide an opt-out opportunity to organizations or individuals whose websites are selected for archiving. The Library will actively work to ensure compliance with copyright laws.
The Library acknowledges that organizations and individuals as content creators of websites have agency over their born-digital content. If you believe the Library may have harvested your web content in error, or that maintaining your content in our archive does not adequately reflect your organization, please contact Archives and Special Collections. Content related to your organization can either be removed from our collection entirely or treated as sensitive or nonpublic data per Montana State University Library Digital Preservation Procedures section 4.4.
This web archiving procedure document will be used in new staff training to support web archiving practices in the Library. Current local practices will be reviewed periodically to assess alignment with current best practices in the field.
Our policies are also informed by the following article: Christensen, M. K., & Maches, T. (2020). Web archiving: Policy and practice. Journal of Digital Media Management, 8(3), 201-214. Retrieved from https://escholarship.org/uc/item/3wc5t8nm
Hello [Insert Contact Name],
As part of our mission to collect and preserve the history of Montana and the immediate geographical region, Montana State University Library archives a selection of websites. I am reaching out to seek your permission to archive the website [Insert website name and URL] for inclusion in our web archive collections.
Archiving your website will include an initial capture as well as ongoing quarterly or semi-annual captures of the site. Website captures are completed using the Internet Archive’s Heritrix web crawler, and generally last for a few days. Once a crawl is complete, the crawler no longer interacts with your server. We capture websites at a slow rate so as not to interfere with access to your website.
You may request that we stop archiving or take down your archived website at any time.
If you do not wish to include your website in the collections at Montana State University Library, please opt out using this form. Thank you in advance for your consideration.
[Name & contact]
To exclude your website or your organization's website from the web archiving collections at Montana State University Library, please complete the form below. If you represent an organization, please enter the name of the organization.