Date Published: February 5, 2015


This year, we at Publication and Data Services have started looking into adding unique identifiers to the content in our digital collections, including ScholarWorks. Almost as soon as scholarly content began to be published online, there arose the problem of “reference rot,” when links to online content no longer work. A recent study in PLOS ONE looked at millions of articles and found that one in five reference links were broken. (And just yesterday, an update was published on the Impact of Social Sciences blog.) A New York Times article from last year highlighted reference rot in Supreme Court cases. It’s not only inconvenient to click on a link that leads to a 404 error page; it also threatens our scholarly legacy. To combat this problem, several persistent identifier formats have been developed, including the Handle System, Uniform Resource Identifier/Locator/Name (URI/URL/URN), Digital Object Identifier (DOI), Archival Resource Key (ARK), and Universally Unique Identifier (UUID). Academic journals and digital libraries now commonly use these persistent identifiers to make sure that their digital content remains available into the future. (Just a note: we use the word “persistent” in order to hedge against the more forceful “permanent.” These identifiers are designed to help combat reference rot, but they are only as permanent as the institutions that mint and maintain them. If you’d like to learn more, here’s an interesting blog post discussing some of the nuances of DOIs.)

Right now, we use a hybrid of Handles and URIs in ScholarWorks. When we upload a record to the repository, it automatically gets a unique URI that contains a Handle (for example, the URI has Handle 1/3413). The idea is that this URI is a persistent link for citation purposes. But a couple of factors have gotten us thinking about alternatives to our system. First, while looking into how to make our DSpace repository better-looking, we realized that it might help to switch from the XMLUI interface to the JSPUI interface of DSpace. We don’t have to get into the differences between these interfaces. But you can see in the example URI from ScholarWorks that “XMLUI” is actually part of our unique identifier. If we were to switch interfaces, all of our so-called “persistent” URIs would break.
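To make the fragility concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of our actual setup: the hostname `repository.example.edu` is a placeholder, the path layout `/<interface>/handle/<handle>` is assumed from the way the interface name appears in our URIs, and the second function assumes the Handle prefix is registered with the global Handle proxy at hdl.handle.net, which this post does not confirm.

```python
# Hypothetical placeholder host; not the real ScholarWorks address.
REPOSITORY_BASE = "https://repository.example.edu"


def interface_url(interface: str, handle: str, base: str = REPOSITORY_BASE) -> str:
    """Build a repository URL that embeds the interface name (assumed layout)."""
    return f"{base}/{interface}/handle/{handle}"


def proxy_url(handle: str, resolver: str = "https://hdl.handle.net") -> str:
    """Build a URL through a Handle proxy; no interface name is involved.

    Assumes the Handle prefix is registered with the resolver.
    """
    return f"{resolver}/{handle}"


if __name__ == "__main__":
    handle = "1/3413"  # the example Handle mentioned above
    # The cited link changes when the interface changes...
    print(interface_url("xmlui", handle))
    print(interface_url("jspui", handle))
    # ...while a proxy-style link depends only on the Handle itself.
    print(proxy_url(handle))
```

The point of the sketch is only that a link built from the Handle alone survives an interface switch, while a link that spells out the interface name does not.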

Our second consideration is that Digital Object Identifiers (DOIs) are quickly becoming the standard for scholarly articles and data sets, and a few recent publications have shown DOIs to be robust persistent identifiers, especially for data. We’ve also seen some examples of ARKs being used for digital archival content, and right now our digital photos and documents don’t have persistent identifiers at all. We decided to look into assigning DOIs to our articles and data, and assigning ARKs to our digital collections.

The main difference between DOIs and ARKs is that DOIs are generated and managed by a few specific organizations, whereas ARKs can be generated and managed by any institution. The process of becoming a DOI-minting agency is expensive, and therefore DOIs are only offered by a couple of services. DOIs are used more often by publishers and online data providers, and the DOI agencies make most of the technical decisions surrounding DOI minting and metadata. On the other hand, it is free to procure a Name Assigning Authority Number (NAAN) in order to generate ARKs, and open source software can be used to mint ARKs and create associated metadata. ARKs tend to be used by cultural institutions, and each ARK-generating institution is free to define its own policies and services.

Right now there are two main minters of DOIs: the California Digital Library (CDL) EZID service and CrossRef. For a PhD-granting research institution like MSU, EZID’s annual subscription fee is $2500, with a million DOIs and unlimited ARKs included. CrossRef’s pricing is determined by publishing revenue; since we make less than $1 million per year from our publishing ventures, CrossRef would only cost us $275 per year, with an additional fee of $1.00 for each publication and $0.06 for each data set. Since we don’t plan to mint many DOIs at this point, it looks as though CrossRef might be the way to go. The one hitch in the plan is if we still want to assign ARKs to our digital archival collections. CrossRef doesn’t provide an ARK-generating service, so we’d have to get a Name Assigning Authority Number from CDL and set up a system for creating our own.
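For anyone who wants to play with the numbers, the comparison can be sketched in a few lines of Python. The fees are the ones quoted above; the function names are ours, and the sketch leaves out any cost of running our own ARK system, which would sit on top of the CrossRef figure. Amounts are kept in whole cents to avoid floating-point rounding.

```python
# Fees quoted in this post, expressed in cents.
EZID_ANNUAL_FEE = 250_000        # $2500 flat subscription
CROSSREF_ANNUAL_FEE = 27_500     # $275 membership
CROSSREF_PER_PUBLICATION = 100   # $1.00 per publication DOI
CROSSREF_PER_DATASET = 6         # $0.06 per data set DOI


def crossref_cost_cents(publications: int, datasets: int) -> int:
    """Yearly CrossRef cost in cents for the given number of new DOIs."""
    return (CROSSREF_ANNUAL_FEE
            + publications * CROSSREF_PER_PUBLICATION
            + datasets * CROSSREF_PER_DATASET)


def cheaper_service(publications: int, datasets: int) -> str:
    """Name the cheaper service for one year (ties go to CrossRef)."""
    if crossref_cost_cents(publications, datasets) <= EZID_ANNUAL_FEE:
        return "CrossRef"
    return "EZID"


def break_even_publications(datasets: int = 0) -> int:
    """Most publications per year at which CrossRef costs no more than EZID."""
    remaining = EZID_ANNUAL_FEE - CROSSREF_ANNUAL_FEE - datasets * CROSSREF_PER_DATASET
    return max(remaining // CROSSREF_PER_PUBLICATION, 0)


if __name__ == "__main__":
    cost = crossref_cost_cents(publications=100, datasets=50)
    print(f"CrossRef, 100 publications + 50 data sets: ${cost / 100:.2f}")
    print("Break-even publications per year:", break_even_publications())
```

With no data sets at all, the two services cost the same at 2,225 publications per year, so on DOI fees alone CrossRef stays cheaper until we mint far more than we currently expect to.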

Is this all too much trouble? Should we just pay the $2500 for EZID and call it good? The answer, of course, has to be determined by our administrators. I’m meeting with our Executive Team tomorrow, and I’ll post an update next week about the decision.