Free the Authorities!

Linking authority records and online catalog files

Mark R. Lindner

UIUC, GSLIS

"Free" authority control?

 

What I mean by "freeing" authority control

B. Tillett tells us that "The virtues of AC have been debated and restated for decades. ... Some said it was unnecessary, most said it was essential to fulfill the objectives of the catalog to find and collocate the records for bibliographic resources. Still others said, stop debating and just get on with it, and we have, but ever mindful of the costs" (Tillett, 24).

AC is one of the most expensive parts of cataloging, and while it may necessarily remain so, we can significantly reduce the cost of AC. To do so, we must set the authorities free!

Authority work is expensive and time-consuming. It is underutilized in most of our ILSs and other bibliographic tools. It also has a much wider applicability in today's wired world. As Tillett also says, "Here's where libraries have an opportunity to contribute to the infrastructure of the future Web..." (39).

Our authorities need to be updated and expanded, and then fully integrated into our ILSs and OPACs, and into the wider world of A&I (abstracting and indexing) services, biographical dictionaries, telephone directories, web directories, and other reference tools and resources.

They need to be expanded across communities, to include: rights management agencies, archives, museums, other cultural heritage institutions, and possibly even for-profit groups such as publishers.

They most certainly need to be internationalized.

And last, but certainly not least and possibly even primarily, they need to be formally and fully integrated into the training and production resources of catalogers.

See Tillett and Wolverton (among others) on these issues.

Agenda

What?

Why?

How?

Having tentatively covered the What, I will briefly cover the Why, with most of my focus on How we might broaden the scope of AC.

Before moving on, though, let me state that by "authorities" I am referring to name authorities (personal name, corporate name, meeting name, and uniform title, including series) and subject authorities, which primarily include topical, geographic and genre/form terms. I am also including LCC and DDC class numbers. In referring to these assorted records and files, I mean to include not just the headings themselves but also their cross-references, notes, and especially their syndetic structures.

What can AC do in/for our catalogs?

enhance precision by use of "authorized" forms, lead user from form searched to form used & improve recall by the system of references created

syndetic structure enables navigation and provides the user with explanations for variations & inconsistencies

controlled forms of names, titles & subjects help collocate results in displays

serve as a reference tool for catalogers & all other librarians (and users)

help distinguish a person or corp. body, one from another

can be linked to bib files so that data elements display in user's preferred language and script

also serve to document cataloger decisions about choosing controlled access points and creating new ones

See Patton, Taylor (Org of Info) and Tillett, primarily, on these ideas.
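The first point above, leading the user from the form searched to the form used, can be sketched in a few lines. This is a minimal illustration only: the `AuthorityRecord` class, the `normalize` rule and the sample heading are assumptions made for the example, not the structure of any particular ILS or of the MARC authority format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AuthorityRecord:
    """Toy authority record: one authorized heading plus its references."""
    authorized: str
    see_from: list[str] = field(default_factory=list)  # variant forms ("see" references)
    see_also: list[str] = field(default_factory=list)  # related headings ("see also")


def normalize(heading: str) -> str:
    """Crude matching key (an assumption): case-fold and collapse whitespace."""
    return " ".join(heading.casefold().split())


def build_index(records: list[AuthorityRecord]) -> dict[str, AuthorityRecord]:
    """Map every authorized and variant form to its authority record."""
    index: dict[str, AuthorityRecord] = {}
    for record in records:
        for form in [record.authorized, *record.see_from]:
            index[normalize(form)] = record
    return index


def resolve(query: str, index: dict[str, AuthorityRecord]) -> Optional[str]:
    """Return the authorized form for whatever form the user searched, if known."""
    record = index.get(normalize(query))
    return record.authorized if record else None


# Sample data for illustration only.
records = [
    AuthorityRecord(
        authorized="Twain, Mark, 1835-1910",
        see_from=["Clemens, Samuel Langhorne, 1835-1910"],
    ),
]
index = build_index(records)
print(resolve("clemens, samuel langhorne, 1835-1910", index))
```

A search on the variant form is redirected to the authorized heading, so every record filed under that heading is retrieved regardless of which form the user happened to know.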

What can AC do on/for the Web?

 

Wait! Do I mean authorities linked to Wikipedia? You bet I do!

All of the previous.

  • Link authorized forms to various tools and resources of all types, thereby bringing all of the benefits of AC to electronic media of all types, and greatly increasing the perceived importance of AC.

How to "Free" the authorities?

    I have spent just a few minutes covering the what and the why of promiscuous authority control; now I'd like to switch to the how. What are some of the proposals and, even better, what has been or is being tried in this arena?

    Chan presented this paper at the Conference on Classification as Subject Enhancement in Online Catalogs in 1986.

    Relates a study that explored the effectiveness of DDC as a searcher's tool in online access: improved subject access. This paper took a (theoretical) look at doing the same with LCC, particularly: enhanced vocabulary, subject browsing, and class number searching.

    Indexes: Most individual LCC schedules have an index, but there is no index to the classification scheme as a whole. Indexes should be created for the schedules that lack them, and a complete unified index should be generated. And it certainly needs to be in machine-readable form.

    Vocabulary: Much of the improvement in the DDC experiment was based on an enhanced vocabulary. Different records were retrieved based on keywords found in the Relative Index and in schedule and caption notes. This raises the question of how much improvement might be expected if LCC schedule captions and index entries were available for searching.

    Immroth deplored the 3 vocabularies he found in LCSH, the LCC schedules and the indexes. But it is just this diversity and richness of terminology that holds promise for improving access, as pointed out by Chan (184).

    Study by Richmond in 1974 found surprisingly little overlap between terms in DDC and LCC for "computers." Rich variety suggests great potential for enriching entry vocabulary.
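To make the "enhanced vocabulary" idea concrete, here is a minimal sketch of keyword searching over schedule captions and index entries. The sample entries are simplified stand-ins written for this example, not text quoted from the LCC schedules, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# (class number, source of the text, text). Illustrative sample data only.
SAMPLE_ENTRIES = [
    ("QA76", "caption", "Electronic computers. Computer science"),
    ("QA76", "index", "Data processing"),
    ("TK7885", "caption", "Computer engineering. Computer hardware"),
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_keyword_index(entries):
    """Map each word found in a caption or index entry to its class numbers."""
    index = defaultdict(set)
    for class_number, _source, text in entries:
        for word in tokenize(text):
            index[word].add(class_number)
    return index


def search(index, query: str) -> set[str]:
    """Return class numbers whose combined captions and index terms contain every query word."""
    words = tokenize(query)
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))


keyword_index = build_keyword_index(SAMPLE_ENTRIES)
print(sorted(search(keyword_index, "computer")))
print(sorted(search(keyword_index, "data processing")))
```

Each additional vocabulary (captions, index entries, notes) adds entry points to the same class number, which is the effect the DDC experiment reported.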

    What if we linked LCSH and LCC in our catalogs? Chan tells us about 3 modes of online subject browsing:

    • Shelf-order browsing: almost all systems can display in call number order.
    • Full-schedule browsing: None can do this (with LCC) because the schedules are still not in machine readable format. This is a priority along with indexes.
    • Subject outline browsing: (was implemented in the DDC experiment) Chan reports on 2 possibilities for this: Nested classification records as proposed by Cochrane and Markey, and a hierarchical breakdown based on the indentations in the schedules.

    Because of differences in the way DDC and LCC organize knowledge, if we brought both into our subject systems, users would gain the benefit of two classed catalogs in one (186).

    Class number searching: Not very effective "for open-ended searching" because of the way LC class numbers are constructed. LC call numbers are, however, particularly effective for known-item searches, as "they are unique to particular works and are uniform across databases" (186).

    Dewey Numbers in Authority Files


    Mitchell, Joan S. Dewey Numbers in Authority Files (Discussion Paper) prepared for the Decimal Classification Editorial Policy Committee (EPC) meeting at the LoC, Oct. 11-13, 2006.

    Since the early 1990s, efforts have been under way to map Dewey numbers to LCSH, MeSH and Canadian Subject Headings. Mitchell's paper reports on proposed mappings of Dewey numbers into LCSH authorities.
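As a rough sketch of what a class number carried in a subject authority record makes possible: in the MARC 21 authority format a Dewey number would sit in field 083, but the dictionaries below are simplified stand-ins, and the headings, numbers and function names are chosen for illustration rather than taken from any published mapping.

```python
# Illustrative subject authority records, each with one or more Dewey numbers.
SUBJECT_AUTHORITIES = [
    {"heading": "Cats", "ddc": ["636.8"]},
    {"heading": "Dogs", "ddc": ["636.7"]},
    {"heading": "Astronomy", "ddc": ["520"]},
]


def ddc_for_heading(heading: str, records) -> list[str]:
    """Return the Dewey numbers recorded on the authority for a heading."""
    for record in records:
        if record["heading"] == heading:
            return list(record["ddc"])
    return []


def headings_under(class_prefix: str, records) -> list[str]:
    """Return headings mapped to any Dewey number beginning with the prefix.

    Dewey notation is hierarchical, so a shorter prefix gives a broader class.
    """
    return sorted(
        record["heading"]
        for record in records
        if any(number.startswith(class_prefix) for number in record["ddc"])
    )


print(ddc_for_heading("Cats", SUBJECT_AUTHORITIES))
print(headings_under("636", SUBJECT_AUTHORITIES))
```

The link works in both directions: a heading leads to a place in the classification, and a class number leads to the subject headings filed beneath it.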

    Name authorities in Wikipedia!

    Die Deutsche Bibliothek (DDB) links thousands of Personennamendatei (PND) records to biographical articles in Wikipedia (press release).

    A news note in CCQ 42 (1) 2006 alerts us to a press release dated 3 Aug 2005 reporting on Die Deutsche Bibliothek (DDB) linking thousands of personal name authorities to biographical articles in the German-language version of Wikipedia.

    Wikipedia gains "a highly structured authoritative cross-referencing structure for access to its biographies; DDB obtains new visibility and a means of bringing new patrons to its catalog" (146).

    Help catalogers / Auto-generate authorities

    Seven years ago at a conference on "AC: Why It Matters," Arlene Taylor presented a paper entitled, "AC: Where It's Been and Where It's Going."

    I commend it to you as an excellent lit review of research on authorities up until then, but our concern with it today is her comments on a visit she had made to Oxford University's Bodleian Library just prior to the conference.

    The Bodleian had a Geac ILS with integrated authority control. "As cataloging proceeds, each name, uniform title and subject heading can be checked against the authority file simply by clicking in the field. If there is a name match, but the cataloger is not certain it's the same person, the bibliographic records to which the name has been assigned are called up with a click, not a search of a separate file. If there is a match with a different form of name, the authority file can be automatically copied into the bibliographic record. If the name is not there, a minimal authority record is created by the system automatically, using information from the bibliographic record at hand. It seems to me to be much more efficient, not to mention effective, for a cataloger to be able to do this kind of authority checking with item in hand, than for names and subjects to be checked against authority files later when the cataloged item is long gone" (2).
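The workflow Taylor describes can be sketched roughly as follows. This is not the Geac system's actual logic; the classes, status messages and sample headings are assumptions made for the example, and the step where the cataloger calls up linked bibliographic records to confirm identity is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Authority:
    authorized: str
    variants: set[str] = field(default_factory=set)
    source_note: str = ""
    minimal: bool = False  # True when the system created the record automatically


class AuthorityFile:
    """Toy authority file checked at the moment of cataloging."""

    def __init__(self) -> None:
        self.records: list[Authority] = []

    def find(self, heading: str) -> Optional[Authority]:
        """Find the record whose authorized or variant form equals the heading."""
        for record in self.records:
            if heading == record.authorized or heading in record.variants:
                return record
        return None

    def check_heading(self, heading: str, bib_title: str) -> tuple[str, str]:
        """Return (heading to use in the bibliographic record, what happened)."""
        record = self.find(heading)
        if record is None:
            # Name not in the file: create a minimal record from the item in hand.
            self.records.append(
                Authority(heading, source_note=f"Found in: {bib_title}", minimal=True)
            )
            return heading, "minimal authority record created"
        if heading != record.authorized:
            # Matched a variant: copy the authorized form into the bib record.
            return record.authorized, "variant replaced by authorized form"
        return heading, "matched authorized form"


authority_file = AuthorityFile()
authority_file.records.append(
    Authority("Carroll, Lewis, 1832-1898", {"Dodgson, Charles Lutwidge, 1832-1898"})
)
print(authority_file.check_heading(
    "Dodgson, Charles Lutwidge, 1832-1898", "Alice's adventures in Wonderland"))
print(authority_file.check_heading("Doe, Jane", "A hypothetical title"))
```

The point of the design is that checking and record creation happen in the same step as cataloging, while the item is still in hand.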

    When speaking of the future of AC, Taylor brings up one of her "soapbox issues." As she says, "I honestly don't know how we can say we have AC over names in our catalogs, thus implying that everything by a person will be found under the authorized form of the person's name, when we have a rule that doesn't even allow us to enter some people's names in a bibliographic record, let alone construct their names in the AC form" (5). Multiple-authored items are growing--her and 590TR.

    (Tillett) ACIG survey: 1984. Catalogers wanted:

    • Comprehensive, internationally shared resource authority file with any library able to add
    • Keyword access
    • Browsable files, esp. for author/title uniform titles
    • Easy navigation among parts of hierarchies and earlier/later names
    • Automatic identification of changed authority records in their local files and an easy way to update and resolve conflicts
    • More identifying information--more dates; bring back history and scope notes
    • Local customization

    Multiple forms of names / Browsing

    The Getty Union List of Artist Names

    Getty Thesaurus of Geographic Names Online

    Download

    ULAN: Ovile Master

    TGN: Mali. Notice hierarchy. Notice view of physical features versus political entities.

    AAT, ULAN, TGN are all available for download and licensing. Available in MARC, XML and relational tables. http://www.getty.edu/research/conducting_research/vocabularies/download.html

    Internationalization

    Challenges:

    • Different catalogs with different rules to serve their users
    • Systems we wish to link to with no rules
    • Different languages and scripts. See: Aliprand article
    • Records in different communications formats, various versions of MARC, XML, proprietary database formats; e.g., InMagic.

    Emphasis on interoperability has been and is increasing

    Crosswalks and mapping strategies have been implemented: MARC to ONIX, MARC to XML, etc.
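At its simplest, a crosswalk is a table from fields in one format to elements in another. The sketch below maps a handful of MARC tags to Dublin Core-style element names. The mapping is deliberately simplified and is an assumption for illustration, not the official Library of Congress crosswalk; the record is reduced to (tag, value) pairs, and real crosswalks also work at the subfield level.

```python
# Simplified, illustrative tag-to-element mapping.
TAG_TO_ELEMENT = {
    "100": "creator",
    "245": "title",
    "260": "publisher",
    "650": "subject",
}


def crosswalk(marc_fields):
    """Convert (tag, value) pairs into a dict of element name -> list of values.

    Fields with no mapping are dropped, which is the usual source of
    information loss in a crosswalk.
    """
    output = {}
    for tag, value in marc_fields:
        element = TAG_TO_ELEMENT.get(tag)
        if element is None:
            continue
        output.setdefault(element, []).append(value)
    return output


# Sample values for illustration only.
sample_record = [
    ("100", "Taylor, Arlene G."),
    ("245", "The organization of information"),
    ("650", "Information organization"),
    ("650", "Metadata"),
    ("999", "local processing data"),
]
print(crosswalk(sample_record))
```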

    Recent projects:

    • AUTHOR Project: converted samplings of authority records from 7 countries into UNIMARC
    • LEAF Project: "developing a model architecture for collecting, harvesting, linking of, and providing access to existing local or national name authority data, independent from their creation in libraries, archives, museums, or other institutions and independent from national differences" (Weber, 227). Usable for/by everyone. Access via ftp, Z39.50, OAI and SOAP (Simple Object Access Protocol [XML])
    • HKCAN: Hong Kong Chinese Authority for names. Enables romanized forms of headings and Chinese traditional and simplified character forms. Is a working international shared authority file.

    IFLA: GARR = Guidelines for Authority Records and References, issued in 2001. MLAR = Minimal Level Authority Records. FRANAR = Functional Requirements and Numbering of Authority Records.

    Joint Information Systems Committee (JISC) [UK] (with cooperation from OCLC) is embarking on a two-year project of immense scope. "Terminology Services (TS) are a set of services that present and apply vocabularies, both controlled and uncontrolled, including their member terms, concepts and relationships. This is done for purposes of searching, browsing, discovery, translation, mapping, semantic reasoning, subject indexing and classification, harvesting, alerting etc." "TS can be m2m [machine-to-machine] or interactive, user-facing services and can be applied at all stages of the search process. Services include resolving search terms to controlled vocabulary, disambiguation services, offering browsing access, offering mapping between vocabularies, query expansion, query reformulation, combined search and browsing. These can be applied as immediate elements of the end-user interface or can underpin services behind the scenes, according to context."

    Virtual International Authority File (VIAF)

    Barbara Tillett has been talking about a VIAF for several years now, but as she reminded us in a 1998 article, the concept of international AC began with the 1961 Paris Principles (ref in Taylor, 4).

    VIAF Project (OCLC, LOC, DDB): explores virtually combining the name authority files of LOC and DDB into a single name authority service.

    Benefits:

    • Extending AC via display of notes and references, links to related resources (official Web sites of the entities, authoritative biographical dictionaries, etc.)
    • Switch display to preferred language and script of users
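Switching the display to the user's preferred language and script amounts to treating every form of a name as equally valid within one linked cluster and choosing among them at display time. A minimal sketch follows, assuming a cluster keyed by language-script tags; the tags, the sample cluster and the `display_form` function are illustrative, not the data model of VIAF or HKCAN.

```python
# One entity, several equally valid forms, keyed by language-script tag.
# Illustrative sample cluster only.
LU_XUN = {
    "zh-Latn": "Lu, Xun, 1881-1936",  # romanized form
    "zh-Hant": "魯迅",                 # traditional characters
    "zh-Hans": "鲁迅",                 # simplified characters
}


def display_form(cluster: dict[str, str], preferences: list[str]) -> str:
    """Return the first form matching the user's ordered preferences.

    Falls back to the first form in the cluster when nothing matches.
    """
    for tag in preferences:
        if tag in cluster:
            return cluster[tag]
    return next(iter(cluster.values()))


print(display_form(LU_XUN, ["zh-Hans", "zh-Hant"]))
print(display_form(LU_XUN, ["de-Latn"]))
```

No form has to be "the" authorized form for everyone; each catalog or user picks the one that suits, and the cluster still collocates them all.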

    Distributed vs. Centralized: Independent vs. Union. "Physical" union vs. a virtual union

    VIAF: OCLC matching algorithms to compare LCNAF (about 5M) and DDB's PND (about 1M).

    An article in LRTS (Library Resources & Technical Services), Volume 50, No. 3 (July 2006) by Jenny Toves, Ed O'Neill, and Thom Hickey.
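To give a feel for what matching two name authority files involves, here is a deliberately naive sketch. It is not OCLC's algorithm, which is not described in these notes; the normalization rule, the record layout and the three-way result are all assumptions made for illustration.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Build a comparison key: strip diacritics and punctuation, fold case."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", without_marks.casefold())
    return " ".join(cleaned.split())


def compare(record_a: dict, record_b: dict) -> str:
    """Classify a pair of name records as 'match', 'possible match' or 'no match'.

    Names must agree after normalization. Dates confirm or refute the match
    when both records carry them; otherwise the pair is left for review.
    """
    if normalize_name(record_a["name"]) != normalize_name(record_b["name"]):
        return "no match"
    dates_a, dates_b = record_a.get("dates"), record_b.get("dates")
    if dates_a and dates_b:
        return "match" if dates_a == dates_b else "no match"
    return "possible match"


# Sample records for illustration only.
file_one = {"name": "Brontë, Charlotte", "dates": "1816-1855"}
file_two = {"name": "Bronte, Charlotte", "dates": "1816-1855"}
undated = {"name": "Bronte, Charlotte"}
print(compare(file_one, file_two))
print(compare(file_one, undated))
```

Even this toy version shows why the supporting data in authority records matters: a name string alone only ever yields a "possible match."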

    Conclusion

     

    "Here's where libraries have an opportunity to contribute to the infrastructure of the future Web..." (Tillett, 39) and to so very much more than simply that.

    So in response to my motivating question, "Should subject access mechanisms on MARC bibliographic records be linked in authority files and online catalogs?" I can only respond with a hearty "Yes, but..." The "but" is that we must take them even further: we must "free" them for use wherever and whenever they can be of value.

    Bibliography and associated resources

    Technology Credits