Tag Archives: marc
An interesting project aims to make it easier to integrate the records for digital resources into a library's public OPAC.
I now consider digital resources–OER, images, videos, audio files, slideshows, documents, ebooks, maps, art, student work, data sets, interactivities, simulations, and especially the elements of the world’s richest museums and archives–intrinsic parts of my collection.
I may not own them, but I use them. And I want them discoverable and openable directly from the tool I purchased for discovery and access–my OPAC.
I envision the entry page of a library OPAC looking much like Google: an empty text box just waiting for the patron's request. Underneath it, the OPAC I envision has a link to "advanced" searching, meaning the more familiar keyword or Boolean searches. By default, however, the patron enters his free-form question in the box. The software translates that question into terms it can use to search all available library resources, and then displays a list of materials in the library collection (including books, serials, and any digital material) that intelligently provide the information the patron requested.
This is what the patrons expect … Read More on theberdinka.com.
In the optimal OPAC design, an ontology layer should sit between the LCSH authority records and the natural-language interface. An ontology defines the classes, sets, attributes, and relations of a domain, in this case the LC Subject Headings. It also includes synonyms for the members of the domain. Michael K. Bergman has written an extremely informative article entitled "An Executive Intro to Ontologies."
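To make the idea of an ontology layer a little more concrete, here is a minimal Python sketch. It assumes a toy in-memory model: controlled headings are nodes, named relations such as "broader" and "narrower" link them, and a synonym table maps everyday lead-in terms to the preferred heading. The class name `Ontology`, its methods, and the sample headings and links are hypothetical and purely illustrative; they are not taken from the LCSH authority file or from any ontology library.

```python
from collections import defaultdict


class Ontology:
    """Toy ontology: controlled headings joined by named relations, plus a
    synonym table that maps lead-in terms to the preferred heading."""

    def __init__(self):
        self.links = defaultdict(set)  # (heading, relation) -> related headings
        self.synonyms = {}             # lower-cased term -> preferred heading

    def add_heading(self, heading, *lead_in_terms):
        """Register a heading and any synonyms that should point to it."""
        for term in (heading, *lead_in_terms):
            self.synonyms[term.lower()] = heading

    def relate(self, heading, relation, other, inverse=None):
        """Link two headings; optionally record the reverse link as well."""
        self.links[(heading, relation)].add(other)
        if inverse:
            self.links[(other, inverse)].add(heading)

    def preferred(self, term):
        """Return the controlled heading for a term, or None if unknown."""
        return self.synonyms.get(term.lower())

    def related(self, heading, relation):
        """Return the headings linked to `heading` by `relation`, sorted."""
        return sorted(self.links.get((heading, relation), ()))


# Illustrative data only; not copied from the real LCSH authority file.
ont = Ontology()
ont.add_heading("Commerce")
ont.add_heading("Electronic commerce", "E-commerce", "Online shopping")
ont.relate("Electronic commerce", "broader", "Commerce", inverse="narrower")

print(ont.preferred("online shopping"))     # Electronic commerce
print(ont.related("Commerce", "narrower"))  # ['Electronic commerce']
```

The synonym table is what lets a patron's everyday wording land on the controlled heading, while the relation links let the software walk outward to broader or narrower subjects.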
To do more than perform the brute-force queries typical of an OPAC, the user interface would need to be programmed more intelligently. There has been a great deal of research on natural language processing in computer science, using a number of programming languages. For example, the Jena Semantic Web Framework for Java provides tools and libraries for working with RDF data and ontologies, which a software developer could use to build a search engine that takes a topic and finds titles in the catalog database to answer the patron's query.
The patron, on the other hand, doesn't need to know any of this. All he needs to know is what he already knows when he uses a modern search engine: the question he is asking. With my proposed design, the software should provide much more accurate results than a keyword-based search engine. This is what both the patron and the librarian want.
A pressing problem confronting Information Organization practitioners is subject access in the OPAC. Keyword searches often either fail or retrieve too many references. When searches yield zero hits, whatever the reason, many users abandon the search. This is not a function of computer literacy. The OPAC is a black box to users; they know very little about what happens inside the system, and they shouldn't need to. This is why most patrons prefer "free-text" searching: it requires little thought, but it produces less relevant results. Ranganathan's law, "Save the time of the reader," applies here. It is the responsibility of those of us who organize information to let the searcher find his information accurately in as little time as possible.
Three possible solutions to the problem are increasing instruction in Boolean search techniques, allowing users to browse subjects, and providing natural-language searching.
Boolean Search Techniques
One solution is to increase instruction in Boolean database search techniques at the library. Classes or instructional materials are provided for a patron to follow while he learns how to search the OPAC effectively. Many colleges require incoming freshmen to take these classes to learn how to search the OPAC and library databases. Boolean operators can greatly help users increase precision, and used correctly they are a very effective search strategy: one can broaden or narrow a search with the applicable Boolean operators (AND, OR, NOT). This approach is also inexpensive; apart from the costs associated with training, it does not require the library to invest scarce resources in the design of the OPAC.
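The effect of the three operators can be sketched with plain set operations over an inverted index, that is, a map from each keyword to the records that contain it. This minimal Python illustration assumes a hypothetical four-term index and a flat query evaluated strictly left to right, without parentheses or operator precedence; a real OPAC parses considerably richer query syntax.

```python
# Hypothetical inverted index: keyword -> IDs of the records containing it.
INDEX = {
    "libraries": {1, 2, 3, 4},
    "ontologies": {1, 2, 5},
    "semantics": {2, 3},
    "cooking": {6},
}


def boolean_search(query, index):
    """Evaluate a flat query such as 'libraries AND ontologies NOT cooking'
    strictly left to right (no parentheses, no operator precedence)."""
    tokens = query.split()
    result = set(index.get(tokens[0].lower(), set()))  # copy, so the index is never mutated
    for op, term in zip(tokens[1::2], tokens[2::2]):
        postings = index.get(term.lower(), set())
        if op == "AND":
            result &= postings  # narrows: both terms required
        elif op == "OR":
            result |= postings  # broadens: either term is enough
        elif op == "NOT":
            result -= postings  # narrows: excludes records with the term
        else:
            raise ValueError(f"unknown operator: {op}")
    return result


print(sorted(boolean_search("libraries AND ontologies", INDEX)))  # [1, 2]
print(sorted(boolean_search("ontologies OR semantics", INDEX)))   # [1, 2, 3, 5]
print(sorted(boolean_search("libraries NOT semantics", INDEX)))   # [1, 4]
```

AND and NOT can only shrink the result set and OR can only grow it, which is exactly the narrowing and broadening that Boolean instruction tries to teach.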
However, Google does not require instruction, and that is the interface our patrons bring with them to the library. Boolean logic is not common sense for most people. Libraries exist to serve humanity; humanity should not have to serve the library. Most end users search the OPAC only occasionally. They tend to learn only enough to perform simple searches reasonably quickly, and they regard further instruction as unnecessary and more extensive expertise as a burden.
Browsing Subject Indexes
Another solution is to let the patron browse subject indexes programmatically. Using JavaScript or a similar user-interface technology, the patron could click through subject headings arranged in a tree until arriving at the desired subject. This author has used a database at the Foundation Center that let users browse its grant database through a similar interface. The user does not have to guess at search terms, which reduces human error and leads to relevant results. Authority control is easily maintained, since the patron must traverse the tree and every result lies within the domain of the controlled vocabulary. When the patron finds the subject term he wants, he will get accurate results for his query. This would be a highly usable way for any user to find subjects of interest without knowing ahead of time exactly which term to search for.
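Browsing a subject tree amounts to walking a hierarchy one click at a time. The minimal Python sketch below assumes a small, hypothetical nested-dictionary hierarchy (real LCSH headings can have more than one broader term, so the actual structure is not a strict tree) and uses a depth-first search to return the chain of headings a patron would click through to reach a target term. The length of that chain is the number of clicks, which is why deeply buried terms are slow to reach by browsing.

```python
# Hypothetical, illustrative hierarchy; not an excerpt of LCSH.
SUBJECT_TREE = {
    "Social sciences": {
        "Commerce": {
            "Electronic commerce": {},
            "Retail trade": {},
        },
        "Economics": {},
    },
    "Technology": {
        "Computer networks": {
            "Intranets (Computer networks)": {},
        },
    },
}


def browse_path(tree, target, path=()):
    """Depth-first search for `target`; returns the headings clicked through
    from the top of the tree down to it, or None if it is not in the tree."""
    for heading, children in tree.items():
        here = path + (heading,)
        if heading == target:
            return list(here)
        found = browse_path(children, target, here)
        if found:
            return found
    return None


path = browse_path(SUBJECT_TREE, "Intranets (Computer networks)")
print(" > ".join(path))     # Technology > Computer networks > Intranets (Computer networks)
print(len(path), "clicks")  # 3 clicks
```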
However, the patron might "get lost" if he traverses the tree in the wrong direction, and become frustrated. Browsing can also be a time-consuming search method if the desired term lies deep within the tree. While this was an effective way to search the Foundation Center grant database, this writer believes that the exhaustive scope of the LC Subject Headings would make it a very cumbersome way to search a library OPAC.
Natural Language Front-End / Subject Headings in the Back-End
We can create an intelligent natural-language front end that uses subject headings and an ontology to search the OPAC, by enhancing the database and DBMS layers. The OPAC search engine would understand the concept in the question rather than just matching the text. The database would include an ontological metadata layer. An ontology is more than a taxonomy, which merely places concepts into a hierarchy: an ontology allows concepts to be connected through a variety of multidimensional relationships that reflect the vocabulary of the domain, in this case the LC Subject Headings. These relationships are built from semantic links, descriptive terms that express the connections between ideas (such as "is the parent of" or "is a subset of"). This provides better routes for topic exploration and makes search techniques based on natural-language parsing possible. A programming framework can then interpret the patron's request and query the database via the ontology. The patron does not have to change his behavior to work with the software; the software works with whatever the patron gives it, including offering suggestions when he makes a typographical error. Authority control is maintained because the software takes the patron's language and, via the ontology, translates its meaning into the controlled vocabulary of the Subject Headings. The results should be accurate and relevant, finding the intended concept instead of the precise word (see King, B. E., & Reinold, K. (2008). Finding the concept, not just the word: A librarian's guide to ontologies and semantics. Oxford: Chandos, for a much more extensive discussion of this idea).
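The translation step can be sketched in a few lines of Python. This is a deliberately simplified stand-in for real natural-language parsing: it assumes a hypothetical lead-in vocabulary that maps everyday phrases to controlled headings, matches the longest phrases in the patron's question first, and uses the standard library's `difflib.get_close_matches` to offer "did you mean" suggestions for near-miss words such as typos. The vocabulary, the catalog contents, and the names `LEAD_IN`, `CATALOG`, `interpret`, and `search` are illustrative assumptions, not part of Jena or of any OPAC product.

```python
import difflib
import re

# Hypothetical lead-in vocabulary: everyday phrases -> controlled headings.
LEAD_IN = {
    "e-commerce": "Electronic commerce",
    "online shopping": "Electronic commerce",
    "electronic commerce": "Electronic commerce",
    "customer service": "Customer relations",
    "intranet": "Intranets (Computer networks)",
}

# Hypothetical catalog: controlled heading -> titles assigned that heading.
CATALOG = {
    "Electronic commerce": ["The cluetrain manifesto"],
    "Customer relations": ["The cluetrain manifesto"],
}


def interpret(question, max_len=3):
    """Map a free-form question to controlled headings, plus suggestions
    for leftover words that closely resemble a known lead-in term."""
    words = re.findall(r"[a-z0-9-]+", question.lower())
    used, headings, suggestions = set(), [], []
    for n in range(max_len, 0, -1):  # prefer the longest matching phrase
        for i in range(len(words) - n + 1):
            span = range(i, i + n)
            if used.intersection(span):
                continue  # these words already belong to a longer match
            phrase = " ".join(words[i:i + n])
            if phrase in LEAD_IN:
                used.update(span)
                if LEAD_IN[phrase] not in headings:
                    headings.append(LEAD_IN[phrase])
    for i, word in enumerate(words):  # unmatched words: check for typos
        if i not in used and len(word) > 3:
            close = difflib.get_close_matches(word, list(LEAD_IN), n=1, cutoff=0.8)
            if close:
                suggestions.append(close[0])
    return headings, suggestions


def search(question):
    """Return (titles found via controlled headings, did-you-mean suggestions)."""
    headings, suggestions = interpret(question)
    titles = sorted({title for h in headings for title in CATALOG.get(h, [])})
    return titles, suggestions


print(search("books about online shopping and customer service"))
# (['The cluetrain manifesto'], [])
print(search("intranett security"))
# ([], ['intranet'])
```

The patron never sees the headings; the vocabulary does the work of translating his words into the controlled terms, which is how authority control survives a free-form front end.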
However, subject cataloging is often superficial and inadequate, even though it is critical for guiding the patron to the subjects he is seeking. The average bibliographic record contains fewer than two subject headings. Ontology creation and user-interface programming can also be a very expensive proposition for a budget-strapped library system.
This writer believes that the most effective solution is clearly a natural-language front end: a user interface that makes the search process "Google-ish" for the user. This is much like conducting a reference interview. One doesn't require the patron to ask the question "correctly," nor does one refuse to serve the patron if his terminology is not in our controlled vocabulary. It is the responsibility of the information professional to lead the patron to the information he desires. The gathering process is often invisible to the patron. The librarian connects what the patron asks for with what else is out there: other topics connected to it, how old the information might be, and where she might look for an answer. The librarian then evaluates the results to see whether they answer the question.
So it should be with the OPAC. The OPAC should function as the librarian does in a reference interview and deliver accurate results that answer the patron's question.
‡biblios.net is a free, browser-based cataloging service with a data store of over thirty million records. It grants the user a "nonexclusive right to use Data under the terms of the Open Data Commons Public Domain Dedication and Licence."
I searched for The Cluetrain Manifesto and found ten MARC records, including this one:
|000||01375cam a2200289Ia 4500|
|008||010516r20012000mau 000 0 eng|
|245||04|aThe cluetrain manifesto : |bthe end of business as usual /|cRick Levine … [et al.].|
|260||##|aCambridge, Mass. : |bPerseus Pub.,|cc2001.|
|300||##|axxii, 190 p. ;|c25 cm.|
|505||00|gThe|tcluetrain manifesto --|tIntroduction --|tInternet apocalypso /|rChristopher Locke --|gThe|tlonging /|rDavid Weinberger --|tTalk is cheap /|rRick Levine --|tMarkets are conversations /|rDoc Searls and David Weinberger --|gThe|thyperlinked organization /|rDavid Weinberger --|tEZ answers /|rChristopher Locke and David Weinberger --|tPost-apocalypso /|rChristopher Locke.|
|650||#0|aElectronic commerce|xSocial aspects.|
|650||#0|aCustomer relations|xTechnological innovations.|
|650||#0|aIntranets (Computer networks)|
|650||#0|aInformation superhighway|xEconomic aspects.|
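Because subject access depends on fields like the 650s above, it may help to see how software pulls them out of a record. The minimal Python sketch below assumes the pipe-delimited display used above: the tag, then `||`, then for data fields two indicator characters followed by `|`-separated subfields whose first character is the subfield code. That is a display convention, not raw MARC 21 transmission format, which uses dedicated control characters as delimiters. The function names are hypothetical. The sketch extracts the title proper from the 245 $a and $b subfields and builds subject strings from the 650 fields.

```python
# Abbreviated copy of the record above (only the fields this sketch uses).
RECORD = """\
|000||01375cam a2200289Ia 4500|
|008||010516r20012000mau 000 0 eng|
|245||04|aThe cluetrain manifesto : |bthe end of business as usual /|cRick Levine ... [et al.].|
|650||#0|aElectronic commerce|xSocial aspects.|
|650||#0|aCustomer relations|xTechnological innovations.|
|650||#0|aIntranets (Computer networks)|
|650||#0|aInformation superhighway|xEconomic aspects.|
"""


def parse_field(line):
    """Split one '|tag||...|' display line into a dictionary."""
    tag_part, rest = line.strip().split("||", 1)
    tag = tag_part.lstrip("|")
    if tag < "010":  # control fields (leader, 008) have no indicators or subfields
        return {"tag": tag, "data": rest.rstrip("|")}
    subfields = [(chunk[0], chunk[1:]) for chunk in rest[3:].split("|") if chunk]
    return {"tag": tag, "indicators": rest[:2], "subfields": subfields}


def parse_record(text):
    return [parse_field(line) for line in text.splitlines() if line.strip()]


def title_proper(fields):
    """Join 245 $a and $b, dropping the trailing ' /' ISBD punctuation."""
    for field in fields:
        if field["tag"] == "245":
            parts = [value.strip() for code, value in field["subfields"] if code in "ab"]
            return " ".join(parts).rstrip(" /")
    return None


def subject_headings(fields):
    """Build 'Heading -- Subdivision' strings from each 650 field."""
    headings = []
    for field in fields:
        if field["tag"] == "650":
            parts = [value.strip().rstrip(".") for code, value in field["subfields"]
                     if code in "avxyz"]
            headings.append(" -- ".join(parts))
    return headings


fields = parse_record(RECORD)
print(title_proper(fields))  # The cluetrain manifesto : the end of business as usual
for heading in subject_headings(fields):
    print(heading)  # first line: Electronic commerce -- Social aspects
```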
Typically, when a library acquires a book, it goes to OCLC to obtain the record. The cataloger might then add data specific to that library, for example an 099 local call number if the library doesn't use Dewey or LC. OCLC is not free, however, and the library cannot edit OCLC records. It is not a collaborative effort but a hierarchical procedure.
‡biblios.net, however, makes collaborative records possible. In the case of The Cluetrain Manifesto, there were ten records to choose from. The library could choose whichever record suited its needs, and could even submit a different record if it so chose. This is harnessing collective intelligence: the "wisdom of crowds."
Even a source seen as "authoritative" is not always correct. Back in the 1960s, when my husband was a teenager, he was clamming in the bay near Westhampton with his father. A small airplane caught its pontoons on some electric lines and crashed in the bay. There was no one else around, so he and his father rowed over to the airplane, rescued the occupants, and left them on the shore. The next issue of the local paper ran a story in the fire department blotter about how the fire department had rescued the occupants of that plane! This is just one example of why one cannot rely on any single resource as "the truth."
Collective cataloging (like what is beginning to happen on ‡biblios.net), together with a move away from a system in which one source is treated as canonical, may be a path to greater overall accuracy than the current dependence on traditional cataloging procedures and systems.
I encourage you to listen to the Library 2.0 Gang podcast entitled "The Cataloging Services Landscape," in which they discuss this issue and the ramifications for libraries of a more competitive cataloging landscape.