Competency Statement: Demonstrate understanding of basic principles and standards involved in organizing information such as classification and controlled vocabulary systems, cataloging systems, metadata schemas or other systems for making information accessible to a particular clientele.
Introduction
An unorganized creative genius may be able to find a book hidden under stacks of other books on a cluttered desk, or a file deep within a tree of folders on their computer, but that retrieval strategy works for them and only them. A collection shared by many users needs a standardized, predictable organization system that makes materials easy to find every time. This competency gives information professionals the foundations for providing users with access points to resources and for embedding the organizational and descriptive information on materials that makes retrieval possible.
Classification & Controlled vocabulary systems
A predictable and consistent search experience for a large and diverse user community requires a commonly understood classification. Often users aren’t sure what they are looking for beyond their chosen topic. Providing subject access to resources is an intuitive solution, but it is also a highly subjective task: the user, the author, and even the cataloger may use different but synonymous terms to describe what a resource is about. Further, a word can refer to more than one concept and be nuanced by figures of speech (Weedman, 2017). The consequences are irrelevant items being retrieved together with relevant ones, or relevant items being overlooked entirely because of inexact search terms.
Classification and controlled vocabularies provide ways to control this linguistic chaos. Indexers can limit subject terms by pointing synonymous entry terms to a designated term and by making terms that represent more than one concept distinct from one another (Weedman, 2017). While developing and maintaining a controlled vocabulary and assigning classification to an item can be time-consuming and costly, a controlled vocabulary lets users select from the terms that exist and be assured that a resource is substantially about the subject heading assigned to it. Controlled vocabularies are not limited to subject access. They can be applied to other access points, such as authority lists of standardized authors’ names, and they can be made more user-friendly by taking into account the jargon of the information community that will use the collection.
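The two indexing moves described above, pointing synonymous entry terms to one designated term and keeping same-spelled concepts distinct, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the terms, the dictionary names, and the resolve function are hypothetical and are not taken from any published vocabulary or library system.

```python
# Hypothetical mini-vocabulary, for illustration only.

# Entry terms ("use" references) each point to one designated term.
USE_REFERENCES = {
    "movies": "Motion pictures",
    "films": "Motion pictures",
    "cinema": "Motion pictures",
}

# A word that names more than one concept is split into qualified terms.
QUALIFIED_TERMS = {
    "mercury": ["Mercury (Planet)", "Mercury (Element)"],
}

# The designated terms an indexer is allowed to assign.
DESIGNATED_TERMS = ["Motion pictures", "Mercury (Planet)", "Mercury (Element)"]


def resolve(user_term):
    """Return the designated terms that a user's search word could mean."""
    key = user_term.strip().lower()
    if key in USE_REFERENCES:
        return [USE_REFERENCES[key]]
    if key in QUALIFIED_TERMS:
        return list(QUALIFIED_TERMS[key])
    # The user may have typed a designated term directly.
    return [term for term in DESIGNATED_TERMS if term.lower() == key]
```

In this sketch, resolve("Films") and resolve("cinema") both lead to the single heading "Motion pictures", while resolve("mercury") returns two qualified headings so the user can choose the concept they mean. A word that is not in the vocabulary returns an empty list, which is the point where an indexer would decide whether a new entry term is needed.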
Classification schemes dictate where physical resources are shelved and where they can be retrieved. The Dewey Decimal Classification (DDC) is popular at public libraries for its user-friendliness and widely understood numeric notation, while the Library of Congress Classification combines letters and numbers to accommodate an expansive and expanding body of knowledge. Both schemes are built for browsing from general classes down to deeper levels of specificity.
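As a small worked example of browsing from general to specific, the sketch below expands a Dewey-style number into the chain of broader classes that contain it. It assumes a well-formed number with three digits before the decimal point. The function name is my own, and the three captions are the only ones I have filled in.

```python
# Captions for three DDC classes; anything else is left unlabeled here.
CAPTIONS = {
    "500": "Science",
    "510": "Mathematics",
    "516": "Geometry",
}


def broader_chain(number):
    """List the classes from most general to most specific for one number.

    Assumes three digits before the decimal point, e.g. "516" or "516.35".
    """
    whole, _, decimals = number.partition(".")
    # Main class, division, and section come from the first three digits.
    chain = [whole[0] + "00", whole[:2] + "0", whole]
    # Every digit after the decimal point adds one more level of specificity.
    for length in range(1, len(decimals) + 1):
        chain.append(whole + "." + decimals[:length])
    # Drop repeats (e.g. "500" is its own main class) but keep the order.
    return list(dict.fromkeys(chain))


for notation in broader_chain("516.35"):
    print(notation, CAPTIONS.get(notation, ""))
```

The loop prints 500, 510, 516, 516.3, and 516.35 in that order, which mirrors how a patron walking the shelves moves from science to mathematics to geometry and then to narrower topics within it.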
Cataloging Systems
For a resource to be found and retrieved, it must be described in a cataloging system, which holds a record for each resource and provides different access points for searching. Libraries use specialized integrated library systems (ILS) that store, search, and display library data in the form of MARC records and support other library tasks, such as circulation and cataloging new acquisitions. These systems let libraries create catalog records for their physical materials, or import existing records and modify them for local purposes rather than starting from scratch. These back-end features are invisible to patrons, who are instead presented with a more user-friendly front-end interface for searching.
A modern ILS, especially at a research library, can connect to a discovery system that allows patrons to find not only the physical resources at their library but also electronic resources, such as databases and outside repositories (Moulaison & Wiechert, 2015). This is made possible by metadata schemas.
Metadata schemas
As information professionals, we rely on machines to enter and manage cataloging and retrieval information for the items in a collection. Communicating with these machines requires a metadata schema. Machines do not tolerate nuanced communication the way humans do, so standardization is a requirement for them to work at all.
A metadata schema is designed using a typology of standards for structure, content, value, and exchange (Miller, 2011). A functionally useful schema specifies which elements in the element set are mandatory and which are optional, how data should be entered into those elements, which controlled vocabularies or authority lists supply the values, and how the data is encoded and interchanged. Public libraries use the MARC21 metadata schema, which enables them to manage library functions such as circulation and to share their collection records with other libraries or branch locations for interlibrary loans, increasing access to materials.
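To make the four kinds of standards concrete, here is a minimal Python sketch of a made-up element set. Everything in it is an assumption for illustration: the element names, the date pattern, the genre list, and the functions are hypothetical and do not reproduce MARC21 or any other real schema. The "required" rule stands in for structure, "pattern" for content, "allowed" for value, and the JSON export for exchange.

```python
import json
import re

# Hypothetical element set; each rule covers one layer of the typology.
SCHEMA = {
    "title":   {"required": True},
    "creator": {"required": True},
    "date":    {"required": False, "pattern": r"\d{4}(-\d{2}){0,2}"},
    "genre":   {"required": False, "allowed": {"Interviews", "Documentary films"}},
}


def validate(record):
    """Return a list of problems found in one record (empty if none)."""
    errors = []
    # Structure: only elements defined by the schema may appear.
    for element in record:
        if element not in SCHEMA:
            errors.append(f"unknown element: {element}")
    for element, rules in SCHEMA.items():
        value = record.get(element, "")
        if not value:
            # Structure: mandatory elements must be filled in.
            if rules["required"]:
                errors.append(f"missing required element: {element}")
            continue
        # Content: how the data is entered (here, YYYY, YYYY-MM, or YYYY-MM-DD).
        pattern = rules.get("pattern")
        if pattern and not re.fullmatch(pattern, value):
            errors.append(f"badly formatted {element}: {value}")
        # Value: the term must come from the controlled list.
        allowed = rules.get("allowed")
        if allowed and value not in allowed:
            errors.append(f"{element} not in controlled list: {value}")
    return errors


def export(record):
    """Exchange: serialize the record in a form another system can read."""
    return json.dumps(record, sort_keys=True)
```

A record with a title, a creator, the date "2023-07", and the genre "Interviews" passes with no errors. A record that omits the creator and gives the date as "July 2023" is reported twice, once for the missing element and once for the date format, which is the kind of strictness a machine needs before records can be shared.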
Special libraries, museums, and archives may create their own metadata schema, or adopt and modify an existing one, to adequately describe the scope and content of their collection materials. For instance, MODS, a schema developed by the Library of Congress, is popular among digital libraries (Network Development & MARC Standards Office, 2024). It is highly compatible with MARC21 because it includes many MARC elements, but its XML schema defines language-based tags for resource descriptions instead of numeric MARC codes. It is not bound to a content standard such as RDA, and its XML encoding makes it compatible with web standards so that records can be viewed online (Miller, 2011). While creating one’s own metadata schema is not impossible, information professionals should consider building on existing schemas for compatibility and sharing with other systems or interfaces.
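The difference between language-based tags and MARC codes is easier to see in a record. The sketch below uses Python's standard xml.etree.ElementTree module to build a very small MODS-style description of one of the books in my reference list. It is a simplified illustration, not a complete or validated MODS record, and the build_mods helper and the "mods" prefix are my own choices.

```python
import xml.etree.ElementTree as ET

# Namespace of the MODS version 3 schema.
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag):
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def build_mods(title, creator, date_issued):
    """Build a minimal MODS-style record (illustrative helper)."""
    mods = ET.Element(q("mods"))
    # MARC would carry the title in field 245, subfield a.
    # MODS names the same information with readable tags.
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = title
    name = ET.SubElement(mods, q("name"))
    ET.SubElement(name, q("namePart")).text = creator
    origin = ET.SubElement(mods, q("originInfo"))
    ET.SubElement(origin, q("dateIssued")).text = date_issued
    return mods


record = build_mods("Metadata for digital collections", "Miller, S. J.", "2011")
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because the output is ordinary XML, it can be read by web tools and by people, and the tag names themselves (title, namePart, dateIssued) say what each value is.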
Let’s not forget that digital resources are even easier to reproduce and distribute than their physical counterparts. As they become more widely available, it is important for a metadata schema to capture descriptive, structural, and administrative information, so that a resource can be found and identified, can move across discovery systems, and has its provenance documented so that it can be used legally.
Evidence
The evidence I chose to demonstrate this competency includes a classification assignment for INFO 287 (Classification Schemes), a learning outcomes report on my cataloging tasks during my internship at the American Film Institute, and a video metadata analysis for INFO 282 (Digital Asset Management).
Evidence #1 – INFO 287 – Classification schemes – Dewey Decimal Classification
Since much of my focus has been on digital curation, I wanted to learn more about the organization of physical items. I was familiar with Dewey Decimal numbers from my childhood, but I had no idea what the numbers meant. I relied only on the notation to find the right shelves and followed the number order to my target resource. Reflecting on having navigated the library with the DDC without knowing what its numbers meant reinforced my understanding of why this classification is so popular in public and school libraries. Also, when presented with a list of resources to classify, I came to understand both the usefulness of copy-cataloging and the need to be critical of it. Being critical meant looking up the classification and evaluating its accuracy and specificity for myself before deciding to use or modify it.
This evidence demonstrates my ability to understand the importance of organizing materials for findability.
Evidence #2 – Internship – Learning Outcomes Final Report
For my internship at the American Film Institute in the summer of 2023, my cohort and I were assigned to catalog digitized artifacts from the Mayer Library that had been organized only into folders, including video and audio interviews, and to verify photo sets that had been tagged using AI-generated facial recognition. The standards I had to abide by included selecting the metadata scheme intended for each type of digital object, entering titles in a specific format, entering copyright information, and entering identifiers exactly as they appeared. To identify what the content was about, I researched the names of interviewees and of the people and movies mentioned in the interviews. If a name wasn’t already in the system, I relied on IMDb as the authority list for the correct format and spelling. Cataloging subject terms combined tagging with a controlled vocabulary: the system would bring up existing categories after I typed a word, and if nothing was suitable, I would enter a new category. This meant thinking of different ways to describe a subject so that I could find an existing category first and keep the number of categories manageable. For the photos, I took the AI-generated facial recognition results, added names to subjects to train the AI, and verified its results on the rest of the set, so that users can find all available photos of a specific actor or actress.
This evidence demonstrates my ability to catalog materials so that they can be found.
Evidence #3 – INFO 282 – Digital Asset Management – Video metadata analysis
This assignment presented a collection of video clips that appeared to be random at first, but upon closer examination, it was clear that the clips came from movies. With the help of my classmates, I was able to match movie titles to the clips, and with my knowledge from a beginner film course, I found features that could be used to aggregate and discriminate them in a search. In addition, I considered including technical metadata for interoperability, publisher metadata for legal use, and controlled vocabularies that could be used to classify the clips and promote compatibility with other search systems, such as the Getty’s Art & Architecture Thesaurus and the Library of Congress’s Genre/Form Terms.
This evidence demonstrates my ability to examine a collection, determine the relevant descriptive, structural, and administrative information, and define a functional metadata schema for organizing it.
Conclusion
It is easy to think that alphabetizing authors and titles and categorizing what a resource is about is all it takes to organize a collection. But assigning that information to a resource so that machines can provide relevant aggregation, findability, and identification requires more thoughtful effort. With this competency, I can rely on the subject experts of authoritative institutions, such as the Library of Congress, the Getty, Dublin Core, and others, for controlled vocabularies, formatting rules, and metadata standards, which lets me stay current and use a common language shared with other repositories to find and describe materials. Although natural language processing and indexing technology are advancing by leaps and bounds in aggregating data, machines will continue to rely on information professionals to define standards.
References
Miller, S. J. (2011). Metadata for digital collections. Neal-Schuman Publishers.
Moulaison, H. L., & Wiechert, R. (2015). Crash course in basic cataloging with RDA. Libraries Unlimited.
Network Development & MARC Standards Office. (2024). Metadata Object Description Schema: MODS. The Library of Congress. https://www.loc.gov/standards/mods/
Weedman, J. (2017). Lecture 3 supplement: Subject metadata. In V. Tucker (Ed.), Information retrieval system design: Principles & practice (6.1 ed.). Academic Pub.