Competency Statement: Design, query, and evaluate information retrieval systems.
Introduction
As mediators between users and a treasure trove of information, information professionals need to understand how information retrieval systems are designed, how to query them, and how design and querying inform one another. It is also important to understand what makes an information retrieval system good, as grounds for its continued use and development.
Designing IR systems
Design is a creative process that involves planning, implementation, and evaluation. Planning begins with defining a problem, and in the case of information retrieval systems, the problem is how to find information easily. Solving this problem requires determining how information is stored and organized, because that organization dictates how easy the information is to find (Weedman, 2018). Organizing information, in turn, requires information about information, or metadata. With today’s technology, metadata can be any attribute that describes a resource, including its publication date, classification number, author’s name, title, subject headings, entire text, and even relationships to other resources. The more metadata provided about a resource, the more attributes there are with which to discriminate and aggregate information. There are still other important design questions to ask, such as: what is the purpose of the information retrieval system, and who is it for? Answering these questions helps uncover and prioritize the metadata users need to aggregate information, and leave out metadata that would discriminate against relevant resources if not administered properly. For example, if the IR system is intended to allow users to borrow resources, price metadata might be informative to display but not useful for searching; many items would be excluded if users were expected to specify something unrelated to their information need.
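To make this concrete, here is a minimal sketch of a metadata record in which some attributes are exposed for searching and others, like price, are kept for display only. The field names and values are hypothetical examples, not a prescribed schema.

```python
# Hypothetical metadata record for a borrowable resource (illustrative only).
record = {
    "title": "The Portable MLIS: Insights from the Experts",
    "author": ["Haycock, K.", "Romaniuk, M.-J."],
    "publication_date": "2018",
    "classification": "020",                 # e.g., a Dewey-style class number
    "subjects": ["Library science", "Information retrieval"],
    "related_to": ["urn:example:related-resource"],  # placeholder relationship
    "price": 65.00,                           # informative to display...
}

# ...but only some attributes are exposed to the search index, because asking
# a borrower to specify price would exclude items unrelated to their need.
SEARCHABLE_FIELDS = {"title", "author", "publication_date",
                     "classification", "subjects"}
DISPLAY_ONLY_FIELDS = {"price", "related_to"}
```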
Querying IR systems
Choosing the right tools to query an IR system begins with understanding what metadata is available and how it can be used to express an information need as a search query. Simply entering search terms into the ubiquitous search box may not yield useful results if users have no way to give the system more context. Ways to add context include using Boolean operators to combine search terms so that results include or exclude particular content, and limiting search terms to specific metadata fields.
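As a small illustration of that idea, the sketch below applies a Boolean, field-limited query to a tiny, made-up catalogue; the records and matching logic are assumptions standing in for whatever a real system implements.

```python
# Hypothetical mini-catalogue; each record exposes a few metadata fields.
catalogue = [
    {"title": "Information Retrieval System Design",
     "subjects": ["information retrieval", "design"]},
    {"title": "The Design of Everyday Things",
     "subjects": ["design", "psychology"]},
    {"title": "Introduction to Cataloging",
     "subjects": ["cataloging"]},
]

def matches(record):
    """Field-limited Boolean query:
    subjects CONTAINS "design" AND NOT subjects CONTAINS "psychology"."""
    return ("design" in record["subjects"]
            and "psychology" not in record["subjects"])

print([r["title"] for r in catalogue if matches(r)])
# -> ['Information Retrieval System Design']
```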
Understanding how resources are represented by metadata can also suggest other search strategies. For example, Weedman (2018) describes four ways to represent subject metadata: full text (the entire text of a resource), controlled vocabularies (categories assigned from authority terms), tagging (crowd-sourced categories), and classification (assignment of a single category). These representations allow queries that aggregate by natural language, grouping documents that match search terms or user-assigned tags, or that aggregate by selecting and post-coordinating predetermined authority terms or classification numbers.
Evaluating IR systems
The challenge of information retrieval systems is getting relevant information by “maximizing the ability to discriminate between relevant and irrelevant documents” (Weedman, 2018, p. 182). Evaluating an IR system for relevance means measuring how well the system retrieves all relevant documents (recall) and how well it retrieves only relevant documents (precision). An IR system that relies on fast and simple search implementations such as full-text search and tagging may find that the nuanced and changing nature of natural language hurts relevance, making its search results unreliable. Controlled vocabularies and classification may be more time-consuming to create, but their standardization and authoritative specification provide greater precision and improved recall (Weedman, 2018).
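For clarity, recall and precision can be computed directly from a result set. The sketch below assumes plain document-ID lists rather than any particular system; the IDs and counts are made up.

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved documents that are relevant.
    Recall: fraction of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 10 documents truly relevant.
print(precision_recall(retrieved=["d1", "d2", "d3", "x9"],
                       relevant=[f"d{i}" for i in range(1, 11)]))
# -> (0.75, 0.3): 3 of 4 retrieved are relevant; 3 of 10 relevant were found.
```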
To help users find relevant materials, the interface of an IR system must also be well designed and make search queries easy to perform. A good interface is consistent and coherent in its presentation and operations: it lets users see clearly where to enter a search query and set its parameters, shows the current state of the query and its parameters, and provides continuous feedback on the results of their actions (Norman, 2013).
In some cases, users may not even know what terms to start searching with and can benefit from alternative search strategies, such as an index showing all available topics. Some websites, for example, have a site index that represents an overall map linking to its contents. Organizing information in different ways and providing alternative search strategies increases the effectiveness of an IR system.
Evidence
Evidence #1 – INFO247 – Vocabulary Design – Search strategies for indexed images.
In this assignment, I described and compared my image search process using concept-based and content-based searches for abstract, object, and scenic images. I found that searching for images using abstract concept terms may not be a good strategy because such terms are subjective, unless the intended goal is to explore broad visualizations of the abstract concept. In the process, I discovered how large a role language plays in providing underlying context: my searches for “nostalgia” in English and in Japanese conjured very different images. Based on this experience, I can confidently recommend adding region or language context to search queries to maximize image recall for abstract concepts.
This evidence shows my ability to articulate the differences in search goals and strategies for image searching.
Evidence #2 – INFO287 – Linked Data – SPARQL query assignment
In the Linked Data course, I used SPARQL to query information from DBpedia. DBpedia stores data as Linked Data, which is “a method of publishing RDF data on the Web and of interlinking data between different data sources” (DBpedia, n.d.). Below is a short description of the RDF metadata model and structure:
RDF is a directed graph composed of triple statements. An RDF graph statement is represented by: 1) a node for the subject, 2) an arc that goes from a subject to an object for the predicate, and 3) a node for the object. Each of the three parts of the statement can be identified by a Uniform Resource Identifier (URI). An object can also be a literal value. This simple, flexible data model has a lot of expressive power to represent complex situations, relationships, and other things of interest, while also being appropriately abstract. (“Resource Description Framework,” 2024, para. 2)
In the assignment, I described my query process, which began with retrieving the URI links of matching resources. I iteratively refined the query so that I could identify the found resources in human-readable form and filter my results to include only resources containing a specific phrase. In another search, I referenced a specific data source, DBpedia’s ontology class for birds, to find resources about a specific type of bird, and I experimented with a language-restricting statement to show only English resources. This assignment challenged me to familiarize myself with triple-store database design to write query statements that were not merely valid, but actually returned the results I was after.
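A query along those lines might look like the sketch below. It is an illustration rather than the exact query I submitted: the use of the SPARQLWrapper library, the “flightless” phrase filter, and the LIMIT are assumptions added for the example, though dbo:Bird, rdfs:label, dbo:abstract, and the FILTER functions are standard DBpedia/SPARQL constructs.

```python
# Illustrative approximation of the kind of DBpedia query described above.
# SPARQLWrapper is assumed here; the same query could also be pasted directly
# into DBpedia's public SPARQL endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?bird ?label WHERE {
        ?bird a dbo:Bird ;                 # restrict to DBpedia's Bird class
              rdfs:label ?label ;
              dbo:abstract ?abstract .
        FILTER(CONTAINS(?abstract, "flightless"))  # keep a specific phrase
        FILTER(langMatches(lang(?label), "en"))    # English results only
        FILTER(langMatches(lang(?abstract), "en"))
    }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["bird"]["value"], "-", row["label"]["value"])
```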
This evidence shows my ability to understand the design of an information system and use a dedicated query language to retrieve information from that system.
Evidence #3 – Work Experience – Song suggestions design proposal
When I was employed at Ubisoft between 2016 and 2024 as a game designer on the Rocksmith+ product, one of the biggest complaints from players was entering search terms and being consistently faced with a disappointing empty screen reading “Sorry, we couldn’t find a match for that.” Users were searching for artists and songs that weren’t available and hit a dead end, with no other search strategy to move forward. I studied Netflix’s approach and realized that what the music library lacked was an authority file of artists, albums, and songs. Without one, the system could not make synonym-based suggestions that would let users correct their search terms, nor could it offer available alternatives. I created a design proposal explaining these issues and the need for an authority file to support other search strategies, such as synonym rings, for users.
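The core of that idea can be sketched in a few lines. This is an illustrative reconstruction, not actual Rocksmith+ code or the proposal itself; the authority file contents and variant spellings are made up for the example.

```python
# Illustrative synonym ring backed by a tiny, made-up authority file.
authority_file = {
    "Jimi Hendrix": {"jimi hendrix", "hendrix", "jimmy hendrix"},
    "The Beatles": {"the beatles", "beatles", "beetles"},
}

def suggest(query):
    """Map a user's search terms to an authorized name if any variant matches."""
    q = query.strip().lower()
    for authorized, variants in authority_file.items():
        if q in variants:
            return authorized
    return None

print(suggest("Jimmy Hendrix"))   # -> 'Jimi Hendrix' instead of a dead end
print(suggest("unknown band"))    # -> None: fall back to other strategies
```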
This evidence demonstrates my ability to assess and identify the issues in an information retrieval system and propose improvements based on design principles for IR systems.
Conclusion
Weedman (2018) explains that the term “information retrieval systems” is an abbreviated form of “information storage and retrieval systems” (p. 173). The abbreviation suggests that IR systems should prioritize finding information over storing it, but that is a fallacy: library materials can quickly lose value if storage design is not carefully considered to meet the expectations of the search query. Conversely, query interfaces can employ creative ways of finding relevant materials when faced with limited metadata. As information professionals, we may find ourselves learning unique ways to search different information retrieval systems, or even designing and building one, during our careers. This competency gives information professionals a core understanding of information retrieval systems, so they can quickly navigate new information environments, assess them for changes and improvements, and design and build effective information retrieval systems that maximize relevance.
References
DBpedia. (n.d.). About DBpedia. DBpedia Association. Retrieved September 18, 2024, from https://www.dbpedia.org/about/
Resource Description Framework. (2024, September 3). In Wikipedia. https://en.wikipedia.org/wiki/Resource_Description_Framework
Norman, D. (2013). The design of everyday things. Basic Books.
Weedman, J. (2018). Information retrieval: Designing, querying, and evaluating information systems. In K. Haycock & M.-J. Romaniuk (Eds.), The portable MLIS: Insights from the experts (2nd ed., pp. 171-185). Libraries Unlimited.
Weedman, J. (2018). Designing for search. In V. Tucker (Ed.), Information retrieval system design: Principles and practice (6.1 ed., pp. 118-139). AcademicPub.