Abstract

Understanding information-dense documents like recipes and scientific papers requires readers to find, interpret, and connect details scattered across text, figures, tables, and other visual elements. These documents are often long and filled with specialized terminology, which makes it difficult to locate relevant information or piece together related ideas. Existing tools offer limited support for synthesizing information across media types. As a result, understanding complex material remains cognitively demanding.

This dissertation presents a framework that supports the close reading of multimedia documents through fine-grained augmentations. The framework is motivated, designed, and evaluated using a human-centered approach. We begin with a needs-finding study that identifies challenges in navigating documents and searching for information. These insights guide the design of a framework that surfaces connections between related details. We instantiate the framework in an augmented reading interface, which populates a scientific paper with clickable points on figures, interactive highlights in the body text, and a persistent reference panel for accessing consolidated details without manual scrolling. In a controlled between-subjects study, we find that participants who read the paper with our tool achieved significantly higher scores, with no increase in time to completion or perceived cognitive load. Fine-grained augmentations provide a systematic way of revealing relationships within a document, supporting engagement with complex, information-dense materials.