
Museum audio description: what it is and how to write it well

June 9, 2026

What museum audio description is, the four fundamentals from the Audio Description Project, how to structure and pace a description, and how to write so blind and low-vision visitors can build a mental image of the work.

Museum audio description is the use of precise, objective language to convey what a blind or low-vision visitor would otherwise see: an object's composition, scale, colour, materials, position in the room, and the visual relationships that carry its meaning. It runs as its own tour variant alongside the standard audio guide, usually on the same stop numbers, and is the practical answer most museums give to ADA and European Accessibility Act expectations. Written well, it also benefits sighted visitors by slowing them down and pointing them at what they would otherwise walk past.

This guide is for content editors, exhibition producers, writers, and accessibility leads writing or commissioning a museum audio-described tour. It draws on the public guidance from the Audio Description Project at the American Council of the Blind, Joel Snyder's four fundamentals as taught by the ADP, VocalEyes, Cooper Hewitt image description guidelines, and peer-reviewed research on what blind and low-vision visitors actually use. For the wider blind-accessibility planning view, see audio guides for blind and low-vision museum visitors.

What museum audio description is, and what it is not

The Audio Description Project defines audio description as language that "makes visual elements of media more accessible to people with visual impairment and other disabilities" through "concise, objective descriptions of visual components that are crucial to understanding the creator's intention." VocalEyes describes it more broadly as "the use of precise, logically structured language, interwoven with historic references and cultural narratives, to evoke an artefact or a place." Both definitions point in the same direction: a museum audio description is the visible parts of the object turned into words a blind visitor can use to build a mental image.

A standard audio guide assumes the visitor is looking. It introduces a work and moves straight to interpretation: who made it, when, what it means, how to understand it in context. An audio-described tour cannot start there. It has to give the visitor enough visual information to picture the work before any of the interpretation makes sense. That is the working difference between a museum audio guide and a museum audio description, and it is why audio description is a distinct discipline rather than the regular script read more slowly.

The Smithsonian Guidelines for Accessible Exhibition Design set audio description as a baseline expectation, stating that items essential to an exhibition's main theme must be accessible by tactile examination or comprehensive audio description, with the equipment to access that description available on site and signposted at the entrance. In WCAG 2.2, audio description is also a normative success criterion for prerecorded video content (1.2.5, Audio Description (Prerecorded), Level AA), which carries through to any companion video the museum publishes around the tour.

The four fundamentals of audio description

Joel Snyder's The Visual Made Verbal, developed through the ADP, is the most widely used training framework for audio description in the United States. It organises the practice around four fundamentals. The first three are the writer's craft; the fourth is the voicing.

Observation: learning to see

The first fundamental is active looking. Snyder cites the French poet Paul Valéry, "Seeing is forgetting the name of what one sees," to make the point that labelling is not describing. A writer who looks at a painting and immediately writes "a Madonna and Child" has labelled the work and stopped. Description starts after the label, with the colour of the Virgin's robe, the position of the Child against her body, the architectural detail behind them, the direction of the light. The writer has to look long enough that the named thing becomes a set of visible facts again.

Editing: what not to say

The second fundamental is selection. The eye takes in more than the voice can recount, especially inside the few minutes a visitor will spend at a stop. Editing decides which visual facts carry the meaning of the work and which are scenery. As Snyder puts it, the describer must "leave out all but the essential." In practice, this means writing the long version first and cutting back to what a curator would point to if standing next to a sighted visitor.

Language: the words you say

The third fundamental is the writing itself. Snyder's rule is that describers function like journalists, reporting visual facts in clear, specific, present-tense, third-person language. The Cooper Hewitt image description guidelines add a useful constraint: describe in a consistent direction, top to bottom or left to right or foreground to background, so the visitor can build a mental layout in the same order each time. Avoid the throat-clearing of "this is an image of" and the inflated vocabulary of catalogue prose.

Vocal skills: how it is read

The fourth fundamental is the recording itself. The voice should match the material in pace, energy and tone. The ADP's stated position is that human-voiced narration remains the standard because it "offers expressiveness, emotional nuance, and clarity." Synthetic voices can be used as a baseline where human recording is not possible, provided the result still passes for intelligible museum narration. The point is not the technology but whether the visitor can hear the description clearly enough to follow it on the first listen.

WYSIWYS: objectivity over interpretation

The ADP teaches the acronym WYSIWYS: "What You See Is What You Say." The principle is that the describer reports observable facts rather than naming emotions or assigning meaning. Rather than "he is furious," the description says "he is clenching his fist." Rather than "the room feels oppressive," the description says "the ceiling is low; the walls are painted dark red; there are no windows." The visitor builds the inference. The describer does not pre-empt it.

Objectivity is also where most well-meaning museum description goes wrong. Curators trained in interpretive writing find it harder than they expect to write a paragraph about a Caravaggio that does not call it dramatic, theatrical or anguished. The discipline is to keep those words out of the description and let the visible facts of the painting carry the drama. The interpretive paragraph can sit elsewhere in the tour, after the description has done its job.

Two narrow exceptions are common in museum practice. Identity and demographic detail should be reported when it is part of the work and can be verified, not inferred from appearance; Cooper Hewitt's guidance is to avoid assumptions about gender, ethnicity, or identity unless clearly performed or verified. And gesture or expression can be named directly when the work shows it unambiguously, with the named expression following the description of the face rather than replacing it.

The order of a single description

Most museum description guidance converges on the same ordering: open with the standard catalogue facts, then orient the visitor in the work, then describe the composition in a consistent direction, then add the detail and context that the visitor needs to understand it. Art Beyond Sight puts it as standard information first: artist, nationality, title, date, dimensions, medium and technique, then a general description of the subject matter and composition. Cooper Hewitt's image description guidelines reinforce the same point from the digital side: "descriptions should start with the most important content of the image and branch outward to describe the details."

1. Title-bar information. Artist and dates, title, year, medium and dimensions. This is the equivalent of the wall label.
2. Overview. Type of work, scale relative to the viewer, overall shape, dominant colours, where it sits in the room.
3. Composition. The visual elements in a consistent order: top to bottom, left to right, or foreground to background. The order should be the same across the tour so the visitor learns the writer's pattern.
4. Detail. Specific features that carry meaning: facial expression, posture, costume, materials, the use of light, the texture of the paint or stone.
5. Interpretive context. Only after the description: why the work matters, what the curator wants the visitor to take from it, the connection to other works in the gallery.

The same pattern applies to three-dimensional objects, with the orientation step doing more work. For a sculpture, name where the visitor is standing in relation to it, which side faces them, and how big it is in relation to a familiar object. For an installation, name the room or floor, the walls or surfaces involved, and the position of the visitor as they enter. For a heritage interior, name the room shape and ceiling height before describing furnishings. None of this is exotic; it is the missing information a sighted visitor's eye supplies in the first second of looking.
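The five-part order above can be sketched as a small data structure that an editor might use to keep every stop in the same sequence. This is a minimal illustration, not a published schema: the class name StopScript and the field names are assumptions made for this example.

```python
from dataclasses import dataclass

# Section names mirror the five steps described above. The identifiers
# themselves are hypothetical, chosen only for this sketch.
SECTION_ORDER = (
    "title_bar",
    "overview",
    "composition",
    "detail",
    "interpretive_context",
)


@dataclass
class StopScript:
    title_bar: str
    overview: str
    composition: str
    detail: str
    interpretive_context: str = ""  # optional, and always voiced last

    def assemble(self) -> str:
        """Join the non-empty sections in the fixed order, one per paragraph."""
        parts = [getattr(self, name).strip() for name in SECTION_ORDER]
        return "\n\n".join(part for part in parts if part)

    def missing_sections(self) -> list[str]:
        """Names of the descriptive sections that are still empty."""
        required = SECTION_ORDER[:-1]  # interpretation is optional
        return [name for name in required if not getattr(self, name).strip()]
```

Calling missing_sections() before recording is one way to catch a stop that jumps from the title-bar facts straight to interpretation, which is the pattern a standard audio guide script tends to follow.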

Length, pacing, and layered descriptions

How long a description should be is the question that comes up first and is hardest to answer in the abstract. Research on what blind and low-vision visitors actually use is the most useful guide. A 2023 study summarised in the National Library of Medicine found that participants strongly rejected one-sentence descriptions and preferred layered descriptions that combined spatial and thematic information in enough detail to build a mental model of the work. The same study found that consistent spatial language helped listeners build coherent mental maps of the image.

The practical implication is layering rather than length. A useful museum stop on an audio-described tour has a short overview the visitor can decide to stop at, a fuller composition pass for the visitor who wants more, and optional extended commentary for the visitor who wants the most. On a hardware audio guide, the layering can be three separate tracks selectable by keypad. On a tablet tour, the layered tracks can sit under named buttons. Either way, the structure lets the visitor choose depth rather than forcing every visitor through the longest version.
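As a rough sketch of the layering idea, the function below returns the tracks a visitor would hear for a chosen depth. The layer names, the file names and the assumption that deeper layers play after the shallower ones are illustrative choices for this example, not a description of any particular audio guide platform.

```python
# Hypothetical layer names, shortest first.
LAYERS = ("overview", "composition", "extended")


def playlist(tracks: dict[str, str], depth: str) -> list[str]:
    """Return the tracks to play, in order, up to and including `depth`.

    `tracks` maps a layer name to an audio file name. A layer that has not
    been recorded for this stop (often "extended") is simply skipped.
    """
    if depth not in LAYERS:
        raise ValueError(f"unknown layer: {depth!r}")
    wanted = LAYERS[: LAYERS.index(depth) + 1]
    return [tracks[name] for name in wanted if name in tracks]


# Example stop with no extended commentary recorded (file names are made up).
stop_12 = {"overview": "12_overview.mp3", "composition": "12_composition.mp3"}
print(playlist(stop_12, "overview"))
print(playlist(stop_12, "extended"))
```

Keeping the overview as its own track means a visitor who stops there has still heard a complete, if brief, description.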

As a planning baseline, in Look2Innovate proposals an audio-described stop typically runs 30–60 seconds longer than the equivalent standard-tour stop, with a 3-minute cap per stop unless the work justifies more. Visitors do not stand still longer than that without a reason, and a description that runs long discourages the visitor from continuing to the next stop. Pace the voicing slightly slower than the standard track and place natural pauses at the end of each sentence so the visitor's mental image can catch up. For the underlying scripting framework, see museum audio guide script best practices.
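The planning baseline above is simple arithmetic, and a short sketch can make it easy to check a draft against it. The 30 to 60 second extension and the 3-minute cap come from the paragraph above; the speaking pace of 140 words per minute is an assumption for illustration only and should be replaced with a timing taken from the actual narrator.

```python
CAP_SECONDS = 180          # 3-minute cap per stop, from the baseline above
EXTRA_SECONDS = (30, 60)   # described stop runs this much longer than standard


def target_window(standard_seconds: float) -> tuple[float, float]:
    """Shortest and longest target length for the described stop, in seconds."""
    low = min(standard_seconds + EXTRA_SECONDS[0], CAP_SECONDS)
    high = min(standard_seconds + EXTRA_SECONDS[1], CAP_SECONDS)
    return low, high


def estimated_seconds(script: str, words_per_minute: float = 140.0) -> float:
    """Rough voiced length of a script. The default pace is an assumption."""
    return len(script.split()) / words_per_minute * 60


def fits(script: str, standard_seconds: float) -> bool:
    """True when the estimated length stays within the top of the window."""
    return estimated_seconds(script) <= target_window(standard_seconds)[1]
```

For a 90-second standard stop, the window works out to 120 to 150 seconds; for a 150-second standard stop, both ends hit the 180-second cap. A word-count estimate is only a first check and does not replace reading the script aloud and timing it.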

Words: colour, materials, spatial language

Colour

Cooper Hewitt's guidance is to use familiar colour names first, then explain less common ones using familiar terms. Red, blue, yellow, green, orange, purple, violet, pink, brown, gold, silver, black and white do most of the work. "Ochre" can become "a yellow-brown the colour of dry mustard." "Vermilion" can become "a bright orange-red." Use the painter's term once if it is part of the work's identity, then translate. Avoid hex codes and named pigments without translation, and avoid skin-tone terms based on ethnic background; describe what is visible ("a light-skinned figure," "a dark-skinned figure") rather than inferring identity.

Materials and texture

Materials carry meaning that colour cannot. A marble sculpture and a plaster cast can look similar from a distance and read very differently up close. Describe the material directly ("polished white marble," "cast bronze with a green patina," "oil on linen canvas") and the texture where it is part of the meaning ("the paint stands in thick ridges around the head," "the surface is worn smooth where centuries of visitors have touched it"). For accessible touch tours, materials descriptions can also prepare the visitor for what they will feel.

Spatial language

Use the visitor's body as the reference point. "In front of you," "to your left," "behind you" are usable; "to the north," "on the east wall" are not. Inside the work itself, use top, bottom, left, right, foreground and background consistently. Cooper Hewitt's guidance is to read complex images in a single direction; museum description can borrow the same rule for objects. When in doubt, pick top-to-bottom for paintings and architecture, foreground-to-background for sculpture and interiors, and stick to the pattern.

Words and phrases to avoid

Avoid "obviously," "as you can see," and any phrase that assumes sight. Avoid "the artist depicts a sense of" constructions; describe what is on the canvas. Avoid scale words without a reference: "a large painting" on its own tells the visitor little, while "a painting roughly two metres wide, taller than a standing visitor" is usable. Avoid hedging like "a kind of" or "sort of"; describe what is actually visible. Avoid jargon and catalogue Latin unless you translate it on first use.
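Some of the phrases above are mechanical enough to check automatically before a script reaches an editor. The sketch below flags only the literal phrases named in this section; the rule labels and the function name flag_phrases are hypothetical, and a phrase list like this cannot judge unreferenced scale words or untranslated jargon, which still need a human read.

```python
import re

# Phrases taken from the guidance above, grouped by the reason to avoid them.
# Extend the lists to match a house style.
RULES = {
    "assumes sight": ["obviously", "as you can see"],
    "interprets instead of describing": ["the artist depicts a sense of"],
    "hedging": ["a kind of", "sort of"],
}


def flag_phrases(script: str) -> list[tuple[str, str]]:
    """Return (phrase, reason) pairs for each listed phrase found in the script."""
    findings = []
    for reason, phrases in RULES.items():
        for phrase in phrases:
            # Whole-phrase, case-insensitive match.
            pattern = r"\b" + re.escape(phrase) + r"\b"
            if re.search(pattern, script, flags=re.IGNORECASE):
                findings.append((phrase, reason))
    return findings


draft = "As you can see, the vase is sort of blue."
for phrase, reason in flag_phrases(draft):
    print(f"{phrase!r}: {reason}")
```

A check like this is best treated as a prompt for the editor rather than a gate: "sort of" inside a quotation from the artist, for example, would be flagged even though it may be worth keeping.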

Voicing, review, and testing with the community

A description that reads well on the page often does not voice well in the gallery. Read every script aloud at the writing stage, time it, and revise anywhere the sentence has to be hurried to fit the breath. Record a single narrator per audio-described track unless there is a strong editorial reason to switch; consistent voicing helps the visitor stay oriented. The ADP's stated preference is human narration over synthetic voice for finished publication, with synthetic narration acceptable as a baseline where budget or scale rule out studio recording.

Review with a blind reader is the step most museums under-invest in. Research published in the journal Museum Management and Curatorship found that audio description developed with blind audience involvement also improved how sighted audiences remembered the artworks afterwards, an effect the authors call "guided looking." The implication for museums is direct: testing a tour with blind users is not a courtesy step, it is the editing pass that improves the tour for every audience.

Where AI is used to produce a first draft, the rule of thumb in Look2Innovate workflows is that an editor and ideally a blind reader review every stop before publication. AI Content Studio can generate a first audio description per stop from object photography, gallery notes and the standard script, in the museum's chosen voice and language. That produces a usable draft faster, but it does not replace the editor's judgement on what carries the meaning of the work, and it does not replace the blind-reader test, which catches the missing cues no sighted editor would notice.

FAQ

What is museum audio description?

Museum audio description is a separate audio track that describes the visible parts of an object so a blind or low-vision visitor can build a mental image of it before any interpretation. The Audio Description Project defines it as concise, objective descriptions of visual components crucial to understanding the creator's intention. Most museums run audio description as a tour variant on the same devices as the standard guide, with the same stop numbers as the standard tour.

What makes a good museum audio description?

Three things, taken from the writing fundamentals of the Audio Description Project framework: active observation that goes beyond labelling the object, careful editing that selects the visual facts that carry meaning, and clear, specific, present-tense language reported in a consistent direction. The discipline is to describe what is visible before interpreting it, and to let the visitor draw the inferences rather than naming the emotion or meaning for them.

How long should a museum audio description be?

Length should be layered rather than fixed. Research on what blind and low-vision visitors actually use shows they reject one-sentence descriptions and prefer enough detail to build a mental model, but the layered structure should let visitors pick the depth they want. A practical baseline is a 30–60 second extension beyond the standard tour stop, with a 3-minute cap unless the work justifies more, and optional extended commentary for visitors who want it.

Should AI write or voice museum audio description?

AI can produce a usable first draft per stop from object photography and the existing standard script, which is helpful for large tours where writing every description from scratch is not possible. The Audio Description Project's stated preference is human narration for finished publication because it carries expressiveness and clarity; synthetic narration is acceptable as a baseline where budget rules out studio recording. Whichever production path is chosen, a museum editor and ideally a blind reader should review every stop before publication.

Does audio description benefit sighted visitors too?

Yes. Research published in Museum Management and Curatorship found that audio description developed with blind audience involvement improved how sighted audiences remembered the artworks afterwards, an effect the authors call guided looking. The discipline of describing the visible facts of a work before interpreting it tends to slow visitors down and point them at details they would otherwise walk past, which is useful for any audience.

What is the difference between audio description and a standard audio guide?

A standard audio guide assumes the visitor is looking at the work and moves quickly to interpretation: artist, date, meaning, context. An audio-described tour starts earlier and gives the visitor the visible facts of the work first: composition, colour, scale, materials, the position of figures, the direction of light. Interpretation still appears, but only after the visitor has enough description to picture the work. The two are usually run as parallel tracks with the same stop numbers.

How do you write audio description for abstract or contemporary art?

The principles do not change, but the vocabulary shifts. Describe the dominant shapes, the colour palette, the materials and surface, the scale relative to the visitor, and the position of the work in the room. Read in a consistent direction. Resist naming what the work means; if a Mark Rothko reads as two large rectangles of soft red and orange floating on a darker red ground, that is the description. The interpretation, if the museum wants to add one, sits after the description, not in place of it.