
Museum audio guide content production: voiceover, translation and AI

June 23, 2026

How museums take an audio guide script from approved draft to device-ready audio: recording briefs, voice casting, multilingual translation, AI-generated narration, format requirements and CMS sync.

A finished script is the start of production, not the end. The stop a visitor hears has been recorded by a narrator, reviewed against the exhibit, translated into each active language, mixed if music or ambience is used, exported in the correct format, and synced to the device fleet through the CMS. Each stage can become a bottleneck. In Look2Innovate proposal planning, multilingual tours are treated as multi-stage productions: translation, recording and quality review need clear owners and approval gates before the first recording session is booked.

This guide is for museum content editors, exhibition producers, operations managers and procurement teams managing audio guide production for the first time or reviewing an existing workflow. It covers the full pipeline from approved script to device-ready audio, including voiceover recording briefs, voice casting, translation, AI-generated narration, format requirements and the CMS sync stages that load content onto devices. For the earlier stage of writing the script itself, see museum audio guide script best practices.

The production pipeline from script to device

Audio guide content production has seven stages. Not every project uses all seven, but every project should account for each one before the first recording session is booked.

Typical production stages and the decision that controls each one.
Stage	What happens	Main decision	Planning note
Script approval	Final review by curator, legal, and content lead	Which stops are locked; which can be revised post-recording	Allow a formal sign-off window before recording begins
Voice casting	Selecting narrator style, gender, tone and language match	Professional voice talent versus AI narration per language	Build in time for samples, review and replacement choices
Recording	Studio session or remote recording with a direction brief	One session or multiple passes; who attends for each language	Schedule around narrator availability and review attendance
Translation	Source-language master translated into each active language	Professional translator versus machine translation with review	Start from approved text and check spoken duration before recording
Audio editing and mixing	Cleaning, levelling, adding music beds or ambience if specified	Whether music rights and stems are cleared before recording	Do not begin final mix until rights and stems are ready
Quality review	Gallery playback test; curator sign-off; format check	Who has authority to approve each language version	Reserve review time on the actual device and route
CMS upload and device sync	Files loaded to the CMS and pushed to devices via chargers	Whether sync is automatic through charger docks or manual	Confirm sync status before the next visitor handout

The most common cause of delay is a late script change after recording has started. Re-recording even a single stop in one language costs a studio booking, a voice talent session fee, and several days of scheduling. Museums that put a final-script sign-off gate before the first recording session avoid most of that cost. The second most common cause is translation batching: waiting until all source-language audio is approved before starting translation. Translation can begin from the approved script text while source recording is still in progress.

Voiceover recording: casting, briefing and studio direction

Casting the right narrator for each language

Voice casting should match the tone of the collection, not simply the language. A contemporary art museum and a natural history museum need different narrators even in the same language. The brief should specify pace, sentence emphasis style, the required stop durations, and whether the narrator should read the audio description variant in a different register from the standard tour.

For multilingual fleets, the temptation is to use the same voice agency for all languages and let the agency source each narrator. That works for common European languages. For languages with fewer professional museum narrators available, plan extra lead time. A narrator who is wrong for the collection tone will require re-recording, which is more expensive than a longer casting process.

The recording brief

A recording brief for museum audio guide narration should cover: target stop duration, words per minute, pause positions (marked in the script), how proper nouns and artist names are pronounced, which words carry the principal emphasis in each sentence, whether sound effects or music beds will be mixed under the narration (so the narrator does not fill pauses they should leave empty), and who will approve the first take before the full session continues. Sending the brief in advance of the session, not at the start of it, is the single cheapest improvement most productions can make.
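
The words-per-minute figure and target stop duration in the brief can be turned into a quick check before a session is booked. The Python sketch below estimates spoken duration from a script's word count. It assumes the narrator's pace comes from the brief, uses a hypothetical `[pause]` marker for the marked pause positions, and treats the one-second pause length and 10% tolerance as illustrative defaults rather than recommendations.

```python
PAUSE_MARK = "[pause]"  # hypothetical marker used in the script for a deliberate pause


def estimate_seconds(script: str, words_per_minute: float, pause_seconds: float = 1.0) -> float:
    """Estimate spoken duration: words at the briefed pace plus marked pauses."""
    pauses = script.count(PAUSE_MARK)
    words = len(script.replace(PAUSE_MARK, " ").split())
    return words * 60 / words_per_minute + pauses * pause_seconds


def fits_target(script: str, target_seconds: float, words_per_minute: float,
                tolerance: float = 0.10, pause_seconds: float = 1.0) -> bool:
    """True if the estimate is no longer than the target plus the tolerance fraction."""
    estimate = estimate_seconds(script, words_per_minute, pause_seconds)
    return estimate <= target_seconds * (1 + tolerance)


if __name__ == "__main__":
    stop_04 = "This altarpiece was painted in tempera on wood. [pause] Look at the gold ground."
    print(round(estimate_seconds(stop_04, words_per_minute=140), 1))
    print(fits_target(stop_04, target_seconds=60, words_per_minute=140))
```

The same estimate can be run on each translated draft, with a pace suited to the target language and narrator, to spot stops that expand past the target before recording.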

Remote recording versus studio sessions

Remote recording is now standard practice for many language versions. A narrator records at a home or local studio, files are delivered as WAV, and the production team edits and mixes centrally. Remote recording is faster to schedule and removes travel costs, but requires the narrator to have a professional acoustic setup. For the primary language version, where a content director is likely to attend and give real-time direction, an in-studio session usually produces a more consistent result on the first pass. For additional languages where the script is fixed and the direction brief is detailed, remote delivery is often the practical choice.

Translation and multilingual content production

Translation for museum audio guides is a specialist discipline. A general translator without experience of spoken audio often produces text that reads well but records awkwardly: sentences that run past the target stop duration in the target language, clauses that put the emphasis on the wrong word when read aloud, or cultural references that need an explanation the stop has no room for. Translation briefs for museum audio guides should specify target audio duration, not only target word count, and should ask translators to flag any stop where the source script cannot be translated at pace without losing meaning.

Translation planning risks by target language.
Source language	Target language	Planning risk	Planning adjustment
English	French	Often longer when read aloud	Reduce the source script to the lower bound of its word-count range before translating
English	German	Can run longer than the source	Review stop durations in German before recording
English	Spanish	May expand past the planned stop duration	Usually manageable; check stops that already run close to the target duration
English	Japanese	Spoken duration usually stays close to the source; text formatting changes	Check character-based formatting of any on-screen text
English	Arabic	May expand and requires interface review	Right-to-left interface labels need separate review
English	Chinese (Simplified)	May be shorter than the source	Check that the meaning is complete

Machine translation with human review

Machine translation has improved substantially for common language pairs and is a reasonable starting point for translation drafts, but it is not a final product for museum audio guides. It often misses idiomatic phrasing, cultural context, and the difference between a sentence that scans on paper and one that lands cleanly when spoken. A suitable workflow is a machine-translated draft, reviewed and edited by a native-speaking translator with audio guide experience, then read aloud by the reviewer before it passes to the narrator. This combines machine speed on the first pass with human judgement on the result.

Translation and intellectual-property rights

Translated tour scripts are derivative works of the source scripts. The museum should confirm before production starts who holds the rights to the source script, whether those rights extend to translation and adaptation, and whether the translator's work belongs to the museum or the supplier. This is particularly relevant when the original script was written by a freelance curator or an external content agency. Rights disputes that surface after recording are expensive to resolve.

AI-generated narration: where it fits and where it does not

AI-generated voice narration is now practical for some museum audio guide workflows. It can reduce cost and lead time for language variants, single-stop updates and branching content. It is weaker where the tour depends on performance, emotional timing or a recognisable narrator. The procurement decision should name the use case before choosing AI, human recording, or a mix of both.

Matching the narration method to the use case.
Use case	AI narration	Human narration	Reason
Large multilingual fleet with tail languages	Suitable	Optional for primary languages	AI covers tail languages affordably; human covers the main visitor languages with higher quality
Primary language, flagship permanent collection	Not recommended as sole method	Recommended	Tone, pause placement, and natural emphasis still distinguish professional narration at the highest visitor-volume stops
Rapid content update: single stop change	Suitable	Slower and costlier for a single stop	AI narration can regenerate one stop the same day a script change is approved
Audio description for blind visitors	Suitable with review	Recommended for final version	Audio description scripts are highly specific; a human reviewer should confirm the pacing and descriptive accuracy before publication
Simplified-language or children's tour	Suitable with review	Optional	AI can match a slower, clearer register; the museum should check that the tone suits the intended audience
Interactive or question-based tours	Suitable	Optional	AI can make many response variants easier to produce; human narration may still be chosen for important scripted paths

In Look2Innovate deployments, AI Content Studio supports script drafting, narration generation and iteration across multiple languages and tour variants. The museum's content team reviews and approves before publication. AI generation does not remove the approval step; it reduces the time and cost between first draft and a reviewable recording.

Format, export and quality checks before upload

Audio format requirements

Most museum audio guide hardware requires mono MP3 or WAV files. Stereo gives no meaningful audio benefit through a single earpiece, and for uncompressed WAV it doubles the storage of an equivalent mono file. A typical specification is 44.1 kHz sample rate, 128 kbps MP3, mono. Check the device specification before post-production begins: mixing to stereo and then converting to mono can introduce phase cancellation artefacts if the tracks were recorded with any spatial processing. For tablet-based guides that support both headphone output and speakers, stereo files may be appropriate for specific immersive stops.
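
Part of the format check can be automated. This Python sketch uses only the standard-library `wave` module to compare a delivered WAV file against an assumed target of mono, 44.1 kHz, 16-bit PCM. The 16-bit depth is an assumption not stated above, and MP3 bitrate checks need a third-party tool, so they are out of scope here. Substitute the figures from the hardware supplier's specification.

```python
import io
import wave
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSpec:
    """Assumed target spec; replace with the hardware supplier's figures."""
    channels: int = 1            # mono
    sample_rate_hz: int = 44100  # 44.1 kHz
    sample_width_bytes: int = 2  # 16-bit PCM (assumption)


def check_wav(source, spec: AudioSpec = AudioSpec()) -> list[str]:
    """Return mismatches between a WAV file (path or file object) and the spec; empty means OK."""
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != spec.channels:
            problems.append(f"channels: {wav.getnchannels()} (expected {spec.channels})")
        if wav.getframerate() != spec.sample_rate_hz:
            problems.append(f"sample rate: {wav.getframerate()} Hz (expected {spec.sample_rate_hz})")
        if wav.getsampwidth() != spec.sample_width_bytes:
            problems.append(f"sample width: {wav.getsampwidth()} bytes (expected {spec.sample_width_bytes})")
    return problems


def silent_wav(channels: int, rate: int, seconds: float = 0.1) -> io.BytesIO:
    """Build a short silent 16-bit WAV in memory, for trying out the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(check_wav(silent_wav(1, 44100)))
    print(check_wav(silent_wav(2, 48000)))
```

In practice `check_wav` would be run over each file in the upload folder by path, with any non-empty result sent back to post-production.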

File naming and folder structure

Audio files should be named consistently before upload: stop number, language code, tour variant, and version number. A naming convention such as 04_EN_standard_v2.mp3 makes it possible to confirm at a glance which stops have been delivered in which languages and whether the correct version is in the upload folder. Files with inconsistent names or with spaces in the filename cause preventable CMS import errors. Establish the naming convention at the start of production, not at the upload stage.
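
A naming convention is easiest to enforce when a script checks it before upload. The Python sketch below encodes one reading of the `04_EN_standard_v2.mp3` example: a zero-padded stop number, a two-letter upper-case language code, a lower-case variant name, a `v` version number and an `.mp3` or `.wav` extension. That pattern is an assumption, and codes such as `ZH-Hans` would need a wider one. The sketch rejects names that break the pattern, keeps the highest version per stop, language and variant, and lists missing stop and language combinations.

```python
import re
from typing import NamedTuple, Optional

# Assumed convention: <stop>_<LANG>_<variant>_v<version>.<mp3|wav>
NAME_PATTERN = re.compile(
    r"(?P<stop>\d{2,3})_(?P<lang>[A-Z]{2})_(?P<variant>[a-z][a-z0-9-]*)"
    r"_v(?P<version>\d+)\.(?P<ext>mp3|wav)"
)


class AudioFile(NamedTuple):
    stop: int
    lang: str
    variant: str
    version: int


def parse_name(filename: str) -> Optional[AudioFile]:
    """Return the parsed parts, or None if the name breaks the convention."""
    m = NAME_PATTERN.fullmatch(filename)
    if m is None:
        return None
    return AudioFile(int(m["stop"]), m["lang"], m["variant"], int(m["version"]))


def latest_versions(filenames):
    """Map (stop, lang, variant) to its highest version; collect rejected names."""
    latest = {}
    rejected = []
    for name in filenames:
        parsed = parse_name(name)
        if parsed is None:
            rejected.append(name)
            continue
        key = (parsed.stop, parsed.lang, parsed.variant)
        latest[key] = max(latest.get(key, 0), parsed.version)
    return latest, rejected


def missing(latest, stops, langs, variant="standard"):
    """List (stop, lang) pairs with no file for the given tour variant."""
    return sorted((s, lang) for s in stops for lang in langs
                  if (s, lang, variant) not in latest)


if __name__ == "__main__":
    upload = ["04_EN_standard_v1.mp3", "04_EN_standard_v2.mp3",
              "04_FR_standard_v1.mp3", "05 EN standard v1.mp3"]
    latest, rejected = latest_versions(upload)
    print("Rejected names:", rejected)
    print("Missing:", missing(latest, stops=[4, 5], langs=["EN", "FR"]))
```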

Quality review before CMS upload

Quality review for museum audio guide content should happen in the gallery, not in a meeting room or post-production studio. Play each stop through the actual device and headset in the actual exhibit space, with the room at representative occupancy. Check that the narration is intelligible over ambient noise, that the stop duration fits the exhibit dwell time, that the cue-out lands at the right moment, and that the pronunciation of artist names, locations and technical terms has been confirmed by the curator. A studio playback that sounds clean can still fail in a stone-floored gallery with school groups passing through.

CMS upload and device sync

Once audio files pass quality review, they move into the CMS. In a well-configured workflow, the museum uploads approved files, assigns them to stop numbers and tour variants, and the system pushes updates to devices the next time they are docked to a charging rack. In Look2Innovate deployments, Look2Guide CMS and Smart Charger handle this automatically: devices sync content when they return to the dock at the end of the day or during a mid-day charging window, so no manual update of individual devices is required.

Confirming that the correct content is on devices

After a content update, the museum should confirm that the new version has reached the fleet before the next visitor session. The CMS should show sync status per device, or per device group, so operations staff can see at a glance whether any device missed the update and needs to return to a charger before handout. Spot-checking two or three devices by playing the updated stop takes only minutes and is safer than assuming the sync completed correctly.
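
The verification step can be expressed as a small report. The sketch below assumes the CMS can export each device's last reported content version as a simple ID-to-version mapping. That export, the device IDs and the integer version numbers are hypothetical, not a Look2Guide CMS API. The report lists devices that have not reached the expected version and picks a few synced devices at random for the manual playback check.

```python
import random
from typing import Optional


def sync_report(device_versions: dict[str, int], expected_version: int,
                spot_checks: int = 3,
                seed: Optional[int] = None) -> tuple[list[str], list[str]]:
    """Return (devices to send back to a charger, synced devices to spot-check by playback).

    device_versions maps a device ID to the content version it last reported.
    """
    lagging = sorted(d for d, v in device_versions.items() if v != expected_version)
    synced = sorted(d for d, v in device_versions.items() if v == expected_version)
    rng = random.Random(seed)
    to_play = rng.sample(synced, min(spot_checks, len(synced)))
    return lagging, to_play


if __name__ == "__main__":
    fleet = {"G01": 7, "G02": 6, "G03": 7, "G04": 7, "G05": 5}
    lagging, to_play = sync_report(fleet, expected_version=7)
    print("Return to charger:", lagging)
    print("Play updated stop on:", to_play)
```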

Updating one stop without replacing the whole tour

Partial updates reduce the amount of content staff have to reopen when one stop changes. A museum should be able to change the script for a single stop, record or generate new audio, upload the file, and push the update to the fleet without replacing or re-approving content for unaffected stops. This makes it practical to correct a factual error quickly, add a temporary stop for a visiting work, or remove a stop when an exhibit is moved to storage. If the CMS requires a full tour republish every time a single file changes, routine corrections become slower.
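
One way to scope a partial update is to compare a manifest of what is currently published with a manifest of the proposed content, and push only the differences. This is a hedged Python sketch of that idea, not a description of how any particular CMS works internally. The manifest format, with keys such as `04_EN_standard` mapped to a SHA-256 hash of the audio file, is an assumption for illustration.

```python
import hashlib


def content_hash(audio_bytes: bytes) -> str:
    """Fingerprint an audio file so unchanged stops can be recognised."""
    return hashlib.sha256(audio_bytes).hexdigest()


def diff_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two manifests keyed by stop/language/variant, valued by content hash.

    Only the keys listed here need to be uploaded, removed or re-approved;
    every other stop is left as it is.
    """
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


if __name__ == "__main__":
    published = {
        "04_EN_standard": content_hash(b"stop 4 take 1"),
        "05_EN_standard": content_hash(b"stop 5 take 1"),
        "06_EN_standard": content_hash(b"stop 6 take 1"),
    }
    draft = dict(published)
    draft["04_EN_standard"] = content_hash(b"stop 4 corrected")  # factual correction
    del draft["06_EN_standard"]                                  # exhibit moved to storage
    draft["07_EN_standard"] = content_hash(b"visiting loan")     # temporary stop
    print(diff_manifests(published, draft))
```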

Updating content after launch

Audio guide content does not stay current on its own. New scholarship, rotating loans, signage changes, new accessible-route requirements, and visitor feedback all create valid reasons to update stops after launch. The museum should assign a named content owner before launch, not after. That person is responsible for reviewing stops on a defined cycle, identifying outdated content, commissioning updates, and managing the approval and sync workflow.

Typical triggers for post-launch content updates.
Trigger	Typical urgency	What to update
Factual error reported by a curator or visitor	Immediate	Affected stop in all languages where the error appears
Loan return: exhibit removed from the route	Before next visitor session	Remove or suppress the stop; update any adjacent wayfinding cues
New loan or acquisition added to the route	Before the exhibit opens	New stop in all active languages; audio description if the accessible track is maintained
New language version commissioned	By agreed launch date	Full tour in the new language; verify stop numbers match the existing fleet
Visitor analytics showing low completion on a specific stop	Within the season	Review the script for the affected stop; consider shortening or re-recording
Accessibility update: new audio description variant	Coordinated with accessibility review cycle	Audio description stops in all languages where the accessible tour is offered

For large permanent collections, a rolling content review cycle of twelve to eighteen months is practical: review a portion of the stops each quarter rather than scheduling a full tour re-record at irregular intervals. Visitor Statistics can show which stops have low listen-through rates, which languages are most used, and whether content updates reached the fleet, so the review cycle is data-driven rather than based on assumptions about which stops visitors find most useful.
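
A rolling cycle is easy to lay out mechanically. The Python sketch below divides a collection's stops across the quarters of a 12- or 18-month cycle and, as an assumed prioritisation, schedules the stops with the lowest listen-through rate first. The rates would come from an analytics export; the stop IDs and values in the example are invented for illustration.

```python
def rolling_schedule(listen_through: dict[str, float], cycle_months: int = 12) -> list[list[str]]:
    """Spread every stop across the quarters of one review cycle.

    Stops with the lowest listen-through rate are reviewed first (an assumed
    prioritisation); quarter sizes differ by at most one stop.
    """
    if cycle_months <= 0 or cycle_months % 3:
        raise ValueError("cycle_months must be a positive multiple of 3")
    quarters = cycle_months // 3
    ordered = sorted(listen_through, key=lambda stop: (listen_through[stop], stop))
    base, extra = divmod(len(ordered), quarters)
    schedule, start = [], 0
    for q in range(quarters):
        size = base + (1 if q < extra else 0)
        schedule.append(ordered[start:start + size])
        start += size
    return schedule


if __name__ == "__main__":
    rates = {"01": 0.9, "02": 0.4, "03": 0.7, "04": 0.2, "05": 0.8}
    for quarter, stops in enumerate(rolling_schedule(rates, cycle_months=12), start=1):
        print(f"Q{quarter}: {stops}")
```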

FAQ

How long does audio guide content production take?

It depends on the number of stops, languages, review rounds and recording method. In Look2Innovate proposal planning, a single-language tour is treated as a multi-week production once recording, editing, quality review and device sync are included. A multilingual tour needs more time if translation and recording are handled one language after another. Starting translation from approved text, recording the primary language in parallel, and using AI narration for tail languages can shorten the schedule.

How much does museum audio guide voiceover recording cost?

Cost depends on narrator experience, studio setup, direction time, language count, editing, quality review and any music or ambience rights. Remote recording is usually cheaper to schedule than an attended studio session, but it still requires a professional acoustic setup and review. AI narration can reduce the recording cost for tail languages and small updates, at the cost of some naturalness in pace and emphasis. Ask vendors to separate voice talent, studio, editing, translation, review and rights costs in the quote.

Can AI replace professional voiceover for museum audio guides?

AI narration is practical for specific use cases: large multilingual fleets where cost makes human recording for every language prohibitive, rapid single-stop updates, interactive branching content, and simplified or children's tour variants. For the primary language version of a flagship permanent collection tour, professional human narration remains the stronger choice because tone, natural emphasis and pause placement still distinguish a high-quality production at high-traffic stops. The most practical approach is human narration for primary visitor languages combined with AI narration for additional language versions.

What audio format should museum audio guides use?

Most museum audio guide hardware requires mono MP3 or WAV files. A typical specification is 44.1 kHz sample rate, 128 kbps MP3, mono. Stereo gives no meaningful quality gain through a single earpiece, and uncompressed stereo WAV files take twice the storage of mono. Confirm the exact format requirement with the hardware supplier before post-production begins, and establish a consistent file-naming convention that includes stop number, language code, tour variant and version number before the first upload.

How do you produce audio description for blind museum visitors?

Audio description for blind and low-vision museum visitors is a separate script written alongside the standard tour, using the same stop numbers. The description should explain what the sighted visitor sees: composition, scale, materials, colour, spatial relationships and any text visible on or near the object. It is recorded as a separate audio file assigned to the audio description tour variant in the CMS. The museum should use the same narrator as the standard tour where possible, or brief a second narrator to match pace and register. For practical guidance on writing and producing audio description, see the full accessible audio guide article.

How often should museum audio guide content be updated?

Museum audio guide content should be reviewed on a defined cycle rather than only when problems are reported. A rolling review of twelve to eighteen months works for large permanent collections: update a portion of stops each quarter rather than scheduling full re-records at irregular intervals. Content should be updated immediately when a factual error is confirmed, when an exhibit leaves or enters the route, or when visitor analytics show a consistently low listen-through rate on a specific stop.

Who owns the rights to a translated museum audio guide script?

Rights ownership depends on the contracts in place. The museum should confirm before production starts who holds the rights to the source script, whether those rights extend to translation and adaptation, and whether the translator's work belongs to the museum or the production supplier. When the original script was written by a freelance curator or an external content agency, confirm in writing that the translation rights are cleared before booking recording sessions. Rights disputes that surface after recording are expensive to resolve and can delay a launch.