Museum audio guide content production: voiceover, translation and AI
How museums take an audio guide script from approved draft to device-ready audio: recording briefs, voice casting, multilingual translation, AI-generated narration, format requirements and CMS sync.
A finished script is the start of production, not the end. The stop a visitor hears has been recorded by a narrator, reviewed against the exhibit, translated into each active language, mixed if music or ambience is used, exported in the correct format, and synced to the device fleet through the CMS. Each stage can become a bottleneck. In Look2Innovate proposal planning, multilingual tours are treated as multi-stage productions: translation, recording and quality review need clear owners and approval gates before the first recording session is booked.
This guide is for museum content editors, exhibition producers, operations managers and procurement teams managing audio guide production for the first time or reviewing an existing workflow. It covers the full pipeline from approved script to device-ready audio, including voiceover recording briefs, voice casting, translation, AI-generated narration, format requirements and the CMS sync stages that load content onto devices. For the earlier stage of writing the script itself, see museum audio guide script best practices.
The production pipeline from script to device
Audio guide content production has seven stages. Not every project uses all seven, but every project should account for each one before the first recording session is booked.
| Stage | What happens | Main decision | Planning note |
|---|---|---|---|
| Script approval | Final review by curator, legal, and content lead | Which stops are locked; which can be revised post-recording | Allow a formal sign-off window before recording begins |
| Voice casting | Selecting narrator style, gender, tone and language match | Professional voice talent versus AI narration per language | Build in time for samples, review and replacement choices |
| Recording | Studio session or remote recording with a direction brief | One session or multiple passes; who attends for each language | Schedule around narrator availability and review attendance |
| Translation | Source-language master translated into each active language | Professional translator versus machine translation with review | Start from approved text and check spoken duration before recording |
| Audio editing and mixing | Cleaning, levelling, adding music beds or ambience if specified | Whether music rights and stems are cleared before recording | Do not begin final mix until rights and stems are ready |
| Quality review | Gallery playback test; curator sign-off; format check | Who has authority to approve each language version | Reserve review time on the actual device and route |
| CMS upload and device sync | Files loaded to the CMS and pushed to devices via chargers | Whether sync is automatic through charger docks or manual | Confirm sync status before the next visitor handout |
The most common cause of delay is a late script change after recording has started. Re-recording even a single stop in one language costs a studio booking, a voice talent session fee, and several days of scheduling. A formal final-script sign-off gate before the first recording session avoids most of that cost. The second most common cause is translation batching: waiting until all source-language audio is approved before starting translation. Translation can begin from the approved script text while source recording is still in progress.
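The sign-off gate and the parallel translation start can be expressed as two simple checks. This is a minimal Python sketch under stated assumptions: the `Production` class and its method names are hypothetical, and it assumes the team tracks which stops have a locked script and which have approved source audio.

```python
from dataclasses import dataclass, field


@dataclass
class Production:
    """Hypothetical production tracker holding stop numbers by approval state."""
    locked: set = field(default_factory=set)    # script signed off by curator, legal, content lead
    recorded: set = field(default_factory=set)  # source-language audio approved

    def can_book_session(self, session_stops):
        """Return (ok, blocked): a session is bookable only if every stop is locked."""
        blocked = sorted(set(session_stops) - self.locked)
        return (not blocked, blocked)

    def translation_queue(self):
        """Stops a translator can start now.

        Deliberately ignores self.recorded: translation works from the
        approved text and does not wait for approved source audio.
        """
        return sorted(self.locked)
```

The design choice worth noting is that `translation_queue` never looks at recording status, which is what removes the batching delay.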
Voiceover recording: casting, briefing and studio direction
Casting the right narrator for each language
Voice casting should match the tone of the collection, not simply the language. A contemporary art museum and a natural history museum need different narrators even in the same language. The brief should specify pace, sentence emphasis style, the required stop durations, and whether the narrator should read the audio description variant in a different register from the standard tour.
For multilingual fleets, the temptation is to use the same voice agency for all languages and let the agency source each narrator. That works for common European languages. For languages with fewer professional museum narrators available, plan extra lead time. A narrator who is wrong for the collection tone will require re-recording, which is more expensive than a longer casting process.
The recording brief
A recording brief for museum audio guide narration should cover: target stop duration, words per minute, pause positions (marked in the script), how proper nouns and artist names are pronounced, which words carry the principal emphasis in each sentence, whether sound effects or music beds will be mixed under the narration (so the narrator does not fill pauses they should leave empty), and who will approve the first take before the full session continues. Sending the brief in advance of the session, not at the start of it, is the single cheapest improvement most productions can make.
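The timing fields in the brief are linked by simple arithmetic: spoken duration is roughly word count divided by words per minute, plus any marked pauses. The sketch below assumes a constant reading pace, which real narration only approximates; the function names and the example figures (140 words per minute, 6 seconds of pauses) are illustrative, not a recommended standard.

```python
def estimated_duration_s(word_count, wpm, pause_s=0.0):
    """Approximate spoken duration in seconds at a constant reading pace."""
    return word_count / wpm * 60 + pause_s


def word_budget(target_s, wpm, pause_s=0.0):
    """Largest whole number of words that fits the target duration after pauses."""
    speaking_s = max(0.0, target_s - pause_s)
    return int(speaking_s * wpm / 60)
```

With those illustrative figures, `word_budget(90, 140, pause_s=6)` gives 196 words for a 90-second stop, a number that can go straight into the brief.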
Remote recording versus studio sessions
Remote recording is now standard practice for many language versions. A narrator records at a home or local studio, files are delivered as WAV, and the production team edits and mixes centrally. Remote recording is faster to schedule and removes travel costs, but requires the narrator to have a professional acoustic setup. For the primary language version, where a content director is likely to attend and give real-time direction, an in-studio session usually produces a more consistent result on the first pass. For additional languages where the script is fixed and the direction brief is detailed, remote delivery is often the practical choice.
Translation and multilingual content production
Translation for museum audio guides is a specialist discipline. A general translator who is not experienced with spoken audio will produce text that reads well but records awkwardly: sentences that expand past the target stop duration in the target language, clauses that land the emphasis on the wrong word when read aloud, or cultural references that require an explanation the stop does not have room for. Museum audio guide translation briefs should specify target audio duration, not only target word count, and should ask translators to flag any stops where the source script cannot be translated at pace without losing meaning.
| Source language | Target language | Planning risk | Planning adjustment |
|---|---|---|---|
| English | French | Often longer when read aloud | Write the source toward the lower end of its word-count range before translating |
| English | German | Can run longer than the source | Review stop durations in German before recording |
| English | Spanish | May expand past the planned stop duration | Check the longest stops; expansion is usually manageable within the target duration |
| English | Japanese | Duration may stay close while formatting changes | Duration is usually within tolerance; check character-based text formatting |
| English | Arabic | May expand and requires interface review | Check stop durations; right-to-left interface labels need separate review |
| English | Chinese (Simplified) | May be shorter than the source | Check that the meaning is complete rather than condensed |
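Because translation briefs are written in target durations, a per-language duration check is a useful gate before narrators are booked. A minimal sketch, assuming durations are already available per stop and language (estimated from the translated text or timed from a read-through) and assuming a ±10% tolerance, which is an illustrative value: it flags stops that run over and stops that come in short enough to warrant a completeness check.

```python
def review_durations(targets, durations, tolerance=0.10):
    """Flag (stop, language) pairs outside target +/- tolerance.

    targets:   {stop: target_seconds}
    durations: {(stop, language): seconds}, estimated or timed
    Returns a sorted list of (stop, language, "over" | "short").
    """
    issues = []
    for (stop, lang), seconds in sorted(durations.items()):
        target = targets[stop]
        if seconds > target * (1 + tolerance):
            issues.append((stop, lang, "over"))    # re-edit the translation or the source
        elif seconds < target * (1 - tolerance):
            issues.append((stop, lang, "short"))   # check that the meaning is complete
    return issues
```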
Machine translation with human review
Machine translation has improved substantially for common language pairs and is a reasonable starting point for translation drafts. It is not a final product for museum audio guides. Machine translation misses idiomatic phrasing, cultural context, and the difference between a sentence that scans on paper and one that lands cleanly when spoken. The appropriate workflow is a machine-translated draft, reviewed and edited by a native-speaking translator with audio guide experience, then read aloud by the reviewer before passing to the narrator. This combines machine speed on the first pass with human judgement on the result.
Translation and intellectual-property rights
Translated tour scripts are derivative works of the source scripts. The museum should confirm before production starts who holds the rights to the source script, whether those rights extend to translation and adaptation, and whether the translator's work belongs to the museum or the supplier. This is particularly relevant when the original script was written by a freelance curator or an external content agency. Rights disputes that surface after recording are expensive to resolve.
AI-generated narration: where it fits and where it does not
AI-generated voice narration is now practical for some museum audio guide workflows. It can reduce cost and lead time for language variants, single-stop updates and branching content. It is weaker where the tour depends on performance, emotional timing or a recognisable narrator. The procurement decision should name the use case before choosing AI, human recording, or a mix of both.
| Use case | AI narration | Human narration | Reason |
|---|---|---|---|
| Large multilingual fleet with tail languages | Suitable | Recommended for primary languages | AI covers tail languages affordably; human covers the main visitor languages with higher quality |
| Primary language, flagship permanent collection | Not recommended as sole method | Recommended | Tone, pause placement, and natural emphasis still distinguish professional narration at the highest visitor-volume stops |
| Rapid content update: single stop change | Suitable | Slower and costlier for a single stop | AI narration can regenerate one stop the same day a script change is approved |
| Audio description for blind visitors | Suitable with review | Recommended for final version | Audio description scripts are highly specific; a human reviewer should confirm the pacing and descriptive accuracy before publication |
| Simplified-language or children's tour | Suitable with review | Optional | AI can match a slower, clearer register; the museum should confirm that the tone is appropriate for the intended audience |
| Interactive or question-based tours | Suitable | Optional | AI can make many response variants easier to produce; human narration may still be chosen for important scripted paths |
In Look2Innovate deployments, AI Content Studio supports script drafting, narration generation and iteration across multiple languages and tour variants. The museum's content team reviews and approves before publication. AI generation does not remove the approval step; it reduces the time and cost between first draft and a reviewable recording.
Format, export and quality checks before upload
Audio format requirements
Most museum audio guide hardware requires mono MP3 or WAV files. Stereo adds no meaningful benefit through a single earpiece: an uncompressed stereo WAV file takes twice the storage of the same audio in mono, and a stereo MP3 at a given bitrate spends that bitrate on two channels instead of one. A typical specification is 44.1 kHz sample rate, 128 kbps MP3, mono. Check the device specification before post-production begins: mixing to stereo and then folding down to mono can introduce phase cancellation artefacts if any spatial processing was applied to the tracks. For tablet-based guides that support both headphone output and speakers, stereo files may be appropriate for specific immersive stops.
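Delivered files can be checked against the device specification before upload. The sketch below uses Python's standard-library `wave` module, which reads WAV headers only (MP3 files would need a separate tool). The expected values are the typical figures above, not any particular device's requirement, and the size helpers assume 16-bit PCM for WAV and constant-bitrate MP3.

```python
import wave

# Assumed target spec, taken from the typical figures; confirm against the device sheet.
EXPECTED_CHANNELS = 1
EXPECTED_RATE_HZ = 44_100


def spec_problems(channels, rate_hz):
    """Return human-readable mismatches against the expected spec."""
    problems = []
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    if rate_hz != EXPECTED_RATE_HZ:
        problems.append(f"{rate_hz} Hz, expected {EXPECTED_RATE_HZ} Hz")
    return problems


def check_wav(path):
    """Read a WAV header with the stdlib wave module and report mismatches."""
    with wave.open(path, "rb") as w:
        return spec_problems(w.getnchannels(), w.getframerate())


def wav_bytes_per_minute(channels, rate_hz=44_100, bytes_per_sample=2):
    """Uncompressed PCM size per minute: doubles from mono to stereo."""
    return rate_hz * bytes_per_sample * channels * 60


def mp3_bytes_per_minute(bitrate_kbps):
    """Constant-bitrate MP3 size per minute depends on bitrate, not channel count."""
    return bitrate_kbps * 1000 // 8 * 60
```

At these settings a mono WAV minute is about 5.3 MB, a stereo one twice that, and a 128 kbps MP3 minute is 0.96 MB whatever the channel count.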
File naming and folder structure
Audio files should be named consistently before upload: stop number, language code, tour variant, and version number. A naming convention such as 04_EN_standard_v2.mp3 makes it possible to confirm at a glance which stops have been delivered in which languages and whether the correct version is in the upload folder. Files with inconsistent names or with spaces in the filename cause preventable CMS import errors. Establish the naming convention at the start of production, not at the upload stage.
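A naming convention is easiest to enforce with a pattern check when files are delivered. The regular expression below is an assumption derived from the single example `04_EN_standard_v2.mp3` (two- or three-digit stop number, two-letter uppercase language code, lowercase variant name, version number); adjust it to the convention the production actually adopts. The audit also reports which stop and language pairs have no file yet and the highest version delivered for each.

```python
import re

# Assumed convention, inferred from the example 04_EN_standard_v2.mp3:
# stop number _ language code _ tour variant _ v + version . extension
NAME_RE = re.compile(r"(\d{2,3})_([A-Z]{2})_([a-z0-9-]+)_v(\d+)\.(mp3|wav)")


def parse_name(filename):
    """Return (stop, language, variant, version), or None if the name does not conform."""
    m = NAME_RE.fullmatch(filename)
    if m is None:
        return None
    stop, lang, variant, version, _ext = m.groups()
    return int(stop), lang, variant, int(version)


def audit_upload_folder(filenames, stops, languages, variant="standard"):
    """Return (bad_names, missing_pairs, latest_versions) for one tour variant."""
    bad, latest = [], {}
    for name in filenames:
        parsed = parse_name(name)
        if parsed is None:
            bad.append(name)  # spaces, wrong order, wrong case, unknown extension
            continue
        stop, lang, var, version = parsed
        if var == variant:
            key = (stop, lang)
            latest[key] = max(version, latest.get(key, 0))
    missing = [(s, lang) for s in stops for lang in languages if (s, lang) not in latest]
    return bad, missing, latest
```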
Quality review before CMS upload
Quality review for museum audio guide content should happen in the gallery, not in a meeting room or post-production studio. Play each stop through the actual device and headset in the actual exhibit space, with the room at representative occupancy. Check that the narration is intelligible over ambient noise, that the stop duration fits the exhibit dwell time, that the cue-out lands at the right moment, and that any pronunciation of artist names, locations or technical terms has been confirmed by the curator. A studio playback that sounds clean can still fail in a stone-floored gallery with school groups passing through.
CMS upload and device sync
Once audio files pass quality review, they move into the CMS. In a well-configured workflow, the museum uploads approved files, assigns them to stop numbers and tour variants, and the system pushes updates to devices the next time they are docked to a charging rack. In Look2Innovate deployments, Look2Guide CMS and Smart Charger handle this automatically: devices sync content when they return to the dock at the end of the day or during a mid-day charging window, so no manual update of individual devices is required.
Confirming that the correct content is on devices
After a content update, the museum should confirm that the new version has reached the fleet before the next visitor session. The CMS should show sync status per device, or per device group, so operations staff can see at a glance whether any device missed the update and needs to return to a charger before handout. Spot-checking two or three devices by playing the updated stop takes a few minutes and is more reliable than assuming the sync completed correctly.
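This confirmation step amounts to comparing each device's reported content version with the published one, then choosing a few up-to-date devices for a manual listen. A minimal sketch, assuming the CMS can export a per-device content version; the data shape and device IDs are hypothetical and do not describe a Look2Guide CMS interface.

```python
import random


def sync_report(fleet, published_version, spot_checks=3, seed=None):
    """Split a fleet by sync status and pick devices for a manual listen.

    fleet: {device_id: content version reported at last sync, or None}
    Returns (stale, sample): devices to re-dock before handout, and up to
    `spot_checks` up-to-date devices on which to play the updated stop.
    """
    stale = sorted(d for d, v in fleet.items() if v != published_version)
    current = sorted(d for d, v in fleet.items() if v == published_version)
    rng = random.Random(seed)
    sample = rng.sample(current, min(spot_checks, len(current)))
    return stale, sample
```

Sampling at random, rather than always checking the same devices, makes it more likely that a dock-specific sync fault is noticed over time.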
Updating one stop without replacing the whole tour
Partial updates reduce the amount of content staff have to reopen when one stop changes. A museum should be able to change the script for a single stop, record or generate new audio, upload the file, and push the update to the fleet without replacing or re-approving content for unaffected stops. This makes it practical to correct a factual error quickly, add a temporary stop for a visiting work, or remove a stop when an exhibit is moved to storage. If the CMS requires a full tour republish every time a single file changes, routine corrections become slower.
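One common way to make partial updates safe is to compare content manifests, so that only stops whose audio actually changed are pushed and re-approved. This sketch assumes a manifest keyed by stop, language and variant, holding a SHA-256 digest of each approved file; it illustrates the technique and is not a description of how any particular CMS works.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """SHA-256 of a file's bytes; byte-identical files give identical digests."""
    return hashlib.sha256(data).hexdigest()


def diff_manifests(old, new):
    """Compare {stop_key: digest} manifests from two publishes.

    Returns (added, changed, removed) as sorted key lists; only these
    stops need new approval and a push to the fleet.
    """
    added = sorted(k for k in new if k not in old)
    removed = sorted(k for k in old if k not in new)
    changed = sorted(k for k in new if k in old and new[k] != old[k])
    return added, changed, removed
```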
Updating content after launch
Audio guide content does not stay current on its own. New scholarship, rotating loans, signage changes, new accessible-route requirements, and visitor feedback all create valid reasons to update stops after launch. The museum should assign a named content owner before launch, not after. That person is responsible for reviewing stops on a defined cycle, identifying outdated content, commissioning updates, and managing the approval and sync workflow.
| Trigger | Typical urgency | What to update |
|---|---|---|
| Factual error reported by a curator or visitor | Immediate | Affected stop in all languages where the error appears |
| Loan return: exhibit removed from the route | Before next visitor session | Remove or suppress the stop; update any adjacent wayfinding cues |
| New loan or acquisition added to the route | Before the exhibit opens | New stop in all active languages; audio description if the accessible track is maintained |
| New language version commissioned | By agreed launch date | Full tour in the new language; verify stop numbers match the existing fleet |
| Visitor analytics showing low completion on a specific stop | Within the season | Review the script for the affected stop; consider shortening or re-recording |
| Accessibility update: new audio description variant | Coordinated with accessibility review cycle | Audio description stops in all languages where the accessible tour is offered |
For large permanent collections, a rolling content review cycle of twelve to eighteen months is practical: review a portion of the stops each quarter rather than scheduling a full tour re-record at irregular intervals. Visitor Statistics can show which stops have low listen-through rates, which languages are most used, and whether content updates reached the fleet, so the review cycle is data-driven rather than based on assumptions about which stops visitors find most useful.
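A rolling cycle can be planned by splitting stops into quarterly batches. The sketch below assumes a per-stop listen-through rate is available (for example, exported from visitor analytics) and makes one design choice that is not prescribed above: the lowest-rated stops are reviewed first. Batch sizes differ by at most one stop.

```python
import math


def rolling_schedule(listen_through, cycle_months=12):
    """Split stops into quarterly review batches, lowest listen-through first.

    listen_through: {stop: listen-through rate between 0 and 1}
    Returns one list of stop numbers per quarter in the cycle.
    """
    quarters = math.ceil(cycle_months / 3)
    ordered = sorted(listen_through, key=lambda s: (listen_through[s], s))
    base, extra = divmod(len(ordered), quarters)
    batches, start = [], 0
    for q in range(quarters):
        size = base + (1 if q < extra else 0)  # spread the remainder over early quarters
        batches.append(ordered[start:start + size])
        start += size
    return batches
```

With eight stops and a twelve-month cycle this gives four batches of two; an eighteen-month cycle spreads the same stops over six quarters.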
FAQ
How long does audio guide content production take?
It depends on the number of stops, languages, review rounds and recording method. In Look2Innovate proposal planning, a single-language tour is treated as a multi-week production once recording, editing, quality review and device sync are included. A multilingual tour needs more time if translation and recording are handled one language after another. Starting translation from approved text, recording the primary language in parallel, and using AI narration for tail languages can shorten the schedule.
How much does museum audio guide voiceover recording cost?
Cost depends on narrator experience, studio setup, direction time, language count, editing, quality review and any music or ambience rights. Remote recording is usually cheaper to schedule than an attended studio session, but it still requires a professional acoustic setup and review. AI narration can reduce the recording cost for tail languages and small updates, at the cost of some naturalness in pace and emphasis. Ask vendors to separate voice talent, studio, editing, translation, review and rights costs in the quote.
Can AI replace professional voiceover for museum audio guides?
AI narration is practical for specific use cases: large multilingual fleets where cost makes human recording for every language prohibitive, rapid single-stop updates, interactive branching content, and simplified or children's tour variants. For the primary language version of a flagship permanent collection tour, professional human narration remains the stronger choice because tone, natural emphasis and pause placement still distinguish a high-quality production at high-traffic stops. The most practical approach is human narration for primary visitor languages combined with AI narration for additional language versions.
What audio format should museum audio guides use?
Most museum audio guide hardware requires mono MP3 or WAV files. A typical specification is 44.1 kHz sample rate, 128 kbps MP3, mono. Stereo adds no meaningful quality gain through a single earpiece, and an uncompressed stereo WAV file takes twice the storage of the same audio in mono. Confirm the exact format requirement with the hardware supplier before post-production begins, and establish a consistent file-naming convention that includes stop number, language code, tour variant and version number before the first upload.
How do you produce audio description for blind museum visitors?
Audio description for blind and low-vision museum visitors is a separate script written alongside the standard tour, using the same stop numbers. The description should explain what the sighted visitor sees: composition, scale, materials, colour, spatial relationships and any text visible on or near the object. It is recorded as a separate audio file assigned to the audio description tour variant in the CMS. The museum should use the same narrator as the standard tour where possible, or brief a second narrator to match pace and register. For practical guidance on writing and producing audio description, see the full accessible audio guide article.
How often should museum audio guide content be updated?
Museum audio guide content should be reviewed on a defined cycle rather than only when problems are reported. A rolling review of twelve to eighteen months works for large permanent collections: update a portion of stops each quarter rather than scheduling full re-records at irregular intervals. Content should be updated immediately when a factual error is confirmed, when an exhibit leaves or enters the route, or when visitor analytics show a consistently low listen-through rate on a specific stop.
Who owns the rights to a translated museum audio guide script?
Rights ownership depends on the contracts in place. The museum should confirm before production starts who holds the rights to the source script, whether those rights extend to translation and adaptation, and whether the translator's work belongs to the museum or the production supplier. When the original script was written by a freelance curator or an external content agency, confirm in writing that the translation rights are cleared before booking recording sessions. Rights disputes that surface after recording are expensive to resolve and can delay a launch.



