Hearing

Audio can make interfaces richer and more intuitive, but it should never be the only channel. Over 1.5 billion people worldwide live with some degree of hearing loss. Many more are in noisy environments where audio is inaudible, or quiet ones where sound would be disruptive. Designing for hearing means making audio helpful when available while ensuring nothing depends on it.

Sound in UX is shifting from standalone notifications to fully integrated design systems. As technology moves beyond screens into AR, VR, and wearable interfaces, spatial audio and 3D sound will play increasingly important roles. But the foundational principle remains: audio must complement, never replace, other channels.

Never make audio the only way to convey information. Every sound should have a visual (or haptic) equivalent. Every video with speech needs captions. Every podcast needs a transcript.

This isn’t just about accessibility compliance—it’s about designing for the reality of how people use technology across diverse contexts.

Hearing ability exists on a spectrum:

Deaf: Complete or near-complete hearing loss

  • About 1 million people in the U.S. are functionally deaf
  • Many use sign language (about 500,000 in the U.S.)
  • Visual-first communication is essential

Hard of hearing: Partial hearing loss

  • Nearly 30 million Americans need hearing aids
  • May hear some sounds but miss others
  • Benefit from amplification and captioning

Tinnitus: Ringing or buzzing in ears

  • Affects 50 million Americans
  • Certain frequencies may be painful or masked
  • Need volume control and frequency considerations

Age-related hearing loss (presbycusis):

  • Affects most people to some degree as they age
  • Typically high frequencies lost first
  • About 15% of people aged 6-19 already show measurable hearing loss, so decline begins well before old age

Temporary or situational hearing limitations:

  • Ear infections, congestion
  • Noise-induced temporary threshold shift
  • Using devices in noisy environments

People with hearing disabilities have diverse communication preferences:

Captions and transcripts: Most commonly preferred

  • Works for people who became deaf later in life
  • Doesn’t require learning sign language
  • Understandable by anyone who can read the language

Sign language: Used by a small minority of people with hearing loss (about 500,000 ASL users in the U.S.)

  • Primary language for culturally Deaf community
  • Different sign languages exist (ASL, BSL, etc.)
  • Video sign interpretation for critical content

Real-Time Text (RTT): Messages appear as typed

  • Eliminates delays waiting for “send”
  • Crucial for emergency communication
  • Live conversation support
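
As a rough illustration, RTT behavior can be approximated in a web client by transmitting the text field's contents on every change rather than waiting for a send action. The WebSocket endpoint and message shape below are hypothetical:

```ts
// Sketch of real-time text: each keystroke is transmitted immediately,
// so the recipient sees the message as it is typed, not after "send".
const socket = new WebSocket("wss://example.com/rtt"); // hypothetical endpoint

const input = document.querySelector<HTMLInputElement>("#rtt-input")!;
input.addEventListener("input", () => {
  // Send the full current text on every change; the server relays it
  // to the other party so the conversation stays live.
  socket.send(JSON.stringify({ type: "rtt-update", text: input.value }));
});
```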

Design implication: Provide multiple alternatives. Don’t assume all deaf users know sign language—most don’t, especially those with later-onset hearing loss.

Always provide comprehensive audio controls:

Essential controls:

  • Volume adjustment for all audio content
  • Mute option that persists across sessions
  • Clear indication of current sound state
  • Independent volume for different sound types (effects, music, voice)

Control placement:

  • Accessible without starting audio playback
  • Visible and easy to find
  • Works with keyboard and screen readers
  • Remembers user preferences
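
A minimal sketch of what a persistent, screen-reader-friendly mute control can look like in a web interface; the element IDs and storage key are illustrative, not from any particular framework:

```ts
// Persistent, accessible mute toggle.
const MUTE_KEY = "audio-muted";

const muteButton = document.querySelector<HTMLButtonElement>("#mute-toggle")!;
const media = document.querySelector<HTMLAudioElement>("#app-audio")!;

function applyMute(muted: boolean): void {
  media.muted = muted;
  // aria-pressed exposes the toggle state to screen readers;
  // visible text gives sighted users the same information.
  muteButton.setAttribute("aria-pressed", String(muted));
  muteButton.textContent = muted ? "Sound off" : "Sound on";
  localStorage.setItem(MUTE_KEY, String(muted)); // persists across sessions
}

// Restore the saved preference before any audio can play.
applyMute(localStorage.getItem(MUTE_KEY) === "true");

muteButton.addEventListener("click", () => applyMute(!media.muted));
```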

Audio that plays automatically is jarring—especially with sound effects or speech.

WCAG requirement (1.4.2 Audio Control): Any audio that plays automatically for more than 3 seconds must have a mechanism to pause or stop it, or to control its volume independently of the overall system volume.

Best practice: Don’t autoplay audio at all. If you must:

  • Start muted by default
  • Show clear visual indication that audio is available
  • Provide immediate, obvious controls
  • Remember user’s mute preference

Why this matters: Unexpected audio can cause embarrassment (in quiet offices), startle responses (for anxious users), or complete inaccessibility (for screen reader users whose audio is interrupted).
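
Here is one hedged sketch of an autoplay-safe setup using standard media element properties; the element IDs and storage key are assumptions:

```ts
// Autoplay-safe media: starts muted with visible controls, and
// unmuting is always an explicit user action.
const video = document.querySelector<HTMLVideoElement>("#intro-video")!;
video.muted = true;      // start muted by default
video.controls = true;   // immediate, obvious controls

// Clear visual indication that audio is available.
const hint = document.querySelector<HTMLElement>("#audio-hint")!;
hint.textContent = "This video has sound. Use the speaker icon to unmute.";

video.addEventListener("volumechange", () => {
  // Remember the user's choice so the next visit respects it.
  localStorage.setItem("intro-video-muted", String(video.muted));
});
```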

Every auditory cue needs a visual counterpart:

Sound effects and feedback:

  • Success sounds → checkmark animation, green flash
  • Error sounds → shake animation, red indicator
  • Notification sounds → badge count, banner, icon change
  • Progress sounds → progress bar, percentage display

Alarms and alerts:

  • Critical alerts → visual banner, screen flash, vibration
  • Timers → visual countdown, screen notification
  • System alerts → persistent visual indicator

Status and ambient sounds:

  • Loading/processing → spinner, progress bar
  • Connection status → icon changes
  • Mode changes → visual mode indicators
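
A short sketch of the redundancy principle in code: one notification function that always fires the visual cue and treats sound as optional. The element IDs and class names are illustrative:

```ts
// Redundant feedback: the visual channel always fires, so the
// information survives with sound off.
function notifySuccess(message: string): void {
  // Visual channel: a status banner announced to assistive tech.
  const banner = document.querySelector<HTMLElement>("#status-banner")!;
  banner.setAttribute("role", "status"); // screen readers announce updates
  banner.textContent = message;
  banner.classList.add("success-flash"); // checkmark / green flash styling

  // Audio channel: optional, skipped entirely when muted.
  const chime = document.querySelector<HTMLAudioElement>("#success-chime");
  if (chime && !chime.muted) {
    void chime.play().catch(() => {
      // Playback may be blocked before a user gesture; visuals still shown.
    });
  }
}
```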

WCAG requirement (1.2.2 Captions (Prerecorded)): All prerecorded video with audio must have synchronized captions.

Quality captions include:

  • Accurate transcription of all speech
  • Speaker identification when relevant
  • Synchronized timing with audio
  • Non-speech sounds described [door slams], [music plays]
  • Tone and manner cues [sarcastically], [whispering]

Caption best practices:

  • Professional review, not auto-generated only
  • 1-2 lines maximum at a time
  • Sufficient display duration for reading
  • High contrast, readable font
  • Positioning that doesn’t obscure important visuals
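
On the web, synchronized captions attach to video through the standard HTML track element; in this sketch the WebVTT file path is hypothetical:

```ts
// Attach synchronized captions to a video element.
const video = document.querySelector<HTMLVideoElement>("#lecture-video")!;

const track = document.createElement("track");
track.kind = "captions";                 // captions include non-speech sounds
track.src = "/captions/lecture-en.vtt";  // WebVTT file with timed cues
track.srclang = "en";
track.label = "English captions";
track.default = true;                    // shown unless the user opts out
video.appendChild(track);
```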

Research finding: Closed captioning significantly improves learner access, outcomes, and performance—not just for deaf users, but for all learners.

WCAG requirement (1.2.1 Audio-only and Video-only (Prerecorded)): All prerecorded audio-only content must have a text transcript.

Transcripts should include:

  • Complete text of all spoken content
  • Identification of speakers
  • Description of significant sounds
  • Time stamps for long content
  • Searchable, accessible format

Benefits beyond accessibility:

  • SEO improvement (searchable content)
  • Quiet environment use
  • Skimmable content
  • Reference and quotation
  • Translation capability

For users who can hear but can’t see visual content:

WCAG requirement (1.2.5 Audio Description (Prerecorded)): Prerecorded video must have audio description of important visual information.

Audio descriptions provide:

  • Narration of key visual elements
  • Description of actions and scene changes
  • Character identification
  • On-screen text reading
  • Important visual context
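
One common delivery pattern is to offer a separately mixed, audio-described rendition of the same video; in this sketch the file paths and toggle control are assumptions:

```ts
// Toggle between the standard and audio-described renditions of a video.
const video = document.querySelector<HTMLVideoElement>("#feature-video")!;
const toggle = document.querySelector<HTMLInputElement>("#described-toggle")!;

toggle.addEventListener("change", () => {
  const time = video.currentTime; // keep the viewer's place
  video.src = toggle.checked
    ? "/media/feature-described.mp4" // narration of key visuals mixed in
    : "/media/feature.mp4";
  // Restore the position once the new source's metadata is loaded.
  video.addEventListener(
    "loadedmetadata",
    () => { video.currentTime = time; },
    { once: true }
  );
});
```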

Design for different listening contexts:

Quiet environments (libraries, sleeping babies, hospital rooms, late-night use):

  • Need: Silent mode that doesn’t lose information
  • Solution: All sounds have visual equivalents, vibration options
  • Test: Use interface with sound completely disabled

Noisy environments (cafés, public transport, factories, outdoor use):

  • Need: Visual must carry the full message
  • Solution: Captions, large visual indicators, redundant cues
  • Test: Use interface in noisy environment or with earplugs

Shared spaces (offices, open floor plans, public spaces):

  • Need: Users don’t want to disturb others
  • Solution: Easy mute, headphone detection, volume memory
  • Test: Would you use this in a library?

Headphone use (private listening with good audio quality):

  • Can support: Stereo, higher frequencies, subtle sounds, spatial audio
  • Opportunity: Richer audio experience when detected
  • Consideration: Don’t make headphone features mandatory

Functional audio provides information:

  • Confirmation of actions
  • Error alerts
  • Status changes
  • Navigation cues

Decorative audio enhances experience:

  • Background music
  • Ambient sounds
  • Audio branding
  • Mood setting

Rule: Functional audio needs visual equivalents. Decorative audio should be optional and easy to disable.
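
The rule can be encoded directly, as in this illustrative sketch where functional sounds always trigger their visual equivalent and decorative sounds respect an opt-out setting (names and the storage key are assumptions):

```ts
// Separate functional from decorative audio.
type SoundKind = "functional" | "decorative";

function playSound(
  kind: SoundKind,
  audio: HTMLAudioElement,
  visualCue: () => void
): void {
  if (kind === "functional") {
    visualCue(); // the visual equivalent is mandatory; sound is a bonus
  } else if (localStorage.getItem("decorative-audio") === "off") {
    return; // decorative audio is optional and easy to disable
  }
  void audio.play().catch(() => {
    // Playback may be blocked; functional info was already shown visually.
  });
}
```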

Effective audio feedback is:

  • Brief: Short sounds for quick feedback (< 0.5 seconds for confirmations)
  • Distinct: Different sounds for different meanings
  • Pleasant: Not jarring, especially for frequent actions
  • Informative: Conveys meaning, not just presence
  • Consistent: Same sound means same thing throughout

Avoid:

  • Sounds that are annoying with repetition
  • Similar sounds for different meanings
  • Sounds that compete with speech
  • Frequencies that are commonly lost to hearing impairment
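
For example, a brief confirmation tone can be generated with the Web Audio API; the 880 Hz frequency and 0.3-second envelope below are illustrative choices that stay in the mid range and under the half-second guideline:

```ts
// Brief, mid-frequency confirmation tone. Note: an AudioContext may stay
// suspended until the first user gesture on the page.
const ctx = new AudioContext();

function playConfirmation(): void {
  const osc = ctx.createOscillator();
  const gain = ctx.createGain();
  osc.frequency.value = 880; // mid-range: high frequencies are often lost first
  gain.gain.setValueAtTime(0.2, ctx.currentTime);
  // Fade out quickly so repeated confirmations stay pleasant.
  gain.gain.exponentialRampToValueAtTime(0.001, ctx.currentTime + 0.3);
  osc.connect(gain).connect(ctx.destination);
  osc.start();
  osc.stop(ctx.currentTime + 0.3); // well under the 0.5 s guideline
}
```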

As interfaces extend to AR, VR, and XR:

Spatial audio can:

  • Orient users in virtual environments
  • Provide navigational cues
  • Reinforce actions and feedback
  • Create immersive experiences

Accessibility requirements:

  • All spatial cues need non-audio alternatives
  • Mono mode for users with single-ear hearing
  • Visual indicators for sound direction/source
  • Alternative navigation methods
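
A mono mode falls out of the Web Audio API's channel semantics: forcing a node to a single channel makes the graph downmix stereo content. A minimal sketch, assuming the audio plays through a media element:

```ts
// Mono mode for users with single-ear hearing: stereo content is
// downmixed so no information lives only in one channel.
const ctx = new AudioContext();
const element = document.querySelector<HTMLAudioElement>("#game-audio")!;
const source = ctx.createMediaElementSource(element);

const monoGain = ctx.createGain();
monoGain.channelCount = 1;              // mix all input channels down to one
monoGain.channelCountMode = "explicit"; // enforce the mono downmix
source.connect(monoGain).connect(ctx.destination);
```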

Voice user interfaces (VUIs) are rapidly transforming human-computer interaction, but require careful accessibility consideration:

For deaf and hard of hearing users:

  • Always provide non-voice input alternatives
  • Visual feedback for voice input (text display)
  • Text-based alternatives for all voice commands

For all users:

  • Error recovery without voice
  • Visual confirmation of recognized input
  • Ability to correct without speaking
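
Browsers that ship the (vendor-prefixed) Web Speech API can display recognized speech as text while the user talks, which covers visual confirmation and correction; this sketch treats the API as optional since support varies:

```ts
// Visual feedback for voice input: recognized speech is rendered as text.
const SpeechRec =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

if (SpeechRec) {
  const recognition = new SpeechRec();
  recognition.interimResults = true; // show text while the user is speaking

  const display = document.querySelector<HTMLElement>("#voice-transcript")!;
  recognition.onresult = (event: any) => {
    // Users can confirm the recognized input and correct it without speaking.
    display.textContent = Array.from(event.results)
      .map((r: any) => r[0].transcript)
      .join(" ");
  };
  recognition.start();
}
```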

When systems speak to users:

  • Provide visual text equivalent
  • Allow playback speed control
  • Offer text-to-speech alternatives
  • Support screen reader compatibility
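
A simple pattern is to render the text visually before speaking it, using the standard speechSynthesis API; the element ID here is an assumption:

```ts
// Spoken output paired with a visual text equivalent, with speed control.
function speakWithText(message: string, rate = 1): void {
  // Visual channel first: the same text is always shown on screen.
  document.querySelector<HTMLElement>("#spoken-text")!.textContent = message;

  // Audio channel: standard speech synthesis.
  const utterance = new SpeechSynthesisUtterance(message);
  utterance.rate = rate; // 0.1 to 10 per the spec; expose this to users
  speechSynthesis.speak(utterance);
}
```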

Many deaf and hard of hearing users cannot use voice phone calls effectively.

Provide multiple contact methods:

  • Email
  • Live chat
  • Video chat with sign interpretation when possible
  • Online forms
  • Text messaging / SMS
  • Real-Time Text (RTT) where supported

For customer service:

  • TTY/TDD number if available
  • Video relay service (VRS) compatibility
  • Response time expectations for non-phone contact
  • Equal quality of service across channels

Testing for hearing accessibility:

Manual checks:

  • Disable sound completely: Can you still use the interface fully?
  • Caption review: Are captions accurate, timed well, and complete?
  • Transcript check: Do transcripts include all content?
  • Control accessibility: Can audio be controlled via keyboard?

Testing with users:

  • Include deaf and hard of hearing users
  • Test with different hearing devices
  • Test caption preferences
  • Gather feedback on communication options

Automated checks:

  • Caption file validation
  • Audio control presence
  • WCAG conformance checking
  • Media accessibility audits
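
Automated checks catch the most mechanical failures. This sketch flags videos without caption tracks and unmuted autoplaying media; it is a starting point, not a substitute for full WCAG audits or tools like axe and Lighthouse:

```ts
// Simple media accessibility audit over the current document.
function auditMediaAccessibility(root: Document = document): string[] {
  const findings: string[] = [];

  root.querySelectorAll("video").forEach((video, i) => {
    // Every prerecorded video with audio needs a caption track.
    if (!video.querySelector('track[kind="captions"], track[kind="subtitles"]')) {
      findings.push(`video #${i}: no caption track`);
    }
  });

  root.querySelectorAll("audio[autoplay], video[autoplay]").forEach((el, i) => {
    // Autoplaying media should at least start muted.
    if (!(el as HTMLMediaElement).muted) {
      findings.push(`autoplaying media #${i}: not muted by default`);
    }
  });

  return findings;
}
```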

The Soundability Lab at University of Michigan designs human-centered, agentic AI for sound accessibility. Current research includes real-time audio captioning systems, editable digital media soundscapes, and adaptive hearing systems. They view sound personalization as a way to make sound more inclusive and equitable.

According to industry experts, in 2025 brands will shift from standalone audio logos to fully integrated UX sound design. Inclusive design practices will ensure sound supports all users through audio-based navigation for visually impaired users, confirmation sounds for actions, and cognitive-friendly sound cues.

The Department of Justice’s April 2024 final rule updated ADA Title II requirements for web content and mobile apps. Public entities must follow WCAG 2.1 Level AA, with compliance deadlines of April 2026 for large entities (50,000+ population) and April 2027 for smaller entities.

Research published in Digital Disability & Deaf Studies Journal confirms that implementing closed captions significantly improves learner access, outcomes, and performance for all learners, not just those with hearing disabilities.

A 2024 systematic literature review on voice user interfaces proposes a six-category classification framework for VUI research. The review emphasizes the importance of accessible voice interface design as VUIs transform human-computer interaction.

As technology moves beyond screens to AR, VR, and XR, 2025 predictions indicate spatial audio and 3D sound design will help orient users in virtual environments, providing navigational cues and reinforcing actions. Automotive, healthcare, and consumer electronics will see increased emphasis on sound safety and compliance.

Quick checklist:

  • No audio-only information: All sounds have visual equivalents
  • Video captions: Accurate, synchronized, including non-speech sounds
  • Audio transcripts: Complete text for audio-only content
  • Volume controls: Easily accessible, persistent preferences
  • No autoplay: Or starts muted with clear controls
  • Multiple contact methods: Not phone-only
  • Audio descriptions: For video with important visual content
  • Silent mode test: Interface fully usable without sound
