Amazon Polly

What is it

A Text-to-Speech (TTS) service that converts text into realistic speech.

What is it for

Building applications that speak and creating audio content for a variety of use cases.
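To make "applications that speak" concrete, here is a minimal sketch of the core request. It assumes the AWS SDK for Python (boto3) and Polly's `synthesize_speech` operation, whose response carries the audio in an `AudioStream` field. The client is passed in rather than created inside, so the helpers can be exercised without AWS credentials. `build_request`, `copy_stream` and `synthesize_to_file` are hypothetical helper names, and the voice `Joanna` with MP3 output is only an illustrative default.

```python
from typing import BinaryIO, Dict, Optional


def build_request(text: str, voice_id: str = "Joanna",
                  output_format: str = "mp3",
                  engine: Optional[str] = None,
                  text_type: str = "text") -> Dict[str, str]:
    """Assemble keyword arguments for Polly's synthesize_speech call."""
    request = {
        "Text": text,
        "VoiceId": voice_id,            # e.g. "Joanna" (US English)
        "OutputFormat": output_format,  # e.g. "mp3", "ogg_vorbis", "pcm"
        "TextType": text_type,          # "text" or "ssml"
    }
    if engine is not None:
        request["Engine"] = engine      # e.g. "standard" or "neural"
    return request


def copy_stream(stream: BinaryIO, out: BinaryIO, chunk_size: int = 4096) -> int:
    """Copy an audio stream to `out` chunk by chunk; return bytes written."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        out.write(chunk)
        total += len(chunk)
    return total


def synthesize_to_file(client, text: str, path: str, **options) -> int:
    """Call synthesize_speech and write the returned AudioStream to `path`.

    `client` is expected to be boto3.client("polly"); keyword options are
    forwarded to build_request (voice_id, output_format, engine, text_type).
    """
    response = client.synthesize_speech(**build_request(text, **options))
    stream = response["AudioStream"]  # streaming body: read it incrementally
    try:
        with open(path, "wb") as out:
            return copy_stream(stream, out)
    finally:
        stream.close()
```

With credentials configured, a call such as `synthesize_to_file(boto3.client("polly"), "Hello from Amazon Polly.", "hello.mp3")` would write an MP3 file. Reading in fixed-size chunks keeps memory use flat for long audio and is the same pattern you would use to hand audio to a player while it is still arriving.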

Use cases

  • Interactive voice applications (e.g., virtual assistants, IVRs)
  • Creation of audio content for e-learning, audiobooks, and podcasts
  • Video and presentation narration
  • Applications for visually impaired people
  • Games and entertainment applications

Key points

  • Realistic speech: Uses deep learning technologies to produce human-like voices
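  • Multiple voices and languages: Offers dozens of voices across many languages and language variants (e.g., US, British, and Australian English)
  • SSML (Speech Synthesis Markup Language): Allows control over speech aspects such as volume, pitch, speed, emphasis, and pauses; tag support varies by voice engine
  • Lexicons: Let you customize how specific words, such as acronyms or brand names, are pronounced
  • Audio streaming: Returns synthesized speech as an audio stream in real time, so playback can start before the whole file is generated
  • Pay-per-use: You pay per character converted to speech; prices are quoted per million characters and vary by voice engine (e.g., standard vs. neural)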
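The SSML point above can be illustrated with a small string builder. This is a sketch using only the Python standard library: the helper names (`speak`, `prosody`, `emphasis`, `pause`) are hypothetical, while the tags themselves (`<speak>`, `<prosody>`, `<emphasis>`, `<break>`) are standard SSML elements. Polly only interprets the markup when the request sets `TextType="ssml"`, and not every tag or attribute is supported by every voice engine, so check Polly's SSML reference for the voice you use.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def prosody(text: str, rate: str = "medium", volume: str = "medium",
            pitch: Optional[str] = None) -> str:
    """Wrap text in <prosody> to control speed, volume and (optionally) pitch."""
    attrs = f"rate={quoteattr(rate)} volume={quoteattr(volume)}"
    if pitch is not None:
        attrs += f" pitch={quoteattr(pitch)}"  # e.g. "+10%" or "low"
    return f"<prosody {attrs}>{escape(text)}</prosody>"


def emphasis(text: str, level: str = "strong") -> str:
    """Stress a word or phrase; level is "strong", "moderate" or "reduced"."""
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def pause(milliseconds: int) -> str:
    """Insert a silent break of the given length."""
    return f'<break time="{int(milliseconds)}ms"/>'


def speak(*fragments: str) -> str:
    """Join already-built SSML fragments inside the root <speak> element."""
    return "<speak>" + " ".join(fragments) + "</speak>"


# Plain text is escaped by the helpers, so "&" becomes "&amp;".
doc = speak(
    prosody("Welcome to Tom & Jerry's lesson.", rate="slow"),
    pause(500),
    emphasis("Listen carefully."),
)
# Hypothetical usage with a boto3 Polly client:
# client.synthesize_speech(Text=doc, TextType="ssml", VoiceId="Joanna", OutputFormat="mp3")
```

Escaping every piece of plain text matters here: an unescaped `&` or `<` would make the SSML invalid and the request would fail.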
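Polly lexicons are written in the W3C Pronunciation Lexicon Specification (PLS) format, uploaded with `put_lexicon`, and applied per request through the `LexiconNames` parameter of `synthesize_speech`. The sketch below generates a minimal alias-only lexicon with the standard library. `build_alias_lexicon` and the lexicon name `acronyms` are hypothetical, and `<alias>` entries are used instead of phonetic `<phoneme>` entries to keep the example simple.

```python
from typing import Mapping
from xml.sax.saxutils import escape, quoteattr

PLS_NAMESPACE = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_alias_lexicon(aliases: Mapping[str, str], lang: str = "en-US") -> str:
    """Return a PLS lexicon that makes each grapheme be read as its alias."""
    lexemes = "".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(grapheme)}</grapheme>\n"
        f"    <alias>{escape(alias)}</alias>\n"
        "  </lexeme>\n"
        for grapheme, alias in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NAMESPACE}"\n'
        '         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n'
        f'         xsi:schemaLocation="{PLS_NAMESPACE} '
        'http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"\n'
        f'         alphabet="ipa" xml:lang={quoteattr(lang)}>\n'
        f"{lexemes}"
        "</lexicon>\n"
    )


lexicon_xml = build_alias_lexicon({"W3C": "World Wide Web Consortium"})
# Hypothetical usage with a boto3 Polly client ("acronyms" is an example name):
# client.put_lexicon(Name="acronyms", Content=lexicon_xml)
# client.synthesize_speech(Text="The W3C publishes PLS.", VoiceId="Joanna",
#                          OutputFormat="mp3", LexiconNames=["acronyms"])
```

Once uploaded, the lexicon is reused by name, so pronunciation fixes live in one place instead of being repeated in every SSML document.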
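Because billing is per character, a rough budget is simple arithmetic: cost = characters / 1,000,000 × price per million characters. The sketch below assumes the billable count can be approximated by Python's `len()` of the input text (the `RequestCharacters` field returned by `synthesize_speech` reports the count Polly actually processed) and uses a made-up rate, not a real Polly price. `billable_characters`, `estimate_cost` and `ASSUMED_PRICE_PER_MILLION` are hypothetical names.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Iterable


def billable_characters(texts: Iterable[str]) -> int:
    """Approximate billable characters as the total length of the input texts."""
    return sum(len(text) for text in texts)


def estimate_cost(characters: int, price_per_million: Decimal) -> Decimal:
    """Estimated cost in the price's currency, rounded to 4 decimal places."""
    if characters < 0:
        raise ValueError("characters must be non-negative")
    cost = Decimal(characters) * price_per_million / Decimal(1_000_000)
    return cost.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)


# Hypothetical rate for illustration only; look up current Polly pricing
# for the voice engine you plan to use.
ASSUMED_PRICE_PER_MILLION = Decimal("4.00")

script = ["Welcome to the course.", "Lesson one: getting started."]
estimate = estimate_cost(billable_characters(script), ASSUMED_PRICE_PER_MILLION)
print(estimate_cost(250_000, ASSUMED_PRICE_PER_MILLION))  # -> 1.0000
```

`Decimal` is used instead of `float` so that small per-character amounts add up without binary rounding error.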

Comparison

  • Amazon Polly: Offers a scalable and cost-effective solution for generating speech, without the need to hire voice actors or manage recording studios. Allows for quick and consistent audio content updates.
  • Human voice recording: Can offer more natural voice quality and emotional nuances, but is more expensive, time-consuming, and less flexible for content updates.