Text-to-Speech in JavaScript: A Complete Tutorial from Basics to Practical Use

Source: 站长平台 | Author: 陈平安 | Title: Full-stack engineer

Implementing Text-to-Speech in JavaScript

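In modern web development, letting a page "speak" is a practical feature. Whether it supports assisted reading, accessibility, or interaction feedback, text-to-speech can improve the user experience. Browsers expose the Web Speech API natively, and its SpeechSynthesis interface makes text-to-speech straightforward to implement.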

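Introduction to the Web Speech API

The speech synthesis part of the Web Speech API is built into the browser and lets developers control how text is converted to speech. It needs no third-party library and uses the speech engines that the browser or operating system already provides.

There are two core objects:

SpeechSynthesis: the main controller for speech synthesis, responsible for starting, pausing, resuming, and canceling playback
SpeechSynthesisUtterance: a single speech request, holding the text to read and parameters such as rate, pitch, and volume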

Basic implementation: the simplest text-to-speech

We start with the most basic version and have the browser read one piece of text:

// Get the browser's speech synthesis controller
const synth = window.speechSynthesis;

// Create a speech request (the sample text is Chinese)
const utterance = new SpeechSynthesisUtterance('你好，欢迎使用文字转语音功能');

// Start speaking
synth.speak(utterance);

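This code asks the browser to read the given Chinese text. Note that speak() is asynchronous: it adds the utterance to a queue and returns immediately, so speech does not block anything else on the page.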
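Not every environment exposes the API, so it can help to check before calling speak(). The following is a minimal sketch, not part of the original tutorial: `supportsSpeech` and `speakIfSupported` are hypothetical helper names, and the global object is passed in as a parameter only so the check can be exercised outside a browser.

```javascript
// Returns true when `root` (normally `window`) exposes the speech synthesis API.
function supportsSpeech(root) {
  return Boolean(root) &&
    typeof root.speechSynthesis === 'object' &&
    root.speechSynthesis !== null &&
    typeof root.SpeechSynthesisUtterance === 'function';
}

// Speaks `text` when the API is available and reports whether it did.
function speakIfSupported(root, text) {
  if (!supportsSpeech(root)) {
    return false;
  }
  root.speechSynthesis.speak(new root.SpeechSynthesisUtterance(text));
  return true;
}

// In a page (assumed usage):
// if (!speakIfSupported(window, '你好')) { /* fall back to showing the text only */ }
```

Returning a boolean lets the caller decide on a fallback, such as hiding the "read aloud" button.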

Controlling speech parameters: rate, pitch, and volume

A SpeechSynthesisUtterance object exposes several parameters for tuning how the text is read:

const synth = window.speechSynthesis;
const utterance = new SpeechSynthesisUtterance();

// Text to read
utterance.text = 'JavaScript文字转语音功能非常强大，可以自定义多种参数。';

// Language (Simplified Chinese)
utterance.lang = 'zh-CN';

// Rate: 0.1 to 10, where 1 is normal speed
utterance.rate = 1.2;

// Pitch: 0 to 2, where 1 is normal pitch
utterance.pitch = 1;

// Volume: 0 to 1
utterance.volume = 0.8;

synth.speak(utterance);

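Adjusting these parameters can make the speech sound more natural or better suited to a particular context. A tutorial app might lower the rate slightly, for example, while a game might change the pitch to give a character a distinctive voice.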
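When these values come from user input, it may be worth clamping them to the ranges listed above before assigning them. This is a sketch under that assumption; `normalizeSpeechOptions` and `applySpeechOptions` are hypothetical helpers, and falling back to 1 for missing or non-numeric values is a choice made for this example.

```javascript
// Ranges as described above: rate 0.1 to 10, pitch 0 to 2, volume 0 to 1.
const LIMITS = { rate: [0.1, 10], pitch: [0, 2], volume: [0, 1] };
const DEFAULTS = { rate: 1, pitch: 1, volume: 1 };

// Accepts numbers and non-empty numeric strings (e.g. from <input type="range">).
function toNumber(value) {
  const usable = typeof value === 'number' ||
    (typeof value === 'string' && value.trim() !== '');
  return usable ? Number(value) : NaN;
}

// Returns a new object with rate, pitch, and volume forced into range.
function normalizeSpeechOptions(options = {}) {
  const result = {};
  for (const key of Object.keys(LIMITS)) {
    const [min, max] = LIMITS[key];
    const value = toNumber(options[key]);
    result[key] = Number.isFinite(value)
      ? Math.min(max, Math.max(min, value))
      : DEFAULTS[key];
  }
  return result;
}

// Copies the normalized values onto an utterance-like object and returns it.
function applySpeechOptions(utterance, options) {
  return Object.assign(utterance, normalizeSpeechOptions(options));
}

// In a page (assumed usage):
// applySpeechOptions(new SpeechSynthesisUtterance('你好'), { rate: slider.value });
```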

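Choosing a different voice

The voices on offer differ between operating systems and browsers. We can first fetch the list of available voices and then pick the most suitable one: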

const synth = window.speechSynthesis;
const utterance = new SpeechSynthesisUtterance('这是一段测试语音，用于演示语音选择功能。');

// List every available voice
function getVoices() {
  const voices = synth.getVoices();
  console.log('Available voices:', voices.map(v => v.name + ' (' + v.lang + ')'));
  return voices;
}

// Pick a Chinese voice if one exists
function selectChineseVoice() {
  const voices = getVoices();
  // Prefer any voice whose language tag starts with "zh"
  const chineseVoice = voices.find(voice => voice.lang.startsWith('zh'));
  if (chineseVoice) {
    utterance.voice = chineseVoice;
    console.log('Selected Chinese voice:', chineseVoice.name);
  } else {
    console.log('No Chinese voice found');
  }
}

// Note: in some browsers getVoices() returns an empty array right after page load,
// so wait for the voiceschanged event. { once: true } keeps the text from being
// spoken again if the event fires more than once.
if (synth.getVoices().length === 0) {
  synth.addEventListener('voiceschanged', () => {
    selectChineseVoice();
    synth.speak(utterance);
  }, { once: true });
} else {
  selectChineseVoice();
  synth.speak(utterance);
}

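The list of voices varies widely across devices and browsers. Windows typically provides Chinese voices such as Microsoft Huihui and Microsoft Kangkang, while macOS and iOS provide voices such as Tingting. In practice, it is best to offer a voice picker and let users decide.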
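Because the list differs so much, a slightly more tolerant selection rule than "first voice starting with zh" can be useful. The sketch below is one possible approach, not the tutorial's own method: `pickVoice` is a hypothetical helper that takes plain objects shaped like SpeechSynthesisVoice (`name`, `lang`, `default`, `localService`). Treating an underscore in the language tag as a hyphen is an assumption made here to be forgiving about tag formatting.

```javascript
// Picks a voice by language priority.
// `preferredLangs` is ordered from most to least preferred.
function pickVoice(voices, preferredLangs = ['zh-CN', 'zh']) {
  const normalize = (lang) => String(lang || '').replace('_', '-').toLowerCase();

  for (const wanted of preferredLangs.map(normalize)) {
    const matches = voices.filter((voice) => {
      const lang = normalize(voice.lang);
      // Exact match ("zh-cn") or a more specific tag ("zh" matches "zh-tw")
      return lang === wanted || lang.startsWith(wanted + '-');
    });
    if (matches.length > 0) {
      // Among matches, prefer a voice that runs locally; otherwise take the first
      return matches.find((voice) => voice.localService) || matches[0];
    }
  }

  // Nothing matched: fall back to the platform default voice, if any
  return voices.find((voice) => voice.default) || null;
}

// In a page (assumed usage):
// const voice = pickVoice(window.speechSynthesis.getVoices());
// if (voice) utterance.voice = voice;
```

Keeping the function free of browser globals means the same rule can back both automatic selection and the default option in a voice picker.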

Controlling playback: pause, resume, and cancel

Beyond basic reading, the SpeechSynthesis interface also provides playback control methods:

const synth = window.speechSynthesis;
const utterance = new SpeechSynthesisUtterance('这是一段较长的文本，可以用来测试暂停、恢复和取消功能。');

// Start speaking
document.getElementById('btnSpeak').addEventListener('click', () => {
  synth.speak(utterance);
});

// Pause (speaking stays true while paused, so also check paused)
document.getElementById('btnPause').addEventListener('click', () => {
  if (synth.speaking && !synth.paused) {
    synth.pause();
  }
});

// Resume
document.getElementById('btnResume').addEventListener('click', () => {
  if (synth.paused) {
    synth.resume();
  }
});

// Cancel
document.getElementById('btnCancel').addEventListener('click', () => {
  if (synth.speaking) {
    synth.cancel();
  }
});

The matching HTML buttons are:

<button id="btnSpeak">Speak</button>
<button id="btnPause">Pause</button>
<button id="btnResume">Resume</button>
<button id="btnCancel">Cancel</button>
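The `speaking`, `paused`, and `pending` flags on SpeechSynthesis can be folded into a single state, which makes it easier to decide which of these buttons should be enabled. This is a sketch with hypothetical names (`getPlaybackState`, `enabledButtons`). It relies on `speaking` remaining true while speech is paused, and the button mapping is an assumption for this example.

```javascript
// Derives one playback state from an object exposing speaking/paused/pending.
function getPlaybackState(synth) {
  if (synth.speaking && synth.paused) return 'paused';
  if (synth.speaking) return 'speaking';
  if (synth.pending) return 'pending';
  return 'idle';
}

// Which button ids (from the HTML above) make sense in each state.
const ENABLED_BUTTONS = {
  idle: ['btnSpeak'],
  pending: ['btnCancel'],
  speaking: ['btnPause', 'btnCancel'],
  paused: ['btnResume', 'btnCancel'],
};

function enabledButtons(synth) {
  return ENABLED_BUTTONS[getPlaybackState(synth)];
}

// In a page (assumed usage), after every click and utterance event:
// const enabled = enabledButtons(window.speechSynthesis);
// for (const id of ['btnSpeak', 'btnPause', 'btnResume', 'btnCancel']) {
//   document.getElementById(id).disabled = !enabled.includes(id);
// }
```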

Listening to speech events

For finer control over playback, you can listen to the events fired during speech synthesis:

const synth = window.speechSynthesis;
const utterance = new SpeechSynthesisUtterance('事件监听可以帮助我们了解语音播放的各个阶段。');

// Fires when speech starts
utterance.addEventListener('start', () => {
  console.log('Speech started');
  updateStatus('Speaking...');
});

// Fires when speech finishes
utterance.addEventListener('end', () => {
  console.log('Speech finished');
  updateStatus('Finished');
});

// Fires when speech fails or is interrupted
utterance.addEventListener('error', (event) => {
  console.error('Speech error:', event.error);
  updateStatus('Speech error');
});

// Fires when speech is paused
utterance.addEventListener('pause', () => {
  console.log('Speech paused');
  updateStatus('Paused');
});

// Fires when speech is resumed
utterance.addEventListener('resume', () => {
  console.log('Speech resumed');
  updateStatus('Speaking again...');
});

// Fires at each word boundary (only supported in some browsers)
utterance.addEventListener('boundary', (event) => {
  console.log('Reached character index:', event.charIndex);
});

// Expects an element with id="status" on the page
function updateStatus(msg) {
  document.getElementById('status').textContent = msg;
}

synth.speak(utterance);

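These events are useful when building more complex speech interactions, such as updating the UI according to playback state or highlighting text word by word.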
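Word-by-word highlighting mostly comes down to splitting the text around the position reported by the boundary event. The following sketch shows one way to do that. `splitAtBoundary` is a hypothetical helper; it uses the event's `charLength` when the browser supplies one and otherwise assumes the word runs to the next whitespace or punctuation mark, which is only a rough heuristic for Chinese text.

```javascript
// Splits `text` into the part already read, the current word, and the rest.
function splitAtBoundary(text, charIndex, charLength) {
  const start = Math.max(0, Math.min(charIndex, text.length));
  let end;
  if (Number.isInteger(charLength) && charLength > 0) {
    end = Math.min(text.length, start + charLength);
  } else {
    // Fallback: stop at the next whitespace or common punctuation mark
    const stop = text.slice(start).search(/[\s，。！？、；：,.!?;:]/);
    end = stop === -1 ? text.length : start + Math.max(stop, 1);
  }
  return {
    before: text.slice(0, start),
    current: text.slice(start, end),
    after: text.slice(end),
  };
}

// In a page (assumed usage, with three hypothetical <span> elements):
// utterance.addEventListener('boundary', (event) => {
//   const parts = splitAtBoundary(utterance.text, event.charIndex, event.charLength);
//   beforeSpan.textContent = parts.before;
//   currentSpan.textContent = parts.current; // styled as the highlight
//   afterSpan.textContent = parts.after;
// });
```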

Complete example: a text-to-speech tool

Below is a complete text-to-speech tool with parameter controls and playback controls:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Text-to-Speech Tool</title>
  <style>
    body { font-family: Arial, sans-serif; max-width: 600px; margin: 20px auto; padding: 0 20px; }
    .control-group { margin-bottom: 15px; }
    label { display: block; margin-bottom: 5px; font-weight: bold; }
    textarea { width: 100%; height: 120px; padding: 8px; border: 1px solid #ccc; border-radius: 4px; }
    select, input[type="range"] { width: 100%; padding: 5px; }
    .btn-group { display: flex; gap: 10px; margin-top: 15px; }
    button { padding: 8px 20px; border: none; border-radius: 4px; cursor: pointer; }
    .btn-speak { background: #4CAF50; color: white; }
    .btn-pause { background: #FF9800; color: white; }
    .btn-resume { background: #2196F3; color: white; }
    .btn-cancel { background: #f44336; color: white; }
    .status { margin-top: 15px; padding: 10px; background: #f5f5f5; border-radius: 4px; }
    .rate-value, .pitch-value { display: inline-block; margin-left: 10px; font-weight: bold; }
  </style>
</head>
<body>
  <h1>Text-to-Speech Tool</h1>

  <div class="control-group">
    <label for="textInput">Enter text:</label>
    <textarea id="textInput">Hello, welcome to the text-to-speech tool. You can adjust the rate and pitch to change how the text is read.</textarea>
  </div>

  <div class="control-group">
    <label for="voiceSelect">Voice:</label>
    <select id="voiceSelect"></select>
  </div>

  <div class="control-group">
    <label for="rateSlider">Rate: <span class="rate-value" id="rateValue">1.0</span></label>
    <input type="range" id="rateSlider" min="0.1" max="2.0" step="0.1" value="1.0">
  </div>

  <div class="control-group">
    <label for="pitchSlider">Pitch: <span class="pitch-value" id="pitchValue">1.0</span></label>
    <input type="range" id="pitchSlider" min="0.1" max="2.0" step="0.1" value="1.0">
  </div>

  <div class="btn-group">
    <button class="btn-speak" id="btnSpeak">Speak</button>
    <button class="btn-pause" id="btnPause">Pause</button>
    <button class="btn-resume" id="btnResume">Resume</button>
    <button class="btn-cancel" id="btnCancel">Cancel</button>
  </div>

  <div class="status" id="status">Ready</div>

  <script>
    const synth = window.speechSynthesis;
    const utterance = new SpeechSynthesisUtterance();
    const voiceSelect = document.getElementById('voiceSelect');
    const statusDiv = document.getElementById('status');

    // Fetch all voices and fill the selector
    function populateVoiceList() {
      const voices = synth.getVoices();
      voiceSelect.innerHTML = '';
      voices.forEach((voice, index) => {
        const option = document.createElement('option');
        option.value = index;
        option.textContent = voice.name + ' (' + voice.lang + ')';
        voiceSelect.appendChild(option);
      });
    }

    populateVoiceList();
    if (synth.getVoices().length === 0) {
      synth.addEventListener('voiceschanged', populateVoiceList);
    }

    // Copy the current form values onto the utterance
    function updateUtterance() {
      const text = document.getElementById('textInput').value;
      if (!text.trim()) {
        statusDiv.textContent = 'Please enter some text';
        return false;
      }

      utterance.text = text;
      const selectedVoiceIndex = voiceSelect.value;
      if (selectedVoiceIndex) {
        utterance.voice = synth.getVoices()[selectedVoiceIndex];
      }
      utterance.rate = parseFloat(document.getElementById('rateSlider').value);
      utterance.pitch = parseFloat(document.getElementById('pitchSlider').value);
      return true;
    }

    // Show the current slider values
    document.getElementById('rateSlider').addEventListener('input', () => {
      document.getElementById('rateValue').textContent = document.getElementById('rateSlider').value;
    });
    document.getElementById('pitchSlider').addEventListener('input', () => {
      document.getElementById('pitchValue').textContent = document.getElementById('pitchSlider').value;
    });

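    // Status updates from utterance events
    utterance.addEventListener('start', () => statusDiv.textContent = 'Speaking...');
    utterance.addEventListener('end', () => statusDiv.textContent = 'Finished');
    utterance.addEventListener('error', (e) => {
      // cancel() can surface as a 'canceled' or 'interrupted' error; that is not a failure
      if (e.error === 'canceled' || e.error === 'interrupted') return;
      statusDiv.textContent = 'Speech error: ' + e.error;
    });
    utterance.addEventListener('pause', () => statusDiv.textContent = 'Paused');
    utterance.addEventListener('resume', () => statusDiv.textContent = 'Speaking again...');

    // Button handlers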
    document.getElementById('btnSpeak').addEventListener('click', () => {
      if (synth.speaking) {
        synth.cancel();
      }
      if (updateUtterance()) {
        synth.speak(utterance);
      }
    });

    document.getElementById('btnPause').addEventListener('click', () => {
      if (synth.speaking) {
        synth.pause();
      }
    });

    document.getElementById('btnResume').addEventListener('click', () => {
      if (synth.paused) {
        synth.resume();
      }
    });

    document.getElementById('btnCancel').addEventListener('click', () => {
      if (synth.speaking) {
        synth.cancel();
      }
      statusDiv.textContent = 'Speech canceled';
    });
  </script>
</body>
</html>

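This complete example covers text input, voice selection, rate and pitch adjustment, and full playback control. Users get a personalized text-to-speech experience through a simple interface.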

Browser compatibility and caveats

A few points need particular attention when using the Web Speech API:

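Browser	Support	Notes
Chrome	Supported	speak() may be blocked until the user has interacted with the page
Firefox	Supported	Available voices depend on the operating system's speech engines
Safari	Supported	On iOS, playback must be triggered by a user action
Edge	Supported	Based on the Chromium engine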

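Some important caveats:

User interaction: most browsers require speak() to be called from a callback triggered by a user gesture (such as a button click); otherwise the browser may block it.
Voice loading: in some browsers getVoices() may return an empty array at page load, so listen for the voiceschanged event to make sure the voice list is available.
Chinese support: the quality of the default Chinese voice varies considerably between devices, so let users choose the voice themselves.
Long texts: if the text is too long, speech may stop on its own. Read it in segments, or listen for the end event to continue with the next segment.
Background playback: on mobile devices, speech may stop automatically when the page moves to the background. This is system behavior and cannot be controlled from code.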
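The segmented reading suggested for long texts can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: `splitIntoChunks` and `speakChunks` are hypothetical helpers, the 120-character limit is an arbitrary choice, and the synthesis controller and utterance constructor are passed in as parameters (in a page they would be `window.speechSynthesis` and `window.SpeechSynthesisUtterance`).

```javascript
// Splits text into chunks of at most `maxLen` characters, breaking after
// sentence-ending punctuation where possible.
function splitIntoChunks(text, maxLen = 120) {
  const sentences = text.match(/[^。！？.!?\n]+[。！？.!?\n]*/g) || [];
  const chunks = [];
  let current = '';
  for (const sentence of sentences) {
    if (current && current.length + sentence.length > maxLen) {
      chunks.push(current.trim());
      current = '';
    }
    current += sentence;
    // A single sentence longer than maxLen is cut into fixed-size pieces
    while (current.length > maxLen) {
      chunks.push(current.slice(0, maxLen).trim());
      current = current.slice(maxLen);
    }
  }
  if (current.trim()) {
    chunks.push(current.trim());
  }
  return chunks.filter(Boolean);
}

// Speaks the chunks one by one, starting each from the previous `end` event.
// `onDone` receives nothing on success or the error code on failure.
function speakChunks(synth, Utterance, chunks, onDone = () => {}) {
  let index = 0;
  const next = () => {
    if (index >= chunks.length) {
      onDone();
      return;
    }
    const utterance = new Utterance(chunks[index]);
    index += 1;
    utterance.onend = next;
    utterance.onerror = (event) => onDone(event.error);
    synth.speak(utterance);
  };
  next();
}

// In a page (assumed usage, inside a click handler):
// speakChunks(window.speechSynthesis, window.SpeechSynthesisUtterance,
//   splitIntoChunks(longText), () => console.log('done'));
```

Creating a fresh utterance per chunk keeps each request short, and chaining on `end` rather than queuing everything at once leaves room to stop between segments.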

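Summary

The Web Speech API offers a simple yet powerful way to implement text-to-speech in JavaScript. With the two core objects, SpeechSynthesis and SpeechSynthesisUtterance, we can control the text being read, its rate, pitch, and volume, and even choose between different voices. Combined with event listeners and the playback control methods, this is enough to build feature-rich speech interactions.

In real projects, this capability suits accessible reading assistance, spoken navigation prompts, e-book narration, language-learning apps, and similar scenarios. Compatibility and voice quality differ between browsers, but for most common use cases the Web Speech API is good enough.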

Tags: JavaScript, text-to-speech, Web_Speech_API, SpeechSynthesis, speech synthesis
Last modified: 2026-05-13 07:09:56