@caelumtian 2017-09-04T14:16:55.000000Z

Building A Simple AI Chatbot With Web Speech API And Node.js


Translated from the English original.


Using voice commands has become pretty ubiquitous nowadays, as more mobile phone users use voice assistants such as Siri and Cortana, and as devices such as Amazon Echo and Google Home have been invading our living rooms. These systems are built with speech recognition software that allows their users to issue voice commands. Now, our web browsers will become familiar with the Web Speech API, which allows users to integrate voice data into web apps.


With the current state of web apps, we can rely on various UI elements to interact with users. With the Web Speech API, we can develop rich web applications with natural user interactions and minimal visual interface, using voice commands. This enables countless use cases for richer web applications. Moreover, the API can make web apps accessible, helping people with physical or cognitive disabilities or injuries. The future web will be more conversational and accessible!


ENHANCING USER EXPERIENCE
The Web Speech API enables websites and web apps not only to speak to you, but to listen, too. Take a look at just some great examples of how it can be used to enhance the user experience.


In this tutorial, we will use the API to create an artificial intelligence (AI) voice chat interface in the browser. The app will listen to the user’s voice and reply with a synthetic voice. Because the Web Speech API is still experimental, the app works only in supported browsers. The features used for this article, both speech recognition and speech synthesis, are currently only in the Chromium-based browsers, including Chrome 25+ and Opera 27+, while Firefox, Edge and Safari support only speech synthesis at the moment.


To build the web app, we’re going to take three major steps:
1. Use the Web Speech API’s SpeechRecognition interface to listen to the user’s voice.
2. Send the user's message to a commercial natural-language-processing API as a text string.
3. Once API.AI returns the response text back, use the SpeechSynthesis interface to give it a synthetic voice.


The entire source code used for this tutorial is on GitHub.


PREREQUISITES


This tutorial relies on Node.js. You’ll need to be comfortable with JavaScript and have a basic understanding of Node.js.
Make sure Node.js is installed on your machine, and then we’ll get started!


Setting Up Your Node.js Application

First, let’s set up a web app framework with Node.js. Create your app directory, and set up your app’s structure like this:


.
├── index.js
├── public
│   ├── css
│   │   └── style.css
│   └── js
│       └── script.js
└── views
    └── index.html

Then, run this command to initialize your Node.js app:

npm init -f

The -f accepts the default setting, or else you can configure the app manually without the flag. Also, this will generate a package.json file that contains the basic info for your app.

Now, install all of the dependencies needed to build this app:


$ npm install express socket.io apiai --save

We are going to use Express, a Node.js web application server framework, to run the server locally. To enable real-time bidirectional communication between the server and the browser, we’ll use Socket.IO. Also, we’ll install the natural language processing service tool, API.AI in order to build an AI chatbot that can have an artificial conversation.


Socket.IO is a library that enables us to use WebSocket easily with Node.js. By establishing a socket connection between the client and server, our chat messages will be passed back and forth between the browser and our server, as soon as text data is returned by the Web Speech API (the voice message) or by API.AI API (the “AI” message).

Now, let’s create an index.js file and instantiate Express and listen to the server:


const express = require('express');
const app = express();

app.use(express.static(__dirname + '/views')); // html
app.use(express.static(__dirname + '/public')); // js, css, images

const server = app.listen(5000);

app.get('/', (req, res) => {
  res.sendFile('index.html');
});

Now, let’s work on our app! In the next step, we will integrate[vt. 使完整,结合] the front-end code with the Web Speech API.


Receiving Speech With The SpeechRecognition Interface


The Web Speech API has a main controller interface, named SpeechRecognition, to receive the user’s speech from a microphone and understand what they’re saying.


Creating The User Interface

The UI of this app is simple: just a button to trigger voice recognition. Let’s set up our index.html file and include our front-end JavaScript file (script.js) and Socket.IO, which we will use later to enable the real-time communication:


<html lang="en">
  <head></head>
  <body>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/2.0.1/socket.io.js"></script>
    <script src="js/script.js"></script>
  </body>
</html>

Then, add a button interface in the HTML’s body:


<button>Talk</button>

To style the button as seen in the demo, refer to the style.css file in the source code.


Capturing Voice With JavaScript


In script.js, invoke an instance of SpeechRecognition, the controller interface of the Web Speech API for voice recognition:

const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();

We’re including both prefixed and non-prefixed objects, because Chrome currently supports the API with prefixed properties.
Also, we are using some of ECMAScript6 syntax in this tutorial, because the syntax, including the const and arrow functions, are available in browsers that support both Speech API interfaces, SpeechRecognition and SpeechSynthesis.
Optionally[adj. 可选择的;adv. 随意的], you can set varieties of properties to customize speech recognition:


recognition.lang = 'en-US';
recognition.interimResults = false;
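Beyond lang and interimResults, the interface also exposes properties such as continuous and maxAlternatives. As a rough sketch, you could gather your options in one object and apply them in one place; the configureRecognition helper and the settings object below are our own additions for illustration, not part of the Web Speech API:

```javascript
// A small helper of our own (not part of the Web Speech API) that
// copies an options object onto a recognition instance.
function configureRecognition(recognition, options) {
  Object.keys(options).forEach((key) => {
    recognition[key] = options[key];
  });
  return recognition;
}

// continuous keeps listening across pauses; maxAlternatives asks the
// engine for up to N candidate transcriptions per result.
const settings = {
  lang: 'en-US',
  interimResults: false,
  continuous: true,
  maxAlternatives: 3
};
```

In the browser you would then call configureRecognition(recognition, settings) right after creating the recognition instance.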

Then, capture the DOM reference for the button UI, and listen for the click event to initiate speech recognition:


document.querySelector('button').addEventListener('click', () => {
  recognition.start();
});

Once speech recognition has started, use the result event to retrieve what was said as text:


recognition.addEventListener('result', (e) => {
  let last = e.results.length - 1;
  let text = e.results[last][0].transcript;

  console.log('Confidence: ' + e.results[0][0].confidence);

  // We will use the Socket.IO here later…
});

This will return a SpeechRecognitionResultList object containing the result, and you can retrieve the text in the array. Also, as you can see in the code sample, this will return the confidence for the transcription, too.
Now, let’s use Socket.IO to pass the result to our server code.

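Before moving on, the indexing in the handler above can be made concrete. The sketch below is our own (latestTranscript is not part of the API): a standalone helper that pulls the top alternative of the most recent result from anything shaped like a SpeechRecognitionResultList.

```javascript
// A helper of our own, not part of the Web Speech API: given anything
// shaped like a SpeechRecognitionResultList (a list of results, each a
// list of alternatives carrying transcript and confidence), return the
// top alternative of the most recent result.
function latestTranscript(results) {
  const last = results.length - 1;
  return {
    text: results[last][0].transcript,
    confidence: results[last][0].confidence
  };
}
```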

Real-Time Communication With Socket.IO


Socket.IO is a library for real-time web applications. It enables real-time bidirectional communication between web clients and servers. We are going to use it to pass the result from the browser to the Node.js code, and then pass the response back to the browser.
You may be wondering why we are not using simple HTTP or AJAX instead. You could send data to the server via POST. However, we are using WebSocket via Socket.IO because sockets are the best solution for bidirectional communication, especially when pushing an event from the server to the browser. With a continuous socket connection, we won't need to reload the browser or keep sending AJAX requests at frequent intervals.

Instantiate Socket.IO in script.js somewhere:

const socket = io();

Then, insert this code where you are listening to the result event from SpeechRecognition:


socket.emit('chat message', text);

Now, let’s go back to the Node.js code to receive this text and use AI to reply to the user.


Getting A Reply From AI


Numerous platforms and services enable you to integrate an app with an AI system using speech-to-text and natural language processing, including IBM’s Watson, Microsoft’s LUIS and Wit.ai. To build a quick conversational interface, we will use API.AI, because it provides a free developer account and allows us to set up a small-talk system quickly using its web interface and Node.js library.


Setting Up API.AI

Once you’ve created an account, create an “agent.” Refer to the “Getting Started” guide, step one.
Then, instead of going the full customization route by creating entities and intents, simply click the “Small Talk” preset from the left menu, then toggle the switch to enable the service.


Customize your small-talk agent as you’d like using the API.AI interface.

Go to the “General Settings” page by clicking the cog icon next to your agent’s name in the menu, and get your API key. You will need the “client access token” to use the Node.js SDK.

USING THE API.AI NODE.JS SDK


Let’s hook up our Node.js app to API.AI using the latter’s Node.js SDK! Go back to your index.js file and initialize API.AI with your access token:


const apiai = require('apiai')(APIAI_TOKEN);

If you just want to run the code locally, you can hardcode your API key here. There are multiple ways to set your environment variables, but I usually set an .env file to include the variables. In the source code on GitHub, I’ve hidden my own credentials by including the file with .gitignore, but you can look at the .env-test file to see how it is set.

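One minimal way to resolve the token is to prefer an environment variable and fall back to a placeholder for local experiments. APIAI_TOKEN is the variable name used in index.js; the getApiaiToken helper and the fallback string are our own additions for illustration.

```javascript
// A sketch of one way to resolve the API.AI token: read it from the
// environment, with a placeholder fallback for local experiments.
function getApiaiToken(env) {
  return env.APIAI_TOKEN || 'your-client-access-token';
}

const APIAI_TOKEN = getApiaiToken(process.env);
```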

Now we are using the server-side Socket.IO to receive the result from the browser.
Once the connection is established and the message is received, use the API.AI APIs to retrieve a reply to the user’s message:


// Attach Socket.IO to the Express server created earlier in index.js
const io = require('socket.io')(server);

io.on('connection', function(socket) {
  socket.on('chat message', (text) => {
    // Get a reply from API.AI
    let apiaiReq = apiai.textRequest(text, {
      sessionId: APIAI_SESSION_ID
    });

    apiaiReq.on('response', (response) => {
      let aiText = response.result.fulfillment.speech;
      socket.emit('bot reply', aiText); // Send the result back to the browser!
    });

    apiaiReq.on('error', (error) => {
      console.log(error);
    });

    apiaiReq.end();
  });
});

When API.AI returns the result, use Socket.IO’s socket.emit() to send it back to the browser.

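Since the reply text sits several levels deep in the response object, a defensive accessor can be handy. This is a sketch of our own; extractReply and its fallback sentence are not part of the API.AI SDK:

```javascript
// A defensive accessor of our own (not part of the API.AI SDK): pull
// result.fulfillment.speech out of a response object, falling back to
// a placeholder sentence when any level is missing.
function extractReply(response) {
  const result = response && response.result;
  const fulfillment = result && result.fulfillment;
  return (fulfillment && fulfillment.speech) || 'Sorry, I did not get that.';
}
```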

Giving The AI A Voice With The SpeechSynthesis Interface

Let’s go back to script.js once again to finish off the app!
Create a function to generate a synthetic voice. This time, we are using the SpeechSynthesis controller interface of the Web Speech API.
The function takes a string as an argument and enables the browser to speak the text:

function synthVoice(text) {
  const synth = window.speechSynthesis;
  const utterance = new SpeechSynthesisUtterance();
  utterance.text = text;
  synth.speak(utterance);
}

In the function, first, create a reference to the API entry point, window.speechSynthesis. You might notice that there is no prefixed property this time: this API is more widely supported than SpeechRecognition, and all browsers that support it have already dropped the prefix for SpeechSynthesis.

Then, create a new SpeechSynthesisUtterance() instance using its constructor, and set the text that will be synthesised when the utterance is spoken. You can set other properties, such as voice, to choose the type of voice that the browser and operating system support.

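For instance, to pick a voice by language tag from the list that speechSynthesis.getVoices() returns, a small helper of our own (pickVoice is a hypothetical name, not part of the API) might look like this. It only assumes an array of objects with lang and name fields:

```javascript
// A hypothetical helper: pick the first voice whose lang matches the
// requested language tag, falling back to the first voice available,
// or null when no voices exist at all.
function pickVoice(voices, lang) {
  const match = voices.find((voice) => voice.lang === lang);
  return match || voices[0] || null;
}
```

In synthVoice(), you could then set utterance.voice = pickVoice(window.speechSynthesis.getVoices(), 'en-GB') before calling synth.speak(utterance).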
Finally, use the SpeechSynthesis.speak() to let it speak!
Now, get the response from the server using Socket.IO again. Once the message is received, call the function.

socket.on('bot reply', function(replyText) {
  synthVoice(replyText);
});

You are done! Let’s try a chit-chat with our AI bot!

Note that the browser will ask for your permission to use the microphone the first time. Like other web APIs, such as the Geolocation API and the Notification API, the browser will never access your sensitive information unless you grant it permission, so your voice will not be secretly recorded without your knowledge.
API.AI is configurable and trainable. Read the API.AI documentation to make it smarter.

References

This tutorial covered only the core features of the API, but the API is actually quite flexible and customizable. You can change the recognition language and the synthesized voice, including the accent (such as US or UK English), the pitch and the speaking rate. You can learn more about the API here:

For natural-language-processing tools, you can refer to the following:
* API.AI (Google)
* Wit.ai (Facebook)
* LUIS (Microsoft)
* Watson (IBM)
* Lex (Amazon)
