
Building A Simple AI Chatbot With Web Speech API And Node.js

Using voice commands has become pretty ubiquitous nowadays, as more mobile phone users use voice assistants such as Siri and Cortana, and as devices such as Amazon Echo and Google Home [1] invade our living rooms. These systems are built with speech recognition software that allows their users to issue voice commands [2]. Now, our web browsers are becoming familiar with the Web Speech API, which allows users to integrate voice data into web apps.

With the current state of web apps, we can rely on various UI elements to interact with users. With the Web Speech API, we can develop rich web applications with natural user interactions and a minimal visual interface, using voice commands. This enables countless use cases for richer web applications. Moreover, the API can make web apps more accessible, helping people [3] with physical or cognitive disabilities or injuries. The future web will be more conversational and accessible!

Enhancing User Experience

The Web Speech API enables websites and web apps not only to speak to you, but to listen, too. Take a look at some great examples of how it can be used to enhance the user experience. Read more → [4]

In this tutorial, we will use the API to create an artificial intelligence (AI) voice chat interface in the browser. The app will listen to the user’s voice and reply with a synthetic voice. Because the Web Speech API is still experimental, the app works only in supported browsers [5]. The features used in this article, speech recognition and speech synthesis, are currently available only in Chromium-based browsers, including Chrome 25+ and Opera 27+, while Firefox, Edge and Safari support only speech synthesis at the moment.
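Because support varies, it is wise to feature-detect both interfaces before relying on them. Here is a minimal sketch; checking the prefixed webkitSpeechRecognition covers Chrome’s current implementation:

// Detect support for both Web Speech API interfaces before using them.
const supportsRecognition =
  'SpeechRecognition' in window || 'webkitSpeechRecognition' in window;
const supportsSynthesis = 'speechSynthesis' in window;

if (!supportsRecognition) {
  console.warn('Speech recognition is not supported in this browser.');
}
if (!supportsSynthesis) {
  console.warn('Speech synthesis is not supported in this browser.');
}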

Browser compatibility [6] (View large version [7])

This video shows the demo in Chrome, and this is what we are going to build in this tutorial!

A simple AI chatbot demo with the Web Speech API

To build the web app, we’re going to take three major steps:

  1. Use the Web Speech API’s SpeechRecognition interface to listen to the user’s voice.
  2. Send the user’s message to a commercial natural-language-processing API as a text string.
  3. Once API.AI returns the response text back, use the SpeechSynthesis interface to give it a synthetic voice.
The app flow [8]

The entire source code [9] used for this tutorial is on GitHub.

Prerequisites

This tutorial relies on Node.js. You’ll need to be comfortable with JavaScript and have a basic understanding of Node.js.

Make sure Node.js [10] is installed on your machine, and then we’ll get started!

Setting Up Your Node.js Application

First, let’s set up a web app framework with Node.js. Create your app directory, and set up your app’s structure like this:

.
├── index.js
├── public
│   ├── css
│   │   └── style.css
│   └── js
│       └── script.js
└── views
    └── index.html

Then, run this command to initialize your Node.js app:

$ npm init -f

The -f flag accepts the default settings; otherwise, omit the flag and configure the app manually. This will generate a package.json file that contains the basic info for your app.
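For reference, the generated file looks roughly like this; the name is taken from your directory, so treat the values below as placeholders:

{
  "name": "ai-chatbot-demo",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  }
}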

Now, install all of the dependencies needed to build this app:

$ npm install express socket.io apiai --save

With the --save flag added, your package.json file will be automatically updated with the dependencies.

We are going to use Express [11], a Node.js web application server framework, to run the server locally. To enable real-time bidirectional communication between the server and the browser, we’ll use Socket.IO [12]. Also, we’ll install the natural-language-processing service, API.AI [13], in order to build an AI chatbot that can hold an artificial conversation.

Socket.IO is a library that enables us to use WebSocket easily with Node.js. By establishing a socket connection between the client and server, our chat messages will be passed back and forth between the browser and our server, as soon as text data is returned by the Web Speech API (the voice message) or by the API.AI API (the “AI” message).

Now, let’s create an index.js file and instantiate Express and listen to the server:

const express = require('express');
const app = express();

app.use(express.static(__dirname + '/views')); // html
app.use(express.static(__dirname + '/public')); // js, css, images

const server = app.listen(5000);
app.get('/', (req, res) => {
  res.sendFile('index.html', {root: __dirname + '/views'}); // sendFile needs an absolute path or a root option
});
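At this point, you can already start the server and confirm that the static files are served:

$ node index.js

Then visit http://localhost:5000 in your browser; you should see your index.html page.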

Now, let’s work on our app! In the next step, we will integrate the front-end code with the Web Speech API.

Receiving Speech With The SpeechRecognition Interface

The Web Speech API has a main controller interface, named SpeechRecognition [14], to receive the user’s speech from a microphone and understand what they’re saying.

Creating the User Interface

The UI of this app is simple: just a button to trigger voice recognition. Let’s set up our index.html file and include our front-end JavaScript file (script.js) and Socket.IO, which we will use later to enable the real-time communication:

<html lang="en">
  <head>…</head>
  <body>
    …
    <script src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/2.0.1/socket.io.js"></script>
    <script src="js/script.js"></script>
  </body>
</html>

Then, add a button interface in the HTML’s body:

<button>Talk</button>

To style the button as seen in the demo, refer to the style.css file in the source code [15].
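If you’d rather not dig through the repository right away, a minimal sketch along these lines will do; these rules are illustrative, not the demo’s actual styles:

button {
  padding: 1em 2em;
  border: none;
  border-radius: 4px;
  background: #4b6584;
  color: #fff;
  font-size: 1em;
  cursor: pointer;
}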

Capturing Voice With JavaScript

In script.js, create an instance of SpeechRecognition [16], the controller interface of the Web Speech API for voice recognition:

const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new SpeechRecognition();

We’re including both prefixed and non-prefixed objects, because Chrome currently supports the API with prefixed properties.

Also, we are using some ECMAScript 6 syntax in this tutorial, because the syntax, including const and arrow functions, is available in browsers that support both Speech API interfaces, SpeechRecognition and SpeechSynthesis.

Optionally, you can set a variety of properties [17] to customize speech recognition:

recognition.lang = 'en-US';         // Recognize US English
recognition.interimResults = false; // Return only final results, not interim guesses

Then, capture the DOM reference for the button UI, and listen for the click event to initiate speech recognition:

document.querySelector('button').addEventListener('click', () => {
  recognition.start();
});

Once speech recognition has started, use the result event to retrieve what was said as text:

recognition.addEventListener('result', (e) => {
  let last = e.results.length - 1;
  let text = e.results[last][0].transcript;

  console.log('Confidence: ' + e.results[0][0].confidence);

  // We will use the Socket.IO here later…
});

This will return a SpeechRecognitionResultList object containing the result, and you can retrieve the transcribed text from the array. Also, as you can see in the code sample, this will return the confidence for the transcription, too.
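In a real app, you will also want to know when recognition stops or fails. A minimal sketch using two other standard SpeechRecognition events:

recognition.addEventListener('speechend', () => {
  recognition.stop(); // Stop listening once the user finishes speaking
});

recognition.addEventListener('error', (e) => {
  console.error('Speech recognition error: ' + e.error);
});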

Now, let’s use Socket.IO to pass the result to our server code.

Real-Time Communication With Socket.IO

Socket.IO [18] is a library for real-time web applications. It enables real-time bidirectional communication between web clients and servers. We are going to use it to pass the result from the browser to the Node.js code, and then pass the response back to the browser.

You may be wondering why we are not using simple HTTP or AJAX instead. You could send data to the server via POST. However, we are using WebSocket via Socket.IO because sockets are the best solution for bidirectional communication, especially when pushing an event from the server to the browser. With a continuous socket connection, we won’t need to reload the browser or keep sending AJAX requests at frequent intervals.

Socket.IO in the app [19]

Instantiate Socket.IO somewhere in script.js:

const socket = io();

Then, insert this code where you are listening to the result event from SpeechRecognition:

socket.emit('chat message', text);

Now, let’s go back to the Node.js code to receive this text and use AI to reply to the user.

Getting A Reply From AI

Numerous platforms and services enable you to integrate an app with an AI system using speech-to-text and natural language processing, including IBM’s Watson [20], Microsoft’s LUIS [21] and Wit.ai [22]. To build a quick conversational interface, we will use API.AI, because it provides a free developer account and lets us set up a small-talk system quickly using its web interface and Node.js library.

Setting Up API.AI

Once you’ve created an account, create an “agent.” Refer to step one of the “Getting Started [23]” guide.

Then, instead of going the full customization route by creating entities and intents, simply click the “Small Talk” preset in the left menu, and then toggle the switch to enable the service.

API.AI Small Talk [24] (View large version [25])

Customize your small-talk agent as you’d like using the API.AI interface.

Go to the “General Settings” page by clicking the cog icon next to your agent’s name in the menu, and get your API key. You will need the “client access token” to use the Node.js SDK.

Using the API.AI Node.js SDK

Let’s hook up our Node.js app to API.AI using its Node.js SDK! Go back to your index.js file and initialize API.AI with your access token:

const APIAI_TOKEN = process.env.APIAI_TOKEN; // Your API.AI "client access token"
const apiai = require('apiai')(APIAI_TOKEN);

If you just want to run the code locally, you can hardcode your API key here. There are multiple ways to set environment variables, but I usually set up an .env file to include the variables. In the source code on GitHub, I’ve hidden my own credentials by listing the file in .gitignore, but you can look at the .env-test [26] file to see how it is set up.
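For example, the .env file is just a list of KEY=value pairs; the variable names below are the ones the code reads via process.env, and the values are placeholders:

APIAI_TOKEN=your-client-access-token
APIAI_SESSION_ID=any-unique-session-id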

Now, let’s use the server-side Socket.IO to receive the result from the browser. Attach Socket.IO to the server instance we created earlier; then, once the connection is established and a message is received, use the API.AI API to retrieve a reply to the user’s message:

const io = require('socket.io')(server); // Attach Socket.IO to the HTTP server
const APIAI_SESSION_ID = process.env.APIAI_SESSION_ID; // Any unique string identifying this conversation

io.on('connection', function(socket) {
  socket.on('chat message', (text) => {

    // Get a reply from API.AI

    let apiaiReq = apiai.textRequest(text, {
      sessionId: APIAI_SESSION_ID
    });

    apiaiReq.on('response', (response) => {
      let aiText = response.result.fulfillment.speech;
      socket.emit('bot reply', aiText); // Send the result back to the browser!
    });

    apiaiReq.on('error', (error) => {
      console.log(error);
    });

    apiaiReq.end();

  });
});

When API.AI returns the result, use Socket.IO’s socket.emit() to send it back to the browser.

Giving The AI A Voice With The SpeechSynthesis Interface

Let’s go back to script.js once again to finish off the app!

Create a function to generate a synthetic voice. This time, we are using the SpeechSynthesis controller interface of the Web Speech API.

The function takes a string as an argument and enables the browser to speak the text:

function synthVoice(text) {
  const synth = window.speechSynthesis;
  const utterance = new SpeechSynthesisUtterance();
  utterance.text = text;
  synth.speak(utterance);
}

In the function, first, create a reference to the API entry point, window.speechSynthesis. You might notice that there is no prefixed property this time: this API is more widely supported than SpeechRecognition, and all browsers that support it have already dropped the prefix for SpeechSynthesis.

Then, create a new SpeechSynthesisUtterance [27] instance using its constructor, and set the text that will be synthesized when the utterance is spoken. You can set other properties [28], such as voice, to choose from the set of voices that the browser and operating system support.
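For instance, to prefer a US English voice over the platform default, you could try something like this sketch. Note that getVoices() may return an empty array until the browser has loaded its voices, so production code should also listen for the voiceschanged event:

const voices = window.speechSynthesis.getVoices();
const usVoice = voices.find((voice) => voice.lang === 'en-US');
if (usVoice) {
  utterance.voice = usVoice; // Otherwise, the browser's default voice is used
}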

Finally, use SpeechSynthesis.speak() to let it speak!

Now, get the response from the server using Socket.IO again. Once the message is received, call the function:

socket.on('bot reply', function(replyText) {
  synthVoice(replyText);
});

You are done! Let’s have a chit-chat with our AI bot!

Demo in GIF animation [29] (View large version [30])

Note that the browser will ask for permission to use the microphone the first time. Like other web APIs, such as the Geolocation API and the Notification API, the browser never accesses your sensitive information unless you grant permission, so your voice will not be secretly recorded without your knowledge.

You will soon get bored with the conversation because the AI is too simple. However, API.AI is configurable and trainable. Read the API.AI documentation [31] to make it smarter.

I hope you’ve enjoyed the tutorial and created a fun chatbot!

Push The Web To The Future!

Voice interaction has transformed the way users control computing and connected devices. Now with the Web Speech API, the user experience is transforming on the web, too. Combined with AI and deep learning, your web apps will become more intelligent and provide better experiences for users!

References

This tutorial has covered only the core features of the API, but the API is actually pretty flexible and customizable. You can change the language of recognition and synthesis, the synthetic voice (including the accent, like US or UK English), the speech pitch and the speech rate. You can learn more about the API here:

  • “Web Speech API,” Mozilla Developer Network [32]
  • “Web Speech API Specification,” W3C [33]
  • “Web Speech API,” Microsoft Edge Dev Guide [34]

Also, to learn about Node.js and the libraries used in this tutorial, check out the following:

  • Node.js guides [35]
  • npm documentation [36]
  • “Hello World Example,” Express [37]
  • “Get Started,” Socket.IO [38]

Finally, check out the different natural-language-processing tools and conversational platforms:

  • API.AI [39]
  • Wit.ai [40]
  • LUIS, Microsoft [41]
  • Watson, IBM [42]
  • Lex, Amazon [43]

(rb, yk, al, il)

Footnotes

  1. https://www.smashingmagazine.com/2017/05/build-action-google-home-api-ai/
  2. https://www.smashingmagazine.com/2017/05/designing-voice-experiences/
  3. https://www.smashingmagazine.com/2015/03/web-accessibility-with-accessibility-api/
  4. https://www.smashingmagazine.com/2014/12/enhancing-ux-with-the-web-speech-api/
  5. http://caniuse.com/#search=speech
  6. https://www.smashingmagazine.com/wp-content/uploads/2017/06/browser-webspeech-large-opt-1.png
  7. https://www.smashingmagazine.com/wp-content/uploads/2017/06/browser-webspeech-large-opt-1.png
  8. https://www.smashingmagazine.com/wp-content/uploads/2017/06/chatapp_with_web-speech_api-preview-opt-1.png
  9. https://github.com/girliemac/web-speech-ai
  10. https://nodejs.org
  11. https://expressjs.com/
  12. https://socket.io
  13. https://api.ai
  14. https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition
  15. https://github.com/girliemac/web-speech-ai
  16. https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition
  17. https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition
  18. https://socket.io/
  19. https://www.smashingmagazine.com/wp-content/uploads/2017/06/using_socketio-preview-opt-1.png
  20. https://www.ibm.com/watson/
  21. https://www.luis.ai/
  22. https://wit.ai/
  23. https://docs.api.ai/docs/get-started
  24. https://www.smashingmagazine.com/wp-content/uploads/2017/06/apiai-smalltalk-large-opt-1.png
  25. https://www.smashingmagazine.com/wp-content/uploads/2017/06/apiai-smalltalk-large-opt-1.png
  26. https://github.com/girliemac/web-speech-ai/blob/master/.env_test
  27. https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesisUtterance
  28. https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesisUtterance
  29. https://www.smashingmagazine.com/wp-content/uploads/2017/06/webspeech-api-demo.gif
  30. https://www.smashingmagazine.com/wp-content/uploads/2017/06/webspeech-api-demo.gif
  31. https://docs.api.ai/
  32. https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API
  33. https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
  34. https://docs.microsoft.com/en-us/microsoft-edge/dev-guide/multimedia/web-speech-api
  35. https://nodejs.org/en/docs/guides/
  36. https://docs.npmjs.com/
  37. https://expressjs.com/en/starter/hello-world.html
  38. https://socket.io/get-started/
  39. https://api.ai/
  40. https://wit.ai/
  41. https://www.luis.ai/
  42. https://www.ibm.com/watson/
  43. https://aws.amazon.com/lex/


Tomomi Imura (a.k.a. girlie_mac) is an avid open web and open technology advocate and a creative technologist, currently working at Slack in San Francisco. When she is not at work, she still geeks out and tries to combine technology with cats; her past projects include HTTP Status Cats and Raspberry Pi Cat Camera.

Comments

  1. So a speech recognition API and Node.js help in creating a chatbot. But what if we can’t work with Node.js? Are there any other tools that help to create a chatbot?

    • Yep. If you’d taken the time to take a look at the API.ai site, you might have found out that it supports a lot of different programming languages.

      I had the very same question and asked myself: “Why not shed the usual trendy Node.js requirement and try ‘Building a simple chatbot with Speech API and LAMP’ instead?”

      As it turns out, API.ai has an unofficial PHP SDK, referenced in their official documentation.

      For bidirectional communication, a.k.a. real-time communication (RTC), there is a simple overview article at phpbuilder which lists a few options. Scavenging GitHub and sf.net should turn up a few more, though.

      That should be enough pieces to build a LAMP-centric version of this Node.js application :)

      cu, w0lf.

    • Hi Salman,
      This article is mainly about the web API in browsers, and the Node.js part is really optional. But if you’d like to use api.ai with the web API like I do, they have SDKs for other languages:
      https://api.ai/docs/sdks
  2. Great article, Tomomi!
    One thing I’d like to confirm, though, is the start script command in your git repo’s package.json file: nf start. Does this mean we’ll have to install foreman as a dev dependency? I didn’t see it mentioned in your article or in the current package.json, so just curious if I’m missing something here.
  3. 6

    hello, Tomomi.
    I set up webspeech in my server according to your guide and github source.
    But, it’s not working in my browser. (Chrome, edge, firefox)
    These browser is latetest version.
    In all browser, microphone is not working.
    In the case of chrome, chrome block microphone function.
    Could you solve this issue?
    I tried your test site (webspeech.herokuapp.com).
    It happen same problem.
    Thanks.

    0
  4. franckstifler (August 11, 2017): Where is SESSION_ID coming from? I can’t find it in the API.AI documentation.

    • I am in the same boat; I don’t know where the session ID is from.

    • Yeah, can you clarify?

      • In the source code I found this snippet:

        const APIAI_TOKEN = process.env.APIAI_TOKEN;
        const APIAI_SESSION_ID = process.env.APIAI_SESSION_ID;

        This suggests both the API key and the API session ID will be provided by API.AI.

        • George Perivolarakis (August 12, 2017): I create my own session for each user in a config file.

          APIAI_SESSION_ID: `session_${Math.random() * 10000}`

          const APIAI_SESSION_ID = process.env.APIAI_SESSION_ID || config.APIAI_SESSION_ID;

          And the result is ‘my session’ from the config.
          It works with a hardcoded string also, but I guess it should be unique.

          • Georgios Perivolarakis (August 13, 2017): Actually, in the response JSON from api.ai there is a property "sessionId": "431bcfec-37d9-43bc-97d6-b52a2e826cff".

    • It is the developer access token given by api.ai. That isn’t clear in the tutorial, though.

    • Raghu answered the question :-)

      I guess I omitted a bit too much info I should’ve added. I tried not to make the tutorial too complex by talking about api.ai too much… but I missed out on something I really needed to include.

      Thank you all for pointing it out!
      <3
  5. I’m sure it’s something staring me right in the face, but how do I find the API.AI session ID?
  6. This is such a lovely explanation. Thank you!
    This is my version of it: https://github.com/PezCoder/ai-chatbot
    I added listeners on ‘onspeechstart’ and ‘onspeechend’ to start and end animations accordingly.
    It’s fun!
  7. The session ID is used to keep a particular user’s conversation unique and manage the context on the api.ai end.
    The session ID can be any random number; to run the app, we can hardcode it with any value.
    But it’s better to create a session ID for each logged-in user and pass that same session ID to api.ai.

    The main problem I am facing is that in the socket.on method we do not receive the request object, so we are not able to get the session for each unique user.

    • Thanks for addressing the issue. Yes, you’re right about the session per user. I simplified the app to make it easier to write up the tutorial, but I didn’t make that part clear.

      I’ll need to spend some time taking a look at the issue!
  8. When I’m trying to run the server, I get a “‘clientAccessToken’ cannot be empty” error.
    In the .env file, I assigned APIAI_TOKEN to the client access token from the api.ai console, as well as APIAI_SESSION_ID to the developer access token from the api.ai console. Why is it failing?

    • Try making a direct change in the index.js file, like below:
      const APIAI_TOKEN = '275563c346d544899a86a9165c608b55';
      const APIAI_SESSION_ID = '123';
  9. Jakub Jedryszek (August 16, 2017): You can use my voiceCmdr library for adding voice commands: https://github.com/jj09/voiceCmdr
  10. Hello, Tomomi.
    I’m testing something using the Web Speech API, and I ran into something a little strange.

    After I set up web-speech (with api.ai), I tested it.
    But the voice (TTS) sounds Korean.
    When I watch your demo video, the voice sounds American.
    How can I switch to an American voice?
  11. I am getting this error response from api.ai:

    { [Error: connect ECONNREFUSED 35.190.15.252:443]
    code: 'ECONNREFUSED',
    errno: 'ECONNREFUSED',
    syscall: 'connect',
    address: '35.190.15.252',
    port: 443 }

    There seems to be an open issue:
    https://github.com/api-ai/apiai-nodejs-client/issues/66

    I am using the same codebase; I wonder how it works for you?
  12. Facing the same issue as Luis: when I’m trying to run the server, I get a “‘clientAccessToken’ cannot be empty” error, with APIAI_TOKEN and APIAI_SESSION_ID set in the .env file.
    Please help…
  13. That works absolutely fine. But I am facing a problem: when I enter http://localhost:5000/ it works, but if I instead enter my PC’s local IP address, like http://10.11.201.93:5000/, then when I press the mic it results in “Bot replied: Error: not-allowed”.

    I edited the app.listen code in index.js like this:

    const server = app.listen(5000, '0.0.0.0', function() {
      console.log('Express server listening on port %d in %s mode', server.address().port, app.settings.env);
    });

    But it’s still not working for me. Could someone help me in this regard?
