In this chapter we'll look at testing bots and see how a clean separation of concerns can increase their testability.
Let's say we're building a chatbot which allows us to play music.
const { musicService } = require('./services');

const bot = async (req, res) => {
  const artistResult = /play (.*)/i.exec(req.text);
  if (artistResult) {
    const artist = artistResult[1];
    await musicService.playArtist(artist);
    res.send(`Playing ${artist}`);
    return;
  }
  const randomResult = /play random/i.test(req.text);
  if (randomResult) {
    await musicService.playRandom();
    res.send(`Playing random music`);
    return;
  }
}
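How req and res reach bot depends entirely on the chatbot framework in play. Purely to illustrate the shape bot expects, here's a hypothetical console harness (it's a sketch, not part of any framework discussed in this chapter):

// Hypothetical harness -- just to show the req/res shape bot expects.
const readline = require('readline');

const rl = readline.createInterface({ input: process.stdin, output: process.stdout });

rl.on('line', line => {
  const req = { text: line };                      // bot reads req.text
  const res = { send: text => console.log(text) }; // bot replies via res.send
  bot(req, res).catch(console.error);
});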
Now let's test it. We need to create a mocked version of res so that we can easily examine the results of send, and create stubs for our musicService with Sinon.JS to ensure our tests don't actually call the service:
require('mocha');
const { expect } = require('chai');
const { musicService } = require('./services');
const { createSandbox } = require('sinon');

describe("bot", () => {
  const tests = [
    ['play artist', ['Playing artist']],
    ['play random', ['Playing random music']],
  ];
  const sandbox = createSandbox();
  beforeEach(() => {
    sandbox.stub(musicService, 'playArtist').resolves();
    sandbox.stub(musicService, 'playRandom').resolves();
  });
  afterEach(() => {
    sandbox.restore();
  });
  for (const [text, expectedResponses] of tests) {
    it(`should respond "${expectedResponses}" when user says "${text}"`, done => {
      const actualResponses = [];
      // mock res to add each response to our array
      const res = {
        send: text => actualResponses.push(text)
      };
      bot({ text }, res).then(() => {
        expect(actualResponses).deep.equals(expectedResponses);
        done();
      });
    });
  }
});
The results look good!
bot
  ✓ should respond "Playing artist" when user says "play artist"
  ✓ should respond "Playing random music" when user says "play random"

2 passing (11ms)
But these successful test results hide a problem: the test for playing a specific artist also matches when someone says "play random".
We could (should?) fix this by looking at the call counts for playArtist and playRandom. While this would work for our relatively simplistic bot, which contains two possible outcomes, each mapping to a single service call, it doesn't scale to anything remotely real world.
It's common for multiple calls to be made, and refactorings to occur. The former means lots of code to ensure the right functions are being called, and the latter requires cascading updates to all of the various utterances and logic paths we're testing. It won't take long before our test code becomes unwieldy.
Our situation is a direct result of the imperative style used to code bot -- at some point a decision is made to take an action, the action is taken, and that's that. The function representing our bot is both determining what action to perform and performing the action. It's hard to test, which is often an indication our code could be improved.
There's a different approach, which is to split this into two steps:
- One function which returns a result indicating which action to take
- One function that performs that action
Our test code can separately test the first function, and the action that it should take. And our running code would simply execute both steps.
Let's create a class representing "an action to take":
class ActionReference {
  constructor(name, ...args) {
    this.name = name;
    this.args = args;
  }
}
Now we can recode bot, starting with the decision-making piece:
const botLogic = async (req) => {
  const artistResult = /play (.*)/i.exec(req.text);
  if (artistResult) {
    return new ActionReference('playArtist', artistResult[1]);
  }
  const randomResult = /play random/i.test(req.text);
  if (randomResult) {
    return new ActionReference('playRandom');
  }
  // etc.
}
You may wonder why we created an ActionReference class when botLogic could simply have returned an anonymous object literal, as demonstrated below:
if (artistResult)
  return {
    name: 'playArtist',
    args: [artistResult[1]]
  };
Spoiler: later on we will create functions that return other values, and it's helpful to have a way to positively distinguish an ActionReference (by testing instanceof ActionReference) from non-ActionReference objects that might happen to include the same properties.
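As a quick sketch of that distinction:

// A plain object with the same shape is still not an ActionReference:
const literal = { name: 'playArtist', args: ['artist'] };
const reference = new ActionReference('playArtist', 'artist');

console.log(literal instanceof ActionReference);   // false
console.log(reference instanceof ActionReference); // true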
Note that botLogic only takes req as a parameter. In our refactor, botLogic only makes decisions about which action to take; it doesn't actually execute the action.
We'll recode our tests accordingly:
describe("botLogic", () => {
const tests = [
['play artist', 'playArtist', 'artist'],
['play random', 'playRandom'],
]
for (const [text, name, ...args] of tests) {
it(`should call ${name}(${args.join(', ')}) when user says "${text}"`, done => {
botLogic({ text }).then(result => {
expect(result).deep.equals(new ActionReference(name, ...args));
done();
});
});
}
})
Our new test results look like this:
botLogic
  ✓ should call playArtist(artist) when user says "play artist"
  1) should call playRandom() when user says "play random"

1 passing (11ms)
1 failing

1) should call playRandom() when user says "play random"
   AssertionError: expected { name: 'playArtist', args: [ 'random' ] } to deeply equal { name: 'playRandom', args: [] }
   + expected - actual

   {
   -  "args": [
   -    "random"
   -  ],
   -  "name": "playArtist"
   +  "args": [],
   +  "name": "playRandom"
   }
Now our test correctly spots our bug. Someone should really fix that thing.
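For completeness, one possible fix -- deliberately not applied here, since the lingering bug will keep proving useful -- is to check the more specific utterance first. A sketch (botLogicFixed is a hypothetical name):

// A possible fix (not applied in this chapter): match "play random" before
// the general "play <artist>" pattern so the artist branch can't swallow it.
const botLogicFixed = async req => {
  if (/play random/i.test(req.text))
    return new ActionReference('playRandom');
  const artistResult = /play (.*)/i.exec(req.text);
  if (artistResult)
    return new ActionReference('playArtist', artistResult[1]);
};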
All we need now is the final step - take the action. Here's one approach:
const botAction = async (res, action) => {
  if (!(action instanceof ActionReference))
    return;
  switch (action.name) {
    case 'playArtist':
      await musicService.playArtist(action.args[0]);
      res.send(`Playing ${action.args[0]}`);
      return;
    case 'playRandom':
      await musicService.playRandom();
      res.send(`Playing random music`);
      return;
    default:
      throw new Error(`Unknown name ${action.name}`);
  }
}
Note that botAction takes res as an argument, but not req -- that's because all the arguments it needs are in action.args. botAction is not analyzing what to do, as botLogic handles that responsibility; botAction is solely responsible for performing the action indicated by the ActionReference.
Now we can test each action individually. We might still need to resort to verifying the output, but that's fine, as we have solved the problem of "guess which code produced this output?" The code below does still assume a single service call for each action, but this could easily be updated as the calls grow and change.
describe("botAction", () => {
const tests = [
[['playArtist', 'artist'], ['Playing artist']],
[['playRandom'], ['Playing random music']],
];
for (const [[name, ...args], expectedResponses] of tests) {
it(`should respond "${expectedResponses}" on ${name}(${args.join(', ')})`, done => {
const actualResponses = [];
const stub = sinon.stub(musicService, name).resolves();
const res = {
send: text => actualResponses.push(text)
};
botAction(res, new ActionReference(name, args)).then(done => {
expect(actualResponses).deep.equals(expectedResponses);
expect(stub.calledWith(...args)).to.be.true();
done();
});
}
}
});
These tests look like this:
botAction
  ✓ should respond "Playing artist" on playArtist(artist)
  ✓ should respond "Playing random music" on playRandom()

2 passing (11ms)
This shows (correctly) that our issue was with the logic determining which action to take, and not with the implementation of the action itself.
As promised, we have split bot into two functions -- one which returns a result indicating which action to take, independent of how that action is coded, and one which takes that action, independent of the decision-making process which selected it.
We come full circle by combining the two:
const bot = (req, res) => botLogic(req).then(result => botAction(res, result));
This new bot is functionally equivalent to the original bot, but its concerns have been cleanly separated. Of course, there's nothing stopping us from integration testing bot using our original tests. The fact that the integration tests pass when the unit tests fail is a great example of the value of unit testing.
Note: some chatbot systems may not use cleanly separated req and res arguments -- that's too bad, because it makes it harder to enforce this desirable separation of concerns.
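If you're stuck with such a system, a small adapter can usually restore the separation. Here's a sketch that assumes a hypothetical framework handing us a single ctx object with a message.text field and a reply function (both names are assumptions, not a real API):

// Hypothetical adapter -- ctx.message.text and ctx.reply are assumed,
// not a real framework API.
const adaptContext = ctx => ({
  req: { text: ctx.message.text },
  res: { send: text => ctx.reply(text) },
});

// Usage:
// const { req, res } = adaptContext(ctx);
// await bot(req, res);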
This is progress, but we've introduced some new potential problems:
- Putting every action into a giant switch statement is a little awkward.
- botLogic can create ActionReferences whose names do not exist -- we'll only find out when we call botAction. Comprehensive tests will help, but defense in depth tells us not to rely on our tests.
It would be nice if botLogic could only create ActionReferences with specific names. We can do that by using an abstraction that creates ActionReferences for us, checking against a predefined list of functions. This also fixes the switch problem.
class ActionReferences {
  constructor(getActions) {
    this.getActions = getActions;
    this.reference = {};
    // getActions is only called here to enumerate the action names;
    // the actions themselves aren't invoked until doAction runs.
    for (const name of Object.keys(getActions()))
      this.reference[name] = (...args) => new ActionReference(name, ...args);
  }

  doAction(...args) {
    return (result) => {
      if (!(result instanceof ActionReference))
        return;
      const action = this.getActions(...args)[result.name];
      if (!action)
        throw new Error(`Unknown action ${result.name}`);
      return action(...result.args);
    };
  }
}
Now, instead of creating a botAction function by hand, we create a (very readable) list of actions as functions:
const actions = new ActionReferences(res => ({
  async playArtist(artist) {
    await musicService.playArtist(artist);
    res.send(`Playing ${artist}`);
  },
  async playRandom() {
    await musicService.playRandom();
    res.send(`Playing random music`);
  },
}));
Now when botLogic wants to create an ActionReference it does so using actions.reference, e.g.
return actions.reference.playArtist(artistResult[1]);
What's nice about this is that it resembles a normal function call, but instead of executing playArtist, it returns a reference to it.
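To make that concrete, here's a small sketch of what the reference actually holds:

// actions.reference.playArtist never touches musicService; it only wraps
// the action name and arguments in an ActionReference.
const ref = actions.reference.playArtist('artist');
console.log(ref instanceof ActionReference); // true
console.log(ref.name);                       // 'playArtist'
console.log(ref.args);                       // [ 'artist' ]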
Any attempt to reference a nonexistent action will immediately throw an error:
// throws a TypeError: actions.reference has no "goodDog" property,
// so actions.reference.goodDog is undefined and cannot be called
return actions.reference.goodDog();
Note that with suitable type definitions, TypeScript could also catch incorrect action names and argument types at compile time.
Our revised botLogic looks like this:
const botLogic = async req => {
  const artistResult = /play (.*)/i.exec(req.text);
  if (artistResult)
    return actions.reference.playArtist(artistResult[1]);
  const randomResult = /play random/i.test(req.text);
  if (randomResult)
    return actions.reference.playRandom();
}
Our revised bot looks like this:
const bot = (req, res) => botLogic(req).then(result => actions.doAction(res)(result));
Which we can simplify to:
const bot = (req, res) => botLogic(req).then(actions.doAction(res));
Our test for botLogic stays exactly the same, because botLogic is still just returning ActionReferences; it's just creating them a little differently.
We could reuse our botAction test by replacing calls to botAction(res, ...) with actions.doAction(res)(...), and/or we could individually test each action by calling e.g. actions.getActions(res).playArtist('artist').
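As a sketch of the latter approach, an individual action test (assuming the same mocha/chai/sinon setup as before) might look like this:

describe("actions", () => {
  const sandbox = createSandbox();
  afterEach(() => sandbox.restore());

  it("playArtist calls the service and sends a confirmation", async () => {
    const stub = sandbox.stub(musicService, 'playArtist').resolves();
    const actualResponses = [];
    const res = { send: text => actualResponses.push(text) };

    await actions.getActions(res).playArtist('artist');

    expect(actualResponses).deep.equals(['Playing artist']);
    expect(stub.calledWith('artist')).to.be.true;
  });
});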
In this chapter we saw how it's possible to increase the testability of bots via a clean separation of concerns. We created helper classes called ActionReference and ActionReferences, which also make our code more readable.
In the next chapter we'll see other advantages to structuring your application to return a result instead of taking an action directly.