The synonym suggester tool takes an existing model, analyses its synonyms and intents, and produces a list of currently missing synonyms that you might want to add to your model.
This tool is accessed via a REST call. It is based on Google's BERT and Facebook's fastText models. It requires @NCIntentSample or @NCIntentSampleRef annotations to be present on intent callbacks. When invoked, the tool scans the given data model for intents and these annotations and, based on these samples, tries to determine which synonyms are missing from the model.
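For illustration, here is a minimal sketch of an intent callback carrying such sample annotations, loosely following the Alarm example; the model descriptor file name, intent expression and sample utterances are illustrative only, and the exact IDL syntax depends on your NLPCraft version:
import org.apache.nlpcraft.model.*;

public class AlarmModel extends NCModelFileAdapter {
    public AlarmModel() {
        super("alarm_model.json"); // Hypothetical model descriptor file.
    }

    // Intent expression is illustrative only.
    @NCIntent("intent=alarm term={id == 'x:alarm'}")
    // Sample utterances that the synonym suggester uses as its input.
    @NCIntentSample({
        "Ping me in three minutes",
        "Buzz me in an hour",
        "Set an alarm for tomorrow morning"
    })
    NCResult onAlarm(NCIntentMatch ctx) {
        return NCResult.text("Alarm set.");
    }
}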
Single Word Synonyms
The synonym suggester tool analyses only single-word synonyms, ignoring any multi-word synonyms. You can often convert a named element with multi-word synonyms into a combination of multiple named elements, each with single-word synonyms, using the Composable NERs technique.
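For example, here is a hypothetical sketch in the YAML model descriptor format (element IDs and synonym lists are made up): instead of a single element with a multi-word synonym such as "turn on the light", which the suggester would ignore, the same meaning can be covered by composable elements that each carry only single-word synonyms:
# Illustrative only: element IDs and synonym lists are made up.
elements:
  - id: "x:action"
    synonyms:
      - "turn"
      - "switch"
      - "toggle"
  - id: "x:light"
    synonyms:
      - "light"
      - "lamp"
      - "bulb"
The intent can then match a combination of x:action and x:light tokens rather than one multi-word element.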
In order to use this tool, the ctxword server and the NLPCraft server should be started, and the server's configuration may need to be updated.
ctxword Server
As of this writing (Dec 2020), the ctxword server and its dependencies work only with Python 3.6-3.8.
'ctxword' server is a Python-based module that provides a BERT and fastText based implementation for finding contextually related words for a given word in the input sentence. NLPCraft server interacts with the 'ctxword' server via an internal REST interface. To configure the NLPCraft server and start the 'ctxword' Python-based server, follow these steps:
Install necessary dependencies by running the following commands from the NLPCraft installation directory:
NOTE: this step should only be performed once.
$ cd nlpcraft/src/main/python/ctxword
$ bin/install_dependencies.sh
On Windows, read the src\main\python\ctxword\bin\WINDOWS_SETUP.md file for manual installation instructions.
Check the nlpcraft.server.ctxword.url property in the nlpcraft.conf file (or your own configuration file). This property comes with a default endpoint, and you only need to change it if you change the 'ctxword' module implementation.
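For reference, this is a minimal sketch of how that property looks in the HOCON-based nlpcraft.conf; the endpoint value shown is illustrative, so check your configuration file for the actual default:
nlpcraft {
    server {
        ctxword {
            # Endpoint of the local 'ctxword' server (illustrative value).
            url = "http://localhost:5000"
        }
    }
}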
Start the 'ctxword' server:
$ cd nlpcraft/src/main/python/ctxword
$ bin/start_server.{sh|cmd}
1st Start
Note that on the first start the server will try to load a compressed BERT model which is not yet available. It will then download and compress this model, which will take several minutes and may require 10 GB+ of available memory. Subsequent starts will skip this step, and the server will start much faster.
The NLPCraft REST server should be started. The synonym tool can then be run in two different ways:
$ bin/nlpcraft.sh help --cmd=model-sugsyn
$ bin/nlpcraft.sh model-sugsyn --mdlId=nlpcraft.alarm.ex --minScore=0.5
NOTES:
- The mdlId parameter is only required if there is more than one model deployed in the connected data probe. If the data probe has only one model, you can omit this parameter.
- minScore - optional minimum confidence score to include into the result, ranging from 0 to 1; default is 0. minScore of 0 will include all results, and minScore of 1 will include only results with the absolutely highest confidence score. Values between 0.5 and 0.7 are generally suggested.
- Use nlpcraft.sh on Linux/macOS and nlpcraft.cmd on Windows.
- Run bin/nlpcraft.sh help --cmd=model-sugsyn to get the full help on this command.
The REST API accepts only POST HTTP calls and the application/json content type for JSON payload and responses. When issuing a REST call for this tool, you will use the following URL:
https://localhost:8081/api/v1/model/sugsyn
where:
- http - http or https protocol.
- localhost:8081 - localhost:8081 is the default configuration and can be changed.
- /api/v1 - API version prefix.
- model/sugsyn - the REST call for the synonym suggester tool.
The parameters should be passed in as JSON:
{ "acsTok": "qweqw9123uqwe", "mdlId": "nlpcraft.alarm.ex", "minScore": 0.5 }
where:
- acsTok - access token obtained via the previous '/signin' call.
- mdlId - ID of the model to run the synonym suggester on.
- minScore - optional minimum confidence score to include into the result, ranging from 0 to 1; default is 0. minScore of 0 will include all results, and minScore of 1 will include only results with the absolutely highest confidence score. Values between 0.5 and 0.7 are generally suggested.
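For reference, a call with this payload could be scripted, for example, with Java's standard HTTP client as in the sketch below; the access token and model ID are the placeholders from the example above, and the endpoint should match your server's configuration:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SugSynCall {
    public static void main(String[] args) throws Exception {
        // Access token (from a prior '/signin' call) and model ID are placeholders.
        String payload =
            "{ \"acsTok\": \"qweqw9123uqwe\", \"mdlId\": \"nlpcraft.alarm.ex\", \"minScore\": 0.5 }";

        HttpRequest req = HttpRequest.newBuilder()
            .uri(URI.create("https://localhost:8081/api/v1/model/sugsyn"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(payload))
            .build();

        HttpResponse<String> resp = HttpClient.newHttpClient()
            .send(req, HttpResponse.BodyHandlers.ofString());

        System.out.println(resp.body()); // JSON result as shown below.
    }
}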
Either way, the synonym suggester returns the following JSON result (nlpcraft.alarm.ex model from the Alarm example):
{ "status": "API_OK", "result": { "modelId": "nlpcraft.alarm.ex", "minScore": 0.5, "durationMs": 424.0, "timestamp": 1.60091239852E12, "suggestions": [ { "x:alarm": [ { "score": 1.0, "synonym": "ask" }, { "score": 0.9477103542042674, "synonym": "join" }, { "score": 0.8882341083867801, "synonym": "get" }, { "score": 0.7330826349218547, "synonym": "remember" }, { "score": 0.6902880910527778, "synonym": "contact" }, { "score": 0.6014764219771813, "synonym": "time" }, { "score": 0.5816398376889104, "synonym": "follow" }, { "score": 0.5640882890681899, "synonym": "watch" }, { "score": 0.5139855649326083, "synonym": "stop" }, { "score": 0.5136895804732818, "synonym": "kill" }, { "score": 0.5001167992233122, "synonym": "send" } ] } ], "warnings": [ "Model has too few (3) intents samples. It will negatively affect the quality of suggestions. Try to increase overall sample count to at least 20." ] }
The result is structured as a list of proposed synonyms, with their corresponding scores, for each model element. You should analyse the results for their fitness to your model and its existing synonyms. The tool cannot guarantee that every suggested synonym is appropriate or valid - but it gives a good "courtesy" check for potentially missing synonyms.
Run Periodically
It is a good idea to run this tool periodically if you are actively changing the model. With dozens or hundreds of model elements, it is very hard to manually maintain a quality set of synonyms. With a good list of user input samples for each intent, this tool can be indispensable for easy maintenance of the synonyms.