In this short article I would like to introduce Apache NLPCraft - an open source library for adding a Natural Language Interface to any application. It enables people to interact with your products using voice or text. The goal of this project since its inception in 2017 has been straightforward: provide an efficient and highly productive API for developing advanced NLP-based interfaces for modern applications.
Latest Version
At the time of this writing, the project is undergoing incubation at the Apache Software Foundation, and the latest version of NLPCraft is 0.9.0. In some rare cases, APIs might have changed since this article was written. Consult the latest documentation for up-to-date information.
These are some of the key features that should give you a flavor of what NLPCraft is and what it can do:
Advanced Intent Definition Language (IDL) coupled with deterministic intent matching provides ease of use and unprecedented expressiveness for designing real-life, non-trivial intents.
Easily compose, mix and match new named entities out of built-in or external ones, creating new reusable named entity recognizers on the fly.
Everything you do with NLPCraft is part of your source code. No more awkward web UIs splitting your logic across different incompatible places. Model-as-a-code is built by engineers, and it reflects how engineers work.
The REST API and Java-based implementation natively support the world's largest ecosystem of development tools, multiple programming languages, frameworks and services.
NLPCraft can work with any data source, device, or service - public or private. From databases and SaaS systems, to smart home devices, voice assistants and chatbots.
Advanced out-of-the-box support for maintaining and managing conversational context that is fully integrated with intent matching.
There are two terms that are important to define before we proceed:
When working with NLPCraft you will be dealing with these three main components:
Let’s see how NLPCraft works using a prototypical smart home NLP-based light switch as an example. We would like to be able to say things like "Turn off all the lights in the house" or "Light up the garage, please!" to control our lights. Note that NLPCraft does not deal with voice-to-text conversion - so this example deals only with the text regardless of its origin.
To accomplish that we need to first develop a data model that involves the following:
For our example we’ll need to define three NEs in our model:
If we can detect these NEs in the input text we can perform the necessary operation.
Once we have an idea of what NEs we’ll need, we can start building the data model that defines how these NEs can be detected and, ultimately, what intents we are going to detect using these NEs.
In NLPCraft the default built-in approach for detecting NEs is synonym matching. For each NE you provide a list of synonyms by which it will be found in the input text. To make this task really simple NLPCraft comes with a set of tools including Macro DSL and Intent Definition Language (IDL). Here’s the static model configuration for our example, stored as the lightswitch_model.yaml file, that includes the NE definitions and one intent:
id: "nlpcraft.lightswitch.ex"
name: "Light Switch Example Model"
version: "1.0"
description: "NLI-powered light switch example model."
macros:
  - name: "<ACTION>"
    macro: "{turn|switch|dial|control|let|set|get|put}"
  - name: "<ENTIRE_OPT>"
    macro: "{entire|full|whole|total|_}"
  - name: "<LIGHT>"
    macro: "{all|_} {it|them|light|illumination|lamp|lamplight}"
enabledBuiltInTokens: [] # This example doesn't use any built-in tokens.
elements:
  - id: "ls:loc"
    description: "Location of lights."
    synonyms:
      - "<ENTIRE_OPT> {upstairs|downstairs|_} {kitchen|library|closet|garage|office|playroom|{dinning|laundry|play} room}"
      - "<ENTIRE_OPT> {upstairs|downstairs|_} {master|kid|children|child|guest|_} {bedroom|bathroom|washroom|storage} {closet|_}"
      - "<ENTIRE_OPT> {house|home|building|{1st|first} floor|{2nd|second} floor}"
  - id: "ls:on"
    groups:
      - "act"
    description: "Light switch ON action."
    synonyms:
      - "<ACTION> {on|up|_} <LIGHT> {on|up|_}"
      - "<LIGHT> {on|up}"
  - id: "ls:off"
    groups:
      - "act"
    description: "Light switch OFF action."
    synonyms:
      - "<ACTION> <LIGHT> {off|out}"
      - "{<ACTION>|shut|kill|stop|eliminate} {off|out} <LIGHT>"
      - "no <LIGHT>"
intents:
  - "intent=ls term(act)={has(tok_groups, 'act')} term(loc)={# == 'ls:loc'}*"
NOTES:
Element "ls:loc" defines the location.
Element "ls:on" defines the ON action.
Element "ls:off" defines the OFF action.
Elements "ls:on" and "ls:off" are placed into group "act" for easier use in the intent.
Intent "ls" uses Intent Definition Language. In short - this intent will match if we can detect one element from the group "act" and zero or more "ls:loc" elements in any order.
Notice that Macro DSL allows you to define hundreds and often thousands of synonyms for each model element with only a few lines of YAML (or JSON). In the above model, for example, the three elements have over 7,700 unique synonyms after all Macro DSL processing.
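To get a feel for how a few lines of macros grow into so many synonyms, the expansion can be approximated in plain Scala. This is a minimal sketch, not NLPCraft's implementation: it assumes only the syntax visible in the model above ({a|b} for alternatives, _ for an empty alternative, <NAME> for a macro reference), and the names MacroSketch, substitute and expand are made up for illustration. It ignores stemming and any other processing the library applies, so its counts will not match the library's.

```scala
object MacroSketch {
  // Replaces each macro name (e.g. "<LIGHT>") with its body in a single pass.
  // Simplification: macros that reference other macros are not handled.
  def substitute(s: String, macros: Map[String, String]): String =
    macros.foldLeft(s) { case (acc, (name, body)) => acc.replace(name, body) }

  // Returns the index of the '}' that closes the '{' found at index `i`.
  @annotation.tailrec
  private def matchingBrace(s: String, i: Int, depth: Int = 0): Int =
    if (i >= s.length) throw new IllegalArgumentException(s"Unbalanced braces in: $s")
    else s(i) match {
      case '{'               => matchingBrace(s, i + 1, depth + 1)
      case '}' if depth == 1 => i
      case '}'               => matchingBrace(s, i + 1, depth - 1)
      case _                 => matchingBrace(s, i + 1, depth)
    }

  // Splits a group body on '|', ignoring separators inside nested groups.
  private def splitTop(body: String): List[String] = {
    val parts = scala.collection.mutable.ListBuffer(new StringBuilder)
    var depth = 0
    for (c <- body) c match {
      case '{'               => depth += 1; parts.last.append(c)
      case '}'               => depth -= 1; parts.last.append(c)
      case '|' if depth == 0 => parts += new StringBuilder
      case _                 => parts.last.append(c)
    }
    parts.toList.map(_.toString)
  }

  // Expands the first group, then combines each alternative with every
  // expansion of the remaining text. Whitespace is left untouched here.
  private def expandRaw(s: String): List[String] = {
    val open = s.indexOf('{')
    if (open < 0) List(s)
    else {
      val close = matchingBrace(s, open)
      val alts = splitTop(s.substring(open + 1, close)).flatMap { alt =>
        if (alt.trim == "_") List("") else expandRaw(alt)
      }
      val rests = expandRaw(s.substring(close + 1))
      for (a <- alts; r <- rests) yield s.substring(0, open) + a + r
    }
  }

  // Collapses runs of whitespace left behind by empty alternatives.
  private def normalize(s: String): String = s.trim.split("\\s+").mkString(" ")

  // All distinct strings described by a synonym definition without macros.
  def expand(s: String): Set[String] = expandRaw(s).map(normalize).toSet
}
```

With the <ACTION> and <LIGHT> macros from the model, expanding the single synonym line "<ACTION> <LIGHT> {off|out}" this way gives 8 x 12 x 2 = 192 distinct strings, which shows how quickly the combinations multiply.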
Now, dealing with synonyms may sound limiting at first. It is, however, a surprisingly powerful and flexible mechanism for domain-specific applications:
It’s also important to note that if the synonym-based approach is not enough, NLPCraft also supports custom NE resolvers that can use any desired approach and technique.
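The condition expressed by the "ls" intent in the model can also be made concrete in plain Scala. The following is only a sketch of that one condition (exactly one element from group "act" plus zero or more "ls:loc" elements, in any order), not the library's matching algorithm. Tok and matchLs are hypothetical names that are not part of the NLPCraft API, and treating any other detected element as a mismatch is an assumption of this sketch.

```scala
object IntentSketch {
  // Hypothetical stand-in for a detected named entity; not NLPCraft's NCToken.
  final case class Tok(id: String, groups: Set[String] = Set.empty)

  // Mirrors: term(act)={has(tok_groups, 'act')} term(loc)={# == 'ls:loc'}*
  // Returns the matched action and locations, or None if the intent does not apply.
  def matchLs(toks: List[Tok]): Option[(Tok, List[Tok])] = {
    val (acts, rest) = toks.partition(_.groups.contains("act"))
    val (locs, others) = rest.partition(_.id == "ls:loc")
    acts match {
      // Exactly one action, any number of locations, nothing left over (assumption).
      case act :: Nil if others.isEmpty => Some((act, locs))
      case _                            => None
    }
  }
}
```

Because the tokens are partitioned rather than scanned in sequence, the order in which the action and the locations appear in the input does not matter.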
Although full explanation of the intent matching algorithm is outside the scope of this article, the basic workflow looks like this:
Below is a full implementation of the data model in Scala (the same implementation in Java or Kotlin is practically identical):
package org.apache.nlpcraft.examples.lightswitch

import org.apache.nlpcraft.model.{NCIntentTerm, _}

class LightSwitchModel extends NCModelFileAdapter("org/apache/nlpcraft/examples/lightswitch/lightswitch_model.yaml") {
    @NCIntentRef("ls")
    def onMatch(
        @NCIntentTerm("act") actTok: NCToken,
        @NCIntentTerm("loc") locToks: List[NCToken]
    ): NCResult = {
        val status = if (actTok.getId == "ls:on") "on" else "off"
        val locations =
            if (locToks.isEmpty)
                "entire house"
            else
                locToks.map(_.meta[String]("nlpcraft:nlp:origtext")).mkString(", ")

        // Add HomeKit, Arduino or other integration here.

        // By default - return a descriptive action string.
        NCResult.text(s"Lights are [$status] in [${locations.toLowerCase}].")
    }
}
NOTES:
Method onMatch(...) is a callback function for our intent "ls" (defined above in the lightswitch_model.yaml file).
Method onMatch(...) has two input parameters:
The token matched by the "act" term.
The list of tokens matched by the "loc" term.
The implementation of the onMatch(...) method is pretty straightforward: it detects the desired action from the "actTok" parameter and the places to operate the lights on from the "locToks" parameter.
We’ll leave the details of the particular integration with HomeKit or Arduino devices outside of this article. We’ll also defer to the NLPCraft documentation for other topics such as conversation management, details of Macro DSL and Intent Definition Language, built-in testing tools, 3rd party NER integrations, etc.
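The reply-building part of the callback can be tried out without NLPCraft on the classpath by isolating it as a pure function. This is a sketch under that assumption: LightSwitchLogic and reply are hypothetical names, and the function takes the element ID and the original location texts as plain strings instead of NCToken instances.

```scala
object LightSwitchLogic {
  // Builds the same reply text as onMatch(...) from plain values.
  // actId:    element ID of the matched action, e.g. "ls:on" or "ls:off".
  // locTexts: original text of each matched location (may be empty).
  def reply(actId: String, locTexts: List[String]): String = {
    val status = if (actId == "ls:on") "on" else "off"
    val locations =
      if (locTexts.isEmpty) "entire house"
      else locTexts.mkString(", ")

    s"Lights are [$status] in [${locations.toLowerCase}]."
  }
}
```

For example, LightSwitchLogic.reply("ls:off", List("Guest Bedroom")) returns "Lights are [off] in [guest bedroom]." and an empty location list falls back to the entire house.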
In a couple dozen lines of code we’ve created a non-trivial application that provides a free-form natural language interface for operating a simple light switch. You can ask it many things like:
"Turn the lights off in the entire house."
"Switch on the illumination in the master bedroom closet."
"Get the lights on."
"Lights up in the kitchen."
"Please, put the light out in the upstairs bedroom."
"Set the lights on in the entire house."
"Turn the lights off in the guest bedroom."
"Could you please switch off all the lights?"
"Dial off illumination on the 2nd floor."
"Please, no lights!"
"Kill off all the lights now!"
"No lights in the bedroom, please."
You can also extend this model to support different locations or actions like dimming or brightening the lights. The model-as-a-code approach enables you to iterate on this in a matter of minutes without heavy re-training cycles.
Hopefully, this short introduction to Apache NLPCraft gave you a hint of what it is capable of and what you can do with it.