Quick Introduction to Apache NLPCraft
Nikita Ivanov
November 16, 2020
This blog is an English adaptation of Sergey Kamov's blog written in Russian.

In this short article I would like to introduce Apache NLPCraft, an open source library for adding a natural language interface to any application. It enables people to interact with your products using voice or text. Since the project's inception in 2017, its goal has been straightforward: provide an efficient and highly productive API for developing advanced NLP-based interfaces for modern applications.

Latest Version

At the time of this writing, the project is undergoing incubation at the Apache Software Foundation, and the latest version of NLPCraft is 0.9.0. In some rare cases, APIs might have changed since this article was written. Consult the latest documentation for up-to-date information.

Overview

These are some of the key features that should give you a flavor of what NLPCraft is and what it can do:

Intent Definition Language

Advanced Intent Definition Language (IDL), coupled with deterministic intent matching, provides ease of use and unprecedented expressiveness for designing real-life, non-trivial intents.

Composable Named Entities

Easily compose, mix and match new named entities out of built-in or external ones, creating new reusable named entity recognizers on the fly.

Model-As-A-Code

Everything you do with NLPCraft is part of your source code. No more awkward web UIs splitting your logic across different incompatible places. Model-as-a-code is built by engineers, and it reflects how engineers work.

Java First

The REST API and Java-based implementation natively support the world's largest ecosystem of development tools, multiple programming languages, frameworks and services.

Any Data Source

NLPCraft can work with any data source, device, or service - public or private. From databases and SaaS systems, to smart home devices, voice assistants and chatbots.

Short-Term-Memory

Advanced out-of-the-box support for maintaining and managing conversational context that is fully integrated with intent matching.

Out-Of-The-Box Integration

NLPCraft natively integrates with 3rd party libraries for basic NLP processing and named entity recognition.

Terminology

There are two terms that are important to define before we proceed:

  • Named Entity (NE) - typically a real world object or entity that can be recognized in the input text (see https://www.wikiwand.com/en/Named_entity for details). Note that NEs can be universal, like a country or city name, or domain specific for each model.
  • Intent - a template made out of NEs and a callback function to call when such template is found in the input text.

Runtime Components

When working with NLPCraft you will be dealing with these three main components:

  • Data Model
    The data model encapsulates NLPCraft’s model-as-a-code approach, where the data model is simply an implementation of the NCDataModel interface. Typically, static model configuration and NE definitions are abstracted out to an external YAML or JSON file.
  • Data Probe
    A data probe is a lightweight container that securely deploys and manages data models. You can have multiple data probes and each data probe can host multiple models.
  • REST Server
    The REST server provides a URL endpoint for user applications to securely query data sources using natural language. The REST server routes such requests to a data probe that hosts the requested data model.
Fig 1. NLPCraft Runtime Components
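
To make the role of the REST server a bit more concrete, here is a minimal sketch of how a user application might send a question to it from Scala. Note the assumptions: the URL http://localhost:8081/api/v1/ask/sync and the JSON field names acsTok, mdlId and txt reflect my reading of the 0.9.0 REST API and should be checked against the REST API documentation, and the helper names (quote, askBody, post) are made up for this illustration. It uses the HTTP client that ships with Java 11 and later.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object RestAskSketch {
  // Minimal JSON string escaping (quotes and backslashes only).
  def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Builds the request body. Field names are assumptions, see the note above.
  def askBody(acsTok: String, mdlId: String, txt: String): String =
    s"""{"acsTok": ${quote(acsTok)}, "mdlId": ${quote(mdlId)}, "txt": ${quote(txt)}}"""

  // Sends a JSON POST request and returns the raw response body.
  def post(url: String, json: String): String = {
    val req = HttpRequest.newBuilder(URI.create(url))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(json))
      .build()
    HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString()).body()
  }
}
```

With a running server and probe, and an access token obtained from a prior sign-in call, something like RestAskSketch.post("http://localhost:8081/api/v1/ask/sync", RestAskSketch.askBody(token, "nlpcraft.lightswitch.ex", "Turn the lights off")) would ask the light switch model a question.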

Smart Home Example

Let’s see how NLPCraft works using a prototypical smart home NLP-based light switch as an example. We would like to be able to say things like "Turn off all the light in the house" or "Light up the garage, please!" to control our lights. Note that NLPCraft does not deal with voice-to-text conversion - so this example deals only with the text regardless of its origin.

To accomplish that we need to first develop a data model that involves the following:

  • Determining which NEs we need and how to detect them in the text.
  • Using these NEs to define intents, and coding the callbacks to invoke when such an intent is found in the input text.

Required Named Entities (NEs)

For our example we’ll need to define three NEs in our model:

  • A place where we need to turn on or off the lights like “kitchen” or “garage”.
  • Two actions: “on” and “off”.

If we can detect these NEs in the input text we can perform the necessary operation.

Data Model

Once we have an idea of what NEs we’ll need, we can start building the data model that defines how these NEs can be detected and, ultimately, what intents we are going to detect using these NEs.

In NLPCraft the default built-in approach for detecting NEs is synonym matching. For each NE you provide a list of synonyms by which it will be found in the input text. To make this task really simple NLPCraft comes with a set of tools including Macro DSL and Intent Definition Language (IDL). Here’s the static model configuration for our example, stored in the lightswitch_model.yaml file, that includes NE definitions and one intent:

        id: "nlpcraft.lightswitch.ex"
        name: "Light Switch Example Model"
        version: "1.0"
        description: "NLI-powered light switch example model."
        macros:
          - name: "<ACTION>"
            macro: "{turn|switch|dial|control|let|set|get|put}"
          - name: "<ENTIRE_OPT>"
            macro: "{entire|full|whole|total|_}"
          - name: "<LIGHT>"
            macro: "{all|_} {it|them|light|illumination|lamp|lamplight}"
        enabledBuiltInTokens: [] # This example doesn't use any built-in tokens.
        elements:
          - id: "ls:loc"
            description: "Location of lights."
            synonyms:
              - "<ENTIRE_OPT> {upstairs|downstairs|_} {kitchen|library|closet|garage|office|playroom|{dining|laundry|play} room}"
              - "<ENTIRE_OPT> {upstairs|downstairs|_} {master|kid|children|child|guest|_} {bedroom|bathroom|washroom|storage} {closet|_}"
              - "<ENTIRE_OPT> {house|home|building|{1st|first} floor|{2nd|second} floor}"
        
          - id: "ls:on"
            groups:
              - "act"
            description: "Light switch ON action."
            synonyms:
              - "<ACTION> {on|up|_} <LIGHT> {on|up|_}"
              - "<LIGHT> {on|up}"
        
          - id: "ls:off"
            groups:
              - "act"
            description: "Light switch OFF action."
            synonyms:
              - "<ACTION> <LIGHT> {off|out}"
              - "{<ACTION>|shut|kill|stop|eliminate} {off|out} <LIGHT>"
              - "no <LIGHT>"
        intents:
          - "intent=ls term(act)={has(tok_groups, 'act')} term(loc)={# == 'ls:loc'}*"
    

NOTES:

  • We define 3 model elements (corresponding to the NEs we’ve come up with earlier):
    • "ls:loc" to define the location.
    • "ls:on" to define ON action.
    • "ls:off" to define OFF action.
  • We grouped "ls:on" and "ls:off" into group "act" for easier use in the intent.
  • Each model element is defined through synonyms using Macro DSL.
  • Intent "ls" uses Intent Definition Language. In short - this intent will match if we can detect one element from the group "act" and zero or more "ls:loc" elements in any order.

Notice that Macro DSL allows you to define hundreds and often thousands of synonyms for each model element with only a few lines of YAML (or JSON). In the above model, for example, the three elements have over 7,700 unique synonyms after all Macro DSL processing.
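
To get a feel for where these numbers come from, here is a small, self-contained sketch of how option groups like {a|b|_} multiply out. It is not NLPCraft's actual implementation: MacroExpansionSketch and its methods are hypothetical names, it only handles the constructs used in the model above (macro references, nested {…|…} groups and the empty option "_"), and it assumes macros do not reference themselves.

```scala
object MacroExpansionSketch {
  // Replaces every "<NAME>" reference with its macro body until nothing changes.
  def substitute(pattern: String, macros: Map[String, String]): String = {
    val next = macros.foldLeft(pattern) { case (s, (name, body)) => s.replace(name, body) }
    if (next == pattern) pattern else substitute(next, macros)
  }

  // Joins every left string with every right string, skipping empty parts.
  private def combine(left: Set[String], right: Set[String]): Set[String] =
    for (l <- left; r <- right) yield Seq(l, r).filter(_.nonEmpty).mkString(" ")

  // Expands '|'-separated alternatives starting at `start`, stopping at '}' or end of input.
  // Returns the expanded strings and the index where parsing stopped.
  private def parseAlternatives(s: String, start: Int): (Set[String], Int) = {
    var i = start
    var finished = Set.empty[String] // expansions of already completed alternatives
    var current = Set("")            // expansions of the alternative being parsed
    val word = new StringBuilder

    def flushWord(): Unit =
      if (word.nonEmpty) {
        val w = word.toString
        current = combine(current, Set(if (w == "_") "" else w)) // "_" is the empty option
        word.clear()
      }

    while (i < s.length && s.charAt(i) != '}') {
      s.charAt(i) match {
        case '{' =>
          flushWord()
          val (inner, end) = parseAlternatives(s, i + 1)
          current = combine(current, inner)
          i = end + 1 // skip the closing '}'
        case '|' =>
          flushWord()
          finished = finished ++ current
          current = Set("")
          i += 1
        case c if c.isWhitespace =>
          flushWord()
          i += 1
        case c =>
          word.append(c)
          i += 1
      }
    }
    flushWord()
    (finished ++ current, i)
  }

  // Expands a synonym pattern into the set of plain-text synonyms it stands for.
  def expand(pattern: String, macros: Map[String, String] = Map.empty): Set[String] =
    parseAlternatives(substitute(pattern, macros), 0)._1
}
```

For example, expanding the synonym "<LIGHT> {on|up}" with the <LIGHT> macro from the model gives 2 × 6 × 2 = 24 strings such as "all lamp up" and "light on". Longer patterns with more option groups grow multiplicatively, which is how a handful of YAML lines reaches thousands of synonyms.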

Now, dealing with synonyms may sound limiting at first. It is, however, a surprisingly powerful and flexible mechanism for the domain-specific applications:

  • Synonyms can be developed, extended and tested using many tools that NLPCraft provides like Macro DSL, Intent Definition Language and Synonym Suggester (that uses Google’s BERT and Facebook’s fastText models).
  • Classic NLP problems like word-sense disambiguation (“bass” the fish, and “bass” the sound) are not a real concern with semantic modelling for domain-specific models.
  • Unlike neural network-based approaches, synonym-based NEs and intents don’t require expensive and time-consuming model training and creation of specialized large text corpora.
  • Synonym-based NE detection and intent matching are fully deterministic (i.e. predictable), in stark contrast to deep learning approaches that allow only for probabilistic results.

It’s also important to note that if the synonym-based approach is not enough, NLPCraft also supports custom NE resolvers that can use any desired approach and technique.

Intent Matching

Although a full explanation of the intent matching algorithm is outside the scope of this article, the basic workflow looks like this:

  • User text is tokenized into individual words and tokens.
  • Each token is lemmatized and stemmed, and basic syntactic and grammatical information (such as POS tags) is obtained.
  • Using these tokens the system tries to detect all NEs defined in the model.
  • Found NEs are matched against all of the model’s intents and the callback for the winning match, if any, is called.
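
The workflow above can be sketched in plain Scala as a toy pipeline. This is a deliberately simplified illustration and not NLPCraft's algorithm: the "stemming" is just lowercasing and dropping a plural "s", NE detection is a greedy longest-match lookup over a tiny hand-written synonym table, and the intent check only mirrors the shape of the "ls" intent (exactly one action, any number of locations). All names here (IntentMatchingSketch, Entity, sampleSynonyms and so on) are hypothetical.

```scala
object IntentMatchingSketch {
  // A detected named entity: model element ID plus the (normalized) matched text.
  final case class Entity(id: String, text: String)

  // Tiny stand-in for the model's synonyms (the real model has thousands).
  val sampleSynonyms: Map[String, Set[String]] = Map(
    "ls:on"  -> Set("turn on lights", "lights on"),
    "ls:off" -> Set("turn off lights", "lights off", "no lights"),
    "ls:loc" -> Set("kitchen", "garage", "master bedroom")
  )

  // Crude stand-in for lemmatization/stemming: lowercase and drop a plural "s".
  def normalize(word: String): String = {
    val w = word.toLowerCase
    if (w.length > 3 && w.endsWith("s")) w.dropRight(1) else w
  }

  // Step 1 and 2: split into words and normalize each one.
  def tokenize(text: String): Vector[String] =
    text.split("[^\\p{L}\\p{Nd}]+").toVector.filter(_.nonEmpty).map(normalize)

  // Step 3: greedy longest-match detection of synonyms in the token stream.
  def detect(tokens: Vector[String], synonyms: Map[String, Set[String]]): Vector[Entity] = {
    val lookup: Map[Vector[String], String] =
      synonyms.toVector.flatMap { case (id, phrases) =>
        phrases.toVector.map(p => tokenize(p) -> id)
      }.toMap
    val maxLen = if (lookup.isEmpty) 0 else lookup.keys.map(_.length).max

    @scala.annotation.tailrec
    def loop(pos: Int, found: Vector[Entity]): Vector[Entity] =
      if (pos >= tokens.length) found
      else {
        val longest = math.min(maxLen, tokens.length - pos)
        // Try the longest candidate phrase first.
        val hit = (longest to 1 by -1).iterator
          .map(n => tokens.slice(pos, pos + n))
          .find(lookup.contains)
        hit match {
          case Some(phrase) =>
            loop(pos + phrase.length, found :+ Entity(lookup(phrase), phrase.mkString(" ")))
          case None =>
            loop(pos + 1, found) // no NE starts here, skip the token
        }
      }

    loop(0, Vector.empty)
  }

  // Step 4: match only if exactly one action was found; locations are optional.
  def matchIntent(entities: Vector[Entity], actIds: Set[String]): Option[(Entity, Vector[Entity])] = {
    val acts = entities.filter(e => actIds.contains(e.id))
    val locs = entities.filter(_.id == "ls:loc")
    if (acts.size == 1) Some((acts.head, locs)) else None
  }
}
```

Feeding "Turn off lights in the master bedroom, please!" through tokenize, detect and matchIntent yields the "ls:off" action together with one "ls:loc" entity, while a sentence with no recognizable action produces no match. Because every step is a plain lookup, the same input always gives the same result.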

Model Implementation

Below is a full implementation of the data model in Scala (the same implementation in Java or Kotlin is practically identical):

        package org.apache.nlpcraft.examples.lightswitch

        import org.apache.nlpcraft.model.{NCIntentTerm, _}

        class LightSwitchModel extends NCModelFileAdapter("org/apache/nlpcraft/examples/lightswitch/lightswitch_model.yaml") {
            @NCIntentRef("ls")
            def onMatch(
                @NCIntentTerm("act") actTok: NCToken,
                @NCIntentTerm("loc") locToks: List[NCToken]
            ): NCResult = {
                val status = if (actTok.getId == "ls:on") "on" else "off"
                val locations =
                    if (locToks.isEmpty)
                        "entire house"
                    else
                        locToks.map(_.meta[String]("nlpcraft:nlp:origtext")).mkString(", ")

                // Add HomeKit, Arduino or other integration here.

                // By default - return a descriptive action string.
                NCResult.text(s"Lights are [$status] in [${locations.toLowerCase}].")
            }
        }
    

NOTES:

  • We use the NCModelFileAdapter adapter that allows us to load our static model configuration from a YAML file.
  • Method onMatch(...) is a callback function for our intent "ls" (defined above in the lightswitch_model.yaml file).
  • Method onMatch(...) has two input parameters:
    • A single token from the "act" term.
    • A list of tokens (zero or more) from the "loc" term.
  • The body of the onMatch(...) method is pretty straightforward: it detects the desired action from "actTok" and the places to operate the lights on from "locToks" parameters.
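
One design choice worth considering is to keep the decision logic of the callback in a plain function that does not depend on NLPCraft types. The sketch below is my own illustration rather than part of the example project, and LightSwitchLogicSketch and describe are hypothetical names. It reproduces the string that onMatch(...) builds from an element ID and a list of location texts, so this logic can be checked without starting a server or a probe.

```scala
object LightSwitchLogicSketch {
  // Mirrors the body of onMatch(...): picks the status from the action ID and
  // falls back to "entire house" when no location was given.
  def describe(actId: String, locations: Seq[String]): String = {
    val status = if (actId == "ls:on") "on" else "off"
    val where = if (locations.isEmpty) "entire house" else locations.mkString(", ")
    s"Lights are [$status] in [${where.toLowerCase}]."
  }
}
```

With this in place, the callback would only need to extract the action ID and the original text of each location token and pass them to describe(...).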

The details of a particular integration with HomeKit or Arduino devices are outside the scope of this article. We’ll also defer to the NLPCraft documentation for other topics such as conversation management, details of Macro DSL and Intent Definition Language, built-in testing tools, 3rd party NER integrations, etc.

Conclusion

In a couple dozen lines of code we’ve created a non-trivial application with a free-form natural language interface for operating a simple light switch. You can ask it many things like:

  • "Turn the lights off in the entire house."
  • "Switch on the illumination in the master bedroom closet."
  • "Get the lights on."
  • "Lights up in the kitchen."
  • "Please, put the light out in the upstairs bedroom."
  • "Set the lights on in the entire house."
  • "Turn the lights off in the guest bedroom."
  • "Could you please switch off all the lights?"
  • "Dial off illumination on the 2nd floor."
  • "Please, no lights!"
  • "Kill off all the lights now!"
  • "No lights in the bedroom, please."

You can also extend this model to support different locations or actions like dimming or brightening the lights, etc. The model-as-a-code approach enables you to iterate on this in a matter of minutes without heavy re-training cycles.

Hopefully, this short introduction to Apache NLPCraft gave you a hint of what it is capable of and what you can do with it.
