This example provides a very simple French language implementation of an NLI-powered light switch. You can say something like "Éteignez les lumières dans toute la maison" or "Allumez les lumières". By modifying the intent callbacks, for example to call HomeKit or Arduino-based controllers, you can provide the actual light switching.
Source code: GitHub
Review: All Examples at GitHub
You can create new Scala projects in many ways - we'll use SBT for this task. Make sure that the build.sbt file has the following content:
ThisBuild / version := "0.1.0-SNAPSHOT"
ThisBuild / scalaVersion := "3.2.2"
lazy val root = (project in file("."))
  .settings(
    name := "NLPCraft LightSwitch FR Example",
    version := "1.0.0",
    libraryDependencies += "org.apache.nlpcraft" % "nlpcraft" % "1.0.0",
    libraryDependencies += "org.apache.lucene" % "lucene-analyzers-common" % "8.11.2",
    libraryDependencies += "org.languagetool" % "languagetool-core" % "6.0",
    libraryDependencies += "org.languagetool" % "language-fr" % "6.0",
    libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.15" % "test"
  )
Lines 8, 9 and 10 add the libraries used to support basic NLP operations for the French language.
NOTE: use the latest versions of Scala and ScalaTest.
Create the following files so that the resulting project structure looks like this:
lightswitch_model_fr.yaml - YAML configuration file which contains the model description.
LightSwitchFrModel.scala - Model implementation.
NCFrSemanticEntityParser.scala - Semantic entity parser, a custom implementation of NCSemanticEntityParser for the French language.
NCFrLemmaPosTokenEnricher.scala - Lemma and part-of-speech token enricher, a custom implementation of NCTokenEnricher for the French language.
NCFrStopWordsTokenEnricher.scala - Stop-words token enricher, a custom implementation of NCTokenEnricher for the French language.
NCFrTokenParser.scala - Token parser, a custom implementation of NCTokenParser for the French language.
LightSwitchFrModelSpec.scala - Test that allows you to test your model.

|   build.sbt
+--project
|       build.properties
\--src
   +--main
   |  +--resources
   |  |      lightswitch_model_fr.yaml
   |  \--scala
   |     \--demo
   |        |   LightSwitchFrModel.scala
   |        \--nlp
   |           +--entity
   |           |  \--parser
   |           |         NCFrSemanticEntityParser.scala
   |           \--token
   |              +--enricher
   |              |      NCFrLemmaPosTokenEnricher.scala
   |              |      NCFrStopWordsTokenEnricher.scala
   |              \--parser
   |                     NCFrTokenParser.scala
   \--test
      \--scala
         \--demo
                LightSwitchFrModelSpec.scala
We are going to start by declaring the static part of our model in YAML, which we will later load in our Scala-based model implementation. Open src/main/resources/lightswitch_model_fr.yaml file and replace its content with the following YAML:
macros:
  "<ACTION>" : "{allumer|laisser|mettre}"
  "<KILL>" : "{éteindre|couper|tuer|arrêter|éliminer|baisser|no}"
  "<ENTIRE_OPT>" : "{entière|pleine|tout|total|_}"
  "<FLOOR_OPT>" : "{là-haut|à l'étage|en bas|{1er|premier|2ème|deuxième|3ème|troisième|4ème|quatrième|5ème|cinquième|dernier|haut|rez-de-chaussée|en bas} étage|_}"
  "<TYPE>" : "{chambre|salle|pièce|placard|mansardé|loft|mezzanine|rangement {chambre|salle|pièce|_}}"
  "<LIGHT>" : "{tout|_} {cela|lumière|éclairage|illumination|lampe}"

elements:
  - type: "ls:loc"
    description: "Location of lights."
    synonyms:
      - "<ENTIRE_OPT> <FLOOR_OPT> {cuisine|bibliothèque|placard|garage|bureau|salle de jeux|{salle à manger|buanderie|jeu} <TYPE>}"
      - "<ENTIRE_OPT> <FLOOR_OPT> {maître|gamin|bébé|enfant|hôte|client|_} {coucher|bains|toilette|rangement} {<TYPE>|_}"
      - "<ENTIRE_OPT> {maison|foyer|bâtiment|{1er|premier} étage|chaussée|{2ème|deuxième} étage}"

  - type: "ls:on"
    groups:
      - "act"
    description: "Light switch ON action."
    synonyms:
      - "{<ACTION>|_} <LIGHT>"
      - "{<LIGHT>|_} <ACTION>"

  - type: "ls:off"
    groups:
      - "act"
    description: "Light switch OFF action."
    synonyms:
      - "<KILL> <LIGHT>"
      - "<LIGHT> <KILL>"
Line 1 defines several macros that are used later throughout the model's elements to shorten the synonym declarations. Note how macros coupled with option groups shorten the overall synonym declarations 1000:1 versus manually listing all possible word permutations.
Lines 10, 17 and 25 define three model elements: the location of the lights, and the actions to turn the lights on and off. Both action elements belong to the same group act, which will be used in our intent defined in the LightSwitchFrModel class. Note that these model elements are defined almost entirely through the macros declared above.
YAML vs. API: as usual, this YAML-based static model definition is convenient but entirely optional. All element definitions can also be provided programmatically inside the Scala LightSwitchFrModel class, as sketched below.
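For illustration, here is a minimal sketch of what such a programmatic definition could look like. It assumes that NCSemanticElement (from org.apache.nlpcraft.nlp.parsers) can be implemented by overriding getType, getGroups and getSynonyms, and that NCSemanticEntityParser accepts a list of such elements - consult the NCSemanticEntityParser documentation for the exact constructor signatures:

package demo

import org.apache.nlpcraft.nlp.parsers.*

// Hypothetical programmatic equivalent of the "ls:on" element declared in the YAML file.
// Assumption: NCSemanticElement exposes getType, getGroups and getSynonyms overrides.
val lsOnElement = new NCSemanticElement:
    override def getType: String = "ls:on"
    override def getGroups: Set[String] = Set("act")
    override def getSynonyms: Set[String] = Set("{<ACTION>|_} <LIGHT>", "{<LIGHT>|_} <ACTION>")

// Such elements (together with the macros they reference) could then be passed to
// NCSemanticEntityParser instead of the "lightswitch_model_fr.yaml" resource used in this example.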
Open src/main/scala/demo/LightSwitchFrModel.scala
file and replace its content with the following code:
package demo

import com.google.gson.Gson
import org.apache.nlpcraft.*
import org.apache.nlpcraft.annotations.*
import demo.nlp.entity.parser.NCFrSemanticEntityParser
import demo.nlp.token.enricher.*
import demo.nlp.token.parser.NCFrTokenParser
import scala.jdk.CollectionConverters.*

class LightSwitchFrModel extends NCModel(
    NCModelConfig("nlpcraft.lightswitch.fr.ex", "LightSwitch Example Model FR", "1.0"),
    new NCPipelineBuilder().
        withTokenParser(new NCFrTokenParser()).
        withTokenEnricher(new NCFrLemmaPosTokenEnricher()).
        withTokenEnricher(new NCFrStopWordsTokenEnricher()).
        withEntityParser(new NCFrSemanticEntityParser("lightswitch_model_fr.yaml")).
        build
):
    @NCIntent("intent=ls term(act)={has(ent_groups, 'act')} term(loc)={# == 'ls:loc'}*")
    def onMatch(
        ctx: NCContext,
        im: NCIntentMatch,
        @NCIntentTerm("act") actEnt: NCEntity,
        @NCIntentTerm("loc") locEnts: List[NCEntity]
    ): NCResult =
        val action = if actEnt.getType == "ls:on" then "allumer" else "éteindre"
        val locations = if locEnts.isEmpty then "toute la maison" else locEnts.map(_.mkText).mkString(", ")

        // Add HomeKit, Arduino or other integration here.
        // By default - just return a descriptive action string.
        NCResult(new Gson().toJson(Map("locations" -> locations, "action" -> action).asJava))
The intent callback logic is very simple - we return a descriptive confirmation message explaining which lights were changed. With the action and locations detected, you can add the actual light switching using HomeKit or Arduino devices; a sketch of such an integration point follows the step-by-step review below. Let's review this implementation step by step:
Line 11 - our class extends NCModel with its two mandatory parameters: the model configuration and the processing pipeline.
Line 12 - creates the model configuration using mostly default parameters.
Line 13 - creates the pipeline from custom French language components:
NCFrTokenParser - token parser.
NCFrLemmaPosTokenEnricher - lemma and part-of-speech token enricher.
NCFrStopWordsTokenEnricher - stop-words token enricher.
NCFrSemanticEntityParser - semantic entity parser extending NCSemanticEntityParser; it is based on the semantic model definition described in the lightswitch_model_fr.yaml file.
Lines 20 and 21 - annotate the intent ls and its callback method onMatch(). Intent ls requires one action (an entity belonging to the group act) and an optional list of light locations (zero or more entities with ID ls:loc) - by default the entire house is assumed as the location.
Lines 24 and 25 - map the terms of the detected intent to the formal parameters of the onMatch() method.
Line 32 - the intent callback simply returns a confirmation message.
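As mentioned above, this callback is where real hardware integration would go. Below is a minimal, purely hypothetical sketch of such an integration point - LightController and LoggingLightController are illustrative names and not part of NLPCraft:

package demo

// Hypothetical abstraction over the actual light hardware (HomeKit, Arduino, etc.).
trait LightController:
    def switch(location: String, on: Boolean): Unit

// Stub implementation that only logs; replace it with device-specific calls.
class LoggingLightController extends LightController:
    override def switch(location: String, on: Boolean): Unit =
        println(s"Switching lights ${if on then "ON" else "OFF"} in: $location")

// Inside onMatch() one could then add, for example:
//     val ctrl: LightController = new LoggingLightController
//     locations.split(", ").foreach(loc => ctrl.switch(loc, actEnt.getType == "ls:on"))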
Open src/main/scala/demo/nlp/token/parser/NCFrTokenParser.scala file and replace its content with the following code:
package demo.nlp.token.parser

import org.apache.nlpcraft.*
import org.languagetool.tokenizers.fr.FrenchWordTokenizer

import scala.jdk.CollectionConverters.*

class NCFrTokenParser extends NCTokenParser:
    private val tokenizer = new FrenchWordTokenizer

    override def tokenize(text: String): List[NCToken] =
        val toks = collection.mutable.ArrayBuffer.empty[NCToken]
        var sumLen = 0

        for ((word, idx) <- tokenizer.tokenize(text).asScala.zipWithIndex)
            val start = sumLen
            val end = sumLen + word.length
            if word.strip.nonEmpty then
                toks += new NCPropertyMapAdapter with NCToken:
                    override def getText: String = word
                    override def getIndex: Int = idx
                    override def getStartCharIndex: Int = start
                    override def getEndCharIndex: Int = end
            sumLen = end
        toks.toList
NCFrTokenParser is a simple wrapper that implements NCTokenParser on top of the open source Language Tool library. Line 19 creates the NCToken instance.
Open src/main/scala/demo/nlp/token/enricher/NCFrLemmaPosTokenEnricher.scala
file and replace its content with the following code:
package demo.nlp.token.enricher

import org.apache.nlpcraft.*
import org.languagetool.AnalyzedToken
import org.languagetool.tagging.fr.FrenchTagger

import scala.jdk.CollectionConverters.*

class NCFrLemmaPosTokenEnricher extends NCTokenEnricher:
    private def nvl(v: String, dflt: => String): String = if v != null then v else dflt

    override def enrich(req: NCRequest, cfg: NCModelConfig, toks: List[NCToken]): Unit =
        val tags = FrenchTagger.INSTANCE.tag(toks.map(_.getText).asJava).asScala

        require(toks.sizeIs == tags.size)

        toks.zip(tags).foreach { case (tok, tag) =>
            val readings = tag.getReadings.asScala
            val (lemma, pos) = readings.size match
                // No data. Lemma is word as is, POS is undefined.
                case 0 => (tok.getText, "")
                // Takes first. Other variants ignored.
                case _ =>
                    val aTok: AnalyzedToken = readings.head
                    (nvl(aTok.getLemma, tok.getText), nvl(aTok.getPOSTag, ""))

            tok.put("pos", pos)
            tok.put("lemma", lemma)

            () // Otherwise NPE.
        }
NCFrLemmaPosTokenEnricher is a lemma and part-of-speech token enricher based on the open source Language Tool library. On lines 27 and 28 the tokens are enriched with pos and lemma data.
Open src/main/scala/demo/nlp/token/enricher/NCFrStopWordsTokenEnricher.scala
file and replace its content with the following code:
package demo.nlp.token.enricher

import org.apache.lucene.analysis.fr.FrenchAnalyzer
import org.apache.nlpcraft.*

class NCFrStopWordsTokenEnricher extends NCTokenEnricher:
    private final val stops = FrenchAnalyzer.getDefaultStopSet

    private def getPos(t: NCToken): String = t.get("pos").getOrElse(throw new NCException("POS not found in token."))
    private def getLemma(t: NCToken): String = t.get("lemma").getOrElse(throw new NCException("Lemma not found in token."))

    override def enrich(req: NCRequest, cfg: NCModelConfig, toks: List[NCToken]): Unit =
        for (t <- toks)
            val lemma = getLemma(t)
            lazy val pos = getPos(t)

            t.put(
                "stopword",
                lemma.length == 1 && !Character.isLetter(lemma.head) && !Character.isDigit(lemma.head) ||
                stops.contains(lemma.toLowerCase) ||
                pos.startsWith("I") ||
                pos.startsWith("O") ||
                pos.startsWith("P") ||
                pos.startsWith("D")
            )
NCFrStopWordsTokenEnricher is a stop-words token enricher based on the open source Apache Lucene library. On line 17 the tokens are enriched with the stopword flag.
Open src/main/scala/demo/nlp/entity/parser/NCFrSemanticEntityParser.scala
file and replace its content with the following code:
package demo.nlp.entity.parser

import demo.nlp.token.parser.NCFrTokenParser
import opennlp.tools.stemmer.snowball.SnowballStemmer
import org.apache.nlpcraft.nlp.parsers.*
import org.apache.nlpcraft.nlp.stemmer.NCStemmer

class NCFrSemanticEntityParser(src: String) extends NCSemanticEntityParser(
    new NCStemmer:
        private val stemmer = new SnowballStemmer(SnowballStemmer.ALGORITHM.FRENCH)
        override def stem(txt: String): String = stemmer.synchronized { stemmer.stem(txt.toLowerCase).toString }
    ,
    new NCFrTokenParser(),
    src
)
NCFrSemanticEntityParser extends NCSemanticEntityParser. It uses a stemmer implementation from the Apache OpenNLP project. The test defined in LightSwitchFrModelSpec lets us verify that all test input sentences are processed correctly and trigger the expected intent ls:
package demo

import org.apache.nlpcraft.*
import org.scalatest.funsuite.AnyFunSuite
import scala.util.Using

class LightSwitchFrModelSpec extends AnyFunSuite:
    test("test") {
        Using.resource(new NCModelClient(new LightSwitchFrModel)) { client =>
            def check(txt: String): Unit =
                require(client.debugAsk(txt, "userId", true).getIntentId == "ls")

            check("Éteignez les lumières dans toute la maison.")
            check("Éteignez toutes les lumières maintenant.")
            check("Allumez l'éclairage dans le placard de la chambre des maîtres.")
            check("Éteindre les lumières au 1er étage.")
            check("Allumez les lumières.")
            check("Allumes dans la cuisine.")
            check("S'il vous plait, éteignez la lumière dans la chambre à l'étage.")
            check("Allumez les lumières dans toute la maison.")
            check("Éteignez les lumières dans la chambre d'hôtes.")
            check("Pourriez-vous éteindre toutes les lumières s'il vous plait?")
            check("Désactivez l'éclairage au 2ème étage.")
            check("Éteignez les lumières dans la chambre au 1er étage.")
            check("Lumières allumées à la cuisine du deuxième étage.")
            check("S'il te plaît, pas de lumières!")
            check("Coupez toutes les lumières maintenant!")
            check("Éteindre les lumières dans le garage.")
            check("Lumières éteintes dans la cuisine!")
            check("Augmentez l'éclairage dans le garage et la chambre des maîtres.")
            check("Baissez toute la lumière maintenant!")
            check("Pas de lumières dans la chambre, s'il vous plait.")
            check("Allumez le garage, s'il vous plait.")
            check("Tuez l'illumination maintenant.")
        }
    }
Line 9 creates the client for our model.
Line 11 calls the special method debugAsk(). It allows checking the winning intent and its callback parameters without actually invoking the callback.
Lines 13-34 define the test input sentences, all of which should trigger the ls intent.
You can run this test via the SBT task executeTests or from your IDE.
$ sbt executeTests
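Besides the test, you can also query the model directly from your own code. The following is a minimal sketch assuming that NCModelClient.ask(text, userId) mirrors the debugAsk() call used in the test above - check the NCModelClient documentation for the exact signature:

package demo

import org.apache.nlpcraft.*
import scala.util.Using

object LightSwitchFrRunner:
    def main(args: Array[String]): Unit =
        // The client wires the model and its pipeline together, just like in the test.
        Using.resource(new NCModelClient(new LightSwitchFrModel)) { client =>
            // Assumed signature: ask(text, userId) returning NCResult.
            val res = client.ask("Allumez les lumières dans la cuisine.", "userId")
            // The result wraps the JSON produced by onMatch(), e.g. {"locations":"cuisine","action":"allumer"}.
            println(res)
        }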
You've created a light switch data model and tested it.