This example provides a very simple Russian-language implementation of an NLI-powered light switch. You can say something like "Выключи свет по всем доме" or "Включи свет в детской". By modifying the intent callbacks to use, for example, HomeKit or Arduino-based controllers, you can implement the actual light switching.
You can create new Scala projects in many ways; here we'll use SBT to accomplish this task. Make sure that the build.sbt file has the following content:
ThisBuild / version := "0.1.0-SNAPSHOT"
ThisBuild / scalaVersion := "3.2.2"
lazy val root = (project in file("."))
  .settings(
    name := "NLPCraft LightSwitch RU Example",
    version := "1.0.0",
    libraryDependencies += "org.apache.nlpcraft" % "nlpcraft" % "1.0.0",
    libraryDependencies += "org.apache.lucene" % "lucene-analyzers-common" % "8.11.2",
    libraryDependencies += "org.languagetool" % "languagetool-core" % "6.0",
    libraryDependencies += "org.languagetool" % "language-ru" % "6.0",
    libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.15" % "test"
  )
Lines 8, 9 and 10 add the libraries that support basic NLP operations for the Russian language.
NOTE: use the latest versions of Scala and ScalaTest.
Create the following files so that the resulting project structure looks like this:

- lightswitch_model_ru.yaml - YAML configuration file which contains the model description.
- LightSwitchRuModel.scala - Model implementation.
- NCRuSemanticEntityParser.scala - Semantic entity parser, a custom implementation of NCSemanticEntityParser for the Russian language.
- NCRuLemmaPosTokenEnricher.scala - Lemma and part-of-speech token enricher, a custom implementation of NCTokenEnricher for the Russian language.
- NCRuStopWordsTokenEnricher.scala - Stop-words token enricher, a custom implementation of NCTokenEnricher for the Russian language.
- NCRuTokenParser.scala - Token parser, a custom implementation of NCTokenParser for the Russian language.
- LightSwitchRuModelSpec.scala - Test that exercises your model.

|   build.sbt
+--project
|       build.properties
\--src
    +--main
    |   +--resources
    |   |       lightswitch_model_ru.yaml
    |   \--scala
    |       \--demo
    |           |   LightSwitchRuModel.scala
    |           \--nlp
    |               +--entity
    |               |   \--parser
    |               |           NCRuSemanticEntityParser.scala
    |               \--token
    |                   +--enricher
    |                   |       NCRuLemmaPosTokenEnricher.scala
    |                   |       NCRuStopWordsTokenEnricher.scala
    |                   \--parser
    |                           NCRuTokenParser.scala
    \--test
        \--scala
            \--demo
                    LightSwitchRuModelSpec.scala
We are going to start with declaring the static part of our model using YAML which we will later load in our Scala-based model implementation. Open src/main/resources/lightswitch_model_ru.yaml
file and replace its content with the following YAML:
macros:
  "<TURN_ON>": "{включить|включать|врубить|врубать|запустить|запускать|зажигать|зажечь}"
  "<TURN_OFF>": "{погасить|загасить|гасить|выключить|выключать|вырубить|вырубать|отключить|отключать|убрать|убирать|приглушить|приглушать|стоп}"
  "<ENTIRE_OPT>": "{весь|все|всё|повсюду|вокруг|полностью|везде|_}"
  "<LIGHT_OPT>": "{это|лампа|бра|люстра|светильник|лампочка|лампа|освещение|свет|электричество|электрика|_}"

elements:
  - type: "ls:loc"
    description: "Location of lights."
    synonyms:
      - "<ENTIRE_OPT> {здание|помещение|дом|кухня|детская|кабинет|гостиная|спальня|ванная|туалет|{большая|обеденная|ванная|детская|туалетная} комната}"

  - type: "ls:on"
    groups:
      - "act"
    description: "Light switch ON action."
    synonyms:
      - "<LIGHT_OPT> <ENTIRE_OPT> <TURN_ON>"
      - "<TURN_ON> <ENTIRE_OPT> <LIGHT_OPT>"

  - type: "ls:off"
    groups:
      - "act"
    description: "Light switch OFF action."
    synonyms:
      - "<LIGHT_OPT> <ENTIRE_OPT> <TURN_OFF>"
      - "<TURN_OFF> <ENTIRE_OPT> <LIGHT_OPT>"
      - "без <ENTIRE_OPT> <LIGHT_OPT>"
Line 1 defines several macros that are used later throughout the model's elements to shorten the synonym declarations. Note how macros coupled with option groups shorten the overall synonym declarations 1000:1 vs. manually listing all possible word permutations.

Lines 8, 13 and 21 define three model elements: the location of the lights and the actions to turn the lights on and off. Both action elements belong to the same group act, which will be used in our intent, defined in the LightSwitchRuModel class. Note that these model elements are defined mostly through the macros we declared above.

YAML vs. API. As usual, this YAML-based static model definition is convenient but entirely optional. All element definitions can also be provided programmatically inside the Scala model class LightSwitchRuModel.
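To get a feel for how much the option-group syntax compresses, here is a toy expansion sketch in plain Scala. It is not NLPCraft's actual macro engine, just an illustration of how a group like "{a|b|_}" multiplies into concrete synonyms ("_" stands for an optional, empty slot):

```scala
// Toy sketch of option-group expansion (NOT NLPCraft's macro engine).
// A group "{a|b|_}" yields "a", "b", or nothing; a synonym line is the
// cartesian product of its groups.
object MacroExpansionSketch:
  // Parse "{a|b|_}" into its options; "_" becomes the empty string.
  def options(group: String): Seq[String] =
    group.stripPrefix("{").stripSuffix("}").split('|').toSeq
      .map(o => if o == "_" then "" else o)

  // Expand a sequence of groups into all distinct phrases.
  def expand(groups: Seq[String]): Seq[String] =
    groups.map(options).foldLeft(Seq("")) { (acc, opts) =>
      for a <- acc; o <- opts yield (a + " " + o).trim
    }.distinct

@main def countSynonyms(): Unit =
  // Shortened versions of the model's macros (duplicate word removed).
  val turnOn = "{включить|включать|врубить|врубать|запустить|запускать|зажигать|зажечь}"
  val entire = "{весь|все|всё|повсюду|вокруг|полностью|везде|_}"
  val light  = "{это|лампа|бра|люстра|светильник|лампочка|освещение|свет|электричество|электрика|_}"

  // One synonym line "<TURN_ON> <ENTIRE_OPT> <LIGHT_OPT>" covers:
  println(MacroExpansionSketch.expand(Seq(turnOn, entire, light)).size) // 704 phrases
```

A single synonym line in the YAML thus stands in for hundreds of hand-written permutations, which is where the 1000:1 figure comes from.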
Open src/main/scala/demo/LightSwitchRuModel.scala
file and replace its content with the following code:
package demo

import com.google.gson.Gson
import org.apache.nlpcraft.*
import org.apache.nlpcraft.annotations.*
import demo.nlp.entity.parser.NCRuSemanticEntityParser
import demo.nlp.token.enricher.*
import demo.nlp.token.parser.NCRuTokenParser
import scala.jdk.CollectionConverters.*

class LightSwitchRuModel extends NCModel(
  NCModelConfig("nlpcraft.lightswitch.ru.ex", "LightSwitch Example Model RU", "1.0"),
  new NCPipelineBuilder().
    withTokenParser(new NCRuTokenParser()).
    withTokenEnricher(new NCRuLemmaPosTokenEnricher()).
    withTokenEnricher(new NCRuStopWordsTokenEnricher()).
    withEntityParser(new NCRuSemanticEntityParser("lightswitch_model_ru.yaml")).
    build
):
  @NCIntent("intent=ls term(act)={has(ent_groups, 'act')} term(loc)={# == 'ls:loc'}*")
  def onMatch(
    ctx: NCContext,
    im: NCIntentMatch,
    @NCIntentTerm("act") actEnt: NCEntity,
    @NCIntentTerm("loc") locEnts: List[NCEntity]
  ): NCResult =
    val action = if actEnt.getType == "ls:on" then "включить" else "выключить"
    val locations = if locEnts.isEmpty then "весь дом" else locEnts.map(_.mkText).mkString(", ")

    // Add HomeKit, Arduino or other integration here.
    // By default - just return a descriptive action string.
    NCResult(new Gson().toJson(Map("locations" -> locations, "action" -> action).asJava))
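The comment in the callback marks the natural extension point for real device control. A minimal sketch of what such a hook could look like; note that LightController and its method names are purely hypothetical, not part of NLPCraft or any device SDK:

```scala
// Hypothetical integration seam (illustrative names, not a real API).
trait LightController:
  def switch(location: String, on: Boolean): String

// Stand-in implementation; a real one would talk to HomeKit, an Arduino
// bridge, MQTT, etc., instead of just building a message.
class LoggingLightController extends LightController:
  override def switch(location: String, on: Boolean): String =
    val msg = s"[$location] lights ${if on then "ON" else "OFF"}"
    println(msg)
    msg

@main def demoSwitch(): Unit =
  val ctl = new LoggingLightController
  // Inside onMatch you could call, per detected location, something like:
  //   ctl.switch(loc, actEnt.getType == "ls:on")
  ctl.switch("детская", on = true)
```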
The intent callback logic is very simple: we return a descriptive confirmation message (explaining which lights were changed). With the action and locations detected, you can add the actual light switching using HomeKit or Arduino devices. Let's review this implementation step by step:
Line 11: our class extends NCModel with its two mandatory parameters.
Line 12: creates the model configuration with mostly default parameters.
Line 13: creates the pipeline based on custom Russian-language components:
- NCRuTokenParser - Token parser.
- NCRuLemmaPosTokenEnricher - Lemma and part-of-speech token enricher.
- NCRuStopWordsTokenEnricher - Stop-words token enricher.
- NCRuSemanticEntityParser - Semantic entity parser extending NCSemanticEntityParser; it is based on the semantic model definition described in the lightswitch_model_ru.yaml file.
Lines 20 and 21: annotate the intent ls and its callback method onMatch(). The intent ls requires one action (an entity belonging to the group act) and an optional list of light locations (zero or more entities with ID ls:loc); if none are given, we assume the entire house as the default location.
Lines 24 and 25: map terms from the matched intent to the formal parameters of the onMatch() method.
Line 32: the intent callback simply returns a confirmation message.

Open src/main/scala/demo/nlp/token/parser/NCRuTokenParser.scala
file and replace its content with the following code:
package demo.nlp.token.parser

import org.apache.nlpcraft.*
import org.languagetool.tokenizers.WordTokenizer
import scala.jdk.CollectionConverters.*

class NCRuTokenParser extends NCTokenParser:
  private val tokenizer = new WordTokenizer

  override def tokenize(text: String): List[NCToken] =
    val toks = collection.mutable.ArrayBuffer.empty[NCToken]
    var sumLen = 0

    for ((word, idx) <- tokenizer.tokenize(text).asScala.zipWithIndex)
      val start = sumLen
      val end = sumLen + word.length

      if word.strip.nonEmpty then
        toks += new NCPropertyMapAdapter with NCToken:
          override def getText: String = word
          override def getIndex: Int = idx
          override def getStartCharIndex: Int = start
          override def getEndCharIndex: Int = end

      sumLen = end

    toks.toList
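The tokenize method above tracks each token's character span while skipping whitespace chunks. The same bookkeeping can be shown in isolation with a plain regex split as a stand-in for Language Tool's WordTokenizer (which likewise returns the whitespace pieces):

```scala
// Toy stand-in for the parser's offset tracking: split text keeping the
// whitespace pieces, advance a running character counter for every piece,
// but keep only the non-blank ones - mirroring NCRuTokenParser.tokenize.
final case class Span(text: String, start: Int, end: Int)

def spans(text: String): List[Span] =
  val buf = scala.collection.mutable.ListBuffer.empty[Span]
  var sumLen = 0
  for piece <- text.split("(?<= )|(?= )") do  // split but keep the spaces
    val start = sumLen
    val end = sumLen + piece.length
    if piece.strip.nonEmpty then buf += Span(piece, start, end)
    sumLen = end  // advance even for whitespace, so offsets stay correct
  buf.toList

@main def demoSpans(): Unit =
  spans("Включи свет в детской").foreach(println)
```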
NCRuTokenParser is a simple wrapper that implements NCTokenParser on top of the open-source Language Tool library. Line 19 creates the NCToken instances.

Open src/main/scala/demo/nlp/token/enricher/NCRuLemmaPosTokenEnricher.scala
file and replace its content with the following code:
package demo.nlp.token.enricher

import org.apache.nlpcraft.*
import org.languagetool.AnalyzedToken
import org.languagetool.tagging.ru.RussianTagger
import scala.jdk.CollectionConverters.*

class NCRuLemmaPosTokenEnricher extends NCTokenEnricher:
  private def nvl(v: String, dflt: => String): String = if v != null then v else dflt

  override def enrich(req: NCRequest, cfg: NCModelConfig, toks: List[NCToken]): Unit =
    val tags = RussianTagger.INSTANCE.tag(toks.map(_.getText).asJava).asScala

    require(toks.size == tags.size)

    toks.zip(tags).foreach { case (tok, tag) =>
      val readings = tag.getReadings.asScala

      val (lemma, pos) = readings.size match
        // No data. Lemma is word as is, POS is undefined.
        case 0 => (tok.getText, "")
        // Takes first. Other variants ignored.
        case _ =>
          val aTok: AnalyzedToken = readings.head
          (nvl(aTok.getLemma, tok.getText), nvl(aTok.getPOSTag, ""))

      tok.put("pos", pos)
      tok.put("lemma", lemma)

      () // Otherwise NPE.
    }
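The reading-selection logic in the match expression above can be isolated as a pure function for clarity. This is a simplification, not the Language Tool API: nullable lemma/POS fields are modeled with Option instead of the nvl() helper, and the POS tag value is illustrative:

```scala
// Given zero or more (lemma, posTag) readings for a word, take the first
// reading and fall back to the surface form / empty POS tag, exactly as
// the enricher's match expression does.
def pickReading(word: String, readings: List[(Option[String], Option[String])]): (String, String) =
  readings match
    // No data: lemma is the word as-is, POS is undefined.
    case Nil => (word, "")
    // Take the first reading; other variants are ignored.
    case (lemma, pos) :: _ => (lemma.getOrElse(word), pos.getOrElse(""))

@main def demoPick(): Unit =
  println(pickReading("лампы", List((Some("лампа"), Some("NOUN")))))
  println(pickReading("xyz", Nil))
```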
NCRuLemmaPosTokenEnricher, the lemma and part-of-speech token enricher, is based on the open-source Language Tool library. Lines 27 and 28 enrich the tokens with pos and lemma data.

Open src/main/scala/demo/nlp/token/enricher/NCRuStopWordsTokenEnricher.scala
file and replace its content with the following code:
package demo.nlp.token.enricher

import org.apache.lucene.analysis.ru.RussianAnalyzer
import org.apache.nlpcraft.*

class NCRuStopWordsTokenEnricher extends NCTokenEnricher:
  private val stops = RussianAnalyzer.getDefaultStopSet

  private def getPos(t: NCToken): String = t.get("pos").getOrElse(throw new NCException("POS not found in token."))
  private def getLemma(t: NCToken): String = t.get("lemma").getOrElse(throw new NCException("Lemma not found in token."))

  override def enrich(req: NCRequest, cfg: NCModelConfig, toks: List[NCToken]): Unit =
    for (t <- toks)
      val lemma = getLemma(t)
      lazy val pos = getPos(t)

      t.put(
        "stopword",
        lemma.length == 1 && !Character.isLetter(lemma.head) && !Character.isDigit(lemma.head) ||
        stops.contains(lemma.toLowerCase) ||
        pos.startsWith("PARTICLE") ||
        pos.startsWith("INTERJECTION") ||
        pos.startsWith("PREP")
      )
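The stop-word predicate can be extracted with a tiny hard-coded stop set so it can be read and tried on its own. The real enricher uses Lucene's full Russian default stop set and Language Tool POS tags; the stop set below is a small illustrative subset:

```scala
// Same boolean logic as the enricher's t.put("stopword", ...) call:
// single non-alphanumeric characters, known stop words, and a few
// POS classes are all flagged as stop words.
val stops = Set("и", "в", "на", "не", "по") // tiny illustrative subset

def isStopWord(lemma: String, pos: String): Boolean =
  (lemma.length == 1 && !Character.isLetter(lemma.head) && !Character.isDigit(lemma.head)) ||
  stops.contains(lemma.toLowerCase) ||
  pos.startsWith("PARTICLE") ||
  pos.startsWith("INTERJECTION") ||
  pos.startsWith("PREP")

@main def demoStops(): Unit =
  println(isStopWord("в", "PREP"))    // true
  println(isStopWord("!", ""))        // true
  println(isStopWord("свет", "NOUN")) // false
```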
NCRuStopWordsTokenEnricher is a stop-word token enricher based on the open-source Apache Lucene library. Line 17 enriches the tokens with the stopword flag.

Open src/main/scala/demo/nlp/entity/parser/NCRuSemanticEntityParser.scala
file and replace its content with the following code:
package demo.nlp.entity.parser

import opennlp.tools.stemmer.snowball.SnowballStemmer
import demo.nlp.token.parser.NCRuTokenParser
import org.apache.nlpcraft.nlp.parsers.*
import org.apache.nlpcraft.nlp.stemmer.NCStemmer

class NCRuSemanticEntityParser(src: String) extends NCSemanticEntityParser(
  new NCStemmer:
    private val stemmer = new SnowballStemmer(SnowballStemmer.ALGORITHM.RUSSIAN)
    override def stem(txt: String): String =
      stemmer.synchronized { stemmer.stem(txt.toLowerCase).toString }
  ,
  new NCRuTokenParser(),
  src
)
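A note on the synchronized block above: SnowballStemmer keeps mutable state between calls, so a shared instance must not be used from several threads at once. The same guarding pattern is shown below with a stand-in stateful stemmer; the suffix list is a crude illustration, not real Russian stemming:

```scala
// Stand-in for a stateful, non-thread-safe stemmer such as SnowballStemmer.
final class ToyRuStemmer:
  private var current = "" // mutable state shared across calls
  def stem(word: String): String =
    current = word.toLowerCase
    // Crude illustrative suffix stripping; NOT real Russian stemming.
    for suf <- List("ить", "ать", "ами", "ого", "ый", "ая", "а", "ы") do
      if current.endsWith(suf) && current.length > suf.length + 2 then
        current = current.dropRight(suf.length)
    current

object SafeStemmer:
  private val stemmer = new ToyRuStemmer
  // Serialize access to the shared instance, as NCRuSemanticEntityParser does.
  def stem(word: String): String = stemmer.synchronized(stemmer.stem(word))

@main def demoStem(): Unit =
  println(SafeStemmer.stem("включить")) // включ
  println(SafeStemmer.stem("лампа"))    // ламп
```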
NCRuSemanticEntityParser extends NCSemanticEntityParser. It uses a stemmer implementation from the Apache OpenNLP project.

The test defined in LightSwitchRuModelSpec verifies that all test input sentences are processed correctly and trigger the expected intent ls:
package demo

import org.apache.nlpcraft.*
import org.scalatest.funsuite.AnyFunSuite
import scala.util.Using

class LightSwitchRuModelSpec extends AnyFunSuite:
  test("test") {
    Using.resource(new NCModelClient(new LightSwitchRuModel)) { client =>
      def check(txt: String): Unit =
        require(client.debugAsk(txt, "userId", true).getIntentId == "ls")

      check("Выключи свет по всем доме")
      check("Выруби электричество!")
      check("Включи свет в детской")
      check("Включай повсюду освещение")
      check("Включайте лампы в детской комнате")
      check("Свет на кухне, пожалуйста, приглуши")
      check("Нельзя ли повсюду выключить свет?")
      check("Пожалуйста без света")
      check("Отключи электричество в ванной")
      check("Выключи, пожалуйста, тут всюду свет")
      check("Выключай все!")
      check("Свет пожалуйста везде включи")
      check("Зажги лампу на кухне")
    }
  }
Line 9 creates the client for our model. Line 11 calls the special method debugAsk(). It lets you check the winning intent and its callback parameters without actually invoking the intent's callback. Lines 13-25 define the test input sentences, each of which should trigger the ls intent.

You can run this test via the SBT task executeTests or from your IDE.
$ sbt executeTests
You've created a light switch data model and tested it.