Latest Version
At the time of writing, the project is undergoing incubation at the Apache Software Foundation, and the latest version of NLPCraft is 0.9.0. In rare cases, APIs may have changed since this article was written; consult the latest documentation for up-to-date information.
This article is the second part of the article "Проектируем интенты с Apache NLPCraft" (Designing Intents with Apache NLPCraft) and contains a detailed description of NLPCraft IDL (Intent Definition Language), created for NLP projects based on Apache NLPCraft. NLPCraft IDL has been supported since version 0.7.5.
The declarative Intent Definition Language, called NLPCraft IDL, significantly simplifies working with intents in NLP-based dialog and search systems developed with Apache NLPCraft, while also extending their capabilities.
Note that by the time the best matching intent is chosen, an NLP system has generally already parsed and processed the user's request, which contains combinations of the request's tokens and other related information.
Let's start with examples to demonstrate the general capabilities of the language, provide the necessary explanations, and then describe the design of the language a bit more formally.
intent=xa
    flow="^(?:login)(^:logout)*$"
    meta={'enabled': true}
    term(a)={# != "z"}[1,3]
    term(b)={
        meta_intent('enabled') == true && // Must be active.
        month() == 1 // January.
    }
    term(c)~{
        // Variables.
        @usrTypes = meta_model('user_types')

        // Predicate.
        (# == 'order' || # == 'order_cancel') &&
        has_all(@usrTypes, list(1, 2, 3)) &&
        abs(meta_tok('order:size')) > 10
    }
NOTES:
xa - the intent identifier.
term(a) - the parsed request must contain between one and three tokens with identifiers other than z, without taking into account data from the dialogue history (term type =).
term(b) - the intent must be active, i.e. the enabled flag from the intent metadata must be set. In addition, such an intent can only be triggered in January (the month() built-in function).
term(c) - a token with the identifier order or order_cancel must be found in the request or in the dialogue history (term type ~). There are additional restrictions: the user_types list from the model metadata must contain all of the values 1, 2 and 3, and the absolute value of the token's order:size metadata property must be greater than 10. The definition of this term uses IDL variables, which are discussed a little later.
flow= - this section defines an additional rule: for the intent to be triggered, the intent with the login identifier must have been selected at least once within the current session, and the intent with the logout identifier must never have been selected in this session. The rule is defined as a regular expression over the IDs of the intents previously matched in the user session. Other ways to create similar rules are described below.
meta= - for the given intent, a set of data can be defined (in this case, a configuration flag) that can be used to enable or disable the intent.
intent=xb
    flow=/#flowModelMethod/
    term(a)=/org.mypackage.MyClass#termMethod/?
    fragment(frag)
NOTES:
xb - the intent identifier.
term(a) - the intent can contain one optional term (the ? quantifier, explained in detail below), defined in code by the org.mypackage.MyClass#termMethod method.
fragment(frag) - the fragment with the identifier frag extends the intent's list of terms with additional terms defined elsewhere (in a fragment statement). The frag fragment must be defined earlier in the code or be accessible via an import statement.
flow= - the flow condition is defined in the model code, in the flowModelMethod method.
You can find more examples here.
Note that this article is only a brief overview of NLPCraft IDL capabilities and is not intended to be a comprehensive manual. Before starting to work with NLPCraft, it is recommended to study the detailed description of the syntax and the full overview of the language features in the corresponding sections of the documentation.
Intents can also be defined in separate files (with the *.idl extension). In this case, the model refers to these intents by the path to these files or by URL resources, using the import statement. This approach is convenient when working with large models, where syntax highlighting and other IDE features are useful (for example, IntelliJ IDEA provides keyword highlighting, hints and syntax checking for files of the configured types). In addition, this approach can be useful for specialists who cannot or do not want to edit the code directly. Examples are available here and here.
NLPCraft IDL has only 10 keywords: flow, fragment, import, intent, meta, options, term, true, false, null.
- intent, flow, fragment, meta, options, term are parts of the intent definition.
- The fragment keyword can also be used to create named lists of terms to include in intent definitions (a-la macros).
- import is required for including external files with fragment, intent or import statements.
- true, false, null are used when working with built-in functions.
An intent is compiled when the model is loaded, and it can be debugged only while the model itself is being debugged. To define complex intents, it is recommended to put them in separate files and use the editing capabilities of the IDE for initial syntax validation.
The program contains a set of the following optional elements, in no particular order:
- import statement
- fragment statement
- intent statement
import Statement
It contains the import keyword and a file name or URL resource in parentheses. It allows importing other IDL definitions from external files or URLs. For example:
import('http://mysite.com/nlp/idls/external.idl')
fragment Statement
It contains the fragment keyword, the fragment name and a list of terms. Fragments can be parameterized. An example of a simple fragment:
fragment=buzz term~{tok_id() == 'x:alarm'}
An example of a parameterized fragment with arguments a and b:
fragment=p1 term={ meta_frag('a') && has_any(get(meta_frag('b'), 'Москва'), list(1, 2)) }
Below is an example of using this fragment in an intent. Note that the fragment's parameters (its metadata) are passed in JSON format:
intent=i1 fragment(p1, {'a': true, 'b': {'Москва': [1, 2, 3]}})
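To make the semantics of the p1 fragment more concrete, here is a minimal plain-Java sketch that evaluates the same predicate for the parameters passed in the i1 example. It does not use NLPCraft APIs: the fragment metadata is modeled as a Map, and hasAny is a hypothetical helper that assumes has_any returns true when the first list contains at least one of the given items.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of the p1 fragment predicate; not the NLPCraft implementation.
class FragmentP1Sketch {
    // Assumed semantics of IDL has_any: true if 'list' contains at least one of 'items'.
    static boolean hasAny(List<?> list, List<?> items) {
        for (Object item : items) {
            if (list.contains(item)) {
                return true;
            }
        }
        return false;
    }

    // Mirrors: meta_frag('a') && has_any(get(meta_frag('b'), 'Москва'), list(1, 2))
    static boolean p1(Map<String, Object> fragMeta) {
        boolean a = Boolean.TRUE.equals(fragMeta.get("a"));
        @SuppressWarnings("unchecked")
        Map<String, List<Integer>> b = (Map<String, List<Integer>>) fragMeta.get("b");
        // A missing 'b' map or a missing 'Москва' key is treated as an empty list.
        List<Integer> values = b == null ? List.of() : b.getOrDefault("Москва", List.of());
        return a && hasAny(values, List.of(1, 2));
    }
}
```

With the parameters from i1, a is true and the Москва list [1, 2, 3] contains 1, so the predicate holds; with a set to false it fails.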
intent Statement
This statement is the core element of IDL and declares an intent. Here is an example of a simple intent:
intent=xa flow="^(?:login)(^:logout)*$" meta={'enabled': true} term(a)={# != "z"}[1,3]
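Every intent statement consists of the following elements:
Intent Name
Required element. The name is a unique identifier within the model. Below is an example of a callback that references the xa intent. Since term(a) of xa can match up to three tokens, the corresponding parameter is declared as a list:
// A method of the model class (types from the org.apache.nlpcraft.model package).
@NCIntentRef("xa")
fun onTimeMatch(
    ctx: NCIntentMatch,
    @NCIntentTerm("a") toks: List<NCToken>
): NCResult {
    TODO("Process the matched tokens.")
}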
Intent Terms
At least one term is required. A term is the main element of an intent definition. It consists of a predicate over a token (the term body). The parts of the predicate can be based on the token itself or on other factors.
Here is how an intent's term is defined:
- term keyword. Required element.
- (a) - term identifier. Optional element; it is used to reference the term, for example in @NCIntentTerm("a").
- Term type. Required element. Two term types are supported:
  ~ - the token can be obtained from the dialog history or from the current request (i.e. the term is conversational).
  = - the token must be obtained only from the current request.
  Example terms:
  term(nums)~{# == 'nlpcraft:num'} // Conversational '~' term.
  term(nums)={# != 'z'} // Non-conversational '=' term.
- The term body. Required element. There are two ways to define the term body: as an IDL script with built-in functions, or in external Java-based code:
  term(nums)={# == 'nlpcraft:num'} // IDL script.
  term(yes)~{true} // IDL script.
  term~/org.mypackage.MyClass#termMethod/? // Reference to external code.
  Note the special syntax of the last term. The existing set of built-in functions is typically enough to define many terms of an intent. But, if necessary, programmers can write their own predicates in Java, Scala, Kotlin, Groovy or any other Java-based language and reference them from the term body. In other words, NLPCraft IDL can delegate parts of its logic to code written entirely in other languages:
  term~/org.mypackage.MyClass#termMethod/? // Reference to external code.
  The termMethod method receives an argument that contains all the necessary data about the user's request and must return a value of a specific type.
- Quantifier. Optional element. The default value is [1, 1]. The following quantifiers are supported:
  [M, N] - the term must be found from M to N times.
  * - the term must be found zero or more times, equivalent to [0, ∞].
  + - the term must be found one or more times, equivalent to [1, ∞].
  ? - the term must be found zero or one time, equivalent to [0, 1].
  Examples:
  term(nums)={# == 'nlpcraft:num'}[1,2] // The request must contain one or two tokens with the ID 'nlpcraft:num'.
  term(nums)={# == 'nlpcraft:num'}* // The request must contain zero or more tokens with the ID 'nlpcraft:num'.
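The term type and the quantifier together decide whether a term matches. The following simplified Java sketch illustrates this under stated assumptions: tokens are reduced to their IDs, the dialog history is a plain list, TermSketch and its matches method are hypothetical, and the quantifier is read literally as "the number of matching tokens must lie within [min, max]". The actual NLPCraft matcher is not modeled here, so treat this only as an illustration of the descriptions above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, simplified model of term matching; not the NLPCraft implementation.
class TermSketch {
    // Stands for the unbounded upper limit of the '*' and '+' quantifiers.
    static final int INF = Integer.MAX_VALUE;

    static boolean matches(
        List<String> requestIds,  // Token IDs of the current request.
        List<String> historyIds,  // Token IDs remembered from the dialog history.
        boolean conversational,   // true for '~' terms, false for '=' terms.
        Predicate<String> body,   // The term body, e.g. id -> id.equals("nlpcraft:num").
        int min, int max          // The quantifier [min, max].
    ) {
        List<String> candidates = new ArrayList<>(requestIds);
        if (conversational) {
            // Only '~' terms may also use tokens from the dialog history.
            candidates.addAll(historyIds);
        }
        long found = candidates.stream().filter(body).count();
        return found >= min && found <= max;
    }
}
```

For instance, with the body # == 'nlpcraft:num' and the default [1, 1] quantifier, a '~' term matches when the number token appears only in the history, while the same '=' term does not.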
IDL Built-In Functions
NLPCraft IDL provides over 140 built-in functions that can be used in term definitions. These functions can be roughly grouped into the following categories:
- Basic token properties: tok_id, tok_groups, tok_parent.
- NLP-related token properties: tok_lemma, tok_is_wordnet, tok_swear.
- Other token properties: tok_value, tok_is_permutated, tok_is_direct.
- Request information: req_tstamp, req_addr, req_agent.
- Metadata: meta_model('my:prop'), meta_tok('nlpcraft:num:unit'), meta_user('my:prop'). For example, for a geo:city token the metadata can contain the number of city residents or the city's coordinates.
- User and company information: user_admin, comp_name, user_signup_tstamp.
- System and date/time information: meta_sys('java.home'), now, day_of_week.
- Text, math and collection functions: lowercase("TeXt"), abs(-1.5), distinct(list(1, 2, 2, 3, 1)).
More detailed information and a description of each function can be found here.
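As a rough orientation for Java developers, the sketch below shows what the three sample calls from the last category would compute if lowercase, abs and distinct behave like their obvious Java counterparts. That correspondence, including the assumption that distinct keeps the order of first occurrences, is an assumption of this sketch rather than a statement from the NLPCraft reference; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;

// Rough plain-Java analogues of three IDL built-ins; assumed semantics, not NLPCraft code.
class BuiltInsSketch {
    static String lowercase(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    static double abs(double d) {
        return Math.abs(d);
    }

    // LinkedHashSet drops duplicates while keeping the first-occurrence order.
    static <T> List<T> distinct(List<T> list) {
        return new ArrayList<>(new LinkedHashSet<>(list));
    }
}
```

Under these assumptions, lowercase("TeXt") gives "text", abs(-1.5) gives 1.5 and distinct(list(1, 2, 2, 3, 1)) gives [1, 2, 3].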
Term Variables
The term body written in IDL script is a predicate built from IDL built-in functions. To avoid unnecessary repeated computations, local term variables can be used:
term(t2)={
    @a = meta_model('a')
    @list = list(1, 2, 3, 4)

    (@a == 42 || @a == 44) && has_all(@list, list(3, 2))
}
Local variables are defined and used with the special prefix @. They help avoid repeated computations and make IDL term predicates shorter and simpler.
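Term variables play the same role as local variables in ordinary code. The hypothetical Java sketch below renders the t2 body that way: the metadata lookup and the list are computed once and then reused in the predicate. The model metadata is modeled as a plain Map, hasAll is our own helper assuming has_all checks that the first list contains every given item, and numbers are assumed to be Integer values.

```java
import java.util.List;
import java.util.Map;

// Hypothetical Java rendering of the t2 term body; not NLPCraft code.
class TermT2Sketch {
    // Assumed semantics of IDL has_all: true if 'list' contains every element of 'items'.
    static boolean hasAll(List<?> list, List<?> items) {
        return list.containsAll(items);
    }

    static boolean t2(Map<String, Object> modelMeta) {
        // Like @a: the metadata lookup happens once and its result is reused below.
        Object a = modelMeta.get("a");
        // Like @list.
        List<Integer> list = List.of(1, 2, 3, 4);

        boolean aMatches = Integer.valueOf(42).equals(a) || Integer.valueOf(44).equals(a);
        return aMatches && hasAll(list, List.of(3, 2));
    }
}
```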
Intent Fragments
A fragment is a named set of terms that can be reused across intents. See the example by following this link.
Intent Flow
The flow element defines an additional intent selection rule based on data from previous intent matches within the current session. This rule can be defined as a regular expression over the IDs of previously matched intents:
flow="^(?:login)(^:logout)*$"
This rule means that, for the intent to be matched, the intent with the login identifier must have already been matched within the current session, and the intent with the logout identifier must not have been matched.
If more complex logic is needed, it can also be moved into user code written in any Java-based language, just like the term body:
@NCIntent("intent=x flow=/com.company.dialog.Flow#customFlow/ term~{# == 'some_id'}")
def onX(): NCResult = {
    ??? // Intent callback logic goes here.
}
The predicate defined in the customFlow() method receives as input a list with information about all intents previously matched within the current session, and returns a boolean value.
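Here is a minimal sketch of the login/logout rule expressed as such a predicate. To keep it self-contained, it takes a plain List<String> of previously matched intent IDs instead of the NLPCraft-specific type the real flow method receives, so FlowSketch and loggedInFlow are hypothetical names, not part of NLPCraft.

```java
import java.util.List;

// Hypothetical flow predicate; the real NLPCraft flow method receives NLPCraft's own flow objects.
class FlowSketch {
    // True if 'login' was matched at least once and 'logout' was never matched in this session.
    static boolean loggedInFlow(List<String> matchedIntentIds) {
        return matchedIntentIds.contains("login") && !matchedIntentIds.contains("logout");
    }
}
```

One benefit of this form is that the rule can be unit-tested like any other method.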
Intent Metadata
Optional element. An additional data set, in JSON format, that can be used by term predicates.
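As an illustration of how term predicates consume intent metadata, here is a hypothetical Java rendering of term(b) from the xa example (meta_intent('enabled') == true && month() == 1). The intent metadata is modeled as a Map, and the current date is passed in explicitly so that the check is testable; the sketch assumes month() returns the month number from 1 to 12.

```java
import java.time.LocalDate;
import java.util.Map;

// Hypothetical rendering of term(b) from the xa example; not NLPCraft code.
class IntentMetaSketch {
    // Mirrors: meta_intent('enabled') == true && month() == 1
    static boolean termB(Map<String, Object> intentMeta, LocalDate today) {
        boolean enabled = Boolean.TRUE.equals(intentMeta.get("enabled"));
        return enabled && today.getMonthValue() == 1; // January.
    }
}
```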
All the logic of intents defined with NLPCraft IDL could also be written in any Java-based language. Why, then, is this new language needed at all? Even if its syntax is short, simple and straightforward, you still have to spend some time learning it.
Below are some reasons for using NLPCraft IDL:
I hope you were able to get a first impression of the capabilities of the NLPCraft IDL language and the types of tasks that can be solved using it. Here you will find a detailed description of the language and its capabilities. Additional examples of models created in Java, Kotlin, Groovy and Scala, using NLPCraft IDL to define intents, are available in the GitHub project.