NLPCraft IDL - Intent Definition Language
Aaron Radzinski
June 3, 2021
This blog is an English adaptation of Sergey Kamov's blog written in Russian.

Latest Version

At the time of this writing, the project is undergoing incubation at the Apache Software Foundation, and the latest version of NLPCraft is 0.9.0. In some rare cases, APIs might have changed since this article was written. Consult the latest documentation for up-to-date information.

This article is the second part of the article Проектируем интенты с Apache NLPCraft ("Designing Intents with Apache NLPCraft") and contains a detailed description of NLPCraft IDL (Intent Definition Language), created for NLP projects based on Apache NLPCraft. NLPCraft IDL support has been available since version 0.7.5.

The declarative Intent Definition Language, called NLPCraft IDL, significantly simplifies working with intents in NLP-based dialog and search systems built with Apache NLPCraft, and at the same time expands their capabilities.

Note that by the time the best matching intent is chosen, an NLP system generally already has the parsed and processed user request, which contains combinations of the request's tokens and other related information.

Examples

Let's start with examples to demonstrate the general capabilities of the language, provide the necessary explanations, and then describe the design of the language a bit more formally.

        intent=xa
           flow="^(?:login)(^:logout)*$"
           meta={'enabled': true}
           term(a)={# != "z"}[1,3]
           term(b)={
              meta_intent('enabled') == true && // Must be active.
              month() == 1 // January.
           }
           term(c)~{
                // Variables.
                @usrTypes = meta_model('user_types')

                // Predicate.
                (# == 'order' || # == 'order_cancel') &&
                has_all(@usrTypes, list(1, 2, 3)) &&
                abs(meta_tok('order:size')) > 10
           }
    

NOTES:

  • The intent name is xa.
  • The intent contains three terms. A term is an element that defines a predicate, and the predicate of every term must pass for the intent to be selected:
    • term(a) - the parsed request must contain between one and three tokens with identifiers other than z, without taking into account data from the dialogue history (term type =).
    • term(b) - the intent must be active, as indicated by the enabled flag in the intent metadata. In addition, such an intent can only be triggered in January, which is checked with the month() built-in function.
    • term(c) - the token with identifier order or order_cancel should be found in the request or in the dialogue history (term type ~). There are additional restrictions on the model metadata values, and the absolute value of the order size must be greater than 10. The definition of this term uses IDL variables, which are discussed a little later.
  • flow= This section defines an additional rule: for the intent to be triggered, the intent with the login identifier must have been selected at least once within the current session, and the intent with the logout identifier must never have been selected in this session. The rule is defined as a regular expression over the IDs of the intents previously matched in the user session. Other ways to create similar rules are described below.
  • meta= For the given intent, a certain set of data can be defined - in this case a configuration - with which you can enable or disable the intent.
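
The difference between the two term types used above (= and ~) can be pictured with a small plain-Java sketch. It does not use the NLPCraft API: the class TermTypeSketch, its method, and the representation of tokens as plain ID strings are assumptions made only for illustration, as is the choice to check the current request before the dialogue history.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only; not part of the NLPCraft API.
final class TermTypeSketch {
    /**
     * Returns where a token with the given ID was found:
     * "request", "history", or empty if the term cannot be satisfied.
     */
    static Optional<String> findToken(
        String tokenId,
        List<String> requestTokenIds,
        List<String> historyTokenIds,
        boolean conversational // true for '~' terms, false for '=' terms.
    ) {
        if (requestTokenIds.contains(tokenId))
            return Optional.of("request");

        // Only conversational ('~') terms may fall back to the dialogue history.
        if (conversational && historyTokenIds.contains(tokenId))
            return Optional.of("history");

        return Optional.empty();
    }
}
```

With a request containing only x:alarm and a history containing order, a ~ term looking for order is satisfied from the history, while an = term looking for the same token is not satisfied.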
        intent=xb
           flow=/#flowModelMethod/
           term(a)=/org.mypackage.MyClass#termMethod/?
           fragment(frag)
    

NOTES:

  • The intent name is xb.
  • term(a) The intent can contain one optional term (? quantifier, detailed explanations will be given below), defined in the code - org.mypackage.MyClass#termMethod method.
  • fragment(frag) Fragment with identifier frag extends the list of terms of the intent with additional terms that are defined elsewhere (in fragment expression). The frag element must be defined above in the code, or accessible via import statement.
  • flow= Flow contains the condition specified in the model code in the method flowModelMethod.

You can find more examples here

Deeper Dive

Note that this article is only a brief overview of the capabilities of NLPCraft IDL and is not trying to be a comprehensive manual. Before starting to work with NLPCraft, it is recommended to study the detailed description of the syntax and the full review of the language features in the corresponding sections of the documentation.

The Places Where Intents Are Defined

  • Intents defined with NLPCraft IDL can be declared directly in static model definition JSON or YAML files. This approach is very convenient for simple cases. An example is available here.
  • Intents can also be defined directly in the model code using @NCIntent annotations. An example can be found here.
  • In addition, intents can be defined in separate files (usually with the *.idl extension). In this case, the model refers to these intents by the path to these files or by URL resources using the import statement. This approach is convenient when working with large models, when syntax highlighting and other features provided by the IDE may be useful (for example, IntelliJ IDEA provides keyword highlighting, hints and syntax checking for files of the configured types). In addition, this approach can be useful for specialists who do not have the ability or desire to edit the code directly. Examples are available here and here.

Keywords

NLPCraft IDL has only 10 keywords: flow, fragment, import, intent, meta, ordered, term, true, false, null.

  • intent, flow, fragment, meta, ordered, term are parts of the intent definition.
  • The fragment keyword can also be used to create named lists of terms to include in intent definitions (similar to macros).
  • import - required for including external files with fragment, intent or import statements.
  • true, false, null - used when working with built-in functions.

Lifecycle

An intent is compiled when the model is loaded, and it can be debugged only while the model is being debugged. For complex intents, it is recommended to define them in their own separate files and use the editing capabilities provided by the IDE to ensure initial validation of the syntax.

Program Structure

The program contains a set of the following optional elements, in no particular order:

  • import statement
  • fragment statement
  • intent statement

import Statement

It consists of the import keyword followed by a file name or URL resource in parentheses. It allows importing other IDL definitions from external files or URLs. For example:

        import('http://mysite.com/nlp/idls/external.idl')
    
fragment Statement

It contains the fragment keyword with the name and a list of terms. Terms can be parameterized. An example of a simple fragment:

        fragment=buzz term~{tok_id() == 'x:alarm'}
    

An example of a parameterized fragment with arguments a and b:

        fragment=p1
        term={
            meta_frag('a') &&
            has_any(get(meta_frag('b'), 'Москва'), list(1, 2))
        }
    

Below is an example of using this fragment in an intent. Note that the fragment's parameters (its metadata) are passed in JSON format:

        intent=i1
        fragment(p1, {'a': true, 'b': {'Москва': [1, 2, 3]}})
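
To make the evaluation of the parameterized fragment concrete, here is a plain-Java reading of the p1 term body against the metadata passed by intent i1. It is only a sketch and does not use the NLPCraft API: the class FragmentParamsSketch and its method are hypothetical, and has_any is assumed to mean "the two lists share at least one element".

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the NLPCraft API.
final class FragmentParamsSketch {
    // Plain-Java reading of the p1 term body against the given fragment metadata.
    @SuppressWarnings("unchecked")
    static boolean p1Term(Map<String, Object> fragMeta) {
        // meta_frag('a')
        boolean a = Boolean.TRUE.equals(fragMeta.get("a"));

        // get(meta_frag('b'), 'Москва')
        Map<String, List<Integer>> b =
            (Map<String, List<Integer>>) fragMeta.getOrDefault("b", Map.of());
        List<Integer> values = b.getOrDefault("Москва", List.of());

        // has_any(values, list(1, 2)): assumed to be true if the lists share an element.
        boolean hasAny = !Collections.disjoint(values, List.of(1, 2));

        return a && hasAny;
    }
}
```

For the metadata {'a': true, 'b': {'Москва': [1, 2, 3]}} used in i1, both parts of the predicate hold, so the term passes. Setting 'a' to false, or passing a list that contains neither 1 nor 2, makes it fail.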
    
intent Statement

This statement is the core element of IDL: it declares an intent. Here's an example of a simple intent:

        intent=xa
           flow="^(?:login)(^:logout)*$"
           meta={'enabled': true}
           term(a)={# != "z"}[1,3]
    

Every intent statement consists of the following elements:

  • Intent Name

    Required element. The name is a unique identifier within the model. Below is an example of using a reference to the xa intent.

                    @NCIntentRef("xa")
                    fun onTimeMatch(
                        ctx: NCIntentMatch,
                        @NCIntentTerm("a") tok: NCToken
                    ): NCResult { ... }
                
  • Intent Terms

    At least one term is required. The term is the main element of the intent definition. A term consists of a predicate over a token (the term body). The constituent parts of the predicate can be based on the token or on other factors.

    Here's how an intent's term is defined:

    • The term keyword. Required element.
    • Name in parentheses. Optional. Used to create references to the found token in the callback arguments, see the example above, token a.
    • Term type. Required element. Two term types are supported:

      • ~ the token can be obtained from the history of the dialog or from the current request (i.e. the term is conversational).
      • = the token should be obtained only from the current request.

      Example terms:

                              term(nums)~{# == 'nlpcraft:num'} // Conversational '~' term.
                              term(nums)={# != 'z'} // Non-conversational '=' term.
                          
    • The term body. Required element. There are two ways to define the term body: using IDL script with built-in functions or in external Java-based code:

                              term(nums)={# == 'nlpcraft:num'} // IDL script.
                              term(yes)~{true} // IDL script.
                              term~/org.mypackage.MyClass#termMethod/? // Reference to external code.
                          

      Note the special syntax for the last term. The existing set of built-in functions is typically enough to define many terms of the intent. But, if necessary, the programmer can create their own predicates in Java, Scala, Kotlin, Groovy or any other Java-based language and reference them in the term body. That is, an NLPCraft IDL term can delegate to code written entirely in another language:

                              term~/org.mypackage.MyClass#termMethod/? // Reference to external code.
                          

      The termMethod function receives an argument that contains all the necessary data about the user's request, and it should return a value of a specific type.

    • Quantifier. Optional. The default value is [1, 1]. The following types of quantifiers are supported:

      • [M, N] - the term must be found from M to N times.
      • * - the term may be found any number of times, including zero; equivalent to [0, ∞]
      • + - the term must be found at least once; equivalent to [1, ∞]
      • ? - the term must be found 0 or 1 time; equivalent to [0, 1]

      Examples:

                              term(nums)={# == 'nlpcraft:num'}[1,2] // The request must contain one or two tokens with the ID 'nlpcraft:num'.
                              term(nums)={# == 'nlpcraft:num'}* // The request must contain zero or more tokens with the ID 'nlpcraft:num'.
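
The quantifier forms listed above all reduce to a [min, max] range on the number of times a term is found. The following plain-Java sketch shows that reduction. It is not NLPCraft code: the class QuantifierSketch and its methods are hypothetical names introduced for illustration, and Integer.MAX_VALUE is used as a stand-in for ∞.

```java
// Hypothetical illustration only; not part of the NLPCraft API.
final class QuantifierSketch {
    final int min;
    final int max; // Integer.MAX_VALUE stands in for "no upper bound".

    QuantifierSketch(int min, int max) {
        if (min < 0 || max < min)
            throw new IllegalArgumentException("Invalid range: [" + min + "," + max + "]");
        this.min = min;
        this.max = max;
    }

    // Translates the quantifier text that may follow a term body into a [min, max] range.
    static QuantifierSketch parse(String q) {
        String s = q.trim();
        switch (s) {
            case "":  return new QuantifierSketch(1, 1); // Default when no quantifier is given.
            case "?": return new QuantifierSketch(0, 1);
            case "*": return new QuantifierSketch(0, Integer.MAX_VALUE);
            case "+": return new QuantifierSketch(1, Integer.MAX_VALUE);
            default:
                if (s.length() < 2 || !s.startsWith("[") || !s.endsWith("]"))
                    throw new IllegalArgumentException("Unknown quantifier: " + q);
                String[] parts = s.substring(1, s.length() - 1).split(",");
                if (parts.length != 2)
                    throw new IllegalArgumentException("Expected [M,N]: " + q);
                return new QuantifierSketch(
                    Integer.parseInt(parts[0].trim()),
                    Integer.parseInt(parts[1].trim())
                );
        }
    }

    // True if a term found 'count' times satisfies this quantifier.
    boolean accepts(int count) {
        return count >= min && count <= max;
    }
}
```

For example, QuantifierSketch.parse("[1,3]").accepts(2) is true, while QuantifierSketch.parse("+").accepts(0) is false.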
                          
  • IDL Built-In Functions

    NLPCraft IDL provides over 140 built-in functions that can be used in term definition. These functions can be conventionally classified into the following categories:

    • Based on base tokens properties - token IDs, groups, parent, hierarchy, etc. Examples: tok_id, tok_groups, tok_parent.
    • NLP-based token properties - stems, lemmas, parts of speech, stop words. Examples: tok_lemma, tok_is_wordnet, tok_swear.
    • Based on information about how the token was found in the user's request - synonym values, etc. Examples: tok_value, tok_is_permutated, tok_is_direct.
    • Based on user request data - request time, user agent type. Examples: req_tstamp, req_addr, req_agent.
    • Based on various metadata - tokens, model, request, etc. Examples: meta_model('my:prop'), meta_tok('nlpcraft:num:unit'), meta_user('my:prop').
    • Based on data provided by NER token providers. For example, for a geo:city token it can be the number of city residents or the coordinates obtained from metadata.
    • Based on the user and their company - admin status, registration time. Examples: user_admin, comp_name, user_signup_tstamp.
    • Based on system/environment variables, system time, etc. Examples: meta_sys('java.home'), now, day_of_week.
    • Math functions, text functions, collection functions, etc. Examples: lowercase("TeXt"), abs(-1.5), distinct(list(1, 2, 2, 3, 1)).

    More detailed information and a description of each function can be found here.

  • Term Variables

    The term body written in IDL script is a predicate written using IDL built-in functions. To avoid the unnecessary repetitive computations when working with built-in functions, local term variables can be used:

                    term(t2)={
                        @a = meta_model('a')
                        @list = list(1, 2, 3, 4)
    
                        (@a == 42 || @a == 44) && has_all(@list, list(3, 2))
                    }
                

    Local variables are defined and used with the special prefix @. They are used to avoid repeated computations and to shorten and simplify IDL term predicate expressions.
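
As a rough analogy, the t2 term body above reads like the following plain-Java method, where each variable is computed once and then reused. This is only a sketch under stated assumptions: the class TermVariablesSketch is hypothetical, the model metadata is represented as a simple map, and has_all is assumed to mean "the first list contains every element of the second".

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the NLPCraft API.
final class TermVariablesSketch {
    static boolean t2(Map<String, Object> modelMeta) {
        // @a = meta_model('a'): looked up once, then used twice below.
        Object a = modelMeta.get("a");

        // @list = list(1, 2, 3, 4)
        List<Integer> list = List.of(1, 2, 3, 4);

        // (@a == 42 || @a == 44) && has_all(@list, list(3, 2))
        return (Integer.valueOf(42).equals(a) || Integer.valueOf(44).equals(a))
            && list.containsAll(List.of(3, 2));
    }
}
```

Without the variable, the metadata lookup would have to be written (and evaluated) once for each comparison.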

  • Intent Fragments

    A fragment is a named set of terms created to be reusable across intents. See an example by following this link.

  • Intent Flow

    Here we define an additional intent selection rule based on data from previous intent matches within the current session. This rule can be defined using a regex based on the IDs of the previously matched intents:

                    flow="^(?:login)(^:logout)*$"
                

    This rule means that for the intent to be matched, the intent with the login identifier must already have been matched within the current session, and the intent with the logout identifier must not have been.

    If it is necessary to define more complex logic, it can also be moved into user code written in any Java-based language, like the term body:

                    // Triple quotes are needed for a multi-line string literal in Scala.
                    @NCIntent("""intent=x
                        flow=/com.company.dialog.Flow#customFlow/
                        term~{# == 'some_id'}"""
                    )
                    def onX(): NCResult = { ... }
                

    The predicate defined in the customFlow() method receives as input a list with information about all intents previously matched within the current session, and returns a boolean value.
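
The login/logout rule described above can also be written as an ordinary predicate over the session history. The sketch below is plain Java and deliberately avoids the NLPCraft types: the class FlowSketch, the method name, and the representation of the history as a list of intent ID strings are all assumptions made for illustration, so the real callback signature should be taken from the documentation.

```java
import java.util.List;

// Hypothetical illustration only; not part of the NLPCraft API.
final class FlowSketch {
    /**
     * @param previousIntentIds IDs of the intents already matched in this session, oldest first.
     * @return true if 'login' was matched at least once and 'logout' never was.
     */
    static boolean loggedIn(List<String> previousIntentIds) {
        return previousIntentIds.contains("login")
            && !previousIntentIds.contains("logout");
    }
}
```

A history of login followed by order passes this check, while a history that contains logout, or one that never contained login, does not.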

  • Intent Metadata

    Optional element. An additional dataset, presented in JSON format, that can be used by term predicates.

Why Do We Need NLPCraft IDL?

All the intent logic that can be defined using NLPCraft IDL can also be written in any Java-based language. Why, then, is this new language needed at all? Even if its syntax is short, simple and straightforward, you still have to spend some time studying it.

Below are some reasons for using NLPCraft IDL:

  • Program terseness. Custom DSL code is typically shorter than general-purpose programming language code with the same logic. For intents with non-trivial rules, this can be important.
  • If the NLPCraft IDL code is defined in a separate file, then editing the IDL program does not require rebuilding the code of the model and its callbacks.
  • Separating the logic of writing callbacks from the logic of matching intents. Different people can work on these tasks. Because the IDL feature set is deliberately limited, the DSL is easier for a non-programmer to learn.
  • Currently, models can be created in any Java-based language. Apache NLPCraft plans to expand the list of supported languages in the near future. Using NLPCraft IDL will allow you to maintain one common language for defining intents for different types of models within the same project.

Conclusion

I hope you were able to get a first impression of the capabilities of the NLPCraft IDL language and the types of tasks that can be solved using it. Here you will find a detailed description of the language and its capabilities. Additional examples of models created in Java, Kotlin, Groovy and Scala, using NLPCraft IDL to define intents, are available in the GitHub project.
