
Intent Matching


Overview

Data Model processing logic is defined as a collection of one or more intents. The sections below explain what an intent is, how to define it in your model, and how it works.

Intent

The goal of the data model implementation is to take the user input text and match it to the specific user-defined code that will execute for that input. The mechanism that provides this matching is called an intent.

The intent generally refers to the goal that the end-user had in mind when speaking or typing the input utterance. An intent has a declarative part, or template, written in IDL (Intent Definition Language) that strictly defines a particular form of the user input. An intent is also bound to a callback method that will be executed when that intent, i.e. its template, is detected as the best match for a given input. A typical data model has multiple intents, one for each form of expected user input the model wants to react to.

For example, a data model for a banking chatbot or an analytics application can have multiple intents, one for each domain-specific group of inputs such as opening an account, closing an account, transferring money, getting statements, etc.

Intents can be specific or generic in terms of what input they match. Multiple intents can overlap, and NLPCraft will disambiguate such cases by selecting the intent with the overall best match. In general, the most specific intent match wins.
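
For illustration, here is a minimal sketch of two overlapping intents written in IDL (described in the next section). The token ID report:sales, the token group report, and the use of the built-in nlpcraft:date token are hypothetical assumptions for this example:

        // Generic intent: matches any single token from the 'report' group.
        intent=report_any
            term(rep)~{has(tok_groups, 'report')}

        // More specific intent: requires the 'report:sales' token plus an optional date.
        // For an input that mentions a sales report and a date, this intent is the
        // overall best match and wins over 'report_any'.
        intent=report_sales
            term(rep)~{# == 'report:sales'}
            term(when)~{# == 'nlpcraft:date'}[0,1]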

IDL Syntax

NLPCraft intents are written in Intent Definition Language (IDL). IDL is a relatively straightforward declarative language. For example, here's a simple intent x with two terms a and b:

        intent=x
            term(a)~{# == 'my_elm'}
            term(b)={has(tok_groups, "my_group")}

An IDL intent defines a match between the parsed user input, represented as a collection of tokens, and a user-defined callback method. IDL intents are bound to their callbacks via Java annotations; the intents themselves can be declared in those same annotations, placed in the model YAML/JSON file, or kept in external *.idl files.

You can review the formal ANTLR4 grammar for IDL, but here are the general properties of IDL:

  • IDL has a context-free grammar; all whitespace outside of string literals is ignored.
  • IDL supports Java-style comments, both single-line (// Comment.) and multi-line (/* Comment. */).
  • String literals can use either single quotes ('text') or double quotes ("text"), simplifying IDL usage inside JSON or Java strings - you don't have to escape double quotes. Both quote characters can be escaped inside a string, i.e. "text with \" quote" or 'text with \' quote'.
  • Built-in literals true, false and null for boolean and null values.
  • Algebraic and logical expressions, including operator precedence, follow standard Java language conventions.
  • Both integer and real numeric literals can use underscore '_' character for separation as in 200_000.
  • Numeric literals are parsed using standard Java string-to-number conversions.
  • IDL has only 10 reserved keywords: flow, fragment, import, intent, meta, options, term, true, false, null.
  • Identifiers and literals can use the same Unicode space as Java.
  • IDL provides over 150 built-in functions to aid intent matching. IDL functions are pure, side-effect free functions that operate on a runtime stack. They use mathematical notation, like Python functions: IDL length(trim(" text ")) vs. OOP-style " text ".trim().length().
  • IDL is a lazily evaluated language, i.e. expressions are evaluated only when required at runtime. For example, the left-to-right logical AND and OR operators skip their right-side expressions if the left side already determines the overall result - so-called short-circuit evaluation. Some IDL functions, like if and or_else, provide similar short-circuit evaluation, as illustrated in the sketch right after this list.
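
For example, here is a sketch of a term that relies on short-circuit evaluation to guard against a null metadata value (my_prop is a hypothetical model metadata property):

        term={
            // 'meta_model()' returns 'null' if the property is not set.
            @prop = meta_model('my_prop')

            // The right side of '&&' is evaluated only when the left side is 'true',
            // so 'to_int()' is never called with a 'null' argument.
            @prop != null && to_int(@prop) < 100
        }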

An IDL program consists of intent, fragment, and import statements in any order or combination:

  • intent statement

    An intent is defined as one or more terms. Each term is a predicate over an instance of the NCToken interface. For an intent to match, all of its terms have to evaluate to true. Intent definition can be informally explained using the following full-featured example:

        intent=xa
            flow="^(?:login)(^:logout)*$"
            meta={'enabled': true}
            term(a)={month >= 6 && # != "z" && meta_intent('enabled') == true}[1,3]
            term(b)~{
                @usrTypes = meta_model('user_types')

                (# == 'order' || # == 'order_cancel') && has_all(@usrTypes, list(1, 2, 3))
            }


        intent=xb
            options={
                'ordered': false,
                'unused_free_words': true,
                'unused_sys_toks': true,
                'unused_usr_toks': false,
                'allow_stm_only': false
            }
            flow=/#flowModelMethod/
            term(a)=/org.mypackage.MyClass#termMethod/?
            fragment(frag, {'p1': 25, 'p2': {'a': false}})

    NOTES:

    intent=xa (line 1), intent=xb (line 12)

    xa and xb are the mandatory intent IDs. An intent ID is an arbitrary unique string matching the following lexer template: (UNI_CHAR|UNDERSCORE|LETTER|DOLLAR)+(UNI_CHAR|DOLLAR|LETTER|[0-9]|COLON|MINUS|UNDERSCORE)*

    options={...} (line 13)

    Optional. Matching options specified as a JSON object. The entire JSON object as well as each individual JSON field is optional. Options allow you to customize the matching algorithm for this intent:
    • ordered (Boolean, default: false)

      Whether or not this intent is ordered. For an ordered intent the specified order of terms is important for matching. If the intent is unordered, its terms can be found in any order in the input text. Note that an ordered intent significantly limits the user input it can match. In most cases an ordered intent is only applicable to processing a formal grammar (like a programming language) and is mostly unsuitable for natural language processing.

      Note that while the ordered flag affects the entire intent and all of its terms, you can define an individual term that depends on the position of the token. This, in fact, allows you to have a subset of terms that are order dependent. See the following IDL functions for details: tok_index(), tok_all(), tok_count(), tok_is_last(), tok_is_first(), tok_is_before_id(), tok_is_before_parent(), tok_is_before_group(), tok_is_between_ids(), tok_is_between_parents(), tok_is_between_groups(), tok_is_after_id(), tok_is_after_parent(), tok_is_after_group().

    • unused_free_words (Boolean, default: true)

      Whether or not free words - words that are unused by intent matching - should be ignored (value true) or reject the intent match (value false). Free words are the words in the user input that were not recognized as any user or system token. Typically, for natural language comprehension it is safe to ignore free words. For a formal grammar, however, this could make the matching logic too loose.

    • unused_sys_toks (Boolean, default: true)

      Whether or not unused system tokens should be ignored (value true) or reject the intent match (value false). By default, the unused system tokens are ignored.

    • unused_usr_toks (Boolean, default: false)

      Whether or not unused user-defined tokens should be ignored (value true) or reject the intent match (value false). By default, the unused user tokens are not ignored since it is assumed that the user defined his or her own tokens on purpose and constructed the intent logic appropriately.

    • allow_stm_only (Boolean, default: false)

      Whether or not the intent can match when all of the matching tokens came from STM. By default, this special case is disabled (value false). However, in specific intents designed for a free-form language comprehension scenario, for example SMS messaging, you may want to enable this option.
    flow="^(?:login)(^:logout)*$" (line 2), flow=/#flowModelMethod/ (line 20)

    Optional. Dialog flow is a history of previously matched intents to match on. If provided, the intent will first match on the history of previously matched intents before processing its terms. There are two ways to define a match on the dialog flow:

    • Regular Expression

      In this case the dialog flow specification is a string with a standard Java regular expression. The history of previously matched intents is presented as a space-separated string of intent IDs that were selected as the best match during the current conversation, in reverse chronological order, with the most recently matched intent ID being the first element in the string. The dialog flow regular expression is matched against that string of intent IDs.

      In line 2, the ^(?:login)(^:logout)*$ dialog flow regular expression specifies that the intent should only match when the immediately previous intent was login and no logout intents are in the history. If the history is "login order order" - this intent will match. However, for a "login logout" or "order login" history this dialog flow will not match.

    • User-Defined Callback

      In this case the dialog flow specification is defined as a callback in the form /x.y.z.Class#method/, where x.y.z.Class is the fully qualified name of the class where the callback is defined, and method is the name of the callback method. This method must take one parameter of type java.util.List[NCDialogFlowItem] and return a boolean result.

      The class name is optional, in which case the model class is used by default. Note that if a custom class is in fact specified, an instance of this class will be created for each dialog flow test. This class must have a no-arg constructor so it can be instantiated via standard Java reflection, and its creation must be as lightweight as possible to avoid performance degradation. For this reason it is recommended to put the dialog flow callback on the model class itself, which avoids instantiating a new class on each dialog flow evaluation.

    Note that if a dialog flow is defined and it doesn't match the history, the terms of the intent won't be tested at all.
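
    Putting it together, here is a minimal dialog flow sketch using two intents chained via a flow regular expression (the token IDs x:login and x:balance are hypothetical):

        intent=login
            term~{# == 'x:login'}

        // Since the most recently matched intent ID comes first in the history
        // string, this intent can only match right after the 'login' intent.
        intent=balance
            flow="^login"
            term~{# == 'x:balance'}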

    meta={'enabled': true} (line 3)

    Optional. Like most components in NLPCraft, an intent can have its own metadata. Intent metadata is defined as a standard JSON object which is converted into a java.util.Map instance and can be accessed in the intent's terms via the meta_intent() IDL function. The typical use case for declarative intent metadata is to parameterize the behavior of its terms with clearly defined properties provided inside the intent definition itself.

    term(a)={month >= 6 && # != "z" && meta_intent('enabled') == true}[1,3] (line 4)
    term(b)~{
        @usrTypes = meta_model('user_types')
        (# == 'order' || # == 'order_cancel') && has_all(@usrTypes, list(1, 2, 3))
    } (lines 5-9)
    term(a)=/org.mypackage.MyClass#termMethod/? (line 21)

    A term is a building block of the intent. An intent must have at least one term. A term has an optional ID, a token predicate, and optional quantifiers. A term declared with the '~' symbol supports conversation context, while a term declared with the '=' symbol does not. For a conversational term the system will search for a match using tokens from the current request as well as tokens from the conversation STM (short-term memory). For a non-conversational term only tokens from the current request are considered.
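
    For example, with a hypothetical token ID my_elm:

        // Conversational term: can be satisfied by a token from the current
        // request or by a matching token from the conversation STM.
        term(a)~{# == 'my_elm'}

        // Non-conversational term: only tokens from the current request are considered.
        term(b)={# == 'my_elm'}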

    A term is matched if its token predicate returns true. A matched term represents one or more tokens, sequential or not, that were detected in the user input. An intent has a list of terms (always at least one) that all have to be matched in the user input for the intent to match. Note that a term can be optional if its minimum quantifier is zero. Whether the order of the terms is important for matching is governed by the intent's ordered option.

    The term ID (a and b) is optional. It is only required by the @NCIntentTerm annotation to link the term's tokens to a formal parameter of the callback method. Note that term IDs follow the same lexical rules as intent IDs.

    A term's body can be defined in two ways:

    • IDL Expression

      Inside the curly brackets { } you can have an optional list of term variables and the mandatory term expression that must evaluate to a boolean value. A term variable name must start with the @ symbol and be unique within the scope of the current term. All term variables must be defined and initialized before the term expression, which must be the last statement in the term:

        term(b)~{
            @a = meta_model('a')
            @lst = list(1, 2, 3, 4)

            has_all(@lst, list(@a, 2))
        }

      Term variable initialization expressions, as well as the term's expression, follow a Java-like expression grammar including precedence rules, brackets and logical combinators, as well as built-in IDL function calls:

        term={true} // Special case of 'constant' term.
        term={
            // Variable declarations.
            @a = round(1.25)
            @b = meta_model('my_prop')

            // Last expression must evaluate to boolean.
            (@a + 2) * @b > 0
        }
        term={
            // Variable declarations.
            @c = meta_tok('prop')
            @lst = list(1, 2, 3)

            // Last expression must evaluate to boolean.
            abs(@c) > 1 && size(@lst) != 5
        }

      NOTE: while term variable initialization expressions can have any type, the term's expression itself, i.e. the last expression in the term's body, must evaluate to a boolean result. Failure to do so will result in a runtime exception during intent evaluation. Note also that such errors cannot be detected during the intent compilation phase.

    • User-Defined Callback

      In this case the term's body is defined as a callback in the form /x.y.z.Class#method/, where x.y.z.Class is the fully qualified name of the class where the callback is defined, and method is the name of the callback method. This method must take one parameter of type NCTokenPredicateContext and return an instance of NCTokenPredicateResult as its result:

        term(a)=/org.mypackage.MyClass#termMethod/?

      The class name is optional, in which case the model class is used by default. Note that if a custom class is in fact specified, an instance of this class will be created for each term evaluation. This class must have a no-arg constructor so it can be instantiated via standard Java reflection, and its creation must be as lightweight as possible to avoid performance degradation. For this reason it is recommended to put the user-defined term callback on the model class itself, which avoids instantiating a new class on each term evaluation.

    ? and [1,3] define an inclusive quantifier for the term, i.e. how many times a match for this term should be found. You can use the following quick abbreviations:

    • * is equal to [0,∞]
    • + is equal to [1,∞]
    • ? is equal to [0,1]
    • No quantifier defaults to [1,1]

    As mentioned above, the quantifier is inclusive, i.e. [1,3] means that the term should appear once, twice, or three times.
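
    For example, with a hypothetical token ID x:id:

        term~{# == 'x:id'}        // No quantifier: exactly one match, same as [1,1].
        term~{# == 'x:id'}?       // Zero or one match, same as [0,1].
        term~{# == 'x:id'}+       // One or more matches, same as [1,∞].
        term~{# == 'x:id'}[1,3]   // One, two or three matches (inclusive).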

    fragment(frag, {'p1': 25, 'p2': {'a': false}}) (line 22)

    A fragment reference inserts the terms defined by that fragment in place of the reference. A fragment reference has a mandatory fragment ID parameter and an optional second JSON parameter. The optional JSON parameter allows parameterizing the behavior of the inserted terms and is available to them via the meta_frag() IDL function.

  • fragment statement

    Fragments allow you to group and name a set of reusable terms. Such groups can be further parameterized at the place of reference and enable the reuse of one or more terms by multiple intents. For example:

        // Fragments.
        fragment=buzz term~{# == meta_frag('id')}
        fragment=when
            term(nums)~{
                // Term variable.
                @type = meta_tok('nlpcraft:num:unittype')
                @iseq = meta_tok('nlpcraft:num:isequalcondition')

                # == 'nlpcraft:num' && @type == 'datetime' && @iseq == true
            }[0,7]

        // Intents.
        intent=alarm
            // Insert parameterized terms from fragment 'buzz'.
            fragment(buzz, {"id": "x:alarm"})

            // Insert terms from fragment 'when'.
            fragment(when)

    NOTES:

    • Fragment statements (lines 2 and 3) have a name (buzz and when) and a list of terms.
    • Terms follow the same syntax as in intent definition.
    • When a fragment is referenced in an intent (lines 15 and 18), it is replaced with its terms.
  • import statement

    The import statement allows importing IDL declarations from a local file, a classpath resource, or a URL:

        // Import using absolute path.
        import('/opt/globals.idl')

        // Import using classpath resource.
        import('org/apache/nlpcraft/examples/alarm/intents.idl')

        // Import using URL.
        import('ftp://user:password@myhost:22/opt/globals.idl')

    NOTES:

    • The effect of importing is the same as if the imported declarations were inserted in place of the import statement.
    • Recursive and cyclic imports are detected and safely ignored.
    • The import statement starts with the import keyword and has a string parameter that indicates the location of the resource to import.
    • For a classpath resource you don't need to specify the leading forward slash.

Intent Lifecycle

During its start-up, the NLPCraft data probe scans the models provided in its configuration for intents. The scanning process goes through external JSON/YAML configurations as well as model classes when looking for IDL intents. All found intents are compiled into an internal representation before the data probe completes its start-up sequence.

@NCModelAddClasses and @NCModelAddPackage Annotations

You can use these annotations to add the specified classes and packages to the list of classes that will be scanned when NLPCraft searches for annotated intent callbacks. By default, only the model class itself and its ancestors are scanned. Larger models can be modularized and split into separate compilation units to simplify their development and maintenance.

Note that not all intent problems can be detected at the compilation phase, and the probe can start with intents that are not completely validated. For example, each term in an intent must evaluate to a boolean result - this can only be checked at runtime. Another example is the number and types of the parameters passed into an IDL function, which are also only checked at runtime.

Intents are compiled only once, during the data probe start-up sequence, and cannot be recompiled without a data probe restart. Model logic, however, can affect intent behavior through model callbacks, model metadata, user and company metadata, as well as request data - all of which can change at runtime and are accessible through IDL functions.
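
For example, the following sketch parameterizes a term's threshold through model metadata; the token ID my_elm and the metadata properties score_limit and score are hypothetical:

        intent=tunable
            term(a)={
                // Read the current limit from model metadata - the model can
                // change this value at runtime without recompiling the intent.
                @limit = meta_model('score_limit')

                # == 'my_elm' && meta_tok('score') > @limit
            }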

Intent Examples

Here are a few intent examples with explanations:

Example 1:

        intent=a
            term~{# == 'x:id'}
            term(nums)~{# == 'nlpcraft:num' && lowercase(meta_tok('nlpcraft:num:unittype')) == 'datetime'}[0,2]
        

NOTES:

  • Intent has ID a.
  • Intent uses the default conversation support (true) and the default order (false).
  • Intent has two conversational terms (~). The first term has to be found in the user input for the intent to match; the second term is optional since its quantifier is [0,2].
  • First term matches any single token with ID x:id.
  • The second term can appear zero, one or two times, and it matches a token with ID nlpcraft:num whose nlpcraft:num:unittype metadata property equals the 'datetime' string.
  • The IDL function lowercase is applied to the nlpcraft:num:unittype metadata property value.
  • Note that since the second term has an ID (nums), it can be referenced via the @NCIntentTerm annotation by a formal parameter of the callback method.

Example 2:

        intent=id2
            flow='id1 id2'
            term={# == 'mytok' && signum(get(meta_tok('score'), 'best')) != -1}
            term={has_any(tok_groups, list('actors', 'owners')) && size(meta_part('partAlias', 'text')) > 10}
        

NOTES:

  • Intent has ID id2.
  • Intent has the dialog flow pattern 'id1 id2'. It expects the sequence of intents id1, id2 somewhere in the history of previously matched intents in the course of the current conversation.
  • Intent has two non-conversational terms (=). Both terms have to be present only once (their implicit quantifiers are [1,1]).
  • Both terms have to be found in the user input for the intent to match.
  • The first term should be a token with ID mytok that has a metadata property score of type map. This map should have a value with the string key 'best'. The signum of that value should not equal -1. Note that meta_tok(), get() and signum() are all built-in IDL functions.
  • The second term should be a token that belongs to either the actors or the owners group. It should have a part token with the alias partAlias. That part token should have a metadata property text of type string, list or map. The length of this string, list or map should be greater than 10.

Syntax Highlighting

NLPCraft IDL has relatively simple syntax and you can easily configure its syntax highlighting in most modern code editors and IDEs. Here are two examples of how to add IDL syntax highlighting:

IntelliJ IDEA

The NLPCraft project comes with the idea/nlpcraft_idl_idea_settings.zip file that contains a syntax highlighting configuration for *.idl file types. Import this file (File -> Manage IDE Settings -> Import Settings...) and you will get proper syntax highlighting for *.idl files in your project.

SyntaxHighlighter

For highlighting IDL syntax on the web you can use the SyntaxHighlighter JavaScript library, which is used for all IDL code on this website.

To add custom language support, create a new brush file shBrushIdl.js with the following content and place it under the scripts folder of your local SyntaxHighlighter installation:

;(function()
{
    // CommonJS
    typeof(require) != 'undefined' ? SyntaxHighlighter = require('shCore').SyntaxHighlighter : null;

    function Brush()
    {
        const keywords = 'flow fragment import intent meta options term';
        const literals = 'false null true';
        const symbols =	'[\\[\\]{}*@+?~=]+';
        const fns = 'abs asin atan atan2 avg cbrt ceil comp_addr comp_city comp_country comp_id comp_name comp_postcode comp_region comp_website concat contains cos cosh count day_of_month day_of_week day_of_year degrees distinct ends_with euler exp expm1 first floor get has has_all has_any hour hypot if index_of is_alpha is_alphanum is_alphanumspace is_alphaspace is_empty is_num is_numspace is_whitespace json keys last length list log log10 log1p lowercase max meta_company meta_conv meta_frag meta_intent meta_model meta_part meta_req meta_sys meta_tok meta_user min minute month non_empty now or_else pi pow quarter radians rand replace req_addr req_agent req_id req_normtext req_tstamp reverse rint round second signum sin sinh size sort split split_trim sqrt square starts_with stdev strip substr tan tanh to_double to_int to_string tok_aliases tok_all tok_all_for_group tok_all_for_id tok_all_for_parent tok_ancestors tok_count tok_end_idx tok_find_part tok_find_parts tok_groups tok_has_part tok_id tok_index tok_is_abstract tok_is_after_group tok_is_after_id tok_is_after_parent tok_is_before_group tok_is_before_id tok_is_before_parent tok_is_bracketed tok_is_direct tok_is_english tok_is_first tok_is_freeword tok_is_last tok_is_permutated tok_is_quoted tok_is_stopword tok_is_swear tok_is_user tok_is_wordnet tok_lemma tok_parent tok_pos tok_sparsity tok_start_idx tok_stem tok_this tok_unid tok_value trim uppercase user_admin user_email user_fname user_id user_lname user_signup_tstamp values week_of_month week_of_year year';

        this.regexList = [
            { regex: SyntaxHighlighter.regexLib.singleLineCComments, css: 'comments' },	// One line comments.
            { regex: SyntaxHighlighter.regexLib.multiLineCComments,	css: 'comments' }, // Multiline comments.
            { regex: SyntaxHighlighter.regexLib.doubleQuotedString,	css: 'string' }, // String.
            { regex: SyntaxHighlighter.regexLib.singleQuotedString, css: 'string' }, // String.
            { regex: /0x[a-f0-9]+|\d+(\.\d+)?/gi, css: 'value' }, // Numbers.
            { regex: new RegExp(this.getKeywords(keywords), 'gm'), css: 'keyword' }, // Keywords.
            { regex: new RegExp(this.getKeywords(literals), 'gm'), css: 'color1' }, // Literals.
            { regex: /<|>|<=|>=|==|!=|&&|\|\|/g, css: 'color2' }, // Operators.
            { regex: new RegExp(this.getKeywords(fns), 'gm'), css: 'functions' }, // Functions.
            { regex: new RegExp(symbols, 'gm'), css: 'color3' } // Symbols.
        ];
    }

    Brush.prototype	= new SyntaxHighlighter.Highlighter();
    Brush.aliases	= ['idl'];

    SyntaxHighlighter.brushes.Idl = Brush;

    // CommonJS.
    typeof(exports) != 'undefined' ? exports.Brush = Brush : null;
})();
                

Make sure to include this script in your page:

<script src="/path/to/your/scripts/shBrushIdl.js" type="text/javascript"></script>
                

And then you can use it to display IDL code from HTML using the <pre> tag and the brush: idl CSS class:

<pre class="brush: idl">
    intent=xa
        flow="^(?:login)(^:logout)*$"
        meta={'enabled': true}
        term(a)={month >= 6 && # != "z" && meta_intent('enabled') == true}[1,3]
        term(b)~{
            @usrTypes = meta_model('user_types')

            (# == 'order' || # == 'order_cancel') && has_all(@usrTypes, list(1, 2, 3))
        }
</pre>
                

IDL Functions

IDL provides over 150 built-in functions that can be used in IDL intent definitions. An IDL function call takes the traditional fun_name(p1, p2, ..., pk) syntax form. If a function has no parameters, the brackets are optional. An IDL function operates on a stack: its parameters are taken from the stack, and its result is put back onto the stack, where it can in turn become a parameter for the next function call, and so on. IDL functions can have zero or more parameters and always have exactly one result value. Some IDL functions support a variable number of parameters. Note that you cannot define your own functions in IDL - in such cases use a term with a user-defined callback method.
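
For example, using only built-in functions described below:

    // Nested calls: each inner result goes back onto the stack and becomes a
    // parameter of the enclosing call; 'list' takes a variable number of parameters.
    size(concat(list(1, 2), list(3, 4))) == 4

    // A function without parameters can omit the brackets.
    month >= 6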

Special Shorthand #

The frequently used IDL function tok_id() has the special shorthand #. For example, the following expressions are all equivalent:

                tok_id() == 'id'
                tok_id == 'id' // Remember - empty parens are optional.
                # == 'id'
            

When chaining function calls, IDL uses mathematical notation (a la Python) rather than an object-oriented one: IDL length(trim(" text ")) vs. OOP-style " text ".trim().length().

IDL functions operate with the following types:

  • java.lang.String -> String
  • java.lang.Long, java.lang.Integer, java.lang.Short, java.lang.Byte -> Long (smaller numerical types will be converted to java.lang.Long)
  • java.lang.Double, java.lang.Float -> Double (java.lang.Float will be converted to java.lang.Double)
  • java.lang.Boolean -> Boolean (you can use true or false literals)
  • java.util.List<T> -> List[T] (use the list(...) IDL function to create a new list)
  • java.util.Map<K,V> -> Map[K,V]
  • NCToken -> Token
  • java.lang.Object -> Any (any of the supported types above; use the null literal for a null value)

Some IDL functions are polymorphic, i.e. they can accept arguments and return results of multiple types. Encountering an unsupported type results in a runtime error during intent matching. It is especially important to watch the types when adding objects to the various metadata containers and then using that metadata in IDL expressions.
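
For example, if a numeric value was stored in model metadata as a string (my_prop here is a hypothetical property), convert it explicitly before a numeric comparison:

    // 'meta_model()' returns the value as stored; if it is a string,
    // convert it to 'Double' explicitly before comparing.
    term={to_double(meta_model('my_prop')) > 1.5}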

Unsupported Types

Detection of unsupported types by IDL functions cannot be done during IDL compilation and only happens during runtime execution. This means that even though the data probe compiles IDL intents and starts successfully, it does not guarantee that the intents will operate correctly.

All IDL functions are organized into the following groups: Token, Text, Math, Collection, Metadata, Date & Time, Request, User, Company, and Other.

Description:
Returns the token ID for the current token (default) or the one provided by the optional parameter t. Note that this function has the special shorthand #.

Usage:

// Result: 'true' if the current token ID is equal to 'my_id'.
tok_id == 'my_id'
# == 'my_id'
tok_id(tok_this) == 'my_id'
#(tok_this) == 'my_id'

Description:
Returns the token lemma for the current token (default) or the one provided by the optional parameter t.

Usage:

// Result: 'true' if the current token lemma is equal to 'work'.
tok_lemma == 'work'
tok_lemma(tok_this) == 'work'

Description:
Returns the token stem for the current token (default) or the one provided by the optional parameter t.

Usage:

// Result: 'true' if the current token stem is equal to 'work'.
tok_stem == 'work'
tok_stem(tok_this) == 'work'

Description:
Returns the token PoS tag for the current token (default) or the one provided by the optional parameter t.

Usage:

// Result: 'true' if the current token PoS tag is equal to 'NN'.
tok_pos == 'NN'
tok_pos(tok_this) == 'NN'

Description:
Returns the token sparsity value for the current token (default) or the one provided by the optional parameter t.

Usage:

// Result: token sparsity value.
tok_sparsity
tok_sparsity(tok_this)

Description:
Returns the internal globally unique token ID for the current token (default) or the one provided by the optional parameter t.

Usage:

// Result: internal token globally unique ID.
tok_unid
tok_unid(tok_this)

Description:
Returns the token abstract flag for the current token (default) or the one provided by the optional parameter t.

Usage:

// Result: token abstract flag.
tok_is_abstract
tok_is_abstract(tok_this)

Description:
Returns the token bracketed flag for the current token (default) or the one provided by the optional parameter t.

Usage:

// Result: token bracketed flag.
tok_is_bracketed
tok_is_bracketed(tok_this)

Description:
Returns the token direct flag for the current token (default) or the one provided by the optional parameter t.

Usage:

// Result: token direct flag.
tok_is_direct
tok_is_direct(tok_this)

Description:
Returns the token permutated flag for the current token (default) or the one provided by the optional parameter t.

Usage:

// Result: token permutated flag.
tok_is_permutated
tok_is_permutated(tok_this)

Description:
Returns the token English detection flag for the current token (default) or the one provided by the optional parameter t.

Usage:

// Result: token English detection flag.
tok_is_english
tok_is_english(tok_this)

Description:
Returns the token freeword flag for the current token (default) or the one provided by the optional parameter t.

Usage:

// Result: token freeword flag.
tok_is_freeword
tok_is_freeword(tok_this)

Description:
Returns the token quoted flag for the current token (default) or the one provided by the optional parameter t.

Usage:

// Result: token quoted flag.
tok_is_quoted
tok_is_quoted(tok_this)

Description:
Returns the token stopword flag for the current token (default) or the one provided by the optional parameter t.

Usage:

// Result: token stopword flag.
tok_is_stopword
tok_is_stopword(tok_this)

Description:
Returns the token swear word flag for the current token (default) or the one provided by the optional parameter t.

Usage:

// Result: token swear flag.
tok_is_swear
tok_is_swear(tok_this)

Description:
Returns whether this token is defined by a user-defined model element or a built-in one, for the current token (default) or the one provided by the optional parameter t.

Usage:

// Result: whether or not this token is defined by a user model element vs. a built-in one.
tok_is_user
tok_is_user(tok_this)

Description:
Returns whether this token's text is a known part of the WordNet dictionary, for the current token (default) or the one provided by the optional parameter t.

Usage:

// Result: whether or not this token is part of WordNet dictionary.
tok_is_wordnet
tok_is_wordnet(tok_this)

Description:
Gets the list of all parent IDs, up to the root, for the current token (default) or the one provided by the optional parameter t. This is only available for user-defined model elements - built-in tokens do not have parents and will return an empty list. May return an empty list but never null.

Usage:

// Result: list of all ancestors.
tok_ancestors
tok_ancestors(tok_this)

Description:
Gets the optional parent ID of the model element that the current token (default) or the one provided by the optional parameter t represents. This is only available for user-defined model elements - built-in tokens do not have parents and this will return null.

Usage:

// Result: parent ID of this token or 'null'.
tok_parent
tok_parent(tok_this)

Description:
Gets the list of groups the current token (default) or the one provided by the optional parameter t belongs to. Note that, by default, if not specified explicitly, a token always belongs to one group with ID equal to the token ID. May return an empty list but never null.

Usage:

// Result: list of groups this token belongs to.
tok_groups
tok_groups(tok_this)

Description:
Gets the value if the current token (default) or the one provided by the optional parameter t was detected via the element's value (or its synonyms); otherwise returns null. Only applicable to user-defined model elements - built-in tokens do not have values and this will return null.

Usage:

// Result: the token value if this token was detected via element's value
tok_value
tok_value(tok_this)

Description:
Gets the optional list of aliases the current token (default) or the one provided by the optional parameter t is known by. A token can get an alias if it is a part of another composed token and the IDL expression that was used to match it specified an alias. Note that a token can have zero, one or more aliases. May return an empty list but never null.

Usage:

// Result: checks if this token is known by 'alias' alias.
has(tok_aliases, 'alias')
has(tok_aliases(tok_this), 'alias')

Description:
Gets the start character index of the current token (default) or the one provided by the optional parameter t in the original text.

Usage:

// Result: start character index of this token in the original text.
tok_start_idx
tok_start_idx(tok_this)

Description:
Gets the end character index of the current token (default) or the one provided by the optional parameter t in the original text.

Usage:

// Result: end character index of this token in the original text.
tok_end_idx
tok_end_idx(tok_this)

Description:
Returns current token.

Usage:

// Result: current token.
tok_this

Description:
Finds a part token with the given ID or alias, traversing the entire part token graph. The start token is provided by the t parameter. The token ID or alias to find is defined by a parameter. This function throws a runtime exception if the given alias or ID cannot be found or more than one token is found. This function never returns null. If more than one token is expected, use the tok_find_parts() function instead. See also the tok_has_part() function to check whether a certain part token exists.

Usage:

// Result: part token of the current token found by 'alias' alias,
//         if any, or throws runtime exception.
tok_find_part(tok_this, 'alias')

Description:
Checks the existence of a part token with the given ID or alias, traversing the entire part token graph. The start token is provided by the t parameter. The token ID or alias to find is defined by a parameter. See also the if() function for 'if-then-else' branching support.

Usage:

// Result: 'true' if part token of the current token found by 'alias' alias, 'false' otherwise.
tok_has_part(tok_this, 'alias')

// Result: part token 'alias' if it exists or the current token if it does not.
@this = tok_this
@tok = if(tok_has_part(@this, 'alias'), tok_find_part(@this, 'alias'), @this)

Description:
Finds part tokens with the given ID or alias, traversing the entire part token graph. The start token is provided by the t parameter. The token ID or alias to find is defined by a parameter. This function may return an empty list but never null.

Usage:

// Result: list of part tokens, potentially empty, of the current token found by 'alias' alias.
tok_find_parts(tok_this, 'alias')

// Result: part token 'alias' if it exists or the current token if it does not.
@this = tok_this
@parts = tok_find_parts(@this, 'alias')
@tok = if(is_empty(@parts), @this, first(@parts))

Description:
Returns token's original text. If t is not provided the current token is assumed.

Usage:

// Result: token original input text.
tok_txt

Description:
Returns token's normalized text. If t is not provided the current token is assumed.

Usage:

// Result: token normalized input text.
tok_norm_txt

Description:
Returns token's index in the original input. Note that this is an index of the token and not of the character. If t is not provided the current token is assumed.

Usage:

// Result: 'true' if index of this token in the original input is equal to 1.
tok_index == 1
tok_index(tok_this) == 1

Description:
Returns true if this token is the first in the original input. Note that this checks index of the token and not of the character. If t is not provided the current token is assumed.

Usage:

// Result: 'true' if this token is the first token in the original input.
tok_is_first
tok_is_first(tok_this)

Description:
Returns true if this token is the last in the original input. Note that this checks the index of the token and not of the character. If t is not provided, the current token is assumed.

Usage:

// Result: 'true' if this token is the last token in the original input.
tok_is_last
tok_is_last(tok_this)

Description:
Returns true if there is a token with ID id after this token.

Usage:

// Result: 'true' if there is a token with ID 'a' after this token.
tok_is_before_id('a')

Description:
Returns true if there is a token with ID id before this token.

Usage:

// Result: 'true' if there is a token with ID 'a' before this token.
tok_is_after_id('a')

Description:
Returns true if this token is located between tokens with IDs id1 and id2.

Usage:

// Result: 'true' if this token is located after token with ID 'before' and before the token with ID 'after'.
tok_is_between_ids('before', 'after')

Description:
Returns true if this token is located between tokens with group IDs grp1 and grp2.

Usage:

// Result: 'true' if this token is located after token belonging to the group 'before' and before the token belonging to the group 'after'.
tok_is_between_groups('before', 'after')

Description:
Returns true if this token is located between tokens with parent IDs id1 and id2.

Usage:

// Result: 'true' if this token is located after token with parent ID 'before' and before the token with parent ID 'after'.
tok_is_between_parents('before', 'after')

Description:
Returns true if there is a token that belongs to the group grp after this token.

Usage:

// Result: 'true' if there is a token that belongs to the group 'grp' after this token.
tok_is_before_group('grp')

Description:
Returns true if there is a token that belongs to the group grp before this token.

Usage:

// Result: 'true' if there is a token that belongs to the group 'grp' before this token.
tok_is_after_group('grp')

Description:
Returns true if there is a token with parent ID parentId before this token.

Usage:

// Result: 'true' if there is a token with parent ID 'owner' before this token.
tok_is_after_parent('owner')

Description:
Returns true if there is a token with parent ID parentId after this token.

Usage:

// Result: 'true' if there is a token with parent ID 'owner' after this token.
tok_is_before_parent('owner')

Description:
Returns all tokens from the original input.

Usage:

// Result: list of all tokens for the original input.
tok_all

Description:
Returns the number of tokens in the original input. It is equivalent to size(tok_all).

Usage:

// Result: number of all tokens for the original input.
tok_count

Description:
Returns list of tokens from the original input with ID id.

Usage:

// Result: list of tokens for the original input that have ID 'id'.
tok_all_for_id('id')

Description:
Returns list of tokens from the original input with parent ID parentId.

Usage:

// Result: list of tokens for the original input that have parent ID 'id'.
tok_all_for_parent('id')

Description:
Returns list of tokens from the original input that belong to the group grp.

Usage:

// Result: list of tokens for the original input that belong to the group 'grp'.
tok_all_for_group('grp')

Description:
Returns the size or length of the given string, list or map. This function has aliases: size and count.

Usage:

// Result: 9
length("some text")

// Result: 3
@lst = list(1, 2, 3)
size(@lst)
count(@lst)

Description:
Returns true if string s matches Java regular expression rx, false otherwise.

Usage:

regex('textabc', '^text.*$') // Returns 'true'.
regex('_textabc', '^text.*$') // Returns 'false'.

Description:
Calls String.trim() on the given parameter p and returns its result. This function has an alias: strip.

Usage:

// Result: "text"
trim(" text ")
strip(" text ")

Description:
Calls String.toUpperCase() on given parameter p and returns its result.

Usage:

// Result: "TEXT"
uppercase("text")

Description:
Calls String.toLowerCase() on given parameter p and returns its result.

Usage:

// Result: "text"
lowercase("TeXt")

Description:
Calls Apache Commons StringUtils.isAlpha() on given parameter p and returns its result.

Usage:

// Result: true
is_alpha("text")

Description:
Calls Apache Commons StringUtils.isAlphanumeric() on given parameter p and returns its result.

Usage:

// Result: true
is_alphanum("text123")

Description:
Calls Apache Commons StringUtils.isWhitespace() on given parameter p and returns its result.

Usage:

// Result: false
is_whitespace("text123")
// Result: true
is_whitespace("   ")

Description:
Calls Apache Commons StringUtils.isNumeric() on given parameter p and returns its result.

Usage:

// Result: true
is_num("123")

Description:
Calls Apache Commons StringUtils.isNumericSpace() on given parameter p and returns its result.

Usage:

// Result: true
is_numspace("  123")

Description:
Calls Apache Commons StringUtils.isAlphaSpace() on given parameter p and returns its result.

Usage:

// Result: true
is_alphaspace("  text  ")

Description:
Calls Apache Commons StringUtils.isAlphaNumericSpace() on given parameter p and returns its result.

Usage:

// Result: true
is_alphanumspace(" 123 text  ")

Description:
Calls p1.split(p2) and returns its result converted to a list.

Usage:

// Result: [ "a", "b", "c" ]
split("a|b|c", "|")

Description:
Calls p1.split(p2) converting the result to a list. Then calls String.strip() on each element.

Usage:

// Result: ["a", "b", "c"]
split_trim("a | b | c", "|")

Description:
Calls p1.startsWith(p2) and returns its result.

Usage:

// Result: true
starts_with("abc", "ab")

Description:
Calls p1.endsWith(p2) and returns its result.

Usage:

// Result: true
ends_with("abc", "bc")

Description:
Calls p1.contains(p2) and returns its result.

Usage:

// Result: true
contains("abc", "bc")

Description:
Calls p1.indexOf(p2) and returns its result.

Usage:

// Result: 1
index_of("abc", "bc")

Description:
Calls p1.substring(p2, p3) and returns its result.

Usage:

// Result: "bc"
substr("abc", 1, 3)

Description:
Calls p1.replace(p2, p3) and returns its result.

Usage:

// Result: "aBC"
replace("abc", "bc", "BC")

Description:
Converts given integer or string to double value.

Usage:

// Result: 1.2
to_double("1.2")
// Result: 1.0
to_double(1)

Description:
Converts given double or string to integer value. In case of double value it will be rounded to the nearest integer value.

Usage:

// Result: 1
to_int("1.2")
to_int(1.2)

Description:
Returns absolute value for parameter x.

Usage:

// Result: 1
abs(-1)
// Result: 1.5
abs(-1.5)

Description:
Calls Math.ceil(d).

Usage:

// Result: 2.0
ceil(1.5)

Description:
Calls Math.floor(d).

Usage:

// Result: 1.0
floor(1.5)

Description:
Calls Math.rint(d).

Usage:

// Result: 1.0
rint(1.2)

Description:
Calls Math.round(d).

Usage:

// Result: 1
round(1.2)

Description:
Calls Math.signum(d).

Usage:

// Result: 1.0
signum(1.2)

Description:
Calls Math.sqrt(d).

Usage:

// Result: 1.4142135623730951
sqrt(2.0)

Description:
Calls Math.cbrt(d).

Usage:

// Result: -3.0
cbrt(-27.0)

Description:
Calls Math.acos(d).

Usage:

acos(1.0)

Description:
Calls Math.asin(d).

Usage:

asin(1.0)

Description:
Calls Math.atan(d).

Usage:

atan(1.0)

Description:
Calls Math.tan(d).

Usage:

tan(1.0)

Description:
Calls Math.sin(d).

Usage:

sin(1.0)

Description:
Calls Math.cos(d).

Usage:

cos(1.0)

Description:
Calls Math.tanh(d).

Usage:

tanh(1.0)

Description:
Calls Math.sinh(d).

Usage:

sinh(1.0)

Description:
Calls Math.cosh(d).

Usage:

cosh(1.0)

Description:
Calls Math.atan2(d1, d2).

Usage:

atan2(1.0, 1.0)

Description:
Calls Math.toDegrees(d).

Usage:

degrees(1.0)

Description:
Calls Math.toRadians(d).

Usage:

radians(1.0)

Description:
Calls Math.exp(d).

Usage:

exp(1.0)

Description:
Calls Math.expm1(d).

Usage:

expm1(1.0)

Description:
Calls Math.log(d).

Usage:

log(1.0)

Description:
Calls Math.log10(d).

Usage:

log10(1.0)

Description:
Returns the square of x.

Usage:

// Result: 4
square(2)

Description:
Calls Math.log1p(d).

Usage:

log1p(1.0)

Description:
Returns PI constant.

Usage:

// Result: 3.14159265359
pi

Description:
Returns Euler constant.

Usage:

// Result: 0.5772156649
euler

Description:
Returns maximum value for given list. Throws runtime exception if the list is empty. This function uses a natural ordering.

Usage:

// Result: 3
max(list(1, 2, 3))

Description:
Returns minimum value for given list. Throws runtime exception if the list is empty. This function uses a natural ordering.

Usage:

// Result: 1
min(list(1, 2, 3))

Description:
Returns the average (mean) value of the given list of ints, doubles or strings. Throws a runtime exception if the list is empty. If the list contains strings, they have to be convertible to int or double.

Usage:

// Result: 2.0
avg(list(1, 2, 3))
avg(list("1.0", 2, "3"))

Description:
Returns the standard deviation value of the given list of ints, doubles or strings. Throws a runtime exception if the list is empty. If the list contains strings, they have to be convertible to int or double.

Usage:

stdev(list(1, 2, 3))
stdev(list("1.0", 2, "3"))

Description:
Calls Math.pow(d1, d2).

Usage:

// Result: 4.0
pow(2.0, 2.0)

Description:
Calls Math.hypot(d1, d2).

Usage:

hypot(2.0, 2.0)

Description:
Returns new list with given parameters.

Usage:

// Result: []
list
// Result: [1, 2, 3]
list(1, 2, 3)
// Result: ["1", true, 1.25]
list("1", 2 == 2, to_double('1.25'))

Description:
Gets an element from either list or map c. For a list, k must be a 0-based integer index into the list.

Usage:

// Result: 1
get(list(1, 2, 3), 0)
// Result: true
get(json('{"a": true}'), "a")

Description:
Calls c.contains(x) and returns its result.

Usage:

// Result: true
has(list("a", "b"), "a")

Description:
Calls c.containsAll(x) and returns its result.

Usage:

// Result: false
has_all(list("a", "b"), list("a", "c"))

Description:
Checks if list c contains any of the elements from the list x.

Usage:

// Result: true
has_any(list("a", "b"), list("a"))

Description:
Returns first element from the list c or null if the list is empty.

Usage:

// Result: "a"
first(list("a", "b"))

Description:
Returns last element from the list c or null if the list is empty.

Usage:

// Result: "b"
last(list("a", "b"))

Description:
Returns list of keys for map m.

Usage:

// Result: ["a", "b"]
keys(json('{"a": true, "b": 1}'))

Description:
Returns list of values for map m.

Usage:

// Result: [true, 1]
values(json('{"a": true, "b": 1}'))

Description:
Returns the reversed list x.

Usage:

// Result: [3, 2, 1]
reverse(list(1, 2, 3))

Description:
Returns sorted list x. This function uses the natural sorting order.

Usage:

// Result: [1, 2, 3]
sort(list(2, 1, 3))

Description:
Checks if given string, list or map x is empty.

Usage:

// Result: false
is_empty("text")
is_empty(list(1))
is_empty(json('{"a": 1}'))

Description:
Checks if given string, list or map x is non empty.

Usage:

// Result: true
non_empty("text")
non_empty(list(1))
non_empty(json('{"a": 1}'))

Description:
Returns list x with duplicate elements removed.

Usage:

// Result: [1, 2, 3]
distinct(list(1, 2, 2, 3, 1))

Description:
Concatenates lists x1 and x2.

Usage:

// Result: [1, 2, 3, 4]
concat(list(1, 2), list(3, 4))

Description:
Finds a part token with alias or ID a, if any, and returns its metadata property p. If the part token cannot be found, a runtime exception is thrown. Returns null if the metadata property does not exist.

Usage:

// Result: 'prop' property of 'alias' part token of the current token.
meta_part('alias', 'prop')

Description:
Gets token metadata property p. See token metadata for more information.

Usage:

// Result: 'nlpcraft:num:unit' token metadata property.
meta_tok('nlpcraft:num:unit')

Description:
Gets model metadata property p.

Usage:

// Result: 'my:prop' model metadata property.
meta_model('my:prop')

Description:
Gets user REST call data property p. See REST API for more details on ask and ask/sync REST calls.

Usage:

// Result: 'my:prop' user request data property.
meta_req('my:prop')

Description:
Gets user metadata property p.

Usage:

// Result: 'my:prop' user metadata property.
meta_user('my:prop')

Description:
Gets company metadata property p.

Usage:

// Result: 'my:prop' company metadata property.
meta_company('my:prop')

Description:
Gets intent metadata property p.

Usage:

// Result: 'my:prop' intent metadata property.
meta_intent('my:prop')

Description:
Gets conversation metadata property p.

Usage:

// Result: 'my:prop' conversation metadata property.
meta_conv('my:prop')

Description:
Gets fragment metadata property p. Fragment metadata can be optionally passed in when referencing the fragment to parameterize it.

Usage:

// Result: 'my:prop' fragment metadata property.
meta_frag('my:prop')

Description:
Gets system property or environment variable p.

Usage:

// Result: 'java.home' system property.
meta_sys('java.home')
// Result: 'HOME' environment variable.
meta_sys('HOME')

Description:
Returns current year.

Usage:

// Result: 2021
year

Description:
Returns current month: 1 ... 12.

Usage:

// Result: 5
month

Description:
Returns current day of the month: 1 ... 31.

Usage:

// Result: 5
day_of_month

Description:
Returns current day of the week: 1 ... 7.

Usage:

// Result: 5
day_of_week

Description:
Returns current day of the year: 1 ... 365.

Usage:

// Result: 51
day_of_year

Description:
Returns current hour: 0 ... 23.

Usage:

// Result: 11
hour

Description:
Returns current minute: 0 ... 59.

Usage:

// Result: 11
minute

Description:
Returns current second: 0 ... 59.

Usage:

// Result: 11
second

Description:
Returns current week of the month: 1 ... 4.

Usage:

// Result: 2
week_of_month

Description:
Returns current week of the year: 1 ... 56.

Usage:

// Result: 21
week_of_year

Description:
Returns current quarter: 1 ... 4.

Usage:

// Result: 2
quarter

Description:
Returns current time in milliseconds.

Usage:

// Result: 122312341212
now

Description:
Returns server request ID.

Usage:

// Result: server request ID.
req_id

Description:
Returns request normalized text.

Usage:

// Result: request normalized text.
req_normtext

Description:
Gets UTC/GMT timestamp in ms when user input was received.

Usage:

// Result: input receive timestamp in ms.
req_tstamp

Description:
Gets remote client address that made the original REST call. Returns null if remote client address is not available.

Usage:

// Result: remote client address or 'null'.
req_addr

Description:
Gets remote client agent that made the original REST call. Returns null if remote client agent is not available.

Usage:

// Result: remote client agent or 'null'.
req_agent

Description:
Returns user ID.

Usage:

// Result: user ID.
user_id

Description:
Returns user first name.

Usage:

// Result: user first name.
user_fname

Description:
Returns user last name.

Usage:

// Result: user last name.
user_lname

Description:
Returns user email.

Usage:

// Result: user email.
user_email

Description:
Returns user admin flag.

Usage:

// Result: user admin flag.
user_admin

Description:
Returns user signup timestamp.

Usage:

// Result: user signup timestamp in milliseconds.
user_signup_tstamp

Description:
Returns company ID.

Usage:

// Result: company ID.
comp_id

Description:
Returns company name.

Usage:

// Result: company name.
comp_name

Description:
Returns company website.

Usage:

// Result: company website.
comp_website

Description:
Returns company country.

Usage:

// Result: company country.
comp_country

Description:
Returns company region.

Usage:

// Result: company region.
comp_region

Description:
Returns company city.

Usage:

// Result: company city.
comp_city

Description:
Returns company address.

Usage:

// Result: company address.
comp_addr

Description:
Returns company postal code.

Usage:

// Result: company postal code.
comp_postcode

Description:
This function provides an 'if-then-else' equivalent since IDL does not provide branching at the language level. This function evaluates the c parameter and returns the then value if c evaluates to true, or the else value if it evaluates to false. Note that the evaluation is short-circuit, i.e. either then or else will actually be computed, but not both.

Usage:

// Result:
//  - 'list(1, 2, 3)' if 1st parameter is 'true'.
//  - 'null' if 1st parameter is 'false'.
if(meta_model('my_prop') == true, list(1, 2, 3), null)

Description:
Converts JSON in p parameter to a map. Use single quoted string to avoid escaping double quotes in JSON.

Usage:

// Result: Map.
json('{"a": 2, "b": [1, 2, 3]}')

Description:
Converts p parameter to a string. In case of a list this function will convert individual list elements to string and return the list of strings.

Usage:

// Result: "1.25"
to_string(1.25)
// Result: list("1", "2", "3")
to_string(list(1, 2, 3))

Description:
Returns p if it is not null, a otherwise. Note that evaluation will be short-circuit, i.e. a will be evaluated only if p is null.

Usage:

// Result: 'some_prop' model metadata or 'text' if one is 'null'.
@dflt = 'text'
or_else(meta_model('some_prop'), @dflt)

IDL Location

IDL declarations can be placed in different locations based on user preferences:

  • @NCIntent annotation takes a string parameter that must be a valid IDL declaration. For example, this Scala code snippet:

                    @NCIntent("import('/opt/myproj/global_fragments.idl')") // Importing.
                    @NCIntent("intent=act term(act)={has(tok_groups, 'act')} fragment(f1)") // Defining in place.
                    def onMatch(
                        @NCIntentTerm("act") actTok: NCToken,
                        @NCIntentTerm("loc") locToks: List[NCToken]
                    ): NCResult = {
                        ...
                    }
                
  • External JSON/YAML data model configuration can provide one or more IDL declarations in the intents field. For example:

                    {
                        "id": "nlpcraft.alarm.ex",
                        "name": "Alarm Example Model",
                        .
                        .
                        .
                        "intents": [
                            "import('/opt/myproj/global_fragments.idl')", // Importing.
                            "import('/opt/myproj/my_intents.idl')", // Importing.
                            "intent=alarm term~{#=='x:alarm'}" // Defining in place.
                        ]
                    }
                
  • External *.idl files contain IDL declarations and can be imported anywhere IDL declarations are allowed. See the import() statement explanation below. For example:
                        /*
                         * File 'my_intents.idl'.
                         * ======================
                         */
    
                        import('/opt/globals.idl') // Import global intents and fragments.
    
                        // Fragments.
                        // ----------
                        fragment=buzz term~{# == 'x:alarm'}
                        fragment=when
                            term(nums)~{
                                // Term variables.
                                @type = meta_tok('nlpcraft:num:unittype')
                                @iseq = meta_tok('nlpcraft:num:isequalcondition')
    
                                # == 'nlpcraft:num' && @type != 'datetime' && @iseq == true
                            }[0,7]
    
                        // Intents.
                        // --------
                        intent=alarm
                            fragment(buzz)
                            fragment(when)
                    

Binding Intent

IDL intents must be bound to their callback methods. This binding is accomplished using the following Java annotations:

@NCIntent
Target: callback method or model class
When applied to a method, this annotation defines an IDL intent in-place, with the method serving as its callback. It can also be applied to the model's class, in which case it only declares the intent without binding it; a callback method then needs the @NCIntentRef annotation to bind to the declared intent. Note that multiple intents can be bound to the same callback method, but only one callback method can be bound to a given intent.

This approach is ideal for simple intents and quick declarations right in the source code, and it has all the benefits of keeping IDL as part of the source code. However, a multi-line IDL declaration can be awkward to add and maintain depending on the JVM language's support for multi-line string literals. In such cases it is advisable to move IDL declarations into a separate *.idl file (or files) and import them either in the JSON/YAML model or at the model class level.

@NCIntentRef
Target: callback method
This annotation references an intent defined elsewhere, such as an external JSON or YAML model definition, a *.idl file, or another @NCIntent annotation. In real applications, this is the most common way to bind an externally defined intent to its callback method.

@NCIntentTerm
Target: callback method parameter
This annotation marks a formal callback method parameter to receive a term's tokens when the intent to which the term belongs is selected as the best match.

@NCIntentSample
Target: callback method
Provides one or more samples of the input that the associated intent should match. Although this annotation is optional, it is highly recommended to provide at least several samples per intent. There is no upper limit on how many samples can be provided, and typically the more samples the better for the built-in tools. These samples serve a documentation purpose and are also used by the built-in model auto-validation and synonym suggestion tools.

@NCIntentSampleRef
Target: callback method
Loads samples of the input that the associated intent should match from external sources such as a local file, a classpath resource, or a URL. Although this annotation is optional, it is highly recommended to provide at least several samples per intent. There is no upper limit on how many samples can be provided, and typically the more samples the better for the built-in tools. These samples serve a documentation purpose and are also used by the built-in model auto-validation and synonym suggestion tools.

@NCModelAddClasses
Target: model class
Allows adding specified classes to the list of classes that NLPCraft will scan when searching for intent callbacks. By default, only the model class itself and its ancestors are scanned. Using this annotation, larger models can be modularized and split into different compilation units. See also @NCModelAddPackage.

@NCModelAddPackage
Target: model class
Allows adding specified packages to the list of packages that NLPCraft will scan when searching for intent callbacks. All classes in each package will be scanned recursively when searching for annotated intent callbacks. By default, only the model class itself and its ancestors are scanned. Using this annotation, larger models can be modularized and split into different compilation units. See also @NCModelAddClasses.
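
For instance, here's a minimal sketch of declaring an intent at the class level and binding it to its callback via @NCIntentRef; the x:light element ID, the model class, and its YAML configuration file are hypothetical:

            import org.apache.nlpcraft.model.*;

            // Intent is only declared here - it is bound below via @NCIntentRef.
            @NCIntent("intent=light term(sw)~{# == 'x:light'}")
            public class LightModel extends NCModelFileAdapter {
                public LightModel() {
                    super("light_model.yaml"); // Hypothetical YAML model definition.
                }

                @NCIntentRef("light")
                NCResult onLight(@NCIntentTerm("sw") NCToken swTok) {
                    return NCResult.text("Lights toggled.");
                }
            }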

Here are a couple of examples illustrating the basics of intent declaration and usage.

An intent from Light Switch Scala example:

            @NCIntent("intent=act term(act)={groups @@ 'act'} term(loc)={trim(id) == 'ls:loc'}*")
            @NCIntentSample(Array(
                "Turn the lights off in the entire house.",
                "Switch on the illumination in the master bedroom closet.",
                "Get the lights on.",
                "Please, put the light out in the upstairs bedroom.",
                "Set the lights on in the entire house.",
                "Turn the lights off in the guest bedroom.",
                "Could you please switch off all the lights?",
                "Dial off illumination on the 2nd floor.",
                "Please, no lights!",
                "Kill off all the lights now!",
                "No lights in the bedroom, please."
            ))
            def onMatch(
                @NCIntentTerm("act") actTok: NCToken,
                @NCIntentTerm("loc") locToks: List[NCToken]
            ): NCResult = {
                ...
            }
        

NOTES:

  • The intent is defined in-place using @NCIntent annotation.
  • A term match is defined as one or more tokens. A term is optional if its min quantifier is zero.
  • The intent act has two non-conversational terms, one mandatory and another matching zero or more tokens, with method onMatch(...) as its callback.
  • A term is conversational if it uses the '~' symbol and non-conversational if it uses the '=' symbol in its definition. If a term is conversational, the matching algorithm will look into the conversation context short-term memory (STM) to find matching tokens for this term. Note that terms fully or partially matched using tokens from the conversation context contribute a smaller weight to the overall intent matching weight since these terms are less specific. Non-conversational terms are matched using only tokens found in the current user input, without looking at the conversation context.
  • Method onMatch(...) will be called if and when this intent is selected as the best match.
  • Note that terms have min=1, max=1 quantifiers by default, i.e. one and only one.
  • The first term matches any single token that belongs to the group act. Note that model elements can belong to multiple groups.
  • The second term matches zero or more tokens with ID ls:loc. Note that we use the function trim on the token ID.
  • Note that both terms have IDs (act and loc) that are used in the onMatch(...) method parameters to automatically assign the terms' tokens to the formal method parameters via @NCIntentTerm annotations.

In the following Alarm Clock Java example the intent is defined in the JSON model definition and referenced in the Java code using the @NCIntentRef annotation:

            {
                "id": "nlpcraft.alarm.ex",
                "name": "Alarm Example Model",
                "version": "1.0",
                "enabledBuiltInTokens": [
                    "nlpcraft:num"
                ],
                "elements": [
                    {
                        "id": "x:alarm",
                        "description": "Alarm token indicator.",
                        "synonyms": [
                            "{ping|buzz|wake|call|hit} {me|up|me up|_}",
                            "{set|_} {my|_} {wake|wake up|_} {alarm|timer|clock|buzzer|call} {up|_}"
                        ]
                    }
                ],
                "intents": [
                    "intent=alarm term~{# == 'x:alarm'} term(nums)~{# == 'nlpcraft:num' && meta_tok('nlpcraft:num:unittype') == 'datetime' && meta_tok('nlpcraft:num:isequalcondition') == true}[0,7]"
                ]
            }
        
            @NCIntentRef("alarm")
            @NCIntentSample({
                "Ping me in 3 minutes",
                "Buzz me in an hour and 15mins",
                "Set my alarm for 30s"
            })
            private NCResult onMatch(
               NCIntentMatch ctx,
               @NCIntentTerm("nums") List<NCToken> numToks
            ) {
               ...
            }
        

NOTES:

  • The intent is defined in the external JSON model declaration (see the intents field in the JSON file).
  • This intent is referenced by the annotation @NCIntentRef("alarm") with method onMatch(...) as its callback.
  • This example defines an intent with two conversational terms, both of which have to be found for the intent to match.
  • A term is conversational if it uses the '~' symbol and non-conversational if it uses the '=' symbol in its definition. If a term is conversational, the matching algorithm will look into the conversation context short-term memory (STM) to find matching tokens for this term. Note that terms fully or partially matched using tokens from the conversation context contribute a smaller weight to the overall intent matching weight since these terms are less specific. Non-conversational terms are matched using only tokens found in the current user input, without looking at the conversation context.
  • Method onMatch(...) will be called when this intent is detected as the best match.
  • Note that terms have min=1, max=1 quantifiers by default.
  • The first term is defined as a single mandatory (min=1, max=1) user token with ID x:alarm whose element is defined in the model.
  • The second term matches from zero to seven built-in numeric nlpcraft:num tokens that have a unit type of datetime and represent single numbers (i.e. equality conditions). Note that the Alarm Clock model allows zero tokens in this term, which would mean the current time.
  • Given data model definition above the following sentences will be matched by this intent:
    • Ping me in 3 minutes
    • Buzz me in an hour and 15mins
    • Set my alarm for 30s

Intent Matching Logic

In order to understand the intent matching logic, let's review the overall user request processing workflow:

Fig. 1 User Request Workflow
  • Step: 0

    Server receives REST call /ask or /ask/sync that contains the text of the sentence that needs to be processed.

  • Step: 1

    At this step the server attempts to find additional variations of the input sentence by substituting certain words in the original text with synonyms from Google's BERT dataset. Note that the server will not use synonyms that are already defined in the model itself - it only tries to compensate for the potential incompleteness of the model. The result of this step is one or more sentences that all have the same meaning as the original text.

  • Step: 2

    At this step the server takes one or more sentences from the previous step and tokenizes them. This process involves converting the text into a sequence of enriched tokens representing named entities. This step also performs the initial server-side enrichment and detection of the built-in named entities.

    The result of this step is a sequence of converted sentences, where each element is a sequence of tokens. These sequences are sent down to the data probe that has the requested data model deployed.

  • Step: 3

    This is the first step of the probe-side processing. At this point the data probe receives one or more sequences of tokens. The probe then takes each sequence and performs the final enrichment by detecting user-defined elements in addition to the built-in tokens that were detected on the server during step 2 above.

  • Step: 4

    This is an important step for understanding the intent matching logic. At this step the data probe takes the sequences of tokens generated at the previous step and comes up with one or more parsing variants. A parsing variant is a sequence of tokens that is free from token overlapping and other parsing ambiguities. A single sequence of tokens always produces at least one parsing variant and may produce several.

    Let's consider the input text 'A B C D' and the following elements defined in our model:

                    "elements": [
                        {
                            "id": "elm1",
                            "synonyms": ["A B"]
                        },
                        {
                            "id": "elm2",
                            "synonyms": ["B C"]
                        },
                        {
                            "id": "elm3",
                            "synonyms": ["D"]
                        }
                    ],
                    

    All of these elements will be detected but since two of them are overlapping (elm1 and elm2) there should be two parsing variants at the output of this step:

    1. elm1('A', 'B') freeword('C') elm3('D')
    2. freeword('A') elm2('B', 'C') elm3('D')

    Note that at this point the system cannot determine which of these variants is the best one for matching - there's simply not enough information at this stage. It can only be determined when each variant is matched against the model's intents - which happens in the next step.

  • Step: 5

    At this step the actual matching between intents and variants happens. Each parsing variant from the previous step is matched against each intent. Each matching pair of a variant and an intent produces a match with a certain weight. If there are no matches at all - an error is returned. If matches were found, the match with the biggest weight is selected as the winning match. If multiple matches have the same weight, their respective variants' weights are used to further sort them. Finally, the intent's callback from the winning match is called.

    Although the exact weight calculation algorithm is too complex to detail here, the following general guidelines determine the weight of a match between a parsing variant and an intent (see the illustration after this list). Note that these rules coalesce around the principal idea that the more specific match always wins:

    • A match that captures more tokens has more weight than a match with fewer tokens. As a corollary, a match with fewer free words (i.e. unused words) has a bigger weight than a match with more free words.
    • Tokens for user-defined elements are more important than built-in tokens.
    • A more specific match has a bigger weight. In other words, a match that uses a token from the conversation context (i.e. short-term memory) has less weight than a match that uses only tokens from the current request. In the same way, older tokens from the conversation contribute less weight than more recent ones.
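
    For illustration, consider two hypothetical overlapping intents (the x:sale element ID is made up):

                    intent=sale term(s)~{# == 'x:sale'}

                    intent=sale_by_num
                        term(s)~{# == 'x:sale'}
                        term(n)~{# == 'nlpcraft:num'}

    For an input containing both an x:sale token and a number, both intents match. However, sale_by_num captures more tokens and leaves fewer free words, so it wins as the more specific match.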

Intent Callback

Whether the intent is defined directly in @NCIntent annotation or indirectly via @NCIntentRef annotation - it is always bound to a callback method:

  • Callback can only be an instance method on the class implementing NCModel interface.
  • Method must have return type of NCResult.
  • Method can have zero or more parameters:
    • Parameter of type NCIntentMatch, if present, must be first.
    • Any other parameters (other than the first optional NCIntentMatch) must have @NCIntentTerm annotation.
  • Method must support reflection-based invocation.
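
Putting these rules together, here's a minimal sketch of a conforming callback; the intent ID, term ID, and result text are hypothetical:

            @NCIntentRef("alarm")
            private NCResult onMatch(
                NCIntentMatch ctx,                          // Optional, but must be the 1st parameter if present.
                @NCIntentTerm("nums") List<NCToken> numToks // Every other parameter requires @NCIntentTerm.
            ) {
                return NCResult.text("OK");                 // Callback must return NCResult.
            }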

@NCIntentTerm annotation marks a callback parameter to receive a term's tokens. This annotation can only be used on the parameters of callbacks, i.e. methods that are annotated with @NCIntent or @NCIntentRef. @NCIntentTerm takes a term ID as its only mandatory parameter and should be applied to callback method parameters to get the tokens associated with that term (if and when the intent was matched and that callback was invoked).

Depending on the term quantifier the method parameter type can only be one of the following types:

Quantifier     | Java Type                | Scala Type
[1,1]          | NCToken                  | NCToken
[0,1]          | Optional<NCToken>        | Option[NCToken]
[1,∞] or [0,∞] | java.util.List<NCToken>  | List[NCToken]

For example:

            @NCIntent("intent=id term(termId)~{# == 'my_token'}?")
            private NCResult onMatch(
               @NCIntentTerm("termId") Optional<NCToken> myTok
            ) {
               ...
            }
        

NOTES:

  • Conversational term termId has [0,1] quantifier (it's optional).
  • The formal parameter on the callback has a type of Optional<NCToken> because the term's quantifier is [0,1].
  • Note that callback doesn't have an optional NCIntentMatch parameter.

NCRejection and NCIntentSkip Exceptions

There are two exceptions that can be used by the intent callback logic to control the intent matching process.

When the NCRejection exception is thrown by the callback, it indicates that the user input cannot be processed as is. This typically means the user has not provided enough information in the input string to have it processed automatically. In most cases the user's input is either too short or too simple, too long or too complex, missing required context, or unrelated to the requested data model.
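
For example, a callback could reject input that is missing a required detail; the intent, term ID, and message below are hypothetical:

            @NCIntentRef("transfer")
            private NCResult onTransfer(@NCIntentTerm("acct") Optional<NCToken> acctTok) {
                if (!acctTok.isPresent())
                    // Not enough information - ask the user to rephrase.
                    throw new NCRejection("Please specify the account to transfer from.");

                return NCResult.text("Transfer initiated.");
            }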

NCIntentSkip is a control flow exception used to skip the current intent. This exception can be thrown by the intent callback to indicate that the current intent should be skipped (even though it was matched and its callback was called). If more than one intent was matched, the next best matching intent will be selected and its callback will be called.

This exception becomes useful when it is hard or impossible to encode the entire matching logic using only declarative IDL. In these cases the intent definition can be relaxed and the "last mile" of intent matching can happen inside the intent callback's user logic. If it is determined that the intent in fact does not match, throwing this exception allows the next best matching intent, if any, to be tried.
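
Here's a sketch of such "last mile" logic; the intent, term ID, and the isSupportedCity(...) helper are hypothetical:

            @NCIntentRef("forecast")
            private NCResult onForecast(@NCIntentTerm("city") NCToken cityTok) {
                // A condition that cannot be expressed in declarative IDL.
                if (!isSupportedCity(cityTok.getOriginalText()))
                    throw new NCIntentSkip(); // Try the next best matching intent, if any.

                return NCResult.text("Forecast for " + cityTok.getOriginalText());
            }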

Note that there's a significant difference between the NCIntentSkip exception and the model's onMatchedIntent(...) callback. Unlike that callback, the exception does not force re-matching of all intents; it simply picks the next best intent from the list of already matched ones. The model's callback can force a full re-evaluation of all intents against the user input.

IDL Expressiveness

Note that using the NCIntentSkip exception (as well as the model's life-cycle callbacks) is the required technique when you cannot express the desired matching logic with IDL alone. IDL is a high-level declarative language: it does not support complex programmable logic or other sophisticated matching algorithms. In such cases you can define an intent that matches broadly and then implement the rest of the more complex matching logic in the callback, using the NCIntentSkip exception to indicate when the intent in fact does not match and other intents, if any, have to be tried.

There are many use cases where IDL is not expressive enough. For example, if your intent matching depends on financial market conditions, weather, the state of external systems, or details of the current user's geographical location or social network status - you will need to use NCIntentSkip-based logic or the model's callbacks to support that type of matching.

NCIntentMatch Interface

NCIntentMatch interface can be passed into the intent callback as its first parameter. This interface provides runtime information about the intent that was matched (i.e. the intent with which the callback was annotated). Note also that the intent match context can only be the first parameter in the callback; if not declared as such, it won't be passed in.
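
For example, a callback can inspect the match context; treat the getIntentId() accessor below as an assumption based on the NCIntentMatch Javadoc:

            @NCIntentRef("alarm")
            private NCResult onMatch(NCIntentMatch ctx) {
                // ID of the intent that was matched (assumed accessor).
                return NCResult.text("Matched intent: " + ctx.getIntentId());
            }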

Model Callbacks

NCModel interface provides several callbacks that are invoked before, during, and after intent matching. They provide an opportunity to inject user-defined cross-cutting concerns into the standard intent matching workflow of NLPCraft. Usage of these callbacks is completely optional, yet they provide convenient join points for logging, statistics collection, security audit and validation, explicit conversation context management, model metadata updates, and many other aspects that may depend on the standard intent matching workflow:

NCModel#onParsedVariant(...)

A callback to accept or reject a parsed variant. This callback is called before any other callbacks at the beginning of the processing pipeline, and it is called for each parsed variant. Note that a given user input can have one or more possible parsing variants. Depending on the model configuration, a user input can produce hundreds or even thousands of parsing variants that can significantly slow down the overall processing. This method allows filtering out unnecessary parsing variants based on a variety of user-defined factors such as the number of tokens, the presence of a particular token in the variant, etc.
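
For instance, a model could cap the size of the variants it is willing to process. This sketch assumes NCVariant exposes its tokens with list-style access; the limit of 25 is arbitrary:

            @Override
            public boolean onParsedVariant(NCVariant var) {
                // Reject overly long variants to keep intent matching fast
                // (assumes NCVariant provides list-style access to its tokens).
                return var.size() <= 25;
            }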

NCModel#onContext(...)

A callback that is called when a fully assembled query context is ready. This callback is called after all onParsedVariant(...) callbacks but before any onMatchedIntent(...) callbacks, i.e. right before the intent matching is performed. It is always called exactly once per user request. Typical use cases for this callback are logging, debugging, statistics or usage collection, explicit update or initialization of the conversation context, security audit or validation, etc.

NCModel#onMatchedIntent(...)

A callback that is called when an intent was successfully matched but right before its callback is called. This callback is called after onContext(...) and may be called multiple times depending on its return value. If true is returned, the default workflow will continue and the matched intent's callback will be called. However, if false is returned, the entire existing set of parsing variants will be re-matched against all declared intents again. Returning false allows this method to alter the state of the model (like soft-resetting the conversation or changing metadata) and force the full re-evaluation of the parsing variants against all declared intents. Note that user logic should be careful not to induce an infinite loop with this behavior.

Note that this callback may not be called at all based on the return value of the onContext(...) callback. Typical use cases for this callback are logging, debugging, statistics or usage collection, explicit update or initialization of the conversation context, security audit or validation, etc. This callback is especially useful for a soft reset of the conversation context when the condition for such a reset can only be derived from within the intent callback.

NCModel#onResult(...)

A callback that is called when a successful result is obtained from the intent callback, right before sending it back to the caller. This callback is called after onMatchedIntent(...). Note that this callback may not be called at all, and if called - it is called only once. Typical use cases for this callback are logging, debugging, statistics or usage collection, explicit update or initialization of the conversation context, security audit or validation, etc.

NCModel#onRejection(...)

A callback that is called when the intent callback throws the NCRejection exception. This callback is called after onMatchedIntent(...). Note that this callback may not be called at all, and if called - it is called only once. Typical use cases for this callback are logging, debugging, statistics or usage collection, explicit update or initialization of the conversation context, security audit or validation, etc.

NCModel#onError(...)

A callback that is called when the intent callback fails with an unexpected exception. Note that this callback may not be called at all, and if called - it is called only once. Typical use cases for this callback are logging, debugging, statistics or usage collection, explicit update or initialization of the conversation context, security audit or validation, etc.
