Apache NLPCraft - Natural Language Interface

Overview

NCModel processing logic is defined as a pipeline and the collection of one or more intents to be matched on. The sections below explain what intent is, how to define it in your model, and how it works.

Intent

The goal of the data model implementation is to take the user input text, pass it through processing pipeline and match the resulting variants to a specific user-defined code that will execute for that input. The mechanism that provides this matching is called an intent.

The intent generally refers to the goal that the end-user had in mind when speaking or typing the input utterance. The intent has a declarative part or template written in IDL - Intent Definition Language that strictly defines a particular form the user input. Intent is also bound to a callback method that will be executed when that intent, i.e. its template, is detected as the best match for a given input. A typical data model will have multiple intents defined for each form of the expected user input that model wants to react to.

For example, a data model for banking chatbot or analytics application can have multiple intents for each domain-specific group of input such as opening an account, closing an account, transferring money, getting statements, etc.

Intents can be specific or generic in terms of what input they match. Multiple intents can overlap and NLPCraft will disambiguate such cases to select the intent with the overall best match. In general, the most specific intent match wins.

IDL Syntax

NLPCraft intents are written in Intent Definition Language (IDL). IDL is a relatively straightforward declarative language. For example, here's a simple intent x with two terms a and b:

            /* Intent 'x' definition. */
            intent=x
                term(a)~{# == 'my_elm'} // Term 'a'.
                term(b)={has(ent_groups, "my_group")} // Term 'b'.

IDL intent defines a match between the parsed user input represented as the collection of entities, and the user-define callback method. IDL intents are bound to their callbacks via Java annotation and can be located in the same Java annotations or in external *.idl files.

You can review the formal ANTLR4 grammar for IDL, but here are the general properties of IDL:

IDL has context-free grammar. In simpler terms, all whitespaces outside of string literals are ignored.
IDL supports Java-style comments, both single line // Comment. as well as multi-line /* Comment. */.
String literals can use either single quotes ('text') or double quotes ("text") simplifying IDL usage in Scala - you don't have to escape double quotes. Both quotes can be escaped in string, i.e. "text with \" quote" or 'text with \' quote'
Built-in literals true, false and null for boolean and null values.
Algebraic and logical expression including operator precedence follow standard Java language conventions.
Both integer and real numeric literals can use underscore '_' character for separation as in 200_000.
Numeric literals use Java string conversions.
IDL has only 10 reserved keywords: flow fragment import intent meta options term true false null
Identifiers and literals can use the same Unicode space as Java.
IDL provides over 100 built-in functions to aid in intent matching. IDL functions are pure immutable mathematical functions that work on a runtime stack. In other words, they look like Python functions: IDL length(trim(" text ")) vs. OOP-style " text ".trim().length().
IDL is a lazily evaluated language, i.e. expressions are evaluated only when required during runtime. That means that evaluated left-to-right logical AND and OR operators, for example, skip their right-part expressions if the left expression result is determinative for the overall result - so-called short-circuit evaluation. Some IDL functions like if and or_else also provide the similar short-circuit evaluation.

IDL program consists of intent, fragment, or import statements in any order or combination:

intent statement

Intent is defined as one or more terms. Each term is a predicate over a instance of NCEntity trait. For an intent to match all of its terms have to evaluate to true. Intent definition can be informally explained using the following full-feature example:

                    intent=xa
                        flow="^(?:login)(^:logout)*$"
                        meta={'enabled': true}
                        term(a)={month >= 6 && !# != "z" && meta_intent('enabled') == true}[1,3]
                        term(b)~{
                            @usrTypes = meta_req('user_types')

                            (# == 'order' || # == 'order_cancel') && has_all(@usrTypes, list(1, 2, 3))
                        }

                    intent=xb
                        options={
                            'ordered': false,
                            'unused_free_words': true,
                            'unused_entities': false,
                            'allow_stm_only': false
                        }
                        term(a)={length("some text") > 0}
                        fragment(frag, {'p1': 25, 'p2': {'a': false}})

NOTES:

intent=xa ^{line 1} intent=xb ^{line 11}

options={...} ^{line 12}

Optional. Matching options specified as JSON object. Entire JSON object as well as each individual JSON field is optional. Allows to customize the matching algorithm for this intent:

Option	Type	Description	Default Value
`ordered`	`Boolean`	Whether or not this intent is ordered. For ordered intent the specified order of terms is important for matching this intent. If intent is unordered its terms can be found in any order in the input text. Note that ordered intent significantly limits the user input it can match. In most cases the ordered intent is only applicable to processing of a formal grammar (like a programming language) and mostly unsuitable for the natural language processing. Note that while the `ordered` flag affect entire intent and all its terms, you can define the individual term that depends on the position of the entity. This, in fact, allows you have a subset of terms that order dependant. See the following IDL functions for details: `ent_index()` `ent_all()` `ent_count()` `ent_is_last()` `ent_is_first()` `ent_is_before_type()` `ent_is_before_group()` `ent_is_between_types()` `ent_is_between_groups()` `ent_is_after_type()` `ent_is_after_group()`	`false`
`unused_free_words`	`Boolean`	Whether or not free words - that are unused by intent matching - should be ignored (value `true`) or reject the intent match (value `false`). Free words are the words in the user input that were not recognized as any entity. Typically, for the natural language comprehension it is safe to ignore free words. For the formal grammar, however, this could make the matching logic too loose.	`true`
`unused_entities`	`Boolean`	Whether or not unused entities should be ignored (value `true`) or reject the intent match (value `false`). By default, tne unused entities are not ignored since it is assumed that user would define pipeline entity parser on purpose and construct the intent logic appropriate.	`false`
`allow_stm_only`	`Boolean`	Whether or not the intent can match when all of the matching entities came from STM. By default, this special case is disabled (value `false`). However, in specific intents designed for free-form language comprehension scenario, like, for example, SMS messaging - you may want to enable this option.	`false`

flow="^(?:login)(^:logout)*$" ^{line 2}

Optional. Dialog flow is a history of previously matched intents to match on. If provided, the intent will first match on the history of the previously matched intents before processing its terms by using regular expressions. Dialog flow specification is a string with the standard Java regular expression. The history of previously matched intents is presented as a space separated string of intent IDs that were selected as the best match during the current conversation, in the chronological order with the most recent matched intent ID being the first element in the string. Dialog flow regular expression will be matched against that string representing intent IDs.

In the line 2, the ^(?:login)(^:logout)*$ dialog flow regular expression defines that intent should only match when the immediate previous intent was login and no logout intents are in the history. If the history is "login order order" - this intent will match. However, for "login logout" or "order login" history this dialog flow will not match.

Note that if dialog flow is defined and it doesn't match the history the terms of the intent won't be tested at all.

meta={'enabled': true} ^{line 3}

Optional. Just like the most of the components in NLPCraft, the intent can have its own metadata. Intent metadata is defined as a standard JSON object which will be converted into java.util.Map instance and can be accessed in intent's terms via meta_intent() IDL function. The typical use case for declarative intent metadata is to parameterize its behavior, i.e. the behavior of its terms, with a clearly defined properties that are provided inside intent definition itself.

term(a)={month >= 6 && !# != "z" && meta_intent('enabled') == true}[1,3] ^{line 4} term(b)~{ ^{line 5} @usrTypes = meta_req('user_types') (# == 'order' || # == 'order_cancel') && has_all(@usrTypes, list(1, 2, 3)) } term(a)={length("some text") > 0} ^{line 18}

Term is a building block of the intent. Intent must have at least one term. Term has optional ID, an entity predicate and optional quantifiers. It supports conversation context if it uses '~' symbol or not if it uses '=' symbol in its definition. For the conversational term the system will search for a match using entities from the current request as well as the entities from conversation STM (short-term-memory). For a non-conversational term - only entities from the current request will be considered.

A term is matched if its entity predicate returns true. The matched term represents one or more entities, sequential or not, that were detected in the user input. Intent has a list of terms (always at least one) that all have to be matched in the user input for the intent to match. Note that term can be optional if its min quantifier is zero. Whether the order of the terms is important for matching is governed by intent's ordered parameter.

Term ID (a and b) is optional. It is only required by @NCIntentTerm annotation to link term's entities to a formal parameter of the callback method. Note that term ID follows the same lexical rules as intent ID.

Inside of curly brackets { } you can have an optional list of term variables and the mandatory term expression that must evaluate to a boolean value. Term variable name must start with @ symbol and be unique within the scope of the current term. All term variables must be defined and initialized before term expression which must be the last statement in the term:

                                    term(b)~{
                                        @a = meta_req('a')
                                        @lst = list(1, 2, 3, 4)

                                        has_all(@lst, list(@a, 2))
                                    }

Term variable initialization expression as well as term's expression follow Java-like expression grammar including precedence rules, brackets and logical combinators, as well as built-in IDL functions calls:

                                    term={true} // Special case of 'constant' term.
                                    term={
                                        // Variable declarations.
                                        @a = round(1.25)
                                        @b = meta_req('my_prop')

                                        // Last expression must evaluate to boolean.
                                        (@a + 2) * @b > 0
                                    }
                                    term={
                                        // Variable declarations.
                                        @c = meta_ent('prop')
                                        @lst = list(1, 2, 3)

                                        // Last expression must evaluate to boolean.
                                        abs(@c) > 1 && size(@lst) != 5
                                    }

NOTE: while term variable initialization expressions can have any type - the term's expression itself, i.e. the last expression in the term's body, must evaluate to a boolean result only. Failure to do so will result in a runtime exception during intent evaluation. Note also that such errors cannot be detected during intent compilation phase.

? and [1,3] define an inclusive quantifier for that term, i.e. how many times the match for this term should found. You can use the following quick abbreviations:

* is equal to [0,∞]
+ is equal to [1,∞]
? is equal to [0,1]
No quantifier defaults to [1,1]

As mentioned above the quantifier is inclusive, i.e. the [1,3] means that the term should appear once, two times or three times.

fragment(frag, {'p1': 25, 'p2': {'a': false}}) ^{line 19}

Fragment reference allows to insert the terms defined by that fragment in place of this fragment reference. Fragment reference has mandatory fragment ID parameter and optional JSON second parameter. Optional JSON parameter allows to parameterize the inserted terms' behavior and it is available to the terms via meta_frag() IDL function.

fragment statement

Fragments allow to group and name a set of reusable terms. Such groups can be further parameterized at the place of reference and enable the reuse of one or more terms by multiple intents. For example:

                    // Fragments.
                    fragment=buzz term~{# == meta_frag('id')}
                    fragment=when
                        term(nums)~{
                            // Term variable.
                            @type = meta_ent('num:unittype')
                            @iseq = meta_ent('num:isequalcondition')

                            # == 'num' && @type == 'datetime' && @iseq == true
                        }[0,7]

                    // Intents.
                    intent=alarm
                        // Insert parameterized terms from fragment 'buzz'.
                        fragment(buzz, {"id": "x:alarm"})

                        // Insert terms from fragment 'when'.
                        fragment(when)

NOTES:

Fragment statements (line 2 and 3) have a name (buzz and when) and a list of terms.
Terms follow the same syntax as in intent definition.
When a fragment is referenced in intent (lines 15 and 18) it is replaced with its terms.

import statement
Import statement allows to import IDL declarations from either local file, classpath resource or URL:
```
                    // Import using absolute path.
                    import('/opt/globals.idl')

                    // Import using classpath resource.
                    import('org/apache/nlpcraft/examples/alarm/intents.idl')

                    // Import using URL.
                    import('ftp://user:password@myhost:22/opt/globals.idl')
                
```
NOTES:
- The effect of importing is the same as if the imported declarations were inserted in place of import statement.
- Recursive and cyclic imports are detected and safely ignored.
- Import statement starts with import keyword and has a string parameter that indicates the location of the resource to import.
- For the classpath resource you don't need to specify leading forward slash.

Intent Lifecycle

During NCModelClient initialization it scans the provided model class for the intents. All found intents are compiled into an internal representation.

Note that not all intent-related problems can be detected at the compilation phase, and NCModelClient can be initialized with intents not being completely validated. For example, each term in the intent must evaluate to a boolean result. This can only be checked at runtime. Another example is the number and the types of parameters passed into IDL function which is only checked at runtime as well.

Intents are compiled only once during the NCModelClient initialization and cannot be re-compiled. Model logic, however, can affect the intent behavior through NCModel callback methods and metadata all of which can change at runtime and are accessible through IDL functions.

Intent Examples

Here's few of intent examples with explanations:

Example 1:

        intent=a
            term~{# == 'x:type'}
            term(nums)~{# == 'num' && lowercase(meta_ent('num:unittype')) == 'datetime'}[0,2]

NOTES:

Intent has ID a.
Intent uses default conversational support (true) and default order (false).
Intent has two conversational terms (~) that have to be found for the intent to match. Note that second term is optional as it has [0,2] quantifier.
Both terms have to be found in the user input for the intent to match.
First term matches any single entity with type x:type.
Second term can appear zero, once or two times and it matches entity with ID num with num:unittype metadata property equal to 'datetime' string.
IDL function lowercase used on num:unittype metadata property value.
Note that since second term has ID (nums) it can be references by @NCIntentTerm annotation by the callback formal parameter.

Example 2:

        intent=id2
            flow='id1 id2'
            term={# == 'myent' && signum(get(meta_ent('score'), 'best')) != -1}
            term={has_any(ent_groups, list('actors', 'owners'))}

NOTES:

Intent has ID id2.
Intent has dialog flow pattern: 'id1 id2'. It expects the sequence of intents id1 and id2 somewhere in the history of previously matched intents in the course of the current conversation.
Intent has two non-conversational terms (=). Both terms have to be present only once (their implicit quantifiers are [1,1]).
Both terms have to be found in the user input for the intent to match.
First term should be a entity with ID myent and have metadata property score of type map. This map should have a value with the string key 'best'. signum of this map value should not equal -1. Note that meta_ent(), get() and signum() are all built-in IDL functions.
Second term should be an entity that belongs to either actors or owners group.

IDL Functions

IDL provides over 100 built-in functions that can be used in IDL intent definitions. IDL function call takes on traditional fun_name(p1, p2, ... pk) syntax form. If function has no parameters, the brackets are optional. IDL function operates on stack - its parameters are taken from the stack and its result is put back onto stack which in turn can become a parameter for the next function call and so on. IDL functions can have zero or more parameters and always have one result value. Some IDL functions support variable number of parameters.

Special Shorthand #

The frequently used IDL function ent_type() has a special shorthand #. For example, the following expressions are all equal:

                ent_type() == 'type'
                ent_type == 'type' // Remember - empty parens are optional.
                # == 'type'

When chaining the function calls IDL uses mathematical notation (a-la Python) rather than object-oriented one: IDL length(trim(" text ")) vs. OOP-style " text ".trim().length().

IDL functions operate with the following types:

JVM Type	IDL Name	Notes
`java.lang.String`	`String`
`java.lang.Long` `java.lang.Integer` `java.lang.Short` `java.lang.Byte`	`Long`	Smaller numerical types will be converted to `java.lang.Long`.
`java.lang.Double` `java.lang.Float`	`Double`	`java.lang.Float` will be converted to `java.lang.Double`.
`java.lang.Boolean`	`Boolean`	You can use `true` or `false` literals.
`java.util.List<T>`	`List[T]`	Use `list(...)` IDL function to create new list.
`java.util.Map<K,V>`	`Map[K,V]`
`NCEntity`	`Entity`
`java.lang.Object`	`Any`	Any of the supported types above. Use `null` literal for null value.

Some IDL functions are polymorphic, i.e. they can accept arguments and return result of multiple types. Encountering unsupported types will result in a runtime error during intent matching. It is especially important to watch out for the types when adding objects to various metadata containers and using that metadata in the IDL expressions.

Unsupported Types

Detection of the unsupported types by IDL functions cannot be done during IDL compilation and can only be done during runtime execution. This means that even though the model compiles IDL intents and NCModelClient starts successfully - it does not guarantee that intents will operate correctly.

All IDL functions are organized into the following groups:

Description:
Returns entity type for the current entity (default) or the provided one by the optional parameter t. Note that this functions has a special shorthand #.

Usage:

// Result: 'true' if the current entity type is equal to 'my_type'.
ent_type == 'my_type'
# == 'my_type'
ent_type(ent_this) == 'my_type'
#(ent_this) == 'my_type'

Description:
Gets the list of groups the current entity (default) or the provided one by the optional parameter t belongs to. Note that, by default, if not specified explicitly, entity always belongs to one group with type equal to entity type. May return an empty list but never a null.

Usage:

// Result: list of groups this entity belongs to.
ent_groups
ent_groups(ent_this)

Description:
Returns current entity.

Usage:

// Result: current entity.
ent_this

Description:
Returns entity's original text. If t is not provided the current entity is assumed.

Usage:

// Result: entity original input text.
ent_text

Description:
Returns entity's index in the original input. Note that this is an index of the entity and not of the character. If t is not provided the current entity is assumed.

Usage:

// Result: 'true' if index of this entity in the original input is equal to 1.
ent_index == 1
ent_index(ent_this) == 1

Description:
Returns true if this entity is the first in the original input. Note that this checks index of the entity and not of the character. If t is not provided the current entity is assumed.

Usage:

// Result: 'true' if this entity is the first entity in the original input.
ent_is_first
ent_is_first(ent_this)

Description:
Returns true if this entity is the last in the original input. Note that this checks index of the entity and not of the character. If t is not provided the current entity is assumed

Usage:

// Result: 'true' if this entity is the last entity in the original input.
ent_is_last
ent_is_last(ent_this)

Description:
Returns true if there is a entity with type type after this entity.

Usage:

// Result: 'true' if there is a entity with type 'a' after this entity.
ent_is_before_type('a')

Description:
Returns true if there is a entity with type type before this entity.

Usage:

// Result: 'true' if there is a entity with type 'a' before this entity.
ent_is_after_type('a')

Description:
Returns true if this entity is located between entities with types type1 and type2.

Usage:

// Result: 'true' if this entity is located after entity with type 'before' and before the entity with type 'after'.
ent_is_between_types('before', 'after')

Description:
Returns true if this entity is located between entities with group IDs grp1 and grp2.

Usage:

// Result: 'true' if this entity is located after entity belonging to the group 'before' and before the entity belonging to the group 'after'.
ent_is_between_groups('before', 'after')

Description:
Returns true if there is a entity that belongs to the group grp after this entity.

Usage:

// Result: 'true' if there is a entity that belongs to the group 'grp' after this entity.
ent_is_before_group('grp')

Description:
Returns true if there is a entity that belongs to the group grp before this entity.

Usage:

// Result: 'true' if there is a entity that belongs to the group 'grp' before this entity.
ent_is_after_group('grp')

Description:
Returns all entities from the original input.

Usage:

// Result: list of all entities for the original input.
ent_all

Description:
Returns number of entities from the original input. It is equivalent to size(ent_all)

Usage:

// Result: number of all entities for the original input.
ent_count

Description:
Returns list of entities from the original input with type type.

Usage:

// Result: list of entities for the original input that have type 'type'.
ent_all_for_type('type')

Description:
Returns list of entities from the original input that belong to the group grp.

Usage:

// Result: list of entities for the original input that belong to th group 'grp'.
ent_all_for_group('grp')

Description:
Returns size or length of the given string, list or map. This function has aliases: size and count.

Usage:

// Result: 9
length("some text")

// Result: 3
@lst = list(1, 2, 3)
size(@lst)
count(@lst)

Description:
Returns true if string s matches Java regular expression rx, false otherwise.

Usage:

regex('textabc', '^text.*$') // Returns 'true'.
regex('_textabc', '^text.*$') // Returns 'false'.

Description:
Calls String.trim() on given parameter p and returns its result. This function has alias: strip

Usage:

// Result: "text"
trim(" text ")
strip(" text ")

Description:
Calls String.toUpperCase() on given parameter p and returns its result.

Usage:

// Result: "TEXT"
uppercase("text")

Description:
Calls String.toLowerCase() on given parameter p and returns its result.

Usage:

// Result: "text"
lowercase("TeXt")

Description:
Calls Apache Commons StringUtils.isAlpha() on given parameter p and returns its result.

Usage:

// Result: true
is_alpha("text")

Description:
Calls Apache Commons StringUtils.isAlphanumeric() on given parameter p and returns its result.

Usage:

// Result: true
is_alphanum("text123")

Description:
Calls Apache Commons StringUtils.isWhitespace() on given parameter p and returns its result.

Usage:

// Result: false
is_whitespace("text123")
// Result: true
is_whitespace("   ")

Description:
Calls Apache Commons StringUtils.isNumeric() on given parameter p and returns its result.

Usage:

// Result: true
is_num("123")

Description:
Calls Apache Commons StringUtils.isNumericSpace() on given parameter p and returns its result.

Usage:

// Result: true
is_numspace("  123")

Description:
Calls Apache Commons StringUtils.isAlphaSpace() on given parameter p and returns its result.

Usage:

// Result: true
is_alphaspace("  text  ")

Description:
Calls Apache Commons StringUtils.isAlphaNumericSpace() on given parameter p and returns its result.

Usage:

// Result: true
is_alphanumspace(" 123 text  ")

Description:
Calls p1.split(p2) and returns its result converted to the list.

Usage:

// Result: [ "a", "b", "c" ]
split("a|b|c", "|")

Description:
Calls p1.split(p2) converting the result to the list. Then calls String.strip() on each element.

Usage:

// Result: ["a", "b", "c"]
split_trim("a | b | c", "|")

Description:
Calls p1.startsWith(p2) and returns its result.

Usage:

// Result: true
starts_width("abc", "ab")

Description:
Calls p1.endsWith(p2) and returns its result.

Usage:

// Result: true
ends_width("abc", "bc")

Description:
Calls p1.contains(p2) and returns its result.

Usage:

// Result: true
contains("abc", "bc")

Description:
Calls p1.indexOf(p2) and returns its result.

Usage:

// Result: 1
index_of("abc", "bc")

Description:
Calls p1.substring(p2, p3) and returns its result.

Usage:

// Result: "bc"
substr("abc", 1, 3)

Description:
Calls p1.replace(p2, p3) and returns its result.

Usage:

// Result: "aBC"
replace("abc", "bc", "BC")

Description:
Converts given integer or string to double value.

Usage:

// Result: 1.2
to_double("1.2")
// Result: 1.0
to_double(1)

Description:
Converts given double or string to integer value. In case of double value it will be rounded to the nearest integer value.

Usage:

// Result: 1
to_int("1.2")
to_int(1.2)

Description:
Returns absolute value for parameter x.

Usage:

// Result: 1
abs(-1)
// Result: 1.5
abs(-1.5)

Description:
Calls Math.ceil(d).

Usage:

// Result: 2.0
ceil(1.5)

Description:
Calls Math.floor(d).

Usage:

// Result: 1.0
floor(1.5)

Description:
Calls Math.rint(d).

Usage:

// Result: 1.0
rint(1.2)

Description:
Calls Math.round(d).

Usage:

// Result: 1
round(1.2)

Description:
Calls Math.signum(d).

Usage:

// Result: 1.0
signum(1.2)

Description:
Calls Math.sqrt(d).

Usage:

// Result: 4.0
sqrt(2.0)

Description:
Calls Math.cbrt(d).

Usage:

// Result: -3.0
cbrt(-27.0)

Description:
Calls Math.acos(d).

Usage:

acos(1.0)

Description:
Calls Math.asin(d).

Usage:

asin(1.0)

Description:
Calls Math.atan(d).

Usage:

atan(1.0)

Description:
Calls Math.tan(d).

Usage:

tan(1.0)

Description:
Calls Math.sin(d).

Usage:

sin(1.0)

Description:
Calls Math.cos(d).

Usage:

cos(1.0)

Description:
Calls Math.tanh(d).

Usage:

tanh(1.0)

Description:
Calls Math.sinh(d).

Usage:

sinh(1.0)

Description:
Calls Math.cosh(d).

Usage:

cosh(1.0)

Description:
Calls Math.atan2(d1, d2).

Usage:

atan2(1.0, 1.0)

Description:
Calls Math.toDegrees(d).

Usage:

degrees(1.0)

Description:
Calls Math.toRadians(d).

Usage:

radians(1.0)

Description:
Calls Math.exp(d).

Usage:

exp(1.0)

Description:
Calls Math.expm1(d).

Usage:

expm1(1.0)

Description:
Calls Math.log(d).

Usage:

log(1.0)

Description:
Calls Math.log10(d).

Usage:

log10(1.0)

Description:
Returns square of >x

Usage:

// Result: 4
square(2)

Description:
Calls Math.log1p(d).

Usage:

log1p(1.0)

Description:
Returns PI constant.

Usage:

// Result: 3.14159265359
pi

Description:
Returns Euler constant.

Usage:

// Result: 0.5772156649
euler

Description:
Returns maximum value for given list. Throws runtime exception if the list is empty. This function uses a natural ordering.

Usage:

// Result: 3
max(list(1, 2, 3))

Description:
Returns minimum value for given list. Throws runtime exception if the list is empty. This function uses a natural ordering.

Usage:

// Result: 1
min(list(1, 2, 3))

Description:
Returns average (mean) value for given list of ints, doubles or strings. Throws runtime exception if the list is empty. If list list contains strings, they have to be convertable to int or double.

Usage:

// Result: 2.0
avg(list(1, 2, 3))
avg(list("1.0", 2, "3"))

Description:
Returns standard deviation value for given list of ints, doubles or strings. Throws runtime exception if the list is empty. If list list contains strings, they have to be convertable to int or double.

Usage:

stdev(list(1, 2, 3))
stdev(list("1.0", 2, "3"))

Description:
Calls Math.pow(d1, d2).

Usage:

// Result: 4.0
pow(2.0, 2.0)

Description:
Calls Math.hypot(d1, d2).

Usage:

hypot(2.0, 2.0)

Description:
Returns new list with given parameters.

Usage:

// Result: []
list
// Result: [1, 2, 3]
list(1, 2, 3)
// Result: ["1", true, 1.25]
list("1", 2 == 2, to_double('1.25'))

Description:
Gets element from either list or map c. For list the k must be 0-based integer index in the list.

Usage:

// Result: 1
get(list(1, 2, 3), 0)
// Result: true
get(json('{"a": true}'), "a")

Description:
Calls c.contains(x) and returns its result.

Usage:

// Result: true
has(list("a", "b"), "a")

Description:
Calls c.containsAll(x) and returns its result.

Usage:

// Result: false
has_all(list("a", "b"), "a")

Description:
Checks if list c contains any of the elements from the list x.

Usage:

// Result: true
has_any(list("a", "b"), list("a"))

Description:
Returns first element from the list c or null if the list is empty.

Usage:

// Result: "a"
first(list("a", "b"))

Description:
Returns last element from the list c or null if the list is empty.

Usage:

// Result: "b"
last(list("a", "b"))

Description:
Returns list of keys for map m.

Usage:

// Result: ["a", "b"]
keys(json('{"a": true, "b": 1}'))

Description:
Returns list of values for map m.

Usage:

// Result: [true, 1]
values(json('{"a": true, "b": 1}'))

Description:
Returns reversed list x. This function uses the natural sorting order.

Usage:

// Result: [3, 2, 1]
reverse(list(1, 2, 3))

Description:
Returns sorted list x. This function uses the natural sorting order.

Usage:

// Result: [1, 2, 3]
sort(list(2, 1, 3))

Description:
Checks if given string, list or map x is empty.

Usage:

// Result: false
is_empty("text")
is_empty(list(1))
is_empty(json('{"a": 1}'))

Description:
Checks if given string, list or map x is non empty.

Usage:

// Result: true
non_empty("text")
non_empty(list(1))
non_empty(json('{"a": 1}'))

Description:
Makes list x distinct.

Usage:

// Result: [1, 2, 3]
distinct(list(1, 2, 2, 3, 1))

Description:
Concatenates lists x1 and x2.

Usage:

// Result: [1, 2, 3, 4]
concat(list(1, 2), list(3, 4))

Description:
Gets entity metadata property p.

Usage:

// Result: 'nlp:token:text' entity metadata property.
meta_ent('nlp:token:text')

Description:
Gets request metadata property p.

Usage:

// Result: 'my:prop' user request data property.
meta_req('my:prop')

Description:
Gets intent metadata property p.

Usage:

// Result: 'my:prop' intent metadata property.
meta_intent('my:prop')

Description:
Gets conversation metadata property p.

Usage:

// Result: 'my:prop' conversation metadata property.
meta_conv('my:prop')

Description:
Gets fragment metadata property p. Fragment metadata can be optionally passed in when referencing the fragment to parameterize it.

Usage:

// Result: 'my:prop' fragment metadata property.
meta_frag('my:prop')

Description:
Gets system property or environment variable p.

Usage:

// Result: 'java.home' system property.
meta_sys('java.home')
// Result: 'HOME' environment variable.
meta_sys('HOME')

Description:
Gets configuration property p.

Usage:

// Result: 'my:prop' configuration property.
meta_cfg('my:prop')

Description:
Returns current year.

Usage:

// Result: 2021
year

Description:
Returns current month: 1 ... 12.

Usage:

// Result: 5
month

Description:
Returns current day of the month: 1 ... 31.

Usage:

// Result: 5
day_of_month

Description:
Returns current day of the week: 1 ... 7.

Usage:

// Result: 5
day_of_week

Description:
Returns current day of the year: 1 ... 365.

Usage:

// Result: 51
day_of_year

Description:
Returns current hour: 0 ... 23.

Usage:

// Result: 11
hour

Description:
Returns current minute: 0 ... 59.

Usage:

// Result: 11
minute

Description:
Returns current second: 0 ... 59.

Usage:

// Result: 11
second

Description:
Returns current week of the month: 1 ... 4.

Usage:

// Result: 2
week_of_month

Description:
Returns current week of the year: 1 ... 56.

Usage:

// Result: 21
week_of_year

Description:
Returns current quarter: 1 ... 4.

Usage:

// Result: 2
quarter

Description:
Returns current time in milliseconds.

Usage:

// Result: 122312341212
now

Description:
Returns request ID.

Usage:

// Result: request ID.
req_id

Description:
Returns request text.

Usage:

// Result: request text.
req_text

Description:
Gets UTC/GMT timestamp in ms when user input was received.

Usage:

// Result: input receive timsstamp in ms.
req_tstamp

Description:
Returns user ID

Usage:

// Result: user ID.
user_id

Description:
This function provides 'if-then-else' equivalent as IDL does not provide branching on the language level. This function will evaluate c parameter and either return then value if it evaluates to true or else value in case if it evaluates to false. Note that evaluation will be short-circuit, i.e. either then or else will actually be computed but not both.

Usage:

// Result:
//  - 'list(1, 2, 3)' if 1st parameter is 'true'.
//  - 'null' if 1st parameter is 'false'.
if(meta_model('my_prop') == true, list(1, 2, 3), null)

Description:
Converts JSON in p parameter to a map. Use single quoted string to avoid escaping double quotes in JSON.

Usage:

// Result: Map.
json('{"a": 2, "b": [1, 2, 3]}')

Description:
Converts p parameter to a string. In case of a list this function will convert individual list elements to string and return the list of strings.

Usage:

// Result: "1.25"
to_string(1.25)
// Result: list("1", "2", "3")
to_string(list(1, 2, 3))

Description:
Returns p if it is not null, a otherwise. Note that evaluation will be short-circuit, i.e. a will be evaluated only if p is null.

Usage:

// Result: 'some_prop' model metadata or 'text' if one is 'null'.
@dflt = 'text'
or_else(meta_model('some_prop'), @dflt)

IDL Location

IDL declarations can be placed in different locations based on user preferences:

NCIntent java annotation takes a string as its parameter that should be a valid IDL declaration. For example, Scala code snippet:

                @NCIntent("import('/opt/myproj/global_fragments.idl')") // Importing.
                @NCIntent("intent=act term(act)={has(ent_groups, 'act')} fragment(f1)") // Defining in place.
                def onMatch(
                    @NCIntentTerm("act") actEnt: NCEntity,
                    @NCIntentTerm("loc") locEnts: List[NCEntity]
                ): NCResult = {
                    ...
                }

External *.idl files contain IDL declarations and can be imported in any other places where IDL declarations are allowed. See import() statement explanation below. For example:

                    /*
                     * File 'my_intents.idl'.
                     * ======================
                     */

                    import('/opt/globals.idl') // Import global intents and fragments.

                    // Fragments.
                    // ----------
                    fragment=buzz term~{# == 'x:alarm'}
                    fragment=when
                        term(nums)~{
                            // Term variables.
                            @type = meta_ent('num:unittype')
                            @iseq = meta_ent('num:isequalcondition')

                            # == 'num' && @type != 'datetime' && @iseq == true
                        }[0,7]

                    // Intents.
                    // --------
                    intent=alarm
                        fragment(buzz)
                        fragment(when)

Binding Intent

IDL intents must be bound to their callback methods. This binding is accomplished using the following Java annotations:

Annotation	Target	Description
`@NCIntent`	Callback method or model class	When applied to a method this annotation allows to define IDL intent in-place on the method serving as its callback. This annotation can also be applied to a model's class in which case it will just declare the intent without binding it and the callback method will need to use `@NCIntentRef` annotation to actually bind it to the declared intent above. Note that multiple intents can be bound to the same callback method, but only one callback method can be bound with a given intent. This method is ideal for simple intents and quick declaration right in the source code and has all the benefits of having IDL to be part of the source code. However, multi-line IDL declaration can be awkward to add and maintain depending on Scala language, i.e. multi-line string literal support. In such cases it is advisable to move IDL declarations into separate `*.idl` file or files and import them at the model class level.
`@NCIntentRef`	Callback method	This annotation allows to reference an intent defined elsewhere like an external `*.idl` file, or other `@NCIntent` annotations. In real applications, this is a most common way to bound an externally defined intent to its callback method.
`@NCIntentObject`	Model class field	Marker annotation that can be applied to class member of main model. The fields objects annotated with this annotation are scanned the same way as main model.
`@NCIntentTerm`	Callback method parameter	This annotation marks a formal callback method parameter to receive term's entities when the intent to which this term belongs is selected as the best match.

Here's a couple of examples of intent declarations to illustrate the basics of intent declaration and usage.

An intent from Light Switch example:

            @NCIntent("intent=ls term(act)={has(ent_groups, 'act')} term(loc)={# == 'ls:loc'}*")
            def onMatch(
                @ctx: NCContext,
                @im: NCIntentMatch,
                @NCIntentTerm("act") actEnt: NCEntity,
                @NCIntentTerm("loc") locEnts: List[NCEntity]
            ): NCResult = {
                ...
            }

NOTES:

The intent is defined in-place using @NCIntent annotation.
A term match is defined as one or more entities. Term can be optional if its min quantifier is zero.
An intent act has two non-conversational terms: one mandatory term and another that can match zero or more entities with method onMatch(...) as its callback.
Terms is conversational if it uses '~' and non-conversational if it uses '=' symbol in its definition. If term is conversational, the matching algorithm will look into the conversation context short-term-memory (STM) to seek the matching entities for this term. Note that the terms that were fully or partially matched using entities from the conversation context will contribute a smaller weight to the overall intent matching weight since these terms are less specific. Non-conversational terms will be matched using entities found only in the current user input without looking at the conversation context.
Method onMatch(...) will be called if and when this intent is selected as the best match.
Note that terms have min=1, max=1 quantifiers by default, i.e. one and only one.
First term defines any single entity that belongs to the group act. Note that model elements can belong to multiple groups.
Note that both terms have IDs (act and loc) that are used in onMatch(...) method parameters to automatically assign terms' entities to the formal method parameters using @NCIntentTerm annotations.

In the following Time example the intent is defined model class and referenced in code using @NCIntentRef annotation:

            @NCIntent("fragment=city term(city)~{# == 'opennlp:location'}")
            @NCIntent("intent=intent2 term~{# == 'x:time'} fragment(city)")
            class TimeModel extends NCModel(
                ...
                @NCIntentRef("intent2")
                private def onRemoteMatch(
                    ctx: NCContext, im: NCIntentMatch, @NCIntentTerm("city") cityEnt: NCEntity
                ): NCResult =
                    ...

NOTES:

Intent is defined in the model, line 2.
This intent is referenced by annotation @NCIntentRef("intent2") with method onMatch(...) as its callback, line 5.
This example defines an intent with two conversational terms both of which have to found for the intent to match.
Terms is conversational if it uses '~' and non-conversational if it uses '=' symbol in its definition. If term is conversational, the matching algorithm will look into the conversation context short-term-memory (STM) to seek the matching entities for this term. Note that the terms that were fully or partially matched using entities from the conversation context will contribute a smaller weight to the overall intent matching weight since these terms are less specific. Non-conversational terms will be matched using entities found only in the current user input without looking at the conversation context.
Method onMatch(...) will be called when this intent is the best match detected.
Note that terms have min=1, max=1 quantifiers by default.
First term is defined as a single mandatory (min=1, max=1) entity with ID x:time whose element is defined in the model.
Second term is defined as a single mandatory (min=1, max=1) entity with entityopennlp:location.
Given data model definition above the following sentences will be matched by this intent:
- What time is it now in New York City?
- Show me time of the day in London.
- Can you please give me the Tokyo's current date and time.

Intent Matching Logic

NCPipeline processing result is collection of NCVariant instances. NCSemanticEntityParser is used for following example configured via JSON file. Let's consider the input text 'A B C D' and the following elements defined in our model:

            "elements": [
                {
                    "id": "elm1",
                    "synonyms": ["A B"]
                },
                {
                    "id": "elm2",
                    "synonyms": ["B C"]
                },
                {
                    "id": "elm3",
                    "synonyms": ["D"]
                }
            ],

All of these elements will be detected but since two of them are overlapping (elm1 and elm2) there should be two parsing variants at the output of this step:

elm1('A', 'B') freeword('C') elm3('D')
freeword('A') elm2('B', 'C') elm3('D')

Note that initially the system cannot determine which of these variants is the best one for matching - there's simply not enough information at this stage. It can only be determined when each variant is matched against model's intents. So, each parsing variant is matched against each intent. Each matching pair of a variant and an intent produce a match with a certain weight. If there are no matches at all - an error is returned. If matches were found, the match with the biggest weight is selected as a winning match. If multiple matches have the same weight, their respective variants' weights will be used to further sort them out. Finally, the intent's callback from the winning match is called.

Although details on exact algorithm on weight calculation are too complex, here's the general guidelines on what determines the weight of the match between a parsing variant and the intent. Note that these rules coalesce around the principle idea that the more specific match always wins:

A match that captures more entities has more weight than a match with less entities. As a corollary, the match with less free words (i.e. unused words) has bigger weight than a match with more free words.
Entities for user-defined elements are more important than built-in entities.
A more specific match has bigger weight. In other words, a match that uses an entity from the conversation context (i.e short-term-memory) has less weight than a match that only uses entities from the current request. In the same way older entities from the conversation give less weight than the more recent ones.

Intent Callback

Whether the intent is defined directly in @NCIntent annotation or indirectly via @NCIntentRef annotation - it is always bound to a callback method:

Callback can only be an instance method on the class implementing NCModel trait.
Method must have return type of NCResult.
Method should have at least two parameters:
- Parameter of type NCContext must be first.
- Parameter of type NCIntentMatch must be second.
- Any other parameters must have @NCIntentTerm annotation.
Method must support reflection-based invocation.

@NCIntentTerm annotation marks callback parameter to receive term's entities. This annotations can only be used for the parameters of the callbacks, i.e. methods that are annotated with @NCIntnet or @NCIntentRef. @NCIntentTerm takes a term ID as its only mandatory parameter and should be applied to callback method parameters to get the entities associated with that term (if and when the intent was matched and that callback was invoked).

Depending on the term quantifier the method parameter type can only be one of the following types:

Quantifier	Scala Type
`[1,1]`	`NCEntity`
`[0,1]`	`Option[NCEntity]`
`[1,∞]` or `[0,∞]`	`List[NCEntity]`

For example:

            NCIntent("intent=id term(termId)~{# == 'my_ent'}?")
            private def onMatch(
                ctx: NCContext,
                im: NCIntentMatch,
                @NCIntentTerm("termId") myEnt: Option[NCEntity]
            ): NCResult = {
               ...
            }

NOTES:

Conversational term termId has [0,1] quantifier (it's optional).
The formal parameter on the callback has a type of Option[NCEntity] because the term's quantifier is [0,1].

`NCRejection` and `NCIntentSkip` Exceptions

There are two exceptions that can be used by intent callback logic to control intent matching process.

When NCRejection exception is thrown by the callback it indicates that user input cannot be processed as is. This exception typically indicates that user has not provided enough information in the input string to have it processed automatically. In most cases this means that the user's input is either too short or too simple, too long or too complex, missing required context, or is unrelated to the requested data model.

NCIntentSkip is a control flow exception to skip current intent. This exception can be thrown by the intent callback to indicate that current intent should be skipped (even though it was matched and its callback was called). If there's more than one intent matched the next best matching intent will be selected and its callback will be called.

This exception becomes useful when it is hard or impossible to encode the entire matching logic using only declarative IDL. In these cases the intent definition can be relaxed and the "last mile" of intent matching can happen inside of the intent callback's user logic. If it is determined that intent in fact does not match then throwing this exception allows to try next best matching intent, if any.

Note that there's a significant difference between NCIntentSkip exception and model's NCModel#onMatchedIntent callback. Unlike this callback, the exception does not force re-matching of all intents, it simply picks the next best intent from the list of already matched ones. The model's callback can force a full reevaluation of all intents against the user input.

IDL Expressiveness

Note that usage of NCIntentSkip exception (as well as model's life-cycle callbacks) is a required technique when you cannot express the desired matching logic with only IDL alone. IDL is a high-level declarative language and it does not support a complex programmable logic or other types of sophisticated matching algorithms. In such cases, you can define a broad intent that would broadly match and then define the rest of the more complex matching logic in the callback using NCIntentSkip exception to effectively indicate when intent doesn't match and other intents, if any, have to be tried.

There are many use cases where IDL is not expressive enough. For example, if your intent matching depends on financial market conditions, weather, state from external systems or details of the current user geographical location or social network status - you will need to use NCIntentSkip-based logic or model's callbacks to support that type of matching.

`NCContext` Trait

NCContext trait passed into intent callback as its first parameter. This trait provide runtime information about the model configuration, request, extracted tokens and all entities variants, conversation control trait NCConversation.

`NCIntentMatch` Trait

NCIntentMatch trait passed into intent callback as its second parameter. This trait provide runtime information about the intent that was matched (i.e. the intent with which this callback was annotated with).

Intent Matching

Overview

Intent

IDL Syntax

Intent Lifecycle

Intent Examples

IDL Functions

ent_type(t: Entityopt) ⇒ String, # ⇒ String Returns entity type

ent_groups(t: Entityopt) ⇒ List[String] Gets the list of groups this entity belongs to

ent_this ⇒ Entity Returns current entity

ent_text(t: Entityopt) ⇒ String Returns entity's original text

ent_index(t: Entityopt) ⇒ Long Returns entity's index in the original input

ent_is_first(t: Entityopt) ⇒ Boolean Returns true if this entity is the first in the original input

ent_is_last(t: Entityopt) ⇒ Boolean Returns true if this entity is the last in the original input

ent_is_before_type(type: String) ⇒ Boolean Returns true if there is a entity with type type after this entity

ent_is_after_type(type: String) ⇒ Boolean Returns true if there is a entity with type type before this entity

ent_is_between_types(type1: String, type2: String) ⇒ Boolean Returns true if this entity is located between entities with types type1 and type2

ent_is_between_groups(grp1: String, grp2: String) ⇒ Boolean Returns true if this entity is located between entities with group IDs grp1 and grp2

ent_is_before_group(grp: String) ⇒ Boolean Returns true if there is a entity that belongs to the group grp after this entity

ent_is_after_group(grp: String) ⇒ Boolean Returns true if there is a entity that belongs to the group grp before this entity

ent_all ⇒ List[Entity] Returns all entities from the original input

ent_count ⇒ Long Returns number of entities from the original input

ent_all_for_type(type: String) ⇒ List[Entity] Returns list of entities from the original input with type type

ent_all_for_group(grp: String) ⇒ List[Entity] Returns list of entities from the original input that belong to the group grp

length(p: {String|List|Map}) ⇒ Long Returns size or length of the given string, list or map

regex(s: String, rx: String) ⇒ Boolean Checks whether string s matches Java regular expression rx

trim(p: String) ⇒ Long Calls String.trim()

uppercase(p: String) ⇒ String Calls String.toUpperCase()

lowercase(p: String) ⇒ String Calls String.toLowerCase()

is_alpha(p: String) ⇒ Boolean Calls StringUtils.isAlpha()

is_alphanum(p: String) ⇒ Boolean Calls StringUtils.isAlphanumeric()

is_whitespace(p: String) ⇒ Boolean Calls StringUtils.isWhitespace()

is_num(p: String) ⇒ Boolean Calls StringUtils.isNumeric()

is_numspace(p: String) ⇒ Boolean Calls StringUtils.isNumericSpace()

is_alphaspace(p: String) ⇒ Boolean Calls StringUtils.isAlphaSpace()

is_alphanumspace(p: String) ⇒ Boolean Calls StringUtils.isAlphaNumericSpace()

split(p1: String, p2: String) ⇒ List[String] Calls p1.split(p2)

split_trim(p1: String, p2: String) ⇒ List[String] Calls p1.split(p2)

starts_with(p1: String, p2: String) ⇒ Boolean Calls p1.startsWith(p2)

ends_with(p1: String, p2: String) ⇒ Boolean Calls p1.endsWith(p2)

contains(p1: String, p2: String) ⇒ Boolean Calls p1.contains(p2)

index_of(p1: String, p2: String) ⇒ Long Calls p1.indexOf(p2)

substr(p1: String, p2: Long, p3: Long) ⇒ String Calls p1.substring(p2, p3)

replace(p1: String, p2: String, p3: String) ⇒ String Calls p1.replace(p2, p3)

to_double(p1: {Long|String}) ⇒ Double Converts given integer or string to double value

to_int(p1: {Double|String}) ⇒ Long Converts given double or string to integer value

abs(x: {Long|Double}) ⇒ {Long|Double} Returns absolute value for parameter x

ceil(d: Double) ⇒ Double Calls Math.ceil(d)

floor(d: Double) ⇒ Double Calls Math.floor(d)

rint(d: Double) ⇒ Double Calls Math.rint(d)

round(d: Double) ⇒ Long Calls Math.round(d)

signum(d: Double) ⇒ Double Calls Math.signum(d)

sqrt(d: Double) ⇒ Double Calls Math.sqrt(d)

cbrt(d: Double) ⇒ Double Calls Math.cbrt(d)

acos(d: Double) ⇒ Double Calls Math.acos(d)

asin(d: Double) ⇒ Double Calls Math.asin(d)

atan(d: Double) ⇒ Double Calls Math.atan(d)

tan(d: Double) ⇒ Double Calls Math.tan(d)

sin(d: Double) ⇒ Double Calls Math.sin(d)

cos(d: Double) ⇒ Double Calls Math.cos(d)

tanh(d: Double) ⇒ Double Calls Math.tanh(d)

sinh(d: Double) ⇒ Double Calls Math.sinh(d)

cosh(d: Double) ⇒ Double Calls Math.cosh(d)

atan2(d1: Double, d2: Double) ⇒ Double Calls Math.atan2(d2, d2)

degrees(d: Double) ⇒ Double Calls Math.toDegrees(d)

radians(d: Double) ⇒ Double Calls Math.toRadians(d)

exp(d: Double) ⇒ Double Calls Math.exp(d)

expm1(d: Double) ⇒ Double Calls Math.expm1(d)

log(d: Double) ⇒ Double Calls Math.log(d)

log10(d: Double) ⇒ Double Calls Math.log10(d)

square(x: {Long|Double}) ⇒ {Long|Double} Returns square of >x

log1p(d: Double) ⇒ Double Calls Math.log1p(d)

pi ⇒ Double Returns PI constant

pi ⇒ Double Returns Euler constant

max(c: List) ⇒ Any Returns maximum value for given list

min(c: List) ⇒ Any Returns minimum value for given list

avg(c: List[{Long|Double|String}]) ⇒ Double Returns average (mean) value for given list

stdev(c: List[{Long|Double|String}]) ⇒ Double Returns standard deviation value for given list

pow(d1: Double, d2: Double) ⇒ Double Calls Math.pow(d2, d2)

hypot(d1: Double, d2: Double) ⇒ Double Calls Math.hypot(d2, d2)

`NCRejection` and `NCIntentSkip` Exceptions

`NCContext` Trait

`NCIntentMatch` Trait