Data model processing logic is defined as a collection of one or more intents. The sections below explain what an intent is, how to define it in your model, and how it works.
The goal of the data model implementation is to take the user input text and match it to the specific user-defined code that will execute for that input. The mechanism that provides this matching is called an intent.
The intent generally refers to the goal that the end-user had in mind when speaking or typing the input utterance. The intent has a declarative part, or template, written in IDL - Intent Definition Language - that strictly defines a particular form of the user input. An intent is also bound to a callback method that will be executed when that intent, i.e. its template, is detected as the best match for a given input. A typical data model will have multiple intents defined, one for each form of the expected user input that the model wants to react to.
For example, a data model for a banking chatbot or an analytics application can have multiple intents, one for each domain-specific group of inputs such as opening an account, closing an account, transferring money, getting statements, etc.
Intents can be specific or generic in terms of what input they match. Multiple intents can overlap and NLPCraft will disambiguate such cases to select the intent with the overall best match. In general, the most specific intent match wins.
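As a sketch of this disambiguation (the token ID order_cancel and the group order_grp below are hypothetical, used only for illustration):

```
// Generic intent: matches any token from the hypothetical 'order_grp' group.
intent=any_order term={has(tok_groups, 'order_grp')}

// Specific intent: matches only the hypothetical 'order_cancel' token.
intent=cancel_order term={# == 'order_cancel'}
```

For an input containing the order_cancel token both predicates hold, and NLPCraft would generally select the more specific cancel_order intent.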
NLPCraft intents are written in Intent Definition Language (IDL). IDL is a relatively straightforward declarative language. For example, here's a simple intent x with two terms a and b:
intent=x term(a)~{# == 'my_elm'} term(b)={has(tok_groups, "my_group")}
An IDL intent defines a match between the parsed user input, represented as a collection of tokens, and the user-defined callback method. IDL intents are bound to their callbacks via Java annotations and can be located in those same Java annotations, placed in the model YAML/JSON file, or kept in external *.idl files.
You can review the formal ANTLR4 grammar for IDL, but here are its general properties:
- IDL supports single-line // Comment. as well as multi-line /* Comment. */ comments.
- IDL strings can be enclosed in single quotes ('text') or double quotes ("text"), simplifying IDL usage in JSON or Java - you don't have to escape double quotes. Both quote types can be escaped in a string, i.e. "text with \" quote" or 'text with \' quote'.
- IDL provides true, false and null literals for boolean and null values.
- Numeric literals can use the '_' character for separation, as in 200_000.
- IDL keywords are: flow, fragment, import, intent, meta, options, term, true, false, null.
- When chaining function calls, IDL uses mathematical notation, e.g. length(trim(" text ")), vs. OOP-style " text ".trim().length().
- if and or_else functions provide the familiar short-circuit evaluation.

An IDL program consists of intent, fragment, or import statements in any order or combination:
intent statement
An intent is defined as one or more terms. Each term is a predicate over an instance of the NCToken interface. For an intent to match, all of its terms have to evaluate to true. Intent definition can be informally explained using the following full-featured example:
intent=xa
    flow="^(?:login)(^:logout)*$"
    meta={'enabled': true}
    term(a)={month >= 6 && # != "z" && meta_intent('enabled') == true}[1,3]
    term(b)~{
        @usrTypes = meta_model('user_types')

        (# == 'order' || # == 'order_cancel') &&
        has_all(@usrTypes, list(1, 2, 3))
    }

intent=xb
    options={
        'ordered': false,
        'unused_free_words': true,
        'unused_sys_toks': true,
        'unused_usr_toks': false,
        'allow_stm_only': false
    }
    flow=/#flowModelMethod/
    term(a)=/org.mypackage.MyClass#termMethod/?
    fragment(frag, {'p1': 25, 'p2': {'a': false}})
NOTES:
- intent=xa (line 1) and intent=xb (line 12) start the two intent definitions. xa and xb are the mandatory intent IDs. An intent ID is any arbitrary unique string matching the following lexer template: (UNI_CHAR|UNDERSCORE|LETTER|DOLLAR)+(UNI_CHAR|DOLLAR|LETTER|[0-9]|COLON|MINUS|UNDERSCORE)*
- options={...} (line 13) defines the optional intent options:

Option | Type | Description | Default Value
---|---|---|---
ordered | Boolean | Whether or not this intent is ordered. For an ordered intent the specified order of terms is important for matching this intent. If the intent is unordered, its terms can be found in any order in the input text. Note that an ordered intent significantly limits the user input it can match. In most cases an ordered intent is only applicable to processing of a formal grammar (like a programming language) and is mostly unsuitable for natural language processing. | false
unused_free_words | Boolean | Whether free words - those unused by intent matching - should be ignored (value true) or reject the intent match (value false). Free words are the words in the user input that were not recognized as any user or system token. Typically, for natural language comprehension it is safe to ignore free words. For a formal grammar, however, this could make the matching logic too loose. | true
unused_sys_toks | Boolean | Whether unused system tokens should be ignored (value true) or reject the intent match (value false). By default, unused system tokens are ignored. | true
unused_usr_toks | Boolean | Whether unused user-defined tokens should be ignored (value true) or reject the intent match (value false). By default, unused user tokens are not ignored since it is assumed that the user defines his or her own tokens on purpose and constructs the intent logic appropriately. | false
allow_stm_only | Boolean | Whether the intent can match when all of the matching tokens came from STM. By default, this special case is disabled (value false). However, in specific intents designed for a free-form language comprehension scenario, for example SMS messaging, you may want to enable this option. | false
- flow="^(?:login)(^:logout)*$" (line 2) and flow=/#flowModelMethod/ (line 20) define the optional dialog flow. Dialog flow is a history of previously matched intents to match on. If provided, the intent will first match on the history of the previously matched intents before processing its terms. There are two ways to define a match on the dialog flow:
Regular Expression
In this case the dialog flow specification is a string with a standard Java regular expression. The history of previously matched intents is presented as a space-separated string of the intent IDs that were selected as the best match during the current conversation, with the most recently matched intent ID being the first element in the string. The dialog flow regular expression will be matched against that string of intent IDs.
In line 2, the ^(?:login)(^:logout)*$ dialog flow regular expression defines that the intent should only match when the immediately previous intent was login and no logout intents are in the history. If the history is "login order order" - this intent will match. However, for a "login logout" or "order login" history this dialog flow will not match.
User-Defined Callback
In this case the dialog flow specification is defined as a callback in the form /x.y.z.Class#method/, where x.y.z.Class should be a fully qualified name of the class where the callback is defined, and method must be the name of the callback method. This method should take one parameter of type java.util.List[NCDialogFlowItem] and return a boolean result.
The class name is optional, in which case the model class will be used by default. Note that if a custom class is in fact specified, an instance of this class will be created for each dialog flow test. This class must have a no-arg constructor to be instantiated via standard Java reflection, and its creation should be as lightweight as possible to avoid performance degradation. For this reason it is recommended to put the dialog flow callback on the model class itself, which avoids instantiating a class on each dialog flow evaluation.
Note that if the dialog flow is defined and it doesn't match the history, the terms of the intent won't be tested at all.
- meta={'enabled': true} (line 3) defines the optional intent metadata. Just like most of the components in NLPCraft, an intent can have its own metadata. Intent metadata is defined as a standard JSON object which will be converted into a java.util.Map instance and can be accessed in the intent's terms via the meta_intent() IDL function. The typical use case for declarative intent metadata is to parameterize the intent's behavior, i.e. the behavior of its terms, with clearly defined properties that are provided inside the intent definition itself.
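For example, a minimal sketch of metadata-driven parameterization (the property name max_month is made up for this illustration):

```
intent=sketch
    meta={'max_month': 6}
    term={month <= meta_intent('max_month')}
```

Changing the max_month value in the intent declaration changes the term's behavior without touching the predicate itself.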
- term(a)={...}[1,3] (line 4), term(b)~{...} (line 5), and term(a)=/org.mypackage.MyClass#termMethod/? (line 21) define the intent terms. A term is a building block of the intent. An intent must have at least one term. A term has an optional ID, a token predicate and optional quantifiers. A term supports conversation context if it uses the '~' symbol, and does not if it uses the '=' symbol in its definition. For a conversational term the system will search for a match using tokens from the current request as well as tokens from the conversation STM (short-term memory). For a non-conversational term only tokens from the current request will be considered.
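A minimal sketch of the difference (the token IDs my_city and my_action are hypothetical):

```
// Conversational term ('~'): can be satisfied by a token from the current
// request or by a matching token remembered in the conversation STM.
term(city)~{# == 'my_city'}

// Non-conversational term ('='): must be found in the current request itself.
term(action)={# == 'my_action'}
```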
A term is matched if its token predicate returns true. The matched term represents one or more tokens, sequential or not, that were detected in the user input. An intent has a list of terms (always at least one) that all have to be matched in the user input for the intent to match. Note that a term can be optional if its min quantifier is zero. Whether the order of the terms is important for matching is governed by the intent's ordered option.
A term ID (a and b) is optional. It is only required by the @NCIntentTerm annotation to link a term's tokens to a formal parameter of the callback method. Note that term IDs follow the same lexical rules as intent IDs.
Term's body can be defined in two ways:
IDL Expression
Inside of curly brackets { ... } you can have an optional list of term variables and the mandatory term expression that must evaluate to a boolean value. A term variable name must start with the @ symbol and be unique within the scope of the current term. All term variables must be defined and initialized before the term expression, which must be the last statement in the term:
term(b)~{
    @a = meta_model('a')
    @lst = list(1, 2, 3, 4)

    has_all(@lst, list(@a, 2))
}
Term variable initialization expressions, as well as the term's expression, follow Java-like expression grammar including precedence rules, brackets and logical combinators, as well as built-in IDL function calls:
// Special case of a 'constant' term.
term={true}

term={
    // Variable declarations.
    @a = round(1.25)
    @b = meta_model('my_prop')

    // Last expression must evaluate to boolean.
    (@a + 2) * @b > 0
}

term={
    // Variable declarations.
    @c = meta_tok('prop')
    @lst = list(1, 2, 3)

    // Last expression must evaluate to boolean.
    abs(@c) > 1 && size(@lst) != 5
}
NOTE: while term variable initialization expressions can have any type, the term's expression itself, i.e. the last expression in the term's body, must evaluate to a boolean result only. Failure to do so will result in a runtime exception during intent evaluation. Note also that such errors cannot be detected during the intent compilation phase.
User-Defined Callback
In this case the term's body is defined as a callback in the form /x.y.z.Class#method/, where x.y.z.Class should be a fully qualified name of the class where the callback is defined, and method must be the name of the callback method. This method should take one parameter of type NCTokenPredicateContext and return an instance of NCTokenPredicateResult as its result:
term(a)=/org.mypackage.MyClass#termMethod/?
The class name is optional, in which case the model class will be used by default. Note that if a custom class is in fact specified, an instance of this class will be created for each term evaluation. This class must have a no-arg constructor to be instantiated via standard Java reflection, and its creation should be as lightweight as possible to avoid performance degradation. For this reason it is recommended to put the user-defined term callback on the model class itself, which avoids instantiating a class on each term evaluation.
The ? (line 21) and [1,3] (line 4) suffixes define an inclusive quantifier for a term, i.e. how many times the match for this term should be found. You can use the following quick abbreviations:
- * is equal to [0,∞]
- + is equal to [1,∞]
- ? is equal to [0,1]
- no quantifier is equal to the default [1,1]
As mentioned above, the quantifier is inclusive, i.e. [1,3] means that the term should appear once, twice or three times.
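Putting the quantifier forms together (the token IDs below are hypothetical):

```
term={# == 'a'}       // No quantifier - defaults to [1,1], exactly once.
term={# == 'b'}?      // [0,1] - optional term.
term={# == 'c'}[2,4]  // Inclusive - two, three or four times.
term={# == 'd'}+      // [1,∞] - one or more times.
term={# == 'e'}*      // [0,∞] - zero or more times.
```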
- fragment(frag, {'p1': 25, 'p2': {'a': false}}) (line 22) is a fragment reference. A fragment reference inserts the terms defined by that fragment in place of the reference. It has a mandatory fragment ID parameter and an optional second JSON parameter. The optional JSON parameter allows parameterizing the inserted terms' behavior and is available to the terms via the meta_frag() IDL function.
fragment statement
Fragments allow grouping and naming a set of reusable terms. Such groups can be further parameterized at the place of reference and enable the reuse of one or more terms by multiple intents. For example:
// Fragments.
fragment=buzz
    term~{# == meta_frag('id')}
fragment=when
    term(nums)~{
        // Term variables.
        @type = meta_tok('nlpcraft:num:unittype')
        @iseq = meta_tok('nlpcraft:num:isequalcondition')

        # == 'nlpcraft:num' && @type == 'datetime' && @iseq == true
    }[0,7]

// Intents.
intent=alarm
    // Insert parameterized terms from fragment 'buzz'.
    fragment(buzz, {"id": "x:alarm"})
    // Insert terms from fragment 'when'.
    fragment(when)
NOTES:
- A fragment definition consists of a mandatory fragment ID (buzz and when) and a list of terms.

import statement
The import statement allows importing IDL declarations from either a local file, a classpath resource or a URL:
// Import using absolute path.
import('/opt/globals.idl')

// Import using classpath resource.
import('org/apache/nlpcraft/examples/alarm/intents.idl')

// Import using URL.
import('ftp://user:password@myhost:22/opt/globals.idl')
NOTES:
- An import statement starts with the import keyword and has a string parameter that indicates the location of the resource to import.

During the NLPCraft data probe start it scans the models provided in its configuration for intents. The scanning process goes through JSON/YAML external configurations as well as model classes when looking for IDL intents. All found intents are compiled into an internal representation before the data probe completes its start-up sequence.
@NCModelAddClasses and @NCModelAddPackage Annotations
You can use these annotations to add specified classes and packages to the list of classes that will be scanned when NLPCraft searches for the annotated intent callbacks. By default, only the model class itself and its ancestors are scanned. Larger models can be modularized and split into separate compilation units to simplify their development and maintenance.
Note that not all intent problems can be detected during the compilation phase, and the probe can start with intents that are not completely validated. For example, each term in the intent must evaluate to a boolean result - this can only be checked at runtime. Another example is the number and types of parameters passed into an IDL function, which are also only checked at runtime.
Intents are compiled only once, during the data probe start-up sequence, and cannot be recompiled without a data probe restart. Model logic, however, can affect intent behavior through model callbacks, model metadata, user and company metadata, as well as request data, all of which can change at runtime and are accessible through IDL functions.
Here are a few intent examples with explanations:
Example 1:
intent=a
    term~{# == 'x:id'}
    term(nums)~{# == 'nlpcraft:num' && lowercase(meta_tok('nlpcraft:num:unittype')) == 'datetime'}[0,2]
NOTES:
- Intent ID is a.
- The intent uses the default options, including the default unused free words policy (true) and the default order (false).
- The intent has two conversational terms (~) that have to be found for the intent to match. Note that the second term is optional as it has the [0,2] quantifier.
- The first term matches any token with ID x:id.
- The second term matches any token with ID nlpcraft:num whose nlpcraft:num:unittype metadata property is equal to the 'datetime' string.
- Note the built-in IDL function lowercase used on the nlpcraft:num:unittype metadata property value.
- Since the second term has an ID (nums), it can be referenced by the @NCIntentTerm annotation on a callback formal parameter.

Example 2:
intent=id2
    flow='id1 id2'
    term={# == 'mytok' && signum(get(meta_tok('score'), 'best')) != -1}
    term={has_any(tok_groups, list('actors', 'owners')) && size(meta_part('partAlias', 'text')) > 10}
NOTES:
- Intent ID is id2.
- The intent has the dialog flow 'id1 id2'. It expects the sequence of intents id1 and id2 somewhere in the history of previously matched intents in the course of the current conversation.
- The intent has two non-conversational terms (=). Both terms have to be present exactly once (their implicit quantifiers are [1,1]).
- The first term should be a token with ID mytok that has a metadata property score of type map. This map should have a value with the string key 'best'. The signum of this map value should not equal -1. Note that meta_tok(), get() and signum() are all built-in IDL functions.
- The second term should be a token belonging to either the actors or the owners group. It should have a part token with the alias partAlias. That part token should have a metadata property text of type string, list or map. The length of this string, list or map should be greater than 10.

NLPCraft IDL has a relatively simple syntax and you can easily configure its syntax highlighting in most modern code editors and IDEs. Here are two examples of how to add IDL syntax highlighting:
The NLPCraft project comes with an idea/nlpcraft_idl_idea_settings.zip file that contains a syntax highlighting configuration for the *.idl file type. Import this file (File -> Manage IDE Settings -> Import Settings...) and you will get proper syntax highlighting for *.idl files in your project.
For highlighting IDL syntax on the web you can use the SyntaxHighlighter JavaScript library, which is used for all IDL code on this website.
To add custom language support, create a new brush file shBrushIdl.js with the following content and place it under the scripts folder in your local SyntaxHighlighter installation:
;(function() {
    // CommonJS.
    typeof(require) != 'undefined' ? SyntaxHighlighter = require('shCore').SyntaxHighlighter : null;

    function Brush() {
        const keywords = 'flow fragment import intent meta options term';
        const literals = 'false null true';
        const symbols = '[\\[\\]{}*@+?~=]+';
        const fns = 'abs asin atan atan2 avg cbrt ceil comp_addr comp_city comp_country comp_id comp_name comp_postcode comp_region comp_website concat contains cos cosh count day_of_month day_of_week day_of_year degrees distinct ends_with euler exp expm1 first floor get has has_all has_any hour hypot if index_of is_alpha is_alphanum is_alphanumspace is_alphaspace is_empty is_num is_numspace is_whitespace json keys last length list log log10 log1p lowercase max meta_company meta_conv meta_frag meta_intent meta_model meta_part meta_req meta_sys meta_tok meta_user min minute month non_empty now or_else pi pow quarter radians rand replace req_addr req_agent req_id req_normtext req_tstamp reverse rint round second signum sin sinh size sort split split_trim sqrt square starts_with stdev strip substr tan tanh to_double to_int to_string tok_aliases tok_all tok_all_for_group tok_all_for_id tok_all_for_parent tok_ancestors tok_count tok_end_idx tok_find_part tok_find_parts tok_groups tok_has_part tok_id tok_index tok_is_abstract tok_is_after_group tok_is_after_id tok_is_after_parent tok_is_before_group tok_is_before_id tok_is_before_parent tok_is_bracketed tok_is_direct tok_is_english tok_is_first tok_is_freeword tok_is_last tok_is_permutated tok_is_quoted tok_is_stopword tok_is_swear tok_is_user tok_is_wordnet tok_lemma tok_parent tok_pos tok_sparsity tok_start_idx tok_stem tok_this tok_unid tok_value trim uppercase user_admin user_email user_fname user_id user_lname user_signup_tstamp values week_of_month week_of_year year';

        this.regexList = [
            { regex: SyntaxHighlighter.regexLib.singleLineCComments, css: 'comments' }, // One-line comments.
            { regex: SyntaxHighlighter.regexLib.multiLineCComments, css: 'comments' }, // Multiline comments.
            { regex: SyntaxHighlighter.regexLib.doubleQuotedString, css: 'string' }, // String.
            { regex: SyntaxHighlighter.regexLib.singleQuotedString, css: 'string' }, // String.
            { regex: /0x[a-f0-9]+|\d+(\.\d+)?/gi, css: 'value' }, // Numbers.
            { regex: new RegExp(this.getKeywords(keywords), 'gm'), css: 'keyword' }, // Keywords.
            { regex: new RegExp(this.getKeywords(literals), 'gm'), css: 'color1' }, // Literals.
            { regex: /<|>|<=|>=|==|!=|&&|\|\|/g, css: 'color2' }, // Operators.
            { regex: new RegExp(this.getKeywords(fns), 'gm'), css: 'functions' }, // Functions.
            { regex: new RegExp(symbols, 'gm'), css: 'color3' } // Symbols.
        ];
    }

    Brush.prototype = new SyntaxHighlighter.Highlighter();
    Brush.aliases = ['idl'];

    SyntaxHighlighter.brushes.Idl = Brush;

    // CommonJS.
    typeof(exports) != 'undefined' ? exports.Brush = Brush : null;
})();
Make sure to include this script in your page:
<script src="/path/to/your/scripts/shBrushIdl.js" type="text/javascript"></script>
And then you can use it to display IDL code from HTML using the <pre> tag with the brush: idl CSS class:
<pre class="brush: idl">
    intent=xa
        flow="^(?:login)(^:logout)*$"
        meta={'enabled': true}
        term(a)={month >= 6 && # != "z" && meta_intent('enabled') == true}[1,3]
        term(b)~{
            @usrTypes = meta_model('user_types')

            (# == 'order' || # == 'order_cancel') && has_all(@usrTypes, list(1, 2, 3))
        }
</pre>
IDL provides over 150 built-in functions that can be used in IDL intent definitions. An IDL function call takes the traditional fun_name(p1, p2, ..., pk) syntax form. If a function has no parameters, the brackets are optional. An IDL function operates on a stack - its parameters are taken from the stack and its result is put back onto the stack, which in turn can become a parameter for the next function call, and so on. IDL functions can have zero or more parameters and always have one result value. Some IDL functions support a variable number of parameters. Note that you cannot define your own functions in IDL - in such cases you need to use a term with a user-defined callback method.
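For example, a sketch of this stack-based composition using built-in functions:

```
// 'trim' pushes its result onto the stack, 'length' consumes it,
// and the comparison consumes the final result.
length(trim("  text  ")) == 4

// The same result staged through a term variable inside a term body:
term={
    @s = trim("  text  ")

    length(@s) == 4
}
```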
Special Shorthand #
The frequently used IDL function tok_id() has a special shorthand #. For example, the following expressions are all equal:
tok_id() == 'id'
tok_id == 'id' // Remember - empty parens are optional.
# == 'id'
When chaining function calls IDL uses mathematical notation (a la Python) rather than an object-oriented one: length(trim(" text ")) vs. OOP-style " text ".trim().length().
IDL functions operate with the following types:
JVM Type | IDL Name | Notes |
---|---|---|
java.lang.String | String | |
java.lang.Long java.lang.Integer java.lang.Short java.lang.Byte | Long | Smaller numerical types will be converted to java.lang.Long . |
java.lang.Double java.lang.Float | Double | java.lang.Float will be converted to java.lang.Double . |
java.lang.Boolean | Boolean | You can use true or false literals. |
java.util.List<T> | List[T] | Use list(...) IDL function to create new list. |
java.util.Map<K,V> | Map[K,V] | |
NCToken | Token | |
java.lang.Object | Any | Any of the supported types above. Use null literal for null value. |
Some IDL functions are polymorphic, i.e. they can accept arguments and return results of multiple types. Encountering an unsupported type will result in a runtime error during intent matching. It is especially important to watch out for types when adding objects to various metadata containers and using that metadata in IDL expressions.
Unsupported Types
Detection of unsupported types by IDL functions cannot be done during IDL compilation and can only happen during runtime execution. This means that even though the data probe compiles IDL intents and starts successfully, it does not guarantee that the intents will operate correctly.
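As an illustration of the types in play (a sketch - exact conversion semantics should be verified against the function descriptions below):

```
term={
    // list(...) produces a List[Long]; 'has' and 'size' operate on lists,
    // while to_string(...) converts a value to a String before comparison.
    @lst = list(1, 2, 3)

    has(@lst, 2) && size(@lst) == 3 && to_string(2) == '2'
}
```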
All IDL functions are organized into the following groups:
tok_id

Description:
Returns the token ID for the current token (default) or the one provided by the optional parameter t. Note that this function has a special shorthand #.

Usage:
// Result: 'true' if the current token ID is equal to 'my_id'.
tok_id == 'my_id'
# == 'my_id'
tok_id(tok_this) == 'my_id'
#(tok_this) == 'my_id'
tok_lemma

Description:
Returns the token lemma for the current token (default) or the one provided by the optional parameter t.

Usage:
// Result: 'true' if the current token lemma is equal to 'work'.
tok_lemma == 'work'
tok_lemma(tok_this) == 'work'
tok_stem

Description:
Returns the token stem for the current token (default) or the one provided by the optional parameter t.

Usage:
// Result: 'true' if the current token stem is equal to 'work'.
tok_stem == 'work'
tok_stem(tok_this) == 'work'
tok_pos

Description:
Returns the token PoS tag for the current token (default) or the one provided by the optional parameter t.

Usage:
// Result: 'true' if the current token PoS tag is equal to 'NN'.
tok_pos == 'NN'
tok_pos(tok_this) == 'NN'
tok_sparsity

Description:
Returns the token sparsity value for the current token (default) or the one provided by the optional parameter t.

Usage:
// Result: token sparsity value.
tok_sparsity
tok_sparsity(tok_this)
tok_unid

Description:
Returns the internal globally unique token ID for the current token (default) or the one provided by the optional parameter t.

Usage:
// Result: internal token globally unique ID.
tok_unid
tok_unid(tok_this)
tok_is_abstract

Description:
Returns the token abstract flag for the current token (default) or the one provided by the optional parameter t.

Usage:
// Result: token abstract flag.
tok_is_abstract
tok_is_abstract(tok_this)
tok_is_bracketed

Description:
Returns the token bracketed flag for the current token (default) or the one provided by the optional parameter t.

Usage:
// Result: token bracketed flag.
tok_is_bracketed
tok_is_bracketed(tok_this)
tok_is_direct

Description:
Returns the token direct flag for the current token (default) or the one provided by the optional parameter t.

Usage:
// Result: token direct flag.
tok_is_direct
tok_is_direct(tok_this)
tok_is_permutated

Description:
Returns the token permutated flag for the current token (default) or the one provided by the optional parameter t.

Usage:
// Result: token permutated flag.
tok_is_permutated
tok_is_permutated(tok_this)
tok_is_english

Description:
Returns the token English detection flag for the current token (default) or the one provided by the optional parameter t.

Usage:
// Result: token English detection flag.
tok_is_english
tok_is_english(tok_this)
tok_is_freeword

Description:
Returns the token freeword flag for the current token (default) or the one provided by the optional parameter t.

Usage:
// Result: token freeword flag.
tok_is_freeword
tok_is_freeword(tok_this)
tok_is_quoted

Description:
Returns the token quoted flag for the current token (default) or the one provided by the optional parameter t.

Usage:
// Result: token quoted flag.
tok_is_quoted
tok_is_quoted(tok_this)
tok_is_stopword

Description:
Returns the token stopword flag for the current token (default) or the one provided by the optional parameter t.

Usage:
// Result: token stopword flag.
tok_is_stopword
tok_is_stopword(tok_this)
tok_is_swear

Description:
Returns the token swear word flag for the current token (default) or the one provided by the optional parameter t.

Usage:
// Result: token swear flag.
tok_is_swear
tok_is_swear(tok_this)
tok_is_user

Description:
Returns whether the token is defined by a user-defined model element or a built-in one, for the current token (default) or the one provided by the optional parameter t.

Usage:
// Result: whether or not this token is defined by a user model element vs. a built-in one.
tok_is_user
tok_is_user(tok_this)
tok_is_wordnet

Description:
Returns whether the token's text is a known part of the WordNet dictionary, for the current token (default) or the one provided by the optional parameter t.

Usage:
// Result: whether or not this token is part of the WordNet dictionary.
tok_is_wordnet
tok_is_wordnet(tok_this)
tok_ancestors

Description:
Gets the list of all parent IDs, up to the root, for the current token (default) or the one provided by the optional parameter t. This is only available for user-defined model elements - built-in tokens do not have parents and will return an empty list. May return an empty list but never a null.

Usage:
// Result: list of all ancestors.
tok_ancestors
tok_ancestors(tok_this)
tok_parent

Description:
Gets the optional parent ID of the model element that the current token (default) or the one provided by the optional parameter t represents. This is only available for user-defined model elements - built-in tokens do not have parents and this will return null.

Usage:
// Result: parent ID of this token's model element, or 'null'.
tok_parent
tok_parent(tok_this)
tok_groups

Description:
Gets the list of groups the current token (default) or the one provided by the optional parameter t belongs to. Note that, by default, if not specified explicitly, a token always belongs to one group with ID equal to the token ID. May return an empty list but never a null.

Usage:
// Result: list of groups this token belongs to.
tok_groups
tok_groups(tok_this)
tok_value

Description:
Gets the value if the current token (default) or the one provided by the optional parameter t was detected via its element's value (or the value's synonyms); otherwise returns null. Only applicable to user-defined model elements - built-in tokens do not have values and this will return null.

Usage:
// Result: the token value if this token was detected via its element's value.
tok_value
tok_value(tok_this)
tok_aliases

Description:
Gets the optional list of aliases the current token (default) or the one provided by the optional parameter t is known by. A token can get an alias if it is a part of another composed token and the IDL expression that was used to match it specified an alias. Note that a token can have zero, one or more aliases. May return an empty list but never a null.

Usage:
// Result: checks if this token is known by the 'alias' alias.
has(tok_aliases, 'alias')
has(tok_aliases(tok_this), 'alias')
tok_start_idx

Description:
Gets the start character index of the current token (default) or the one provided by the optional parameter t in the original text.

Usage:
// Result: start character index of this token in the original text.
tok_start_idx
tok_start_idx(tok_this)
tok_end_idx

Description:
Gets the end character index of the current token (default) or the one provided by the optional parameter t in the original text.

Usage:
// Result: end character index of this token in the original text.
tok_end_idx
tok_end_idx(tok_this)
tok_this

Description:
Returns the current token.

Usage:
// Result: current token.
tok_this
tok_find_part

Description:
Finds a part token with the given ID or alias, traversing the entire part token graph. The start token is provided by the t parameter. The token ID or alias to find is defined by the a parameter. This function throws a runtime exception if the given alias or ID cannot be found or if more than one token is found. This function never returns null. If more than one token is expected - use the tok_find_parts() function instead. See also the tok_has_part() function to check whether a certain part token exists.

Usage:
// Result: part token of the current token found by the 'alias' alias,
// if any, or throws a runtime exception.
tok_find_part(tok_this, 'alias')
tok_has_part

Description:
Checks the existence of a part token with the given ID or alias, traversing the entire part token graph. The start token is provided by the t parameter. The token ID or alias to find is defined by the a parameter. See also the if() function for 'if-then-else' branching support.

Usage:
// Result: 'true' if a part token of the current token is found by the 'alias' alias, 'false' otherwise.
tok_has_part(tok_this, 'alias')

// Result: part token 'alias' if it exists or the current token if it does not.
@this = tok_this
@tok = if(tok_has_part(@this, 'alias'), tok_find_part(@this, 'alias'), @this)
tok_find_parts

Description:
Finds part tokens with the given ID or alias, traversing the entire part token graph. The start token is provided by the t parameter. The token ID or alias to find is defined by the a parameter. This function may return an empty list but never a null.

Usage:
// Result: list of part tokens, potentially empty, of the current token found by the 'alias' alias.
tok_find_parts(tok_this, 'alias')

// Result: part token 'alias' if it exists or the current token if it does not.
@this = tok_this
@parts = tok_find_parts(@this, 'alias')
@tok = if(is_empty(@parts), @this, first(@parts))
tok_txt

Description:
Returns the token's original text. If t is not provided the current token is assumed.

Usage:
// Result: token original input text.
tok_txt
tok_norm_txt

Description:
Returns the token's normalized text. If t is not provided the current token is assumed.

Usage:
// Result: token normalized input text.
tok_norm_txt
Description:
Returns token's index in the original input. Note that this is an index of the token and not of the character. If t
is not provided the current token is assumed.
Usage:
// Result: 'true' if index of this token in the original input is equal to 1. tok_index == 1 tok_index(tok_this) == 1
Description:
Returns true
if this token is the first in the original input. Note that this checks index of the token and not of the character. If t
is not provided the current token is assumed.
Usage:
// Result: 'true' if this token is the first token in the original input. tok_is_first tok_is_first(tok_this)
Description:
Returns true
if this token is the last in the original input. Note that this checks index of the token and not of the character. If t
is not provided the current token is assumed
Usage:
// Result: 'true' if this token is the last token in the original input. tok_is_last tok_is_last(tok_this)
Description:
Returns `true` if there is a token with ID `id` after this token.
Usage:
// Result: 'true' if there is a token with ID 'a' after this token.
tok_is_before_id('a')
Description:
Returns `true` if there is a token with ID `id` before this token.
Usage:
// Result: 'true' if there is a token with ID 'a' before this token.
tok_is_after_id('a')
Description:
Returns `true` if this token is located between tokens with IDs `id1` and `id2`.
Usage:
// Result: 'true' if this token is located after the token with ID 'before' and before the token with ID 'after'.
tok_is_between_ids('before', 'after')
Description:
Returns `true` if this token is located between tokens with group IDs `grp1` and `grp2`.
Usage:
// Result: 'true' if this token is located after a token belonging to the group 'before' and before a token belonging to the group 'after'.
tok_is_between_groups('before', 'after')
Description:
Returns `true` if this token is located between tokens with parent IDs `id1` and `id2`.
Usage:
// Result: 'true' if this token is located after a token with parent ID 'before' and before a token with parent ID 'after'.
tok_is_between_parents('before', 'after')
Description:
Returns `true` if there is a token that belongs to the group `grp` after this token.
Usage:
// Result: 'true' if there is a token that belongs to the group 'grp' after this token.
tok_is_before_group('grp')
Description:
Returns `true` if there is a token that belongs to the group `grp` before this token.
Usage:
// Result: 'true' if there is a token that belongs to the group 'grp' before this token.
tok_is_after_group('grp')
Description:
Returns `true` if there is a token with parent ID `parentId` before this token.
Usage:
// Result: 'true' if there is a token with parent ID 'owner' before this token.
tok_is_after_parent('owner')
Description:
Returns `true` if there is a token with parent ID `parentId` after this token.
Usage:
// Result: 'true' if there is a token with parent ID 'owner' after this token.
tok_is_before_parent('owner')
Description:
Returns all tokens from the original input.
Usage:
// Result: list of all tokens for the original input.
tok_all
Description:
Returns the number of tokens in the original input. It is equivalent to `size(tok_all)`.
Usage:
// Result: number of all tokens for the original input.
tok_count
Description:
Returns the list of tokens from the original input with ID `id`.
Usage:
// Result: list of tokens for the original input that have ID 'id'.
tok_all_for_id('id')
Description:
Returns the list of tokens from the original input with parent ID `parentId`.
Usage:
// Result: list of tokens for the original input that have parent ID 'id'.
tok_all_for_parent('id')
Description:
Returns the list of tokens from the original input that belong to the group `grp`.
Usage:
// Result: list of tokens for the original input that belong to the group 'grp'.
tok_all_for_group('grp')
Description:
Returns the size or length of the given string, list or map. This function has aliases: `size` and `count`.
Usage:
// Result: 9
length("some text")
// Result: 3
@lst = list(1, 2, 3)
size(@lst)
count(@lst)
Description:
Returns `true` if string `s` matches the Java regular expression `rx`, `false` otherwise.
Usage:
regex('textabc', '^text.*$') // Returns 'true'.
regex('_textabc', '^text.*$') // Returns 'false'.
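Since `regex` matches against Java regular expressions, its behavior can be reproduced with `java.util.regex` from the standard library. The sketch below is illustrative only (the `RegexDemo` class and `idlRegex` method are not part of NLPCraft); it mirrors the whole-string matching semantics shown in the usage example above:

```java
import java.util.regex.Pattern;

public class RegexDemo {
    // Mirrors the IDL 'regex(s, rx)' semantics using Java's regex engine.
    // Pattern.matches() requires the ENTIRE input to match the expression.
    static boolean idlRegex(String s, String rx) {
        return Pattern.matches(rx, s);
    }

    public static void main(String[] args) {
        System.out.println(idlRegex("textabc", "^text.*$"));  // true
        System.out.println(idlRegex("_textabc", "^text.*$")); // false
    }
}
```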
Description:
Calls `String.trim()` on the given parameter `p` and returns the result. This function has an alias: `strip`.
Usage:
// Result: "text"
trim(" text ")
strip(" text ")
Description:
Calls `String.toUpperCase()` on the given parameter `p` and returns the result.
Usage:
// Result: "TEXT"
uppercase("text")
Description:
Calls `String.toLowerCase()` on the given parameter `p` and returns the result.
Usage:
// Result: "text"
lowercase("TeXt")
Description:
Calls Apache Commons `StringUtils.isAlpha()` on the given parameter `p` and returns the result.
Usage:
// Result: true
is_alpha("text")
Description:
Calls Apache Commons `StringUtils.isAlphanumeric()` on the given parameter `p` and returns the result.
Usage:
// Result: true
is_alphanum("text123")
Description:
Calls Apache Commons `StringUtils.isWhitespace()` on the given parameter `p` and returns the result.
Usage:
// Result: false
is_whitespace("text123")
// Result: true
is_whitespace(" ")
Description:
Calls Apache Commons `StringUtils.isNumeric()` on the given parameter `p` and returns the result.
Usage:
// Result: true
is_num("123")
Description:
Calls Apache Commons `StringUtils.isNumericSpace()` on the given parameter `p` and returns the result.
Usage:
// Result: true
is_numspace(" 123")
Description:
Calls Apache Commons `StringUtils.isAlphaSpace()` on the given parameter `p` and returns the result.
Usage:
// Result: true
is_alphaspace(" text ")
Description:
Calls Apache Commons `StringUtils.isAlphanumericSpace()` on the given parameter `p` and returns the result.
Usage:
// Result: true
is_alphanumspace(" 123 text ")
Description:
Calls `p1.split(p2)` and returns the result converted to a list.
Usage:
// Result: [ "a", "b", "c" ]
split("a|b|c", "|")
Description:
Calls `p1.split(p2)`, converting the result to a list, and then calls `String.strip()` on each element.
Usage:
// Result: ["a", "b", "c"]
split_trim("a | b | c", "|")
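These split semantics can be sketched in plain Java (the `SplitDemo` class and `splitTrim` helper below are hypothetical, not NLPCraft code). Note that Java's `String.split()` treats its argument as a regex, so a literal delimiter such as `"|"` is quoted in this sketch; whether IDL quotes the delimiter internally is an assumption here:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitDemo {
    // Approximates IDL 'split_trim(p1, p2)': split on a literal delimiter,
    // then call String.strip() on each resulting element.
    static List<String> splitTrim(String s, String delim) {
        return Arrays.stream(s.split(Pattern.quote(delim)))
            .map(String::strip)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitTrim("a | b | c", "|")); // [a, b, c]
    }
}
```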
Description:
Calls `p1.startsWith(p2)` and returns the result.
Usage:
// Result: true
starts_with("abc", "ab")
Description:
Calls `p1.endsWith(p2)` and returns the result.
Usage:
// Result: true
ends_with("abc", "bc")
Description:
Calls `p1.contains(p2)` and returns the result.
Usage:
// Result: true
contains("abc", "bc")
Description:
Calls `p1.substring(p2, p3)` and returns the result.
Usage:
// Result: "bc"
substr("abc", 1, 3)
Description:
Calls `p1.replace(p2, p3)` and returns the result.
Usage:
// Result: "aBC"
replace("abc", "bc", "BC")
Description:
Converts the given integer or string to a double value.
Usage:
// Result: 1.2
to_double("1.2")
// Result: 1.0
to_double(1)
Description:
Converts the given double or string to an integer value. A double value is rounded to the nearest integer.
Usage:
// Result: 1
to_int("1.2")
to_int(1.2)
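The rounding behavior of `to_int` can be sketched in Java as follows. The `ToIntDemo` class is illustrative, and the assumption that "nearest integer" means half-up rounding like `Math.round()` is mine, not stated by the source:

```java
public class ToIntDemo {
    // Sketch of IDL 'to_int' applied to a double: round to the
    // nearest integer (assumed half-up, as with Math.round()).
    static int toInt(double d) {
        return (int) Math.round(d);
    }

    public static void main(String[] args) {
        System.out.println(toInt(1.2)); // 1
        System.out.println(toInt(1.7)); // 2
    }
}
```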
Description:
Returns the absolute value of parameter `x`.
Usage:
// Result: 1
abs(-1)
// Result: 1.5
abs(-1.5)
Description:
Returns the square of `x`.
Usage:
// Result: 4
square(2)
Description:
Returns the pi constant.
Usage:
// Result: 3.14159265359
pi
Description:
Returns the Euler constant.
Usage:
// Result: 0.5772156649
euler
Description:
Returns the maximum value of the given list. Throws a runtime exception if the list is empty. This function uses natural ordering.
Usage:
// Result: 3
max(list(1, 2, 3))
Description:
Returns the minimum value of the given list. Throws a runtime exception if the list is empty. This function uses natural ordering.
Usage:
// Result: 1
min(list(1, 2, 3))
Description:
Returns the average (mean) value of the given list of ints, doubles or strings. Throws a runtime exception if the list is empty. If the list contains strings, they must be convertible to int or double.
Usage:
// Result: 2.0
avg(list(1, 2, 3))
avg(list("1.0", 2, "3"))
Description:
Returns the standard deviation of the given list of ints, doubles or strings. Throws a runtime exception if the list is empty. If the list contains strings, they must be convertible to int or double.
Usage:
stdev(list(1, 2, 3))
stdev(list("1.0", 2, "3"))
Description:
Returns a new list with the given parameters.
Usage:
// Result: []
list
// Result: [1, 2, 3]
list(1, 2, 3)
// Result: ["1", true, 1.25]
list("1", 2 == 2, to_double('1.25'))
Description:
Gets an element from either a list or a map `c`. For a list, `k` must be a 0-based integer index into the list.
Usage:
// Result: 1
get(list(1, 2, 3), 0)
// Result: true
get(json('{"a": true}'), "a")
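The dual list/map behavior of `get` can be sketched with Java collections; the `GetDemo` class below is illustrative only:

```java
import java.util.List;
import java.util.Map;

public class GetDemo {
    // Mirrors IDL 'get(c, k)': a 0-based index for lists, a key for maps.
    static Object get(Object c, Object k) {
        if (c instanceof List)
            return ((List<?>) c).get((Integer) k);
        return ((Map<?, ?>) c).get(k);
    }

    public static void main(String[] args) {
        System.out.println(get(List.of(1, 2, 3), 0));    // 1
        System.out.println(get(Map.of("a", true), "a")); // true
    }
}
```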
Description:
Calls `c.contains(x)` and returns the result.
Usage:
// Result: true
has(list("a", "b"), "a")
Description:
Calls `c.containsAll(x)` and returns the result.
Usage:
// Result: false
has_all(list("a", "b"), "a")
Description:
Checks if list `c` contains any of the elements from the list `x`.
Usage:
// Result: true
has_any(list("a", "b"), list("a"))
Description:
Returns the first element of the list `c`, or `null` if the list is empty.
Usage:
// Result: "a"
first(list("a", "b"))
Description:
Returns the last element of the list `c`, or `null` if the list is empty.
Usage:
// Result: "b"
last(list("a", "b"))
Description:
Returns the list of keys of map `m`.
Usage:
// Result: ["a", "b"]
keys(json('{"a": true, "b": 1}'))
Description:
Returns the list of values of map `m`.
Usage:
// Result: [true, 1]
values(json('{"a": true, "b": 1}'))
Description:
Returns the reversed list `x`.
Usage:
// Result: [3, 2, 1]
reverse(list(1, 2, 3))
Description:
Returns the sorted list `x`. This function uses natural sorting order.
Usage:
// Result: [1, 2, 3]
sort(list(2, 1, 3))
Description:
Checks if the given string, list or map `x` is empty.
Usage:
// Result: false
is_empty("text")
is_empty(list(1))
is_empty(json('{"a": 1}'))
Description:
Checks if the given string, list or map `x` is non-empty.
Usage:
// Result: true
non_empty("text")
non_empty(list(1))
non_empty(json('{"a": 1}'))
Description:
Returns list `x` with duplicate elements removed.
Usage:
// Result: [1, 2, 3]
distinct(list(1, 2, 2, 3, 1))
Description:
Concatenates lists `x1` and `x2`.
Usage:
// Result: [1, 2, 3, 4]
concat(list(1, 2), list(3, 4))
Description:
Finds a part token with alias or ID `a`, if any, and returns its metadata property `p`. If the part token cannot be found, a runtime exception is thrown. Returns `null` if the metadata property does not exist.
Usage:
// Result: 'prop' property of 'alias' part token of the current token.
meta_part('alias', 'prop')
Description:
Gets token metadata property `p`. See token metadata for more information.
Usage:
// Result: 'nlpcraft:num:unit' token metadata property.
meta_tok('nlpcraft:num:unit')
Description:
Gets model metadata property `p`.
Usage:
// Result: 'my:prop' model metadata property.
meta_model('my:prop')
Description:
Gets user REST call data property `p`. See the REST API for more details on the `ask` and `ask/sync` REST calls.
Usage:
// Result: 'my:prop' user request data property.
meta_req('my:prop')
Description:
Gets user metadata property `p`.
Usage:
// Result: 'my:prop' user metadata property.
meta_user('my:prop')
Description:
Gets company metadata property `p`.
Usage:
// Result: 'my:prop' company metadata property.
meta_company('my:prop')
Description:
Gets intent metadata property `p`.
Usage:
// Result: 'my:prop' intent metadata property.
meta_intent('my:prop')
Description:
Gets conversation metadata property `p`.
Usage:
// Result: 'my:prop' conversation metadata property.
meta_conv('my:prop')
Description:
Gets fragment metadata property `p`. Fragment metadata can optionally be passed in when referencing the fragment to parameterize it.
Usage:
// Result: 'my:prop' fragment metadata property.
meta_frag('my:prop')
Description:
Gets system property or environment variable `p`.
Usage:
// Result: 'java.home' system property.
meta_sys('java.home')
// Result: 'HOME' environment variable.
meta_sys('HOME')
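A plausible sketch of `meta_sys` using only the Java standard library is shown below. The `MetaSysDemo` class is illustrative, and the property-first, environment-variable-fallback lookup order is an assumption of this sketch, not documented behavior:

```java
public class MetaSysDemo {
    // Sketch of IDL 'meta_sys(p)': try a JVM system property first,
    // then fall back to an environment variable of the same name.
    static String metaSys(String p) {
        String v = System.getProperty(p);
        return v != null ? v : System.getenv(p);
    }

    public static void main(String[] args) {
        // Prints the JVM's 'java.home' path (value is environment-specific).
        System.out.println(metaSys("java.home"));
    }
}
```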
Description:
Returns the current year.
Usage:
// Result: 2021
year
Description:
Returns the current month: 1 ... 12.
Usage:
// Result: 5
month
Description:
Returns the current day of the month: 1 ... 31.
Usage:
// Result: 5
day_of_month
Description:
Returns the current day of the week: 1 ... 7.
Usage:
// Result: 5
day_of_week
Description:
Returns the current day of the year: 1 ... 365.
Usage:
// Result: 51
day_of_year
Description:
Returns the current hour: 0 ... 23.
Usage:
// Result: 11
hour
Description:
Returns the current minute: 0 ... 59.
Usage:
// Result: 11
minute
Description:
Returns the current second: 0 ... 59.
Usage:
// Result: 11
second
Description:
Returns the current week of the month: 1 ... 4.
Usage:
// Result: 2
week_of_month
Description:
Returns the current week of the year: 1 ... 56.
Usage:
// Result: 21
week_of_year
Description:
Returns the current quarter: 1 ... 4.
Usage:
// Result: 2
quarter
Description:
Returns the current time in milliseconds.
Usage:
// Result: 122312341212
now
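The datetime functions above map naturally onto `java.time`. The sketch below shows rough stdlib equivalents; the exact calendar and locale conventions IDL uses (e.g. first day of week) are an assumption here, and the `DatetimeDemo` class is illustrative only:

```java
import java.time.LocalDateTime;

public class DatetimeDemo {
    // Rough java.time analog of the IDL 'quarter' function: 1 ... 4.
    static int quarter(LocalDateTime t) {
        return (t.getMonthValue() - 1) / 3 + 1;
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.now();
        System.out.println(t.getYear());                 // year
        System.out.println(t.getMonthValue());           // month: 1 ... 12
        System.out.println(t.getDayOfMonth());           // day_of_month: 1 ... 31
        System.out.println(t.getDayOfWeek().getValue()); // day_of_week: 1 ... 7
        System.out.println(t.getDayOfYear());            // day_of_year
        System.out.println(t.getHour());                 // hour: 0 ... 23
        System.out.println(quarter(t));                  // quarter: 1 ... 4
        System.out.println(System.currentTimeMillis());  // now (epoch ms)
    }
}
```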
Description:
Returns the request's normalized text.
Usage:
// Result: request normalized text.
req_normtext
Description:
Gets the UTC/GMT timestamp in milliseconds when the user input was received.
Usage:
// Result: input receive timestamp in ms.
req_tstamp
Description:
Gets the remote client address that made the original REST call. Returns `null` if the remote client address is not available.
Usage:
// Result: remote client address or 'null'.
req_addr
Description:
Gets the remote client agent that made the original REST call. Returns `null` if the remote client agent is not available.
Usage:
// Result: remote client agent or 'null'.
req_agent
Description:
Returns the user signup timestamp.
Usage:
// Result: user signup timestamp in milliseconds.
user_signup_tstamp
Description:
This function provides an 'if-then-else' equivalent, since IDL does not provide branching at the language level. The function evaluates the `c` parameter and returns the `then` value if it evaluates to `true`, or the `else` value if it evaluates to `false`. Note that evaluation is short-circuit, i.e. either `then` or `else` is actually computed, but not both.
Usage:
// Result:
// - 'list(1, 2, 3)' if 1st parameter is 'true'.
// - 'null' if 1st parameter is 'false'.
if(meta_model('my_prop') == true, list(1, 2, 3), null)
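The short-circuit behavior of `if` can be modeled in Java with lazy `Supplier` arguments. The `IfDemo` class and `iff` helper below are hypothetical, illustrative code, not NLPCraft API:

```java
import java.util.function.Supplier;

public class IfDemo {
    // Short-circuit 'if(c, then, else)' sketch: the ternary operator
    // guarantees only the chosen branch's Supplier is ever invoked.
    static <T> T iff(boolean c, Supplier<T> thenVal, Supplier<T> elseVal) {
        return c ? thenVal.get() : elseVal.get();
    }

    public static void main(String[] args) {
        // The 'else' branch below is never computed, so no exception is thrown.
        System.out.println(iff(true,
            () -> "then branch",
            () -> { throw new IllegalStateException("never evaluated"); }));
    }
}
```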
Description:
Converts the JSON in parameter `p` to a map. Use a single-quoted string to avoid escaping double quotes in the JSON.
Usage:
// Result: Map.
json('{"a": 2, "b": [1, 2, 3]}')
Description:
Converts parameter `p` to a string. For a list, this function converts the individual list elements to strings and returns the list of strings.
Usage:
// Result: "1.25"
to_string(1.25)
// Result: list("1", "2", "3")
to_string(list(1, 2, 3))
Description:
Returns `p` if it is not `null`, `a` otherwise. Note that evaluation is short-circuit, i.e. `a` is evaluated only if `p` is `null`.
Usage:
// Result: 'some_prop' model metadata or 'text' if one is 'null'.
@dflt = 'text'
or_else(meta_model('some_prop'), @dflt)
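The null check with a lazily computed alternative in `or_else` is analogous to Java's `Optional.orElseGet()`. A small illustrative sketch (the `OrElseDemo` class is hypothetical):

```java
import java.util.Optional;
import java.util.function.Supplier;

public class OrElseDemo {
    // Sketch of IDL 'or_else(p, a)': return p unless it is null; the
    // alternative is computed lazily (short-circuit), like orElseGet().
    static <T> T orElse(T p, Supplier<T> alt) {
        return Optional.ofNullable(p).orElseGet(alt);
    }

    public static void main(String[] args) {
        System.out.println(orElse(null, () -> "text"));    // text
        System.out.println(orElse("value", () -> "text")); // value
    }
}
```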
IDL declarations can be placed in different locations based on user preferences:
The @NCIntent annotation takes a string parameter that must be a valid IDL declaration. For example, in a Scala code snippet:
@NCIntent("import('/opt/myproj/global_fragments.idl')") // Importing.
@NCIntent("intent=act term(act)={has(tok_groups, 'act')} fragment(f1)") // Defining in place.
def onMatch(
    @NCIntentTerm("act") actTok: NCToken,
    @NCIntentTerm("loc") locToks: List[NCToken]
): NCResult = { ... }
External JSON/YAML data model configuration can provide one or more IDL declarations in the `intents` field. For example:
{
    "id": "nlpcraft.alarm.ex",
    "name": "Alarm Example Model",
    . . .
    "intents": [
        "import('/opt/myproj/global_fragments.idl')", // Importing.
        "import('/opt/myproj/my_intents.idl')", // Importing.
        "intent=alarm term~{#=='x:alarm'}" // Defining in place.
    ]
}
*.idl files contain IDL declarations and can be imported anywhere IDL declarations are allowed. See the import() statement explanation below. For example:
/*
 * File 'my_intents.idl'.
 * ======================
 */
import('/opt/globals.idl') // Import global intents and fragments.

// Fragments.
// ----------
fragment=buzz term~{# == 'x:alarm'}
fragment=when
    term(nums)~{
        // Term variables.
        @type = meta_tok('nlpcraft:num:unittype')
        @iseq = meta_tok('nlpcraft:num:isequalcondition')

        # == 'nlpcraft:num' && @type != 'datetime' && @iseq == true
    }[0,7]

// Intents.
// --------
intent=alarm
    fragment(buzz)
    fragment(when)
IDL intents must be bound to their callback methods. This binding is accomplished using the following Java annotations:
Annotation | Target | Description |
---|---|---|
@NCIntent | Callback method or model class | When applied to a method, this annotation defines an IDL intent in-place, with the method serving as its callback. This annotation can also be applied to a model's class, in which case it just declares the intent without binding it, and the callback method will need to use the @NCIntentRef annotation to bind itself to the declared intent. Note that multiple intents can be bound to the same callback method, but only one callback method can be bound to a given intent. This approach is ideal for simple intents and quick declarations right in the source code, and it has all the benefits of having IDL be part of the source code. However, multi-line IDL declarations can be awkward to add and maintain depending on the JVM language's support for multi-line string literals. In such cases it is advisable to move IDL declarations into separate *.idl files. |
@NCIntentRef | Callback method | This annotation allows referencing an intent defined elsewhere, such as an external JSON or YAML model definition, a *.idl file, or another @NCIntent annotation. In real applications, this is the most common way to bind an externally defined intent to its callback method. |
@NCIntentTerm | Callback method parameter | This annotation marks a formal callback method parameter to receive term's tokens when the intent to which this term belongs is selected as the best match. |
@NCIntentSample | Callback method | Annotation that provides one or more samples of the input that the associated intent should match on. Although this annotation is optional, it is highly recommended to provide at least several samples per intent. There is no upper limit on how many samples can be provided, and typically the more samples the better for the built-in tools. These samples serve documentation purposes and are also used by the built-in model auto-validation and synonym suggestion tools. |
@NCIntentSampleRef | Callback method | Annotation that allows loading samples of the input that the associated intent should match on from external sources such as a local file, a classpath resource or a URL. Although this annotation is optional, it is highly recommended to provide at least several samples per intent. There is no upper limit on how many samples can be provided, and typically the more samples the better for the built-in tools. These samples serve documentation purposes and are also used by the built-in model auto-validation and synonym suggestion tools. |
@NCModelAddClasses | Model Class | This annotation allows adding specified classes to the list of classes that NLPCraft will scan when searching for intent callbacks. By default, only the model class itself and its ancestors are scanned. Using this annotation, larger models can be modularized and split into different compilation units. See also @NCModelAddPackage. |
@NCModelAddPackage | Model Class | This annotation allows adding specified packages to the list of classes that NLPCraft will scan when searching for intent callbacks. All classes in the package will be scanned recursively when searching for annotated intent callbacks. By default, only the model class itself and its ancestors are scanned. Using this annotation, larger models can be modularized and split into different compilation units. See also @NCModelAddClasses. |
Here are a couple of examples of intent declarations to illustrate the basics of intent declaration and usage.
An intent from the Light Switch Scala example:
@NCIntent("intent=act term(act)={groups @@ 'act'} term(loc)={trim(id) == 'ls:loc'}*")
@NCIntentSample(Array(
    "Turn the lights off in the entire house.",
    "Switch on the illumination in the master bedroom closet.",
    "Get the lights on.",
    "Please, put the light out in the upstairs bedroom.",
    "Set the lights on in the entire house.",
    "Turn the lights off in the guest bedroom.",
    "Could you please switch off all the lights?",
    "Dial off illumination on the 2nd floor.",
    "Please, no lights!",
    "Kill off all the lights now!",
    "No lights in the bedroom, please."
))
def onMatch(
    @NCIntentTerm("act") actTok: NCToken,
    @NCIntentTerm("loc") locToks: List[NCToken]
): NCResult = { ... }
NOTES:
- The intent is declared in-place using the `@NCIntent` annotation.
- Intent `act` has two non-conversational terms: one mandatory term and another that can match zero or more tokens, with method `onMatch(...)` as its callback.
- A term is conversational if it uses the `'~'` symbol and non-conversational if it uses the `'='` symbol in its definition. If a term is conversational, the matching algorithm will look into the conversation context short-term memory (STM) to seek matching tokens for this term. Note that terms that were fully or partially matched using tokens from the conversation context contribute a smaller weight to the overall intent matching weight, since these terms are less specific. Non-conversational terms are matched using tokens found only in the current user input, without looking at the conversation context.
- Method `onMatch(...)` will be called if and when this intent is selected as the best match.
- Terms have `min=1, max=1` quantifiers by default, i.e. one and only one.
- The first term matches any token that belongs to the group `act`. Note that model elements can belong to multiple groups.
- The second term matches zero or more tokens with ID `ls:loc`. Note that we use the function `trim` on the token ID.
- Both terms have IDs (`act` and `loc`) that are used in the `onMatch(...)` method parameters to automatically assign the terms' tokens to the formal method parameters via `@NCIntentTerm` annotations.

In the following Alarm Clock Java example, the intent is defined in the JSON model definition and referenced in the Java code using the @NCIntentRef annotation:
{
    "id": "nlpcraft.alarm.ex",
    "name": "Alarm Example Model",
    "version": "1.0",
    "enabledBuiltInTokens": [
        "nlpcraft:num"
    ],
    "elements": [
        {
            "id": "x:alarm",
            "description": "Alarm token indicator.",
            "synonyms": [
                "{ping|buzz|wake|call|hit} {me|up|me up|_}",
                "{set|_} {my|_} {wake|wake up|_} {alarm|timer|clock|buzzer|call} {up|_}"
            ]
        }
    ],
    "intents": [
        "intent=alarm term~{# == 'x:alarm'} term(nums)~{# == 'nlpcraft:num' && meta_tok('nlpcraft:num:unittype') == 'datetime' && meta_tok('nlpcraft:num:isequalcondition') == true}[0,7]"
    ]
}
@NCIntentRef("alarm")
@NCIntentSample({
    "Ping me in 3 minutes",
    "Buzz me in an hour and 15mins",
    "Set my alarm for 30s"
})
private NCResult onMatch(
    NCIntentMatch ctx,
    @NCIntentTerm("nums") List<NCToken> numToks
) { ... }
NOTES:
- The intent `alarm` is defined in the JSON model definition and referenced by `@NCIntentRef("alarm")`, with method `onMatch(...)` as its callback.
- A term is conversational if it uses the `'~'` symbol and non-conversational if it uses the `'='` symbol in its definition. If a term is conversational, the matching algorithm will look into the conversation context short-term memory (STM) to seek matching tokens for this term. Note that terms that were fully or partially matched using tokens from the conversation context contribute a smaller weight to the overall intent matching weight, since these terms are less specific. Non-conversational terms are matched using tokens found only in the current user input, without looking at the conversation context.
- Method `onMatch(...)` will be called when this intent is the best match detected.
- Terms have `min=1, max=1` quantifiers by default.
- The first term matches a single mandatory (`min=1, max=1`) user token with ID `x:alarm`, whose element is defined in the model.
- The second term matches up to seven `nlpcraft:num` tokens that have a unit type of `datetime` and are single numbers. Note that the Alarm Clock model allows zero tokens in this term, which would mean the current time.
- The `@NCIntentSample` annotation provides sample utterances that this intent should match:
  - Ping me in 3 minutes
  - Buzz me in an hour and 15mins
  - Set my alarm for 30s
In order to understand the intent matching logic, let's review the overall user request processing workflow:
The server receives a REST call `/ask` or `/ask/sync` that contains the text of the sentence to be processed.
At this step the server attempts to find additional variations of the input sentence by substituting certain words in the original text with synonyms from Google's BERT dataset. Note that the server will not use synonyms that are already defined in the model itself - it only tries to compensate for potential incompleteness of the model. The result of this step is one or more sentences that all have the same meaning as the original text.
At this step the server takes the one or more sentences from the previous step and tokenizes them. This process converts the text into a sequence of enriched tokens representing named entities. This step also performs the initial server-side enrichment and detection of built-in named entities.
The result of this step is a sequence of converted sentences, where each element is a sequence of tokens. These sequences are sent down to the data probe that has the requested data model deployed.
This is the first step of the probe-side processing. At this point the data probe receives one or more sequences of tokens. The probe then takes each sequence and performs the final enrichment by detecting user-defined elements in addition to the built-in tokens that were detected on the server during step 2 above.
This is an important step for understanding the intent matching logic. At this step the data probe takes the sequences of tokens generated at the previous step and comes up with one or more parsing variants. A parsing variant is a sequence of tokens that is free from token overlapping and other parsing ambiguities. A single sequence of tokens always produces at least one parsing variant and typically more.
Let's consider the input text `'A B C D'` and the following elements defined in our model:
"elements": [
    { "id": "elm1", "synonyms": ["A B"] },
    { "id": "elm2", "synonyms": ["B C"] },
    { "id": "elm3", "synonyms": ["D"] }
],
All of these elements will be detected, but since two of them overlap (`elm1` and `elm2`) there will be two parsing variants at the output of this step:
- Variant 1: `elm1`('A', 'B'), `freeword`('C'), `elm3`('D')
- Variant 2: `freeword`('A'), `elm2`('B', 'C'), `elm3`('D')

Note that at this point the system cannot determine which of these variants is the best one for matching - there's simply not enough information at this stage. That can only be determined when each variant is matched against the model's intents, which happens in the next step.
At this step the actual matching between intents and variants happens. Each parsing variant from the previous step is matched against each intent. Each matching pair of a variant and an intent produces a match with a certain weight. If there are no matches at all, an error is returned. If matches were found, the match with the biggest weight is selected as the winning match. If multiple matches have the same weight, their respective variants' weights are used to further sort them. Finally, the intent's callback from the winning match is called.
Although the details of the exact weight calculation algorithm are too complex to cover here, the general principle determining the weight of the match between a parsing variant and an intent is that the more specific match always wins.
Whether the intent is defined directly in the `@NCIntent` annotation or indirectly via the `@NCIntentRef` annotation, it is always bound to a callback method through the `@NCIntentTerm` annotation. The `@NCIntentTerm` annotation marks a callback parameter to receive a term's tokens. This annotation can only be used on the parameters of callbacks, i.e. methods that are annotated with `@NCIntent` or `@NCIntentRef`. `@NCIntentTerm` takes a term ID as its only mandatory parameter and should be applied to callback method parameters to get the tokens associated with that term (if and when the intent is matched and that callback is invoked).
Depending on the term quantifier the method parameter type can only be one of the following types:
Quantifier | Java Type | Scala Type |
---|---|---|
[1,1] | NCToken | NCToken |
[0,1] | Optional<NCToken> | Option[NCToken] |
[1,∞] or [0,∞] | java.util.List<NCToken> | List[NCToken] |
For example:
@NCIntent("intent=id term(termId)~{# == 'my_token'}?")
private NCResult onMatch(
    @NCIntentTerm("termId") Optional<NCToken> myTok
) { ... }
NOTES:
- Term `termId` has a `[0,1]` quantifier (it is optional).
- The formal parameter type is `Optional<NCToken>` because the term's quantifier is `[0,1]`.

NCRejection and NCIntentSkip Exceptions

There are two exceptions that can be used by intent callback logic to control the intent matching process.
When the NCRejection exception is thrown by the callback, it indicates that the user input cannot be processed as is. This exception typically indicates that the user has not provided enough information in the input string to have it processed automatically. In most cases this means that the user's input is either too short or too simple, too long or too complex, missing required context, or unrelated to the requested data model.
NCIntentSkip is a control flow exception used to skip the current intent. This exception can be thrown by the intent callback to indicate that the current intent should be skipped (even though it was matched and its callback was called). If more than one intent was matched, the next best matching intent will be selected and its callback called.
This exception becomes useful when it is hard or impossible to encode the entire matching logic using declarative IDL alone. In these cases the intent definition can be relaxed and the "last mile" of intent matching can happen inside the intent callback's user logic. If it is determined that the intent does not in fact match, throwing this exception allows trying the next best matching intent, if any.
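This "skip to the next best intent" control flow can be sketched with plain Java. In the illustrative code below, `SkipException` stands in for NCIntentSkip, and the weight-ordered callback list is a deliberate simplification of the real matching machinery:

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class SkipDemo {
    // Stand-in for NCIntentSkip: signals "try the next best matched intent".
    static class SkipException extends RuntimeException {}

    // Try matched callbacks in descending weight order; a skip from one
    // callback passes control to the next best match.
    static Optional<String> process(List<Function<String, String>> callbacksByWeight, String input) {
        for (Function<String, String> cb : callbacksByWeight) {
            try {
                return Optional.of(cb.apply(input));
            } catch (SkipException e) {
                // Fall through to the next best matched intent.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Function<String, String>> byWeight = List.of(
            in -> { throw new SkipException(); },       // best match decides to skip
            in -> "handled by the second-best intent"); // next best handles it
        System.out.println(process(byWeight, "some input").get());
        // Prints: handled by the second-best intent
    }
}
```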
Note that there's a significant difference between NCIntentSkip exception and model's onMatchedIntent(...) callback. Unlike this callback, the exception does not force re-matching of all intents, it simply picks the next best intent from the list of already matched ones. The model's callback can force a full reevaluation of all intents against the user input.
IDL Expressiveness
Note that the use of the NCIntentSkip exception (as well as the model's life-cycle callbacks) is a required technique when you cannot express the desired matching logic with IDL alone. IDL is a high-level declarative language and does not support complex programmable logic or other sophisticated matching algorithms. In such cases, you can define a broad intent that matches loosely and then implement the rest of the more complex matching logic in the callback, using the NCIntentSkip exception to indicate when the intent does not in fact match and other intents, if any, have to be tried.
There are many use cases where IDL is not expressive enough. For example, if your intent matching depends on financial market conditions, weather, the state of external systems, or details of the current user's geographical location or social network status, you will need to use NCIntentSkip-based logic or the model's callbacks to support that type of matching.
NCIntentMatch Interface
The NCIntentMatch interface can be passed into an intent callback as its first parameter. This interface provides runtime information about the intent that was matched (i.e. the intent with which this callback was annotated). Note also that the intent match context can only be the 1st parameter in the callback; if not declared as such, it won't be passed in.
The NCModel interface provides several callbacks that are invoked before, during and after intent matching. They provide an opportunity to inject user-defined cross-cutting concerns into the standard intent matching workflow of NLPCraft. Usage of these callbacks is completely optional, yet they provide convenient join points for logging, statistics collection, security audit and validation, explicit conversation context management, model metadata updates, and many other aspects that may depend on the standard intent matching workflow:
Callback | Description |
---|---|
NCModel#onParsedVariant(...) | A callback to accept or reject a parsed variant. This callback is called before any other callbacks at the beginning of the processing pipeline, and it is called for each parsed variant. Note that a given user input can have one or more possible parsing variants. Depending on the model configuration, a user input can produce hundreds or even thousands of parsing variants that can significantly slow down the overall processing. This method allows filtering out unnecessary parsing variants based on a variety of user-defined factors like the number of tokens, the presence of a particular token in the variant, etc. |
NCModel#onContext(...) | A callback that is called when a fully assembled query context is ready. This callback is called after all |
NCModel#onMatchedIntent(...) | A callback that is called when intent was successfully matched but right before its callback is called. This callback is called after Note that this callback may not be called at all based on the return value of |
NCModel#onResult(...) | A callback that is called when successful result is obtained from the intent callback and right before sending it back to the caller. This callback is called after |
NCModel#onRejection(...) | A callback that is called when intent callback threw |
NCModel#onError(...) | A callback that is called when the intent callback failed with an unexpected exception. Note that this callback may not be called at all, and if called, it is called only once. Typical use cases for this callback are logging, debugging, statistics or usage collection, explicit update or initialization of the conversation context, security audit or validation, etc. |