org.apache.nlpcraft.nlp.parsers

Type members

Classlikes

class NCNLPEntityParser(predicate: NCToken => Boolean) extends NCEntityParser

Common NLP data entity parser.

This parser converts the list of input NCToken instances one-to-one into a list of NCEntity instances of type nlp:entity. Every NCEntity instance in the resulting list contains the following metadata properties:

  • nlp:entity:text - token's text.
  • nlp:entity:index - token's index in the input sentence.
  • nlp:entity:startCharIndex - token text's first character index in the input sentence.
  • nlp:entity:endCharIndex - token text's last character index in the input sentence.

Note that each NCEntity instance also inherits all metadata properties of its corresponding NCToken, renamed with the prefix 'nlp:entity:'. For example, for a token property prop, the inherited entity property is named nlp:entity:prop.

Value parameters:
predicate

Predicate that filters the list of NCToken instances to convert. Only tokens that satisfy the predicate are converted to entities by this parser. By default, all NCToken instances are converted.
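The conversion described above can be sketched in plain Scala. This is only an illustration of the naming rules, not the library's implementation: Token and Entity are hypothetical stand-ins for NCToken and NCEntity, whose real APIs are richer.

```scala
object NlpEntitySketch {
  // Hypothetical stand-ins for NCToken and NCEntity (assumptions, not the real traits).
  final case class Token(
    text: String,
    index: Int,
    startCharIndex: Int,
    endCharIndex: Int,
    props: Map[String, Any] = Map.empty
  )
  final case class Entity(entityType: String, props: Map[String, Any])

  private val Prefix = "nlp:entity:"

  // Converts every token accepted by the predicate into exactly one entity.
  // By default the predicate accepts all tokens.
  def parse(tokens: List[Token], predicate: Token => Boolean = _ => true): List[Entity] =
    tokens.filter(predicate).map { t =>
      // The four standard properties listed above.
      val standard = Map[String, Any](
        s"${Prefix}text" -> t.text,
        s"${Prefix}index" -> t.index,
        s"${Prefix}startCharIndex" -> t.startCharIndex,
        s"${Prefix}endCharIndex" -> t.endCharIndex
      )
      // Token properties are inherited under the 'nlp:entity:' prefix.
      val inherited = t.props.map { case (k, v) => s"$Prefix$k" -> v }
      Entity("nlp:entity", standard ++ inherited)
    }
}
```

For instance, passing a predicate such as `_.text.exists(_.isLetter)` would skip punctuation-only tokens, while omitting the predicate converts every token.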

Source:
NCNLPEntityParser.scala
class NCOpenNLPEntityParser(findersMdlsRes: List[String]) extends NCEntityParser with LazyLogging

OpenNLP-based, language-independent entity parser configured with OpenNLP name finder models.

This parser creates an NCEntity instance for each entity detected by the provided models. These entities have the type opennlp:modelName, where modelName is the name of the model. The parser also adds the opennlp:modelName:probability metadata property (a double) to each entity extracted by the corresponding model.

Some free, community-maintained OpenNLP models can be found here.

NOTE: this parser can be configured with multiple models and may therefore produce NCEntity instances of different types, with each input NCToken being "mapped" to zero, one or more entities. As a result, each input token may be included in more than one output NCEntity instance (or in none at all).
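The following sketch illustrates how several models can yield overlapping entities. It does not call OpenNLP: Span, Entity and Finder are hypothetical types, and each Finder stands in for one name finder model that returns token ranges with a probability.

```scala
object OpenNlpMultiModelSketch {
  // Hypothetical span over token indexes [start, end) with a detection probability.
  final case class Span(start: Int, end: Int, probability: Double)
  // Hypothetical stand-in for NCEntity.
  final case class Entity(entityType: String, tokenIndexes: Seq[Int], props: Map[String, Any])
  // Hypothetical stand-in for one name finder model.
  type Finder = Seq[String] => Seq[Span]

  // Runs every model over the same tokens; each detected span becomes an entity
  // of type "opennlp:<modelName>" carrying an "opennlp:<modelName>:probability" property.
  def parse(tokens: Seq[String], finders: Map[String, Finder]): Seq[Entity] =
    for {
      (name, find) <- finders.toSeq
      span <- find(tokens)
    } yield Entity(
      s"opennlp:$name",
      span.start until span.end,
      Map(s"opennlp:$name:probability" -> span.probability)
    )
}
```

With a "person" finder and a "location" finder that both cover the same token, that token ends up in two entities of different types, while a token covered by neither finder appears in none.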

Value parameters:
findersMdlsRes

Relative paths, absolute paths, classpath resources or URLs of the OpenNLP name finder models.

Source:
NCOpenNLPEntityParser.scala
class NCOpenNLPTokenParser(tokMdlRes: String) extends NCTokenParser with LazyLogging

OpenNLP-based, language-independent token parser configured with the path to an OpenNLP tokenizer model.

Some free, community-maintained OpenNLP models can be found here.

Value parameters:
tokMdlRes

Relative path, absolute path, classpath resource or URL to the tokenizer model.

Source:
NCOpenNLPTokenParser.scala

trait NCSemanticElement

This trait defines a named entity that is used by NCSemanticEntityParser.

The main purpose of this trait is to provide a set of synonyms by which this named entity can be matched in the input text. Each synonym consists of one or more individual words. Matching is performed in two phases: first on the normalized and stemmed forms of both the synonym and the user input; if that attempt fails, the stemmed form of the synonym is matched against the lemmatized and then stemmed form of the user input. This approach provides more accurate matching and doesn't force users to list every word form of a synonym.

Note that the element's type is its implicit synonym, so even if no additional synonyms are defined, at least one synonym always exists.

1st Phase: in the first phase, NCSemanticEntityParser uses the stemmed forms of both the synonym and the user input. For example, the single synonym argue matches the words argued, argues and arguing because they all share the stem argu. Note that you can control how aggressive the stemming is by choosing a suitable algorithm; see the article Differences Between Porter and Lancaster Stemming Algorithms. Also note that the effectiveness of stemming varies with the chosen language.

2nd Phase: in the second phase, if the first phase didn't produce a match, NCSemanticEntityParser lemmatizes and then stems the user input and matches it against the stemmed form of the synonym. For example, if an element is defined via the synonym go, all of the following user inputs are matched: go, gone, goes, went. Note that it is enough to define just the base form of the word as the synonym.
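The two phases can be sketched for single-word synonyms as follows. The suffix-stripping stemmer and the lemma table below are toy assumptions made for illustration only; a real setup would plug in an NCStemmer implementation and a proper lemmatizer.

```scala
object TwoPhaseMatchSketch {
  // Toy stemmer (an assumption): strips a few common English suffixes.
  def stem(word: String): String = {
    val w = word.toLowerCase
    List("ing", "ed", "es", "e", "s")
      .find(suffix => w.length > suffix.length + 2 && w.endsWith(suffix))
      .map(suffix => w.dropRight(suffix.length))
      .getOrElse(w)
  }

  // Toy lemma table (an assumption): maps irregular forms to their base form.
  private val Lemmas = Map("goes" -> "go", "gone" -> "go", "went" -> "go")

  def lemma(word: String): String = Lemmas.getOrElse(word.toLowerCase, word.toLowerCase)

  // Returns true if the input word matches the synonym in either phase.
  def matches(synonym: String, input: String): Boolean = {
    val synonymStem = stem(synonym)
    // Phase 1: compare stems of the synonym and the input.
    stem(input) == synonymStem ||
    // Phase 2: lemmatize the input first, then stem it, and compare again.
    stem(lemma(input)) == synonymStem
  }
}
```

With this toy stemmer, argued, argues and arguing match argue in the first phase through the shared stem argu, whereas gone, goes and went only match go in the second phase, after lemmatization.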

Besides the synonyms described above, a semantic element can also have an optional set of special synonyms called values, or "proper nouns", for this element. Unlike basic synonyms, each value is a pair of a name and a set of standard synonyms by which that value, and ultimately its element, can be recognized in the user input. Note that the value name itself acts as an implicit synonym even when no additional synonyms are added for that value.

Example 1.

- id: "ord:menu"
  description: "Order menu."
  synonyms:
    - "{menu|carte|card}"
    - "{products|goods|food|item|_} list"

This YAML definition describes the semantic entity ord:menu, which can be detected via synonyms such as menu, carte, products list, etc.
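To make the synonym notation concrete, here is a minimal sketch that expands option groups into plain synonyms. It assumes, based on the examples in this section, that `{a|b}` lists alternatives and that `_` stands for an empty alternative; it is not the parser's own expansion code.

```scala
object SynonymExpansionSketch {
  // Matches an innermost option group, i.e. one with no braces inside it.
  private val Group = """\{([^{}]*)\}""".r

  // Expands all option groups, innermost first, into the set of plain synonyms.
  def expand(synonym: String): Set[String] =
    Group.findFirstMatchIn(synonym) match {
      case None =>
        // No groups left: normalize whitespace and return the single result.
        Set(synonym.trim.replaceAll("\\s+", " "))
      case Some(m) =>
        val alternatives: Set[String] = m.group(1).split("\\|", -1).toSet
        alternatives.flatMap { alt =>
          // "_" is treated as the empty alternative (assumption from the examples).
          val choice = if (alt.trim == "_") "" else alt
          expand(synonym.substring(0, m.start) + choice + synonym.substring(m.end))
        }
    }
}
```

Under these assumptions, `expand("{products|goods|food|item|_} list")` yields products list, goods list, food list, item list and list. Because the innermost group is resolved first, nested groups are handled by the same recursion.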

Example 2.

- id: "ord:pizza:size"
  description: "Size of pizza."
  values:
    "small": [ "{small|smallest|min|minimal|tiny} {size|piece|_}" ]
    "medium": [ "{medium|intermediate|normal|regular} {size|piece|_}" ]
    "large": [ "{big|biggest|large|max|maximum|huge|enormous} {size|piece|_}" ]

This YAML definition describes the semantic entity ord:pizza:size, which can be detected via its value synonyms: small, medium size, big piece, etc. Note that the matched value name (small, medium or large in this example) is stored in the created NCEntity as a property with the key element-type:value (ord:pizza:size:value in this example).

NOTE: these examples show how semantic elements can be defined in YAML when they are passed to NCSemanticEntityParser via a resource definition. Semantic elements behave the same whether they are defined in JSON/YAML files or prepared programmatically.

See the detailed description on the website: Semantic Parser.

Source:
NCSemanticElement.scala
class NCSemanticEntityParser extends NCEntityParser with LazyLogging

Semantic entity parser implementation.

This synonym-based parser provides a simple yet powerful way to find domain-specific data in the input text. It is configured via a list of NCSemanticElement instances that represents all named entities this parser can detect.

Semantic elements can be configured via YAML or JSON files in a special format or passed to this parser as a programmatically prepared list. Semantic elements contain sets of synonyms, which can use special macros. These macros can also be provided via YAML or JSON files, or passed directly when the NCSemanticElement list is prepared programmatically.

Example of YAML elements definition.

macros:
 "<OF>": "{of|for|per}"
 "<CUR>": "{current|present|now|local}"
 "<TIME>": "{time <OF> day|day time|date|time|moment|datetime|hour|o'clock|clock|date time}"
elements:
 - id: "x:time"
   description: "Date and/or time token indicator."
   synonyms:
     - "{<CUR>|_} <TIME>"
     - "what <TIME> {is it now|now|is it|_}"

Given this simple definition, the x:time element can be detected by a large number of synonyms such as day time, local day time, time of day, local time of day, what hour is it, etc.
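The macro step in this definition can be sketched as plain text substitution that repeats until no macro name is left, so that macros referring to other macros (such as `<TIME>` using `<OF>`) are resolved too. This is an assumption-level illustration, not the parser's implementation, and it would not terminate for macros that refer to themselves.

```scala
object MacroSubstitutionSketch {
  // Replaces every macro name with its body and repeats until the text stops changing.
  // Assumes macro definitions are not cyclic.
  @annotation.tailrec
  def substitute(text: String, macros: Map[String, String]): String = {
    val replaced = macros.foldLeft(text) { case (acc, (name, body)) =>
      acc.replace(name, body)
    }
    if (replaced == text) replaced else substitute(replaced, macros)
  }

  // The macros from the YAML definition above.
  val exampleMacros: Map[String, String] = Map(
    "<OF>" -> "{of|for|per}",
    "<CUR>" -> "{current|present|now|local}",
    "<TIME>" -> "{time <OF> day|day time|date|time|moment|datetime|hour|o'clock|clock|date time}"
  )
}
```

Calling `substitute("{<CUR>|_} <TIME>", exampleMacros)` produces a synonym that contains only option groups, which can then be expanded into concrete phrases such as time of day or local day time.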

Value parameters:
elements

Programmatically prepared NCSemanticElement instances. Note that at least one of the model or the elements must be supplied.

macros

Map of macros used to expand NCSemanticElement synonyms that are defined via macros. More information at https://nlpcraft.apache.org/built-in-entity-parser.html#macros.

mdlResOpt

Optional relative path, absolute path, classpath resource or URL of the YAML or JSON semantic model that contains the NCSemanticElement definitions. Note that at least one of the model or the elements must be supplied.

parser

NCTokenParser implementation used to tokenize NCSemanticElement synonyms. It should be the same implementation as the one used in NCPipeline.getTokenParser.

stemmer

NCStemmer implementation used to match tokens against the given NCSemanticElement synonyms.

Source:
NCSemanticEntityParser.scala