NCDictionaryTokenEnricher
Dictionary-based "known-word" token enricher.
This enricher adds the dict boolean metadata property to each token instance, indicating whether the word it represents is a known dictionary word, i.e. whether the configured dictionary contains this word's lemma. A value of true means that the word's lemma is found in the dictionary; a value of false means it is not.
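Put differently, the value of dict is a case-insensitive set-membership test on the token's lemma. The following is a minimal sketch of that rule, assuming the dictionary has already been loaded into a Set[String] of lowercase lemmas. DictFlag and isKnown are hypothetical names, not part of the NLPCraft API.

```scala
// Hypothetical helper illustrating the "known word" rule; not NLPCraft code.
object DictFlag {
  // Returns the value the dict property would take for a token with this lemma.
  // Assumes `dictionary` holds lowercase lemmas, so the lemma is lowercased
  // before the membership test to make the lookup case-insensitive.
  def isKnown(dictionary: Set[String], lemma: String): Boolean =
    dictionary.contains(lemma.toLowerCase)
}
```

With a dictionary containing run, a token whose lemma is Run would get dict = true, while a lemma absent from the set would get false.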
NOTE: this implementation requires the lemma string metadata property, which contains the token's lemma. To provide it, configure NCOpenNLPTokenEnricher for the required language and place it before this enricher in your pipeline.
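The ordering requirement can be illustrated with a simplified two-stage pipeline. This is a hypothetical sketch, not NLPCraft code: Tok, PipelineSketch, lemmaStage and dictStage are invented names, the lemma stage merely lowercases the text instead of performing real lemmatization, and failing fast on a missing lemma is an assumption about how such a stage might react, not documented behavior of this enricher.

```scala
import scala.collection.mutable

// Hypothetical simplified token: the original text plus a mutable metadata map.
final case class Tok(text: String, meta: mutable.Map[String, Any] = mutable.Map.empty)

object PipelineSketch {
  // Stand-in for a lemma-producing enricher (the role NCOpenNLPTokenEnricher plays).
  // Lowercasing is NOT real lemmatization; it only fills the "lemma" property.
  def lemmaStage(toks: List[Tok]): Unit =
    toks.foreach(t => t.meta("lemma") = t.text.toLowerCase)

  // Stand-in for the dictionary enricher: reads "lemma" and writes "dict".
  // Assumption: it throws if a token has no "lemma" metadata.
  def dictStage(dictionary: Set[String])(toks: List[Tok]): Unit =
    toks.foreach { t =>
      val lemma = t.meta.get("lemma") match {
        case Some(s: String) => s
        case _ => throw new IllegalStateException(s"No 'lemma' metadata for token '${t.text}'")
      }
      t.meta("dict") = dictionary.contains(lemma.toLowerCase)
    }
}
```

Running dictStage on tokens that have not passed through lemmaStage throws, which is why the lemma-producing stage has to come first.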
- Value parameters:
- dictRes
Relative path, absolute path, classpath resource or URL to the dictionary. The dictionary should be in a simple plain-text format with one lemma per line: empty lines are skipped, duplicates are ignored, and lines starting with the # symbol are treated as comments and ignored. Note that dictionary lookup uses the word's lemma and is case-insensitive.
- Source:
- NCDictionaryTokenEnricher.scala
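The dictionary format described for dictRes is simple enough to parse in a few lines. A sketch under stated assumptions: DictLoader is a hypothetical helper, lines are trimmed before the empty and comment checks (the source does not say whether surrounding whitespace matters), entries are lowercased to support case-insensitive lookup, and only local files are handled, not classpath resources or URLs.

```scala
// Hypothetical loader for the plain-text dictionary format; not NLPCraft code.
object DictLoader {
  // One lemma per line; blank lines skipped; lines starting with '#' are
  // comments; duplicates collapse in the Set; entries are lowercased
  // so that lookups can be case-insensitive.
  def parse(lines: Iterator[String]): Set[String] =
    lines
      .map(_.trim) // assumption: surrounding whitespace is not significant
      .filter(l => l.nonEmpty && !l.startsWith("#"))
      .map(_.toLowerCase)
      .toSet

  // Reads a local UTF-8 file; the iterator is fully consumed by parse
  // before the source is closed.
  def fromFile(path: String): Set[String] = {
    val src = scala.io.Source.fromFile(path, "UTF-8")
    try parse(src.getLines()) finally src.close()
  }
}
```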
Value members
Concrete methods
Enriches, or otherwise modifies, previously parsed tokens.
- Source:
- NCDictionaryTokenEnricher.scala
Inherited methods
Called when the component starts. Default implementation is no-op.
- Value parameters:
- cfg
Configuration of the model this component is associated with.
- Inherited from:
- NCLifecycle
- Source:
- NCLifecycle.scala
Called when the component stops. Default implementation is no-op.
- Value parameters:
- cfg
Configuration of the model this component is associated with.
- Inherited from:
- NCLifecycle
- Source:
- NCLifecycle.scala
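Because both lifecycle hooks default to no-ops, a component only overrides the ones it needs. The sketch below mirrors that pattern with hypothetical stand-ins: Lifecycle, ModelCfg, TracingComponent and the hook names start and stop are illustrative assumptions, not the real NCLifecycle API or model configuration type.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for the model configuration passed to the hooks.
final case class ModelCfg(id: String)

// Hypothetical stand-in for a lifecycle trait with no-op default hooks.
trait Lifecycle {
  def start(cfg: ModelCfg): Unit = () // default: do nothing
  def stop(cfg: ModelCfg): Unit = ()  // default: do nothing
}

// Overrides both hooks only to record when they are called.
final class TracingComponent extends Lifecycle {
  val events: ListBuffer[String] = ListBuffer.empty[String]
  override def start(cfg: ModelCfg): Unit = { events += s"start:${cfg.id}"; () }
  override def stop(cfg: ModelCfg): Unit = { events += s"stop:${cfg.id}"; () }
}
```

A component that does not override the hooks, such as new Lifecycle {}, simply inherits the no-op behavior.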