Dictionary-based "known-word" token enricher.
This enricher adds a dict boolean metadata property to the token instance, indicating whether the word it represents is a known dictionary word, i.e. whether the configured dictionary contains this word's lemma. A value of true means that the word's lemma was found in the dictionary; a value of false means that it was not.
NOTE: this implementation requires the lemma string metadata property, which contains the token's lemma. To provide this metadata property, you can configure NCOpenNLPTokenEnricher for the required language and place it before this enricher in your pipeline.
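The enrichment step described above can be pictured with a minimal sketch. It does not use the library's actual classes: SketchToken and DictEnricherSketch are hypothetical stand-ins, the dictionary is assumed to be an already lower-cased set of lemmas, and throwing an exception when the lemma property is missing is an assumption about how the requirement might be enforced.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a token: its text plus a mutable metadata map.
final case class SketchToken(
  text: String,
  meta: mutable.Map[String, Any] = mutable.Map.empty
)

object DictEnricherSketch {
  // Reads the "lemma" string property and stores the boolean "dict" property.
  // Assumes `dict` holds lower-cased lemmas so that the lookup ignores case.
  def enrich(token: SketchToken, dict: Set[String]): Unit = {
    val lemma = token.meta.get("lemma") match {
      case Some(s: String) => s
      // Assumption: a missing lemma is treated as a pipeline configuration error.
      case _ =>
        throw new IllegalStateException(
          s"'lemma' property not found for token: ${token.text}"
        )
    }
    token.meta.put("dict", dict.contains(lemma.toLowerCase))
  }
}
```

In this sketch a token whose lemma property was never set fails fast, which mirrors the requirement that a lemma-providing enricher must run earlier in the pipeline.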
- Value parameters:
Relative path, absolute path, classpath resource or URL to the dictionary. The dictionary should be a plain text file with one lemma per line. Empty lines are skipped, duplicates are ignored, and lines starting with the # symbol are treated as comments and ignored. Note that the dictionary lookup uses the word's lemma and is case-insensitive.
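As an illustration of the dictionary format, the following sketch parses such a file into a set of lemmas. DictionarySketch and its methods are hypothetical helpers, not part of the library. Trimming surrounding whitespace before checking for # is an assumption, and only local file paths are handled here; classpath and URL resolution are left out.

```scala
import scala.io.Source
import scala.util.Using

// Hypothetical helper illustrating the dictionary format; not a library API.
object DictionarySketch {
  // Skips empty lines and '#' comments, lower-cases entries, drops duplicates.
  // Assumption: surrounding whitespace is trimmed before any checks.
  def parse(lines: Iterator[String]): Set[String] =
    lines
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("#"))
      .map(_.toLowerCase)
      .toSet

  // Loads a dictionary from a local file path only.
  def load(path: String): Set[String] =
    Using.resource(Source.fromFile(path, "UTF-8"))(src => parse(src.getLines()))

  // Case-insensitive lookup by lemma.
  def isKnown(dict: Set[String], lemma: String): Boolean =
    dict.contains(lemma.toLowerCase)
}
```

Storing entries in a lower-cased Set gives duplicate removal for free and keeps each lookup a single hash probe.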