NCEnStopWordsTokenEnricher
Stopword token enricher for the English (EN) language. Stopwords are words that are filtered out (i.e. stopped) before natural language text is processed because they are insignificant.
This enricher adds a stopword boolean metadata property to each token instance: true indicates that the word the token represents was detected as an English stopword, false indicates otherwise.
This implementation uses an algorithm that combines an internal list of English stopwords with procedural logic to determine the stopword status of a token. The algorithm should work well for most general use cases. Users can also add additional stopwords, or exceptions to the existing ones, through the corresponding parameters of the NCEnStopWordsTokenEnricher constructor.
More information about stopwords can be found at https://en.wikipedia.org/wiki/Stop_word.
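The interaction between the internal list, the additional stopwords and the exceptions can be sketched as follows. This is a simplified, hypothetical model and not the NLPCraft implementation: the StopWordSketch object, its toy stem function and the six-word builtIn list are assumptions made for illustration, the part-of-speech based procedural logic is omitted, and letting exceptions take precedence over additions is also an assumption.

```scala
// Hypothetical, simplified model of stopword detection; not the NLPCraft implementation.
object StopWordSketch {
  // Toy stemmer (assumption): lowercases and strips a trailing "s" from longer words.
  def stem(word: String): String = {
    val w = word.toLowerCase
    if (w.length > 3 && w.endsWith("s")) w.dropRight(1) else w
  }

  // Tiny stand-in for the internal English stopword list, stored in stemmed form.
  val builtIn: Set[String] = Set("a", "an", "the", "is", "of", "to").map(stem)

  // Returns the value the stopword property would get in this model.
  // Both user sets are stemmed before matching, as the parameter docs describe.
  def isStopWord(
    word: String,
    addSet: Set[String] = Set.empty,
    exclSet: Set[String] = Set.empty
  ): Boolean = {
    val s = stem(word)
    val added = addSet.map(stem)
    val excluded = exclSet.map(stem)
    // Assumption: an exception wins over both the built-in and the added words.
    !excluded.contains(s) && (builtIn.contains(s) || added.contains(s))
  }
}
```

In this model, StopWordSketch.isStopWord("Lights", addSet = Set("light")) matches because both the token and the user-supplied word are reduced to the same stem before comparison.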
NOTE: this implementation requires the lemma and pos string metadata properties, which contain the token's lemma and part of speech respectively. You can configure NCOpenNLPTokenEnricher with a model for the English language to provide these metadata properties; it must run before this enricher in your pipeline.
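Why the ordering matters can be illustrated with a small self-contained model. Everything in it is hypothetical: the Token case class, the Enricher type alias and the lemmaPos and stopWords values are stand-ins invented for this sketch and do not reflect the NLPCraft API or its actual stopword rules.

```scala
import scala.collection.mutable

// Hypothetical model of enricher ordering; not the NLPCraft API.
object EnricherOrderSketch {
  // Hypothetical token: the word plus a mutable metadata map.
  final case class Token(text: String, meta: mutable.Map[String, Any] = mutable.Map.empty)

  // An enricher mutates the metadata of previously parsed tokens.
  type Enricher = Seq[Token] => Unit

  // Stand-in for a lemma/POS enricher (toy values, assumption).
  val lemmaPos: Enricher = toks => toks.foreach { t =>
    t.meta("lemma") = t.text.toLowerCase
    t.meta("pos") = if (Set("the", "a").contains(t.text.toLowerCase)) "DT" else "NN"
  }

  // Stand-in for a stopword enricher: fails fast when a prerequisite property is missing.
  val stopWords: Enricher = toks => toks.foreach { t =>
    val lemma = t.meta.getOrElse("lemma", sys.error(s"'lemma' missing for '${t.text}'")).toString
    val pos = t.meta.getOrElse("pos", sys.error(s"'pos' missing for '${t.text}'")).toString
    // Toy rule (assumption): determiners and the lemma "of" count as stopwords.
    t.meta("stopword") = pos == "DT" || lemma == "of"
  }

  // Splits on spaces and applies the enrichers in the given order.
  def run(text: String, enrichers: Seq[Enricher]): Seq[Token] = {
    val toks = text.split(" ").toSeq.map(w => Token(w))
    enrichers.foreach(e => e(toks))
    toks
  }
}
```

Running the model with Seq(lemmaPos, stopWords) succeeds, while passing only stopWords throws because the lemma property was never set.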
- Value parameters:
- addSet
User-defined collection of additional stopwords. These words are stemmed by the given stemmer before attempting to find a match. Default value is an empty set.
- exclSet
User-defined collection of exceptions, i.e. words that should not be marked as stopwords during processing. These words are stemmed by the given stemmer before attempting to find a match. Default value is an empty set.
- stemmer
English stemmer implementation. Default value is an instance of NCEnStemmer.
- Source:
- NCEnStopWordsTokenEnricher.scala
Value members
Concrete methods
Enriches, or otherwise modifies, previously parsed tokens.
- Definition Classes
- Source:
- NCEnStopWordsTokenEnricher.scala
Inherited methods
Called when the component starts. Default implementation is no-op.
- Value parameters:
- cfg
Configuration of the model this component is associated with.
- Inherited from:
- NCLifecycle
- Source:
- NCLifecycle.scala
Called when the component stops. Default implementation is no-op.
- Value parameters:
- cfg
Configuration of the model this component is associated with.
- Inherited from:
- NCLifecycle
- Source:
- NCLifecycle.scala