NCEnStopWordsTokenEnricher

class NCEnStopWordsTokenEnricher(addSet: Set[String], exclSet: Set[String], stemmer: NCStemmer) extends NCTokenEnricher with LazyLogging

Stopword token enricher for the English (EN) language. Stopwords are words that are filtered out (i.e. stopped) before natural language text is processed because they carry little meaning.

This enricher adds the stopword boolean metadata property to the token instance, recording whether the word it represents is an English stopword. A value of true indicates that the word was detected as a stopword; false indicates otherwise. The implementation combines an internal list of English stopwords with procedural logic to determine the stopword status of a token, which should work for most general use cases. You can also add stopwords, or define exceptions to the existing ones, using the corresponding parameters of the NCEnStopWordsTokenEnricher constructor.

More information about stopwords can be found at https://en.wikipedia.org/wiki/Stop_word.

NOTE: this implementation requires the lemma and pos string metadata properties, which contain the token's lemma and part of speech respectively. To provide them, you can place NCOpenNLPTokenEnricher, configured with a model for the English language, before this enricher in your pipeline.
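
The ordering requirement in the note above can be illustrated with a minimal, self-contained sketch. It does not use the NLPCraft API: the Tok class, both functions, and the tiny word lists are hypothetical stand-ins, and only the property names lemma, pos and stopword come from this page.

```scala
import scala.collection.mutable

object PipelineOrderSketch {
  // Hypothetical stand-in for a token: its text plus a mutable metadata map.
  final case class Tok(text: String, meta: mutable.Map[String, Any] = mutable.Map.empty)

  // Stand-in for an upstream enricher that supplies `lemma` and `pos`.
  // The tagging rule is a toy, not a real part-of-speech model.
  def addLemmaAndPos(toks: List[Tok]): Unit =
    toks.foreach { t =>
      val lower = t.text.toLowerCase
      t.meta("lemma") = lower
      t.meta("pos") = if (Set("the", "a", "an").contains(lower)) "DT" else "NN"
    }

  // Stand-in for the stopword step: it refuses to run without `lemma` and `pos`,
  // then writes a boolean `stopword` property on every token.
  def addStopword(toks: List[Tok]): Unit =
    toks.foreach { t =>
      require(
        t.meta.contains("lemma") && t.meta.contains("pos"),
        s"Token '${t.text}' is missing the 'lemma' or 'pos' property."
      )
      val lemma = t.meta("lemma").asInstanceOf[String]
      val pos = t.meta("pos").asInstanceOf[String]
      // Assumed toy rule: determiners and a few listed lemmas count as stopwords.
      t.meta("stopword") = (pos == "DT") || Set("of", "is").contains(lemma)
    }
}
```

Calling addStopword before addLemmaAndPos fails the require check in this sketch, which mirrors why the enricher that provides lemma and pos has to come first in the pipeline.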

Value parameters:
addSet

User-defined collection of additional stopwords. These words will be stemmed by the given stemmer before attempting to find a match. Default value is an empty set.

exclSet

User-defined collection of exceptions, i.e. words that should not be marked as stopwords during processing. These words will be stemmed by the given stemmer before attempting to find a match. Default value is an empty set.

stemmer

English stemmer implementation. Default value is an instance of NCEnStemmer.
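
The effect of stemming addSet and exclSet before matching can be sketched as follows. This is a self-contained illustration under stated assumptions, not the enricher's actual algorithm: the stem function is a toy suffix stripper rather than NCEnStemmer, the built-in list is a four-word placeholder, and giving exceptions priority over additions is an assumption made for the example.

```scala
object StopwordSetsSketch {
  // Toy stemmer used only for illustration; it is NOT NCEnStemmer.
  def stem(word: String): String = {
    val w = word.toLowerCase
    if (w.length > 4 && w.endsWith("ing")) w.dropRight(3)
    else if (w.length > 3 && w.endsWith("s")) w.dropRight(1)
    else w
  }

  // Placeholder for the internal English stopword list, stored as stems.
  private val builtIn: Set[String] = Set("the", "a", "of", "is").map(stem)

  // Hypothetical checker mirroring the addSet / exclSet constructor parameters.
  final class Checker(addSet: Set[String] = Set.empty, exclSet: Set[String] = Set.empty) {
    // Both user sets are stemmed once, up front, so matching is stem against stem.
    private val addStems: Set[String] = addSet.map(stem)
    private val exclStems: Set[String] = exclSet.map(stem)

    // Assumption: an exception always wins over the built-in and added words.
    def isStopword(word: String): Boolean = {
      val s = stem(word)
      !exclStems.contains(s) && (builtIn.contains(s) || addStems.contains(s))
    }
  }
}
```

Because both sets are stemmed, a Checker built with addSet = Set("Thanks") treats "thank" and "thanks" alike in this sketch, and one built with exclSet = Set("the") no longer flags "The".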

Source:
NCEnStopWordsTokenEnricher.scala
trait LazyLogging
class Object
trait Matchable
class Any

Value members

Concrete methods

override def enrich(req: NCRequest, cfg: NCModelConfig, toks: List[NCToken]): Unit

Enriches, or otherwise modifies, previously parsed tokens.

Definition Classes
Source:
NCEnStopWordsTokenEnricher.scala

Inherited methods

Called when the component starts. Default implementation is no-op.

Value parameters:
cfg

Configuration of the model this component is associated with.

Inherited from:
NCLifecycle
Source:
NCLifecycle.scala

Called when the component stops. Default implementation is no-op.

Value parameters:
cfg

Configuration of the model this component is associated with.

Inherited from:
NCLifecycle
Source:
NCLifecycle.scala

Inherited fields

lazy protected val logger: Logger
Inherited from:
LazyLogging
Source:
Logging.scala