
Author

Lily Dong

Creation date

06/06/2011

Firing policy

all

Package

org.seasr.meandre.components.tools.text.normalize.porter

DESCRIPTION

Overview:
This component reduces terms to their word stems, so that different forms of the same word (plurals, etc.) are recognized as the same term. The algorithm used is the Porter stemming method.
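To illustrate how plurals collapse onto one stem, the sketch below implements only step 1a of the Porter algorithm (the plural-suffix rules SSES→SS, IES→I, SS→SS, S→""). It is a minimal illustration, not this component's code. The full Porter method applies further steps (1b, 1c, 2 through 5), and the class and method names here are hypothetical.

```java
import java.util.List;

/**
 * Hypothetical sketch of Porter step 1a only (plural suffixes).
 * The real Porter stemmer continues with steps 1b, 1c and 2-5.
 */
public class PorterStep1a {

    /** Applies the step 1a suffix rules to a single lowercase word. */
    static String step1a(String word) {
        if (word.endsWith("sses")) {
            return word.substring(0, word.length() - 2); // SSES -> SS: caresses -> caress
        }
        if (word.endsWith("ies")) {
            return word.substring(0, word.length() - 2); // IES -> I: ponies -> poni
        }
        if (word.endsWith("ss")) {
            return word;                                 // SS -> SS: caress -> caress
        }
        if (word.endsWith("s")) {
            return word.substring(0, word.length() - 1); // S -> "": cats -> cat
        }
        return word;                                     // no plural suffix: unchanged
    }

    public static void main(String[] args) {
        for (String w : List.of("caresses", "ponies", "caress", "cats")) {
            System.out.println(w + " -> " + step1a(w));
        }
    }
}
```

Stems such as "poni" are not dictionary words. Porter stems only need to be consistent across word forms, not readable.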

References:
See: http://www.tartarus.org/~martin/PorterStemmer/

Data Type Restrictions:
The input document must have been tokenized.

Data Handling:
This component modifies the input document object as described above.

Scalability:
This component makes one pass over the token list, so its running time is linear in the number of tokens. Memory usage is proportional to the number of tokens.
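The single-pass behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions. Plain `List<String>` stands in for `Strings`, and `Map<String, List<String>>` (sentence key to its tokens) stands in for `StringsMap`. The real SEASR classes are not reproduced here. The stemmer is passed in as a function, and the one used in `main` is a hypothetical toy placeholder, not the Porter algorithm.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: stems every token exactly once. */
public class StemPass {

    /** One pass over a flat token list (stand-in for the "tokens" port). */
    static List<String> stemTokens(List<String> tokens, UnaryOperator<String> stemmer) {
        List<String> out = new ArrayList<>(tokens.size()); // output size equals input size
        for (String t : tokens) {
            out.add(stemmer.apply(t));
        }
        return out;
    }

    /** One pass over every sentence's tokens (stand-in for "tokenized_sentences"). */
    static Map<String, List<String>> stemSentences(Map<String, List<String>> sentences,
                                                   UnaryOperator<String> stemmer) {
        Map<String, List<String>> out = new LinkedHashMap<>(); // keeps sentence order
        for (Map.Entry<String, List<String>> e : sentences.entrySet()) {
            out.put(e.getKey(), stemTokens(e.getValue(), stemmer));
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy placeholder stemmer: strips a trailing "s" (NOT Porter).
        UnaryOperator<String> toy = w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
        System.out.println(stemTokens(List.of("cats", "dog"), toy));
        Map<String, List<String>> sents = new LinkedHashMap<>();
        sents.put("s1", List.of("dogs", "bark"));
        System.out.println(stemSentences(sents, toy));
    }
}
```

Each token is visited once and produces one output string, which gives the linear time and proportional memory stated above.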

Trigger Criteria:
All.

INPUTS

Name

Description

Example

object
The tokens or tokenized_sentences to be stemmed
TYPE: org.seasr.datatypes.BasicDataTypes.Strings
TYPE: org.seasr.datatypes.BasicDataTypes.StringsMap

 

OUTPUTS

Name

Description

Example

tokenized_sentences
The stemmed tokenized sentences
TYPE: org.seasr.datatypes.BasicDataTypes.StringsMap

 

error
This port is used to output any unhandled errors encountered during the execution of this component.

 

tokens
The stemmed tokens
TYPE: org.seasr.datatypes.BasicDataTypes.Strings

 

PROPERTIES

Name

Description

Default value

_debug_level
Controls the verbosity of debug messages printed by the component during execution.
Possible values are: off, severe, warning, info, config, fine, finer, finest, all
Append ',mirror' to any of the values above to mirror that output to the server logs.
Default value: info

_ignore_errors
Set to 'true' to ignore all unhandled exceptions and prevent the flow from being terminated. Setting this property to 'false' will terminate the flow if an unhandled exception is thrown during the execution of this component.
Default value: false