Author | Lily Dong |
Creation date | 06/06/2011 |
Firing policy | all |
Package | org.seasr.meandre.components.tools.text.normalize.porter |
DESCRIPTION
Overview:
This component transforms terms into their word stems, so that different forms of the same word (plurals, for example) are recognized as the same term. The algorithm used is the Porter stemming method.
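
To illustrate the kind of suffix rewriting involved, the sketch below implements only Step 1a of the Porter algorithm (the plural rules `sses -> ss`, `ies -> i`, `ss -> ss`, `s -> ""`) and applies it in one pass over a token list. It is not this component's source code: the class name `PorterStep1aSketch` and the methods `step1a` and `stemAll` are hypothetical, and the full algorithm has several further steps that are omitted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: Step 1a of the Porter algorithm only.
class PorterStep1aSketch {

    // Rewrites plural suffixes; the rules are tried in order, longest suffix first.
    static String step1a(String word) {
        if (word.endsWith("sses")) {
            return word.substring(0, word.length() - 2); // sses -> ss
        }
        if (word.endsWith("ies")) {
            return word.substring(0, word.length() - 2); // ies -> i
        }
        if (word.endsWith("ss")) {
            return word;                                 // ss -> ss (unchanged)
        }
        if (word.endsWith("s")) {
            return word.substring(0, word.length() - 1); // s -> (removed)
        }
        return word;
    }

    // Single pass over the tokens: time and memory grow linearly with the token count.
    static List<String> stemAll(List<String> tokens) {
        List<String> stemmed = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            stemmed.add(step1a(token));
        }
        return stemmed;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("caresses", "ponies", "caress", "cats");
        System.out.println(stemAll(tokens)); // prints [caress, poni, caress, cat]
    }
}
```

Note that a stem such as `poni` need not be a dictionary word; it only has to be the same for all forms of a term.
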
References:
See: http://www.tartarus.org/~martin/PorterStemmer/
Data Type Restrictions:
The input document must have been tokenized.
Data Handling:
This component modifies the input document object as described above.
Scalability:
This component makes one pass over the token list, so its running time is linear in the number of tokens. Memory usage is proportional to the number of tokens.
Trigger Criteria:
All.
INPUTS
| Name | Description | Example |
|---|---|---|
| object | The tokens or tokenized_sentences to be stemmed. TYPE: org.seasr.datatypes.BasicDataTypes.Strings; TYPE: org.seasr.datatypes.BasicDataTypes.StringsMap | |
OUTPUTS
| Name | Description | Example |
|---|---|---|
| tokenized_sentences | The stemmed tokenized sentences. TYPE: org.seasr.datatypes.BasicDataTypes.StringsMap | |
| error | This port is used to output any unhandled errors encountered during the execution of this component. | |
| tokens | The stemmed tokens. TYPE: org.seasr.datatypes.BasicDataTypes.Strings | |
PROPERTIES
| Name | Description | Default value |
|---|---|---|
| _debug_level | Controls the verbosity of debug messages printed by the component during execution. Possible values are: off, severe, warning, info, config, fine, finer, finest, all. Append ',mirror' to any of the values above to mirror that output to the server logs. | info |
| _ignore_errors | Set to 'true' to ignore all unhandled exceptions and prevent the flow from being terminated. Setting this property to 'false' will result in the flow being terminated in the event an unhandled exception is thrown during the execution of this component. | false |
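
The `_debug_level` value format (a level name, optionally followed by `,mirror`) could be read along the following lines. This is a minimal sketch and not the component's actual parsing code: the class `DebugLevelSketch` and its `parse` method are hypothetical, and mapping the listed names onto `java.util.logging.Level` is an assumption based on the names matching that class's standard levels.

```java
import java.util.Locale;
import java.util.logging.Level;

// Hypothetical holder for a parsed _debug_level value.
class DebugLevelSketch {
    final Level level;     // assumed to correspond to a java.util.logging level
    final boolean mirror;  // true when ',mirror' was appended

    DebugLevelSketch(Level level, boolean mirror) {
        this.level = level;
        this.mirror = mirror;
    }

    // Parses values such as "info" or "fine,mirror".
    // Level.parse throws IllegalArgumentException for an unknown level name.
    static DebugLevelSketch parse(String value) {
        String[] parts = value.trim().split("\\s*,\\s*");
        Level level = Level.parse(parts[0].toUpperCase(Locale.ROOT));
        boolean mirror = parts.length > 1 && parts[1].equalsIgnoreCase("mirror");
        return new DebugLevelSketch(level, mirror);
    }

    public static void main(String[] args) {
        DebugLevelSketch parsed = parse("fine,mirror");
        System.out.println(parsed.level + " mirror=" + parsed.mirror);
    }
}
```
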