Author | Boris Capitanu |
Creation date | 06/06/2011 |
Firing policy | any |
Package | org.seasr.meandre.components.transform.filters |
DESCRIPTION
This component filters the tokens of the input against a provided list of tokens (the blacklist). It has three inputs for the data to be filtered (tokens, token counts, or tokenized sentences) and one input for the list of tokens to filter. The output has the same data type as the input it received. When a new list of tokens to filter is provided, it either replaces the current blacklist or is appended to it, depending on the replace property. The component waits for a blacklist before it begins processing the data it receives. By default, the comparison of blacklisted tokens to the data ignores case; set ignore_case=false to work in case-sensitive mode.
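The filtering behavior described above can be sketched roughly as follows. This is a minimal illustration, not the component's actual source: the class `TokenFilterSketch`, its method names, and the use of plain `List` and `Map` in place of the SEASR `Strings`, `IntegersMap`, and `StringsMap` types are all assumptions made for the example. It also assumes that blacklisted tokens are removed from the data and that a sentence left with no tokens is still kept, which the description does not state.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names are illustrative and not taken from the component.
class TokenFilterSketch {

    // Lower-cases a token when case is ignored, so comparisons are case-insensitive.
    static String normalize(String token, boolean ignoreCase) {
        return ignoreCase ? token.toLowerCase(Locale.ROOT) : token;
    }

    // Builds a lookup set from the blacklist, normalized the same way as the data.
    static Set<String> toLookup(List<String> blacklist, boolean ignoreCase) {
        Set<String> lookup = new HashSet<>();
        for (String token : blacklist) {
            lookup.add(normalize(token, ignoreCase));
        }
        return lookup;
    }

    // tokens in, tokens out: keeps every token that is not blacklisted.
    static List<String> filterTokens(List<String> tokens, List<String> blacklist,
                                     boolean ignoreCase) {
        Set<String> lookup = toLookup(blacklist, ignoreCase);
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!lookup.contains(normalize(token, ignoreCase))) {
                kept.add(token);
            }
        }
        return kept;
    }

    // token counts in, token counts out: drops entries whose key is blacklisted.
    static Map<String, Integer> filterTokenCounts(Map<String, Integer> counts,
                                                  List<String> blacklist,
                                                  boolean ignoreCase) {
        Set<String> lookup = toLookup(blacklist, ignoreCase);
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (!lookup.contains(normalize(entry.getKey(), ignoreCase))) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    // tokenized sentences in, tokenized sentences out: filters each sentence's tokens.
    // Assumption: a sentence whose tokens are all blacklisted is kept with an empty list.
    static Map<String, List<String>> filterTokenizedSentences(
            Map<String, List<String>> sentences, List<String> blacklist,
            boolean ignoreCase) {
        Map<String, List<String>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : sentences.entrySet()) {
            kept.put(entry.getKey(), filterTokens(entry.getValue(), blacklist, ignoreCase));
        }
        return kept;
    }
}
```

With the blacklist `["the", "A"]` and the tokens `["The", "cat", "a", "hat"]`, the sketch keeps `["cat", "hat"]` when case is ignored and keeps all four tokens in case-sensitive mode, since neither `"The"` nor `"a"` matches a blacklist entry exactly.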
INPUTS
| Name | Description | Example |
|---|---|---|
| tokens | The sequence of tokens to filter. TYPE: java.lang.String TYPE: org.seasr.datatypes.BasicDataTypes.Strings TYPE: byte[] TYPE: org.seasr.datatypes.BasicDataTypes.Bytes TYPE: java.lang.Object | |
| tokenized_sentences | The tokenized sentences to filter. TYPE: org.seasr.datatypes.BasicDataTypes.StringsMap | |
| tokens_blacklist | The list of tokens defining the blacklist. TYPE: java.lang.String TYPE: org.seasr.datatypes.BasicDataTypes.Strings TYPE: byte[] TYPE: org.seasr.datatypes.BasicDataTypes.Bytes TYPE: java.lang.Object | |
| token_counts | The token counts to filter. TYPE: org.seasr.datatypes.BasicDataTypes.IntegersMap TYPE: java.util.Map | |
OUTPUTS
| Name | Description | Example |
|---|---|---|
| token_counts | The filtered token counts. TYPE: org.seasr.datatypes.BasicDataTypes.IntegersMap | |
| tokens | The filtered tokens. TYPE: org.seasr.datatypes.BasicDataTypes.Strings | |
| error | This port is used to output any unhandled errors encountered during the execution of this component. | |
| tokenized_sentences | The filtered tokenized sentences. TYPE: org.seasr.datatypes.BasicDataTypes.StringsMap | |
PROPERTIES
| Name | Description | Default value |
|---|---|---|
| _debug_level | Controls the verbosity of debug messages printed by the component during execution. Possible values are: off, severe, warning, info, config, fine, finer, finest, all. Append ',mirror' to any of the values above to mirror that output to the server logs. | info |
| ignore_case | If set to true, the comparison between the blacklisted tokens and the data ignores case; otherwise case sensitivity is respected. | true |
| replace | If set to true, the blacklisted tokens are replaced when a new set is provided. If set to false, new tokens are appended to the blacklist. | true |
| _ignore_errors | Set to 'true' to ignore all unhandled exceptions and prevent the flow from being terminated. Setting this property to 'false' will result in the flow being terminated if an unhandled exception is thrown during the execution of this component. | false |
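The effect of the replace property on incoming blacklists can be illustrated with a small sketch. The class `BlacklistUpdateSketch` and its methods are hypothetical names invented for this example; it only models the replace-or-append behavior described in the table and is not the component's implementation.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of how a new blacklist could be merged with the current one.
class BlacklistUpdateSketch {
    private final boolean replace; // mirrors the replace property
    private final Set<String> blacklist = new LinkedHashSet<>();

    BlacklistUpdateSketch(boolean replace) {
        this.replace = replace;
    }

    // Called each time a new list of tokens arrives on the blacklist input.
    void onNewBlacklist(List<String> tokens) {
        if (replace) {
            blacklist.clear(); // replace=true: discard the previous blacklist
        }
        blacklist.addAll(tokens); // replace=false: tokens accumulate
    }

    Set<String> current() {
        return blacklist;
    }
}
```

Feeding `["a", "b"]` and then `["c"]` into this sketch leaves a blacklist of `{c}` with replace set to true, and `{a, b, c}` with replace set to false.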