| Author | Boris Capitanu |
| Creation date | 06/06/2011 |
| Firing policy | any |
| Package | org.seasr.meandre.components.transform.text |
DESCRIPTION
Performs spell checking on the input and optionally replaces each misspelled word with the top-ranked suggestion, where the ranking is based on the supplied token counts. The component also produces a list of the misspellings found in the document.
INPUTS
| Name | Description | Example |
|---|---|---|
| dictionary | The word list to be used as the dictionary, or the location of the word-list file. TYPE: java.net.URI, java.net.URL, java.lang.String, org.seasr.datatypes.BasicDataTypes.Strings, byte[], org.seasr.datatypes.BasicDataTypes.Bytes, java.lang.Object | |
| text | The text, tokens, or token counts that need to be spell checked. TYPE: java.lang.String, org.seasr.datatypes.BasicDataTypes.Strings, byte[], org.seasr.datatypes.BasicDataTypes.Bytes, java.lang.Object | |
| token_counts | The token counts used to determine the most probable replacement for a misspelled word | |
| transformations | The transformations that should be tried on misspelled words before taking the spell checker's suggestions | |
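
The role of `token_counts` in choosing a replacement can be illustrated with a minimal Python sketch. This is not the component's implementation: the function name `pick_replacement`, the sample counts, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def pick_replacement(suggestions, token_counts):
    """Return the suggestion with the highest token count, or None if the list is empty.

    Assumption: a suggestion missing from token_counts counts as 0.
    Ties keep the spell checker's original order, because max() returns
    the first maximal item it encounters.
    """
    if not suggestions:
        return None
    return max(suggestions, key=lambda word: token_counts.get(word, 0))


# Hypothetical data, for illustration only.
token_counts = Counter({"their": 12, "there": 30, "three": 4})
suggestions = ["their", "there", "three"]  # candidates for the misspelling "thier"

print(pick_replacement(suggestions, token_counts))
```

Under these assumptions, the most frequent candidate in the supplied counts wins, so a corpus-specific count table steers corrections toward the vocabulary the documents actually use.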
OUTPUTS
| Name | Description | Example |
|---|---|---|
| replacement_rules | The replacement rules for misspelled words, in the following format: correctedWord = {badWord1, badWord2, ... }; ... TYPE: org.seasr.datatypes.BasicDataTypes.Strings | |
| error | This port is used to output any unhandled errors encountered during the execution of this component | |
| replacements | The replacements suggested for misspelled words. TYPE: org.seasr.datatypes.BasicDataTypes.StringsMap | |
| text | The original text, with corrections applied if the 'do_correction' property was set to 'true'. TYPE: org.seasr.datatypes.BasicDataTypes.Strings | |
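
As an illustration of the `replacement_rules` format described above, the following Python sketch groups misspellings under their correction and renders them as `correctedWord = {badWord1, badWord2, ...};`. The function name, the shape of the input mapping (misspelled word to chosen correction), the alphabetical ordering, and the exact spacing are assumptions; the component's actual output may differ in those details.

```python
def format_replacement_rules(replacements):
    """Render a {misspelled: corrected} mapping as replacement rules.

    Assumed output shape: 'corrected = {bad1, bad2};' with rules separated
    by a single space and sorted alphabetically for reproducibility.
    """
    grouped = {}
    for bad_word, corrected in replacements.items():
        grouped.setdefault(corrected, []).append(bad_word)

    parts = []
    for corrected in sorted(grouped):
        bad_words = ", ".join(sorted(grouped[corrected]))
        parts.append(corrected + " = {" + bad_words + "};")
    return " ".join(parts)


# Hypothetical misspellings, for illustration only.
rules = format_replacement_rules({"teh": "the", "hte": "the", "recieve": "receive"})
print(rules)  # receive = {recieve}; the = {hte, teh};
```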
PROPERTIES
| Name | Description | Default value |
|---|---|---|
| _debug_level | Controls the verbosity of debug messages printed by the component during execution. Possible values are: off, severe, warning, info, config, fine, finer, finest, all. Append ',mirror' to any of these values to mirror that output to the server logs. | info |
| levenshtein_distance | The Levenshtein distance is a metric for measuring the amount of difference between two sequences. The value of this property should be expressed as a percentage that will depend on the length of the misspelled word. | 0.33 |
| enable_transforms_only | True to use only the transformations to suggest correctly spelled words; false to allow the spell checker to also make suggestions. | false |
| do_correction | True to correct misspelled words, false otherwise | true |
| enable_levenshtein | Use the Levenshtein algorithm to filter the list of suggestions considered | true |
| _ignore_errors | Set to 'true' to ignore all unhandled exceptions and prevent the flow from being terminated. Setting this property to 'false' will result in the flow being terminated if an unhandled exception is thrown during the execution of this component. | false |
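
To make the interaction between `enable_levenshtein` and `levenshtein_distance` more concrete, here is a minimal Python sketch of length-relative filtering. It assumes the threshold is computed as `ratio * len(misspelled_word)` with no rounding; that formula, and the function names, are illustrative guesses rather than the component's documented behavior.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def filter_suggestions(misspelled, suggestions, ratio=0.33):
    """Keep suggestions within ratio * len(misspelled) edits (assumed formula)."""
    max_distance = ratio * len(misspelled)
    return [s for s in suggestions if levenshtein(misspelled, s) <= max_distance]


# Hypothetical suggestions, for illustration only.
print(filter_suggestions("recieve", ["receive", "orchestra"]))
```

With the default of 0.33, a seven-letter word tolerates up to two edits under this assumed formula, while a three-letter word tolerates none, so a different rounding rule in the real component would change which suggestions survive for short words.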