
Author

Boris Capitanu

Creation date

06/06/2011

Firing policy

any

Package

org.seasr.meandre.components.transform.text

DESCRIPTION

Performs spell checking on the input and optionally replaces misspelled words with the top-ranked suggestion, based on the supplied token counts. The component also produces a list of the misspellings found in the document.

INPUTS

Name

Description

Example

dictionary
The wordlist to be used as the dictionary, or the location of the wordlist file
TYPE: java.net.URI
TYPE: java.net.URL
TYPE: java.lang.String
TYPE: org.seasr.datatypes.BasicDataTypes.Strings
TYPE: byte[]
TYPE: org.seasr.datatypes.BasicDataTypes.Bytes
TYPE: java.lang.Object

 

text
The text, tokens, or token counts that need to be spell checked
TYPE: java.lang.String
TYPE: org.seasr.datatypes.BasicDataTypes.Strings
TYPE: byte[]
TYPE: org.seasr.datatypes.BasicDataTypes.Bytes
TYPE: java.lang.Object

 

token_counts
The token counts used for figuring out the most probable replacement for a misspelled word

 

transformations
The transformations that should be tried on misspelled words before taking the spell checker's suggestions
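
How the token_counts input could drive the choice of replacement is sketched below. This is only an illustration, not the component's actual code: the class `SuggestionRanker` and the method `pickMostFrequent` are hypothetical names, the token counts are assumed to be available as a plain `Map<String, Integer>`, and the tie-breaking rule (the earlier suggestion wins) is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of the SEASR/Meandre API.
class SuggestionRanker {

    // Returns the suggestion with the highest token count, or null if the list is empty.
    // Suggestions missing from the counts are treated as having a count of zero.
    // On a tie the earlier suggestion is kept (an assumption of this sketch).
    static String pickMostFrequent(List<String> suggestions, Map<String, Integer> tokenCounts) {
        String best = null;
        int bestCount = -1;
        for (String suggestion : suggestions) {
            int count = tokenCounts.getOrDefault(suggestion, 0);
            if (count > bestCount) {
                best = suggestion;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("their", 42);
        counts.put("there", 17);

        List<String> suggestions = Arrays.asList("there", "their", "tier");
        System.out.println(pickMostFrequent(suggestions, counts)); // prints "their"
    }
}
```

With counts taken from the same corpus as the text being checked, a suggestion that already occurs often in that corpus is preferred over one that is merely close in spelling.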

 

OUTPUTS

Name

Description

Example

replacement_rules
The replacement rules for misspelled words in the following format: correctedWord = {badWord1, badWord2, ... }; ...
TYPE: org.seasr.datatypes.BasicDataTypes.Strings

 

error
This port is used to output any unhandled errors encountered during the execution of this component

 

replacements
The replacements suggested for misspelled words
TYPE: org.seasr.datatypes.BasicDataTypes.StringsMap

 

text
The original text with corrections applied if the 'do_correction' property was set to 'true'
TYPE: org.seasr.datatypes.BasicDataTypes.Strings
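
The replacement_rules format described above (correctedWord = {badWord1, badWord2, ... }; ...) can be illustrated with a small formatter. This is a sketch under assumptions: the class `ReplacementRules` and its methods are hypothetical, and the exact whitespace and separators the component emits are not specified on this page, so the output below only approximates the documented shape.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; not part of the SEASR/Meandre API.
class ReplacementRules {

    // Inverts a badWord -> correctedWord map into correctedWord -> [badWords],
    // preserving the order in which corrections were first seen.
    static Map<String, List<String>> groupByCorrection(Map<String, String> corrections) {
        Map<String, List<String>> rules = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : corrections.entrySet()) {
            rules.computeIfAbsent(entry.getValue(), k -> new ArrayList<>()).add(entry.getKey());
        }
        return rules;
    }

    // Renders the rules as "corrected = {bad1, bad2}; corrected2 = {bad3}".
    // The separators used here are an assumption of this sketch.
    static String format(Map<String, List<String>> rules) {
        StringJoiner out = new StringJoiner("; ");
        for (Map.Entry<String, List<String>> entry : rules.entrySet()) {
            out.add(entry.getKey() + " = {" + String.join(", ", entry.getValue()) + "}");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> corrections = new LinkedHashMap<>();
        corrections.put("teh", "the");
        corrections.put("hte", "the");
        corrections.put("recieve", "receive");

        // prints: the = {teh, hte}; receive = {recieve}
        System.out.println(format(groupByCorrection(corrections)));
    }
}
```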

 

PROPERTIES

Name

Description

Default value

_debug_level
Controls the verbosity of debug messages printed by the component during execution.
Possible values are: off, severe, warning, info, config, fine, finer, finest, all
Append ',mirror' to any of the values above to mirror that output to the server logs.
info
levenshtein_distance
The Levenshtein distance is a metric for measuring the amount of difference between two sequences. The value of this property should be expressed as a percentage, so the allowed distance depends on the length of the misspelled word
0.33
enable_transforms_only
True to use only the transformations to make suggestions for correctly spelled words; False to allow the spell checker to make suggestions as well.
false
do_correction
True to correct misspelled words, False otherwise
true
enable_levenshtein
Use the Levenshtein algorithm to filter the list of suggestions considered
true
_ignore_errors
Set to 'true' to ignore all unhandled exceptions and prevent the flow from being terminated. Setting this property to 'false' will result in the flow being terminated if an unhandled exception is thrown during the execution of this component.
false
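
The interaction between enable_levenshtein and levenshtein_distance can be sketched as follows. This is a minimal illustration, not the component's implementation: the class `LevenshteinFilter` and its methods are hypothetical, and it assumes the maximum allowed distance is the property value multiplied by the length of the misspelled word, with no rounding. The component's real rounding and comparison rules are not documented on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of the SEASR/Meandre API.
class LevenshteinFilter {

    // Classic dynamic-programming Levenshtein distance using two rolling rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Keeps suggestions whose distance to the misspelled word is at most
    // ratio * length of the misspelled word (assumed interpretation of the property).
    static List<String> filter(String misspelled, List<String> suggestions, double ratio) {
        double maxDistance = ratio * misspelled.length();
        List<String> kept = new ArrayList<>();
        for (String suggestion : suggestions) {
            if (distance(misspelled, suggestion) <= maxDistance) {
                kept.add(suggestion);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> suggestions = Arrays.asList("receive", "relieve", "river");
        // "recieve" has 7 letters, so with 0.33 the cutoff is 2.31 edits.
        System.out.println(filter("recieve", suggestions, 0.33)); // prints [receive, relieve]
    }
}
```

Under this interpretation, longer misspelled words tolerate more edits than shorter ones, which is the point of expressing the property relative to word length rather than as a fixed number of edits.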