| Author | Boris Capitanu |
| Creation date | 06/06/2011 |
| Firing policy | all |
| Package | org.seasr.meandre.components.transform.text |
DESCRIPTION
The component breaks a document into chunks (segments) for further processing. It splits a document of tokenized sentences into segments whose size approximates the number of tokens specified by the segment_size property. Segments always end at sentence boundaries.
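The segmentation idea can be sketched as follows. This is a minimal illustration, not the component's actual implementation: it assumes each sentence is a plain `List<String>` of tokens (rather than a `StringsMap`) and that the target is already an absolute token count. The class `SentenceSegmenter` and its `segment` method are hypothetical names. Sentences are added greedily until the running token count reaches the target, so a segment never splits a sentence and may slightly overshoot the target.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating sentence-boundary segmentation.
public class SentenceSegmenter {

    /**
     * Groups tokenized sentences into segments of roughly targetTokens tokens.
     * A segment is closed as soon as its token count reaches the target, so
     * sentences are never split. The final segment may be smaller.
     */
    public static List<List<List<String>>> segment(List<List<String>> sentences, int targetTokens) {
        if (targetTokens <= 0) {
            throw new IllegalArgumentException("targetTokens must be positive");
        }
        List<List<List<String>>> segments = new ArrayList<>();
        List<List<String>> current = new ArrayList<>();
        int currentTokens = 0;

        for (List<String> sentence : sentences) {
            current.add(sentence);
            currentTokens += sentence.size();
            // Close the segment at this sentence boundary once the target is reached.
            if (currentTokens >= targetTokens) {
                segments.add(current);
                current = new ArrayList<>();
                currentTokens = 0;
            }
        }
        // Emit any leftover sentences as a final, shorter segment.
        if (!current.isEmpty()) {
            segments.add(current);
        }
        return segments;
    }

    public static void main(String[] args) {
        List<List<String>> doc = List.of(
            List.of("The", "cat", "sat", "."),
            List.of("It", "purred", "."),
            List.of("Then", "it", "slept", "."),
            List.of("The", "end", "."));
        // With a target of 5 tokens, sentences 1+2 (7 tokens) and 3+4 (7 tokens) form two segments.
        for (List<List<String>> seg : segment(doc, 5)) {
            System.out.println(seg);
        }
    }
}
```

Closing a segment only after the target is reached (rather than before it would be exceeded) is one reasonable reading of "approximates"; a variant that stops just short of the target would also keep segments on sentence boundaries.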
INPUTS
| Name | Description | Example |
|---|---|---|
| tokenized_sentences | The tokenized sentences to be segmented. TYPE: org.seasr.datatypes.BasicDataTypes.StringsMap | |
OUTPUTS
| Name | Description | Example |
|---|---|---|
| tokenized_sentences | The tokenized sentences, grouped into segments. TYPE: org.seasr.datatypes.BasicDataTypes.StringsMap | |
| error | This port is used to output any unhandled errors encountered during the execution of this component. | |
PROPERTIES
| Name | Description | Default value |
|---|---|---|
| _debug_level | Controls the verbosity of debug messages printed by the component during execution. Possible values are: off, severe, warning, info, config, fine, finer, finest, all. Append ',mirror' to any of the values above to mirror that output to the server logs. | info |
| segment_size | The size of the segments to be produced, specified either as a percentage (written as a fraction) or as an integer. Example (percentage): 0.10 indicates that each segment should contain approximately 10% of the document's tokens. Example (integer): 200 indicates that each segment should contain approximately 200 tokens. Segments always end at sentence boundaries. | 200 |
| _stream_id | Defines the stream ID used to identify a particular stream of data. | |
| wrap_stream | Whether the output should be wrapped as a stream. | true |
| _ignore_errors | Set to 'true' to ignore all unhandled exceptions and prevent the flow from being terminated. Setting this property to 'false' will result in the flow being terminated in the event an unhandled exception is thrown during the execution of this component. | false |
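Because segment_size accepts either a percentage or an integer, the value has to be turned into a target token count before segmenting. The sketch below assumes a rule that this page does not state: any value strictly between 0 and 1 is read as a fraction of the document's total tokens, and any value of 1 or more as an absolute token count. The class `SegmentSizeResolver` and its `resolveTargetTokens` method are hypothetical, and the component's real parsing may differ.

```java
// Hypothetical helper: converts a segment_size property value into a token target.
public class SegmentSizeResolver {

    /**
     * Assumption: 0 < value < 1 means a fraction of totalTokens;
     * value >= 1 means an absolute number of tokens per segment.
     */
    public static int resolveTargetTokens(String segmentSize, int totalTokens) {
        double value = Double.parseDouble(segmentSize.trim());
        if (value <= 0) {
            throw new IllegalArgumentException("segment_size must be positive: " + segmentSize);
        }
        if (value < 1) {
            // Percentage form, e.g. 0.10 -> about 10% of all tokens, but at least 1 token.
            return Math.max(1, (int) Math.round(value * totalTokens));
        }
        // Integer form, e.g. 200 -> about 200 tokens per segment.
        return (int) Math.round(value);
    }

    public static void main(String[] args) {
        System.out.println(resolveTargetTokens("0.10", 5000)); // 500
        System.out.println(resolveTargetTokens("200", 5000));  // 200
    }
}
```

Clamping the percentage result to at least one token avoids a zero target for very small documents, which would otherwise make every sentence its own segment or loop indefinitely in a naive segmenter.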