| Author | Lily Dong |
| Creation date | 06/01/2011 |
| Firing policy | all |
| Package | org.seasr.meandre.components.transform.table |
DESCRIPTION
This module generates a training table and a testing table from the original table.

Detailed Description: This module provides property settings that let the user specify the percentages of the original table's examples to use for building the train and test tables. The user can specify whether the train and test examples are selected at random or sequentially, with train data taken from the beginning of the original examples and test data taken from the end. If the examples are selected randomly, the user can specify the seed used by the random number generator. If the train and test percentages sum to more than 100 percent, some examples will appear in both the train and test tables.

Data Type Restrictions: Although this module works with tables containing any type of data, many supervised learning algorithms work only on doubles. If one of these algorithms is to be used, the conversion to floating point data should take place prior to this module.

Data Handling: This module does not change the original data. It creates an instance of an example table that manages the data differently.

Scalability: This module should scale linearly with the number of rows in the table. The module needs to be able to allocate arrays of integers to hold the indices of the test and train examples.
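The index selection described above can be sketched in a few lines of Python. This is only an illustration, not the component's actual code: the function name `split_indices` and its parameters are hypothetical, the row counts are assumed to be truncated toward zero (the component may round differently), and Python's random number generator is not the one the component uses, so a given seed will not reproduce the component's selection.

```python
import random


def split_indices(num_rows, train_percent=50, test_percent=50,
                  random_sampling=True, seed=123):
    """Return (train_indices, test_indices) for a table with num_rows rows.

    Hypothetical helper: percentages are whole numbers from 0 to 100.
    """
    if not (0 <= train_percent <= 100 and 0 <= test_percent <= 100):
        raise ValueError("percentages must be between 0 and 100")

    # Assumption: counts are truncated toward zero.
    n_train = num_rows * train_percent // 100
    n_test = num_rows * test_percent // 100

    # One integer index per row, so memory grows linearly with the table.
    order = list(range(num_rows))
    if random_sampling:
        # A seeded generator makes the shuffle repeatable across runs.
        random.Random(seed).shuffle(order)

    # Train examples come from the front of the ordering, test examples
    # from the back. If the percentages sum to more than 100, the two
    # slices overlap and some rows land in both tables.
    train = order[:n_train]
    test = order[num_rows - n_test:]
    return train, test
```

For example, with 10 rows, sequential sampling, and percentages of 70 and 50, `split_indices(10, 70, 50, random_sampling=False)` puts rows 0 to 6 in the train set and rows 5 to 9 in the test set, so rows 5 and 6 appear in both.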
INPUTS
| Name | Description | Example |
|---|---|---|
| originalTable | The org.seasr.datatypes.datamining.table.Table containing the data that will be split into training and testing examples. | |
OUTPUTS
| Name | Description | Example |
|---|---|---|
| trainTable | Output org.seasr.datatypes.datamining.table.Table containing the training data. | |
| testTable | Output org.seasr.datatypes.datamining.table.Table containing the test data. | |
| error | This port is used to output any unhandled errors encountered during the execution of this component. | |
PROPERTIES
| Name | Description | Default value |
|---|---|---|
| samplingMethod | The method to use when sampling the original examples. The choices are: Random: Train and test examples are drawn randomly from the original table. Sequential: Training examples are taken sequentially from the beginning of the original table and testing examples are taken sequentially from the end of the original table. | 1 |
| verbose | Controls whether debugging information is output to the console. | true |
| seed | Seed for random sampling. Ignored if random sampling is not used. | 123 |
| _debug_level | Controls the verbosity of debug messages printed by the component during execution. Possible values are: off, severe, warning, info, config, fine, finer, finest, all. Append ',mirror' to any of the values above to mirror that output to the server logs. | info |
| trainPercent | The percentage of the data to be used for training the model. | 50 |
| testPercent | The percentage of the data to be used for testing the model. | 50 |
| _ignore_errors | Set to 'true' to ignore all unhandled exceptions and prevent the flow from being terminated. Setting this property to 'false' will result in the flow being terminated if an unhandled exception is thrown during the execution of this component. | false |