| Author | Boris Capitanu |
| Creation date | 06/06/2011 |
| Firing policy | all |
| Package | org.seasr.meandre.components.analytics.mallet |
DESCRIPTION
This component performs topic analysis in the style of LDA and its variants using Mallet.
INPUTS
| Name | Description | Example |
|---|---|---|
| mallet_instance_list | The list of machine learning instances. TYPE: cc.mallet.types.InstanceList | |
OUTPUTS
| Name | Description | Example |
|---|---|---|
| topic_top_words_xml | An XML document containing the topics and, for each topic, the set of top words (with weights). TYPE: org.w3c.dom.Document | |
| error | This port is used to output any unhandled errors encountered during the execution of this component. | |
| doc_topics_xml | An XML document containing the processed documents and, for each processed document, the set of topics and topic probabilities. TYPE: org.w3c.dom.Document | |
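
Both XML outputs are org.w3c.dom.Document objects, so a downstream consumer will typically want to inspect or serialize them. The sketch below shows one way to turn any DOM document into a string using only the JDK. The class name `DomOutputSketch`, the helper `sampleDocument`, and the element and attribute names it builds (`topic`, `word`, `id`, `weight`) are hypothetical stand-ins; this page does not specify the actual schema of either output.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomOutputSketch {

    /** Serializes any DOM document to a string; does not depend on the component's schema. */
    static String toXmlString(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Builds a stand-in document. Element and attribute names are hypothetical. */
    static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element topic = doc.createElement("topic");
        topic.setAttribute("id", "0");
        Element word = doc.createElement("word");
        word.setAttribute("weight", "0.5");
        word.setTextContent("example");
        topic.appendChild(word);
        doc.appendChild(topic);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // Prints the stand-in document as a single line of XML.
        System.out.println(toXmlString(sampleDocument()));
    }
}
```

Because `toXmlString` accepts any Document, the same helper should work for topic_top_words_xml and doc_topics_xml regardless of their real element names.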
PROPERTIES
| Name | Description | Default value |
|---|---|---|
| num_threads | The number of threads for parallel training | 1 |
| optimize_interval | The number of iterations between re-estimating Dirichlet hyperparameters | 0 |
| use_symmetric_alpha | Only optimize the concentration parameter of the prior over document-topic distributions. This may reduce the number of very small, poorly estimated topics, but may disperse common words over several topics. | false |
| alpha | Alpha parameter: smoothing over topic distribution | 50.0 |
| optimize_burnin | The number of iterations to run before first estimating Dirichlet hyperparameters | 200 |
| _ignore_errors | Set to 'true' to ignore all unhandled exceptions and prevent the flow from being terminated. Setting this property to 'false' will result in the flow being terminated in the event an unhandled exception is thrown during the execution of this component. | false |
| random_seed | The random seed for the Gibbs sampler. Default is 0, which will use the clock. | 0 |
| num_top_words | The number of most probable words to return for each topic after model estimation; use -1 to return all of them | 20 |
| beta | Beta parameter: smoothing over unigram distribution | 0.01 |
| num_topics | The number of topics to fit | 10 |
| num_iterations | The number of iterations of Gibbs sampling | 1000 |
| _debug_level | Controls the verbosity of debug messages printed by the component during execution. Possible values are: off, severe, warning, info, config, fine, finer, finest, all. Append ',mirror' to any of the values above to mirror that output to the server logs. | info |
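
To make the roles of num_topics, alpha, beta, num_iterations, and random_seed more concrete, here is a toy collapsed Gibbs sampler for LDA written against the JDK only. It is a minimal sketch for illustration, not the component's or Mallet's implementation: the class `ToyLdaGibbs` and all of its members are hypothetical, there is no threading or hyperparameter optimization, and `alpha` is assumed to be a per-topic value (this page does not state how the component distributes its alpha property across topics).

```java
import java.util.Arrays;
import java.util.Random;

/** Toy collapsed Gibbs sampler for LDA; an illustration, not Mallet's implementation. */
class ToyLdaGibbs {
    final int numTopics;
    final int vocabSize;
    final double alpha;      // assumed per-topic document-topic smoothing
    final double beta;       // per-word topic-word smoothing
    final int[][] docs;      // docs[d][i] = word id of token i in document d
    final int[][] z;         // z[d][i] = topic currently assigned to that token
    final int[][] docTopic;  // docTopic[d][k] = tokens in document d assigned to topic k
    final int[][] topicWord; // topicWord[k][w] = times word w is assigned to topic k
    final int[] topicTotal;  // topicTotal[k] = tokens assigned to topic k overall
    final Random rng;

    ToyLdaGibbs(int[][] docs, int vocabSize, int numTopics,
                double alpha, double beta, long randomSeed) {
        this.docs = docs;
        this.vocabSize = vocabSize;
        this.numTopics = numTopics;
        this.alpha = alpha;
        this.beta = beta;
        // Follows the documented convention: seed 0 means a non-fixed, time-based seed.
        this.rng = (randomSeed == 0) ? new Random() : new Random(randomSeed);
        this.z = new int[docs.length][];
        this.docTopic = new int[docs.length][numTopics];
        this.topicWord = new int[numTopics][vocabSize];
        this.topicTotal = new int[numTopics];
        // Start from a uniformly random topic assignment for every token.
        for (int d = 0; d < docs.length; d++) {
            z[d] = new int[docs[d].length];
            for (int i = 0; i < docs[d].length; i++) {
                z[d][i] = rng.nextInt(numTopics);
                addToken(d, docs[d][i], z[d][i], +1);
            }
        }
    }

    /** Adds (delta = +1) or removes (delta = -1) one token from all count tables. */
    private void addToken(int d, int w, int k, int delta) {
        docTopic[d][k] += delta;
        topicWord[k][w] += delta;
        topicTotal[k] += delta;
    }

    /** Runs the given number of full sweeps over every token in the corpus. */
    void run(int numIterations) {
        double[] p = new double[numTopics];
        for (int iter = 0; iter < numIterations; iter++) {
            for (int d = 0; d < docs.length; d++) {
                for (int i = 0; i < docs[d].length; i++) {
                    int w = docs[d][i];
                    addToken(d, w, z[d][i], -1); // exclude the token being resampled
                    double sum = 0.0;
                    for (int k = 0; k < numTopics; k++) {
                        p[k] = (docTopic[d][k] + alpha) * (topicWord[k][w] + beta)
                                / (topicTotal[k] + vocabSize * beta);
                        sum += p[k];
                    }
                    // Draw a topic in proportion to the unnormalized weights p[k].
                    double u = rng.nextDouble() * sum;
                    int k = 0;
                    double acc = p[0];
                    while (acc < u && k < numTopics - 1) {
                        k++;
                        acc += p[k];
                    }
                    z[d][i] = k;
                    addToken(d, w, k, +1);
                }
            }
        }
    }

    public static void main(String[] args) {
        int[][] docs = { {0, 1, 0, 1}, {2, 3, 2, 3} };
        ToyLdaGibbs lda = new ToyLdaGibbs(docs, 4, 2, 0.5, 0.01, 42L);
        lda.run(100);
        for (int d = 0; d < docs.length; d++) {
            System.out.println("doc " + d + " topic counts: " + Arrays.toString(lda.docTopic[d]));
        }
    }
}
```

In this sketch, larger `alpha` values pull each document toward a more even mix of topics, larger `beta` values spread each topic over more words, and a fixed non-zero seed makes repeated runs produce identical assignments.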