
Author: Boris Capitanu
Creation date: 06/06/2011
Firing policy: all
Package: org.seasr.meandre.components.analytics.mallet

DESCRIPTION

This component performs topic analysis in the style of LDA and its variants using Mallet.

INPUTS

Name

Description

Example

mallet_instance_list
The list of machine learning instances
TYPE: cc.mallet.types.InstanceList

 

OUTPUTS

Name

Description

Example

topic_top_words_xml
An XML document containing the topics, and for each topic the set of top words (with weights)
TYPE: org.w3c.dom.Document

 

error
This port is used to output any unhandled errors encountered during the execution of this component

 

doc_topics_xml
An XML document containing the processed documents, and for each processed document the set of topics and topic probabilities
TYPE: org.w3c.dom.Document
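
Both XML outputs are delivered as org.w3c.dom.Document objects, so a downstream consumer may want to turn them into text for inspection or storage. The following is a minimal sketch using only the standard JDK XML APIs. The class name DomOutputSketch and the element names placeholder_root and placeholder_item are hypothetical stand-ins; the actual element names produced by this component are not documented on this page.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of the component.
final class DomOutputSketch {

    /** Serializes any DOM document to an indented XML string. */
    static String toXmlString(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    /** Builds a stand-in document; the element names are assumptions for illustration only. */
    static Document placeholderDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("placeholder_root");
            Element item = doc.createElement("placeholder_item");
            item.setTextContent("example");
            root.appendChild(item);
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build document", e);
        }
    }

    public static void main(String[] args) {
        // In a real flow the document would come from topic_top_words_xml or doc_topics_xml.
        System.out.println(toXmlString(placeholderDocument()));
    }
}
```

The serialization step does not depend on the document's structure, so the same helper would apply to either output port.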

 

PROPERTIES

Name

Description

Default value

num_threads
The number of threads for parallel training
1
optimize_interval
The number of iterations between re-estimating Dirichlet hyperparameters
0
use_symmetric_alpha
Only optimize the concentration parameter of the prior over document-topic distributions. This may reduce the number of very small, poorly estimated topics, but may disperse common words over several topics.
false
alpha
Alpha parameter: smoothing over topic distribution
50.0
optimize_burnin
The number of iterations to run before first estimating Dirichlet hyperparameters
200
_ignore_errors
Set to 'true' to ignore all unhandled exceptions and keep the flow running. If set to 'false', the flow is terminated when an unhandled exception is thrown during the execution of this component.
false
random_seed
The random seed for the Gibbs sampler. Default is 0, which will use the clock.
0
num_top_words
The number of most probable words to return for each topic after model estimation; use -1 to return all of them
20
beta
Beta parameter: smoothing over unigram distribution
0.01
num_topics
The number of topics to fit
10
num_iterations
The number of iterations of Gibbs sampling
1000
_debug_level
Controls the verbosity of debug messages printed by the component during execution.
Possible values are: off, severe, warning, info, config, fine, finer, finest, all
Append ',mirror' to any of the values above to mirror that output to the server logs.
info
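
The sketch below illustrates one plausible reading of several properties above: when hyperparameter re-estimation would run given optimize_interval and optimize_burnin, how a random_seed of 0 could fall back to a clock-based seed, how num_top_words with -1 could return every word, and how a _debug_level value such as 'fine,mirror' could be split. The class TopicModelingPropertySketch and all of its methods are hypothetical, and the scheduling rule (re-estimate after the burn-in, on every multiple of the interval, never when the interval is 0) is an assumption rather than behavior confirmed by this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Random;
import java.util.logging.Level;

// Hypothetical illustration only; not the component's source code.
final class TopicModelingPropertySketch {

    /** Iterations at which hyperparameters would be re-estimated (assumed rule). */
    static List<Integer> optimizationIterations(int numIterations, int optimizeInterval, int optimizeBurnin) {
        List<Integer> result = new ArrayList<>();
        if (optimizeInterval <= 0) {
            return result; // assumption: an interval of 0 disables re-estimation
        }
        for (int iteration = 1; iteration <= numIterations; iteration++) {
            if (iteration > optimizeBurnin && iteration % optimizeInterval == 0) {
                result.add(iteration);
            }
        }
        return result;
    }

    /** A seed of 0 means "use the clock"; any other value gives a repeatable sequence. */
    static Random createRandom(int randomSeed) {
        return randomSeed == 0 ? new Random() : new Random(randomSeed);
    }

    /** Returns the highest-weighted words; a negative numTopWords returns all of them. */
    static List<Map.Entry<String, Double>> topWords(Map<String, Double> weights, int numTopWords) {
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(weights.entrySet());
        sorted.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        if (numTopWords < 0 || numTopWords >= sorted.size()) {
            return sorted;
        }
        return new ArrayList<>(sorted.subList(0, numTopWords));
    }

    /** Maps the part before any ',mirror' suffix to a java.util.logging level. */
    static Level parseDebugLevel(String value) {
        String name = value.split(",", 2)[0].trim().toUpperCase(Locale.ROOT);
        return Level.parse(name);
    }

    /** True when the value carries the ',mirror' suffix. */
    static boolean isMirrored(String value) {
        return value.trim().toLowerCase(Locale.ROOT).endsWith(",mirror");
    }

    public static void main(String[] args) {
        // Defaults from the table: interval 0, so the list is empty.
        System.out.println(optimizationIterations(1000, 0, 200));
        // Illustrative non-default interval of 50 with the default burn-in of 200.
        System.out.println(optimizationIterations(300, 50, 200));

        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("alpha", 0.1);
        weights.put("beta", 0.5);
        weights.put("gamma", 0.3);
        System.out.println(topWords(weights, 2));

        System.out.println(parseDebugLevel("fine,mirror") + " mirrored=" + isMirrored("fine,mirror"));
    }
}
```

Under this reading, the default optimize_interval of 0 leaves the hyperparameters fixed for the whole run, so optimize_burnin only matters once optimize_interval is set to a positive value.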