| Field | Value |
|---|---|
| Author | Boris Capitanu |
| Creation date | 06/01/2011 |
| Firing policy | all |
| Package | org.seasr.meandre.components.discovery.ruleassociation.fpgrowth |
DESCRIPTION
This component implements the FPGrowth algorithm to generate frequent itemsets: combinations of items that occur together in enough examples to satisfy the minimum support criterion.
Detailed Description: This component takes an Item Sets object that has been generated by a Table To Item Sets component and uses the FPGrowth algorithm to find the combinations of items that satisfy a minimum support criterion. An item is an [attribute,value] pair that occurs in the set of examples being mined. The user controls the support criterion via the Minimum Support % property, which specifies the percentage of all examples that must contain a given combination of items before that combination is included in the generated output. Each combination of items that satisfies the Minimum Support % is called a Frequent Itemset.
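As a minimal sketch of the definitions above (not the component's Java code), the following Python treats each example as a set of [attribute,value] pairs and converts the Minimum Support % into an absolute number of examples. Rounding up with `math.ceil` is an assumption, since the component's exact rounding is not documented here, and all function names are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: an item is an (attribute, value) pair.
Item = Tuple[str, str]


def examples_to_transactions(rows: List[Dict[str, str]]) -> List[FrozenSet[Item]]:
    """Turn each example (attribute -> value) into the set of its [attribute,value] items."""
    return [frozenset(row.items()) for row in rows]


def min_support_count(min_support_pct: float, n_examples: int) -> int:
    """Smallest example count that meets min_support_pct (rounding up is an assumption)."""
    if not 0 < min_support_pct <= 100:
        raise ValueError("min_support must be > 0 and <= 100")
    return math.ceil(min_support_pct * n_examples / 100.0)


def support(itemset: FrozenSet[Item], transactions: List[FrozenSet[Item]]) -> int:
    """Number of examples that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def is_frequent(itemset: FrozenSet[Item], transactions: List[FrozenSet[Item]],
                min_support_pct: float) -> bool:
    """True if itemset appears in at least the required number of examples."""
    needed = min_support_count(min_support_pct, len(transactions))
    return support(itemset, transactions) >= needed
```

Under this rounding assumption, with 5 examples and a Minimum Support % of 50, an itemset must appear in at least 3 examples.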
The user can restrict the maximum number of items included in any frequent itemset with the Maximum Items Per Rule property. Generating sets with a large number of items can be computationally expensive; for FPGrowth, however, the Minimum Support % property is the main control on runtime, because (as noted under Scalability) limiting the itemset size has little effect on performance for this algorithm.
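For readers unfamiliar with FPGrowth, here is a compact, textbook-style sketch of the algorithm from Han, Pei, and Yin in Python. It is not the component's implementation. The names `fp_growth`, `min_count`, and `max_items` are hypothetical: `min_count` is an absolute example count (the Minimum Support % applied to the number of examples), and `max_items` mimics the Maximum Items Per Rule property by not growing an itemset once it reaches that size.

```python
from collections import defaultdict
from typing import Collection, Dict, FrozenSet, Hashable, Iterable, List, Optional, Tuple


class _Node:
    """One FP-tree node: an item, the count of paths through it, and tree links."""

    __slots__ = ("item", "count", "parent", "children")

    def __init__(self, item: Optional[Hashable], parent: Optional["_Node"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[Hashable, "_Node"] = {}


def _build_tree(weighted: List[Tuple[Collection[Hashable], int]], min_count: int):
    """Build an FP-tree; return frequent item counts and, per item, its tree nodes."""
    counts: Dict[Hashable, int] = defaultdict(int)
    for items, weight in weighted:
        for item in set(items):
            counts[item] += weight
    frequent = {item: c for item, c in counts.items() if c >= min_count}
    root = _Node(None, None)
    header: Dict[Hashable, List[_Node]] = defaultdict(list)
    for items, weight in weighted:
        # Insert frequent items in descending-frequency order (ties broken by repr)
        # so that transactions with common prefixes share tree nodes.
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], repr(i)))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = _Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return frequent, header


def fp_growth(transactions: Iterable[Collection[Hashable]], min_count: int,
              max_items: Optional[int] = None) -> Dict[FrozenSet[Hashable], int]:
    """Return {itemset: support count} for every itemset with support >= min_count."""
    result: Dict[FrozenSet[Hashable], int] = {}

    def mine(weighted, suffix: FrozenSet[Hashable]) -> None:
        frequent, header = _build_tree(weighted, min_count)
        for item, count in frequent.items():
            itemset = suffix | {item}
            result[itemset] = count
            if max_items is not None and len(itemset) >= max_items:
                continue  # hypothetical max_items cap: do not grow this itemset further
            # Conditional pattern base: the prefix path above each node holding
            # `item`, weighted by that node's count.
            base = []
            for node in header[item]:
                path = []
                parent = node.parent
                while parent.parent is not None:  # stop before the root
                    path.append(parent.item)
                    parent = parent.parent
                if path:
                    base.append((path, node.count))
            if base:
                mine(base, itemset)

    mine([(t, 1) for t in transactions], frozenset())
    return result
```

The tree compresses transactions that share frequent prefixes, and each recursive call mines a smaller conditional tree, which is how FPGrowth avoids Apriori-style candidate generation.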
In a typical flow the Frequent Item Sets output port from this component is connected to a Compute Confidence component which forms association rules that satisfy a minimum confidence value.
References: For more information on the FPGrowth frequent pattern mining algorithm, see "Mining Frequent Patterns without Candidate Generation", Jiawei Han, Jian Pei, and Yiwen Yin, 2000.
Limitations: The FPGrowth and Compute Confidence components currently build rules with a single item in the consequent.
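To illustrate how a downstream rule builder such as Compute Confidence might use the support counts, the sketch below forms rules with a single item in the consequent, matching the limitation above, using the standard definition confidence(A → c) = support(A ∪ {c}) / support(A). The function name, the dictionary input, and expressing `min_confidence` as a fraction between 0 and 1 are assumptions, not the Compute Confidence component's actual interface.

```python
from typing import Dict, FrozenSet, Hashable, List, Tuple

# (antecedent, single consequent item, confidence)
Rule = Tuple[FrozenSet[Hashable], Hashable, float]


def single_consequent_rules(freq: Dict[FrozenSet[Hashable], int],
                            min_confidence: float) -> List[Rule]:
    """Hypothetical helper: build A -> c rules from {itemset: support count}."""
    rules: List[Rule] = []
    for itemset, count in freq.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and one consequent
        for consequent in itemset:
            antecedent = itemset - {consequent}
            # Every subset of a frequent itemset is frequent, so it is in freq.
            confidence = count / freq[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules
```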
Data Type Restrictions: While this component can operate on attributes of any data type, in practice it is usually infeasible to use it with continuous-valued attributes. The component considers each [attribute,value] pair that occurs in the examples individually when building the frequent itemsets. Continuous attributes (and categorical attributes with a large number of values) produce many distinct pairs, each of which is less likely to meet the Minimum Support requirement, and can result in unacceptably long execution times. Typically, Choose Attributes and Binning components should appear in the itinerary before the Table To Item Sets component, whose output is the Item Sets object this component uses as input. The Choosing/Binning components can reduce the number of distinct [attribute,value] pairs this component must consider to a reasonable number.
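Equal-width binning is one common way to collapse a continuous attribute into a handful of values before it reaches Table To Item Sets. The sketch below is hypothetical: it is not necessarily the method the Binning component uses, and the function name and `binN` labels are illustrative only.

```python
from typing import List


def equal_width_bins(values: List[float], n_bins: int) -> List[str]:
    """Hypothetical helper: label each value with one of n_bins equal-width intervals."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        if width == 0:
            idx = 0  # constant column: everything falls in one bin
        else:
            # The maximum value would land one past the end, so clamp it to the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
        labels.append(f"bin{idx}")
    return labels
```

Five distinct numeric values become at most three distinct [attribute,value] pairs with `n_bins=3`, so each remaining pair has a better chance of meeting the Minimum Support requirement.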
Data Handling: This component does not modify the input Item Sets in any way.
Scalability: This component creates an array of integers to hold the indices of the items in each frequent itemset. The component may be computationally intensive, and its cost scales with the number of Item Sets entries to search. The user can limit the size of the frequent itemsets, although this has little effect on performance for this algorithm. Choosing/Binning components can be included in the itinerary before this component to reduce the number of Item Sets entries.
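A minimal sketch of the integer-index representation described above, assuming each distinct [attribute,value] pair is assigned an index in order of first appearance. The class and method names are hypothetical, and the component's actual data structures may differ. Sorting the indices gives each itemset a single canonical form.

```python
from array import array
from typing import Dict, Iterable, List, Tuple

Item = Tuple[str, str]  # an (attribute, value) pair


class ItemTable:
    """Hypothetical lookup table mapping each distinct item to an integer index."""

    def __init__(self) -> None:
        self._index: Dict[Item, int] = {}
        self._items: List[Item] = []

    def index_of(self, item: Item) -> int:
        # Assign the next free index the first time an item is seen.
        if item not in self._index:
            self._index[item] = len(self._items)
            self._items.append(item)
        return self._index[item]

    def encode(self, itemset: Iterable[Item]) -> array:
        # Store the itemset as a sorted array of C ints.
        return array("i", sorted(self.index_of(i) for i in itemset))

    def decode(self, indices: array) -> List[Item]:
        return [self._items[i] for i in indices]
```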
INPUTS
| Name | Description | Example |
|---|---|---|
| item_sets | An object produced by a Table To Item Sets component containing items that will appear in the frequent itemsets. | |
OUTPUTS
| Name | Description | Example |
|---|---|---|
| freq_item_sets | A representation of the frequent itemsets found by the component. This representation encodes the items used in the sets and the number of examples in which each set occurs. This output is typically connected to a Compute Confidence component. | |
| error | This port is used to output any unhandled errors encountered during the execution of this component. | |
PROPERTIES
| Name | Description | Default value |
|---|---|---|
| min_support | The percentage of all examples that must contain a given set of items before an association rule can be formed containing those items. This value must be greater than 0 and less than or equal to 100 (for example, 20.0 means 20%). | 20.0 |
| verbose | If this property is true, the component reports progress information to the console. | True |
| _debug_level | Controls the verbosity of debug messages printed by the component during execution. Possible values are: off, severe, warning, info, config, fine, finer, finest, all. Append ',mirror' to any of these values (for example, info,mirror) to mirror that output to the server logs. | info |
| max_items | The maximum number of items to include in any rule. Unlike with Apriori, this setting does not affect performance for this algorithm. This value cannot be less than 2. | 6 |
| _ignore_errors | Set to 'true' to ignore all unhandled exceptions and prevent the flow from being terminated. Setting this property to 'false' will result in the flow being terminated in the event an unhandled exception is thrown during the execution of this component. | false |
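The constraints stated in the properties table can be expressed as simple checks. The sketch below is hypothetical: the function names and the exact parsing of the ',mirror' suffix are assumptions, not the component's own validation code.

```python
from typing import Tuple

# Levels listed in the _debug_level description.
DEBUG_LEVELS = {"off", "severe", "warning", "info", "config", "fine", "finer", "finest", "all"}


def check_min_support(value: float) -> float:
    """Hypothetical check: min_support must be > 0 and <= 100."""
    if not 0 < value <= 100:
        raise ValueError(f"min_support must be > 0 and <= 100, got {value}")
    return value


def check_max_items(value: int) -> int:
    """Hypothetical check: max_items cannot be less than 2."""
    if value < 2:
        raise ValueError(f"max_items cannot be less than 2, got {value}")
    return value


def parse_debug_level(value: str) -> Tuple[str, bool]:
    """Hypothetical parser: split e.g. 'info,mirror' into ('info', True)."""
    level, sep, suffix = value.strip().partition(",")
    if level not in DEBUG_LEVELS or (sep and suffix.strip() != "mirror"):
        raise ValueError(f"invalid _debug_level: {value!r}")
    return level, bool(sep)
```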