FP Growth

Author Boris Capitanu
Creation date 06/01/2011
Firing policy all
Package org.seasr.meandre.components.discovery.ruleassociation.fpgrowth

DESCRIPTION

This component implements the FPGrowth algorithm to generate frequent itemsets: combinations of items that occur in enough examples to satisfy the minimum support criterion.

Detailed Description: This component takes an Item Sets object generated by a Table To Item Sets component and uses the FPGrowth algorithm to find the combinations of items that satisfy a minimum support criterion. An item is an [attribute,value] pair that occurs in the set of examples being mined. The user controls the support criterion through the Minimum Support % (min_support) property, which specifies the percentage of all examples that must contain a given combination of items before that combination is included in the generated output. Each combination of items that satisfies the Minimum Support % is called a frequent itemset.
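A minimal Python sketch of the support test described above. It assumes a hypothetical encoding in which each example is a dict mapping attribute to value and each item is an (attribute, value) tuple. It also assumes the percentage is converted to an absolute example count by rounding up; neither detail is taken from the component's source.

```python
import math

# Hypothetical encoding: an example is a dict {attribute: value};
# an item is an (attribute, value) tuple.
def to_transactions(examples):
    return [frozenset(example.items()) for example in examples]

def min_support_count(min_support_pct, n_examples):
    """Smallest number of examples an itemset must occur in.
    Rounding up is an assumption, not documented component behavior."""
    if not 0 < min_support_pct <= 100:
        raise ValueError("min_support must be > 0 and <= 100")
    return math.ceil(min_support_pct * n_examples / 100.0)

def support_count(itemset, transactions):
    """Number of examples that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def is_frequent(itemset, transactions, min_support_pct):
    needed = min_support_count(min_support_pct, len(transactions))
    return support_count(itemset, transactions) >= needed

examples = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rain", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
    {"outlook": "sunny", "play": "yes"},
]
transactions = to_transactions(examples)
itemset = frozenset({("outlook", "sunny"), ("play", "no")})
print(support_count(itemset, transactions))       # 2
print(is_frequent(itemset, transactions, 40.0))   # True  (needs 2 of 5)
print(is_frequent(itemset, transactions, 50.0))   # False (needs 3 of 5)
```

The itemset {outlook=sunny, play=no} occurs in 2 of the 5 examples, so it is frequent at 40% but not at 50%.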

The user can restrict the maximum number of items included in any frequent itemset with the Maximum Items Per Rule (max_items) property. Generating sets with a large number of items can be computationally expensive, so this property is usually set together with the Minimum Support % property. Note, however, that for FPGrowth this limit has little effect on runtime, unlike with Apriori (see Scalability).
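The mining step can be sketched as a compact, textbook FP-growth in Python. The item limit is applied by stopping the recursion once a suffix reaches the maximum size. This is an illustrative sketch, not the component's implementation. The function names, the percent-to-count rounding, and the plain-dict output format are assumptions.

```python
import math
from collections import defaultdict

class Node:
    """FP-tree node: an item, its accumulated count, and parent/children links."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, weight) pairs.
    Returns a header table (item -> nodes holding it) and the frequent item counts."""
    counts = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in items:
            counts[item] += weight
    frequent = {item: c for item, c in counts.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)
    for items, weight in weighted_transactions:
        # Keep only frequent items, most frequent first (repr breaks ties deterministically).
        path = sorted((i for i in items if i in frequent),
                      key=lambda i: (-frequent[i], repr(i)))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent

def mine(weighted_transactions, min_count, max_items, suffix, out):
    """Recursively grow frequent itemsets that end with `suffix`."""
    if len(suffix) >= max_items:
        return  # the item limit stops the recursion here
    header, frequent = build_tree(weighted_transactions, min_count)
    for item, count in frequent.items():
        itemset = suffix | {item}
        out[itemset] = count
        # Conditional pattern base: the prefix path above every node of `item`.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        mine(conditional, min_count, max_items, itemset, out)

def fp_growth(transactions, min_support_pct, max_items):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    min_count = math.ceil(min_support_pct * len(transactions) / 100.0)  # assumed rounding
    out = {}
    mine([(list(t), 1) for t in transactions], min_count, max_items, frozenset(), out)
    return out

transactions = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
result = fp_growth(transactions, min_support_pct=60.0, max_items=3)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

At 60% of five examples (3 examples), this yields four single items and four pairs. No triple qualifies. With max_items=1, only the four single items are returned.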

In a typical flow, the Frequent Item Sets output port (freq_item_sets) of this component is connected to a Compute Confidence component, which forms association rules that satisfy a minimum confidence value.

References: For more information on the FPGrowth frequent pattern mining algorithm, see Jiawei Han, Jian Pei, and Yiwen Yin, "Mining Frequent Patterns without Candidate Generation," 2000.

Limitations: The FPGrowth and Compute Confidence components currently build rules with a single item in the consequent.
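The rule-forming step performed downstream by Compute Confidence can be sketched as follows, restricted to single-item consequents as noted under Limitations. The input format (a dict mapping each frequent itemset to its support count) and a confidence threshold expressed as a fraction between 0 and 1 are assumptions for illustration. This page does not describe Compute Confidence's actual interface.

```python
def single_consequent_rules(freq, min_confidence):
    """Form rules A -> c from frequent itemsets, where c is a single item.

    freq: hypothetical dict {frozenset(itemset): support count}.
    confidence(A -> c) = support(A | {c}) / support(A).
    """
    rules = []
    for itemset, count in freq.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and a consequent
        for consequent in itemset:
            antecedent = itemset - {consequent}
            # Every subset of a frequent itemset is also frequent, so it is in freq.
            confidence = count / freq[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules

freq = {
    frozenset({"diaper"}): 4,
    frozenset({"beer"}): 3,
    frozenset({"diaper", "beer"}): 3,
}
for antecedent, consequent, confidence in single_consequent_rules(freq, 0.8):
    print(sorted(antecedent), "->", consequent, round(confidence, 2))  # ['beer'] -> diaper 1.0
```

Here beer -> diaper has confidence 3/3 and passes the 0.8 threshold. The reverse rule, diaper -> beer, has confidence 3/4 and is dropped.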

Data Type Restrictions: Although this component can operate on attributes of any data type, in practice it is usually infeasible to use it with continuous-valued attributes. When building the frequent itemsets, the component considers each [attribute,value] pair that occurs in the examples individually. Continuous attributes (and categorical attributes with a large number of values) are less likely to meet the Minimum Support requirement and can result in unacceptably long execution times. Typically, Choose Attributes and Binning components should appear in the itinerary before the Table To Item Sets component, whose output is the Item Sets object used as input by this component. The Choosing/Binning components can reduce the number of distinct [attribute,value] pairs this component must consider to a reasonable number.
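To see why binning helps, the sketch below applies simple equal-width binning to one continuous attribute before it is turned into [attribute,value] items. Equal-width binning, the bin0-style labels, and the function name are illustrative assumptions. The Binning component may use a different method.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a label 'bin0'..'bin{n_bins-1}' using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        # The maximum value falls on the upper edge, so clamp it into the last bin.
        index = 0 if width == 0 else min(int((v - lo) / width), n_bins - 1)
        labels.append(f"bin{index}")
    return labels

ages = [23, 25, 31, 38, 44, 52, 57, 61, 64, 70]
raw_items = {("age", a) for a in ages}
binned_items = [("age", label) for label in equal_width_bins(ages, 3)]

print(len(raw_items))          # 10 distinct items, each in only 1 of 10 examples
print(len(set(binned_items)))  # 3 distinct items
```

With a 20% minimum support (2 of 10 examples), none of the raw age items can be frequent. After binning, all three binned items can be, because they cover 4, 2, and 4 examples respectively.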

Data Handling: This component does not modify the input Item Sets in any way.

Scalability: This component creates an array of integers to hold the indices of the items in each frequent itemset. The component may be computationally intensive, and its cost scales with the number of Item Sets entries to search. The user can limit the size of the frequent itemsets, although this has little effect on performance for this algorithm. Choosing/Binning components can be included in the itinerary before this component to reduce the number of Item Sets entries.
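One way to picture the integer-array representation mentioned above is to give each distinct [attribute,value] item an integer index and store each frequent itemset as a sorted array of those indices. The sketch uses Python's array module. The sorted-order indexing scheme and the function names are assumptions, not the component's actual data layout.

```python
from array import array

def index_items(transactions):
    """Assign each distinct item an integer index (hypothetical scheme: sorted order)."""
    items = sorted({item for t in transactions for item in t})
    return items, {item: k for k, item in enumerate(items)}

def encode_itemset(itemset, index):
    """Store an itemset as a compact, sorted array of C ints."""
    return array("i", sorted(index[item] for item in itemset))

def decode_itemset(indices, items):
    """Map item indices back to (attribute, value) pairs."""
    return [items[k] for k in indices]

transactions = [
    {("outlook", "sunny"), ("play", "no")},
    {("outlook", "rain"), ("play", "yes")},
]
items, index = index_items(transactions)
encoded = encode_itemset({("play", "no"), ("outlook", "sunny")}, index)
print(encoded)                         # array('i', [1, 2])
print(decode_itemset(encoded, items))  # [('outlook', 'sunny'), ('play', 'no')]
```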

INPUTS

item_sets: An object produced by a Table To Item Sets component, containing the items that will appear in the frequent itemsets.

OUTPUTS

freq_item_sets: A representation of the frequent itemsets found by the component. This representation encodes the items used in the sets and the number of examples in which each set occurs. This output is typically connected to a Compute Confidence component.

error: This port is used to output any unhandled errors encountered during the execution of this component.

PROPERTIES

min_support (default: 20.0): The percentage of all examples that must contain a given set of items before that set is reported as a frequent itemset (and can therefore be used to form an association rule). This value must be greater than 0 and less than or equal to 100.

verbose (default: True): If this property is true, the component reports progress information to the console.

_debug_level (default: info): Controls the verbosity of debug messages printed by the component during execution. Possible values are: off, severe, warning, info, config, fine, finer, finest, all. Append ",mirror" to any of these values to mirror that output to the server logs.

max_items (default: 6): The maximum number of items to include in any rule. Unlike with Apriori, this setting has little impact on performance for this algorithm. This value cannot be less than 2.

_ignore_errors (default: false): Set to 'true' to ignore all unhandled exceptions and prevent the flow from being terminated. Setting this property to 'false' results in the flow being terminated if an unhandled exception is thrown during the execution of this component.
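The property constraints listed above can be checked up front. The sketch below validates a plain dict of property strings against the documented defaults, ranges, and _debug_level syntax. The dict input, the case-insensitive parsing, and the function names are assumptions, since this page does not describe how Meandre passes property values to the component.

```python
DEBUG_LEVELS = ("off", "severe", "warning", "info", "config",
                "fine", "finer", "finest", "all")

def parse_bool(value):
    """Treat 'true' (any case) as True and anything else as False."""
    return str(value).strip().lower() == "true"

def parse_debug_level(value):
    """Split e.g. 'fine,mirror' into ('fine', True)."""
    level, _, suffix = str(value).partition(",")
    level, suffix = level.strip().lower(), suffix.strip().lower()
    if level not in DEBUG_LEVELS or suffix not in ("", "mirror"):
        raise ValueError(f"invalid _debug_level: {value!r}")
    return level, suffix == "mirror"

def validate_properties(props):
    """Apply the documented defaults and range checks to a dict of property strings."""
    min_support = float(props.get("min_support", "20.0"))
    if not 0 < min_support <= 100:
        raise ValueError("min_support must be > 0 and <= 100")
    max_items = int(props.get("max_items", "6"))
    if max_items < 2:
        raise ValueError("max_items cannot be less than 2")
    level, mirror = parse_debug_level(props.get("_debug_level", "info"))
    return {
        "min_support": min_support,
        "max_items": max_items,
        "verbose": parse_bool(props.get("verbose", "true")),
        "debug_level": level,
        "mirror_to_server_log": mirror,
        "ignore_errors": parse_bool(props.get("_ignore_errors", "false")),
    }

print(validate_properties({"min_support": "5", "_debug_level": "fine,mirror"}))
```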
Labels: discovery, rule, association, frequent, pattern, mining