Mining Very Long Sequences in Large Databases with PLWAPLong

Publication Title

Proceedings of the 2009 International Database Engineering & Applications Symposium


The Position Coded Pre-order Linked Web Access Pattern (PLWAP) mining algorithm is an efficient web sequential pattern mining algorithm that stores the frequent sequences of an entire sequential database in a compressed tree with position-coded nodes. However, for very long sequences exceeding thirty-two nodes (the number of bits an integer position code can hold), PLWAP's performance begins to degrade: it uses linked lists to store conjunctions of long position codes, and traversing these lists slows the algorithm during both tree construction and mining. PLWAP also uses every node in the frequent 1-item event queue to test whether that event belongs in the suffix tree root set during mining. This paper proposes (1) a different position code numbering scheme in which each node is assigned two numeric codes (startPosition, endPosition) instead of one, and (2) using prior knowledge of the "Last Descendant" of each tree branch to lower the cost of creating the suffix tree root sets during mining. Experiments show that the proposed scheme, PLWAPLong, outperforms PLWAP on long sequences and large databases as well as on regular databases.
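
The abstract does not spell out how the two codes are assigned, so the following is only a minimal sketch of one way a (startPosition, endPosition) pair could work. It assumes that startPosition is a node's pre-order visit number and that endPosition is the startPosition of the node's last descendant. The names `Node`, `assign_codes`, and `is_ancestor` are hypothetical and are not taken from the paper. Under these assumptions, an ancestor test is two integer comparisons regardless of tree depth, with no bit-string code that grows with sequence length.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node carrying an assumed (start, end) position code."""
    label: str
    children: list = field(default_factory=list)
    start: int = 0  # pre-order visit number (assumption)
    end: int = 0    # start of this node's last descendant (assumption)


def assign_codes(root: Node) -> None:
    """Number nodes in pre-order without recursion, so very deep
    branches (long sequences) do not hit Python's recursion limit."""
    counter = 0
    stack = [(root, False)]
    while stack:
        node, subtree_done = stack.pop()
        if subtree_done:
            # Every descendant has been numbered; the counter now holds
            # the start position of the last one (or of the node itself).
            node.end = counter
            continue
        counter += 1
        node.start = counter
        stack.append((node, True))
        # Push children in reverse so the first child is visited first.
        for child in reversed(node.children):
            stack.append((child, False))


def is_ancestor(a: Node, b: Node) -> bool:
    """True if b lies strictly inside the subtree rooted at a."""
    return a.start < b.start <= a.end


# Small example tree: root -> a -> (b -> c, d), root -> e
c = Node("c")
b = Node("b", [c])
d = Node("d")
a = Node("a", [b, d])
e = Node("e")
root = Node("root", [a, e])
assign_codes(root)
```

In this example, `a` receives the pair (2, 5), so `is_ancestor(a, c)` holds because `c` starts at 4, while `is_ancestor(a, e)` does not because `e` starts at 6. Storing the last descendant's position on each node is one plausible reading of the "Last Descendant" idea; the paper's actual use of it when building suffix tree root sets may differ.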