Ternary search tree

Ternary Search Tree (TST)
Type	tree
Time complexity in big O notation
Operation	Average
Search	O(log n)
Insert	O(log n)
Delete	O(log n)

In computer science, a ternary search tree is a type of trie (sometimes called a prefix tree) where nodes are arranged in a manner similar to a binary search tree, but with up to three children rather than the binary tree's limit of two. Like other prefix trees, a ternary search tree can be used as an associative map structure with the ability for incremental string search. However, ternary search trees are more space efficient compared to standard prefix trees, at the cost of speed. Common applications for ternary search trees include spell-checking and auto-completion.

Description

Each node of a ternary search tree stores a single character, an object (or a pointer to an object, depending on implementation), and pointers to its three children, conventionally named equal kid, lo kid and hi kid, which can also be referred to, respectively, as middle (child), lower (child) and higher (child).^[1] A node may also have a pointer to its parent node as well as an indicator as to whether or not the node marks the end of a word.^[2] The lo kid pointer must point to a node whose character value is less than that of the current node. The hi kid pointer must point to a node whose character is greater than that of the current node.^[1] The equal kid points to the next character in the word. The figure below shows a ternary search tree with the strings "cute", "cup", "at", "as", "he", "us" and "i":

          c
        / | \
       a  u  h
       |  |  | \
       t  t  e  u
     /  / |   / |
    s  p  e  i  s

As with other trie data structures, each node in a ternary search tree represents a prefix of the stored strings. All strings in the middle subtree of a node start with that prefix.
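The node layout described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the names `Node`, `lo`, `eq`, `hi`, `is_end` and `words` are assumptions chosen for this example, and the end-of-word indicator is modelled as a boolean flag. The snippet builds the "a" branch of the figure by hand and lists the words it holds.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Node:
    char: str                      # the single character stored in this node
    lo: Optional["Node"] = None    # lo kid: characters less than `char`
    eq: Optional["Node"] = None    # equal kid: next character of the word
    hi: Optional["Node"] = None    # hi kid: characters greater than `char`
    is_end: bool = False           # assumed flag: a stored word ends here


def words(node: Optional[Node], prefix: str = "") -> Iterator[str]:
    """Yield every word below `node` in sorted order (in-order traversal)."""
    if node is None:
        return
    yield from words(node.lo, prefix)
    if node.is_end:
        yield prefix + node.char
    # Everything in the middle subtree starts with prefix + node.char.
    yield from words(node.eq, prefix + node.char)
    yield from words(node.hi, prefix)


# The "a" branch of the figure: "at" and "as" share the prefix "a".
s = Node("s", is_end=True)
t = Node("t", lo=s, is_end=True)
a = Node("a", eq=t)

print(list(words(a)))
```

Only the middle link extends the prefix; the lo and hi links stay at the same character position, which is why `words` passes `prefix` unchanged to them.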

Operations

Insertion

Inserting a value into a ternary search tree can be defined recursively or iteratively, much as lookups are defined. The recursive method is called repeatedly on nodes of the tree, given a key that gets progressively shorter as characters are pruned off its front. If the method reaches a node that has not been created, it creates the node and assigns it the first character in the key. Whether or not a new node is created, the method compares the first character in the key with the character in the node: if it is less or greater, the method makes a recursive call on the lo kid or hi kid respectively, as in the lookup operation. If, however, the key's first character is equal to the node's value, the insertion procedure is called on the equal kid and the key's first character is pruned away.^[1] Like binary search trees and other data structures, ternary search trees can become degenerate depending on the order of the keys.^[3] Inserting keys in alphabetical order is one way to attain the worst possible degenerate tree.^[1] Inserting the keys in random order often produces a well-balanced tree.^[1]

function insertion(string key) is
    // key[length(key)] is taken to be the end-of-string character (as in C strings)
    node p := root
    // last non-null node visited; stays null if root is null
    node last := null
    // true if p was reached through last.mid
    bool viaMid := false
    int idx := 0
    while p is not null do
        // descend into the proper subtree
        last := p
        if key[idx] < p.splitchar then
            viaMid := false
            p := p.left
        else if key[idx] > p.splitchar then
            viaMid := false
            p := p.right
        else
            // end-of-string character matched: key is already in our tree
            if idx == length(key) then
                return
            // trim character from our key
            idx := idx + 1
            viaMid := true
            p := p.mid
    p := node()
    p.splitchar := key[idx]
    // add p in as a child of the last non-null node (or as root if root is null)
    if last == null then
        root := p
    else if viaMid then
        last.mid := p
    else if key[idx] < last.splitchar then
        last.left := p
    else
        last.right := p
    idx := idx + 1
    // insert remainder of key, including the end-of-string character
    while idx <= length(key) do
        p.mid := node()
        p := p.mid
        p.splitchar := key[idx]
        idx := idx + 1
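The recursive formulation described in the prose can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the article's pseudocode: it marks the end of a word with a boolean `is_end` flag instead of an end-of-string character, and the names `Node`, `lo`, `eq`, `hi` and `insert` are hypothetical.

```python
from typing import Optional


class Node:
    def __init__(self, char: str) -> None:
        self.char = char
        self.lo: Optional["Node"] = None   # characters less than self.char
        self.eq: Optional["Node"] = None   # next character of the key
        self.hi: Optional["Node"] = None   # characters greater than self.char
        self.is_end = False                # assumed flag: a key ends here


def insert(node: Optional[Node], key: str) -> Node:
    """Insert a non-empty key and return the (possibly new) subtree root."""
    if not key:
        raise ValueError("key must be non-empty")
    if node is None:
        node = Node(key[0])                 # create the missing node
    if key[0] < node.char:
        node.lo = insert(node.lo, key)      # same key, lower subtree
    elif key[0] > node.char:
        node.hi = insert(node.hi, key)      # same key, higher subtree
    elif len(key) > 1:
        node.eq = insert(node.eq, key[1:])  # prune the first character
    else:
        node.is_end = True                  # last character: mark the key
    return node


# Rebuild the tree from the figure in the Description section.
root = None
for word in ["cute", "cup", "at", "as", "he", "us", "i"]:
    root = insert(root, word)
```

Inserting the words in the order given in the Description section reproduces the shape of the figure there: "c" at the root, with "a", "u" and "h" as its lo, equal and hi kids.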

Search

To look up a particular node or the data associated with a node, a string key is needed. A lookup procedure begins by checking the root node of the tree and determining which of the following conditions has occurred. If the first character of the string is less than the character in the root node, a recursive lookup can be called on the tree whose root is the lo kid of the current root. Similarly, if the first character is greater than the character of the current node, then a recursive call can be made to the tree whose root is the hi kid of the current node.^[1] As a final case, if the first character of the string is equal to the character of the current node, then the function returns the node if there are no more characters in the key. If there are more characters in the key, then the first character of the key is removed and a recursive call is made given the equal kid node and the modified key.^[1] This can also be written in a non-recursive way by using a pointer to the current node and a pointer to the current character of the key.^[1]

Pseudocode

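function search(string query) is
    if is_empty(query) then
        return false

    // query[length(query)] is taken to be the end-of-string character
    node p := root
    int idx := 0

    while p is not null do
        if query[idx] < p.splitchar then
            p := p.left
        else if query[idx] > p.splitchar then
            p := p.right
        else
            // end-of-string character matched: the whole query was found
            if idx == length(query) then
                return true
            idx := idx + 1
            p := p.mid

    return false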
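A non-recursive lookup of the kind mentioned above might look like this in Python. It is a sketch under assumptions: the end of a word is marked by an `is_end` flag rather than an end-of-string character, and `Node`, `insert` and `search` are names invented for the example. The small recursive `insert` is included only so that the snippet can build a tree to search.

```python
from typing import Optional


class Node:
    def __init__(self, char: str) -> None:
        self.char = char
        self.lo: Optional["Node"] = None
        self.eq: Optional["Node"] = None
        self.hi: Optional["Node"] = None
        self.is_end = False  # assumed flag: a key ends at this node


def insert(node: Optional[Node], key: str) -> Node:
    """Helper used only to build a tree for the demonstration."""
    if node is None:
        node = Node(key[0])
    if key[0] < node.char:
        node.lo = insert(node.lo, key)
    elif key[0] > node.char:
        node.hi = insert(node.hi, key)
    elif len(key) > 1:
        node.eq = insert(node.eq, key[1:])
    else:
        node.is_end = True
    return node


def search(root: Optional[Node], query: str) -> bool:
    """Return True if `query` was inserted as a complete key."""
    if not query:
        return False
    node, idx = root, 0            # current node and current character
    while node is not None:
        ch = query[idx]
        if ch < node.char:
            node = node.lo
        elif ch > node.char:
            node = node.hi
        elif idx == len(query) - 1:
            return node.is_end     # last character matched
        else:
            idx += 1               # consume one character, follow equal kid
            node = node.eq
    return False


root = None
for word in ["cute", "cup", "at", "as", "he", "us", "i"]:
    root = insert(root, word)

print(search(root, "cup"), search(root, "cu"))
```

Checking `is_end` on the final node is what distinguishes a stored key such as "cup" from a mere prefix such as "cu".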

Deletion

The delete operation consists of searching for a key string in the search tree and finding a node, called firstMid in the pseudocode below, such that the path from the middle child of firstMid to the end of the search path for the key string has no left or right children. This would represent a unique suffix in the ternary tree corresponding to the key string. If there is no such path, this means that the key string is either fully contained as a prefix of another string, or is not in the search tree. Many implementations make use of an end-of-string character to ensure only the latter case occurs. The path is then deleted from firstMid.mid to the end of the search path. In the case that firstMid is the root and the root has no left or right children, the key string must have been the last string in the tree, and thus the root is set to null after the deletion.

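function delete(string key) is
    if is_empty(key) then
        return

    // key[length(key)] is taken to be the end-of-string character
    node p := root
    int idx := 0

    node firstMid := null
    while p is not null do
        if key[idx] < p.splitchar then
            firstMid := null
            p := p.left
        else if key[idx] > p.splitchar then
            firstMid := null
            p := p.right
        else
            firstMid := p
            idx := idx + 1
            p := p.mid
            // follow the key only through nodes that have no left or right children
            while p is not null and key[idx] == p.splitchar and p.left is null and p.right is null do
                idx := idx + 1
                p := p.mid

    if firstMid == null then
        return // No unique string suffix

    // At this point, firstMid points to the node before the string's unique suffix occurs.
    // If the end-of-string node itself branches or was reached through a left or right
    // link, firstMid.mid is null and nothing is removed.
    node q := firstMid.mid
    firstMid.mid := null // disconnect suffix from tree
    while q is not null do // walk down suffix path and delete nodes
        p := q
        q := q.mid
        delete(p) // free memory associated with node p
    if firstMid == root and root.left is null and root.right is null then
        delete(root) // the tree held only this key
        root := null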

Running time

The running time of ternary search trees varies significantly with the input. Ternary search trees run best when given several similar strings, especially when those strings share a common prefix. Alternatively, ternary search trees are effective when storing a large number of relatively short strings (such as words in a dictionary).^[1] Running times for ternary search trees are similar to binary search trees, in that they typically run in logarithmic time, but can run in linear time in the degenerate (worst) case. Further, the size of the strings must also be kept in mind when considering runtime. For example, in the search path for a string of length k, there will be k traversals down middle children in the tree, as well as a logarithmic number of traversals down left and right children in the tree. Thus, in a ternary search tree on a small number of very large strings, the lengths of the strings can dominate the runtime.^[4]

Time complexities for ternary search tree operations:^[1]

Operation	Average-case running time	Worst-case running time
Lookup	O(log n + k)	O(n + k)
Insertion	O(log n + k)	O(n + k)
Delete	O(log n + k)	O(n + k)
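The gap between the average and worst case can be illustrated by comparing insertion orders. The Python sketch below is an assumption-laden illustration: the names `Node`, `insert`, `build` and `height` are hypothetical, and the 26 one-character keys and the fixed shuffle seed are chosen only for the demonstration. It measures the height of a tree built from sorted keys and of one built from the same keys in shuffled order.

```python
import random
import string
from typing import Iterable, Optional


class Node:
    def __init__(self, char: str) -> None:
        self.char = char
        self.lo: Optional["Node"] = None
        self.eq: Optional["Node"] = None
        self.hi: Optional["Node"] = None
        self.is_end = False  # assumed flag: a key ends at this node


def insert(node: Optional[Node], key: str) -> Node:
    if node is None:
        node = Node(key[0])
    if key[0] < node.char:
        node.lo = insert(node.lo, key)
    elif key[0] > node.char:
        node.hi = insert(node.hi, key)
    elif len(key) > 1:
        node.eq = insert(node.eq, key[1:])
    else:
        node.is_end = True
    return node


def build(keys: Iterable[str]) -> Optional[Node]:
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def height(node: Optional[Node]) -> int:
    """Number of nodes on the longest root-to-leaf path."""
    if node is None:
        return 0
    return 1 + max(height(node.lo), height(node.eq), height(node.hi))


keys = list(string.ascii_lowercase)   # 26 one-character keys, already sorted
shuffled = keys[:]
random.Random(0).shuffle(shuffled)    # fixed seed so the run is repeatable

sorted_height = height(build(keys))
shuffled_height = height(build(shuffled))
print(sorted_height, shuffled_height)
```

With the sorted order, every key hangs off the previous key's hi kid, so the height equals the number of keys (26 here) and a lookup degenerates into a linear scan. The shuffled order yields a shorter tree.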

Comparison to other data structures

Tries

While being slower than other prefix trees, ternary search trees can be better suited for larger data sets due to their space-efficiency.^[1]

Hash maps

Hash tables can also be used in place of ternary search trees for mapping strings to values. However, hash maps frequently use more memory than ternary search trees (but not as much as tries). Additionally, hash maps are typically slower at reporting a string that is not in the data structure, because they must compare the entire string rather than just the first few characters. There is some evidence that shows ternary search trees running faster than hash maps.^[1] Additionally, hash maps do not allow for many of the uses of ternary search trees, such as near-neighbor lookups.

DAFSAs (deterministic acyclic finite state automata)

If storing dictionary words is all that is required (i.e., storage of information auxiliary to each word is not required), a minimal deterministic acyclic finite state automaton (DAFSA) would use less space than a trie or a ternary search tree. This is because a DAFSA can compress identical branches from the trie which correspond to the same suffixes (or parts) of different words being stored.

Uses

Ternary search trees can be used to solve many problems in which a large number of strings must be stored and retrieved in an arbitrary order. Some of the most common or most useful of these are below:

Any time a trie could be used but a less memory-consuming structure is preferred.^[1]
A quick and space-saving data structure for mapping strings to other data.^[3]
To implement auto-completion.^[2]
As a spell check.^[5]
Near-neighbor searching (of which spell-checking is a special case).^[1]
As a database, especially when indexing by several non-key fields is desirable.^[5]
In place of a hash table.^[5]
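Auto-completion, one of the uses listed above, amounts to finding the node for a prefix and collecting every key in its middle subtree. The following Python sketch shows one way this could be done. The names `Node`, `insert`, `complete` and `_collect` are hypothetical, and the end of a word is assumed to be marked with an `is_end` flag.

```python
from typing import List, Optional


class Node:
    def __init__(self, char: str) -> None:
        self.char = char
        self.lo: Optional["Node"] = None
        self.eq: Optional["Node"] = None
        self.hi: Optional["Node"] = None
        self.is_end = False  # assumed flag: a key ends at this node


def insert(node: Optional[Node], key: str) -> Node:
    if node is None:
        node = Node(key[0])
    if key[0] < node.char:
        node.lo = insert(node.lo, key)
    elif key[0] > node.char:
        node.hi = insert(node.hi, key)
    elif len(key) > 1:
        node.eq = insert(node.eq, key[1:])
    else:
        node.is_end = True
    return node


def _collect(node: Optional[Node], prefix: str, out: List[str]) -> None:
    """Append every key below `node` to `out` in sorted order."""
    if node is None:
        return
    _collect(node.lo, prefix, out)
    if node.is_end:
        out.append(prefix + node.char)
    _collect(node.eq, prefix + node.char, out)
    _collect(node.hi, prefix, out)


def complete(root: Optional[Node], prefix: str) -> List[str]:
    """Return all stored keys that start with `prefix`."""
    results: List[str] = []
    if not prefix:
        _collect(root, "", results)
        return results
    node, idx = root, 0
    while node is not None:              # locate the node for the prefix
        ch = prefix[idx]
        if ch < node.char:
            node = node.lo
        elif ch > node.char:
            node = node.hi
        elif idx == len(prefix) - 1:
            break                        # node matches the last prefix character
        else:
            idx += 1
            node = node.eq
    if node is None:
        return results                   # no stored key has this prefix
    if node.is_end:
        results.append(prefix)           # the prefix itself is a key
    _collect(node.eq, prefix, results)
    return results


root = None
for word in ["cute", "cup", "at", "as", "he", "us", "i"]:
    root = insert(root, word)

print(complete(root, "cu"))
```

Because the traversal visits the lo kid, the node itself, the equal kid and then the hi kid, completions come back in sorted order without a separate sorting step.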

References

1. "Ternary Search Trees". Dr. Dobb's.
2. Ostrovsky, Igor. "Efficient auto-complete with a ternary search tree".
3. Wrobel, Lukasz. "Ternary Search Tree".
4. Bentley, Jon; Sedgewick, Bob. "Ternary Search Tree".
5. Flint, Wally (February 16, 2001). "Plant your data in a ternary search tree". JavaWorld. Retrieved 2020-07-19.

External links

Ternary Search Trees: page with papers (by Jon Bentley and Robert Sedgewick) about ternary search trees and algorithms for "sorting and searching strings"
Ternary Search Tries: a video by Robert Sedgewick
TST.java.html: implementation in Java of a TST by Robert Sedgewick and Kevin Wayne

