“Try to be a rainbow in someone’s cloud” – Maya Angelou
This post is a part of IIT-M CSE “Concept-Sunday” series .
src : https://stackoverflow.com/questions/5434813/longest-prefix-matches-for-urls; indianagrace.org
Well we are back to data-structures and we will be discussing about one of the most simple and useful data structure Trie. Primarily Trie is used for searching strings and is a efficient information retrieval data structure; however despite of its usefulness it has failed to find its place in standard library(C++ at least).
Trie – Definition :
Wikipedia says, a trie, also called a digital tree (radix tree or prefix tree) is a kind of search tree — an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings. Simply said trie is a data structure that stores strings by decomposing them into characters. The characters form a path through a tree until the leaf node of the tree which represents a string uniquely.
Trie – Why:
As a direct application, consider a basic spell checker; for each word in text, spell checker goes through its dictionary to find for existence of corresponding word and flags an error if the word is not found. Tries are very efficient for such existence queries on strings and were designed for this very purpose.
Now if the only motive of trie is to efficiently store strings why cant we simply go for standard binary trees or balanced binary trees. To answer this question lets look at the complexity of search in both the scenarios. If maximum length of the string is W and N is the total number of strings then a height balanced tree would take O (W log(N)) to search for string. Whereas a trie achieves the same in at a complexity of O ( W ) but having said that this speedup comes with a cost – cost in terms of storage(each node in trie consumes considerable space as compared to normal tree node).
From an application like spell checker we obviously expect speed, so the natural choice is trie as we choose time over time-space trade off.
Trie – Search :
From implementation perspective trie consists of nodes and edges. Each node has array(we can use linked list representation instead of array representation to save on space but that will add to search time) of 26 pointers which point to the child nodes so each node can have a maximum of 26 children. These 26 pointers correspond to 26 letters of the alphabet and can be directly indexed for each child character. Therefore a search through a trie is a matter of moving along the tree(starting from root) based on the sequence of characters in the search string. In the above trie to search for “IITM” we start from root move down to find first ‘I’ then another level down to find another ‘I’ and the search completes at forth level.
Similarly during insertion we move down the root based on the characters of string to be inserted and reach a leaf node/non-leaf node where extra nodes needs to be added to compensate for missing characters. For example to add “IITB” we trace down to reach ‘T’ and then add a child node for ‘B’; now ‘T’ has two child nodes for ‘M’ and ‘B’ .
Trie – Applications :
As stated earlier a Trie can be used to search in a dictionary; it can also be used to find number of strings matching a given prefix. Based on these concepts we can build real world applications like spell checker; it can also be used in auto-complete feature commonly found in messaging and mailing applications. Tries have also been studied extensively in ip address lookup algorithms(at router), with its variations like binary trie, path compressed trie, multi-bit trie, etc.
NOTE: Trie can be used in above applications.