“* Try to be a rainbow in someone’s cloud*” – Maya Angelou

This post is a part of IIT-M CSE “** Concept-Sunday**” series .

src : https://stackoverflow.com/questions/5434813/longest-prefix-matches-for-urls; indianagrace.org

Well we are back to data-structures and we will be discussing about one of the most simple and useful data structure *Trie. * Primarily *Trie *is used for searching strings and is a efficient information re* trie*val data structure; however despite of its usefulness it has failed to find its place in standard library(C++ at least).

*Trie* – Definition :

Wikipedia says, a *trie,* also called a digital tree (radix tree or prefix tree) is a kind of search tree — an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings. Simply said *trie* is a data structure that stores strings by decomposing them into characters. The characters form a path through a tree until the leaf node of the tree which represents a string uniquely.

*Trie* – Why:

As a direct application, consider a basic spell checker; for each word in text, spell checker goes through its dictionary to find for existence of corresponding word and flags an error if the word is not found. *Tries *are very efficient for such existence queries on strings and were designed for this very purpose.

Now if the only motive of *trie * is to efficiently store strings why cant we simply go for standard binary trees or balanced binary trees. To answer this question lets look at the complexity of search in both the scenarios. If maximum length of the string is *W *and *N* is the* *total number of strings then a height balanced tree would take *O (W log(N)) *to search for string. Whereas a *trie* achieves the same in at a complexity of *O ( W ) *but having said that this speedup comes with a cost – cost in terms of storage(each node in *trie *consumes considerable space as compared to normal tree node).

From an application like spell checker we obviously expect speed, so the natural choice is *trie *as we choose time over time-space trade off.

*Trie* – Search :

From implementation perspective *trie* consists of nodes and edges. Each node has array(we can use linked list representation instead of array representation to save on space but that will add to search time) of 26 pointers which point to the child nodes so each node can have a maximum of 26 children. These 26 pointers correspond to 26 letters of the alphabet and can be directly indexed for each child character. Therefore a search through a trie is a matter of moving along the tree(starting from root) based on the sequence of characters in the search string. In the above *trie *to search for “IITM” we start from root move down to find first ‘I’ then another level down to find another ‘I’ and the search completes at forth level.

Similarly during insertion we move down the root based on the characters of string to be inserted and reach a leaf node/non-leaf node where extra nodes needs to be added to compensate for missing characters. For example to add “IITB” we trace down to reach ‘T’ and then add a child node for ‘B’; now ‘T’ has two child nodes for ‘M’ and ‘B’ .

*Trie* – Applications :

As stated earlier a *Trie *can be used to search in a dictionary; it can also be used to find number of strings matching a given prefix. Based on these concepts we can build real world applications like spell checker; it can also be used in auto-complete feature commonly found in messaging and mailing applications. *Tries* have also been studied extensively in ip address lookup algorithms(at router), with its variations like binary trie, path compressed trie, multi-bit trie, etc.

NOTE: *Trie* **can** be used in above applications.

EOBP!