“Nothing influences people more than a recommendation from a trusted friend” – Mark Zuckerberg
This post is a part of IIT-M CSE “Concept-Sunday” series .
src : sunkist.com/tips/how-to-slice-oranges/; all-free-download.com/free-vector/half-circle-shape.html
Our topic for today is segment tree – which is a famous tree data structure for range queries and range update problems, typically on an array. We will also look at the concept of lazy propagation associated with segment trees.
Given an array, consider the problem of finding sum of values in a given index range. For example if we have have array of size N, we are supposed to find sum of values between indices i and j ( 0 <= i <= j <= N – 1 , considering array indexing starts from zero). We call such a query for consecutive set of indices as a range query.
Now this problem can simply be solved by iterating from index i to index j and calculating the running sum; in this case update is simply a matter of indexing the array to update required position. However range query here takes O(N) worst case time and O(1) time is required for updation.
Next, another way to solve this problem is by maintaining an extra array which stores sum of values from index 0 to current index (cumulative sum). So the j ‘th index in this new array will have sum of values in original array from [0 – j]. Now in this case range query(sum) from index i to index j can be solved by taking difference of extraArray[j] and extraArray[i] i.e in O(1) time; but update here requires updating the extra array which will require O(N) time.
We can choose either of the approach based on our requirements; if range queries are more than update requests then we go by the second approach, else if update requests are more then we go by the first approach. But the question is can we better ? Solving both query and update in O(1) looks like much of an expectation and against the concept of trade-off. Then, in between O(1) and O(N) is O(log N) which seems reasonable. And the first data structure that comes to our mind after seeing O(log N) is a tree – which always comes to our rescue. Naturally we will turn back to tree data structure which will help us solve both range query and update problem in O(log N) time.
Segment Tree – Definition :
Such a tree that helps us solve range queries and update requests in O(log N) time is a segment tree. Each node in the segment tree stores information about a segment of data(array) or an interval of the array. So that traversal of this tree will give us required information about any segment in logarithmic complexity; also updates in such a tree can be achieved in logarithmic time.
It is also known as a static data structure as structure of tree cannot change once data structure is initialised. This also implies that we cannot add or delete elements from the array, only operation available to us in terms of modification to array is updating an existing element.
For the problem of finding sum over a range, segment tree can be described as :
- Each leaf node stores a value from original array.
- Each internal node stores sum of all leaf nodes in the sub-tree rooted at this internal node.
- Effectively each node stores sum of the values stored in its left and right child.
- Value of the leaf nodes belonging to a sub-tree are consecutively placed in the original array.
Such a tree is a full binary tree and N leaf nodes and (N – 1) internal nodes, where N is the total number of elements in the original array.
In the figure above root node stores sum of entire array[0 : 5] ; its left child stores sum of first half [0 : 2] and right child stores sum of the second half [3 : 4]. We can further divide [0 : 2] into [0 : 1] and [2 : 2]; [3 : 4] can be divided as [3 : 3] and [4 : 4]. Thus each leaf node is responsible for a single value and a level above leaf is responsible for 2 values and so on.
This division approach also tells us how to construct such a segment tree. We compute recursively, the value of each node starting from the root node by partitioning the array into two equal halves each time. Values returned from each half are summed to form value for the current node. The basis condition is hit whenever partition has a single value; this single value forms a leaf of the segment tree.
Since this is a full binary tree and static too, we can use an array representation to store such a segment tree which has its own advantages like easy to understand – level order storage of keys(values). When using array representation, the two child nodes of a node array[i] can be accessed using array[2 * i] and array[2 * i + 1] (nodes are stored from index 1 and not 0); parent of array[i] is array[i / 2].
Segment Tree Creation:
- A Segment Tree can be built using recursion (bottom-up approach ).
- Start with the leaves and go up to the root and update the corresponding changes in the nodes that are in the path from leaves to root (leaves represent a single element).
- In each step, the data of two children nodes are used to form an internal parent node.
- Each internal node will represent a union of its children’s intervals.
- Merging may be different for different questions. For example in case of finding minimum each internal node will contain the minimum of its child values, whereas in case of finding sum for given range each internal node will contain the sum of the values of its children.
- So, recursion will end up at the root node which will represent the whole array.
The two operations supported by segment tree are :
Range Query(startIndex, endIndex) : Here we sum the value of all the nodes that only cover leaf(values) within given range.
Update(index, newVal) : Here we update value of all the nodes that have given index in its range(Note that each node in segment tree is responsible for a range of values).
Lazy Update in Segment trees :
Another concept that is closely associated with segment tree is lazy update which is a valuable addition to range update. If there are updates on a range of array indexes, for example if value x is to be added to all values at indexes from i to j in array, then the update function has to be called for every index from i to j.
The key idea is to postpone some update operations until they are actually needed. Because we recursively perform update operations till we reach the leaf node, we can stop at a node when all the nodes in the sub-tree rooted at this node needs to be updated. This case arises when range of the node is completely within the desired update range.
In lazy update we create a lazy array lazy which has same size as that of the segment tree size, and its values are initialised to 0 which indicates no pending updates. If the value of lazy[x] > 0, it indicates that there is a pending update on x th index of segment tree and that update value is equal to lazy[x].
If the range of updation is [i, j] such that 0 < i < j < N – 1.
- If current segment tree node has any pending update, that is if lazy has some values > 0, then first add that pending update to current node.
- If current node’s range lies completely in update query range
- Update the current node.
- Postpone updates to children by setting lazy value for children nodes in the lazy array.
- If current node’s range overlaps with update range, follow the same approach as that of simple update.
- Recur for left and right children.
- Update current node using results of left and right calls.