Skip to main content

Segment Tree

Segment tree can be viewed as an abstract data structure which using some more space to trade for speed. For example, for a typical question with O(N^2) time complexity, the segment tree method can decrease it to O(N*log(N)). 

To make it understandable, let us consider one example.

Say we have an integer array of N size, and what we want is to query the maximum with a query range [idx1, idx2], where idx1 is the left indexes, and idx2 is the right indexes inclusive.

If we only do this kind of query once, then we just need to scan through the array from idx1 to idx2 once, and record the maximum, done. The time complexity is O(N), which is decent enough in most cases even though it is not the optimal one (for example, with a segment tree built, the time complexity can decrease down to O(log(N))).

However, how about we need to query the array N times?

If we continue to use the naïve way above, then the time complexity is O(N^2), since for each query we need to scan the query range once.

We can do faster with a segment tree.

The key point is that (one easier implementation version): 

1. if the array in the question has N elements, then we need to create a helper array with size of 2*N;

2. initial the second half of the helper array as the array in the question, or

    helper[i] = arr[i-n], for i in the range of [N, 2*N-1]

3. initial the first half of the helper array based on the second half with the formula below:

    helper[i] = max(helper[2*i], helper[2*n+1]), for i in the range of [N-1, 1]

For step 3, basically

a). the order has to be from N-1 to 1, since we initial the helper array from the second half, or from bigger ids to smaller ones; (or we have to initial the larger ids first)

b): helper[0] is useless (why?)

c): this structure is like a heap (with the root as the maximum) if you know heap previously

After we have the helper array, we do not have to scan the original array for every query, since we have saved the maximum information for some ranges already. For example, if asking for the maximum for the whole range, then we just need to return helper[1] directly.

Then how about a query with a random range [x, y]?

In the helper array, the question becomes to find the maximum in rang of [x+n, y+n].

Then we will try our best Not to scan the array, but to use the saved maximum in the smaller ids. There will some boundary conditions we have to deal with. The key logic is below:

int query(vector<int>& helper, int x, int y) {
    // assume all integers are non-negative, and x and y are valid
    int res = 0, m = data.size(), n = m/2;
    for(int i = x + n, j = y + n; i <= j; i /= 2, j /= 2) {
        if(i&1) res = max(res, helper[i++]);
        if(!(j&1)) res = max(res, helper[j--]);
    }
    return res;
}

How to understand the logic above?

If the left end is odd, then we need to take it into account before we shrinking the range by dividing the left (id+1) by 2 (why? since if the left end is odd, the maximum information stored at (i+1)/2 will Not include it, so we have to consider it before shrinking the range by almost half);

Similarly, if the right end is even, then we need to take it into account before we shrinking the range by dividing the right (id-1) by 2 (why? since if the right end is even, the maximum information stored at (j-1)/2 will Not include it, so we have to consider it before shrinking the range by almost half).

In this way, we take advantage of the helper array by shrinking the search range almost half for each loop step, so the time complexity is O(log(N)).

Thus if we query N times, then the overall time complexity is O(N*log(N)).

Similar to query, we can do update with time complexity of O(log(N)) as well. This is quite useful since it handles a dynamic situation: the helper array is changing, but the time complexity is still the same. The basically idea is to update the helper array bottom-up (or from bigger ids to smaller ids, until reaching to the root). I will not give the sample code here, but will leave it in the sample questions.

Hopefully now you could know why I call "segment tree" an abstract data structure in the beginning. Please leave your comments if you have some. Happy coding!


Question List


Upper Layer



Comments

Popular posts from this blog

Binary Search - Hard Level - Question 3

Binary Search - Hard Level - Question 3 878. Nth Magical Number A positive integer is magical if it is divisible by either a or b. Given the three integers n, a, and b, return the nth magical number. Since the answer may be very large, return it modulo 10^9 + 7. Analysis: Let us consider some examples first. Example 1, a = 4, b = 2. If b is dividable by a, then all the numbers which is dividable by a should be dividable by b as well. So the nth magical number should be n*b; Example 2, a = 3, b = 2. The multiples of 2 are: 2, 4, 6, 8, 10, 12, ... The multiple of 3 are: 3, 6, 9, 12, ... So the overlap is related to the minimum common multiple between a and b, and we need to remove the overlap which is double-counted. So now, we make some conclusions: 1. the upper bound of the nth magical number should be n*b, where a is the smaller one (or b <= a); 2. there are n*b/a magical numbers smaller than n*b; 3. there are n*b/(minimum common multiple) overlaps. Thus, the overall count is: n +

Recursion - Example

Recursion - Example Leetcode 231  Power of Two Given an integer n, return true if it is a power of two. Otherwise, return false. An integer n is a power of two, if there exists an integer x such that n == 2^x Constraints: -2^31 <= n <= 2^31 - 1 Analysis: One way is to think about this question recursively: if n%2 == 1, then n must not be power of 2; if not, then we just need to consider whether (n/2) is a power of 2 or not. This is exactly the "same question with a smaller size"! It is trivial to figure out the base cases: if n == 0, return false; if n == 1, return true. See the code below: class Solution { public: bool isPowerOfTwo(int n) { // base cases if(n == 0) return false; if(n == 1) return true; // converging if(n%2 == 1) return false; return isPowerOfTwo(n/2); } }; If interested, there are some other ways to solve this problem. For example, using bit manipulation, we can have the following solution: class