Segment Tree

Segment tree can be viewed as an abstract data structure which using some more space to trade for speed. For example, for a typical question with O(N^2) time complexity, the segment tree method can decrease it to O(N*log(N)).

To make it understandable, let us consider one example.

Say we have an integer array of N size, and what we want is to query the maximum with a query range [idx1, idx2], where idx1 is the left indexes, and idx2 is the right indexes inclusive.

If we only do this kind of query once, then we just need to scan through the array from idx1 to idx2 once, and record the maximum, done. The time complexity is O(N), which is decent enough in most cases even though it is not the optimal one (for example, with a segment tree built, the time complexity can decrease down to O(log(N))).

However, how about we need to query the array N times?

If we continue to use the naïve way above, then the time complexity is O(N^2), since for each query we need to scan the query range once.

We can do faster with a segment tree.

The key point is that (one easier implementation version):

1. if the array in the question has N elements, then we need to create a helper array with size of 2*N;

2. initial the second half of the helper array as the array in the question, or

helper[i] = arr[i-n], for i in the range of [N, 2*N-1]

3. initial the first half of the helper array based on the second half with the formula below:

helper[i] = max(helper[2*i], helper[2*n+1]), for i in the range of [N-1, 1]

For step 3, basically

a). the order has to be from N-1 to 1, since we initial the helper array from the second half, or from bigger ids to smaller ones; (or we have to initial the larger ids first)

b): helper[0] is useless (why?)

c): this structure is like a heap (with the root as the maximum) if you know heap previously

After we have the helper array, we do not have to scan the original array for every query, since we have saved the maximum information for some ranges already. For example, if asking for the maximum for the whole range, then we just need to return helper[1] directly.

Then how about a query with a random range [x, y]?

In the helper array, the question becomes to find the maximum in rang of [x+n, y+n].

Then we will try our best Not to scan the array, but to use the saved maximum in the smaller ids. There will some boundary conditions we have to deal with. The key logic is below:

int query(vector<int>& helper, int x, int y) {
    // assume all integers are non-negative, and x and y are valid
    int res = 0, m = data.size(), n = m/2;
    for(int i = x + n, j = y + n; i <= j; i /= 2, j /= 2) {
        if(i&1) res = max(res, helper[i++]);
        if(!(j&1)) res = max(res, helper[j--]);
    }
    return res;
}

How to understand the logic above?

If the left end is odd, then we need to take it into account before we shrinking the range by dividing the left (id+1) by 2 (why? since if the left end is odd, the maximum information stored at (i+1)/2 will Not include it, so we have to consider it before shrinking the range by almost half);

Similarly, if the right end is even, then we need to take it into account before we shrinking the range by dividing the right (id-1) by 2 (why? since if the right end is even, the maximum information stored at (j-1)/2 will Not include it, so we have to consider it before shrinking the range by almost half).

In this way, we take advantage of the helper array by shrinking the search range almost half for each loop step, so the time complexity is O(log(N)).

Thus if we query N times, then the overall time complexity is O(N*log(N)).

Similar to query, we can do update with time complexity of O(log(N)) as well. This is quite useful since it handles a dynamic situation: the helper array is changing, but the time complexity is still the same. The basically idea is to update the helper array bottom-up (or from bigger ids to smaller ids, until reaching to the root). I will not give the sample code here, but will leave it in the sample questions.

Hopefully now you could know why I call "segment tree" an abstract data structure in the beginning. Please leave your comments if you have some. Happy coding!

Question List

Upper Layer

Algo Advance

Search This Blog

Segment Tree

Comments

Post a Comment

Popular posts from this blog

Brute Force - Question 2

Greedy Algorithm - Question List

Dynamic Programming - Easy Level - Question 1