
Binary Search - Hard Level - Question 2



Leetcode 727 Minimum Window Subsequence

Given strings S and T, find the minimum (contiguous) substring W of S, so that T is a subsequence of W.

If there is no such window in S that covers all characters in T, return the empty string "". If there are multiple such minimum-length windows, return the one with the left-most starting index.

Note:

All the strings in the input will only contain lowercase letters.

The length of S will be in the range [1, 20000].

The length of T will be in the range [1, 100].


Analysis:


A natural first step is to ask: how do we check whether one string is a subsequence of another?

We can use the two-pointer method. One pointer walks through the first string, and the other points at the next unmatched character of the second string (the one to be matched), advancing only when the two characters are equal. If the second pointer reaches the end of the second string, the second string is a subsequence of the first.

Let M be the length of S and N the length of T. This check takes O(M + N) time. But the question asks for the shortest of all valid substrings. If we repeat the same forward scan starting at each character of S, the time complexity becomes O(M^2), which is too slow for M up to 20000.

One improvement is to record, for each character, the sorted list of its positions in S.

With these position lists, we can anchor the search at either the first character of T or the last one. Here we anchor at the last character: pick one occurrence of T's last character in S as the right end of the window. Then use binary search to find the occurrence of T's second-to-last character that is closest to it while lying strictly to its left. If it exists, move on to the third-to-last character of T, and so on, until all characters of T are matched. Because each step keeps the matched position as far right as possible, this greedy walk yields the shortest substring of S that ends at this particular occurrence and contains T as a subsequence (S may contain T's last character several times). Each such walk takes O(N*logM) time.

Why does binary search work?

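1. The position list of each character is sorted, because it is built by scanning S from left to right;

2. At each step we want the closest occurrence strictly before the current position, i.e. the largest element smaller than it, which is the element right before the lower_bound result.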

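To find the globally shortest substring, we repeat this walk for every occurrence of T's last character in S and keep the shortest window found. Building the position lists takes O(M), so the total time complexity is O(M + K*N*logM), where K is the number of occurrences of T's last character in S; in the worst case K = M, giving O(M*N*logM). With M = 20000 and N = 100, that is roughly 20000 * 100 * 15, about 3*10^7 steps, which is fast enough.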


See the code below:


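```cpp
#include <algorithm>
#include <string>
#include <vector>
using namespace std;

string findShortestStr(const string& s, const string& t) {
	int m = s.size(), n = t.size(), start = -1, len = 0;
	// ids[c] holds the positions of character c in s, in increasing order
	vector<vector<int>> ids(26);
	for(int i=0; i<m; ++i) {
		ids[s[i]-'a'].push_back(i);
	}
	// try every occurrence of t's last char as the right end of the window
	for(auto &id : ids[t[n-1]-'a']) {
		int idx = id;
		bool find = true;
		// walk t backwards, each time jumping to the closest occurrence
		// of t[i] strictly before idx
		for(int i=n-2; i>=0; --i) {
			int x = t[i] - 'a';
			int p = lower_bound(ids[x].begin(), ids[x].end(), idx) - ids[x].begin();
			if(p==0) {
				// no t[i] before idx, so no window can end at id
				find = false;
				break;
			}
			idx = ids[x][p - 1];
		}
		// only a strictly shorter window replaces the current best,
		// so among equal lengths the left-most one is kept
		if(find) {
			if(len == 0 || len > id - idx + 1) {
				start = idx;
				len = id - idx + 1;
			}
		}
	}
	return start == -1 ? "" : s.substr(start, len);
}
```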

Note:

There are other methods for this problem, such as dynamic programming or a pure two-pointer method.

