
Rolling Hash - Question 2

1316. Distinct Echo Substrings

Return the number of distinct non-empty substrings of text that can be written as the concatenation of some string with itself (i.e. it can be written as a + a where a is some string).

Constraints:

1 <= text.length <= 2000

text has only lowercase English letters.


Analysis:


There are O(N^2) substrings in total, and checking whether a single substring is an echo (its first half equals its second half) takes O(N). So the overall time complexity of the brute-force approach is O(N^3).

The longest test case may have N = 2000, so N^3 = 8 * 10^9, which is too large.
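For reference, the brute-force check described above might look like the following sketch. The function name `countEchoBruteForce` is hypothetical (it is not part of the original solution), and the code is meant only as a baseline to compare faster approaches against on small inputs.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical baseline: test every even-length substring directly.
// Roughly O(N^3) character comparisons, so only suitable for small inputs.
int countEchoBruteForce(const std::string& text) {
    std::unordered_set<std::string> seen;  // distinct echo substrings found so far
    const int n = static_cast<int>(text.size());
    for (int start = 0; start < n; ++start) {
        for (int half = 1; start + 2 * half <= n; ++half) {
            // Compare text[start, start+half) with text[start+half, start+2*half).
            if (text.compare(start, half, text, start + half, half) == 0) {
                seen.insert(text.substr(start, 2 * half));
            }
        }
    }
    return static_cast<int>(seen.size());
}
```

Because it stores the substrings themselves, this version has no hash collisions, which makes it a convenient correctness check.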

We need to find a faster approach.

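One way is to use a rolling hash. For a fixed half-length, we scan the string from left to right once with two pointers, each maintaining the hash of a window of that length.

The first pointer marks the end of the first half; the second marks the end of the second half. If the two windows give the same hash value, we treat them as the same string. (Equal hashes do not strictly guarantee equal strings, so this check is probabilistic.) Each comparison is then O(1), so the overall time complexity is O(N^2).

To avoid counting the same substring more than once, we store the hash of each match in a set.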

See the code below,


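#include <string>
#include <unordered_set>
using namespace std;

class Solution {
public:
    int distinctEchoSubstrings(string text) {
        unordered_set<int> st;  // hashes of the repeated half; every value is < mod, so it fits in int
        int n = text.size();
        // i is the length of one half
        for(int i=1; i<=n/2; ++i) {
            // h1: hash of the window ending at j; h2: hash of the window ending at k = j + i
            long long h1 = 0, h2 = 0, p = 199, mod = 1e9+7, r = 1;
            for(int j=0, k=i; k<n; ++j, ++k) {
                if(j<i) r = r*p%mod;  // after i steps, r = p^i % mod
                h1 = (h1*p%mod + text[j]-'a'+1)%mod;
                h2 = (h2*p%mod + text[k]-'a'+1)%mod;
                if(j>=i) {
                    // remove the character that just left each window
                    h1 = (h1 + mod - (text[j-i]-'a'+1)*r%mod)%mod;
                    h2 = (h2 + mod - (text[k-i]-'a'+1)*r%mod)%mod;
                }
                // both windows are full once j >= i-1; equal hashes are treated as equal halves
                if(j>=i-1 && h1 == h2) {
                    // the echo substring is text.substr(j-i+1, i*2)
                    st.insert(h1);
                }
            }
        }
        return st.size();
    }
};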
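The same hashing idea can also be written with prefix hashes. The sketch below is one possible variant, not the solution above: the function name `countEchoPrefixHash` is hypothetical, and keying the set on the pair (half-length, hash) is an added assumption meant to keep halves of different lengths from being merged by accident. The base 199 and modulus 10^9 + 7 are taken from the code above. Like any single-modulus hash, it can still produce false matches in principle.

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Sketch: prefix hashes plus a set keyed by (half-length, hash of the half).
int countEchoPrefixHash(const std::string& text) {
    const std::int64_t base = 199, mod = 1000000007;
    const int n = static_cast<int>(text.size());
    // prefix[i] = hash of text[0, i); power[i] = base^i % mod
    std::vector<std::int64_t> prefix(n + 1, 0), power(n + 1, 1);
    for (int i = 0; i < n; ++i) {
        prefix[i + 1] = (prefix[i] * base + (text[i] - 'a' + 1)) % mod;
        power[i + 1] = power[i] * base % mod;
    }
    // Hash of text[l, l + len) in O(1).
    auto hashOf = [&](int l, int len) {
        return (prefix[l + len] - prefix[l] * power[len] % mod + mod) % mod;
    };
    std::unordered_set<std::uint64_t> seen;
    for (int half = 1; 2 * half <= n; ++half) {
        for (int start = 0; start + 2 * half <= n; ++start) {
            const std::int64_t left = hashOf(start, half);
            if (left == hashOf(start + half, half)) {
                // Pack the half-length into the high 32 bits and the hash into the low 32 bits.
                seen.insert((static_cast<std::uint64_t>(half) << 32) |
                            static_cast<std::uint64_t>(left));
            }
        }
    }
    return static_cast<int>(seen.size());
}
```

The trade-off is O(N) extra memory for the two tables in exchange for a simpler inner loop with no window bookkeeping.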


If we do not want to use a rolling hash, we can instead keep a counter of how many consecutive positions have matched.

If the two pointers point to elements with the same value, the counter is increased by 1.

If the two pointers point to two different elements, we reset the counter to 0.

If the counter equals the half-length, we have found one valid answer; before moving on to the next element, we just need to decrease the counter by 1.


See the code below,

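#include <string>
#include <unordered_set>
using namespace std;

class Solution {
public:
    int distinctEchoSubstrings(string text) {
        unordered_set<string> st;  // the repeated half of each echo substring found
        int n = text.size();
        // i is the length of one half
        for(int i=1; i<=n/2; ++i) {
            int ct = 0;  // number of consecutive positions with text[j] == text[k]
            for(int j=0, k=i; k<n; ++j, ++k) {
                if(text[j] == text[k]) ++ct;
                else ct = 0;
                if(ct == i) {
                    // text[j-i+1 .. j] equals text[k-i+1 .. k]; the half identifies the echo substring
                    st.insert(text.substr(j-i+1, i));
                    --ct;  // slide the window forward by one position
                }
            }
        }
        return st.size();
    }
};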
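To watch the counter in isolation, the following sketch returns the start index of every echo substring for a single half-length. The helper name `echoStartsForLength` is hypothetical, and it assumes `len >= 1`.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: start indices of every echo substring whose half has length `len`.
// Assumes len >= 1.
std::vector<int> echoStartsForLength(const std::string& text, int len) {
    std::vector<int> starts;
    const int n = static_cast<int>(text.size());
    int run = 0;  // consecutive positions with text[j] == text[j + len]
    for (int j = 0, k = len; k < n; ++j, ++k) {
        run = (text[j] == text[k]) ? run + 1 : 0;
        if (run == len) {
            starts.push_back(j - len + 1);  // text[j-len+1 .. k] is an echo substring
            --run;                          // slide the window forward by one position
        }
    }
    return starts;
}
```

For "abcabcabc" with `len = 3`, the counter reaches 3 at four consecutive positions, giving starts 0, 1, 2 and 3. Starts 0 and 3 both yield "abcabc", which is why the full solution stores the matches in a set.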




