
Union-Find - Question 1

1998. GCD Sort of an Array

You are given an integer array nums, and you can perform the following operation any number of times on nums:

Swap the positions of two elements nums[i] and nums[j] if gcd(nums[i], nums[j]) > 1 where gcd(nums[i], nums[j]) is the greatest common divisor of nums[i] and nums[j].

Return true if it is possible to sort nums in non-decreasing order using the above swap method, or false otherwise.

Constraints:

1 <= nums.length <= 3 * 10^4

2 <= nums[i] <= 10^5


Analysis:

One key piece of information is that we can swap as many times as needed, so the crux is how to find the pairs or groups of elements that can be swapped.

Let's say we have three elements: A, B and C. If A and B can swap, and B and C can swap, then A, B, and C can be rearranged into any order. For example, swapping A with B, then B with C, then A with B again exchanges A and C while leaving B in place. It is easy to extend this property to groups with more than 3 elements.

More generally: say we have a group of elements that can be rearranged freely among themselves, and a new element X. If X can swap with one element of the group, then X can effectively be exchanged with any element of the group.

So this question becomes a union-find problem: find the connected groups.
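For readers who have not seen union-find before, here is a minimal sketch of the data structure over the indices 0..n-1. The name DisjointSet and its methods are chosen for illustration only; the solution later in this post uses a map-based variant instead.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Minimal union-find (disjoint set union) over the indices 0..n-1.
struct DisjointSet {
    std::vector<int> parent, size;

    explicit DisjointSet(int n) : parent(n), size(n, 1) {
        std::iota(parent.begin(), parent.end(), 0);  // every index starts as its own root
    }

    // Returns the representative of x's group, shortening the path on the way.
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // path halving
            x = parent[x];
        }
        return x;
    }

    // Merges the groups of a and b; returns false if they were already together.
    bool unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return false;
        if (size[a] < size[b]) std::swap(a, b);  // attach the smaller tree to the larger
        parent[b] = a;
        size[a] += size[b];
        return true;
    }
};
```

With A, B, C mapped to 0, 1, 2, calling unite(0, 1) and unite(1, 2) puts all three in one group, so find(0) == find(2) even though A and C were never united directly.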

We will discuss how to find the connected groups later (it requires some care about time complexity). Suppose for now that we have found them. Inside each group we can swap as many times as needed, so we can sort the elements of each group within the positions that group occupies, which is the best we can do with swapping. Then we check whether the whole array is sorted: return true if yes; otherwise, return false.

For example, take [8, 9, 4, 2, 3]. There are two groups: [8, 4, 2] (at indexes 0, 2 and 3) and [9, 3] (at indexes 1 and 4). With an unlimited number of swaps, the first group can become [2, 4, 8] and the second [3, 9]. The whole array then becomes [2, 3, 4, 8, 9], so we return true.
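The "sort each group, then check the whole array" step can be sketched on its own, assuming the groups are already known. Here groupId is a hypothetical input holding one label per index, and sortableWithinGroups is a name made up for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <vector>

// groupId[i] is an assumed, precomputed label of the connected group of index i.
// nums is taken by value so the caller's array is left untouched.
bool sortableWithinGroups(std::vector<int> nums, const std::vector<int>& groupId) {
    assert(nums.size() == groupId.size());

    // label -> indices of that group, collected in increasing index order
    std::unordered_map<int, std::vector<int>> members;
    for (int i = 0; i < (int)nums.size(); ++i) members[groupId[i]].push_back(i);

    for (auto& entry : members) {
        const std::vector<int>& idx = entry.second;
        std::vector<int> vals;
        for (int i : idx) vals.push_back(nums[i]);
        std::sort(vals.begin(), vals.end());
        // write the sorted values back into the positions this group occupies
        for (std::size_t k = 0; k < idx.size(); ++k) nums[idx[k]] = vals[k];
    }
    return std::is_sorted(nums.begin(), nums.end());
}
```

For the example above, sortableWithinGroups({8, 9, 4, 2, 3}, {0, 1, 0, 0, 1}) returns true, while {5, 2, 6, 2} with labels {0, 1, 1, 1} returns false because 5 is stuck at index 0.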

Now let us look at how to find the connected groups.

The naïve way is to use a two-layer for loop and check the gcd of every pair of elements. But this requires O(N^2) time, where N is the number of elements. In this question N can be as large as 3 * 10^4, so N^2 is about 10^9, which is too large. (The time complexity of gcd is treated as ~O(1) here.)
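For reference, the naïve pairwise approach could look like the sketch below. It is only meant as a baseline for small inputs; the names gcdOf, findRoot and groupsByPairwiseGcd are made up here, and Euclid's algorithm is written out by hand so the sketch does not depend on a particular C++ standard version.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Euclid's algorithm for the greatest common divisor.
int gcdOf(int a, int b) {
    while (b != 0) {
        int r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Union-find lookup with path halving on a plain parent array.
int findRoot(std::vector<int>& parent, int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

// Returns one group label per index; performs O(N^2) gcd computations.
std::vector<int> groupsByPairwiseGcd(const std::vector<int>& nums) {
    int n = (int)nums.size();
    std::vector<int> parent(n);
    std::iota(parent.begin(), parent.end(), 0);
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            if (gcdOf(nums[i], nums[j]) > 1) {
                int ri = findRoot(parent, i);
                int rj = findRoot(parent, j);
                parent[ri] = rj;  // merge the two groups
            }
        }
    }
    std::vector<int> label(n);
    for (int i = 0; i < n; ++i) label[i] = findRoot(parent, i);
    return label;
}
```

On [8, 9, 4, 2, 3] this puts indexes 0, 2, 3 under one label and indexes 1, 4 under another.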

We can do better: we can look at the prime factors of each number. If two numbers share at least one prime factor, then their gcd is greater than 1, so they are connected, i.e., they belong to the same group.

The good news is that there are not many prime factors to consider. The maximum value of the elements is 10^5, so we only need the primes in this range. A rough estimate is that ~10% of the numbers are primes, so there are about 10^4 primes (the exact count below 10^5 is 9,592).
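As a sanity check on the 10% estimate, a standard sieve of Eratosthenes can count the primes directly. The following is a minimal sketch; the function name primesUpTo is made up for illustration and is not the helper used in the solution code.

```cpp
#include <cassert>
#include <vector>

// Sieve of Eratosthenes: returns all primes in [2, limit] in increasing order.
std::vector<int> primesUpTo(int limit) {
    std::vector<int> primes;
    if (limit < 2) return primes;
    std::vector<bool> composite(limit + 1, false);
    for (int i = 2; i <= limit; ++i) {
        if (composite[i]) continue;
        primes.push_back(i);
        // Smaller multiples of i were already crossed out by smaller primes.
        for (long long j = 1LL * i * i; j <= limit; j += i) composite[j] = true;
    }
    return primes;
}
```

primesUpTo(100000).size() evaluates to 9592, which is close to the 10^4 estimate.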

If, for each element, we scan through all the primes to see whether each one divides it, then the time complexity is O(N * 10^4), which could reach 3 * 10^8 and is still too large.

One improvement is that we do not need to scan through all the primes. For example, if we find that a number X has a prime factor p1, we can divide all copies of p1 out of X and then check whether the remainder is itself a prime. If yes, we can stop the search; if no, we continue with the remainder, starting from the primes larger than p1. We can also stop as soon as the square of the current prime exceeds the remainder. This converges much faster than scanning all the primes: only primes up to sqrt(10^5) ≈ 316 are ever tried, and there are just 65 of them, so each element needs at most about 65 trial divisions and usually far fewer.

The overall cost is therefore at most roughly N * 65, i.e. about 2 * 10^6 trial divisions for the largest case (each accompanied by an O(log P) map lookup, where P ≈ 10^4 is the number of primes). This time complexity is acceptable for the online judge.
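The trial division described above can be sketched in isolation as follows. The function name distinctPrimeFactors is made up, and the sketch assumes the caller passes a sorted list of primes that covers every prime up to sqrt(x).

```cpp
#include <cassert>
#include <vector>

// Returns the distinct prime factors of x in increasing order.
// Assumption: `primes` is sorted and contains every prime <= sqrt(x).
std::vector<int> distinctPrimeFactors(int x, const std::vector<int>& primes) {
    std::vector<int> factors;
    for (int p : primes) {
        if (1LL * p * p > x) break;      // what is left of x is 1 or a single prime
        if (x % p != 0) continue;
        factors.push_back(p);
        while (x % p == 0) x /= p;       // strip every copy of p
    }
    if (x > 1) factors.push_back(x);     // leftover prime larger than the last one tried
    return factors;
}
```

For 84 = 2^2 * 3 * 7 the loop tries only 2 and 3, stops at 5 because 25 > 7, and reports the leftover 7 as the last factor.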

To summarize how elements are connected into groups: if one element has prime factors A, B, C, ..., then A, B, C, ... are connected. In other words, the element can be viewed as a bridge that connects all of its prime factors.

One implementation detail of the union-find algorithm here is that the prime factors are not contiguous, so a vector indexed by prime is not a natural data structure for recording the roots of the groups. Instead, a map is used. (Why not unordered_map? Because we also need to iterate over the primes in increasing order, which keeps the prime-factor search fast.)
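As an aside, since the values are bounded by 10^5, a dense vector of that size is also affordable. The sketch below is an alternative formulation, not the approach of this post: it runs union-find over the values themselves (hypothetical names gcdSortDense and findRoot), links each number to its prime factors, and then compares the array with its sorted copy instead of sorting each group separately. It assumes every element is at least 2, as the constraints state.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

namespace {
// Union-find lookup with path halving on a dense parent array.
int findRoot(std::vector<int>& parent, int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}
}  // namespace

bool gcdSortDense(const std::vector<int>& nums) {
    if (nums.empty()) return true;
    int maxVal = *std::max_element(nums.begin(), nums.end());
    std::vector<int> parent(maxVal + 1);  // one slot per value 0..maxVal
    std::iota(parent.begin(), parent.end(), 0);

    for (int v : nums) {
        assert(v >= 2);  // assumption taken from the problem constraints
        int rest = v;
        // Any p that divides rest here is prime, because smaller factors were stripped.
        for (int p = 2; 1LL * p * p <= rest; ++p) {
            if (rest % p != 0) continue;
            int rv = findRoot(parent, v);
            int rp = findRoot(parent, p);
            parent[rv] = rp;               // link the number to its prime factor
            while (rest % p == 0) rest /= p;
        }
        if (rest > 1) {                    // leftover prime factor (possibly v itself)
            int rv = findRoot(parent, v);
            int rr = findRoot(parent, rest);
            parent[rv] = rr;
        }
    }

    std::vector<int> sorted(nums);
    std::sort(sorted.begin(), sorted.end());
    for (std::size_t i = 0; i < nums.size(); ++i) {
        // The value that belongs at position i must be reachable from the current one.
        if (findRoot(parent, nums[i]) != findRoot(parent, sorted[i])) return false;
    }
    return true;
}
```

The trade-off is memory proportional to the maximum value rather than to the number of primes, in exchange for O(1) indexing instead of map lookups.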


See the code below:


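#include <algorithm>
#include <map>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    bool gcdSort(vector<int>& nums) {
        // val: maximum element, which bounds the primes we need
        int n = nums.size(), val = nums[0];
        for(int i=0; i<n; ++i) {
            val = max(nums[i], val);
        }
        vector<int> ps = primes(val + 1);
        map<int, int> mp; // union-find parent map keyed by prime, iterated in increasing order
        for(auto &a : ps) mp[a] = a; // every prime starts as its own root
        vector<int> ids(n, -1);// ids[i]: a prime in the same group as nums[i]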
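        for(int i=0; i<n; ++i) {
            // first: root of the first prime factor found; copy: part of nums[i] not yet factored
            int first = -1, copy = nums[i];
            for(auto &a : mp) {
                // the remainder is itself a prime: link it to the earlier factors and stop
                if(mp.count(copy) > 0) {
                    ids[i] = find(mp, copy);
                    if(first != -1)  mp[find(mp, copy)] = find(mp, first);
                    break;
                }
                int k = a.first;
                if(k*k > copy) break; // nothing left to factor (copy is 1 here)
                if(copy%k == 0) {
                    while(copy%k == 0) copy /= k; // strip every copy of k
                    if(first == -1) {
                        first = find(mp, k);
                        ids[i] = first;
                    } else {
                        // nums[i] bridges k and its earlier prime factors
                        mp[find(mp, k)] = find(mp, first);
                    }
                }
            }
        }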
        unordered_map<int, vector<int>> ump;
        for(int i=0; i<n; ++i) {
            int r = find(mp, ids[i]);
            ump[r].push_back(i);
        }
        for(auto &a : ump) {
            vector<int> t;
            for(auto &b : a.second) {
                t.push_back(nums[b]);
            }
            sort(t.begin(), t.end());
            int id = 0;
            for(auto &b : a.second) {
                nums[b] = t[id++];
            }
        }
        for(int i=0; i<n; ++i) {
            if(i+1<n && nums[i] > nums[i+1]) return false;
        }
        return true;
    }
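private:
    // Sieve of Eratosthenes: returns all primes in [2, val] in increasing order.
    vector<int> primes(int val) {
        vector<int> res;
        vector<bool> isP(val + 1, true);
        for(int i=2; i<=val; ++i) {
            if(isP[i]) {
                res.push_back(i);
                // cross out the multiples of i (i itself is already recorded)
                for(int j=2; j*i<=val; ++j) isP[j*i] = false;
            }
        }
        return res;
    }
    // Union-find lookup with path compression; id is expected to be a key of rs.
    int find(map<int, int>& rs, int id) {
        if(rs[id] != id) return rs[id] = find(rs, rs[id]);
        return rs[id];
    }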
};

