Partitioning an Array

Paritioning an array is a common problem in computer science. In quick sort, the array is partitioned into smaller/equal and larger values based on the pivot. There are many other partition problems, where partition function is slightly different than the one in quick sort. Along with partitioning array the same partitioning technique can be used to find kth smallest/largest element in the array using quick select.

In this post describing a generic code template based on the Nico Lomuto algorithm to solve such partition problems in O(n) time and O(1) space.

Partition problems#

2 way partition problem

Example problems:

Partition the array such that all the even numbers come before all the odd numbers.
Partition the array all the smaller and equal numbers than the pivot come before all the larger numbers.
Partition the array such that 0’s come before all the other numbers. (zero to the left of the array)

In such problems the array needs to be divided by a boundry separating the first partition and second partition. Let the index of this boundry be b1 all the indexes before this boundry belong to partition 1 and all the indexes after this boundry are either from partition 2 or unclassified.

A template is such case will be.

def two_way_partition(A: List[int]) -> None:
    b1, n = 0, len(A) # b1 is index such that all elements with index < b1 will belong to partition one.
    for i in range(n):
        if belong_to_partition1(A[i]):
            swap(A, b1, i)
            b1 += 1
    return

For even odd problem this will look like.

def even_odd(A: List[int]) -> None:
    e, n = 0, len(A)
    for i in range(n):
        if A[i] % 2 == 0: # even partition function
            swap(A, e, i)
            e += 1
    return
def swap(A: List[List[int]], i, j: int) -> int:
    A[i], A[j] = A[j], A[i]

For pivot partitioning it will look like.

def partition_pivot(A: List[int], pivot: int) -> None:
    le, n = 0, len(A)
    for i in range(n):
        if A[i] <= pivot: # less than equal partition function
            swap(A, le, i)
            le += 1
    return

def swap(A: List[List[int]], i, j: int) -> int:
    A[i], A[j] = A[j], A[i]

For zero first partition will look like.

def zero_first(A: List[int]) -> None:
    z, n = 0, len(A)
    for i in range(n):
        if A[i] == 0: # zero partition function
            swap(A, z, i)
            z += 1
    return

def swap(A: List[List[int]], i, j: int) -> int:
    A[i], A[j] = A[j], A[i]

3 way partition problem

Extending on the 2 way partition, there are problems where the array needs to be divided into three partitions.

Examples:

Dutch National Flag Problem Sort Colors

In these problems there are two boundaries, b1 denoting the first partition then b2 denoting the second partion and then the rest. So all the indexes less than b1 contain parition 1 elments and all the indexes >= b1 and < b2 contains partition 2 and all the indexes >=b2 contains partition3 or unclassified elements.

In such case the template will look like.

def three_way_partition(A: List[int]) -> None:

    def swap(A: List[List[int]], i, j: int) -> int:
        A[i], A[j] = A[j], A[i]

    b1, b2, n = 0, 0, len(A)
    for i in range(n):
        if belong_to_partition1(A[i]):
            swap(A, b2, i) # swap element b2 with i
            swap(A, b1, b2) # swap element b1 with b2 to move the element to partition 1
            b1 += 1 # increment partition 1 boundry
            b2 += 1 # increment partition 2 boundry
        elif belong_to_partition2(A[i]):
            swap(A, b2, i) # only swap in second partition
            b2 += 1  # only increment partition 2 boundry
    return

Dutch National Flag Sort Colors:

class Solution:
    def sortColors(self, nums: List[int]) -> None:
        """
        Do not return anything, modify nums in-place instead.
        """
        def swap(A: List[List[int]], i, j: int) -> int:
            A[i], A[j] = A[j], A[i]

        b1, b2, n = 0, 0, len(A)
        for i in range(n):
            if nums[i] == 0:
                swap(nums, b2, i)
                swap(nums, b1, b2)
                b2 += 1
                b1 += 1
            elif nums[i] == 1:
                swap(nums, b2, i)
                b2 += 1

Now the same template can be applied to more partition problems like 4 way partition, only it would be more tedius to write.

Quick Select#

Quick select uses the same techniques as in the Quick Sort. With the difference that once the partition is decided based on the condition only single recursive call with the desired partition is made. This makes the code to be tail recursive and thus can also be written iteratively. Complexity of such procedures also reduce to O(N) in the average case, with randomly chosen pivot, from the O(nlogn) naive solution of actually sorting the underlying structure.

Example Problems:

kth largest element in an array

To find the kth largest element in a sorted array is pretty straight forward. kth largest element in a sorted array will be at n - k index. i.e. largest element will at n - 1 index, second largest will at n - 2 index in the sorted array. In the problem array is not sorted, which makes it a little bit tricky. The array can be first sorted and then fetch the index gives a O(nlogn) solution.

A better solution intiution comes from the idea, that if an element is chosen randomly and it’s rank is calculated by iteration over the array to find all the smaller elements than the chosen number. Such brings the case if the rank of the chosen is same as desired rank then we get the answer, otherwise if the rank is less than desired rank, then repeat the process with the greater elements. Else if the rank is greater than the desired rank, than the desired element will live among the smaller elements than the pivot. Thus giving a divide and conquer kind of algorithm based on quick sort.

def findKthLargest(nums: List[int], k: int) -> int:
    start, end, n = 0, len(nums) - 1, len(nums)
    
    def swap(A: List[List[int]], i, j: int) -> int:
        A[i], A[j] = A[j], A[i]

    def qSelect(lo, hi):
        if lo > hi:
            return -1
        p = random.randint(lo, hi) # select a random pivot index
        swap(nums, p, hi) # swap the pivot with last element
        # partition logic, similar to above template.
        j = lo # j is index such that all elements with index < j will be less than pivot
        for i in range(lo, hi): # iterate over the array
            if nums[i] < nums[hi]: # if the element is smaller than pivot nums[hi] move witin j
                swap(nums, j, i)
                j += 1
        swap(nums, hi, j)  # put pivot in the j th position as 
        # all elements less pivot will in array nums[:j] 
        # all greater or equal than pivot will be in nums[j+1:]
        # Note: pivot is at the correct position in the sorted array
        if j == n - k: # If the pivot position is the desired rank, return it.
            return nums[j]
        if j > n - k:
            return qSelect(lo, j - 1)
        return qSelect(j + 1, hi)
    
    return qSelect(start, end)

As the above code is tail recursive, it can also be written iteratively. There is no change in the partition logic. Only change here is that inverted base case lo <= hi is used to keep track while loop for which the sub array is valid array.

def findKthLargest(nums: List[int], k: int) -> int:

    def swap(A: List[List[int]], i, j: int) -> int:
        A[i], A[j] = A[j], A[i]

    lo, hi, n = 0, len(nums) - 1, len(nums)
    while lo <= hi: 
        p = random.randint(lo, hi)
        swap(nums, p, hi)
        j = lo
        for i in range(lo, hi):
            if nums[i] < nums[hi]:
                swap(nums, j, i)
                j += 1
        swap(nums, hi, j)
        if j == n - k:
            return nums[j]
        if j > n - k:
            hi = j - 1
        else:
            lo = j + 1

This problem can be extended to find the median in the array if k is given as n/2.

k closest points to origin

Problem is similar to the above, but the idea is rather than finding kth largest, it returns the smallest k points. i.e. nums[:k]. Also the point distance is calculated based on the distance function.

def kClosest(points: List[List[int]], k: int) -> List[List[int]]:
    
    def distance(point: List[int]):
        return point[0] * point[0] + point[1] * point[1]
    
    def swap(A: List[List[int]], i, j: int) -> int:
        A[i], A[j] = A[j], A[i]

    lo, hi, n = 0, len(points) - 1, len(points)
    while lo <= hi:
        p = random.randint(lo, hi)
        swap(points, p, hi)
        # Partition function.
        pD = distance(points[hi])
        j = lo
        for i in range(lo, hi):
            if distance(points[i]) <= pD: # `<=` handles the case when all the numbers are same, then getting the nth element leads to time out as only one element is removed from the partition at a time. With `<=` all same numbers are removed from the partition.
                swap(points, i, j)
                j += 1
        swap(points, j, hi)
        if j == k - 1: # the index of kth closest point will be k - 1, Always compare index with index values.
            return points[:k]
        if j < k - 1:
            lo = j + 1
        else:
            hi = j - 1
    return points[:j]

top k frequent elements

Top k frequent elements problem, is also quite similar to the k distance problem. The difference here is that the partition needs to be done using the frequency of the element in the array. So as a preprocessing, created a frequency table, which can answer the question what is the frequency of a particular element. Once the frequency is created, take all the unique keys and select one of them as the pivot to partition the array. In the left side all the elements with higher or equal frequency and right side with all the lower frequency elements. Based on the partition similar decision can be made as in the previous solution.

def topKFrequent(nums: List[int], k: int) -> List[int]:

    def swap(A: List[int], i, j: int):
        A[i], A[j] = A[j], A[i]

    freq = Counter(nums)
    keys = list(freq.keys())
    lo, hi = 0, len(keys) - 1
    while lo <= hi:
        p = random.randint(lo, hi)
        swap(keys, p, hi)
        j = lo
        for i in range(lo, hi):
            if freq[keys[i]] >= freq[keys[hi]]: # `>=` partitions the array into greater and equal frequency elements are on the left side of the pivot and then all the lower frequency elements.
                swap(keys, i, j)
                j += 1
        swap(keys, j, hi)
        if j == k - 1:
            return keys[:k]
        if j < k - 1:
            lo = j + 1
        else:
            hi = j - 1
    return keys[:k]

These problem, can also be solved via sorting and heaps which will give O(nlogn) complexity in worst case. Idea is to bring it down to O(N) with quick select.