Partitioning an Array
Paritioning an array is a common problem in computer science. In quick sort, the array is partitioned into smaller/equal and larger values based on the pivot. There are many other partition problems, where partition function is slightly different than the one in quick sort. Along with partitioning array the same partitioning technique can be used to find kth smallest/largest element
in the array using quick select.
In this post describing a generic code template based on the Nico Lomuto algorithm
to solve such partition problems in O(n) time and O(1) space.
Partition problems#
2 way partition problem
Example problems:
- Partition the array such that all the even numbers come before all the odd numbers.
- Partition the array all the smaller and equal numbers than the pivot come before all the larger numbers.
- Partition the array such that 0’s come before all the other numbers. (zero to the left of the array)
In such problems the array needs to be divided by a boundry separating the first partition and second partition. Let the index of this boundry be b1
all the indexes before this boundry belong to partition 1 and all the indexes after this boundry are either from partition 2 or unclassified.
A template is such case will be.
def two_way_partition(A: List[int]) -> None:
b1, n = 0, len(A) # b1 is index such that all elements with index < b1 will belong to partition one.
for i in range(n):
if belong_to_partition1(A[i]):
swap(A, b1, i)
b1 += 1
return
For even odd problem this will look like.
def even_odd(A: List[int]) -> None:
e, n = 0, len(A)
for i in range(n):
if A[i] % 2 == 0: # even partition function
swap(A, e, i)
e += 1
return
def swap(A: List[List[int]], i, j: int) -> int:
A[i], A[j] = A[j], A[i]
For pivot partitioning it will look like.
def partition_pivot(A: List[int], pivot: int) -> None:
le, n = 0, len(A)
for i in range(n):
if A[i] <= pivot: # less than equal partition function
swap(A, le, i)
le += 1
return
def swap(A: List[List[int]], i, j: int) -> int:
A[i], A[j] = A[j], A[i]
For zero first partition will look like.
def zero_first(A: List[int]) -> None:
z, n = 0, len(A)
for i in range(n):
if A[i] == 0: # zero partition function
swap(A, z, i)
z += 1
return
def swap(A: List[List[int]], i, j: int) -> int:
A[i], A[j] = A[j], A[i]
3 way partition problem
Extending on the 2 way partition, there are problems where the array needs to be divided into three partitions.
Examples:
- Dutch National Flag Problem Sort Colors
In these problems there are two boundaries, b1 denoting the first partition then b2 denoting the second partion and then the rest. So all the indexes less than b1 contain parition 1 elments and all the indexes >= b1
and < b2
contains partition 2 and all the indexes >=b2
contains partition3 or unclassified elements.
In such case the template will look like.
def three_way_partition(A: List[int]) -> None:
def swap(A: List[List[int]], i, j: int) -> int:
A[i], A[j] = A[j], A[i]
b1, b2, n = 0, 0, len(A)
for i in range(n):
if belong_to_partition1(A[i]):
swap(A, b2, i) # swap element b2 with i
swap(A, b1, b2) # swap element b1 with b2 to move the element to partition 1
b1 += 1 # increment partition 1 boundry
b2 += 1 # increment partition 2 boundry
elif belong_to_partition2(A[i]):
swap(A, b2, i) # only swap in second partition
b2 += 1 # only increment partition 2 boundry
return
- Dutch National Flag Sort Colors:
class Solution:
def sortColors(self, nums: List[int]) -> None:
"""
Do not return anything, modify nums in-place instead.
"""
def swap(A: List[List[int]], i, j: int) -> int:
A[i], A[j] = A[j], A[i]
b1, b2, n = 0, 0, len(A)
for i in range(n):
if nums[i] == 0:
swap(nums, b2, i)
swap(nums, b1, b2)
b2 += 1
b1 += 1
elif nums[i] == 1:
swap(nums, b2, i)
b2 += 1
Now the same template can be applied to more partition problems like 4 way partition, only it would be more tedius to write.
Quick Select#
Quick select uses the same techniques as in the Quick Sort. With the difference that once the partition is decided based on the condition only single recursive call with the desired partition is made. This makes the code to be tail recursive and thus can also be written iteratively. Complexity of such procedures also reduce to O(N)
in the average case, with randomly chosen pivot, from the O(nlogn)
naive solution of actually sorting the underlying structure.
Example Problems:
To find the kth largest element in a sorted array is pretty straight forward. kth largest element in a sorted array will be at n - k
index. i.e. largest element will at n - 1
index, second largest will at n - 2
index in the sorted array. In the problem array is not sorted, which makes it a little bit tricky. The array can be first sorted and then fetch the index gives a O(nlogn) solution.
A better solution intiution comes from the idea, that if an element is chosen randomly and it’s rank is calculated by iteration over the array to find all the smaller elements than the chosen number. Such brings the case if the rank of the chosen is same as desired rank then we get the answer, otherwise if the rank is less than desired rank, then repeat the process with the greater elements. Else if the rank is greater than the desired rank, than the desired element will live among the smaller elements than the pivot. Thus giving a divide and conquer
kind of algorithm based on quick sort
.
def findKthLargest(nums: List[int], k: int) -> int:
start, end, n = 0, len(nums) - 1, len(nums)
def swap(A: List[List[int]], i, j: int) -> int:
A[i], A[j] = A[j], A[i]
def qSelect(lo, hi):
if lo > hi:
return -1
p = random.randint(lo, hi) # select a random pivot index
swap(nums, p, hi) # swap the pivot with last element
# partition logic, similar to above template.
j = lo # j is index such that all elements with index < j will be less than pivot
for i in range(lo, hi): # iterate over the array
if nums[i] < nums[hi]: # if the element is smaller than pivot nums[hi] move witin j
swap(nums, j, i)
j += 1
swap(nums, hi, j) # put pivot in the j th position as
# all elements less pivot will in array nums[:j]
# all greater or equal than pivot will be in nums[j+1:]
# Note: pivot is at the correct position in the sorted array
if j == n - k: # If the pivot position is the desired rank, return it.
return nums[j]
if j > n - k:
return qSelect(lo, j - 1)
return qSelect(j + 1, hi)
return qSelect(start, end)
As the above code is tail recursive
, it can also be written iteratively. There is no change in the partition logic. Only change here is that inverted base case lo <= hi
is used to keep track while loop for which the sub array is valid array.
def findKthLargest(nums: List[int], k: int) -> int:
def swap(A: List[List[int]], i, j: int) -> int:
A[i], A[j] = A[j], A[i]
lo, hi, n = 0, len(nums) - 1, len(nums)
while lo <= hi:
p = random.randint(lo, hi)
swap(nums, p, hi)
j = lo
for i in range(lo, hi):
if nums[i] < nums[hi]:
swap(nums, j, i)
j += 1
swap(nums, hi, j)
if j == n - k:
return nums[j]
if j > n - k:
hi = j - 1
else:
lo = j + 1
This problem can be extended to find the median in the array if k
is given as n/2
.
Problem is similar to the above, but the idea is rather than finding kth largest, it returns the smallest k points. i.e. nums[:k]. Also the point distance is calculated based on the distance function.
def kClosest(points: List[List[int]], k: int) -> List[List[int]]:
def distance(point: List[int]):
return point[0] * point[0] + point[1] * point[1]
def swap(A: List[List[int]], i, j: int) -> int:
A[i], A[j] = A[j], A[i]
lo, hi, n = 0, len(points) - 1, len(points)
while lo <= hi:
p = random.randint(lo, hi)
swap(points, p, hi)
# Partition function.
pD = distance(points[hi])
j = lo
for i in range(lo, hi):
if distance(points[i]) <= pD: # `<=` handles the case when all the numbers are same, then getting the nth element leads to time out as only one element is removed from the partition at a time. With `<=` all same numbers are removed from the partition.
swap(points, i, j)
j += 1
swap(points, j, hi)
if j == k - 1: # the index of kth closest point will be k - 1, Always compare index with index values.
return points[:k]
if j < k - 1:
lo = j + 1
else:
hi = j - 1
return points[:j]
Top k frequent elements problem, is also quite similar to the k distance problem. The difference here is that the partition needs to be done using the frequency of the element in the array. So as a preprocessing, created a frequency table, which can answer the question what is the frequency of a particular element. Once the frequency is created, take all the unique keys and select one of them as the pivot to partition the array. In the left side all the elements with higher or equal frequency and right side with all the lower frequency elements. Based on the partition similar decision can be made as in the previous solution.
def topKFrequent(nums: List[int], k: int) -> List[int]:
def swap(A: List[int], i, j: int):
A[i], A[j] = A[j], A[i]
freq = Counter(nums)
keys = list(freq.keys())
lo, hi = 0, len(keys) - 1
while lo <= hi:
p = random.randint(lo, hi)
swap(keys, p, hi)
j = lo
for i in range(lo, hi):
if freq[keys[i]] >= freq[keys[hi]]: # `>=` partitions the array into greater and equal frequency elements are on the left side of the pivot and then all the lower frequency elements.
swap(keys, i, j)
j += 1
swap(keys, j, hi)
if j == k - 1:
return keys[:k]
if j < k - 1:
lo = j + 1
else:
hi = j - 1
return keys[:k]
These problem, can also be solved via sorting and heaps which will give O(nlogn)
complexity in worst case. Idea is to bring it down to O(N)
with quick select.
Other Similar problem: