Given a set of $n$ numbers, we wish to find the $i$ largest in sorted order using a comparison-based algorithm. Find the algorithm that implements each of the following methods with the best asymptotic worst-case running time, and analyze the running times of the algorithms in terms of $n$ and $i$.
a. Sort the numbers, and list the
$i$ largest.
This depends on the sorting algorithm; with heapsort or merge sort the worst-case sorting time is $O(n \lg n)$, and listing the $i$ largest then takes $O(i)$, so the total worst-case running time is $O(n \lg n + i) = O(n \lg n)$.
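A minimal sketch of this method (the function name is illustrative only):

```python
def top_i_by_sorting(nums, i):
    # Sort all n numbers (O(n lg n)), then list the i largest (O(i)).
    return sorted(nums, reverse=True)[:i]
```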
b. Build a max-priority queue from the numbers, and call EXTRACT-MAX
$i$ times.
Building the max-heap takes $O(n)$ time, and each of the $i$ calls to EXTRACT-MAX takes $O(\lg n)$ time, so the total worst-case running time is $O(n + i \lg n)$.
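A sketch using Python's `heapq` module, which provides a min-heap; negating the values simulates a max-priority queue. Here `heapify` plays the role of the $O(n)$ BUILD-MAX-HEAP and each `heappop` is an $O(\lg n)$ EXTRACT-MAX:

```python
import heapq

def top_i_by_heap(nums, i):
    # Build a max-heap in O(n) by heapifying the negated values.
    heap = [-x for x in nums]
    heapq.heapify(heap)
    # Call EXTRACT-MAX i times, O(lg n) each.
    return [-heapq.heappop(heap) for _ in range(i)]
```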
c. Use an order-statistic algorithm to find the $i$th largest number, partition around that number, and sort the $i$ largest numbers.

Use SELECT to find the $i$th largest number in $O(n)$ time, partition around it in $O(n)$ time, and then sort the $i$ largest numbers in $O(i \lg i)$ time, for a total worst-case running time of $O(n + i \lg i)$.
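A sketch of this method, assuming distinct elements; `sorted` stands in here for the linear-time SELECT, and the partition is written as a list comprehension:

```python
def top_i_by_select(nums, i):
    # Find the i-th largest number (O(n) with a real linear-time SELECT).
    threshold = sorted(nums)[len(nums) - i]
    # Partition around it (O(n)), keeping the i largest numbers.
    largest = [x for x in nums if x >= threshold]
    # Sort only those i numbers (O(i lg i)).
    return sorted(largest, reverse=True)
```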
For $n$ distinct elements $x_1, x_2, \dots, x_n$ with positive weights $w_1, w_2, \dots, w_n$ such that $\sum_{i=1}^n w_i = 1$, the weighted (lower) median is the element $x_k$ satisfying
$\displaystyle \sum_{x_i < x_k} w_i < \frac{1}{2}$
and
$\displaystyle \sum_{x_i > x_k} w_i \le \frac{1}{2}$
a. Argue that the median of $x_1, x_2, \dots, x_n$ is the weighted median of the $x_i$ with weights $w_i = 1/n$ for $i = 1, 2, \dots, n$.

Let $x_k$ be the (lower) median, so $k = \lceil n/2 \rceil$ in sorted order. With $w_i = 1/n$ we have $\sum_{x_i < x_k} w_i = (k - 1)/n = (\lceil n/2 \rceil - 1)/n < 1/2$ and $\sum_{x_i > x_k} w_i = (n - k)/n = \lfloor n/2 \rfloor / n \le 1/2$, so $x_k$ is also the weighted median.
b. Show how to compute the weighted median of $n$ elements in $O(n \lg n)$ worst-case time using sorting.
```python
def weighted_median(x):
    # Simplification used here: each element's value is also its weight, and
    # the values sum to 1.  Sort (O(n lg n)) and scan until the cumulative
    # weight first reaches 1/2; that element is the weighted (lower) median.
    x.sort()
    s = 0.0
    for v in x:
        s += v
        if s >= 0.5:
            return v
```
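For example, under that value-as-weight assumption:

```python
print(weighted_median([0.1, 0.4, 0.3, 0.2]))  # prints 0.3
```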
c. Show how to compute the weighted median in
$\Theta(n)$ worst-case time using a linear-time median algorithm such as SELECT from Section 9.3.
Use the median as the pivot; the algorithm is then exactly the same recursion as SELECT, except that it recurses into the half that still contains the weighted median, so $T(n) = T(n/2) + \Theta(n) = \Theta(n)$. A sketch follows; `black_box_median` stands in for the linear-time SELECT (it sorts only for brevity, so the sketch itself is not $\Theta(n)$), and, as in part (b), each element's value doubles as its weight.
```python
def black_box_median(a, p, r):
    # Stand-in for the linear-time SELECT of Section 9.3: the lower median of
    # a[p:r].  It sorts only for brevity, so it is not itself Theta(n).
    return sorted(a[p:r])[(r - p - 1) // 2]


def partition(a, p, r, x):
    # Partition a[p:r] around the value x (assumed to occur in a[p:r]).
    # Returns the final index of x and the total weight s of the elements <= x,
    # where, as in part (b), each element's value is also its weight.
    s = x
    i = p - 1
    # Move x into the last slot so the Lomuto-style scan can use it as the pivot.
    for j in range(p, r - 1):
        if a[j] == x:
            a[j], a[r - 1] = a[r - 1], a[j]
            break
    for j in range(p, r - 1):
        if a[j] < x:
            i += 1
            s += a[j]
            a[i], a[j] = a[j], a[i]
    i += 1
    a[i], a[r - 1] = a[r - 1], a[i]
    return i, s


def weighted_median(a, p, r, w=0.5):
    # Return the element of a[p:r] at which the cumulative weight first reaches w.
    if p + 1 >= r:
        return a[p]
    x = black_box_median(a, p, r)
    q, s = partition(a, p, r, x)
    if s - a[q] < w and s >= w:                  # the pivot itself is the answer
        return a[q]
    if s >= w:                                   # too much weight left of the pivot
        return weighted_median(a, p, q, w)
    return weighted_median(a, q + 1, r, w - s)   # discard weight s, recurse right
```
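The same example as in part (b), with the values again serving as their own weights:

```python
a = [0.1, 0.4, 0.3, 0.2]
print(weighted_median(a, 0, len(a)))  # prints 0.3
```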
The post-office location problem is defined as follows. We are given $n$ points $p_1, p_2, \dots, p_n$ with associated weights $w_1, w_2, \dots, w_n$. We wish to find a point $p$ that minimizes the sum $\sum_{i=1}^n w_i d(p, p_i)$, where $d(a, b)$ is the distance between points $a$ and $b$.
d. Argue that the weighted median is a best solution for the 1-dimensional post-office location problem, in which points are simply real numbers and the distance between points $a$ and $b$ is $d(a, b) = |a - b|$.
Same as Exercise 9.3-9: if $p$ moves a distance $\delta > 0$ away from the weighted median $x_k$, the points on $x_k$'s side (including $x_k$ itself, with total weight at least $1/2$) each get farther by $\delta$, while the points on the other side (total weight at most $1/2$) get closer by at most $\delta$, so the cost cannot decrease.
e. Find the best solution for the 2-dimensional post-office location problem, in which the points are $(x, y)$ coordinate pairs and the distance between points $a = (x_1, y_1)$ and $b = (x_2, y_2)$ is the Manhattan distance given by $d(a, b) = |x_1 - x_2| + |y_1 - y_2|$.

Since the Manhattan distance separates into the two coordinates, the objective for $p = (x, y)$ is $\sum_{i=1}^n w_i |x - x_i| + \sum_{i=1}^n w_i |y - y_i|$, and the two sums can be minimized independently. By part (d), the best solution is the point whose $x$-coordinate is the weighted median of the $x_i$ and whose $y$-coordinate is the weighted median of the $y_i$.
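A small self-contained sketch of this; the helper `weighted_median_1d` takes explicit coordinates and weights (rather than reusing the value-as-weight code above), and the names are illustrative only:

```python
def weighted_median_1d(values, weights):
    # Weighted (lower) median of values with the given weights (summing to 1).
    total = 0.0
    for v, w in sorted(zip(values, weights)):
        total += w
        if total >= 0.5:
            return v

def post_office_2d(points, weights):
    # Best location under Manhattan distance: take the weighted median of the
    # x-coordinates and, independently, of the y-coordinates.
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (weighted_median_1d(xs, weights), weighted_median_1d(ys, weights))
```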
We showed that the worst-case number $T(n)$ of comparisons used by SELECT to select the $i$th order statistic from $n$ numbers satisfies $T(n) = \Theta(n)$, but the constant hidden by the $\Theta$-notation is rather large. When $i$ is small relative to $n$, we can implement a different procedure that uses SELECT as a subroutine but makes fewer comparisons in the worst case.
a. Describe an algorithm that uses $U_i(n)$ comparisons to find the $i$th smallest of $n$ elements, where
$\displaystyle U_i(n) = \left \{ \begin{array}{ll} T(n) & \text{if}~~i \ge n/2 \\ \lfloor n / 2 \rfloor + U_i(\lceil n / 2 \rceil) + T(2i) & \text{otherwise} \end{array} \right .$
Divide the elements into pairs and compare the two elements of each pair, using $\lfloor n/2 \rfloor$ comparisons. Recursively find the $i$th smallest among the $\lceil n/2 \rceil$ smaller elements of the pairs (when $n$ is odd, the unpaired element joins this set); this costs $U_i(\lceil n/2 \rceil)$ comparisons and identifies the $i$ smallest of them. Each of the $i$ smallest elements of the whole input is either one of those $i$ elements or the larger partner of one of them, so finish by running SELECT on those at most $2i$ candidates, using $T(2i)$ comparisons. A sketch follows.
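This sketch assumes distinct elements; `select` is a sorting-based stand-in for the linear-time SELECT, and the names are illustrative:

```python
def select(a, i):
    # Stand-in for SELECT: the i-th smallest (1-indexed) element of a.
    return sorted(a)[i - 1]

def small_select(a, i):
    n = len(a)
    if i >= n / 2:
        return select(a, i)                     # T(n) comparisons
    # Compare the two elements of each pair: floor(n/2) comparisons.
    smaller, partner = [], {}
    for j in range(0, n - 1, 2):
        lo, hi = (a[j], a[j + 1]) if a[j] < a[j + 1] else (a[j + 1], a[j])
        smaller.append(lo)
        partner[lo] = hi
    if n % 2 == 1:
        smaller.append(a[-1])                   # unpaired element joins the smaller set
    # Recursively find the i-th smallest of the ceil(n/2) smaller elements.
    v = small_select(smaller, i)                # U_i(ceil(n/2)) comparisons
    # The i smallest of the whole input lie among the i smallest "smaller"
    # elements and their larger partners: at most 2i candidates.
    candidates = [x for x in smaller if x <= v]
    candidates += [partner[x] for x in candidates if x in partner]
    return select(candidates, i)                # T(2i) comparisons
```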
b. Show that, if $i < n/2$, then $U_i(n) = n + O(T(2i)\lg(n/i))$.
Suppose $i < n/2$ and unfold the recurrence. The problem size halves at each level, so after at most $O(\lg(n/i))$ levels the remaining size $m$ satisfies $i \ge m/2$, and the recursion bottoms out in a single $T(m)$ term with $T(m) = O(T(2i))$ because $m \le 2i$. The comparison terms $\lfloor n/2 \rfloor + \lfloor n/4 \rfloor + \cdots$ sum to less than $n$, and each of the $O(\lg(n/i))$ levels contributes one $T(2i)$ term, so $U_i(n) = n + O(T(2i)\lg(n/i))$.
c. Show that if $i$ is a constant less than $n/2$, then $U_i(n) = n + O(\lg n)$.

By part (b), for constant $i$ we have $T(2i) = O(1)$, so $U_i(n) = n + O(\lg(n/i)) = n + O(\lg n)$.
d. Show that if $i = n/k$ for $k \ge 2$, then $U_i(n) = n + O(T(2n/k)\lg k)$.

By part (b) with $i = n/k$, we get $U_i(n) = n + O(T(2n/k)\lg(n/(n/k))) = n + O(T(2n/k)\lg k)$.
In this problem, we use indicator random variables to analyze the RANDOMIZED-SELECT procedure in a manner akin to our analysis of RANDOMIZED-QUICKSORT in Section 7.4.2.
As in the quicksort analysis, we assume that all elements are distinct, and we rename the elements of the input array $A$ as $z_1, z_2, \dots, z_n$, where $z_i$ is the $i$th smallest element. Thus, the call RANDOMIZED-SELECT$(A, 1, n, k)$ returns $z_k$.
For $1 \le i < j \le n$, let

$X_{ijk} = I\{z_i$ is compared with $z_j$ sometime during the execution of the algorithm to find $z_k\}$.
a. Give an exact expression for $\text{E}[X_{ijk}]$.

The elements $z_i$ and $z_j$ are compared if and only if the first pivot chosen from the set $\{z_{\min(i,k)}, \dots, z_{\max(j,k)}\}$ is either $z_i$ or $z_j$, and every element of that set is equally likely to be that first pivot, so
$$\text{E}[X_{ijk}] = \Pr\{z_i \text{ is compared with } z_j\} = \frac{2}{\max(j, k) - \min(i, k) + 1}.$$
b. Let $X_k$ denote the total number of comparisons between elements of array $A$ when finding $z_k$. Show that
$$ \text{E}[X_k] \le 2 \left ( \sum_{i=1}^{k}\sum_{j=k}^n \frac{1}{j-i+1} + \sum_{j=k+1}^{n} \frac{j-k-1}{j-k+1} + \sum_{i=1}^{k-2} \frac{k-i-1}{k-i+1} \right ) $$
c. Show that $\text{E}[X_k] \le 4n$.
Based on a StackExchange answer, we bound each sum separately. For the first double sum, group the pairs $(i, j)$ by the difference $m = j - i$: for each $m$ there are at most $m + 1$ pairs with $i \le k \le j$, each contributing $1/(m + 1)$, so the double sum is at most $n$.

And the second and third sums have fewer than $n - k$ and $k - 1$ terms respectively, each term less than $1$, so together they contribute less than $n$.

Therefore $\text{E}[X_k] \le 2(n + n) = 4n$.
d. Conclude that, assuming all elements of array $A$ are distinct, RANDOMIZED-SELECT runs in expected time $O(n)$.

Every call to PARTITION does work proportional to the number of comparisons it makes, and there are at most $n$ recursive calls with $O(1)$ overhead each, so by part (c) the expected running time is $O(\text{E}[X_k] + n) = O(n)$.
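As an empirical sanity check of parts (c) and (d), one can count the element comparisons made by a straightforward RANDOMIZED-SELECT and compare the average with the $4n$ bound; the parameters below are arbitrary:

```python
import random

def randomized_select(a, p, r, i, counter):
    # Return the i-th smallest element of a[p..r]; counter[0] accumulates the
    # number of element comparisons made by the partitioning step.
    if p == r:
        return a[p]
    pivot = random.randint(p, r)                # RANDOMIZED-PARTITION
    a[pivot], a[r] = a[r], a[pivot]
    x = a[r]
    q = p - 1
    for j in range(p, r):
        counter[0] += 1                         # one comparison between input elements
        if a[j] <= x:
            q += 1
            a[q], a[j] = a[j], a[q]
    q += 1
    a[q], a[r] = a[r], a[q]
    rank = q - p + 1                            # rank of the pivot within a[p..r]
    if i == rank:
        return a[q]
    if i < rank:
        return randomized_select(a, p, q - 1, i, counter)
    return randomized_select(a, q + 1, r, i - rank, counter)

n, k, trials = 1000, 250, 200
total = 0
for _ in range(trials):
    a = random.sample(range(10 * n), n)
    counter = [0]
    randomized_select(a, 0, n - 1, k, counter)
    total += counter[0]
print("average comparisons:", total / trials, " 4n bound:", 4 * n)
```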