Problems

18-1 Stacks on secondary storage

Consider implementing a stack in a computer that has a relatively small amount of fast primary memory and a relatively large amount of slower disk storage. The operations PUSH and POP work on single-word values. The stack we wish to support can grow to be much larger than can fit in memory, and thus most of it must be stored on disk.

A simple, but inefficient, stack implementation keeps the entire stack on disk. We maintain in-memory a stack pointer, which is the disk address of the top element on the stack. If the pointer has value $p$, the top element is the $( p ~\text{mod}~ m )$th word on page $\lfloor p / m \rfloor$ of the disk, where $m$ is the number of words per page.

To implement the PUSH operation, we increment the stack pointer, read the appropriate page into memory from disk, copy the element to be pushed to the appropriate word on the page, and write the page back to disk. A POP operation is similar. We decrement the stack pointer, read in the appropriate page from disk, and return the top of the stack. We need not write back the page, since it was not modified.

Because disk operations are relatively expensive, we count two costs for any implementation: the total number of disk accesses and the total CPU time. Any disk access to a page of $m$ words incurs charges of one disk access and $\Theta(m)$ CPU time.

a. Asymptotically, what is the worst-case number of disk accesses for $n$ stack operations using this simple implementation? What is the CPU time for $n$ stack operations? (Express your answer in terms of $m$ and $n$ for this and subsequent parts.)

Worst-case number of disk accesses: $n$ read + $n$ write.

CPU time: $\Theta(nm)$.

Now consider a stack implementation in which we keep one page of the stack in memory. (We also maintain a small amount of memory to keep track of which page is currently in memory.) We can perform a stack operation only if the relevant disk page resides in memory. If necessary, we can write the page currently in memory to the disk and read in the new page from the disk to memory. If the relevant disk page is already in memory, then no disk accesses are required.

b. What is the worst-case number of disk accesses required for $n$ PUSH operations? What is the CPU time?

Worst-case number of disk accesses: $\lceil n / m \rceil + 1$ write.

CPU time: $(\lceil n / m \rceil + 1) * \Theta(m) + n = \Theta(n)$.

c. What is the worst-case number of disk accesses required for $n$ stack operations? What is the CPU time?

Worst-case number of disk accesses: $\lceil n / 2 \rceil$ read + $\lceil n / 2 \rceil$ write.

CPU time: $\Theta(nm)$.

Suppose that we now implement the stack by keeping two pages in memory (in addition to a small number of words for bookkeeping).

d. Describe how to manage the stack pages so that the amortized number of disk accesses for any stack operation is $O(1/m)$ and the amortized CPU time for any stack operation is $O(1)$.

Less than $m/3$: load prev page.

Larger than $2m/3$: load next page.

18-2 Joining and splitting 2-3-4 trees

The join operation takes two dynamic sets $S'$ and $S''$ and an element $x$ such that for any $x' \in S'$ and $x'' \in S''$, we have $x'.key < x.key < x''.key$. It returns a set $S = S' \cup \{x\} \cup S''$. The split operation is like an "inverse" join: given a dynamic set $S$ and an element $x \in S$, it creates a set $S'$ that consists of all elements in set $S$ and an element $x \in S$, it creates a set $S'$ that consists of all elements in $S - \{x\}$ whose keys are less than $x.key$ and a set $S''$ that consists of all elements in $S - \{x\}$ whose keys are greater than $x.key$. In this problem, we investigate how to implement these operations on 2-3-4 trees. We assume for convenience that elements consist only of keys and that all key values are distinct.

a. Show how to maintain, for every node $x$ of a 2-3-4 tree, the height of the subtree rooted at $x$ as an attribute $x.height$. Make sure that your implementation does not affect the asymptotic running times of searching, insertion, and deletion.

b. Show how to implement the join operation. Given two 2-3-4 trees $T'$ and $T''$ and a key $k$, the join operation should run in $O(1 + |h' - h''|)$ time, where $h'$ and $h''$ are the heights of $T'$ and $T''$, respectively.

c. Consider the simple path $p$ from the root of a 2-3-4 tree $T$ to a given key $k$, the set $S'$ of keys in $T$ that are less than $k$, and the set $S''$ of keys in $T$ that are greater than $k$. Show that $p$ breaks $S'$ into a set of trees $\{T'_0, T'_1, \dots, T'_m\}$ and a set of keys $\{k'_1, k'_2, \dots, k'_m\}$, where, for $i = 1, 2, \dots, m$, we have $y < k'_i < z$ for any keys $y \in T'_{i-1}$ and $z \in T'_i$. What is the relationship between the heights of $T'_{i-1}$ and $T'_i$? Describe how $p$ breaks $S''$ into sets of trees and keys.

d. Show how to implement the split operation on $T$. Use the join operation to assemble the keys in $S'$ into a single 2-3-4 tree $T'$ and the keys in $S''$ into a single 2-3-4 tree $T''$. The running time of the split operation should be $O(\lg n)$, where $n$ is then umber of keys in $T$. (Hint: The costs for joining should telescope.)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

problems.md

problems.md

Problems

18-1 Stacks on secondary storage

18-2 Joining and splitting 2-3-4 trees

Files

problems.md

Latest commit

History

problems.md

File metadata and controls

Problems

18-1 Stacks on secondary storage

18-2 Joining and splitting 2-3-4 trees