Heapify and Heap Sort

Recall that Heap Sort basically operates by building a heap with n values then destroying the heap.

A complete binary tree with n nodes means that at most there are log n nodes from the root (top) to a leaf (a node at the bottom of the tree)

Insertion may require the percolate up process. The number of times a node needs to percolate up can be no more than the number of nodes from the root to the leaf. Therefore, it is pretty easy to see that this percolate up process is O(log n) for a tree with n nodes. This means that to add a value to a Heap is a process that is O(n log n). In order for us to build a heap we need to insert into it n times. Therefore, the worst case runtime for this process is O(n log n)

The deletion process may require a percolate down process. Like the percolate up process this also is O(log n). Thus, to remove a value from the heap is O(log n). We need to remove n values so this process is O(n log n).

To heap sort we build heap O(n log n) then destroy the heap O(n log n). Therefore the whole sorting algorithm has a runtime of O(n log n)

Our heap can be represented with an array. The simplest implementation of heap sort would be to create a heap object:

//simple but not correct way to heap sort an array
void heapSort(int data[],int size){
    Heap theheap;
    for(int i=0;i<size;i++){
        theheap.insert(data[i]);
    }
    for(int i=0;i<size;i++){
        data[i]=theheap.front();
        theheap.delete();
    }
}

but this is not actually a good implementation. The above would create a heap as it goes meaning that we need to actually double the data storage as the heap itself would be an array that is as big as data. Thus, the function would have the same drawback as merge sort.

What we want to be able to do is to implement the heap sort in place.

Method 1

One way we can do this by borrowing the idea from insertion sort... in that algorithm we start off by hiving off the first element and saying that it is a sorted array. We then "insert" successive elements into this sorted array so that it stays sorted.

For heap sort we can start off by using the first element as the root of the heap. Successive elements are then placed into the heap in place. Because new nodes are always bottom level leftmost, the element added will always be "end" of the array. Thus, similar to doing an insertion sort...we insert subsequent values into our heap which is being formed at the front end.

One key thing we have to change though is the priority. We actually need to ensure that items with that should go towards the end of the array is considered highest priority for the heap. Thus if we were sorting an array in ascending order (small to big) we should use give larger numbers a higher priority.

The reason is this... when we start to remove from the heap, the node that is first removed is from the bottom level, right most. this is essentially last node in the heap part of the array. However, the value that is removed comes from the root. Since the position that is freed up is at the end of the array, we want to ensure that the value that is removed first is the lowest priority item instead of highest otherwise our list will be sorted backwards.

Thus, to sort an array in ascending order (small to big), our heap should be built so that larger values have higher priority.

Method 2: Heapify

Since our heap is actually implemented with an array, it would be good to have a way to actually create a heap in place starting with an array that isn't a heap and ending with an array that is heap. While it is possible to simply "insert" values into the heap repeatedly, the faster way to perform this task is an algorithm called Heapify.

In the Heapify Algorithm, works like this:

Given a node within the heap where both its left and right children are proper heaps (maintains proper heap order) , do the following:

  • If the node has higher priority than both children, we are done, the entire heap is a proper heap

  • Otherwise

    • swap current node with higher priority child

    • heapify() that subtree

Effectively what is happening is that we already know that both the left and right children are proper heaps. If the current node is higher priority than both children then the entire heap must be proper

However if its not then we do a swap, this means that the current node's value has now gone down into one of the subtrees. This could cause that subtree to no longer be a heap because the root of that subtree now has a value of lower priority than it use to have. so we heapify the subtree.

Building a maxheap in place

This example will start with an array that is not heap. It will perform heapify() on it to form a max heap (bigger values have higher priority). In each of the diagrams below, the argument to heapify() is the index corresponding to the node

Firstly, all leaf nodes are valid heaps. Since they have no subtree, we don't need to deal with those nodes. They are highlighted in green in this next picture. Our algorithm therefore starts at the first non-leaf node from the bottom. This node is at index (n-2)/2 where n is the total number of values in our heap.

The heapify function takes the index of the root of the heapify routine (ie we know that nodes children are heaps, and we are looking at it from that node down.

heapify(3)

First node to consider is the node with 5 (index 3). the left subtree is lower priority and the right subtree doesn't exist so we do nothing.

heapify(2)

Second node to consider is 3 (index 2). both 4 and 7 are bigger than 3 and thus have higher priority. However, 7 is higher priority than 4, so we swap these two values

heapify(1)

‌Next consider the node with 6. 6 is higher priority than 5 but not higher priority than 8, so we swap them. 6 is now in a leaf node and thus we can stop

heapify(0)

Lastly we consider the node 1. both 8 and 7 have higher priority than 1, but 8 is higher priority than 7 and thus we swap 8 and 1 first.

At this point we need to heapify the subtree that now has 1 as it's root (index 1). We swap it with the higher priority child (6 in this case). Notice that because we won't need to do anything to the right subtree with 7 as root, as that subtree was not modified.

After the final swap we have a proper max heap

Once the heap is built, we apply continuously remove the highest priority value from the heap and place it at the back of the array.

Last updated