Priority Queues using Binary Heaps
Last updated
Last updated
A priority queue is an abstract data type. Like queues and stacks, it is an idea, the underlying storage and access is decoupled from the functionality of a priority queue. A priority queue is similar to a queue in that you use it for ordering data. However, instead of ordering based on when something was added, you order it based on the priority value of the item. An item at the front of the priority queue (and the item that will be removed if an item is to be dequeued) is the item with the highest priority. This is the type of queues you might find in a hospital emergency room. The thing that matters more is the severity of the illness as opposed to who got there first.
If you were to implement a priority queue using a regular list, you would essentially have to maintain a sorted list (sorted by priority). This would mean the following:
insertions would be O(n) for sure as you would need to go through an already sorted list trying to find the right place then doing the insertion.
removal is from the "front" of the queue, so depending on how you do your implementation, this can be potentially O(n) also.
This is not very efficient. If we were to create our priority queue in this manner the run time for both operations would not be very good. If we then base a sort on such a data structure, its run time would be no better than one of the simple sorts. Thus, clearly this isn't how we should implement a priority queue.
The interesting thing about a priority queue is that we really don't care where any value other than the one with the highest priority is. We care about where the highest priority item is and we want to be able to find it and remove it quickly but all other values can be essentially anywhere.
A binary heap is a data structure that can help us achieve this.
Before we begin, the definitions below look involves trees. It may be a good idea to look at those definitions first so that the definitions with respect to heaps make more sense.
Binary Heap - A binary heap is a complete binary tree where the heap order property is always maintained.
Binary Tree - A binary tree is either a) empty (no nodes), or b) contains a root node with two children which are both binary trees.
Complete Binary Tree - A binary tree where there are no missing nodes in all except at the bottom level. At the bottom level the missing nodes must be to the right of all other nodes
These binary trees are complete.
These binary trees are not complete. The first one is missing a node one level higher than leaves, the second is missing further left in the tree than an existing node
Heap Order Property: For each node, the parent of the node must have a higher priority, while its children must have a lower priority. There is no ordering of priority other than this rule. Thus, the highest priority item will be at the root of the tree. Below is a heap where we define the smaller value as having higher priority:
Insertion into a heap must maintain both the complete binary tree structure and the heap order property. To do this what we do is the following.
create a new empty node in the left most open spot at the bottom level of the tree
If value can be placed into node without violating heap order property put it in
otherwise pull the value from parent into the empty node
repeat the previous two steps until the value can be placed
This process effectively creates an empty node starting at the bottom of the tree. The empty node moves up until it is in the correct position and the value can be placed inside the empty node. This process of moving the empty node towards the root is called percolate up
Create a heap by inserting these number in the order given: 10, 6, 20, 5, 16, 17, 13, 2
The highest priority value is always the value that is removed from the binary heap. The way that the Heap is set up, the node with the highest priority is at the root. Finding it is easy but the removal process must ensure that both the complete binary tree structure along with the heap order property is maintained. Therefore, just removing the root would be a bad idea.
In order for the complete binary tree property to be maintained we will be removing the right most node at the bottom level. Note that a complete binary tree with n nodes can only have 1 shape, so the shape is pretty much determined by the fact that removing a value creates a tree with one fewer node.
The empty spot that had been created by the removal of the value at root must be filled and the value that had been in the rightmost node must go back into the heap. We can accomplish this by doing the following:
If the value could be placed into the empty node (remember, this starts at root) without violating the Heap Order Property, put it in and we are done
otherwise move the child with the higher priority up (the empty spot moves down).
Repeat until value is placed
The process of moving the empty spot down the heap is called percolate down
Clearly we want to implement the heap in as simple a manner as possible. Although we can use a link structure to represent each node and their children, this structure is actually not that good for a heap. One reason is that each node needs two extra pointers so for small data the memory needed for the pointers could easily exceed that of the data. Also, we really don't need the linking structure because our tree is a complete binary tree. This makes it very easy for us to use an array to represent our tree.
Idea is this. Store the data in successive array elements. Use the index values to find the child and parent of any node.
Suppose data was stored in element i. The left child of that node is in element 2i+1 The right child of the node is in element 2i+2 The parent of the node is stored in (i-1)/2 (for all but root node which has no parent)
This representation is very convenient because when we add values to the end of the array it is like we are adding a node to the leftmost available spot at the bottom level.
We do not have empty nodes simply because of the definition of the heap (that it is a complete binary tree)
We also eliminate any need for pointers or the overhead of allocating nodes as we go.