Heaps: Find the Running Medianblog post image

This tutorial will provide a step-by-step walkthrough in Java for solving the Heaps: Find the Running Median problem for the section Cracking the Code Interview from Hackerrank. This algorithm receives a list of numbers and outputs the median from each running set of numbers. In case you need your memory refreshed, the median is the value separating the lower half from the top half in an order set or numbers.

Achando a mediana by Wikimedia Commons / CC BY

As hinted in the problem statement, we'll solve this using heaps and more specifically priority queues. A heap is defined by wikipedia as

heap is a specialized tree-based data structure that satisfies the heap property: if P is a parent node of C, then the key (the value) of P is either greater than or equal to (in a max heap) or less than or equal to (in a min heap) the key of C.[1] The node at the "top" of the heap (with no parents) is called the root node.

Heap mat entspriechendem Tableau dozou by Wikimedia Commons / CC BY

Wikipedia continues to explain the relation between a heap and priority queues

The heap is one maximally efficient implementation of an abstract data type called a priority queue, and in fact priority queues are often referred to as "heaps", regardless of how they may be implemented.

A priority queue in Java is described in the Oracle Docs

An unbounded priority queue based on a priority heap. The elements of the priority queue are ordered according to their natural ordering, or by a Comparator provided at queue construction time, depending on which constructor is used. A priority queue does not permit null elements. A priority queue relying on natural ordering also does not permit insertion of non-comparable objects (doing so may result in ClassCastException).

The head of this queue is the least element with respect to the specified ordering. If multiple elements are tied for least value, the head is one of those elements -- ties are broken arbitrarily. The queue retrieval operations poll, remove, peek, and element access the element at the head of the queue.

Essentially the queue maintains the least value at the head of the queue but the rest of the elements are ordered naturally but not always necessarily in increasing or decreasing order.

The problem from Hackerrank is described as follows:

Problem Description

The median of a dataset of integers is the midpoint value of the dataset for which an equal number of integers are less than and greater than the value. To find the median, you must first sort your dataset of integers in non-decreasing order, then:

  • If your dataset contains an odd number of elements, the median is the middle element of the sorted sample. In the sorted dataset {1, 2, 3}, 2 is the median.
  • If your dataset contains an even number of elements, the median is the average of the two middle elements of the sorted sample. In the sorted dataset {1, 2, 3, 4}, (2+3)/2 = 2.5 is the median.

Given an input stream of n integers, you must perform the following task for each ith integer:

  1. Add the ith integer to a running list of integers.
  2. Find the median of the updated list (i.e., for the first element through the ith element).
  3. Print the list's updated median on a new line. The printed value must be a double-precision number scaled to 1 decimal place (i.e., 12.3 format).

Input Format

The first line contains a single integer, n, denoting the number of integers in the data stream. Each line i of the n subsequent lines contains an integer, ai, to be added to your list.

Constraints

  • 1 ≤ n ≤ 105
  • 1 ≤ ai ≤ 105

Output Format

After each new integer is added to the list, print the list's updated median on a new line as a single double-precision number scaled to 1 decimal place (i.e., 12.3 format).

Sample Input

6
12
4
5
3
8
7

Sample Output

12.0
8.0
5.0
4.5
5.0
6.0

Explanation

There are n = 6 integers, so we must print the new median on a new line as each integer is added to the list:

  1. list = {12}, median = 12.0
  2. list = {12, 4} → {4, 12}, median = (12 + 4)/2 = 8.0
  3. list = {12, 4, 5} → {4, 5, 12}, median = 5.0
  4. list = {12, 4, 5, 3} → {3, 4, 5, 12}, median = (4 + 5)/2 = 4.5
  5. list = {12, 4, 5, 3, 8} → {3, 4, 5, 8, 12}, median = 5.0
  6. list = {12, 4, 5, 3, 8, 7} → {3, 4, 5, 7, 8, 12}, median = (5 +7)/2 = 6.0

Solution

To start, let's create two heaps. One will be our max heap which contains all numbers smaller than the median. I like to think of this as the left side where the median is the middle. This is a max-heap and we only care about the largest number. The min heap (or right side) will contain all numbers greater than the median. This is a min-heap and we only care about smallest number.

I work primarily in JavaScript but I find Java significantly better for algorithms such as these because of the provided data structures. We will be using priority queues which are not provided in JS natively.

The basic idea is to use the two heaps in the following manner:

1. Add a number. We need to decide to which heap.

2. Balance the heaps. This ensures the heaps are the same size or one heap has one integer more.

3. Get the median. This will calculate the median from the balanced heaps, taking into account the amount of integers (odd versus even).

For the left side, we want the greatest value at the head of queue so we pass a custom comparator so that reverses the order. For the right heap we define the priority queue normally with the smallest number at the head.

We also need to print the medians and use the methods based on the three steps above. We create a new scanner system then create a loop based on the number of integers. The first value based as input defines the total number of integers. Then for each integer, add - balance - print the median.

public static void main(String[] args) {
  //left Heap
  PriorityQueue<Integer> max = new PriorityQueue<Integer>(Collections.reverseOrder());
  //right Heap
  PriorityQueue<Integer> min = new PriorityQueue<Integer>();

  Scanner in = new Scanner(System.in);
  int n = in.nextInt();
  int[] a = new int[n];
  for(int a_i=0; a_i < n; a_i++){
    a[a_i] = in.nextInt();
    addNumber(a[a_i], max, min);
    balance(max, min);
    System.out.println(getMedian(max, min));
  }
}

Now that we have scanner system and method usage, we need to actually write the methods. If our number is the very first number, we simply add it to the max heap and move on. We only have one number and the median will simply be that number.

However, for the rest of the numbers we need to check how it compares to the left stack. If it is less than the max value, we add it to the max heap. Otherwise we add it right heap with the larger values.

public static void addNumber(int number, PriorityQueue<Integer> max, PriorityQueue<Integer> min) {
  if (max.size() == 0 || number < max.peek()) {
    max.add(number);
  } else {
    min.add(number);
  }
}

In my opinion, this next step key to this algorithm. We could potentially add numbers to left and right heaps all day and ensure that the max value is always less than min from the other heap. However, we are looking for a median and need to ensure that both heaps have the same amount of integers or only a difference of one (if odd).

In this balancing, when determining the difference in size we need make sure that the size is two numbers bigger. One extra value is expected for an odd total. If the left heap is bigger than the right heap, take and remove the largest value from the heap and move it to the right side. Use the same process to check the right heap and remove the min value if necessary.

The priority queue will ensure that the next largest or smallest value is available for reading and removal.

public static void balance(PriorityQueue<Integer> max, PriorityQueue<Integer> min) {
  if (max.size() - min.size() >= 2) {
    min.add(max.poll());
  } else if (min.size() - max.size() >= 2) {
    max.add(min.poll());
  }
}

Now that our heaps are balanced we can easily retrieve the median. If the medians are the same size, then the media will not be one our actual numbers but rather the average of the two most centric values.

For an odd amount of numbers, the median always is the most centric value. We simply determine which heap has the additional value and return the respective max or min value from that heap. Now we have returned the running median!

public static double getMedian(PriorityQueue<Integer> max, PriorityQueue<Integer> min) {
  if (max.size() == min.size()) {
    return ((double) max.peek() + min.peek())/2;
  }
  if (max.size() > min.size()) {
    return max.peek();
  } else {
    return min.peek();
  }
}

Here is the complete solution.

import java.util.*;
import java.text.*;
import java.math.*;
import java.util.regex.*;

public class Solution {

  public static void main(String[] args) {
    //left Heap
    PriorityQueue<Integer> max = new PriorityQueue<Integer>(Collections.reverseOrder());
    //right Heap
    PriorityQueue<Integer> min = new PriorityQueue<Integer>();

    Scanner in = new Scanner(System.in);
    int n = in.nextInt();
    int[] a = new int[n];
    for(int a_i=0; a_i < n; a_i++){
      a[a_i] = in.nextInt();
      addNumber(a[a_i], max, min);
      balance(max, min);
      System.out.println(getMedian(max, min));
    }
  }

  public static void addNumber(int number, PriorityQueue<Integer> max, PriorityQueue<Integer> min) {
    if (max.size() == 0 || number < max.peek()) {
      max.add(number);
    } else {
      min.add(number);
    }
  }

  public static void balance(PriorityQueue<Integer> max, PriorityQueue<Integer> min) {
    if (max.size() - min.size() >= 2) {
      min.add(max.poll());
    } else if (min.size() - max.size() >= 2) {
      max.add(min.poll());
    }
  }

  public static double getMedian(PriorityQueue<Integer> max, PriorityQueue<Integer> min) {
    if (max.size() == min.size()) {
      return ((double) max.peek() + min.peek())/2;
    }
    if (max.size() > min.size()) {
      return max.peek();
    } else {
      return min.peek();
    }
  }
}