Cumulative moving average to be put on Voyager spacecraft? Use Two Stage Rocket!

Cumulative moving average is a simple concept: you have a series of values (observations), you add them up one at a time, and divide by the number of items to arrive at the average. It is an online algorithm: cumulative average may be computed at any point in time taking into account all the values received up to this point.

It get only slightly more complicated if your data series is very long (if your algorithm should be able to work long enough to be put on the Voyager spacecraft). In this case, you can’t simply keep adding up numbers, because the sum will overflow. One possible solution to this problem described in the “Voyager” post is to keep average instead of sum. The downside of this approach is a loss of precision caused by repeated floating point operations with the average. I observed the loss of precision by feeding this averager numbers from 0 to 100 (average 50), repeated 1 million times. At the end of the run, computed average was off by about 1e-11. It is not a huge difference, but it is there. And int-based summing averager predictably overflowed (100 million values averaging 50 add up to 5 billion; int range in Java goes up to a little over 2 billion).

Fortunately, there is a better solution, which is also in keeping with the space theme: use two stage rocket! As first stage, we will use traditional averaging approach, keeping the sum of values. Our second stage will keep the sum of averages. After a predetermined number of observations, we will push our average from stage one to stage two and reset stage 1 to zero. This approach enables exponential increase of the number of values our averager can accept. And if we need to travel to another galactic, we could use 3-stage rocket!

When I fed the two stage averager the same test set of numbers as above (0 to 100, repeated 1 million times), the output was precisely 50. No loss of precision occurred.

Here’s the code:

public class TwoStageAverage {

   private int count2=0, count=0;
   private int sum=0;
   private double sum2=0.0d;
   private static int STAGE1LIMIT=0x10000;

   public void accept(int value) {
      sum += value;
      count++;
      if (count==STAGE1LIMIT) {
         // accept to stage 2
         sum2 += ((double)sum)/STAGE1LIMIT;
         count2++;
         // reset stage 1
         sum=0;
         count=0;
      }
   }

   public double getAverage() {
      return (sum2*STAGE1LIMIT+sum)/(count2*STAGE1LIMIT+count);
   }
}
Advertisements

Comments are closed.

%d bloggers like this: