Sunday, March 20, 2016

Little-known, useful, charming and beautiful algorithms - part 2

Please find the updated version of this post here: https://piotr.westfalewicz.com/blog/2016/03/little-known-useful-charming-and-beautiful-algorithms---part-2/

Image source: https://goo.gl/EGDuX7

This is the second post about charming algorithms, the first post is here: Little-known, useful, charming and beautiful algorithms - part 1

.NET Timers

How many timers there are in the .NET framework? It's not 2 or 3. It's 5:

  1. System.Timers.Timer
  2. System.Threading.Timer
  3. System.Windows.Forms.Timer
  4. System.Web.UI.Timer
  5. System.Windows.Threading.DispatcherTimer

Here is the comparison of 3 most important of them:
Comparing the Timer Classes in the .NET Framework Class Library.

Which timer would you use for implementing Speculative query execution (which, BTW. it is a nice idea too)? The answer should be: none of them.

.NET timers: internals&assumptions

As you can read in the .NET source code of the System.Threading.Timer, each AppDomain in .NET has one single native timer, supplied by the VM, to fire all managed timers in the AppDomain. The performance considerations are:
We assume that timers are created and destroyed frequently, but rarely actually fire. There are roughly two types of timer:
  - timeouts for operations.  These are created and destroyed very frequently, but almost never fire, because the whole point is that the timer only fires if something has gone wrong.
  - scheduled background tasks.  These typically do fire, but they usually have quite long durations. So the impact of spending a few extra cycles to fire these is negligible.
Because of this, we want to choose a data structure with very fast insert and delete times, but we can live with linear traversal times when firing timers.
The data structure we've chosen is an unordered doubly-linked list of active timers.  This gives O(1) insertion and removal, and O(N) traversal when finding expired timers.
Because of the nature of speculative query execution, we actually care about finding expired timers, and O(N) time is not acceptable. Imagine 10000 read requests per second per one single node to the Cassandra cluster in your web server (which isn't really a big deal, rather normal situation in high traffic scenarios). If you use speculative query execution, you get 10000 timers to manage. Stop, start and expiry processing operations aren't the big deal here, but the per-tick bookkeeping is. Fortunately, there is a better structure for keeping timers, than doubly-linked list.

Hashed and Hierarchical Timing Wheels

Adrian Colyer described all kinds of different timer structures in his blog post: Hashed and Hierarchical Timing Wheels: Data Structures for the Efficient Implementation of a Timer Facility. Go ahead and read it, as he done really good job explaining academic paper.

Real world applications

I've found two real world applications for those structures:
If you are looking for the C# implementation, the one and only implementation of this (which I've found) is here: HashedWheelTimer.cs (many thanks to Cassandra Datastax C# Driver team).