Tuesday, August 30, 2016

Cassandra Datastax C# Driver problems - NoHostAvailableException

Please find the updated version of this post here: https://piotr.westfalewicz.com/blog/2016/08/cassandra-datastax-c-sharp-driver-problems---nohostavailableexception/


This post is about my journey fixing a nasty Cassandra Datastax C# driver problem, which took me a lot more time than expected...
Credits: wikimedia

Once upon a time, I was fixing the following exception:
Cassandra.NoHostAvailableException: None of the hosts tried for query are available (tried: x.x.x.x:9042, x.x.x.x:9042, x.x.x.x:9042)
   at Cassandra.ControlConnection.Connect(Boolean firstTime)
   at Cassandra.Cluster.Connect(String keyspace)
   at Company.Code.CassandraSessionCache.GetSession(String keyspaceName)
   at Company...
The CassandraSessionCache looked like this:
public class CassandraSessionCache
{
    private readonly Cluster _cluster;
    private readonly ConcurrentDictionary<string, Lazy<ISession>> _sessions; //lockless session cache

    public CassandraSessionCache(Cluster cluster)
    {
        _cluster = cluster;
        _sessions = new ConcurrentDictionary<string, Lazy<ISession>>();
    }

    public ISession GetSession(string keyspaceName)
    {
        if (!_sessions.ContainsKey(keyspaceName))
        {
            _sessions.GetOrAdd(keyspaceName, key => new Lazy<ISession>(() => _cluster.Connect(key)));
        }
        var result = _sessions[keyspaceName];
        return result.Value;
    }
}
Nothing fancy. However, let me give you some insight into the architecture and the circumstances of the error:
  • The Cassandra cluster is in Amazon
  • The client uses Cassandra Datastax C# Driver 2.6.0 and also runs on a server in Amazon
  • Both the client and the Cassandra cluster are in the same Amazon Region
  • The Amazon Region had no availability issues during the given period
  • The solution had been working fine for over 1 month! The client process is restarted ~every week for various reasons
  • The client follows the Cassandra Datastax C# Driver Best Practices
  • Heartbeat is turned on, so the connection should be alive at all times
  • Things get back to normal after a client restart... and get back to madness a few hours later, at higher load. There is an incredibly high number of NoHostAvailableExceptions, as if almost every connection to Cassandra fails.
  • Of course, it works on my machine®

What didn't happen?

There are plenty of questions about Cassandra.NoHostAvailableException on StackOverflow. Let's go through some of them and rule them out:
  • [1][2] - no, because following the C# Driver best practices rules this out
  • [3] - no, because we are using the default retry strategy from driver version 2.6.0
  • [4][5][6] - no, because we are able to connect to the Cluster at the beginning
  • [7] - no, because we are not misusing batches

Debugging...

Logs on the server revealed that the client closed the connection:
INFO  [SharedPool-Worker-3] yyyy-mm-dd 11:04:45,625 Message.java:605 - Unexpected exception during request; channel = [id: 0x9eaf52c5, /x.x.x.x:y :> /x.x.x.x:9042]
java.io.IOException: Error while read(...): Connection reset by peer
  at io.netty.channel.epoll.Native.readAddress(Native Method) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
  at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:675) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
  at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
  at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollRdHupReady(EpollSocketChannel.java:689) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
  at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
  at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
  at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
  at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
  at java.lang.Thread.run(Thread.java:745) [na:1.7.0_80]
Meanwhile, the message on the client says that no hosts are available, which is confirmed by the debug logs on the client side. Pretty interesting, huh? Confused, I decided to try updating the driver from 2.6.3 to 2.7... but that didn't help.
According to yet another question about NoHostAvailableException on StackOverflow, I started logging the whole exception, including the serialized Errors property. This is what I logged:
System.Exception: NoHostAvailableException happened. Errors: {
  "x.x.x.x:9042": {
    "NativeErrorCode": 10060,
    "ClassName": "System.Net.Sockets.SocketException",
    "Message": "A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond",
    "Data": null,
    "InnerException": null,
    "HelpURL": null,
    "StackTraceString": "   
  at Cassandra.Connection.<Open>b__9(Task`1 t)
  at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
  at System.Threading.Tasks.Task.Execute()",
    "RemoteStackTraceString": null,
    "RemoteStackIndex": 0,
    "ExceptionMethod": "8\n<Open>b__9\nCassandra, Version=2.7.0.0, Culture=neutral, PublicKeyToken=10b231fbfc8c4b4d\nCassandra.Connection\nCassandra.AbstractResponse <Open>b__9(System.Threading.Tasks.Task`1[Cassandra.AbstractResponse])",
    "HResult": -2147467259,
    "Source": "Cassandra",
    "WatsonBuckets": null
  },
  "x.x.x.x:9042": {
    "NativeErrorCode": 10060,
    "ClassName": "System.Net.Sockets.SocketException",
    "Message": "A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond",
    "Data": null,
    "InnerException": null,
    "HelpURL": null,
    "StackTraceString": "
  at Cassandra.Connection.<Open>b__9(Task`1 t)
  at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
  at System.Threading.Tasks.Task.Execute()",
    "RemoteStackTraceString": null,
    "RemoteStackIndex": 0,
    "ExceptionMethod": "8\n<Open>b__9\nCassandra, Version=2.7.0.0, Culture=neutral, PublicKeyToken=10b231fbfc8c4b4d\nCassandra.Connection\nCassandra.AbstractResponse <Open>b__9(System.Threading.Tasks.Task`1[Cassandra.AbstractResponse])",
    "HResult": -2147467259,
    "Source": "Cassandra",
    "WatsonBuckets": null
  }
}
Unfortunately, there is no interesting data here.
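
For reference, the per-host logging described above could be sketched roughly like this. It is a minimal sketch, not the code I actually used: HostErrorFormatter and Describe are hypothetical names, and it assumes the driver's NoHostAvailableException.Errors property is a dictionary mapping each tried host (IPEndPoint) to the exception raised for it, which is what the dump above suggests. The helper itself depends only on the .NET base class library.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;
using System.Text;

// Hypothetical helper (not part of the driver): renders one line per host
// from a per-host error map, such as the one assumed to be exposed by
// NoHostAvailableException.Errors.
public static class HostErrorFormatter
{
    public static string Describe(IDictionary<IPEndPoint, Exception> errors)
    {
        var sb = new StringBuilder();
        foreach (var pair in errors)
        {
            var ex = pair.Value;
            sb.Append(pair.Key).Append(" -> ").Append(ex.GetType().Name);

            var socketEx = ex as SocketException;
            if (socketEx != null)
            {
                // SocketErrorCode is the enum form of the Winsock code;
                // 10060 (WSAETIMEDOUT) corresponds to SocketError.TimedOut.
                sb.Append(" (").Append(socketEx.SocketErrorCode).Append(")");
            }

            sb.Append(": ").AppendLine(ex.Message);
        }
        return sb.ToString();
    }
}
```

A caller would catch NoHostAvailableException around the GetSession call and pass its Errors property to HostErrorFormatter.Describe before logging. For the dump above this would yield one "TimedOut" line per host, which is easier to scan than the full serialized exception but carries no more information.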

So what could possibly go wrong?

Can you spot the error? I couldn't. Any guesses? Find the answer in the next post.

Thursday, August 18, 2016

Algorithms and data structures - non-academic trees

Please find the updated version of this post here: https://piotr.westfalewicz.com/blog/2016/08/algorithms-and-data-structures---non-academic-trees/

Credits: Wikipedia Tree (data structure)
There are many types of trees which are covered in Computer Science lectures. These usually include: Binary Search Tree, AVL Tree, B-Tree, Splay Tree, Red-Black Tree, Trie, Heap.

These are indeed very useful and practical trees with lots of applications. However, I've discovered a few other trees while brushing up on my knowledge of algorithms and data structures. Here they are, the most interesting, yet not so popular trees:
  • BK-Tree - do you want to find misspellings of a word in a dictionary? E.g. given the word "dog" and the dictionary { "cat", "fog", "dot", "cookie" }, the naive approach is to compare the word "dog" with every entry in the dictionary, which takes O(n) time. With a BK-Tree the lookup can typically be done in roughly O(lg n) time for small edit distances, though. The Burkhard-Keller tree is used in Apache Lucene, for example. Head to Xenopax's Blog for an awesome post about BK-Trees.
  • Merkle Tree - you probably didn't know it, but that's the name of the tree of commits and blobs in the Git VCS. Other applications I know of include Cassandra (during node repair) and the Bitcoin blockchain.
  • Interval Tree - an interesting idea of augmenting "normal" (single-value) trees with additional data in order to answer windowing queries.
  • Lemon Tree - the most complicated type of tree. Many wondered what it really is, but few actually knew... Find the official statement here.
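
To make the BK-Tree idea from the list above a bit more concrete, here is a minimal sketch in C#. The names BkTree, Add, Search and Distance are my own illustrative choices, and Levenshtein distance is assumed as the metric. Each child is stored under its distance to the parent word, so a search only has to descend into children whose key lies within the tolerance of the current distance (the triangle inequality rules out the rest). How much of the tree gets skipped depends on the data and the tolerance, so treat the O(lg n) figure as a typical case rather than a guarantee.

```csharp
using System;
using System.Collections.Generic;

public sealed class BkTree
{
    private sealed class Node
    {
        public readonly string Word;
        // Children are keyed by their edit distance to this node's word.
        public readonly Dictionary<int, Node> Children = new Dictionary<int, Node>();
        public Node(string word) { Word = word; }
    }

    private Node _root;

    public void Add(string word)
    {
        if (_root == null) { _root = new Node(word); return; }
        var current = _root;
        while (true)
        {
            int d = Distance(word, current.Word);
            if (d == 0) return; // word is already in the tree
            Node child;
            if (!current.Children.TryGetValue(d, out child))
            {
                current.Children[d] = new Node(word);
                return;
            }
            current = child; // slot taken: keep descending
        }
    }

    // Returns all stored words within `tolerance` edits of `query`.
    public List<string> Search(string query, int tolerance)
    {
        var matches = new List<string>();
        if (_root == null) return matches;
        var pending = new Stack<Node>();
        pending.Push(_root);
        while (pending.Count > 0)
        {
            var node = pending.Pop();
            int d = Distance(query, node.Word);
            if (d <= tolerance) matches.Add(node.Word);
            // Only subtrees keyed within [d - tolerance, d + tolerance] can hold matches.
            foreach (var pair in node.Children)
            {
                if (pair.Key >= d - tolerance && pair.Key <= d + tolerance)
                    pending.Push(pair.Value);
            }
        }
        return matches;
    }

    // Two-row Levenshtein distance (insertions, deletions, substitutions).
    public static int Distance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;
        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(Math.Min(current[j - 1] + 1, previous[j] + 1),
                                      previous[j - 1] + cost);
            }
            var tmp = previous; previous = current; current = tmp;
        }
        return previous[b.Length];
    }
}
```

With the dictionary from the example ("cat", "fog", "dot", "cookie") added in that order, searching for "dog" with a tolerance of 1 finds "fog" and "dot", and the "cookie" branch is never visited because its key (5) falls outside the 2..4 window around the root distance of 3.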