Uploaded image for project: 'C# Driver'
  1. C# Driver
  2. CSHARP-2477

MongoDb Driver hangs out for really long time when mongos going down

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 2.7.2
    • Component/s: Command Operations
    • Environment:
      Ubuntu Server 18.04 LTS VM on Azure

      I have sharded cluster with auto-scale on scale sets in Azure. This error seems to happen, when my client writes to Mongo, cluster has 3 mongos, 2 replica-sets. Client writes with high possible throughput using many threads and connections.
      Additional settings:

      settings.MaxConnectionPoolSize = 2000;
      settings.WaitQueueSize = 10000;

      Then one mongos is going down (because scale set decide to shut it down).
      First i gets errors of this kind:

      "[2019-01-10 17:18:56.130 GMT 28 Error 0 Lingaro.OpenIoT.Orleans.Grains.DeviceGrain] !!!!!!!!!! Error during grain execution! Setting previous state
      Exc level 0: MongoDB.Driver.MongoWriteException: A write operation resulted in an error.
        Write results unavailable from mongo-s1-2:27017 due to interrupted at shutdown
         at MongoDB.Driver.MongoCollectionBase`1.ReplaceOneAsync(FilterDefinition`1 filter, TDocument replacement, UpdateOptions options, Func`3 bulkWriteAsync)
         at Lingaro.OpenIoT.Orleans.Grains.DeviceGrain.WriteStateAsync() in C:\src\iotplatformdev\orleans\Lingaro.OpenIoT\Lingaro.OpenIoT.Orleans.Grains\Grains\DeviceGrain.cs:line 108
         at Lingaro.OpenIoT.Orleans.Grains.DeviceGrain.<Process>b__5_0() in C:\src\iotplatformdev\orleans\Lingaro.OpenIoT\Lingaro.OpenIoT.Orleans.Grains\Grains\DeviceGrain.cs:line 92
         at Lingaro.OpenIoT.Orleans.Grains.DeviceGrain.WrapInTimeout[T](String actionName, TimeSpan timeout, Func`1 action) in C:\src\iotplatformdev\orleans\Lingaro.OpenIoT\Lingaro.OpenIoT.Orleans.Grains\Grains\DeviceGrain.cs:line 141
         at Lingaro.OpenIoT.Orleans.Grains.DeviceGrain.Process(EventHubMessage eventHubMessage) in C:\src\iotplatformdev\orleans\Lingaro.OpenIoT\Lingaro.OpenIoT.Orleans.Grains\Grains\DeviceGrain.cs:line 90
      Exc level 1: MongoDB.Driver.MongoBulkWriteException`1[MongoDB.Bson.BsonDocument]: A bulk write operation resulted in one or more errors.
        Write results unavailable from mongo-s1-2:27017 due to interrupted at shutdown
         at MongoDB.Driver.MongoCollectionImpl`1.BulkWriteAsync(IClientSessionHandle session, IEnumerable`1 requests, BulkWriteOptions options, CancellationToken cancellationToken)
         at MongoDB.Driver.MongoCollectionImpl`1.UsingImplicitSessionAsync[TResult](Func`2 funcAsync, CancellationToken cancellationToken)
         at MongoDB.Driver.MongoCollectionBase`1.ReplaceOneAsync(FilterDefinition`1 filter, TDocument replacement, UpdateOptions options, Func`3 bulkWriteAsync)"

      And it seems reasonable as one mongos is shutting down.
      But then after working a while, whole process hang out. After a really long time i gets something like this:

      "[2019-01-10 17:35:06.509 GMT 28Error0Lingaro.OpenIoT.Orleans.Grains.DeviceGrain]!!!!!!!!!! Operation WriteStateAsync took long time to complete! Took 954750.3796 ms.
      [2019-01-10 17:35:06.508 GMT 27Error0Lingaro.OpenIoT.Orleans.Grains.DeviceGrain]!!!!!!!!!! Error during grain execution! Setting previous state
      Exc level 0: MongoDB.Driver.MongoConnectionException: An exception occurred while opening a connection to the server.
         at MongoDB.Driver.Core.Connections.BinaryConnection.OpenHelperAsync(CancellationToken cancellationToken)
         at MongoDB.Driver.Core.Servers.Server.GetChannelAsync(CancellationToken cancellationToken)
         at MongoDB.Driver.Core.Operations.RetryableWriteContext.InitializeAsync(CancellationToken cancellationToken)
         at MongoDB.Driver.Core.Operations.RetryableWriteContext.CreateAsync(IWriteBinding binding, Boolean retryRequested, CancellationToken cancellationToken)
         at MongoDB.Driver.Core.Operations.BulkMixedWriteOperation.ExecuteAsync(IWriteBinding binding, CancellationToken cancellationToken)
         at MongoDB.Driver.OperationExecutor.ExecuteWriteOperationAsync[TResult](IWriteBinding binding, IWriteOperation`1 operation, CancellationToken cancellationToken)
         at MongoDB.Driver.MongoCollectionImpl`1.ExecuteWriteOperationAsync[TResult](IClientSessionHandle session, IWriteOperation`1 operation, CancellationToken cancellationToken)
         at MongoDB.Driver.MongoCollectionImpl`1.BulkWriteAsync(IClientSessionHandle session, IEnumerable`1 requests, BulkWriteOptions options, CancellationToken cancellationToken)
         at MongoDB.Driver.MongoCollectionImpl`1.UsingImplicitSessionAsync[TResult](Func`2 funcAsync, CancellationToken cancellationToken)
         at MongoDB.Driver.MongoCollectionBase`1.ReplaceOneAsync(FilterDefinition`1 filter, TDocument replacement, UpdateOptions options, Func`3 bulkWriteAsync)
         at Lingaro.OpenIoT.Orleans.Grains.DeviceGrain.WriteStateAsync() in C:\src\iotplatformdev\orleans\Lingaro.OpenIoT\Lingaro.OpenIoT.Orleans.Grains\Grains\DeviceGrain.cs:line 108
         at Lingaro.OpenIoT.Orleans.Grains.DeviceGrain.<Process>b__5_0() in C:\src\iotplatformdev\orleans\Lingaro.OpenIoT\Lingaro.OpenIoT.Orleans.Grains\Grains\DeviceGrain.cs:line 92
         at Lingaro.OpenIoT.Orleans.Grains.DeviceGrain.WrapInTimeout[T](String actionName, TimeSpan timeout, Func`1 action) in C:\src\iotplatformdev\orleans\Lingaro.OpenIoT\Lingaro.OpenIoT.Orleans.Grains\Grains\DeviceGrain.cs:line 141
         at Lingaro.OpenIoT.Orleans.Grains.DeviceGrain.Process(EventHubMessage eventHubMessage) in C:\src\iotplatformdev\orleans\Lingaro.OpenIoT\Lingaro.OpenIoT.Orleans.Grains\Grains\DeviceGrain.cs:line 90
      Exc level 1: MongoDB.Driver.MongoConnectionException: An exception occurred while receiving a message from the server.
         at MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveBufferAsync()
         at MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveBufferAsync(Int32 responseTo, CancellationToken cancellationToken)
         at MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveMessageAsync(Int32 responseTo, IMessageEncoderSelector encoderSelector, MessageEncoderSettings messageEncoderSettings, CancellationToken cancellationToken)
         at MongoDB.Driver.Core.WireProtocol.CommandUsingQueryMessageWireProtocol`1.ExecuteAsync(IConnection connection, CancellationToken cancellationToken)
         at MongoDB.Driver.Core.Connections.IsMasterHelper.GetResultAsync(IConnection connection, CommandWireProtocol`1 isMasterProtocol, CancellationToken cancellationToken)
         at MongoDB.Driver.Core.Connections.ConnectionInitializer.InitializeConnectionAsync(IConnection connection, CancellationToken cancellationToken)
         at MongoDB.Driver.Core.Connections.BinaryConnection.OpenHelperAsync(CancellationToken cancellationToken)
      Exc level 2: System.IO.IOException: Unable to read data from the transport connection: Connection timed out.
         at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error)
         at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)
         at System.Threading.Tasks.ValueTask`1.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state)
      — End of stack trace from previous location where exception was thrown —
         at MongoDB.Driver.Core.Misc.StreamExtensionMethods.ReadBytesAsync(Stream stream, Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken)
         at MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveBufferAsync()
      Exc level 3: System.Net.Sockets.SocketException: Connection timed out"

      This indicate that Writing to mongoDB timeout after 954750.3796 ms!!!

      And i can correlate this with more mongos going down.
      After that long timeout occurred - processing start happen again and writing to mongoDB going with success.
      Sometimes this situation can happen twice or more. (Hanging for really long time). 

            Assignee:
            wan.bachtiar@mongodb.com Wan Bachtiar
            Reporter:
            darthkurak Jakub Dropia
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: