- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 2.4.3
- Component/s: Replica Set
- Environment: kubernetes
TL;DR: the MongoDB Ruby driver communicates with cluster members using their IP addresses, which may become invalid over time as the cluster topology evolves. After a while, existing clients can no longer communicate with the cluster. Caching the host name instead of the volatile IP address would fix the issue.
I am running a MongoDB replica set as a Kubernetes StatefulSet. In this setup, each cluster member runs in a "pod" (a docker container); each time I recreate a pod, its host name (and DNS name) remains stable but its IP address changes.
In other words, a cluster of 3 replicas may be formed of:
- pod1 (hostname = mongodb-0.mycluster, ip=10.128.30.39) # <-- arbitrary address in 10.128.x.x
- pod2 (hostname = mongodb-1.mycluster, ip=10.129.31.50) # <-- arbitrary address in 10.129.x.x
- pod3 (hostname = mongodb-2.mycluster, ip=10.130.50.19) # <-- arbitrary address in 10.130.x.x
If I delete the first pod, kubernetes will recreate it and the cluster will look like this:
- pod1 (hostname = mongodb-0.mycluster, ip=10.128.10.79) # <--- same hostname, different ipaddress
- pod2 (hostname = mongodb-1.mycluster, ip=10.129.31.50) # <--- previous hostname & ipaddress
- pod3 (hostname = mongodb-2.mycluster, ip=10.130.50.19) # <--- previous hostname & ipaddress
Let's explore this in a Ruby console:
    puts ENV['MONGODB_URL']
    # => mongodb://myapp-api:ba8c827ae12b33ae1eab@mongodb.backends:27017/db2?replicaSet=rs0

    client = Mongo::Client.new ENV['MONGODB_URL']
    client.cluster.servers.each do |s|
      puts "#{s.address} => #{s.address.instance_variable_get(:@resolver).host}"
    end
    # mongodb-2.mongodb.backends.svc.cluster.local:27017 => 10.131.0.14 # <-- volatile ip
    # mongodb-1.mongodb.backends.svc.cluster.local:27017 => 10.130.0.15 # <-- volatile ip
    # mongodb-0.mongodb.backends.svc.cluster.local:27017 => 10.129.0.19 # <-- volatile ip

    # now, let's use the client
    client.collections.count
    # => 8

    # let's do the same while deleting the pod containing the mongodb master
    client.collections.count
    # => Mongo::Error::SocketError: end of file reached
    # => from (irb):28

    # let's do it one more time -- at this point we should have a new master
    client.collections.count
    # => Mongo::Error::SocketTimeoutError: execution expired
    # from (irb):29

    # let's enable the debugging output
    Mongoid.logger.level = 0

    # after a bit, these start showing
    D, [2017-09-04T16:15:07.530841 #1] DEBUG -- : MONGODB | execution expired
    D, [2017-09-04T16:15:20.537943 #1] DEBUG -- : MONGODB | No route to host - connect(2) for 10.131.0.14:27017
    D, [2017-09-04T16:15:33.545593 #1] DEBUG -- : MONGODB | No route to host - connect(2) for 10.131.0.14:27017
    D, [2017-09-04T16:15:46.554009 #1] DEBUG -- : MONGODB | No route to host - connect(2) for 10.131.0.14:27017
    D, [2017-09-04T16:15:59.561493 #1] DEBUG -- : MONGODB | No route to host - connect(2) for 10.131.0.14:27017
Here 10.131.0.14:27017 is the former IP address of the cluster member that was just recreated.
My problem is that the Ruby mongo driver caches the IP addresses of the cluster members upon the first connection and never updates them. Once every pod has been recreated at least once, none of the original IP addresses are valid anymore, and the MongoDB client can no longer process any query.
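To make the failure mode concrete, here is a minimal, self-contained simulation of the two caching strategies. It is only a sketch: `FakeDns`, `IpCachingAddress` and `HostCachingAddress` are hypothetical names invented for illustration and are not part of the mongo gem; the host name and IP addresses are the example values from the pod listing above.

```ruby
# Hypothetical stand-in for cluster DNS: maps host names to their current IP.
class FakeDns
  def initialize(records)
    @records = records.dup
  end

  def resolve(host)
    @records.fetch(host)
  end

  # Simulates Kubernetes recreating a pod with a new IP address.
  def update(host, ip)
    @records[host] = ip
  end
end

# Hypothetical address that resolves once and memoizes the IP
# (the behaviour described in this ticket).
class IpCachingAddress
  def initialize(host, dns)
    @ip = dns.resolve(host)
  end

  def connect_target
    @ip
  end
end

# Hypothetical address that keeps the host name and resolves on each use
# (the behaviour requested in this ticket).
class HostCachingAddress
  def initialize(host, dns)
    @host = host
    @dns = dns
  end

  def connect_target
    @dns.resolve(@host)
  end
end

dns = FakeDns.new('mongodb-0.mycluster' => '10.128.30.39')
stale = IpCachingAddress.new('mongodb-0.mycluster', dns)
fresh = HostCachingAddress.new('mongodb-0.mycluster', dns)

# The pod is recreated: same host name, new IP address.
dns.update('mongodb-0.mycluster', '10.128.10.79')

puts stale.connect_target # prints 10.128.30.39, the address that no longer exists
puts fresh.connect_target # prints 10.128.10.79, the current address
```

The memoized IP keeps pointing at the deleted pod, while the address that stores the host name follows the pod to its new IP.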
I traced it to the `Mongo::Address` class memoizing a `@resolver` instance variable, which embeds a socket created with the host ip instead of the host name. See https://github.com/mongodb/mongo-ruby-driver/blob/master/lib/mongo/address.rb#L188 .
I use the following monkeypatch as a workaround. It instantiates sockets with the stable host name as argument instead of the volatile IP address.
    require 'mongo/address'

    module Mongo
      class Address
        def initialize_resolver!(ssl_options)
          return Unix.new(seed.downcase) if seed.downcase =~ Unix::MATCH

          family = (host == LOCALHOST) ? ::Socket::AF_INET : ::Socket::AF_UNSPEC
          error = nil
          ::Socket.getaddrinfo(host, nil, family, ::Socket::SOCK_STREAM).each do |info|
            begin
              #res = FAMILY_MAP[info[4]].new(info[3], port, host)
              # >>>>>>
              # the monkeypatch forces mongodb to use hostname for its sockets
              # instead of caching a volatile ipaddress which will break when
              # the kubernetes pods are recreated.
              res = FAMILY_MAP[info[4]].new(host, port, host)
              # <<<<<<
              res.socket(connect_timeout, ssl_options).connect!(connect_timeout).close
              return res
            rescue IOError, SystemCallError, Error::SocketTimeoutError, Error::SocketError => e
              error = e
            end
          end
          raise error
        end
      end
    end
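For readers unfamiliar with the array indices used in the patch, the sketch below prints what Ruby's standard `Socket.getaddrinfo` returns for a host name: `info[3]` is the numeric address (the value the original line passed to the resolver) and `info[4]` is the integer address family. It uses `localhost` as a stand-in for a pod host name, which is an assumption made so that the example runs without a cluster, and it restricts the lookup to IPv4 the same way the driver code does for `LOCALHOST`.

```ruby
require 'socket'

# 'localhost' is a stand-in for a pod host name such as mongodb-0.mycluster.
host = 'localhost'

# Same call shape as in the patch; AF_INET limits the result to IPv4 entries.
infos = Socket.getaddrinfo(host, nil, Socket::AF_INET, Socket::SOCK_STREAM)

infos.each do |info|
  numeric_address = info[3] # resolved IP, valid only at lookup time
  family = info[4]          # integer address family (Socket::AF_INET here)
  puts "#{host} => #{numeric_address} (IPv4: #{family == Socket::AF_INET})"
end
```

The host name passed in stays meaningful after a pod is recreated; the numeric address in `info[3]` is only a snapshot of the DNS answer at the moment of the call.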