Type: Bug
Resolution: Fixed
Priority: Unknown
Affects Version/s: None
Component/s: Connections, Server Selection
Summary
In my scenario, I have an arbiter from which I collect metrics using the Percona MongoDB exporter (https://github.com/percona/mongodb_exporter). The exporter uses the Go driver to establish a direct connection to the arbiter and execute commands like getDiagnosticData and replSetGetStatus every 10 seconds.
The exporter connects to the arbiter as soon as it starts and reuses the same connection throughout its entire life. When the mongod instance and the MongoDB exporter start at the same time (which is typically the case as they reside in the same Kubernetes pod), the connection becomes permanently broken. The following error is returned by the Go driver on every attempt to execute any command:
server selection error: context canceled, current topology: { Type: Single, Servers: [{ Addr: mongo-arbiter:27017, Type: Unknown, Last error: (NotPrimaryOrSecondary) node is not in primary or recovering state }, ] }
Restarting only the MongoDB exporter or disabling the global connection pool (via the --no-mongodb.global-conn-pool flag) fixes the problem.
This is particularly easy to reproduce when the exporter connects to the soon-to-be arbiter instance before the replica set is initiated. In that case, the exporter is initially able to execute commands but starts failing once the replica set is initiated.
MongoDB driver version: v1.10.3 (the issue doesn't reproduce with v1.3.2 used by older versions of the MongoDB exporter)
MongoDB server version: v4.4 or later
Topology: replica set or sharded cluster + arbiter
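For context, the exporter's use of the driver boils down to roughly the following Go sketch: a direct connection opened once at startup and reused, with getDiagnosticData run on it every 10 seconds. This is a minimal approximation for illustration only (the real exporter wraps this in its registry/collector code), not the exporter's actual source:

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	// Direct connection to the arbiter, established once at startup and
	// reused for the whole process lifetime (the --mongodb.global-conn-pool
	// behaviour described above).
	client, err := mongo.Connect(context.Background(),
		options.Client().
			ApplyURI("mongodb://mongo-arbiter:27017/").
			SetDirect(true))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(context.Background())

	// Run getDiagnosticData on the same client every 10 seconds, mirroring
	// the exporter's scrape loop. Once the node transitions to arbiter state,
	// every attempt fails with the server selection error quoted above.
	for range time.Tick(10 * time.Second) {
		ctx, cancel := context.WithTimeout(context.Background(), 9*time.Second)
		res := client.Database("admin").RunCommand(ctx, bson.D{{Key: "getDiagnosticData", Value: 1}})
		var doc bson.M
		if err := res.Decode(&doc); err != nil {
			fmt.Println("command failed:", err)
		} else {
			fmt.Println("command succeeded")
		}
		cancel()
	}
}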
How to Reproduce
Reproducibility: 100%
1. Create the following prometheus.yml file:
global:
  scrape_interval: 10s
  scrape_timeout: 9s
scrape_configs:
  - job_name: mongo-0
    static_configs:
      - targets:
          - exporter-0:9216
  - job_name: mongo-1
    static_configs:
      - targets:
          - exporter-1:9216
  - job_name: mongo-arbiter
    static_configs:
      - targets:
          - exporter-arbiter:9216
2. Create the following docker-compose.yml file:
version: '3.8'
services:
  mongo-0:
    container_name: mongo-0
    image: mongo:4.4
    networks:
      - mongo
    entrypoint: ["/usr/bin/mongod", "--bind_ip_all", "--replSet", "rs"]
  mongo-1:
    container_name: mongo-1
    image: mongo:4.4
    networks:
      - mongo
    entrypoint: ["/usr/bin/mongod", "--bind_ip_all", "--replSet", "rs"]
  mongo-arbiter:
    container_name: mongo-arbiter
    image: mongo:4.4
    networks:
      - mongo
    entrypoint: ["/usr/bin/mongod", "--bind_ip_all", "--replSet", "rs"]
  exporter-0:
    container_name: exporter-0
    image: percona/mongodb_exporter:0.37.0
    networks:
      - mongo
    command: ["--mongodb.uri=mongodb://mongo-0:27017/", "--mongodb.direct-connect", "--collector.diagnosticdata", "--mongodb.global-conn-pool", "--log.level=debug"]
  exporter-1:
    container_name: exporter-1
    image: percona/mongodb_exporter:0.37.0
    networks:
      - mongo
    command: ["--mongodb.uri=mongodb://mongo-1:27017/", "--mongodb.direct-connect", "--collector.diagnosticdata", "--mongodb.global-conn-pool", "--log.level=debug"]
  exporter-arbiter:
    container_name: exporter-arbiter
    image: percona/mongodb_exporter:0.37.0
    networks:
      - mongo
    command: ["--mongodb.uri=mongodb://mongo-arbiter:27017/", "--mongodb.direct-connect", "--collector.diagnosticdata", "--mongodb.global-conn-pool", "--log.level=debug"]
  victoriametrics:
    container_name: victoriametrics
    image: victoriametrics/victoria-metrics:latest
    networks:
      - mongo
    ports:
      - 8428:8428
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command: ["-promscrape.config", "/etc/prometheus/prometheus.yml"]
networks:
  mongo:
    driver: bridge
3. Start the containers in the background:
docker compose up -d
4. Monitor the exporter-arbiter container logs for 30-40 seconds:
docker logs -f exporter-arbiter
Note that the getDiagnosticData command result is emitted every 10 seconds.
5. Initiate the replica set:
cat <<EOF | docker exec -i mongo-0 mongo
rs.initiate({
  "_id": "rs",
  "members": [
    {"_id": 0, "host": "mongo-0:27017"},
    {"_id": 1, "host": "mongo-1:27017"},
    {"_id": 2, "host": "mongo-arbiter:27017", "arbiterOnly": true}
  ]
});
EOF
6. Monitor the exporter-arbiter container logs for another 30-40 seconds.
docker logs -f exporter-arbiter
Note that the following entries are now emitted instead of the command results:
time="2023-03-30T18:49:31Z" level=warning msg="cannot load topology labels: cannot check if the instance is an arbiter: server selection error: context canceled, current topology: { Type: Single, Servers: [{ Addr: mongo-arbiter:27017, Type: Unknown, Last error: (NotPrimaryOrSecondary) node is not in primary or recovering state }, ] }" time="2023-03-30T18:49:31Z" level=error msg="Registry - Cannot get node type to check if this is a mongos : server selection error: context canceled, current topology: { Type: Single, Servers: [{ Addr: mongo-arbiter:27017, Type: Unknown, Last error: (NotPrimaryOrSecondary) node is not in primary or recovering state }, ] }"
7. Restart the exporter-arbiter container:
docker restart exporter-arbiter
8. Monitor the exporter-arbiter container logs for another 30-40 seconds:
docker logs -f exporter-arbiter
Note that the command results are emitted again.
9. Stop and remove the containers:
docker compose down
Additional Background
Use the --no-mongodb.global-conn-pool flag instead of the --mongodb.global-conn-pool flag in the example above to disable the global connection pool. The issue doesn't reproduce when a new connection is used each time.
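For example, the exporter-arbiter command in the docker-compose.yml above becomes:

command: ["--mongodb.uri=mongodb://mongo-arbiter:27017/", "--mongodb.direct-connect", "--collector.diagnosticdata", "--no-mongodb.global-conn-pool", "--log.level=debug"]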
I forked the v0.11.2 branch of the MongoDB exporter, which uses Go driver v1.3.2 and doesn't have this issue. After upgrading the driver to v1.11.3, the issue started to appear.
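Concretely, the upgrade amounts to bumping the driver dependency in the fork's go.mod, shown here only as an illustration of the before/after versions:

require go.mongodb.org/mongo-driver v1.3.2   // before: commands keep working after rs.initiate
require go.mongodb.org/mongo-driver v1.11.3  // after: the server selection error appears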
is caused by:
- GODRIVER-2829 Requesting an immediate heartbeat check after an error doesn't work for streaming heartbeats (Backlog)
- GODRIVER-2828 SDAM error handling doesn't ignore errors based on topology version correctly (Closed)