This bug does not normally affect the mongo system we have set up. However, when AWS lost power to one of our EBS volumes, it became very apparent that we could not start any more mongos processes, so our production system came down.
Basics
While it is not easy to make AWS lose power to an EBS volume, this bug is very easy to reproduce using NFS and iptables. We'll have one NFS server and one NFS client. The client will run all mongod and mongos instances. The NFS server will host a single share that the client will use for one of the config servers.
NFS Server Setup
sudo apt-get install nfs-kernel-server
sudo mkdir /srv/nfs/mongo
sudo vi /etc/exports
# /etc/exports
/srv/nfs/mongo <IP of NFS client>/32(rw,sync,no_subtree_check,no_root_squash)
sudo /etc/init.d/nfs-kernel-server restart
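Before moving on, the export can be sanity-checked from the server itself (exportfs ships with nfs-kernel-server):

sudo exportfs -v

/srv/nfs/mongo should be listed against the client's IP with the rw,sync options from /etc/exports.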
NFS Client Setup
sudo apt-get install nfs-common
sudo mkdir -p /nfs/mongo
sudo vi /etc/fstab
<IP of NFS server>:/srv/nfs/mongo /nfs/mongo nfs4 _netdev,auto 0 0
sudo mount /nfs/mongo
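A quick check that the share actually mounted on the client:

mount | grep /nfs/mongo

If nothing is printed, recheck /etc/fstab and the export on the server before continuing.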
Mongo Setup (on same server as NFS Client)
sudo mkdir /db/a1
sudo mkdir /db/a2
sudo mkdir /db/a3
sudo mkdir /db/b1
sudo mkdir /db/b2
sudo mkdir /db/b3
sudo mkdir /db/c1
sudo ln -s /nfs/mongo/c2 /db/c2
sudo mkdir /db/c3
sudo mkdir /var/run/mongo
sudo mkdir /db/logs
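The directory layout above can also be created in one pass. A minimal sketch, using a scratch base directory so it can be tried without root (substitute /db and sudo for the real layout); c2 is deliberately excluded because it must be a symlink onto the NFS share, not a plain directory:

```shell
# Create the shard (a1-a3, b1-b3), config (c1, c3) and log directories.
# c2 is skipped: it is a symlink into the NFS mount, created separately.
BASE=$(mktemp -d)
for d in a1 a2 a3 b1 b2 b3 c1 c3 logs; do
  mkdir -p "$BASE/$d"
done
ls "$BASE"
```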
/usr/bin/mongod --configsvr --smallfiles --fork --port 27050 --dbpath /db/c1 --logpath /db/logs/c1.log --logappend --pidfilepath /var/run/mongo/c1.pid --maxConns 1024
/usr/bin/mongod --configsvr --smallfiles --fork --port 27051 --dbpath /db/c2 --logpath /db/logs/c2.log --logappend --pidfilepath /var/run/mongo/c2.pid --maxConns 1024
/usr/bin/mongod --configsvr --smallfiles --fork --port 27052 --dbpath /db/c3 --logpath /db/logs/c3.log --logappend --pidfilepath /var/run/mongo/c3.pid --maxConns 1024
/usr/bin/mongod --shardsvr --smallfiles --fork --port 27150 --dbpath /db/a1 --logpath /db/logs/a1.log --logappend --pidfilepath /var/run/mongo/a1.pid --maxConns 1024 --replSet a
/usr/bin/mongod --shardsvr --smallfiles --fork --port 27151 --dbpath /db/a2 --logpath /db/logs/a2.log --logappend --pidfilepath /var/run/mongo/a2.pid --maxConns 1024 --replSet a
/usr/bin/mongod --shardsvr --smallfiles --fork --port 27152 --dbpath /db/a3 --logpath /db/logs/a3.log --logappend --pidfilepath /var/run/mongo/a3.pid --maxConns 1024 --replSet a
/usr/bin/mongod --shardsvr --smallfiles --fork --port 27250 --dbpath /db/b1 --logpath /db/logs/b1.log --logappend --pidfilepath /var/run/mongo/b1.pid --maxConns 1024 --replSet b
/usr/bin/mongod --shardsvr --smallfiles --fork --port 27251 --dbpath /db/b2 --logpath /db/logs/b2.log --logappend --pidfilepath /var/run/mongo/b2.pid --maxConns 1024 --replSet b
/usr/bin/mongod --shardsvr --smallfiles --fork --port 27252 --dbpath /db/b3 --logpath /db/logs/b3.log --logappend --pidfilepath /var/run/mongo/b3.pid --maxConns 1024 --replSet b
sleep 10
echo "rs.initiate({_id: 'a', members: [{_id: 0, host: 'localhost:27150', priority: 2},{_id: 1, host: 'localhost:27151', priority: 1},{_id: 2, host: 'localhost:27152', priority: 0}]})" | mongo localhost:27150
echo "rs.initiate({_id: 'b', members: [{_id: 0, host: 'localhost:27250', priority: 2},{_id: 1, host: 'localhost:27251', priority: 1},{_id: 2, host: 'localhost:27252', priority: 0}]})" | mongo localhost:27250
sleep 30
echo "db.runCommand({addshard: 'a/localhost:27150'})" | mongo admin
echo "db.runCommand({addshard: 'b/localhost:27250'})" | mongo admin
In a different terminal (one that can be tied up):
/usr/bin/mongos --configdb localhost:27050,localhost:27051,localhost:27052 --fork --logpath /var/log/mongos.log --logappend --port 27017 --maxConns 1024
Notice that mongos starts normally.
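Once mongos is up and both shards have been added, their registration can be confirmed through the same connection:

echo "db.runCommand({listshards: 1})" | mongo admin

Both a and b should appear in the returned shards array.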
Baseline
Connect, using mongo, to the mongos process. Insert some items. Find some items. Do whatever. Notice it all works as expected.
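For example (the database and collection names here are arbitrary):

echo "db.things.insert({x: 1}); db.things.find()" | mongo localhost:27017/test

Both the insert and the find should succeed through mongos.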
Kill the storage backing one of the mongod config servers. On the NFS server:
sudo iptables -I INPUT -s <IP of NFS client>/32 -j DROP
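To verify the simulated storage failure took effect, any access to the share from the client should now hang; wrapping it in timeout (GNU coreutils) makes that visible:

timeout 5 ls /nfs/mongo; echo "exit: $?"

An exit status of 124 means ls hung until timeout killed it, i.e. the config server's data store is effectively dead.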
Connect, reconnect, and query through the mongos process. Notice it all still works as expected.
Bug Manifestation
Kill the mongos process (Ctrl-C should be fine). After it's down, start it up again using the same command as before.
/usr/bin/mongos --configdb localhost:27050,localhost:27051,localhost:27052 --fork --logpath /var/log/mongos.log --logappend --port 27017 --maxConns 1024
Notice that mongos will hang for a minute, and then die.
Expected Outcome
Even though mongos connected successfully to the config server whose data store is down, it should time out on its operations and treat that config server as a downed server; this should result in a successful start of mongos.
Is related to:
- SERVER-6313: config server timeouts not used in all places (Closed)
- SERVER-5064: mongos can't start when one config server is down, only with keyFile options (Closed)