Core Server / SERVER-8853

MapReduce jobs execute exclusively on primaries

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 2.4.1
    • Component/s: MapReduce
    • Environment:
      CentOS release 6.3 (Final) 2.6.32-279.9.1.el6.x86_64
      Java driver
    • Query Optimization
    • Linux

      All the mongod instances are @ 2.4.0-rc1.
      All the mongos instances are @ 2.4.0-rc1 (or at least all the ones we're using to test/demonstrate the MapReduce issue).
      All the config server instances are @ 2.4.0-rc1.
      The MongoDB Java driver is @ 2.10.1.
      We set the ReadPreference to secondary everywhere possible (see the sketch below).
      We execute the MapReduce job.
      We connect to a primary and a secondary on one of our shards (scatter/gather).
      Via currentOp() we can observe that the job only runs on the primary.
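
      A minimal sketch of the repro with the 2.10.x Java driver. The host, database, collection, and JavaScript function bodies are hypothetical placeholders; inline output is used since a secondary cannot accept writes:

          import com.mongodb.*;

          public class MapReduceSecondaryRepro {
              public static void main(String[] args) throws Exception {
                  // Hypothetical mongos address and namespace.
                  MongoClient client = new MongoClient(
                          new MongoClientURI("mongodb://mongos-host:27017"));
                  DB db = client.getDB("test");
                  DBCollection coll = db.getCollection("events");

                  // Set ReadPreference to secondary at every level the 2.10.x driver offers.
                  client.setReadPreference(ReadPreference.secondary());
                  db.setReadPreference(ReadPreference.secondary());
                  coll.setReadPreference(ReadPreference.secondary());

                  String map = "function() { emit(this.type, 1); }";
                  String reduce = "function(key, values) { return Array.sum(values); }";

                  // INLINE output avoids writing a result collection.
                  MapReduceCommand cmd = new MapReduceCommand(
                          coll, map, reduce, null, MapReduceCommand.OutputType.INLINE, null);
                  MapReduceOutput out = coll.mapReduce(cmd);

                  for (DBObject doc : out.results()) {
                      System.out.println(doc);
                  }
                  client.close();
              }
          }

      Running this while watching currentOp() on both members of the shard is how the job can be observed landing only on the primary.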


      We've upgraded our cluster to 2.4.0-rc1 and we're still not seeing the MapReduce jobs execute on the secondaries. We were hoping SERVER-7423 would be included.

      Attached is a sample Java application which can be used to illustrate the issue.

      Current state of affairs in 2.5.0:
      There are a few issues preventing M/R from running on secondaries:

      1. We don't pass query options to mapReduce, so the actual command can't even tell whether the slaveOk bit is set.
      2. We currently write the temporary output to the tmp database, and this is not allowed on secondaries since it is a write operation.
      3. Routing and keeping track of which nodes the M/R job runs on. This is because sharded map reduce is done in two stages (sketched after this list):
        1. First stage: run mapReduce on every shard.
        2. Second stage: run mapReduce.shardedfinish on every shard. The second stage involves aggregating the results from all the other shards and running finalReduce on them.
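
      To make the two stages concrete, here is a rough sketch of the command shapes involved, built with the Java driver's BasicDBObject. The field names and the temporary collection name are illustrative assumptions about the 2.x code path, not an authoritative rendering of what mongos sends:

          import com.mongodb.BasicDBObject;
          import com.mongodb.DBObject;

          public class ShardedMapReduceStages {
              public static void main(String[] args) {
                  String map = "function() { emit(this.type, 1); }";
                  String reduce = "function(key, values) { return Array.sum(values); }";

                  // Stage 1: a plain mapReduce broadcast to every shard; each shard
                  // writes its partial result to a temporary collection (a write,
                  // which is what issue 2 above rules out on secondaries).
                  DBObject stage1 = new BasicDBObject("mapreduce", "events")
                          .append("map", map)
                          .append("reduce", reduce)
                          .append("out", new BasicDBObject("replace", "tmp.mrs.events_1")); // hypothetical temp name

                  // Stage 2: the original command wrapped in mapreduce.shardedfinish,
                  // which gathers the per-shard temporaries and runs finalReduce.
                  DBObject stage2 = new BasicDBObject("mapreduce.shardedfinish", stage1)
                          .append("inputDB", "test")                              // assumed field name
                          .append("shardedOutputCollection", "tmp.mrs.events_1"); // assumed field name

                  System.out.println(stage1);
                  System.out.println(stage2);
              }
          }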

            Assignee: backlog-query-optimization [DO NOT USE] Backlog - Query Optimization
            Reporter: mnarrell Matt Narrell
            Votes: 8
            Watchers: 16
