Core Server / SERVER-1056

utf8 error causes Segmentation fault while deleting a Map/Reduce temporary collection

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • Fix Version/s: 1.5.1
    • Affects Version/s: 1.4.1
    • Component/s: Stability
    • Labels: None
    • Environment:
      fedora 9 x86_64

      Replica-pair setup: production-shard1-002 and production-shard1-002.silentale.net
      Before the first segmentation fault, shard1-002 is the master.

      I launch a Map/Reduce command with the Ruby driver. The command fails and I receive this exception: #<Mongo::OperationFailure: map-reduce failed: assertion: invalid utf8>.
      Then I launch collection.count and get a proper result.
      Finally, I launch the same Map/Reduce command, but on less data (restricted by a query), and mongod crashes. I know that this last Map/Reduce command behaves correctly when launched on a freshly started server.

      Tue Apr 27 14:05:25 CMD: drop random_db_name.tmp.mr.mapreduce_1272377125_2
      Tue Apr 27 14:05:25 CMD: drop random_db_name.tmp.mr.mapreduce_1272377125_2_inc
      Tue Apr 27 14:05:25 Got signal: 11 (Segmentation fault).
      Tue Apr 27 14:05:25 Backtrace:
      0x6a8309 0x33042322a0 0x7400f9 0x7a02c3 0x79dbbb 0x71388b 0x713b5a 0x714c4f 0x714bdc 0x714ac5 0x54d691 0x54f433 0x649bdc 0x67e441 0x67f347 0x55662f 0x55a6cc 0x5f310a 0x5f8137 0x6a9894

      The sad part is that I did not notice that the master had crashed (the driver auto-reconnected), and I crashed the second server of the pair right after.

      I attached the logs of the two servers, and the Map/Reduce code in a Ruby file.

      I can reproduce the problem on my laptop (MongoDB 1.4.1, 64-bit, OS X) with the same collection and a single mongod instance. There it does not raise a segmentation fault; mongod just gently kills itself.

      Tue Apr 27 18:19:09 connection accepted from 127.0.0.1:60353 #1
      Tue Apr 27 18:19:09 connection accepted from 127.0.0.1:60354 #2
      Tue Apr 27 18:19:09 CMD: drop random_db_name.tmp.mr.mapreduce_1272385149_1
      Tue Apr 27 18:19:09 CMD: drop random_db_name.tmp.mr.mapreduce_1272385149_1_inc
      decode failed. probably invalid utf-8 string ["c?cilegigi63"@orange.fr]
      why: TypeError: malformed UTF-8 character sequence at offset 2
      Tue Apr 27 18:19:10 mr failed, removing collection
      Tue Apr 27 18:19:10 CMD: drop random_db_name.tmp.mr.mapreduce_1272385149_1
      Tue Apr 27 18:19:10 CMD: drop random_db_name.tmp.mr.mapreduce_1272385149_1_inc
      Tue Apr 27 18:19:10 query random_db_name.$cmd ntoreturn:1 command reslen:89 641ms
      Tue Apr 27 18:19:15 CMD: drop random_db_name.tmp.mr.mapreduce_1272385155_2
      Tue Apr 27 18:19:15 CMD: drop random_db_name.tmp.mr.mapreduce_1272385155_2_inc
      Assertion failure: !oldfp->dormantNext, at jsinterp.c:1620
      Tue Apr 27 18:19:15 Got signal: 6 (Abort trap).
      Tue Apr 27 18:19:15 Backtrace:
      0x10021983c 0x7fff800693fa 0x1f6 0x10021a1b5 0x10021f21d 0x10021f56d 0x10022066f 0x100000b74
      0 mongod 0x000000000021983c _ZN5mongo10abruptQuitEi + 332
      1 libSystem.B.dylib 0x00000000800693fa _sigtramp + 26
      2 ??? 0x00000000000001f6 0x0 + 502
      3 mongod 0x000000000021a1b5 _ZN5mongo6listenEi + 597
      4 mongod 0x000000000021f21d _ZN5mongo14_initAndListenEiPKc + 1693
      5 mongod 0x000000000021f56d _ZN5mongo13initAndListenEiPKc + 29
      6 mongod 0x000000000022066f main + 3583
      7 mongod 0x0000000000000b74 start + 52
      Tue Apr 27 18:19:15 dbexit:
      Tue Apr 27 18:19:15 shutdown: going to close listening sockets...
      Tue Apr 27 18:19:15 going to close listening socket: 7
      Tue Apr 27 18:19:15 Listener on port 28017 aborted.
      Tue Apr 27 18:19:15 going to close listening socket: 8
      Tue Apr 27 18:19:15 shutdown: going to flush oplog...
      Tue Apr 27 18:19:15 shutdown: going to close sockets...
      Tue Apr 27 18:19:15 shutdown: waiting for fs preallocator...
      Tue Apr 27 18:19:15 MessagingPort recv() errno:9 Bad file descriptor 127.0.0.1:60353
      Tue Apr 27 18:19:15 shutdown: closing all files...
      Tue Apr 27 18:19:15 end connection 127.0.0.1:60353
      Tue Apr 27 18:19:15 closeAllFiles() finished
      Tue Apr 27 18:19:15 shutdown: removing fs lock...
      Tue Apr 27 18:19:15 dbexit: really exiting now

      So I suppose that this problem appears as soon as a previous map/reduce command fails.
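
To illustrate the "invalid utf8" assertion and the "malformed UTF-8 character sequence at offset 2" log line above, here is a minimal Ruby sketch that locates the first malformed byte in a string. It assumes that the "?" in the logged address stands for a single Latin-1 "é" byte (0xE9), which is only a guess from the log. `first_invalid_utf8_offset` is a hypothetical helper written for this illustration and is not part of the Mongo Ruby driver.

```ruby
# Hypothetical helper (not part of any driver): return the byte offset of the
# first malformed UTF-8 sequence in +str+, or nil when the string is valid.
def first_invalid_utf8_offset(str)
  # Work on a copy so the caller's string keeps its original encoding.
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return nil if utf8.valid_encoding?

  offset = 0
  utf8.each_char do |ch|
    # Malformed bytes are yielded as broken one-byte characters.
    return offset unless ch.valid_encoding?
    offset += ch.bytesize
  end
  nil
end

# Assumption: the logged value contained a raw Latin-1 "é" (0xE9) byte.
sample = "\"c\xE9cilegigi63\"@orange.fr".b

puts first_invalid_utf8_offset(sample).inspect            # => 2
puts first_invalid_utf8_offset("c\u00E9cile").inspect     # => nil
```

Under that assumption the reported offset matches the log: the leading quote is byte 0, "c" is byte 1, and the malformed byte is byte 2. A check like this could be run over the string fields of a collection to find the documents that trigger the first Map/Reduce failure.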

      Attachments:
        1. mr_function.rb (0.8 kB)
        2. shard1-001.txt (28 kB)
        3. shard1-002.txt (5 kB)

            Assignee:
            eliot Eliot Horowitz (Inactive)
            Reporter:
            nicolas_ Nicolas Fouché
            Votes:
            3
            Watchers:
            4
