Core Server / SERVER-8133

mongoimport TSV, CSV and JSON always generate duplicate entries for all but the first record when importing

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 2.2.2
    • Component/s: Tools
    • Environment:
      verified on OS X 10.8.2, CentOS 5.5, and the 10gen Amazon image upgraded to MongoDB 2.2.2
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL

      1) create a file huh.tsv (see attachment)

      2) execute the following load...

      $ mongoimport -d local --type tsv --headerline \
      --stopOnError --drop -c huh -file ./huh.tsv -v
      Thu Jan 10 01:16:25 creating new connection to:127.0.0.1:27017
      Thu Jan 10 01:16:25 BackgroundJob starting: ConnectBG
      Thu Jan 10 01:16:25 connected connection!
      connected to: 127.0.0.1
      Thu Jan 10 01:16:25 ns: local.huh
      Thu Jan 10 01:16:25 dropping: local.huh
      Thu Jan 10 01:16:25 filesize: 13
      Thu Jan 10 01:16:25 got line:myId
      Thu Jan 10 01:16:25 got line:1
      Thu Jan 10 01:16:25 got line:2
      Thu Jan 10 01:16:25 got line:3
      Thu Jan 10 01:16:25 got line:4
      Thu Jan 10 01:16:25 imported 4 objects
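
      The attached huh.tsv is not reproduced in the report, but judging from the "got line" entries and the 13-byte file size in the verbose output above, it presumably contains a single myId column followed by the values 1 through 4:

      $ cat huh.tsv
      myId
      1
      2
      3
      4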

      3) mongo local should return 5 rows, but we only get the following...

      $ mongo local
      MongoDB shell version: 2.2.2
      connecting to: local
      > db.huh.find()

      { "myId" : 1 }

      >
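
      A quick sanity check (not part of the original report) is to count the documents directly; if only the first row made it in, the count comes back as 1:

      > db.huh.count()
      1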

      I have attached the log file with verbose logging turned on below; the dup messages only appear with verbose logging enabled. I suspect you might also see this if you used --dbpath to access the database files directly. I have seen other bugs mention that running the tool that way produces more output than in client/server mode.
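
      For reference, a direct-access run along the lines suggested above might look like this; the /data/db path is just an assumption, and mongod must be shut down first since --dbpath opens the database files exclusively:

      $ mongoimport --dbpath /data/db -d local -c huh \
        --type tsv --headerline --drop --file ./huh.tsv -vvv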


      When importing a simple file with 5 unique rows into a new collection using mongoimport, the import tool says all went well, but only the first row is saved (examples below). I was able to produce this behavior with TSV, CSV and JSON files.
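
      The exact commands used for the CSV and JSON attachments are not shown in the report; based on the TSV invocation above, they would presumably look like the following (mongoimport defaults to JSON input, so --type and --headerline are not needed for huh.json):

      $ mongoimport -d local -c huh --drop --type csv --headerline --file ./huh.csv -v
      $ mongoimport -d local -c huh --drop --file ./huh.json -v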

      Starting up the MongoDB Amazon instance from the Amazon Marketplace and upgrading to MongoDB 2.2.2 via yum would be the surest way to reproduce what I was seeing. To be specific, I was using the public Amazon instance ID ami-62da7c0b as a starting point.
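
      A rough sketch of that upgrade path, assuming the 10gen yum repository and package names of that era (the report does not spell them out):

      $ sudo yum install -y mongo-10gen mongo-10gen-server
      $ sudo service mongod restart
      $ mongo --eval 'db.version()'   # expect 2.2.2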

        1. huh.csv
          0.0 kB
        2. huh.json
          0.0 kB
        3. huh.tsv
          0.0 kB
        4. mongo-error.log
          3 kB

            Assignee:
            Unassigned
            Reporter:
            Jeff Rule (jeffwrule)
            Votes:
            0
            Watchers:
            3

              Created:
              Updated:
              Resolved: