Core Server / SERVER-14913

mongoimport imports CSV incorrectly in the presence of an even number of escaped quotes

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Tools
    • ALL

      Create a file called bad.csv

      bad.csv
      "first","last"
      "joe","smith"
      "bad","guy\""
      "evil","monster"
      "sam\"","mill"
      

      Now import it

      desktop:MongoDB aje$ mongoimport --type csv -c bad --drop --headerline < bad.csv
      connected to: 127.0.0.1
      2014-08-15T11:09:02.225-0400 dropping: test.bad
      2014-08-15T11:09:02.254-0400 imported 2 objects

      Now let's look at the collection:

      m101:PRIMARY> db.bad.find().pretty()
      {
      	"_id" : ObjectId("53ee228e35d4ea0c46429cac"),
      	"first" : "joe",
      	"last" : "smith"
      }
      {
      	"_id" : ObjectId("53ee228e35d4ea0c46429cad"),
      	"first" : "bad",
      	"last" : "guy\\\"\n",
      	"field2" : ",",
      	"field3" : "\n",
      	"field4" : "",
      	"field5" : "mill"
      }
      m101:PRIMARY> 
      

      There should be four documents, but there are only two.


      When a CSV file contains an even number of quotes escaped as \", the parser gets confused and reads across line endings, coalescing multiple lines into a single document.
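One way to surface this run-on behaviour as an explicit failure before importing is to pre-check the file with a strict parser. The sketch below uses Python's standard csv module in strict mode, which expects embedded quotes to be doubled. It is only an external sanity check under that assumption, not a description of how mongoimport parses, and the helper name first_strict_csv_error is made up for illustration.

```python
import csv
import io
from typing import Optional


def first_strict_csv_error(text: str) -> Optional[str]:
    """Return a description of the first strict-mode parse error, or None.

    Hypothetical helper: Python's reader with strict=True raises csv.Error
    when a closing quote is followed by anything other than a delimiter or
    a line ending, which is what happens once \\" is misread as a doubled
    quote and the field runs on into the next line.
    """
    reader = csv.reader(io.StringIO(text, newline=""), strict=True)
    try:
        for _row in reader:
            pass
    except csv.Error as exc:
        return f"line {reader.line_num}: {exc}"
    return None


if __name__ == "__main__":
    # Same content as bad.csv in the steps to reproduce.
    bad_csv = (
        '"first","last"\n'
        '"joe","smith"\n'
        '"bad","guy\\""\n'
        '"evil","monster"\n'
        '"sam\\"","mill"\n'
    )
    print(first_strict_csv_error(bad_csv))
```

Run against the bad.csv content from the steps to reproduce, this reports a parse error instead of silently merging rows. A file whose embedded quotes are doubled passes the check.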

      Wikipedia says that embedded quotes need to be encoded as "", so arguably the CSV file did not conform to Jimmy Wales's view of CSV. However, this particular encoding is the default used by MySQL, so we probably need it to work, or at least throw an error.
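As a possible workaround, a backslash-escaped file can be rewritten with doubled quotes before it is handed to mongoimport. This is a minimal sketch using Python's standard csv module, assuming the input uses \ as its escape character and never doubles quotes. The function name mysql_csv_to_doubled_quotes is hypothetical, and whether the converted file then imports as four documents has not been verified here.

```python
import csv
import io


def mysql_csv_to_doubled_quotes(text: str) -> str:
    """Rewrite backslash-escaped CSV so embedded quotes are doubled ("").

    Assumption: the input escapes special characters with a backslash
    (MySQL-style) and does not use doubled quotes itself.
    """
    # Read with backslash as the escape character and doubling disabled.
    reader = csv.reader(
        io.StringIO(text, newline=""),
        escapechar="\\",
        doublequote=False,
    )
    out = io.StringIO(newline="")
    # The writer doubles embedded quotes by default; QUOTE_ALL keeps every
    # field quoted, matching the layout of the original file.
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


if __name__ == "__main__":
    # Same content as bad.csv in the steps to reproduce.
    bad_csv = (
        '"first","last"\n'
        '"joe","smith"\n'
        '"bad","guy\\""\n'
        '"evil","monster"\n'
        '"sam\\"","mill"\n'
    )
    print(mysql_csv_to_doubled_quotes(bad_csv), end="")
```

The converted text keeps all five lines (header plus four records), with guy\" rewritten as guy"" inside its quoted field and sam\" as sam"".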

            Assignee: Matt Kangas (matt.kangas)
            Reporter: Andrew Erlichson (erlichson)