  1. Core Server
  2. SERVER-3276

mongoimport is trimming leading whitespace (including tabs) from every input record

    • Type: Bug
    • Resolution: Done
    • Priority: Minor - P4
    • 1.9.1
    • Affects Version/s: 1.9.0
    • Component/s: Tools
    • Environment:
      n/a
    • Fully Compatible
    • ALL

      I have been able to confirm that this is likely a bug; however, as we all know, one programmer's bug is another's feature. That said, here are the particulars:

      FILE: tools/import.cpp

      While reading the input file, this piece of code (ignore the JSON branch) trims all leading whitespace from each line. However, if the input is a TSV, a leading tab is a field separator marking an empty first field, and it gets gobbled up along with any other whitespace. I don't think this was the intended behavior.

      if (_jsonArray) {
          while (buf[0] != '{' && buf[0] != '\0') {
              len++;
              buf++;
          }
          if (buf[0] == '\0')
              break;
      }
      else {
          while (isspace( buf[0] )) {
              len++;
              buf++;
          }
          if (buf[0] == '\0')
              continue;
          len += strlen( buf );
      }

      http://creativyst.com/Doc/Articles/CSV/CSV01.htm

      I do not recall ever reading a formal spec for CSV, but this link is pretty good. In general, however, it is a design bug to trim the leading part of the record at this point in the code. Whitespace handling should live in the parser itself and be kept as tight and localized as possible, for the obvious reasons.

      /r

            Assignee:
            spencer@mongodb.com Spencer Brody (Inactive)
            Reporter:
            rbucker richard bucker
            Votes:
            0
            Watchers:
            2
