Text characters above 0x7F entered on the command line for mongod.exe, mongos.exe, mongo.exe and the other programs in the suite are not necessarily handled correctly on Windows. Although we build the Windows versions with UNICODE and _UNICODE defined, the entry point we declare is main(), which gives us text in the 8-bit code page of the invoking command window. To get a wide-character UTF-16 string we would need to change the entry point to wmain(), and that would in turn require a wide-character version of boost::program_options to parse the 16-bit characters.

The misbehavior depends on the code page of the invoking command window. In US English versions of Windows you get the DOS-compatible code page 437 unless you have changed your configuration. In Western European versions of Windows you may get code page 1252, which matches ISO Latin 1 (and therefore Unicode) for characters 0xA0 through 0xFF. The two differ only in the 0x80 to 0x9F range, where code page 1252 has printable characters such as the euro sign and ISO Latin 1 has control codes.

Beyond these issues, there may be places where data isn't handled correctly. I found a few in the Windows Service code and am fixing them: we were getting sign extension of characters between 0x80 and 0xFF, which turned 0xE1 (LATIN SMALL LETTER A WITH ACUTE, 'á') into U+FFE1 (FULLWIDTH POUND SIGN, '￡').
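A minimal C++ sketch of the sign-extension bug, assuming a byte string is widened one character at a time. The function names widenSignExtending and widenLatin1 are hypothetical and this is not the actual Windows Service code. char16_t stands in for Windows' 16-bit wchar_t, and the explicit signed char cast models MSVC's default signed plain char. Casting through unsigned char is only a correct fix when the input bytes are ISO Latin 1. For code page 437 or other code pages, a real conversion such as the Win32 MultiByteToWideChar call is needed instead.

```cpp
#include <string>

// Buggy widening (hypothetical helper): treats each byte as a signed char,
// as MSVC does for plain char by default. A byte such as 0xE1 becomes -31,
// and converting -31 to a 16-bit unsigned code unit yields 0xFFE1.
std::u16string widenSignExtending(const std::string& s) {
    std::u16string out;
    for (char c : s) {
        out.push_back(static_cast<char16_t>(static_cast<signed char>(c)));
    }
    return out;
}

// Fixed widening (hypothetical helper): converting through unsigned char
// keeps 0xE1 as 0x00E1. This maps each byte directly to the code point with
// the same value, which is only correct for ISO Latin 1 input.
std::u16string widenLatin1(const std::string& s) {
    std::u16string out;
    for (char c : s) {
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    return out;
}
```

Given the single byte "\xE1", widenSignExtending returns the code unit 0xFFE1 (FULLWIDTH POUND SIGN), while widenLatin1 returns 0x00E1 ('á'). ASCII input is unaffected by either version.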
This may not be an issue for some users (US-only users, or European/UK users on code page 1252), but it is likely to keep coming up until we make the code fully Unicode-capable.
- Is depended on by: SERVER-5333 Issues with non-ASCII characters in filenames and paths in Windows (Closed)
- Related to: SERVER-7496 Mongo.exe client crashes when username of home directory contains a unicode character (Closed)