- Type: Task
- Resolution: Done
- Priority: Minor - P4
- Affects Version/s: 1.3, 1.4, 1.4.8
- Component/s: None
GridStore.prototype.open ensures an index on the files collection on the filename field, as well as an index on the chunks collection on files_id and n. Setting aside the obvious performance benefits, these indices are not required. I am curious as to why they are embedded in the driver.
The GridFS index documentation only mentions indexing with regard to the chunks collection. It does, however, recommend consulting the documentation for the specific driver.
The commit history and the HISTORY change log show when these calls were introduced, and both mention a ticket #649. I have been unable to locate that issue, which is why I'm asking here about the rationale behind the enhancement. It appears to have originated in version 1.1.4, but JIRA only goes back to 1.3, and I can't find an "Issues" section for the project on GitHub.
The specific source in question is as follows:
var collection = self.collection();
// Put index on filename
collection.ensureIndex([['filename', 1]], writeConcern, function(err, index) {
  if(err) return callback(err);
  // Get chunk collection
  var chunkCollection = self.chunkCollection();
  // Ensure index on chunk collection
  chunkCollection.ensureIndex([['files_id', 1], ['n', 1]], writeConcern, function(err, index) {
    if(err) return callback(err);
    _open(self, writeConcern, callback);
  });
});
Because any ensureIndex error is passed straight to the callback, this is not a passive implementation: a failed index creation aborts open. The operation could fail with the MongoDB errors IndexOptionsConflict (85) or IndexKeySpecsConflict (86), which are unrelated to the legitimacy of the open method, if either of the indices has been tailored differently, e.g. with a unique index on filename rather than the non-unique one being ensured by the driver.
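To illustrate what a more passive variant might look like, here is a minimal sketch of a wrapper that swallows only those two conflict codes and still reports every other error. The helper name ensureIndexPassive is hypothetical and not part of the driver, and the sketch assumes the error object exposes the server error code as err.code.

```javascript
// Hypothetical helper, not part of the driver: ensure an index, but treat
// a conflict with an existing, differently defined index as non-fatal.
var INDEX_OPTIONS_CONFLICT = 85;   // IndexOptionsConflict
var INDEX_KEY_SPECS_CONFLICT = 86; // IndexKeySpecsConflict

function ensureIndexPassive(collection, fields, options, callback) {
  collection.ensureIndex(fields, options, function(err, indexName) {
    // Assumption: the server error code is available as err.code.
    if(err && (err.code === INDEX_OPTIONS_CONFLICT ||
               err.code === INDEX_KEY_SPECS_CONFLICT)) {
      // An index on these keys already exists with another definition
      // (e.g. a unique index on filename); leave it as the DBA set it up.
      return callback(null, null);
    }
    // Any other error is still reported; on success pass the index name on.
    callback(err, indexName);
  });
}
```

With something like this, open could call ensureIndexPassive(collection, [['filename', 1]], writeConcern, ...) in place of the direct ensureIndex call and proceed to _open even when the index has been customized.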
My questions are:
- What is the rationale behind this embedded index assurance? Is this simply included to guarantee that the developer doesn't forget this critical step?
- Shouldn't collection indexing be up to the developer/DBA/etc.?
- Should this have been implemented without the control flow (explicit callback on error, flow deviation), or should the ensureIndex invocations be removed completely?
If a change will be considered, I can create a new improvement ticket, or simply update this one.