Core Server / SERVER-53024

Evaluate simdjson for reading JSON files in MQL queries

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Querying
    • Query Execution

      Atlas Data Lake currently uses external data access agents (written in Go) that parse data in various formats, convert it to BSON, and pass it to the query processor (written in C++) over STDIN. For performance reasons, we are considering implementing parsing directly in C++ for the most common formats.

      JSON is one of the formats most commonly used by our customers. At the moment, we use an external parser based on xdg-go/jibby. The point of this investigation is to measure the performance of parsing JSON files with simdjson.
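A minimal sketch of how the parse throughput could be measured, assuming the parser under test can be wrapped in a callback that takes a buffer and returns the number of documents it parsed. The names `ParseTiming`, `ParseFn`, `timeParse` and `countNonEmptyLines` are hypothetical and not part of the server or of simdjson. The stand-in parser only counts non-empty lines, which assumes newline-delimited input; in a real measurement it would be replaced by a wrapper around the parser being evaluated.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string_view>

// Hypothetical result of a single timed parse run.
struct ParseTiming {
    std::size_t bytes;      // size of the input buffer
    std::size_t documents;  // number of documents reported by the parser
    double seconds;         // wall-clock time spent in the parser

    // Throughput in megabytes (10^6 bytes) per second; 0 if no time elapsed.
    double mbPerSecond() const {
        return seconds > 0.0 ? (static_cast<double>(bytes) / 1e6) / seconds : 0.0;
    }
};

// Hypothetical parser interface: consumes a buffer, returns the document count.
using ParseFn = std::function<std::size_t(std::string_view)>;

// Runs the parser once over the input and records size, count and elapsed time.
ParseTiming timeParse(std::string_view input, const ParseFn& parse) {
    assert(parse && "a parser callback is required");
    const auto start = std::chrono::steady_clock::now();
    const std::size_t docs = parse(input);
    const auto end = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = end - start;
    return ParseTiming{input.size(), docs, elapsed.count()};
}

// Stand-in for a real parser (assumption: one document per non-empty line).
// It does not validate JSON; it only exercises the timing harness.
std::size_t countNonEmptyLines(std::string_view input) {
    std::size_t count = 0;
    std::size_t pos = 0;
    while (pos < input.size()) {
        std::size_t nl = input.find('\n', pos);
        if (nl == std::string_view::npos) {
            nl = input.size();
        }
        if (nl > pos) {
            ++count;
        }
        pos = nl + 1;
    }
    return count;
}
```

Using a steady clock and keeping file I/O outside the timed region keeps the measurement focused on parsing cost. A fair comparison would also repeat the run several times and report the spread.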

      We will model scanning files directly in the query processor with a new input MQL stage:

      {$collection: {path: <local path>, format: <format>}}

      For this evaluation, we only consider format: 'json'.
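The stage arguments described above could be checked along these lines. This is only an illustrative sketch: `CollectionStageSpec` and `validateCollectionSpec` are hypothetical names, and the actual stage parsing in the query processor may look quite different.

```cpp
#include <optional>
#include <string>

// Hypothetical in-memory form of {$collection: {path: ..., format: ...}}.
struct CollectionStageSpec {
    std::string path;    // local path of the file to scan
    std::string format;  // input format; only "json" is considered here
};

// Returns an error message when the spec is rejected, std::nullopt when accepted.
std::optional<std::string> validateCollectionSpec(const CollectionStageSpec& spec) {
    if (spec.path.empty()) {
        return std::string("$collection requires a non-empty 'path'");
    }
    if (spec.format != "json") {
        return std::string("$collection: unsupported format '") + spec.format +
               "'; only 'json' is considered";
    }
    return std::nullopt;
}
```

Rejecting other formats up front keeps the experiment scoped to JSON and leaves room to add more formats later.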

            Assignee: backlog-query-execution [DO NOT USE] Backlog - Query Execution
            Reporter: pawel.terlecki@mongodb.com Pawel Terlecki (Inactive)
            Votes: 0
            Watchers: 5
