Core Server / SERVER-85324

Change representation of empty arrays in cell blocks

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Execution
    • ALL

      Create a timeseries collection

      db.createCollection( "nT", { timeseries: { timeField: "timestamp", metaField: "metadata" } })
      

      Insert a few documents

      db.nT.insertMany([
      { timestamp: ISODate("2021-05-18T00:00:00.000Z"), _id: 'a_null', a: null },
      { timestamp: ISODate("2021-05-18T00:00:01.000Z"), _id: 'a_object', a: {} },
      { timestamp: ISODate("2021-05-18T00:00:02.000Z"), _id: 'a_empty_array', a: [] },
      { timestamp: ISODate("2021-05-18T00:00:03.000Z"), _id: 'a_nulls_array', a: [null, null] }
      ])
      

      Count the number of documents that match `a: {$eq: null}`

      db.runCommand({count: "nT", query: { a: {$eq: null} }})
      

      On master the above query returns 2 (a: null and a: [null, null]), but when we run it with block TypeMatch (PR with the code) it returns 3 (a: null, a: [null, null] and a: []).

      SBE Plan with block TypeMatch

      slotBasedPlan: {
              slots: '$$RESULT=s11 env: { s6 = Nothing (nothing) }',
              stages: '[4] project [s11 = makeBsonObj(MakeObjSpec([count], [_id, count], Closed, RetNothing), null, true, Nothing, s8)] \n' +
                '[3] project [s10 = null] \n' +
                '[3] group [] [s8 = sum(1)] spillSlots[s9] mergingExprs[sum(s9)] \n' +
                '[2] block_to_row blocks[s3] row[s7] s5 \n' +
                '[2] filter {!(valueBlockNone(s5, true))} \n' +
                '[2] project [s5 = \n' +
                '    let [\n' +
                '        l101.0 = cellBlockGetFlatValuesBlock(s4) \n' +
                '    ] \n' +
                '    in cellFoldValues_F(valueBlockFillEmpty(valueBlockEqScalar(\n' +
                '        let [\n' +
                '            l102.0 = valueBlockFillEmpty(valueBlockTypeMatch(l101.0, 1088), true) \n' +
                '        ] \n' +
                '        in valueBlockCombine(valueBlockNewFill(\n' +
                '            if valueBlockNone(l102.0, true) \n' +
                '            then Nothing \n' +
                '            else null \n' +
                '       , valueBlockSize(l102.0)), move(l101.0), l102.0) \n' +
                '   , null), false), s4) \n' +
                '] \n' +
                '[2] ts_bucket_to_cellblock s1 pathReqs[s3 = ProjectPath(Get(a)/Id), s4 = FilterPath(Get(a)/Traverse/Id)] \n' +
                '[1] scan s1 s2 none none none none none none lowPriority [] @"3fe22bd2-503a-4920-a760-5cf69eecbd00" true false '
      }
      

      SBE Plan with scalar TypeMatch

      slotBasedPlan: {
              slots: '$$RESULT=s10 env: { s6 = Nothing (nothing) }',
              stages: '[4] project [s10 = makeBsonObj(MakeObjSpec([count], [_id, count], Closed, RetNothing), null, true, Nothing, s7)] \n' +
                '[3] project [s9 = null] \n' +
                '[3] group [] [s7 = sum(1)] spillSlots[s8] mergingExprs[sum(s8)] \n' +
                '[2] filter {traverseF(s5, lambda(l101.0) { ((\n' +
                '    if (typeMatch(l101.0, 1088) ?: true) \n' +
                '    then null \n' +
                '    else move(l101.0) \n' +
                '== null) ?: false) }, false)} \n' +
                '[2] block_to_row blocks[s3] row[s5] \n' +
                '[2] ts_bucket_to_cellblock s1 pathReqs[s3 = ProjectPath(Get(a)/Id), s4 = FilterPath(Get(a)/Traverse/Id)] \n' +
                '[1] scan s1 s2 none none none none none none lowPriority [] @"3fe22bd2-503a-4920-a760-5cf69eecbd00" true false '
      }
      

      Query used to get the plans: `db.runCommand({explain: {count: "nT", query: { a: {$eq: null} }}})`


      In block processing, an array is flattened and represented as a CellBlock (walk array code). When the array is empty, there are no values to add, so we just add a single Nothing to the block representation.
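The flattening described above can be sketched in plain JavaScript. This is only an illustrative model, not the server's walk array code: `NOTHING`, `flattenToCellBlock` and `rowOf` are hypothetical names, and the only behavior taken from the description is that an empty array contributes a single Nothing.

```javascript
// Hypothetical sentinel standing in for SBE's Nothing; not a real server API.
const NOTHING = Symbol("Nothing");

// Flatten one value per document into a flat value list plus, for each flat
// value, the index of the row (document) it came from.
function flattenToCellBlock(rows) {
  const flat = [];
  const rowOf = [];
  rows.forEach((value, row) => {
    let items;
    if (!Array.isArray(value)) {
      items = [value];            // scalar or object: one entry
    } else if (value.length > 0) {
      items = value;              // non-empty array: one entry per element
    } else {
      items = [NOTHING];          // assumption from the ticket: empty array -> Nothing
    }
    for (const item of items) {
      flat.push(item);
      rowOf.push(row);
    }
  });
  return { flat, rowOf };
}

// Values of field `a` from the four documents in the repro.
const cell = flattenToCellBlock([null, {}, [], [null, null]]);
console.log(cell.flat.map(v => (v === NOTHING ? "Nothing" : JSON.stringify(v))));
// [ 'null', '{}', 'Nothing', 'null', 'null' ]
console.log(cell.rowOf);
// [ 0, 1, 2, 3, 3 ]
```

In this model the empty array in row 2 cannot be told apart from a missing field once it has been flattened, which is the ambiguity this ticket is about.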

      In typeMatch, when we find an element with tag Nothing we just return Nothing, and it is the caller's responsibility to decide how to handle it. Block typeMatch follows the same semantics.
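A minimal sketch of that "Nothing in, Nothing out" contract, in plain JavaScript rather than SBE. The helper names are hypothetical, and the type check is simplified to "is null" (the plans pass mask 1088, which I am assuming selects null-like BSON types). `fillEmpty` stands in for the `?:` / `valueBlockFillEmpty` step the caller uses to decide what Nothing means.

```javascript
// Hypothetical sentinel standing in for SBE's Nothing; not a real server API.
const NOTHING = Symbol("Nothing");

// Simplified stand-in for scalar typeMatch: Nothing propagates unchanged,
// anything else yields a boolean.
function typeMatchNull(value) {
  if (value === NOTHING) return NOTHING;
  return value === null;
}

// Simplified stand-in for block typeMatch: the same rule applied per element.
function blockTypeMatchNull(values) {
  return values.map(typeMatchNull);
}

// The caller decides what Nothing means by supplying a fallback.
function fillEmpty(value, fallback) {
  return value === NOTHING ? fallback : value;
}

const raw = blockTypeMatchNull([null, {}, NOTHING]);
const filled = raw.map(v => fillEmpty(v, true));
console.log(filled); // [ true, false, true ]
```

With a fallback of `true`, as in both plans above, a Nothing input ends up being treated like null.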

      When we query for documents that have null in a field fieldA (e.g. db.runCommand({count: "collection", query: { fieldA: {$eq: null} }})), we use traverseF with typeMatch as the lambda and then discard all documents for which typeMatch returned false. If fieldA is Nothing, typeMatch does not return false, so the document satisfies the query. If fieldA is an empty array, the scalar version never runs typeMatch, returns false, and the document is filtered out. In block processing, however, an empty array is represented as Nothing; block typeMatch processes it and returns Nothing, and since that is not false the document is not filtered out.

      To test my assumption, I changed the default value used for empty arrays from Nothing to NumberInt32. With that change, the document with the empty array no longer satisfied the query in block typeMatch.
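The difference between the two paths, and the effect of changing the filler value, can be reproduced with a small JavaScript model. It is a sketch under assumptions, not the SBE implementation: `matchesNull`, `scalarCount` and `blockCount` are hypothetical names, the predicate mirrors the shape of the plans above (`typeMatch ?: true`, then compare with null), and the integer `0` stands in for the NumberInt32 filler.

```javascript
// Hypothetical sentinel standing in for SBE's Nothing; not a real server API.
const NOTHING = Symbol("Nothing");

const docs = [
  { _id: "a_null", a: null },
  { _id: "a_object", a: {} },
  { _id: "a_empty_array", a: [] },
  { _id: "a_nulls_array", a: [null, null] },
];

// Per-value predicate modeled on the plans:
// ((typeMatch(v) ?: true) ? null : v) == null
function matchesNull(v) {
  const typeMatch = v === NOTHING ? NOTHING : v === null;
  const isNull = typeMatch === NOTHING ? true : typeMatch;
  return (isNull ? null : v) === null;
}

// Scalar path: traversing an array applies the predicate per element, so an
// empty array yields false without the predicate ever being called.
function scalarCount(documents) {
  return documents.filter(d =>
    Array.isArray(d.a) ? d.a.some(matchesNull) : matchesNull(d.a)
  ).length;
}

// Block path: an empty array is replaced by `filler` before the predicate
// runs, then the per-value results are folded back per document.
function blockCount(documents, filler) {
  return documents.filter(d => {
    const values = Array.isArray(d.a) ? (d.a.length > 0 ? d.a : [filler]) : [d.a];
    return values.some(matchesNull);
  }).length;
}

console.log(scalarCount(docs));         // 2
console.log(blockCount(docs, NOTHING)); // 3
console.log(blockCount(docs, 0));       // 2
```

In this model, any filler that the predicate does not treat as null brings the block count back in line with the scalar count.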

            Assignee:
            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            Reporter:
            foteini.alvanaki@mongodb.com Foteini Alvanaki