- Type: Bug
- Resolution: Cannot Reproduce
- Priority: Major - P3
- Affects Version/s: 2.2.0
- Component/s: Writes
- Environment: Ubuntu 16.04
Spark 2.2.0 with standalone deployment
mongod --version
db version v3.4.7
git version: cf38c1b8a0a8dca4a11737581beafef4fe120bcd
OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
allocator: tcmalloc
modules: none
build environment:
distmod: ubuntu1604
distarch: x86_64
target_arch: x86_64
MongoDB 3.2 also suffers from this issue.
0. run this command:
```
./bin/spark-submit \
--master "spark://YOUR_HOST_NAME:7077" \
--deploy-mode client \
--executor-memory 3G \
--num-executors 2 \
--conf "spark.mongodb.input.uri=mongodb://127.0.0.1:27017/test" \
--conf "spark.mongodb.output.uri=mongodb://127.0.0.1:27017/test" \
--packages org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 \
/YOUR/PATH/TO/THIS/joinErr.py
```
1. insert some data into the test db:
```
db.calc.drop()
db.calc.insert([
{devId: "001", prdId: "product1", size: 10},
{devId: "002", prdId: "product2", size: 20}])
db.calc.find()
{ "_id" : ObjectId("59f284a8a22c47422c2059d2"), "devId" : "001", "prdId" : "product1", "size" : 10 }
{ "_id" : ObjectId("59f284a8a22c47422c2059d3"), "devId" : "002", "prdId" : "product2", "size" : 20 }
```
2. run the command from step 0; its output is shown below:
```
==> collect sumDF: [Row(devId=u'001', prdId=u'product1', size=10), Row(devId=u'002', prdId=u'product2', size=20)]
==> collect dpDF: [Row(devId=u'001', prdId=u'product1', size=1), Row(devId=u'002', prdId=u'product2', size=2)]
==> final sumDF.collect: [Row(devId=u'002', prdId=u'product2', size=22), Row(devId=u'001', prdId=u'product1', size=11)]
```
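joinErr.py itself is not attached, but from the rows logged above its computation amounts to joining sumDF and dpDF on (devId, prdId) and adding their size fields. A plain-Python sketch of that arithmetic (the function name `join_sum` and the dict layout are illustrative, not taken from the script):

```python
# Rows as reported in the "collect" output above.
sum_rows = [
    {"devId": "001", "prdId": "product1", "size": 10},
    {"devId": "002", "prdId": "product2", "size": 20},
]
dp_rows = [
    {"devId": "001", "prdId": "product1", "size": 1},
    {"devId": "002", "prdId": "product2", "size": 2},
]

def join_sum(left, right):
    # Index the right-hand rows by the join key (devId, prdId).
    by_key = {(r["devId"], r["prdId"]): r["size"] for r in right}
    # For each left-hand row, add the size of the matching right-hand row.
    return [
        {"devId": r["devId"], "prdId": r["prdId"],
         "size": r["size"] + by_key[(r["devId"], r["prdId"])]}
        for r in left
    ]

final = join_sum(sum_rows, dp_rows)
# sizes become 11 and 22, matching the "final sumDF.collect" output above
```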
Yep, the final sumDF is what I wanted. But look in the mongo shell:
```
db.calc.find()
```
it seems that the written sumDF has been replaced by dpDF in the collection.
3. if sumDF is created by hand or read through PyMongo, it works as intended, as test-case2 and test-case3 indicate.