I am using the Spark connector to insert documents into a MongoDB sharded cluster, with the configuration below. The collection has a unique index on the shard key, and I am trying to capture error logs when duplicate documents are inserted.
In the mongod logs the shard key field values are redacted, so I cannot tell which document is being duplicated, and the Spark connector itself gives no visibility into these write errors.
Spark Connector code:
import java.io.File
import com.typesafe.config.ConfigFactory
import com.mongodb.spark.MongoSpark

object Write2Mongo {
  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      System.err.println("Usage: Write2Mongo <confFile> <stgDir>")
      sys.exit(1)
    }
    val confFile = args(0)
    val stgDir   = args(1)

    /**************************************************
     * Build Spark context
     **************************************************/
    val config = ConfigFactory.parseFile(new File(confFile))
    val spark  = MongoContext.buildSparkContext(config)

    // Read the staged parquet, convert each row's raw bytes to BSON, and save
    val mongoRdd = spark.read.parquet(stgDir).rdd
      .map(x => convertBytes2Bson(x.getAs[Array[Byte]]("document")))
    MongoSpark.save(mongoRdd)
  }
}
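Since `MongoSpark.save` does not surface per-document write errors, one workaround is to bypass it for the write step and use the underlying MongoDB Java driver directly inside `foreachPartition`, catching `MongoBulkWriteException` and logging the duplicate-key errors (server error code 11000) yourself. The sketch below assumes the Java driver is on the classpath; `mongoUri`, `dbName`, `collName`, and the batch size of 1000 are placeholders for your deployment, and the RDD element type should match whatever `convertBytes2Bson` produces:

```scala
import com.mongodb.{MongoBulkWriteException, MongoClient, MongoClientURI}
import com.mongodb.client.model.InsertManyOptions
import org.apache.spark.rdd.RDD
import org.bson.Document
import scala.collection.JavaConverters._

object DuplicateAwareWriter {
  def saveWithDuplicateLogging(rdd: RDD[Document],
                               mongoUri: String,   // placeholder connection string
                               dbName: String,     // placeholder database name
                               collName: String): Unit = {  // placeholder collection name
    rdd.foreachPartition { docs =>
      val client = new MongoClient(new MongoClientURI(mongoUri))
      try {
        val coll = client.getDatabase(dbName).getCollection(collName)
        docs.grouped(1000).foreach { batch =>
          try {
            // ordered = false: keep inserting the rest of the batch
            // even after a duplicate-key failure
            coll.insertMany(batch.asJava, new InsertManyOptions().ordered(false))
          } catch {
            case e: MongoBulkWriteException =>
              // 11000 = duplicate key; the error message includes the
              // offending index and key values, unlike the redacted mongod log
              e.getWriteErrors.asScala
                .filter(_.getCode == 11000)
                .foreach(err => println(s"Duplicate document: ${err.getMessage}"))
          }
        }
      } finally {
        client.close()
      }
    }
  }
}
```

You would then call `DuplicateAwareWriter.saveWithDuplicateLogging(mongoRdd, ...)` in place of `MongoSpark.save(mongoRdd)`. The `ordered(false)` option matters: with an ordered bulk write, the first duplicate aborts the rest of the batch, whereas an unordered write attempts every document and reports all duplicate-key errors in one exception.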