I'm using the MongoSpark Scala connector to load a collection, process it in Spark, and then update the same collection with the result.

I have a dataframe with the structure:

root
 |-- _id: struct (nullable = true)
 |    |-- oid: string (nullable = true)
 |-- provider: string (nullable = true)
 |-- solution: string (nullable = true)
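For reference, the DataFrame is loaded roughly like this (the ReadConfig construction below is simplified and written from memory, so take the exact keys as an approximation of my setup rather than a verbatim copy):

import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("solutions-update").getOrCreate()

// Read the source collection; df_update is produced from this after processing.
val readConfig = ReadConfig(Map(
  "uri" -> "mongodb://127.0.0.1/",
  "database" -> "test",
  "collection" -> "solutions"
))

val df = MongoSpark.load(spark, readConfig)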

_id and provider are the shard keys in Mongo, but when I run the update with:

MongoSpark
  .write(df_update)
  .option("uri", "mongodb://127.0.0.1/")
  .option("database", "test")
  .option("collection", "solutions")
  .option("replaceDocument", "false")
  .mode("append")
  .save()

I get:

Write errors: [BulkWriteError{index=0, code=61, message='upsert { q: { _id: ObjectId('57ebd3d227e9c712d83737c9') }, u: { $set: { provider: "someProvider", solution: "blahblahblah" } }, upsert: true } does not contain shard key for pattern { provider: 1.0 }', details={ }}]

It looks like MongoSpark only puts _id in the update query and does not include provider, which is why I get the shard key error.

Is there a way to force MongoSpark to use certain columns (X) for the query and the remaining columns (Y) for the update? See the sketch below for what I mean.
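To make the question concrete, this is roughly the behaviour I'd like the connector to produce, sketched here with the plain MongoDB Java driver inside foreachPartition (the driver calls and row accessors are my idea of a manual workaround, not anything MongoSpark exposes):

import com.mongodb.MongoClient
import com.mongodb.client.model.{Filters, Updates, UpdateOptions}
import org.apache.spark.sql.Row
import org.bson.types.ObjectId

df_update.foreachPartition { rows: Iterator[Row] =>
  val client = new MongoClient("127.0.0.1")
  val coll = client.getDatabase("test").getCollection("solutions")

  rows.foreach { row =>
    val id = new ObjectId(row.getAs[Row]("_id").getAs[String]("oid"))
    val provider = row.getAs[String]("provider")
    val solution = row.getAs[String]("solution")

    coll.updateOne(
      // query on both _id and the shard key column...
      Filters.and(Filters.eq("_id", id), Filters.eq("provider", provider)),
      // ...but only set the remaining column(s)
      Updates.set("solution", solution),
      new UpdateOptions().upsert(true)
    )
  }
  client.close()
}

I'd rather not drop down to the driver like this if MongoSpark can be configured to do it.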

I'm using MongoDB 3.4.7 and Spark 2.2.0.

Thanks!