10

How do I bulk update/insert in MongoDB with pymongo/pandas? The error I get is "batch op errors occurred". The reason, I believe, is that I set the "_id" explicitly, which I want to do. The code runs fine on the first run, but fails on the second. I want to keep pandas in the workflow. The data also contains datetime objects.

The syntax for an update with upsert=True is completely different. An efficient update-based solution would be helpful, one where "_id" or "qid" can be set. But there are also Python datetime objects to deal with!

import json
import pandas as pd

InSQL  = 'SELECT * FROM database2.table2'
sqlOut = pd.read_sql(InSQL, cxn)
sqlOut['_id'] = "20170101" + ":" + sqlOut['Var']   # build a custom _id

dfOut = sqlOut.to_json(orient='records', date_format='iso')
try:
    db["test"].insert_many(json.loads(dfOut))
except Exception as e:
    print(e)
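
A minimal sketch of how the second-run failure can be inspected (assuming the same db and dfOut as above; pymongo raises BulkWriteError when duplicate _id values are inserted):

from pymongo.errors import BulkWriteError

try:
    db["test"].insert_many(json.loads(dfOut))
except BulkWriteError as bwe:
    # details['writeErrors'] lists each duplicate-key (code 11000) failure
    print(bwe.details['writeErrors'])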

I offered a 50-point bounty, which expired with no answer. Hmm...

  • MongoDB takes string-based input and does not allow inserting a local variable directly (operators only). Check this: with json.loads(dfOut) you can't insert a local variable as an entry! Use a primary key like YYYYmmDDHHMMSS + counter value. Commented Mar 1, 2017 at 7:16
  • You can dig into the BulkWriteError details and get a better idea of what's going on. It may be that your personally defined ids are duplicated or violate the 12-byte limit. See stackoverflow.com/questions/30355790/mongodb-bulk-write-error for details. Commented Mar 3, 2017 at 15:09
  • The fact that the _id worked the first time tells me that the 12-byte limit is not the concern. Yes, the _id or qid values are dups and need an update. Commented Mar 3, 2017 at 17:15
  • It sounds like your goal is to do an upsert but the current sample code is doing an insert (so it is expected that the successive runs will fail with duplicate _id keys). Can you update with a short example of the dfOut content and the upsert code you tried? Also, what version of PyMongo are you using? I suspect you actually want to use update_many() with the upsert option set. Can you also clarify the concerns on Python datetime objects and _id versus qid? Do you want to use the qid as the _id? Commented Apr 25, 2017 at 22:18

2 Answers

8

You get an error because, on the second and subsequent insert_many calls, you try to insert documents whose fields conflict with those of existing documents. You correctly inferred that it is due to setting _id explicitly, which then conflicts with the existing _id values in the collection.

MongoDB automatically creates a unique index on _id, which forbids duplicate values.

You need to update or replace your documents on calls after the first one (which inserted the documents in their first version). There is indeed a concept of "upsert", which inserts documents that do not yet exist in the collection and updates the existing ones.

Your options:

  • Most efficient: pymongo.collection.Collection.bulk_write

    import json
    import pymongo

    # one ReplaceOne (upsert) operation per document
    operations = [pymongo.operations.ReplaceOne(
        filter={"_id": doc["_id"]},
        replacement=doc,
        upsert=True
    ) for doc in json.loads(dfOut)]

    result = db["test"].bulk_write(operations)
    # handle results
    

Note that its efficiency also depends on whether the filtered field is indexed in the collection, which is incidentally the case for _id. (Also see pymongo.operations.ReplaceOne.)
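
To make the "# handle results" step concrete, a hedged sketch of inspecting the BulkWriteResult returned above (the counters come from pymongo's results API):

    # documents matched by the filters, modified in place, and newly upserted
    print(result.matched_count, result.modified_count, result.upserted_count)
    # mapping of operation index -> _id for documents that were upserted
    print(result.upserted_ids)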

Note: pymongo.collection.Collection.update_many seems unsuitable for your needs since you are not trying to set the same value on all matches of a given filter.
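
If bulk efficiency is not critical, a simpler per-document alternative is replace_one with upsert=True (a sketch, assuming the same db and dfOut as in the question):

    for doc in json.loads(dfOut):
        # replace the existing document with the same _id, or insert it if missing
        db["test"].replace_one({"_id": doc["_id"]}, doc, upsert=True)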



2

The batch op error is probably caused by duplicate _id values, so delete the documents with the same _id already in Mongo before inserting.

Or use update_many https://api.mongodb.com/python/current/api/pymongo/collection.html?highlight=update#pymongo.collection.Collection.update_many

https://docs.mongodb.com/manual/reference/method/db.collection.updateMany/
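
A sketch of the delete-then-insert approach (assuming the same db and dfOut as in the question, and deleting only the _id values about to be re-inserted):

    docs = json.loads(dfOut)
    # remove any documents whose _id is about to be re-inserted
    db["test"].delete_many({"_id": {"$in": [doc["_id"] for doc in docs]}})
    db["test"].insert_many(docs)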

