
Manage Python Multiprocessing With Mongodb

I'm trying to run my code with a multiprocessing function, but Mongo keeps returning 'MongoClient opened before fork. Create MongoClient with connect=False, or create client afte

Solution 1:

db.authenticate has to talk to the Mongo server, so it will try to open a connection. Even with connect=False, db.authenticate still requires an open connection. Why don't you create the MongoClient instance after the fork? That looks like the easiest solution.

Solution 2:

Since db.authenticate must open the MongoClient and connect to the server, it creates connections which won't work in the forked subprocess. Hence, the error message. Try this instead:

db = MongoClient('mongodb://user:password@localhost', connect=False).database

Also, delete the Lock l. Acquiring a lock in one subprocess has no effect on other subprocesses.

Solution 3:

Here is how I did it for my problem:

import pathos.pools as pp
import time
import db_access

class MultiprocessingTest(object):

    def __init__(self):
        pass

    def test_mp(self):
        # Each item is [form, form_number, client_id]; the last two are placeholder strings.
        data = [[form, 'form_number', 'client_id'] for form in range(5000)]

        pool = pp.ProcessPool(4)
        pool.map(db_access.insertData, data)

if __name__ == '__main__':

    time_i = time.time()

    mp = MultiprocessingTest()
    mp.test_mp()

    time_f = time.time()

    print('Time Taken:', time_f - time_i)

Here is db_access.py:

from pymongo import MongoClient

def insertData(form):
    client = MongoClient()
    db = client['TEST_001']
    # insert_one replaces Collection.insert, which was removed in PyMongo 4
    db.initialization.insert_one({
        "form": form[0],
        "form_number": form[1],
        "client_id": form[2]
    })

This happens because your code creates a single MongoClient that is shared by all the subprocesses. MongoClient is not fork-safe, so creating the client inside the function that each subprocess runs avoids the problem. Let me know if there are other solutions.
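One possible refinement: building a new MongoClient on every call opens a fresh connection pool each time, which adds up over 5000 inserts. A per-process cache keyed on the PID keeps one client per worker and still rebuilds it after a fork. This is a sketch under assumptions, not part of the original answer; get_client and its factory parameter are hypothetical names.

```python
import os

_client = None      # client cached for the current process
_client_pid = None  # PID of the process that built _client


def get_client(factory):
    """Return a client owned by the current process.

    `factory` is any zero-argument callable that builds a client, for
    example the MongoClient class itself. A client is built on first use
    and again whenever the PID differs from the one that built it, which
    is the case in a freshly forked child.
    """
    global _client, _client_pid
    pid = os.getpid()
    if _client is None or _client_pid != pid:
        _client = factory()
        _client_pid = pid
    return _client
```

Inside insertData, client = MongoClient() could then become client = get_client(MongoClient), so each worker reuses its own client across calls.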
