Connecting to MongoDB through Python

This article is part of Robert Sheldon's continuing series on MongoDB.

This article is a bit of a departure from the previous articles in this series. Rather than focusing exclusively on MongoDB, I explain how to access MongoDB through the Python programming language. Many data-driven applications rely on MongoDB for their data, including those written in Python, and I think it’s important to understand how data can be accessed and modified from within an application. Covering this topic also helps to round out the series so you have a more complete picture of MongoDB in the real world.

Python is one of multiple languages through which you can access MongoDB data. I chose Python because the language makes it relatively easy to demonstrate different ways to interface with a MongoDB database and collection. This article walks you through the process of creating Python scripts that let you connect to a MongoDB database and carry out create, read, update, and delete (CRUD) operations.

This article is by no means intended to be an exhaustive discussion of how to access MongoDB data in an application. It is meant only to provide you with the basics you need to get started with Python and to give you a general sense of what it takes to work with MongoDB in a programming language. The exact approach you’ll need to take will depend on the particular language you’re working in and the driver used to interface with a MongoDB database.

Throughout the article, I provide multiple examples that demonstrate how to connect to MongoDB Atlas and perform CRUD operations. The examples are based on the PyMongo driver, which you must install on your system if you want to try out the examples for yourself. For details about PyMongo, refer to the MongoDB PyMongo Documentation. This article assumes that you’re already familiar with Python and have it installed on your system. With that in mind, let’s get started.

Note: For the examples in this article, I used the same MongoDB Atlas environment I used for the previous articles in this series. Refer to the first article for details about setting up that environment. The examples also require the hr database to be in place if you want to try them out for yourself.

Connecting to MongoDB through Python

To access MongoDB documents from within an application, you must establish a connection to the target MongoDB server and the specific database. The exact approach depends on the programming language and the driver you use to facilitate communications between the client and the server. As already noted, we’ll be using Python and the PyMongo driver for this article.

All the examples will use the same basic construction to establish a connection with the server and database. Each script will define the connection, connect to the hr database, run a query against the database, and close the connection. The following script shows the necessary language elements to carry out these operations, without the actual query:
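A minimal sketch of such a base script might look like the following. It is reconstructed from the description in this section and is not a verbatim listing. The run_query callable stands in for the <query> placeholder so that the sketch stays valid Python. The run_base_script and main function names and the localhost connection string are assumptions. The pymongo imports sit inside main only so the file can be loaded on a machine where the driver is not yet installed.

```python
def run_base_script(client, run_query):
    """Connect to the hr database, run one query, and close the connection."""
    try:
        db = client.get_database("hr")

        # start query
        run_query(db)  # stands in for the <query> placeholder
        # end query

        client.close()

    except Exception as e:
        raise Exception("The following error occurred: ", e)


def main():
    # Imported here (an assumption of this sketch) so that the module
    # loads even when PyMongo is not installed.
    from pymongo import MongoClient
    from pymongo.server_api import ServerApi

    # Placeholder connection string; substitute your own.
    conn = "mongodb://localhost:27017/"
    client = MongoClient(conn, server_api=ServerApi("1"))

    # Example query: print the names of the collections in hr.
    run_base_script(client, lambda db: print(db.list_collection_names()))
```

Calling main() runs the sketch against a live server. In a flat script, the body of run_base_script would simply sit at the top level, with the query code written in place of the run_query call.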

I’ll go through each command and explain how it works, but first I want to point out the <query> placeholder, sandwiched between the # start query and # end query comments. For each example that follows, you’ll simply switch out the placeholder for the query itself.

The first two commands in the script import the MongoClient class from the pymongo package and the ServerApi class from its server_api submodule, making it possible to establish a connection to a MongoDB server:

The second import statement might not be necessary, depending on how you define the MongoClient object. I’ll discuss this in more detail shortly. But first, take a look at the next command, which defines a connection string and assigns it to the conn variable:

You can use whatever variable name you like. A lot of documentation uses uri because you’re defining a URI connection to the MongoDB server.

The connection string itself depends on the target MongoDB server. For MongoDB Atlas, the simplest way to define the connection string is to copy the one available through the Atlas interface, as explained in the first article in this series.

Note, however, that the first article walks you through the process of retrieving the connection string for MongoDB Compass. For this article, you should select the connection string specific to Python, inserting your password as appropriate. To find the connection string, navigate to your cluster in Atlas and click the Connect button. In the Connect to dialog box, click Drivers and then select Python from the Driver drop-down list. You’ll find the connection string under Step 3.

If you’re connecting to MongoDB on a local machine, you should be able to use the following connection string when assigning the conn variable:
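For a default local installation listening on port 27017, that string is typically mongodb://localhost:27017/. The sketch below shows it next to an Atlas-style string for comparison. The user name, password, and host in the Atlas string are made-up placeholders, and the conn_local and conn_atlas variable names are assumptions. The quote_plus call reflects the PyMongo guidance that reserved characters in a user name or password should be percent-escaped.

```python
from urllib.parse import quote_plus

# Local MongoDB instance listening on the default port.
conn_local = "mongodb://localhost:27017/"

# Atlas-style string. The user, password, and host are hypothetical;
# copy the real values from the Connect dialog in Atlas.
user = quote_plus("app_user")
password = quote_plus("p@ss/word")  # reserved characters get escaped
host = "cluster0.example.mongodb.net"
conn_atlas = (
    f"mongodb+srv://{user}:{password}@{host}/?retryWrites=true&w=majority"
)

print(conn_local)
print(conn_atlas)
```

Whichever string applies to your environment is the one you would assign to the conn variable.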

After you define your connection string, you must initialize a new instance of the MongoClient class. You can find an example of how to instantiate the class in the same place in Atlas where you found the connection string. (To do so, you must view the full code sample, rather than just the connection string.) The instantiation command takes the following form:

The command creates a MongoClient object and assigns it to the client variable. To create the object, the code uses the MongoClient constructor and passes in two arguments to the constructor. The first argument is the conn variable, which provides the constructor with the necessary connection information.

The second argument, server_api, is optional. It specifies which server API version to use when creating the MongoClient object. This is important if you want to ensure that the driver always conforms to a specific API version. To specify the version, you must use the ServerApi constructor to create a ServerApi object. According to MongoDB documentation and the Atlas site itself, you should use 1 as the constructor’s argument when connecting to Atlas.

If you’re not concerned about enforcing a specific API version, you need only include the conn variable as the MongoClient constructor’s argument:

When you take this approach, you can omit the second import statement at the beginning of your script. The statement imports the ServerApi class, which is not needed if it’s not referenced as an argument in the MongoClient constructor.
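The two ways of calling the constructor can be sketched side by side. The make_client function and its pin_api flag are hypothetical names used only for illustration, and the imports are placed inside the function so that ServerApi is imported only when it is actually used.

```python
def make_client(conn, pin_api=True):
    """Return a MongoClient, optionally pinned to Stable API version 1."""
    from pymongo import MongoClient

    if pin_api:
        # Only this branch needs the ServerApi class.
        from pymongo.server_api import ServerApi
        return MongoClient(conn, server_api=ServerApi("1"))

    # Without server_api, the connection string is the only argument.
    return MongoClient(conn)
```

For example, make_client(conn) mirrors the two-argument form, and make_client(conn, pin_api=False) mirrors the shorter one.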

The remaining components in the base code include a try block that contains the main code, followed by an except block that handles any errors.

The first command in the try block uses the client variable to call the get_database method available to the MongoClient object:

The method’s only argument is the name of the target database, which in this case is hr. The database object returned by the method is saved to the db variable. You can use the variable to access methods available to the database object when working with the hr database.

The try block then includes the comments and <query> placeholder mentioned earlier. Next comes the following command, which specifies the close method to end the client connection:

The except block catches any exceptions and saves them to the e variable, which is then used in the error message passed to the Exception constructor:

That’s all there is to creating the basic shell we need for the rest of the article. As already noted, you’ll be replacing the <query> placeholder with the code in the following examples. The base elements will remain the same throughout.

Creating a collection in a MongoDB database through Python

Now that we have our base code in place, let’s look at how to work with the target database. The following example creates a collection in the hr database and then retrieves a list of the collections defined in the database:
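A sketch of that code might look like the following. It is wrapped in a hypothetical create_personnel_collection function that receives the db object from the base code. In a flat script, the three statements in the body would take the place of the <query> placeholder.

```python
def create_personnel_collection(db):
    """Create the personnel collection, then list every collection in db."""
    db.create_collection("personnel")

    collections = db.list_collection_names()
    for collection in collections:
        print(collection)

    return collections
```

Note that PyMongo's create_collection raises a CollectionInvalid error by default if a collection with that name already exists, so running the code a second time fails unless the collection is dropped first.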

The first command uses the create_collection method available to the database object to create the personnel collection in the hr database. You can call the method through the db variable, passing in the collection name as an argument to the method.

The second command uses the list_collection_names method available to the database object to retrieve the names of the database’s current collections. The command saves the list of collections to the collections variable.

The third command is a for statement that iterates through the collections list. For each iteration, the current collection name is saved to the collection variable. The print command then prints the variable’s value to the screen.

When you replace the <query> placeholder in the base code with the above code, it should look like the following script:

If you save this script to a text file (with a .py extension), you should be able to run it at a command prompt. If you’re not sure how to run a Python script, refer to the Python documentation. The process is usually fairly straightforward once you’ve installed Python. Your shell command might take a form similar to the following:

When you run the script, it should return a list of the existing collections, which will be specific to your MongoDB instance. At the very least, the results should include the newly created personnel collection.

When I first tried to run this script on my Mac computer, I received a certificate verification error. I was able to resolve the issue by running the following command file, which was available in my Python installation folder:

If you run into this issue when connecting to Atlas, you might be able to take a similar approach to resolve it. I did not run into this issue when trying to connect to a local instance of MongoDB.

Adding documents to a MongoDB database through Python

Now that we’ve gotten the basics out of the way, we can move on to using Python to carry out standard CRUD operations, starting with how to insert data. In the following example, the code adds a single document to the personnel collection and then prints the results returned by MongoDB:
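A sketch of such an insert might look like the following. The document's field values are made up for illustration, and the add_one_document function name is hypothetical. The function expects the db object created in the base code.

```python
# Illustrative document; the field values are assumptions.
doc = {
    "_id": 101,
    "name": "Sample Employee",
    "dept": "R&D",
    "position": "Analyst",
}


def add_one_document(db, doc):
    """Insert one document into personnel and print what MongoDB returned."""
    personnel = db.get_collection("personnel")
    results = personnel.insert_one(doc)
    print(results)
    return results
```

The object returned by insert_one exposes the new document's identifier through its inserted_id attribute, which is often more useful to print than the object itself.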

The first command uses the get_collection method available to the db database object to create a collection object for the personnel collection. The command then assigns the object to the personnel variable. You can then use the variable to run methods available to the collection object.

The second command defines the document to be inserted into the collection and assigns it to the doc variable. This is a straightforward document definition just like you’ve seen in previous articles. In this case, I’ve kept the document very simple, but you can define any type of document.

The next command uses the insert_one method available to the collection object to add the document to the collection. The doc variable is passed in as an argument to the method. When the method is executed, the document is added to the collection. The results returned by MongoDB are assigned to the results variable.

The final command prints the information that was assigned to the results variable. When you run the script, it should return the following results:

There might be times when you want to add multiple documents to a collection in a single operation. For the most part, you use the same format as with a single document, with a couple of notable differences, as shown in the following example:
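A sketch of a multi-document insert might look like the following. The nine documents are generated rather than written out by hand, and their names, departments, and positions are placeholders chosen only for illustration. Only the _id range of 102 through 110 comes from the text. The add_many_documents function name is hypothetical.

```python
# Placeholder departments, cycled across the generated documents.
departments = ["R&D", "Marketing", "Finance"]

# Nine illustrative documents with _id values 102 through 110.
docs = [
    {
        "_id": 102 + i,
        "name": f"Employee {102 + i}",
        "dept": departments[i % len(departments)],
        "position": "Analyst",
    }
    for i in range(9)
]


def add_many_documents(db, docs):
    """Insert a list of documents into personnel in one operation."""
    personnel = db.get_collection("personnel")
    results = personnel.insert_many(docs)
    print(results.inserted_ids)
    return results
```

The list passed to insert_many can just as well be written as a literal, with each document in curly brackets and commas between them.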

As before, you start by assigning the collection object to the personnel variable. Next, you define an array that contains the document definitions. In this case, there are nine documents whose _id values range from 102 through 110. Because it is an array, the documents are enclosed in square brackets, with commas separating the individual documents. The array is then assigned to the docs variable.

The next command uses the insert_many method available to the collection object to add the documents to the collection, referencing the documents through the docs variable. The command then saves the results returned by MongoDB to the results variable.

As before, the final step prints the information that was assigned to the results variable. The command should return the following results:

The nine documents have now been added to the personnel collection, which already contained the one added in the previous example, giving you a total of 10 documents.

Retrieving documents from a MongoDB database through Python

The process of retrieving documents is just as straightforward as adding them. You can retrieve a single document or multiple documents, using the same type of query expressions you’ve seen in earlier articles. For example, the following code retrieves a document with an _id value of 102 from the personnel collection:
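A sketch of such a lookup might look like the following. The find_one_document function name is hypothetical, and the function expects the db object from the base code.

```python
# Match the single document whose _id is 102.
query = {"_id": 102}


def find_one_document(db, query):
    """Return the first personnel document that matches query."""
    personnel = db.get_collection("personnel")
    results = personnel.find_one(query)
    print(results)
    return results
```

If no document matches, find_one returns None rather than raising an error, so the printed value is worth checking.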

After assigning the collection object to the personnel variable, the code defines a simple query that limits the results to documents with an _id value of 102. Because the query specifies the _id field, you know that the value 102 is unique and that there will be only one document. The query is then assigned to the query variable.

The next command uses the find_one method available to the collection object to retrieve the target document, using the query assigned to the query variable. The command also saves the results returned by MongoDB to the results variable. The last command prints the information in the results variable. When you run the script, MongoDB should return the following document, which is the one that matches the search criteria:

The find_one method returns only the first document returned by the query, even if multiple documents satisfy the search condition. However, you can return all documents by using the find method instead of find_one, as in the following example:
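A sketch of a multi-document query might look like the following. The find_documents function name is hypothetical, and collecting the documents into a list is an addition of this sketch that makes the results easy to reuse after the loop.

```python
# Match every document in the Marketing department.
query = {"dept": "Marketing"}


def find_documents(db, query):
    """Print and return every personnel document that matches query."""
    personnel = db.get_collection("personnel")
    results = personnel.find(query)

    found = []
    for result in results:
        print(result)
        found.append(result)
    return found
```

The find method returns a cursor rather than a list, which is why the documents are read by iterating over it.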

The query now specifies that any returned documents must have a dept value of Marketing. In addition, the code uses the find method to return all matching documents, again saving the returned documents to the results variable.

Next, the code includes a for statement that iterates through the documents in the results variable. For each iteration, the statement assigns the current document to the result variable and prints the variable’s value to the screen, giving us the following results:

You can also define aggregations in your queries, similar to what you saw in earlier articles. For this, you should first define a pipeline and assign it to a variable, as in the following example:
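A sketch of such an aggregation might look like the following. The count output field name and the count_by_department function name are assumptions. The pipeline matches two departments with the $in operator and then counts the documents in each group with $sum.

```python
pipeline = [
    # Keep only documents in the R&D or Marketing departments.
    {"$match": {"dept": {"$in": ["R&D", "Marketing"]}}},
    # Group by department and count the documents in each group.
    {"$group": {"_id": "$dept", "count": {"$sum": 1}}},
]


def count_by_department(db, pipeline):
    """Run pipeline against personnel, printing and returning each group."""
    personnel = db.get_collection("personnel")
    results = personnel.aggregate(pipeline)

    groups = []
    for result in results:
        print(result)
        groups.append(result)
    return groups
```

Because the pipeline is an ordinary Python list of dictionaries, stages can be added, removed, or reordered without touching the function.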

The pipeline uses the $match stage to return only those documents with a dept value of R&D or Marketing. The $group stage then groups the results by the dept field and returns a count of the number of documents in each group. The entire pipeline definition is assigned to the pipeline variable.

The next command uses the aggregate method available to the collection object to aggregate the data, using the pipeline definition assigned to the pipeline variable. The command also saves the returned data to the results variable. All this is followed by a for loop that iterates through the results and returns the following information:

The results include the group name and the number of documents in each group, which translates to the number of employees in each department, as reflected by the personnel collection. You can, of course, define much more complex pipelines, just like you can define more complex queries when working with the find_one and find methods.

Updating documents in a MongoDB database through Python

You can also use Python to modify your documents, updating either one document or multiple documents in a single operation. When you update the documents, you must specify a query expression that determines which documents to update and then an update expression that defines how to update them.

For example, the following code updates the document with an _id value of 104, changing the value of the position field to Developer:
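A sketch of such an update might look like the following. The update_one_document function name is hypothetical. The sketch prints the modified_count attribute of the returned object, which is one way to see how many documents were changed.

```python
# Which document to update, and how to update it.
query = {"_id": 104}
update = {"$set": {"position": "Developer"}}


def update_one_document(db, query, update):
    """Apply update to the first personnel document that matches query."""
    personnel = db.get_collection("personnel")
    results = personnel.update_one(query, update)
    print(results.modified_count)
    return results
```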

The query portion is similar to what you saw when retrieving documents. The expression specifies that the target document must have an _id value of 104. The expression is then assigned to the query variable. Next, the update expression uses the $set operator to set the position value to Developer. This expression is assigned to the update variable.

The fourth command uses the update_one method available to the collection object to perform the actual update. The method takes two arguments: the query expression and the update expression, passed in through the query and update variables. The command then saves the results returned by MongoDB to the results variable, which is then printed to the screen.

When you run this code (along with the code base), you should receive a message indicating that one document has been updated.

You can just as easily update multiple documents. Again, you must first define the query expression and then the update expression. This time around, however, you must use the update_many method, rather than the update_one method, as in the following example:
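A sketch of a multi-document update might look like the following. The rename_department function name and its parameters are hypothetical, chosen so the same code can rename any department.

```python
def rename_department(db, old_name, new_name):
    """Change the dept value on every personnel document that has old_name."""
    personnel = db.get_collection("personnel")

    query = {"dept": old_name}
    update = {"$set": {"dept": new_name}}

    results = personnel.update_many(query, update)
    print(results.modified_count)
    return results
```

Calling rename_department(db, "R&D", "Dev") would correspond to the change described here.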

In this case, the command updates those documents whose dept value is R&D, changing the value to Dev. The returned message should indicate that four documents were updated.

Deleting documents from a MongoDB database through Python

You can delete documents just as easily as updating them, whether one document or many. In fact, it is even easier to delete them because you don’t have to specify an update expression. You need only define your query and specify the applicable collection method, as in the following example:
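A sketch of a single-document delete might look like the following. The delete_one_document function name is hypothetical. The sketch prints the deleted_count attribute of the returned object to show how many documents were removed.

```python
# Match the single document whose _id is 110.
query = {"_id": 110}


def delete_one_document(db, query):
    """Delete the first personnel document that matches query."""
    personnel = db.get_collection("personnel")
    results = personnel.delete_one(query)
    print(results.deleted_count)
    return results
```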

The query expression specifies that the document’s _id value must be 110. The next command uses the delete_one method available to the collection object to delete the document, based on the query specified in the query variable. The results are then printed to the screen. They should indicate that one document has been deleted.

To delete multiple documents, you must use the delete_many method rather than delete_one. For example, the following code uses the delete_many method to delete all documents with a dept value of Marketing:
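A sketch of a multi-document delete might look like the following. The delete_department function name and its dept parameter are hypothetical.

```python
def delete_department(db, dept):
    """Delete every personnel document whose dept value equals dept."""
    personnel = db.get_collection("personnel")

    query = {"dept": dept}
    results = personnel.delete_many(query)

    print(results.deleted_count)
    return results
```

Calling delete_department(db, "Marketing") would correspond to the delete described here.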

The results returned by MongoDB should indicate that three documents have been deleted.

It is possible to delete all documents in a collection by defining the query expression as an empty document (only the curly brackets), as in the following example.
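A sketch of that operation might look like the following. The delete_all_documents function name and the confirm flag are assumptions of this sketch, not part of PyMongo. The flag is there only to make an accidental call fail before anything is removed.

```python
def delete_all_documents(db, confirm=False):
    """Delete every document in personnel, but only when confirm is True."""
    if not confirm:
        raise ValueError("Pass confirm=True to empty the collection")

    personnel = db.get_collection("personnel")

    # An empty filter document matches every document in the collection.
    results = personnel.delete_many({})

    print(results.deleted_count)
    return results
```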

Be extremely cautious using this approach. Wiping out an entire collection at once is seldom advisable outside a test environment.

Getting started with Python and MongoDB

Python is an extremely popular programming language that is widely used in data-driven applications that rely on MongoDB. In this article, I’ve tried to give you a foundation in how to use Python to perform CRUD operations against a MongoDB collection. However, this article is only an introduction, and there is much more you can do in Python when interfacing with MongoDB collections.

The examples are also specific to the Python language and the PyMongo driver. Although the basic principles for accessing MongoDB data are generally the same across programming languages and drivers, there are distinct differences between them, and you should refer to the documentation specific to the tools you’re using. That said, I hope this article has at least provided you with a better understanding of how to access a MongoDB database from within your data-driven applications, even if you’re not working with Python.

About the author

Robert Sheldon

Robert is a freelance technology writer based in the Pacific Northwest. He’s worked as a technical consultant and has written hundreds of articles about technology for both print and online publications, with topics ranging from predictive analytics to 5D storage to the dark web. He’s also contributed to over a dozen books on technology, developed courseware for Microsoft’s training program, and served as a developmental editor on Microsoft certification exams. When not writing about technology, he’s working on a novel or venturing out into the spectacular Northwest woods.