This article is part of Robert Sheldon's continuing series on MongoDB. To see all of the items in the series, click here.
This article is a bit of a departure from the previous articles in this series. Rather than focusing exclusively on MongoDB, I explain how to access MongoDB through the Python programming language. Many data-driven applications rely on MongoDB for their data, including those written in Python, and I think it’s important to understand how data can be accessed and modified from within an application. Covering this topic also helps to round out the series so you have a more complete picture of MongoDB in the real world.
Python is one of multiple languages through which you can access MongoDB data. I chose Python because the language makes it relatively easy to demonstrate different ways to interface with a MongoDB database and collection. This article walks you through the process of creating Python scripts that let you connect to a MongoDB database and carry out create, read, update, and delete (CRUD) operations.
This article is by no means intended to be an exhaustive discussion of how to access MongoDB data in an application. It is meant only to provide you with the basics you need to get started with Python and to give you a general sense of what it takes to work with MongoDB in a programming language. The exact approach you’ll need to take will depend on the particular language you’re working in and the driver used to interface with a MongoDB database.
Throughout the article, I provide multiple examples that demonstrate how to connect to MongoDB Atlas and perform CRUD operations. The examples are based on the PyMongo driver, which you must install on your system if you want to try out the examples for yourself. For details about PyMongo, refer to the MongoDB PyMongo Documentation. This article assumes that you’re already familiar with Python and have it installed on your system. With that in mind, let’s get started.
Note: For the examples in this article, I used the same MongoDB Atlas environment I used for the previous articles in this series. Refer to the first article for details about setting up this environment. The examples also require the hr database to be in place if you want to try them out for yourself.
Connecting to MongoDB through Python
To access MongoDB documents from within an application, you must establish a connection to the target MongoDB server and the specific database. The exact approach depends on the programming language and the driver you use to facilitate communications between the client and the server. As already noted, we’ll be using Python and the PyMongo driver for this article.
All the examples will use the same basic construction to establish a connection with the server and database. Each script will define the connection, connect to the hr database, run a query against the database, and close the connection. The following script shows the necessary language elements to carry out these operations, without the actual query:
    from pymongo import MongoClient
    from pymongo.server_api import ServerApi
    conn = <connection_string>
    client = MongoClient(conn, <client_settings>)
    try:
        db = client.get_database("hr")
        # start query
        <query>
        # end query
        client.close()
    except Exception as e:
        raise Exception("MongoDB returned the following error: ", e)
I’ll go through each command and explain how it works, but first I want to point out the <query> placeholder, sandwiched between the # start query and # end query comments. For each example that follows, you’ll simply switch out the placeholder for the query itself.
The first two commands in the script import the MongoClient class and ServerApi class from the pymongo module, making it possible to establish a connection to a MongoDB server:
    from pymongo import MongoClient
    from pymongo.server_api import ServerApi
The second import statement might not be necessary, depending on how you define the MongoClient object. I’ll discuss this in more detail shortly. But first, take a look at the next command, which defines a connection string and assigns it to the conn variable:
    conn = <connection_string>
You can use whatever variable name you like. A lot of documentation uses uri because you’re defining a URI connection to the MongoDB server.
The connection string itself depends on the target MongoDB server. For MongoDB Atlas, the simplest way to define the connection string is to copy the one available through the Atlas interface, as explained in the first article in this series.
Note, however, that the first article walks you through the process of retrieving the connection string for MongoDB Compass. For this article, you should select the connection string specific to Python, inserting your password as appropriate. To find the connection string, navigate to your cluster in Atlas and click the Connect button. In the Connect to dialog box, click Drivers and then select Python from the Driver drop-down list. You’ll find the connection string under Step 3.
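One detail to watch when inserting your password: PyMongo expects the user name and password in a connection string to be percent-encoded, so a password that contains characters such as @, :, or / needs escaping first. The following is a minimal sketch that uses only the Python standard library. The function name build_atlas_uri, the user name, the password, and the host are all hypothetical placeholders; the actual string, including any query options Atlas appends, should come from the Atlas interface.

```python
from urllib.parse import quote_plus


def build_atlas_uri(user: str, password: str, host: str) -> str:
    """Assemble a mongodb+srv connection string with escaped credentials."""
    # quote_plus percent-encodes reserved characters such as @, : and /
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/"


# Hypothetical values, for illustration only.
conn = build_atlas_uri("hr_user", "p@ss:word/1", "cluster0.example.mongodb.net")
print(conn)
```

If your password contains only letters and digits, the escaped and unescaped forms are identical, so the helper is harmless to use either way.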
If you’re connecting to MongoDB on a local machine, you should be able to use the following connection string when assigning the conn variable (note that the string must be enclosed in quotes):
    conn = "mongodb://localhost:27017/"
After you define your connection string, you must initialize a new instance of the MongoClient class. You can find an example of how to instantiate the class in the same place in Atlas where you found the connection string. (To do so, you must view the full code sample, rather than just the connection string.) The instantiation command takes the following form:
    client = MongoClient(conn, server_api=ServerApi('1'))
The command creates a MongoClient object and assigns it to the client variable. To create the object, the code uses the MongoClient constructor and passes in two arguments to the constructor. The first argument is the conn variable, which provides the constructor with the necessary connection information.
The second argument, server_api, is optional. It specifies which server API version to use when creating the MongoClient object. This is important if you want to ensure that the driver always conforms to a specific API version. To specify the version, you must use the ServerApi constructor to create a ServerApi object. According to MongoDB documentation and the Atlas site itself, you should pass the string '1' as the constructor’s argument when connecting to Atlas.
If you’re not concerned about enforcing a specific API version, you need only include the conn variable as the MongoClient constructor’s argument:
    client = MongoClient(conn)
When you take this approach, you can omit the second import statement at the beginning of your script. The statement imports the ServerApi class, which is not needed if it’s not referenced as an argument in the MongoClient constructor.
The remaining components in the base code include a try block that contains the main code, followed by an except block that handles any errors.
The first command in the try block uses the client variable to call the get_database method available to the MongoClient object:
    db = client.get_database("hr")
The method’s only argument is the name of the target database, which in this case is hr. The database object returned by the method is saved to the db variable. You can use the variable to access methods available to the database object when working with the hr database.
The try block then includes the comments and <query> placeholder mentioned earlier. Next comes the following command, which calls the close method to end the client connection:
    client.close()
The except block catches any exception raised in the try block and assigns it to the e variable. The block then raises a new exception that combines a short message with the original error:
    except Exception as e:
        raise Exception("MongoDB returned the following error: ", e)
That’s all there is to creating the basic shell we need for the rest of the article. As already noted, you’ll be replacing the <query> placeholder with the code in the following examples. The base elements will remain the same throughout.
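One thing worth noting about the base script is that client.close() sits inside the try block, so it is skipped whenever the query raises an exception. The following is a minimal sketch of one way to guarantee the connection is closed, using a finally clause. The run_query helper and its parameters are hypothetical names introduced for illustration and are not part of PyMongo; the helper relies only on the get_database and close methods already used in the base script.

```python
def run_query(client, query_fn, db_name="hr"):
    """Run query_fn against the named database and always close the client."""
    try:
        db = client.get_database(db_name)
        return query_fn(db)
    finally:
        # Executes whether query_fn returned normally or raised an exception.
        client.close()


# Possible usage with a real client (assumes conn holds your connection string):
#   from pymongo import MongoClient
#   names = run_query(MongoClient(conn), lambda db: db.list_collection_names())
```

The examples in this article keep the simpler base script, but a pattern like this can be useful once a script grows beyond a single query.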
Creating a collection in a MongoDB database through Python
Now that we have our base code in place, let’s look at how to work with the target database. The following example creates a collection in the hr database and then retrieves a list of the collections defined in the database:
    db.create_collection("personnel")
    collections = db.list_collection_names()
    for collection in collections:
        print(collection)
The first command uses the create_collection method available to the database object to create the personnel collection in the hr database. You can call the method through the db variable, passing in the collection name as an argument to the method.
The second command uses the list_collection_names method available to the database object to retrieve the names of the database’s current collections. The command saves the list of collections to the collections variable.
The third command is a for statement that iterates through the collections list. For each iteration, the current collection name is saved to the collection variable. The print command then prints the variable’s value to the screen.
When you replace the <query> placeholder in the base code with the above code, it should look like the following script:
    from pymongo import MongoClient
    from pymongo.server_api import ServerApi
    conn = <connection_string>
    client = MongoClient(conn, <client_settings>)
    try:
        db = client.get_database("hr")
        # start query
        db.create_collection("personnel")
        collections = db.list_collection_names()
        for collection in collections:
            print(collection)
        # end query
        client.close()
    except Exception as e:
        raise Exception("MongoDB returned the following error: ", e)
If you save this script to a text file (with a .py extension), you should be able to run it at a command prompt. If you’re not sure how to run a Python script, refer to the Python documentation. The process is usually fairly straightforward once you’ve installed Python. Your shell command might take a form similar to the following:
    > python <file_path>/<file_name>.py
When you run the script, it should return a list of the existing collections, which will be specific to your MongoDB instance. At the very least, the results should include the newly created personnel collection.
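If you run the script a second time, the create_collection call will typically fail, because PyMongo raises a CollectionInvalid error when the collection already exists. The following is a minimal sketch of one way to make the step repeatable by checking the existing collection names first. The ensure_collection helper is a hypothetical name introduced for illustration; it uses only the list_collection_names and create_collection methods shown above.

```python
def ensure_collection(db, name):
    """Create the collection only if the database does not already list it.

    Returns True if the collection was created, False if it already existed.
    """
    if name in db.list_collection_names():
        return False  # already present, nothing to do
    db.create_collection(name)
    return True


# Possible usage inside the base script, in place of db.create_collection(...):
#   created = ensure_collection(db, "personnel")
```

The check and the create are two separate calls, so this sketch is meant for single-user scripts like the ones in this article rather than for code that runs concurrently.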
When I first tried to run this script on my Mac computer, I received a certificate verification error. I was able to resolve the issue by running the following command file, which was available in my Python installation folder:
    /Applications/Python\ 3.10/Install\ Certificates.command
If you run into this issue when connecting to Atlas, you might be able to take a similar approach to resolve it. I did not run into this issue when trying to connect to a local instance of MongoDB.
Adding documents to a MongoDB database through Python
Now that we’ve gotten the basics out of the way, we can move on to using Python to carry out standard CRUD operations, starting with how to insert data. In the following example, the code adds a single document to the personnel collection and then prints the results returned by MongoDB:
    personnel = db.get_collection("personnel")
    doc = { "_id": 101, "name": "Drew", "position": "Senior Developer", "dept": "R&D" }
    results = personnel.insert_one(doc)
    print(results)
The first command uses the get_collection method available to the db database object to create a collection object for the personnel collection. The command then assigns the object to the personnel variable. You can then use the variable to run methods available to the collection object.
The second command defines the document to be inserted into the collection and assigns it to the doc variable. This is a straightforward document definition just like you’ve seen in previous articles. In this case, I’ve kept the document very simple, but you can define any type of document.
The next command uses the insert_one method available to the collection object to add the document to the collection. The doc variable is passed in as an argument to the method. When the method is executed, the document is added to the collection. The results returned by MongoDB are assigned to the results variable.
The final command prints the information that was assigned to the results variable. When you run the script, it should return the following results:
    InsertOneResult(101, acknowledged=True)
There might be times when you want to add multiple documents to a collection in a single operation. For the most part, you use the same format as with a single document, with a couple of notable differences, as shown in the following example:
    personnel = db.get_collection("personnel")
    docs = [
        { "_id": 102, "name": "Parker", "position": "Data Scientist", "dept": "R&D" },
        { "_id": 103, "name": "Harper", "position": "Marketing Manager", "dept": "Marketing" },
        { "_id": 104, "name": "Darcy", "position": "Senior Developer", "dept": "R&D" },
        { "_id": 105, "name": "Carey", "position": "SEO Specialist", "dept": "Marketing" },
        { "_id": 106, "name": "Avery", "position": "Network Admin", "dept": "IT" },
        { "_id": 107, "name": "Robin", "position": "Security Specialist", "dept": "IT" },
        { "_id": 108, "name": "Koda", "position": "QA Specialist", "dept": "R&D" },
        { "_id": 109, "name": "Jessie", "position": "Brand Manager", "dept": "Marketing" },
        { "_id": 110, "name": "Dana", "position": "Market Analyst", "dept": "Marketing" }
    ]
    results = personnel.insert_many(docs)
    print(results)
As before, you start by assigning the collection object to the personnel variable. Next, you define a Python list (an array, in MongoDB terms) that contains the document definitions. In this case, there are nine documents whose _id values range from 102 through 110. Because it is a list, the documents are enclosed in square brackets, with commas separating the individual documents. The list is then assigned to the docs variable.
The next command uses the insert_many method available to the collection object to add the documents to the collection, referencing the documents through the docs variable. The command then saves the results returned by MongoDB to the results variable.
As before, the final step prints the information that was assigned to the results variable. The command should return the following results:
    InsertManyResult([102, 103, 104, 105, 106, 107, 108, 109, 110], acknowledged=True)
The nine documents have now been added to the personnel collection, which already contained the one added in the previous example, giving you a total of 10 documents.
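Because _id values must be unique, rerunning either insert example against the same collection fails with a duplicate key error (PyMongo raises DuplicateKeyError for insert_one and BulkWriteError for insert_many). The following is a minimal sketch of one way to skip documents that are already present before inserting. The docs_not_yet_inserted helper is a hypothetical name introduced for illustration; it relies only on the collection’s find method with a standard $in filter and an _id projection.

```python
def docs_not_yet_inserted(collection, docs):
    """Return the documents whose _id is not already in the collection."""
    ids = [doc["_id"] for doc in docs]
    # Ask the server only for the _id values that already exist.
    existing = {d["_id"] for d in collection.find({"_id": {"$in": ids}}, {"_id": 1})}
    return [doc for doc in docs if doc["_id"] not in existing]


# Possible usage inside the base script:
#   new_docs = docs_not_yet_inserted(personnel, docs)
#   if new_docs:  # insert_many rejects an empty list
#       personnel.insert_many(new_docs)
```

As with any check-then-write approach, this is intended for simple scripts; it does not protect against another client inserting the same _id between the two calls.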
Retrieving documents from a MongoDB database through Python
The process of retrieving documents is just as straightforward as adding them. You can retrieve a single document or multiple documents, using the same type of query expressions you’ve seen in earlier articles. For example, the following code retrieves a document with an _id value of 102 from the personnel collection:
    personnel = db.get_collection("personnel")
    query = { "_id": 102 }
    results = personnel.find_one(query)
    print(results)
After assigning the collection object to the personnel variable, the code defines a simple query that limits the results to documents with an _id value of 102. Because the query specifies the _id field, you know that the value 102 is unique and that there will be only one document. The query is then assigned to the query variable.
The next command uses the find_one method available to the collection object to retrieve the target document, using the query assigned to the query variable. The command also saves the results returned by MongoDB to the results variable. The last command prints the information in the results variable. When you run the script, MongoDB should return the following document, which is the one that matches the search criteria:
    {'_id': 102, 'name': 'Parker', 'position': 'Data Scientist', 'dept': 'R&D'}
The find_one method returns only the first document that matches the query, even if multiple documents satisfy the search condition. However, you can return all matching documents by using the find method instead of find_one, as in the following example:
    personnel = db.get_collection("personnel")
    query = { "dept": "Marketing" }
    results = personnel.find(query)
    for result in results:
        print(result)
The query now specifies that any returned documents must have a dept value of Marketing. In addition, the code uses the find method to return all matching documents. The method returns a cursor over those documents, which is saved to the results variable.
Next, the code includes a for statement that iterates through the documents in the results variable. For each iteration, the statement assigns the current document to the result variable and prints the variable’s value to the screen, giving us the following results:
    {'_id': 103, 'name': 'Harper', 'position': 'Marketing Manager', 'dept': 'Marketing'}
    {'_id': 105, 'name': 'Carey', 'position': 'SEO Specialist', 'dept': 'Marketing'}
    {'_id': 109, 'name': 'Jessie', 'position': 'Brand Manager', 'dept': 'Marketing'}
    {'_id': 110, 'name': 'Dana', 'position': 'Market Analyst', 'dept': 'Marketing'}
You can also define aggregations in your queries, similar to what you saw in earlier articles. For this, you should first define a pipeline and assign it to a variable, as in the following example:
    personnel = db.get_collection("personnel")
    pipeline = [
        { "$match": { "$or": [ { "dept": "R&D" }, { "dept": "Marketing" } ] } },
        { "$group": { "_id": "$dept", "count": { "$sum": 1 } } }
    ]
    results = personnel.aggregate(pipeline)
    for result in results:
        print(result)
The pipeline uses the $match stage to return only those documents with a dept value of R&D or Marketing. The $group stage then groups the results by the dept field and returns a count of the number of documents in each group. The entire pipeline definition is assigned to the pipeline variable.
The next command uses the aggregate method available to the collection object to aggregate the data, using the pipeline definition assigned to the pipeline variable. The command also saves the returned data to the results variable. All this is followed by a for loop that iterates through the results and returns the following information (the order of the groups is not guaranteed unless you add a $sort stage):
    {'_id': 'Marketing', 'count': 4}
    {'_id': 'R&D', 'count': 4}
The results include the group name and the number of documents in each group, which translates to the number of employees in each department, as reflected by the personnel collection. You can, of course, define much more complex pipelines, just like you can define more complex queries when working with the find_one and find methods.
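To make the two pipeline stages more concrete, the following sketch reproduces the same filter-then-count logic in plain Python, using the dept values of the ten documents inserted earlier. This is only an illustration of what $match and $group compute; it is not how MongoDB executes the pipeline, and in practice you would let the server do the aggregation.

```python
from collections import Counter

# dept values of documents 101 through 110, in _id order, from the insert examples
depts = ["R&D", "R&D", "Marketing", "R&D", "Marketing",
         "IT", "IT", "R&D", "Marketing", "Marketing"]

# $match: keep only R&D and Marketing.
# $group with "$sum": 1: count the documents in each department.
counts = Counter(d for d in depts if d in ("R&D", "Marketing"))

# Sorted here only to make the printed order predictable.
for dept, count in sorted(counts.items()):
    print({"_id": dept, "count": count})
```

The IT documents are dropped by the filter, which is why only two groups appear in the output.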
Updating documents in a MongoDB database through Python
You can also use Python to modify your documents, updating either one document or multiple documents in a single operation. When you update the documents, you must specify a query expression that determines which documents to update and then an update expression that defines how to update them.
For example, the following code updates the document with an _id value of 104, changing the value of the position field to Developer:
    personnel = db.get_collection("personnel")
    query = { "_id": 104 }
    update = { "$set": { "position": "Developer" } }
    results = personnel.update_one(query, update)
    print(results)
The query portion is similar to what you saw when retrieving documents. The expression specifies that the target document must have an _id value of 104. The expression is then assigned to the query variable. Next, the update expression uses the $set operator to set the position value to Developer. This expression is assigned to the update variable.
The fourth command uses the update_one method available to the collection object to perform the actual update. The method takes two arguments: the query expression and the update expression, passed in through the query and update variables. The command then saves the results returned by MongoDB to the results variable, which is then printed to the screen.
When you run this code (along with the base code), you should receive a message indicating that one document has been updated.
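Printing the result object directly shows the raw details returned by the server. If you would rather report specific numbers, the result returned by update_one and update_many exposes matched_count and modified_count attributes. The following is a minimal sketch; the summarize_update helper is a hypothetical name introduced for illustration.

```python
def summarize_update(result):
    """Turn an update result into a short, readable message."""
    # matched_count: documents selected by the query expression
    # modified_count: documents whose contents actually changed
    return f"matched {result.matched_count}, modified {result.modified_count}"


# Possible usage inside the base script:
#   results = personnel.update_one(query, update)
#   print(summarize_update(results))
```

The two counts can differ. For example, if you run the same update twice, the second run still matches the document but modifies nothing because the value is already set.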
You can just as easily update multiple documents. Again, you must first define the query expression and then the update expression. This time around, however, you must use the update_many method, rather than the update_one method, as in the following example:
    personnel = db.get_collection("personnel")
    query = { "dept": "R&D" }
    update = { "$set": { "dept": "Dev" } }
    results = personnel.update_many(query, update)
    print(results)
In this case, the command updates those documents whose dept value is R&D, changing the value to Dev. The returned message should indicate that four documents were updated.
Deleting documents from a MongoDB database through Python
You can delete documents just as easily as updating them, whether one document or many. In fact, it is even easier to delete them because you don’t have to specify an update expression. You need only define your query and specify the applicable collection method, as in the following example:
    personnel = db.get_collection("personnel")
    query = { "_id": 110 }
    results = personnel.delete_one(query)
    print(results)
The query expression specifies that the document’s _id value must be 110. The next command uses the delete_one method available to the collection object to delete the document, based on the query specified in the query variable. The results are then printed to the screen. They should indicate that one document has been deleted.
To delete multiple documents, you must use the delete_many method rather than delete_one. For example, the following code uses the delete_many method to delete all documents with a dept value of Marketing:
    personnel = db.get_collection("personnel")
    query = { "dept": "Marketing" }
    results = personnel.delete_many(query)
    print(results)
The results returned by MongoDB should indicate that three documents have been deleted.
It is possible to delete all documents in a collection by defining the query expression as an empty document (only the curly brackets), as in the following example:
    personnel = db.get_collection("personnel")
    query = {}
    results = personnel.delete_many(query)
    print(results)
Be extremely cautious using this approach. Wiping out an entire collection at once is seldom advisable outside a test environment.
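One way to reduce the risk of an accidental wipe is to wrap delete_many in a small guard that refuses an empty query unless you explicitly confirm it. The following is a minimal sketch; the safe_delete_many helper and its allow_all parameter are hypothetical names introduced for illustration, and the only PyMongo features assumed are the delete_many method and the deleted_count attribute of its result.

```python
def safe_delete_many(collection, query, allow_all=False):
    """Call delete_many, refusing an empty filter unless explicitly allowed.

    Returns the number of documents deleted.
    """
    if not query and not allow_all:
        raise ValueError(
            "Empty filter would delete every document; "
            "pass allow_all=True to confirm."
        )
    return collection.delete_many(query).deleted_count


# Possible usage inside the base script:
#   deleted = safe_delete_many(personnel, { "dept": "Marketing" })
#   print(deleted)
```

A guard like this catches only the empty-filter case. A query that is too broad can still remove more documents than intended, so it can be worth running the same query with find first to review what it matches.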
Getting started with Python and MongoDB
Python is an extremely popular programming language that is widely used in data-driven applications that rely on MongoDB. In this article, I’ve tried to give you a foundation in how to use Python to perform CRUD operations against a MongoDB collection. However, this article is only an introduction, and there is much more you can do in Python when interfacing with MongoDB collections.
The examples are also specific to the Python language and the PyMongo driver. Although the basic principles for accessing MongoDB data are generally the same across programming languages and drivers, there are distinct differences between them, and you should refer to the documentation specific to the tools you’re using. That said, I hope this article has at least provided you with a better understanding of how to access a MongoDB database from within your data-driven applications, even if you’re not working with Python.