Working with Schema Validation in MongoDB


This article is part of Robert Sheldon's continuing series on MongoDB.

In the previous article in this series, I introduced you to schema validation in MongoDB. I described how you can define validation rules on a collection and how those rules validate document inserts and updates. In this article, I continue the discussion by explaining how to apply validation rules to documents that already exist in a collection. Before you start in on this article, however, I recommend that you first review the previous article for an introduction to schema validation.

The examples in this article demonstrate various concepts for working with schema validation, as it applies to existing documents. I show you how to find documents that conform and don’t conform to the validation rules, as well as how to bypass schema validation when inserting or updating a document. I also show you how to update and delete invalid documents in a collection. Finally, I explain how you can use validation options to override the default schema validation behavior when inserting and updating documents.

Note: For the examples in this article, I used the same MongoDB Atlas and MongoDB Compass environments I used for the previous articles in this series. Refer to the first article for details about setting up these environments. Because this article builds off the previous article, I’ve also used the same hr database and candidates collection for the examples. If you’ve already deleted the database and collection, you can simply re-create them. The goal is to start with an empty collection with no validation rules defined.

Searching for invalid documents in a MongoDB collection

When you define schema validation rules on a collection that already contains documents, some of those documents might not adhere to the new rules. In that case, you might want to search your collection to determine which documents conform to the rules and which ones do not.

In this section, I demonstrate how to find valid and invalid documents. If you want to try out the examples, you should create the hr database and candidates collection, if they don’t already exist.

The examples in this article were developed for MongoDB Shell. You can use the version of Shell integrated into MongoDB Compass, or you can use the version available through your system’s command-line utility. I like using the version in Compass because I can quickly review my inserted and updated documents.

If your collection contains any documents, delete those and then delete any validation rules. To remove validation rules, open MongoDB Shell, switch to the hr database, and run the following runCommand statement:
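A statement along the following lines removes the rules. This is a minimal sketch that assumes you have already switched to the hr database; the clearRules name and the typeof guard are illustrative additions, not part of the collMod command itself.

```javascript
// An empty validator document tells collMod to define no rules,
// which also removes any rules already defined on the collection.
const clearRules = { collMod: "candidates", validator: {} };

// `db` exists only inside MongoDB Shell; the guard keeps the snippet
// loadable in a plain JavaScript runtime as well.
if (typeof db !== "undefined") {
  db.runCommand(clearRules);
}
```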

The runCommand method runs the collMod database command, which lets you modify options on the specified collection. In this case, the command includes the validator option, just like you saw in the previous article when defining validation rules. This time, however, the option’s value is an empty document (curly brackets), which means that no rules will be defined. If rules already exist, they will be removed.

When you run this statement, you should receive an OK message that indicates the statement successfully executed. Of course, you could have dropped and re-created the collection, rather than deleting the documents and schema rules. It’s up to you which approach you take, although it can be useful to know how to remove validation rules.

Once you have a clean slate, you should run the following insertMany statement, which adds four simple documents to the candidates collection:
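The sketch below shows the shape such a statement might take. Only the _id values, the dob field, and the 1967-3-25 value for document 103 come from this article; the name field and the remaining dob values are placeholders chosen for illustration.

```javascript
// Four sample documents: 101 and 103 store dob as a String,
// 102 and 104 store it as a Date. The name values are placeholders.
const candidateDocs = [
  { _id: 101, name: "Candidate A", dob: "1985-6-14" },
  { _id: 102, name: "Candidate B", dob: new Date("1990-02-08") },
  { _id: 103, name: "Candidate C", dob: "1967-3-25" },
  { _id: 104, name: "Candidate D", dob: new Date("1979-11-30") }
];

// `db` exists only inside MongoDB Shell.
if (typeof db !== "undefined") {
  db.candidates.insertMany(candidateDocs);
}
```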

The dob values in these documents are intentionally defined with different data types—two as String values and two as Date values. The difference in types will help to demonstrate how to find valid and invalid documents, as you’ll see shortly. When you run the insertMany statement, it should return the following results, which indicate that the documents have been added to the collection:

After you add the documents, you can run the following runCommand statement to define a simple set of schema validation rules:
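A minimal version of such a statement might look like the following. It assumes a schema with a single rule on the dob field; the defineRules name and the description text are illustrative, and a real schema would likely define more properties.

```javascript
// collMod command that attaches a $jsonSchema validator to the collection.
// The only rule sketched here is that dob, when present, must be a BSON date.
const defineRules = {
  collMod: "candidates",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      properties: {
        dob: { bsonType: "date", description: "dob must be a Date value" }
      }
    }
  }
};

// `db` exists only inside MongoDB Shell.
if (typeof db !== "undefined") {
  db.runCommand(defineRules);
}
```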

The statement specifies the validator option, which in turn uses the $jsonSchema operator. The operator defines a JSON Schema object that contains the validation rules. Notice that the properties element specifies that the dob field must be the date type. The rest of the statement’s components work just like you saw in the previous article. If you have any questions about what I’ve done here, refer to that article.

Although the documents and validation rules that we’ve added to the collection are relatively simple, they are enough to demonstrate how to find valid and invalid documents. The same principles apply no matter how complex your documents or validation rules.

When working with a collection’s schema validation, there might be times when you want to view or retrieve the validation rules themselves. For this, you can use the getCollectionInfos database method, specifying the validator option, as shown in the following statement:

The getCollectionInfos method lets you view different options about the specified collection, in this case, candidates. To retrieve the validation rules, you must specify the validator option as it is constructed here. The statement should return the following $jsonSchema object:

Together, the getCollectionInfos method and validator option make it easy to retrieve a collection’s validation rules so you can see how they’ve been defined. You can also use this approach to find valid and invalid documents.

To search for valid documents, you need to include the JSON Schema object as an argument to the find collection method. A good way to do this is to assign the object to a variable and then pass the variable in as an argument to the find method. For example, the following let statement assigns the JSON Schema object returned by the getCollectionInfos method to the schema1 variable:
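One way to write this is sketched below. The readValidator helper is a hypothetical convenience wrapper, not a MongoDB method; it relies only on getCollectionInfos, which returns an array of collection descriptions whose options document holds the validator.

```javascript
// Hypothetical helper: returns the validator document stored for a collection,
// or undefined when the collection is not found or has no rules.
function readValidator(database, collectionName) {
  const infos = database.getCollectionInfos({ name: collectionName });
  return infos.length > 0 ? infos[0].options.validator : undefined;
}

// `db` exists only inside MongoDB Shell; elsewhere schema1 stays undefined.
let schema1 = typeof db !== "undefined" ? readValidator(db, "candidates") : undefined;
```

Inside MongoDB Shell, the same result can be obtained in a single expression by indexing the first element of the array that getCollectionInfos returns and reading its options.validator property.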

After you define the variable, you can pass it into the find statements that you run within the same MongoDB Shell session. Before I demonstrate how to do this, however, I want to first show you how to assign the JSON Schema object definition directly to the variable, without using the getCollectionInfos method. In this way, you can check for valid and invalid documents before you define validation rules on a collection.

To assign the JSON Schema object directly to the variable, use the $jsonSchema operator to define the object, using this as the variable value:
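The following sketch assumes the same single-rule schema used for the dob field; it needs no database connection because it only builds the object.

```javascript
// Assign the schema definition directly, without reading it from the collection.
// The object has the same shape as the validator option in a collMod command.
let schema1 = {
  $jsonSchema: {
    bsonType: "object",
    properties: {
      dob: { bsonType: "date", description: "dob must be a Date value" }
    }
  }
};
```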

Regardless of which approach you use to define your variable, you can easily verify its contents by running the following print command:

The command should return the $jsonSchema object, as shown in the following results:

After you define the schema1 variable you can use it in your find statements to search for valid and invalid documents. For example, the following find statement searches the candidates collection for documents that conform to the JSON Schema object in the schema1 variable:
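As a sketch, the search might be written as follows. The schema is repeated here only so the snippet stands alone; in a shell session where schema1 is already defined, the find call is all you need.

```javascript
// Schema repeated so the snippet is self-contained (assumed single rule on dob).
let schema1 = {
  $jsonSchema: { bsonType: "object", properties: { dob: { bsonType: "date" } } }
};

// $jsonSchema also works as a query operator, so the variable can be
// passed to find as the filter. `db` exists only inside MongoDB Shell.
if (typeof db !== "undefined") {
  printjson(db.candidates.find(schema1).toArray());
}
```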

The statement returns the following results, which include the documents with the _id values of 102 and 104:

If you want to search for the documents that do not conform to the validation rules, you can use the $nor logical operator, along with the schema1 variable:
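A sketch of the inverted search is shown below, again repeating the assumed schema so the snippet stands alone. The invalidFilter name is illustrative.

```javascript
// Schema repeated so the snippet is self-contained (assumed single rule on dob).
let schema1 = {
  $jsonSchema: { bsonType: "object", properties: { dob: { bsonType: "date" } } }
};

// $nor takes an array of expressions and matches documents that satisfy
// none of them, which here means documents that fail the schema.
const invalidFilter = { $nor: [schema1] };

// `db` exists only inside MongoDB Shell.
if (typeof db !== "undefined") {
  printjson(db.candidates.find(invalidFilter).toArray());
}
```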

This time, the find statement returns only the invalid documents, which include the documents with the _id values of 101 and 103:

As you can see, the value for the dob field in the results is a String value in both cases, which violates the validation rules. The field must take a Date value to conform to the rules. Except for this issue, the documents conform to the validation rules in all other ways.

Bypassing schema validation in a MongoDB collection

At times, you might want to add a document to a collection that violates the validation rules. For example, you might need to restore data from an archive file that includes invalid documents, which you want to preserve in their original state.

As you saw in the previous article, MongoDB will return an error when you try to insert a document that violates the validation rules. For instance, MongoDB returns an error if you try to run the following insertOne statement:

The statement attempts to insert a document whose dob field is a String value, which violates the schema we defined above. However, you can override this behavior by including the bypassDocumentValidation option in your insertOne statement, as in the following example:
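The sketch below shows where the option goes. The document itself (an _id of 105, a placeholder name, and a String dob) is hypothetical and stands in for whatever invalid document you need to add.

```javascript
// Hypothetical document whose dob is a String, which violates the rules.
const invalidDoc = { _id: 105, name: "Candidate E", dob: "1992-7-19" };

// The option is passed in the second argument, separate from the document.
const insertOptions = { bypassDocumentValidation: true };

// `db` exists only inside MongoDB Shell.
if (typeof db !== "undefined") {
  db.candidates.insertOne(invalidDoc, insertOptions);
}
```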

The bypassDocumentValidation option takes a Boolean as its value. When the option is set to true, MongoDB ignores the validation rules and inserts the document into the collection. The option works the same way for an insertMany statement.

When you include the bypassDocumentValidation option in an insertOne or insertMany statement, the option applies only when you run the statement. Once you insert the document into the collection, you cannot update it in a way that violates the validation rules. For example, if you try to run the following updateOne statement, MongoDB will return an error message stating that the document failed validation:

However, the updateOne method, as well as the updateMany method, also supports the bypassDocumentValidation option, as in the following example:
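An update of this kind might be sketched as follows. The target _id of 105 and the new dob value are hypothetical; the point is that the option goes in the third argument, after the filter and the update document.

```javascript
// Hypothetical filter and update: set dob to another String value.
const updateFilter = { _id: 105 };
const updateDoc = { $set: { dob: "1993-7-19" } };

// The option is passed in the third argument to updateOne or updateMany.
const updateOptions = { bypassDocumentValidation: true };

// `db` exists only inside MongoDB Shell.
if (typeof db !== "undefined") {
  db.candidates.updateOne(updateFilter, updateDoc, updateOptions);
}
```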

When you run this statement, MongoDB ignores the validation rules, updates the document, and returns a message indicating that one document has been modified.

As with inserting data, the bypassDocumentValidation option applies only to the updateOne or updateMany statements when you run them. If you later try to update the document, the update must once again conform to the validation rules or include the bypassDocumentValidation option.

Updating invalid documents in a MongoDB collection

By default, MongoDB applies validation rules whenever you try to insert or update a document, unless you include the bypassDocumentValidation option in your statements. However, MongoDB does not apply those rules to the documents that already exist in the collection at the time you define the validation rules, which means your collection could contain invalid documents.

In some cases, you might want to update the invalid documents to bring them into conformance with the validation rules or to mark them in some way as being invalid. Before you update the documents, however, you can first search for them so you can assess what you have. For this, you can again use the find method, along with the $nor operator and JSON Schema object variable.

You can use the schema1 variable you defined earlier as long as you’re working in the same user session. If you’ve reconnected to MongoDB since running those earlier statements, you’ll need to redefine the variable before you can use it, in which case, you should again run the following let statement:

Once you’ve defined the variable, you can then run your find statement:

The statement returns the following results, which indicate that the candidates collection contains three invalid documents:

To update the invalid documents, you can use the $nor operator and the schema1 variable within an updateMany statement as part of the statement’s filter. For example, the following updateMany statement uses the operator and variable to change the dob field in the invalid documents:
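A sketch of such a statement appears below. It assumes the single-rule schema on dob and uses the aggregation pipeline form of the update (available in MongoDB 4.2 and later), because $toDate is an aggregation operator and needs to read the existing field value.

```javascript
// Schema repeated so the snippet is self-contained (assumed single rule on dob).
let schema1 = {
  $jsonSchema: { bsonType: "object", properties: { dob: { bsonType: "date" } } }
};

// Filter: every document that does not satisfy the schema.
const invalidFilter = { $nor: [schema1] };

// Pipeline-style update: replace dob with its value converted to a Date.
// $toDate raises an error for strings it cannot parse as dates.
const convertDob = [{ $set: { dob: { $toDate: "$dob" } } }];

// `db` exists only inside MongoDB Shell.
if (typeof db !== "undefined") {
  db.candidates.updateMany(invalidFilter, convertDob);
}
```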

The statement first searches for the invalid documents, using the $nor operator and the schema1 variable, and then uses the $toDate operator to convert the field’s value to the Date data type. When you run the statement, you should receive a message indicating that the three documents were modified. If you were to examine the documents, you would find that the dob field in each document is now defined with the Date data type.

In this case, the updateMany statement modifies all invalid documents because it assumes that the dob value needs to be updated in every one of those documents. However, there might be times when you need to refine your statement’s filter to target only a subset of the invalid documents, as in the following example:
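One way to narrow the filter is sketched below, assuming the same single-rule schema. The $type query operator restricts the match to documents whose dob is stored as a String; the narrowFilter name is illustrative.

```javascript
// Schema repeated so the snippet is self-contained (assumed single rule on dob).
let schema1 = {
  $jsonSchema: { bsonType: "object", properties: { dob: { bsonType: "date" } } }
};

// Both conditions must hold: the document fails the schema
// and its dob field is stored as a String.
const narrowFilter = { $nor: [schema1], dob: { $type: "string" } };

// Pipeline-style update that converts dob to a Date.
const convertDob = [{ $set: { dob: { $toDate: "$dob" } } }];

// `db` exists only inside MongoDB Shell.
if (typeof db !== "undefined") {
  db.candidates.updateMany(narrowFilter, convertDob);
}
```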

This time, the updateMany statement looks for documents that are invalid and whose dob field has a String value. If there are documents that violate the validation rules in other ways, they will not be updated.

Deleting invalid documents in a MongoDB collection

You can use the same logic—the $nor operator and schema1 variable—when deleting invalid documents. For example, suppose you insert the following document into the candidates collection:

Notice that the statement includes the bypassDocumentValidation option, allowing the document to be added even if it violates the validation rules, which it does.

After you insert the document, you might decide to delete it, along with any other invalid documents in the collection. For this, you can use a deleteMany statement, once again incorporating the $nor operator and schema1 variable, as in the following example:
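The deletion might be sketched as follows, assuming the single-rule schema on dob. Because deleteMany removes every match, it can be worth running find with the same filter first to confirm what will be deleted.

```javascript
// Schema repeated so the snippet is self-contained (assumed single rule on dob).
let schema1 = {
  $jsonSchema: { bsonType: "object", properties: { dob: { bsonType: "date" } } }
};

// Same filter used for searching: documents that fail the schema.
const invalidFilter = { $nor: [schema1] };

// `db` exists only inside MongoDB Shell.
if (typeof db !== "undefined") {
  db.candidates.deleteMany(invalidFilter);
}
```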

When you run the statement, MongoDB should return a message indicating that one document has been deleted (assuming you’ve been following along with the examples). You can verify the deletion by running the following find statement:

The statement should now return no results because the candidates collection no longer contains invalid documents. At this point, the collection should include only five documents, all of them in conformance with the validation rules. In other words, the dob field in each document is now defined with the Date data type.

Setting the schema validation action and level

The examples in this article, as well as those in the previous article, demonstrated how validation rules can affect your insert and update operations. As you have seen, you can insert documents only if they conform to those rules, unless you specify the bypassDocumentValidation option. Likewise, you can update documents only if the updated documents will conform to the rules.

However, this is only the default behavior. You can actually control how inserts and updates are handled when defining your validation rules. To implement these controls, you must include one or both of the following options when defining your validation rules:

  • validationAction. Determines whether MongoDB generates an error or warning when a document violates the validation rules. If the option is set to error (the default), MongoDB rejects the inserted or updated document and issues an error. If the option is set to warn, MongoDB permits the document to be inserted or updated, but records the violation in the MongoDB log.
  • validationLevel. Determines how MongoDB should apply the validation rules. If the option is set to strict (the default), MongoDB applies the rules to all inserted and updated documents. If the option is set to moderate, MongoDB applies the rules to all inserted documents but only to updates in which the document being updated is already valid. If the document is invalid, MongoDB does not enforce the validation rules during the update.

Let’s take a look at a few examples to better understand how these options work. If you want to try them out for yourself, first run the following statements, which delete all the documents, remove the validation rules, and add the original documents back into the collection:

The next step is to define the validation rules, once again using a runCommand statement. This time, however, you should include the validationAction and validationLevel options in your collMod command, as in the following example:
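A command of this kind might be sketched as follows. The validator is the assumed single-rule schema on dob; the two option values match the settings discussed in this section, and the warnOnViolation name is illustrative.

```javascript
// collMod command that sets the rules together with both validation options.
const warnOnViolation = {
  collMod: "candidates",
  validator: {
    $jsonSchema: { bsonType: "object", properties: { dob: { bsonType: "date" } } }
  },
  validationAction: "warn",  // allow the write but record the violation in the log
  validationLevel: "strict"  // the default: check all inserts and updates
};

// `db` exists only inside MongoDB Shell.
if (typeof db !== "undefined") {
  db.runCommand(warnOnViolation);
}
```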

The runCommand statement defines the same validation rules we’ve been using throughout this article, but it also includes the two validation options. The validationLevel option is set to its default value, strict, so nothing has changed there, but the validationAction option is set to warn, which changes MongoDB’s default behavior.

We can test this change by trying to insert a document into the candidates collection. The following insertOne statement attempts to add an invalid document to the collection, without including the bypassDocumentValidation option:

If the validationAction option were set to error (the default), the insertOne statement would return an error. Instead, the option is set to warn, so MongoDB adds the document to the collection and records the violation in the MongoDB log. (To view the log on Atlas, refer to the Atlas documentation.)

This principle works the same when updating documents. For example, the following updateOne statement attempts to update a field to an invalid value, without including the bypassDocumentValidation option:

As with the insertOne statement, MongoDB does not return an error but instead updates the document and records the violation in the MongoDB log.

Now let’s modify our validation option settings, this time setting the validationAction option to error (the default) and the validationLevel option to moderate:
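As a sketch, the change can be made by passing only the two options to collMod, which leaves the existing validator in place. The moderateLevel name is illustrative.

```javascript
// collMod command that changes only the validation options.
const moderateLevel = {
  collMod: "candidates",
  validationAction: "error",   // the default: reject documents that fail validation
  validationLevel: "moderate"  // skip the rules when updating an already invalid document
};

// `db` exists only inside MongoDB Shell.
if (typeof db !== "undefined") {
  db.runCommand(moderateLevel);
}
```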

As you’ll recall, the candidates collection includes two documents whose dob field is a String value and two documents whose dob field is a Date value. As a result, the documents with the _id values 101 and 103 are invalid.

Let’s try to update the 103 document by changing the dob value from 1967-3-25 to 1968-3-25:

If the validationLevel option were set to strict (the default), the updateOne statement would return an error. Instead, MongoDB updates the document without generating an error or warning because the document was already invalid.

You can test this out further by running the following updateOne statement, which attempts to update a valid document with an invalid dob value:

This time, MongoDB will return an error stating that the document failed validation. Because the document is already valid, MongoDB applies the validation rules to the document when you try to update it.

In some cases, you might want to set your validation options to their default values. For example, you might have changed them temporarily to test a particular scenario and now want to change them back. To return them to their default state, you can run the following runCommand statement:
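A sketch of such a reset follows. It sets both options explicitly to the default values named earlier in this section and does not touch the validator; the restoreDefaults name is illustrative.

```javascript
// collMod command that returns both validation options to their defaults.
const restoreDefaults = {
  collMod: "candidates",
  validationAction: "error",
  validationLevel: "strict"
};

// `db` exists only inside MongoDB Shell.
if (typeof db !== "undefined") {
  db.runCommand(restoreDefaults);
}
```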

This statement will return the candidates collection to its default behavior, as it applies to the validation rules.

Getting started with MongoDB validation rules

In the previous article, I mentioned that schema validation can be a valuable tool for enforcing constraints on how a collection’s documents are defined, but I also pointed out that you’ll likely want to limit its use to more mature applications, when you don’t require the same degree of flexibility you did when first setting up the collection. Enforcing schema validation on a collection prematurely can create more overhead. You should use schema validation judiciously until you fully understand how it will affect your applications.

With that in mind, you should now have a good foundation for working with schema validation in a MongoDB collection, at least enough to get started. There is certainly more that you can do with schema validation than what I’ve covered here, so I recommend that you review other resources, particularly the MongoDB documentation. I suggest that you start with the topic Schema Validation, which introduces you to validation rules in MongoDB.

About the author

Robert Sheldon

Robert is a freelance technology writer based in the Pacific Northwest. He’s worked as a technical consultant and has written hundreds of articles about technology for both print and online publications, with topics ranging from predictive analytics to 5D storage to the dark web. He’s also contributed to over a dozen books on technology, developed courseware for Microsoft’s training program, and served as a developmental editor on Microsoft certification exams. When not writing about technology, he’s working on a novel or venturing out into the spectacular Northwest woods.