This article is part of Robert Sheldon's continuing series on MongoDB. To see all of the items in the series, click here.
In the previous article in this series, I introduced you to schema validation in MongoDB. I described how you can define validation rules on a collection and how those rules validate document inserts and updates. In this article, I continue the discussion by explaining how to apply validation rules to documents that already exist in a collection. Before you start in on this article, however, I recommend that you first review the previous article for an introduction to schema validation.
The examples in this article demonstrate various concepts for working with schema validation, as it applies to existing documents. I show you how to find documents that conform and don’t conform to the validation rules, as well as how to bypass schema validation when inserting or updating a document. I also show you how to update and delete invalid documents in a collection. Finally, I explain how you can use validation options to override the default schema validation behavior when inserting and updating documents.
Note: For the examples in this article, I used the same MongoDB Atlas and MongoDB Compass environments I used for the previous articles in this series. Refer to the first article for details about setting up these environments. Because this article builds off the previous article, I’ve also used the same hr database and candidates collection for the examples. If you’ve already deleted the database and collection, you can simply re-create them. The goal is to start with an empty collection with no validation rules defined.
Searching for invalid documents in a MongoDB collection
When you define schema validation rules on a collection that already contains documents, some of those documents might not adhere to the new rules. In that case, you might want to search your collection to determine which documents conform to the rules and which ones do not.
In this section, I demonstrate how to find valid and invalid documents. If you want to try out the examples, you should create the hr database and candidates collection, if they don’t already exist.
The examples in this article were developed for MongoDB Shell. You can use the version of Shell integrated into MongoDB Compass, or you can use the version available through your system’s command-line utility. I like using the version in Compass because I can quickly review my inserted and updated documents.
If your collection contains any documents, delete those and then delete any validation rules. To remove validation rules, open MongoDB Shell, switch to the hr database, and run the following runCommand statement:
db.runCommand( { collMod: "candidates", validator: {} } );
The runCommand method calls the collMod database command, which lets you modify options on the specified collection. In this case, the command sets the validator option, just like you saw in the previous article when defining validation rules. This time, however, the option’s value is an empty document (curly brackets), which means that no rules will be defined. If rules already exist, they will be removed.
When you run this statement, you should receive an OK message that indicates the statement successfully executed. Of course, you could have dropped and re-created the collection, rather than deleting the documents and schema rules. It’s up to you which approach you take, although it can be useful to know how to remove validation rules.
Once you have a clean slate, you should run the following insertMany statement, which adds four simple documents to the candidates collection:
db.candidates.insertMany([
  { _id: 101, "name": "Drew", "dob": "1973-9-12" },
  { _id: 102, "name": "Parker", "dob": new Date("1982-12-2") },
  { _id: 103, "name": "Harper", "dob": "1967-3-25" },
  { _id: 104, "name": "Darcy", "dob": new Date("1999-5-18") }
]);
The dob values in these documents are intentionally defined with different data types: two as String values and two as Date values. The difference in types will help to demonstrate how to find valid and invalid documents, as you’ll see shortly. When you run the insertMany statement, it should return the following results, which indicate that the documents have been added to the collection:
{
  acknowledged: true,
  insertedIds: {
    '0': 101,
    '1': 102,
    '2': 103,
    '3': 104
  }
}
After you add the documents, you can run the following runCommand statement to define a simple set of schema validation rules:
db.runCommand( {
  collMod: "candidates",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: [ "name", "dob" ],
      properties: {
        "name": { bsonType: "string" },
        "dob": { bsonType: "date" }
      }
    }
  }
});
The statement sets the validator option, which in turn uses the $jsonSchema operator. The operator defines a JSON Schema object that contains the validation rules. Notice that the properties element specifies that the dob field must be the date type. The rest of the statement’s components work just like you saw in the previous article. If you have any questions about what I’ve done here, refer to that article.
Although the documents and validation rules that we’ve added to the collection are relatively simple, they are enough to demonstrate how to find valid and invalid documents. The same principles apply no matter how complex your documents or validation rules.
When working with a collection’s schema validation, there might be times when you want to view or retrieve the validation rules themselves. For this, you can use the getCollectionInfos database method, specifying the validator option, as shown in the following statement:
db.getCollectionInfos( { name: "candidates" } )[0].options.validator;
The getCollectionInfos method lets you view different options about the specified collection, in this case, candidates. To retrieve the validation rules, you must specify the validator option as it is constructed here. The statement should return the following $jsonSchema object:
{
  '$jsonSchema': {
    bsonType: 'object',
    required: [ 'name', 'dob' ],
    properties: { name: { bsonType: 'string' }, dob: { bsonType: 'date' } }
  }
}
Together, the getCollectionInfos method and validator option make it easy to retrieve a collection’s validation rules so you can see how they’ve been defined. You can also use this approach to find valid and invalid documents.
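The getCollectionInfos expression shown earlier assumes that the collection exists and that it has validation rules. As a minimal sketch, a small helper can guard against both cases. The helper name extractValidator and the sample data are hypothetical, and the sketch runs as plain JavaScript without a database connection:

```javascript
// Hypothetical helper (not part of MongoDB Shell): pull the validator out of
// the array that db.getCollectionInfos() returns, or return null if the
// collection is missing or has no rules.
function extractValidator(collectionInfos) {
  const validator = collectionInfos[0]?.options?.validator;
  // Treat a missing or empty validator document as "no rules defined".
  if (!validator || Object.keys(validator).length === 0) {
    return null;
  }
  return validator;
}

// Stand-in data that imitates the shape of a getCollectionInfos() result.
const sampleInfos = [
  {
    name: "candidates",
    type: "collection",
    options: { validator: { $jsonSchema: { bsonType: "object" } } }
  }
];

console.log(extractValidator(sampleInfos)); // the $jsonSchema document
console.log(extractValidator([]));          // null
```

In MongoDB Shell, you could pass in the real result instead, for example extractValidator(db.getCollectionInfos({ name: "candidates" })).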
To search for valid documents, you need to include the JSON Schema object as an argument to the find collection method. A good way to do this is to assign the object to a variable and then pass the variable in as an argument to the find method. For example, the following let statement assigns the JSON Schema object returned by the getCollectionInfos method to the schema1 variable:
let schema1 = db.getCollectionInfos( { name: "candidates" } )[0].options.validator;
After you define the variable, you can pass it into the find statements that you run within the same MongoDB Shell session. Before I demonstrate how to do this, however, I want to first show you how to assign the JSON Schema object definition directly to the variable, without using the getCollectionInfos method. In this way, you can check for valid and invalid documents before you define validation rules on a collection.
To assign the JSON Schema object directly to the variable, use the $jsonSchema operator to define the object, using this as the variable value:
let schema1 = {
  $jsonSchema: {
    bsonType: "object",
    required: [ "name", "dob" ],
    properties: {
      "name": { bsonType: "string" },
      "dob": { bsonType: "date" }
    }
  }
};
Regardless of which approach you use to define your variable, you can easily verify its contents by running the following print command:
print(schema1);
The command should return the $jsonSchema object, as shown in the following results:
{
  '$jsonSchema': {
    bsonType: 'object',
    required: [ 'name', 'dob' ],
    properties: { name: [Object], dob: [Object] }
  }
}
After you define the schema1 variable, you can use it in your find statements to search for valid and invalid documents. For example, the following find statement searches the candidates collection for documents that conform to the JSON Schema object in the schema1 variable:
db.candidates.find( schema1 );
The statement returns the following results, which include the documents with the _id values of 102 and 104:
{
  _id: 102,
  name: 'Parker',
  dob: 1982-12-02T08:00:00.000Z
}
{
  _id: 104,
  name: 'Darcy',
  dob: 1999-05-18T07:00:00.000Z
}
If you want to search for the documents that do not conform to the validation rules, you can use the $nor logical operator, along with the schema1 variable:
db.candidates.find( { $nor: [ schema1 ] } );
This time, the find statement returns only the invalid documents, which include the documents with the _id values of 101 and 103:
{
  _id: 101,
  name: 'Drew',
  dob: '1973-9-12'
}
{
  _id: 103,
  name: 'Harper',
  dob: '1967-3-25'
}
As you can see, the value for the dob field in the results is a String value in both cases, which violates the validation rules. The field must take a Date value to conform to the rules. Except for this issue, the documents conform to the validation rules in all other ways.
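To make the split between valid and invalid documents concrete, the following sketch applies a local approximation of the two rules (name must be a string, dob must be a Date) to the four sample documents. This is plain JavaScript, not MongoDB’s validator, and the function name conformsToCandidateRules is hypothetical:

```javascript
// Local approximation of the article's rules. It only mirrors the two type
// checks; it does not reproduce $jsonSchema behavior in general.
function conformsToCandidateRules(doc) {
  return typeof doc.name === "string" && doc.dob instanceof Date;
}

// The same four documents used in the article, built without a database.
const candidates = [
  { _id: 101, name: "Drew", dob: "1973-9-12" },
  { _id: 102, name: "Parker", dob: new Date(1982, 11, 2) },
  { _id: 103, name: "Harper", dob: "1967-3-25" },
  { _id: 104, name: "Darcy", dob: new Date(1999, 4, 18) }
];

const validIds = candidates
  .filter(conformsToCandidateRules)
  .map((doc) => doc._id);
const invalidIds = candidates
  .filter((doc) => !conformsToCandidateRules(doc))
  .map((doc) => doc._id);

console.log(validIds);   // [ 102, 104 ]
console.log(invalidIds); // [ 101, 103 ]
```

The result matches what the two find statements return: the documents that store dob as a string are the ones that fail.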
Bypassing schema validation in a MongoDB collection
At times, you might want to add a document to a collection that violates the validation rules. For example, you might need to restore data from an archive file that includes invalid documents, and you want to preserve them in their original state.
As you saw in the previous article, MongoDB will return an error when you try to insert a document that violates the validation rules. For instance, MongoDB returns an error if you try to run the following insertOne statement:
db.candidates.insertOne(
  {
    _id: 105,
    "name": "Carey",
    "dob": "2003-7-2"
  }
);
The statement attempts to insert a document whose dob field is a String value, which violates the schema we defined above. However, you can override this behavior by including the bypassDocumentValidation option in your insertOne statement, as in the following example:
db.candidates.insertOne(
  {
    _id: 105,
    "name": "Carey",
    "dob": "2003-7-2"
  },
  { bypassDocumentValidation: true }
);
The bypassDocumentValidation option takes a Boolean as its value. When the option is set to true, MongoDB ignores the validation rules and inserts the document into the collection. The option works the same way for an insertMany statement.
When you include the bypassDocumentValidation option in an insertOne or insertMany statement, the option applies only when you run the statement. Once you insert the document into the collection, you cannot update it in a way that violates the validation rules. For example, if you try to run the following updateOne statement, MongoDB will return an error message stating that the document failed validation:
db.candidates.updateOne(
  { _id : 105 },
  { $set: { "dob" : "2003-7-3" } }
);
However, the updateOne method, as well as the updateMany method, also supports the bypassDocumentValidation option, as in the following example:
db.candidates.updateOne(
  { _id : 105 },
  { $set: { "dob" : "2003-7-3" } },
  { bypassDocumentValidation: true }
);
When you run this statement, MongoDB ignores the validation rules, updates the document, and returns a message indicating that one document has been modified.
As with inserting data, the bypassDocumentValidation option applies only to the updateOne or updateMany statement that includes it. If you later try to update the document, the update must once again conform to the validation rules or include the bypassDocumentValidation option.
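For the archive-restore scenario mentioned at the start of this section, one option is to wrap the bypass in a single function so it is not scattered across ad hoc statements. The following is only a sketch: restoreArchivedDocuments and fakeCollection are hypothetical names, and the fake collection stands in for db.candidates so the code runs without a database:

```javascript
// Hypothetical helper: insert archived documents as-is, skipping validation.
// `collection` is anything that exposes insertMany(docs, options), such as
// db.candidates in MongoDB Shell.
function restoreArchivedDocuments(collection, archivedDocs) {
  return collection.insertMany(archivedDocs, { bypassDocumentValidation: true });
}

// Minimal stand-in collection (an assumption for this sketch). It records
// each call and returns a result shaped like the insertMany output.
const recordedCalls = [];
const fakeCollection = {
  insertMany(docs, options) {
    recordedCalls.push({ docs, options });
    const insertedIds = Object.fromEntries(
      docs.map((doc, index) => [String(index), doc._id])
    );
    return { acknowledged: true, insertedIds };
  }
};

const restoreResult = restoreArchivedDocuments(fakeCollection, [
  { _id: 105, name: "Carey", dob: "2003-7-2" }
]);

console.log(restoreResult.insertedIds);
console.log(recordedCalls[0].options);
```

Keeping the option in one place makes it easier to see where invalid documents can enter the collection.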
Updating invalid documents in a MongoDB collection
By default, MongoDB applies validation rules whenever you try to insert or update a document, unless you include the bypassDocumentValidation option in your statements. However, MongoDB does not apply those rules to the documents that already exist in the collection at the time you define the validation rules, which means your collection could contain invalid documents.
In some cases, you might want to update the invalid documents to bring them into conformance with the validation rules or to mark them in some way as being invalid. Before you update the documents, however, you can first search for them so you can assess what you have. For this, you can again use the find method, along with the $nor operator and JSON Schema object variable.
You can use the schema1 variable you defined earlier as long as you’re working in the same user session. If you’ve reconnected to MongoDB since running those earlier statements, you’ll need to redefine the variable before you can use it, in which case, you should again run the following let statement:
let schema1 = db.getCollectionInfos( { name: "candidates" } )[0].options.validator;
Once you’ve defined the variable, you can then run your find statement:
db.candidates.find( { $nor: [ schema1 ] } );
The statement returns the following results, which indicate that the candidates collection contains three invalid documents:
{
  _id: 101,
  name: 'Drew',
  dob: '1973-9-12'
}
{
  _id: 103,
  name: 'Harper',
  dob: '1967-3-25'
}
{
  _id: 105,
  name: 'Carey',
  dob: '2003-7-3'
}
To update the invalid documents, you can use the $nor operator and schema1 variable within an updateMany statement as part of the statement’s filter. For example, the following updateMany statement uses the operator and variable to change the dob field in the invalid documents:
db.candidates.updateMany(
  { $nor: [ schema1 ] },
  [ { $set: { "dob": { $toDate: "$dob" } } } ]
);
The statement first searches for the invalid documents (using the $nor operator and the schema1 variable) and then uses the $toDate operator to convert the field’s value to the Date type. When you run the statement, you should receive a message indicating that the three documents were modified. If you were to examine the documents, you would find that the dob field in each document is now defined with the Date data type.
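The $toDate conversion runs on the server. If you would rather normalize dob strings in application code before writing them, a sketch along the following lines might help. The helper name parseDobString is hypothetical, and the choice of midnight UTC is an assumption that can differ from what new Date("1982-12-2") produces in the shell, which interpreted the string in local time in the earlier results:

```javascript
// Hypothetical helper: turn a "YYYY-M-D" string into a Date at midnight UTC.
function parseDobString(value) {
  const match = /^(\d{4})-(\d{1,2})-(\d{1,2})$/.exec(value);
  if (!match) {
    throw new TypeError(`Unrecognized dob format: ${value}`);
  }
  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);
  const date = new Date(Date.UTC(year, month - 1, day));
  // Reject values such as "1973-2-31" that JavaScript would silently roll over.
  if (date.getUTCMonth() !== month - 1 || date.getUTCDate() !== day) {
    throw new RangeError(`Invalid calendar date: ${value}`);
  }
  return date;
}

console.log(parseDobString("1973-9-12").toISOString()); // 1973-09-12T00:00:00.000Z
console.log(parseDobString("1967-3-25").toISOString()); // 1967-03-25T00:00:00.000Z
```

Rejecting malformed strings up front means a bad value surfaces as an error in your code rather than as a document that later fails validation.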
In this case, the updateMany statement modifies all invalid documents because it assumes that the dob value needs to be updated in every one of those documents. However, there might be times when you need to refine your statement’s filter to target only a subset of the invalid documents, as in the following example:
db.candidates.updateMany(
  { $nor: [ schema1 ], "dob": { $type : "string" } },
  [ { $set: { "dob": { $toDate: "$dob" } } } ]
);
This time, the updateMany statement looks for documents that are invalid and whose dob field has a String value. If there are documents that violate the validation rules in other ways, they will not be updated.
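Because the same $nor pattern appears in the find, updateMany, and deleteMany statements, you might build the filter in one place. The following sketch shows one way to do that; the function name invalidDocumentsFilter is hypothetical, and the code only constructs filter documents, so it runs without a database:

```javascript
// Hypothetical helper: build a filter that matches documents violating the
// schema, optionally narrowed by extra query conditions.
// Note: an extra condition keyed "$nor" would overwrite the schema check.
function invalidDocumentsFilter(schema, extraConditions = {}) {
  return { $nor: [schema], ...extraConditions };
}

// The same JSON Schema object the article assigns to schema1.
const schema1 = {
  $jsonSchema: {
    bsonType: "object",
    required: ["name", "dob"],
    properties: {
      name: { bsonType: "string" },
      dob: { bsonType: "date" }
    }
  }
};

const allInvalid = invalidDocumentsFilter(schema1);
const invalidStringDobs = invalidDocumentsFilter(schema1, {
  dob: { $type: "string" }
});

console.log(Object.keys(allInvalid));        // [ '$nor' ]
console.log(Object.keys(invalidStringDobs)); // [ '$nor', 'dob' ]
```

In MongoDB Shell, either filter could then be passed as the first argument to find, updateMany, or deleteMany.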
Deleting invalid documents in a MongoDB collection
You can use the same logic (the $nor operator and schema1 variable) when deleting invalid documents. For example, suppose you insert the following document into the candidates collection:
db.candidates.insertOne(
  {
    _id: 106,
    "name": "Avery",
    "dob": "1993-11-24"
  },
  { bypassDocumentValidation: true }
);
Notice that the statement includes the bypassDocumentValidation option, which allows the document to be added even though it violates the validation rules.
After you insert the document, you might decide to delete it, along with any other invalid documents in the collection. For this, you can use a deleteMany statement, once again incorporating the $nor operator and schema1 variable, as in the following example:
db.candidates.deleteMany( { $nor: [ schema1 ] } );
When you run the statement, MongoDB should return a message indicating that one document has been deleted (assuming you’ve been following along with the examples). You can verify the deletion by running the following find statement:
db.candidates.find( { $nor: [ schema1 ] } );
The statement should now return no results because the candidates collection no longer contains invalid documents. At this point, the collection should include only five documents, all of them in conformance with the validation rules. In other words, the dob field in each document is now defined with the Date data type.
Setting the schema validation action and level
The examples in this article, as well as those in the previous article, demonstrated how validation rules can affect your insert and update operations. As you have seen, you can insert documents only if they conform to those rules, unless you specify the bypassDocumentValidation option. Likewise, you can update documents only if the updated documents will conform to the rules.
However, this is only the default behavior. You can actually control how inserts and updates are handled when defining your validation rules. To implement these controls, you must include one or both of the following options when defining your validation rules:
validationAction. Determines whether MongoDB generates an error or warning when a document violates the validation rules. If the option is set to error (the default), MongoDB rejects the inserted or updated document and issues an error. If the option is set to warn, MongoDB permits the document to be inserted or updated, but records the violation in the MongoDB log.
validationLevel. Determines how MongoDB should apply the validation rules. If the option is set to strict (the default), MongoDB applies the rules to all inserted and updated documents. If the option is set to moderate, MongoDB applies the rules to all inserted documents but only to updates in which the document being updated is already valid. If the document is invalid, MongoDB does not enforce the validation rules during the update.
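The way the two options combine can be summarized as a small decision function. The sketch below is only a model of the behavior described in the two definitions above, written in plain JavaScript; the function name writeOutcome and its parameter names are hypothetical, and it covers only the option values discussed in this article:

```javascript
// Model of how validationAction and validationLevel decide the fate of a
// write. `resultValid` says whether the document would conform after the
// write; `existingDocValid` only matters for updates.
function writeOutcome({
  operation,                  // "insert" or "update"
  existingDocValid = true,
  resultValid,
  validationAction = "error", // default per the article
  validationLevel = "strict"  // default per the article
}) {
  // Under "moderate", updates to documents that are already invalid
  // are not checked against the rules.
  const skipsValidation =
    validationLevel === "moderate" &&
    operation === "update" &&
    !existingDocValid;

  if (skipsValidation || resultValid) {
    return "accepted";
  }
  return validationAction === "warn"
    ? "accepted, violation logged"
    : "rejected";
}

console.log(writeOutcome({ operation: "insert", resultValid: false }));
// rejected
console.log(writeOutcome({ operation: "insert", resultValid: false, validationAction: "warn" }));
// accepted, violation logged
console.log(writeOutcome({ operation: "update", existingDocValid: false, resultValid: false, validationLevel: "moderate" }));
// accepted
```

The examples in the rest of this section exercise these same combinations against the candidates collection.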
Let’s take a look at a few examples to better understand how these options work. If you want to try them out for yourself, first run the following statements, which delete all the documents, remove the validation rules, and add the original documents back into the collection:
db.candidates.deleteMany({});
db.runCommand( { collMod: "candidates", validator: {} } );
db.candidates.insertMany([
  { _id: 101, "name": "Drew", "dob": "1973-9-12" },
  { _id: 102, "name": "Parker", "dob": new Date("1982-12-2") },
  { _id: 103, "name": "Harper", "dob": "1967-3-25" },
  { _id: 104, "name": "Darcy", "dob": new Date("1999-5-18") }
]);
The next step is to define the validation rules, once again using a runCommand statement. This time, however, you should include the validationAction and validationLevel options in your collMod command, as in the following example:
db.runCommand( {
  collMod: "candidates",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: [ "name", "dob" ],
      properties: {
        "name": { bsonType: "string" },
        "dob": { bsonType: "date" }
      }
    }
  },
  validationAction: "warn",
  validationLevel: "strict"
});
The runCommand statement defines the same validation rules we’ve been using throughout this article, but it also includes the two validation options. The validationLevel option is set to its default value, strict, so nothing has changed there, but the validationAction option is set to warn, which changes MongoDB’s default behavior.
We can test this change by trying to insert a document into the candidates collection. The following insertOne statement attempts to add an invalid document to the collection, without including the bypassDocumentValidation option:
db.candidates.insertOne(
  {
    _id: 105,
    "name": "Carey",
    "dob": "2003-7-2"
  }
);
If the validationAction option were set to error (the default), the insertOne statement would return an error. Instead, the option is set to warn, so MongoDB adds the document to the collection and records the violation in the MongoDB log. (To view the log on Atlas, refer to the Atlas documentation.)
This principle works the same when updating documents. For example, the following updateOne statement attempts to update a field to an invalid value, without including the bypassDocumentValidation option:
db.candidates.updateOne(
  { _id : 105 },
  { $set: { "dob" : "2003-7-3" } }
);
As with the insertOne statement, MongoDB does not return an error but instead updates the document and records the violation in the MongoDB log.
Now let’s modify our validation option settings, this time setting the validationAction option to error (the default) and the validationLevel option to moderate:
db.runCommand( {
  collMod: "candidates",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: [ "name", "dob" ],
      properties: {
        "name": { bsonType: "string" },
        "dob": { bsonType: "date" }
      }
    }
  },
  validationAction: "error",
  validationLevel: "moderate"
});
As you’ll recall, the original documents in the candidates collection include two whose dob field is a String value and two whose dob field is a Date value. As a result, the documents with the _id values 101 and 103 are invalid. (If you ran the previous insertOne example, document 105 is invalid as well.)
Let’s try to update the 103 document by changing the dob value from 1967-3-25 to 1968-3-25:
db.candidates.updateOne(
  { _id : 103 },
  { $set: { "dob" : "1968-3-25" } }
);
If the validationLevel option were set to strict (the default), the updateOne statement would return an error. Instead, MongoDB updates the document without generating an error or warning because the document was already invalid.
You can test this out further by running the following updateOne statement, which attempts to update a valid document with an invalid dob value:
db.candidates.updateOne(
  { _id : 104 },
  { $set: { "dob" : "1999-6-18" } }
);
This time, MongoDB will return an error stating that the document failed validation. Because the document is already valid, MongoDB applies the validation rules to the document when you try to update it.
In some cases, you might want to set your validation options to their default values. For example, you might have changed them temporarily to test a particular scenario and now want to change them back. To return them to their default state, you can run the following runCommand statement:
db.runCommand( {
  collMod: "candidates",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: [ "name", "dob" ],
      properties: {
        "name": { bsonType: "string" },
        "dob": { bsonType: "date" }
      }
    }
  },
  validationAction: "error",
  validationLevel: "strict"
});
This statement will return the candidates collection to its default behavior, as it applies to the validation rules.
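Because the collMod statements in this section differ only in their two option values, you could generate the command document from a small function. The following is a sketch under that assumption; the name buildValidationCommand is hypothetical, and the code only builds a plain object, so it runs without a database:

```javascript
// Hypothetical helper: build the document passed to db.runCommand().
// Omitted options fall back to the defaults named in the article.
function buildValidationCommand(collectionName, jsonSchema, options = {}) {
  const { validationAction = "error", validationLevel = "strict" } = options;
  return {
    collMod: collectionName,
    validator: { $jsonSchema: jsonSchema },
    validationAction,
    validationLevel
  };
}

// The JSON Schema used throughout the article.
const candidateSchema = {
  bsonType: "object",
  required: ["name", "dob"],
  properties: {
    name: { bsonType: "string" },
    dob: { bsonType: "date" }
  }
};

const defaultsCommand = buildValidationCommand("candidates", candidateSchema);
const moderateCommand = buildValidationCommand("candidates", candidateSchema, {
  validationLevel: "moderate"
});

console.log(defaultsCommand.validationAction, defaultsCommand.validationLevel);
// error strict
console.log(moderateCommand.validationAction, moderateCommand.validationLevel);
// error moderate
```

In MongoDB Shell, you would pass the result to runCommand, for example db.runCommand(buildValidationCommand("candidates", candidateSchema)).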
Getting started with MongoDB validation rules
In the previous article, I mentioned that schema validation can be a valuable tool for enforcing constraints on how a collection’s documents are defined, but I also pointed out that you’ll likely want to limit its use to more mature applications, when you don’t require the same degree of flexibility you did when first setting up the collection. Enforcing schema validation on a collection prematurely can create more overhead. You should use schema validation judiciously until you fully understand how it will affect your applications.
With that in mind, you should now have a good foundation for working with schema validation in a MongoDB collection, at least enough to get started. There is certainly more that you can do with schema validation than what I’ve covered here, so I recommend that you review other resources, particularly the MongoDB documentation. I suggest that you start with the topic Schema Validation, which introduces you to validation rules in MongoDB.