Architecting Database Dev and Test Environments: Best Practices and Anti-Patterns for SQL Server
Many organizations use out-of-date architecture patterns for developing relational databases which are a pain to manage, slow developers down, and limit testing and innovation. It’s time to modernize these environments and improve standards for database development.
In the years I worked as a consultant specializing in SQL Server administration and tuning, I frequently fielded questions from customers about how to best manage databases in development and test environments. Now that I talk to folks about database DevOps, I have these conversations even more often.
These questions come up often because it’s common—and very easy– for database environments become a management nightmare for development and testing purposes.
This pain is often a result of the requirements which are set by stakeholders. While all of these requirements individually sound sensible, they add up to trouble:
- Database Administrators believe that databases in development and test environments should be as production-like as possible for performance testing
- Senior managers in IT and Software Engineering require development and test environments to be resilient so that work is not blocked by the failure of a single component
- Leaders in IT require the use of platforms for cost-savings techniques—commonly this is still virtualization when it comes to databases– wherever possible in development in test
When combined, these requirements lead to database environments which are more complex than production. These environments are tricky to manage and often perform poorly.
It is common for datasets used for development and test purposes in these environments to fall out of date, and it’s also common for processes to refresh the database to take a significant amount of time, requiring IT staff to work outside of hours and/or stalling development work while the data refresh is in progress.
SQL Server Availability Groups are often implemented due to requirements 1 and 2
Availability Groups are tricky to manage and troubleshoot even in a production environment. Adding these into a development or test environment typically backfires.
Generally, it is acceptable to have slower storage and less bandwidth in these environments. This additional latency easily causes failures in Availability Groups and adds more problems into these environments than it solves.
Things get even worse when requirement 3, the need to optimize for cost-savings, enters into the picture. The combination of Availability Groups with virtualization technology increases the complexity of the environment significantly.
Shared development databases are often implemented due to multiple reasons arising from these factors
Large production-like datasets and the complexity of Availability Groups, Clusters, and / or virtualization require specialist DBA resources to set up these environments manually.
These shared environments are often used by multiple teams who are each working on their own stream of work, which makes it difficult to refresh the datasets in these environments. (Shared development add a significant amount of complexity for developers in the SDLC – more on this in a bit.)
The traditional “wall of confusion” between IT teams and developers tends to grow high in these traditional development environments
With a complex environment that is difficult to manage used by multiple developers, the IT team typically locks permissions down on these environments so that developers can’t accidentally do any damage.
This unfortunately puts the developers into a situation where they have very minimal visibility into how the environment is set up and how the database is configured. They have a limited ability to troubleshoot, learn about the database platform, and innovate.
Instead of the “production-like” environment increasing understanding of how production is configured, this effectively makes database and instance level configuration in dev and test environments invisible to developers and fully “someone else’s job”.
Sidebar: policy restrictions on development teams factor into database choice
Often the strict permission limitations and restrictions I mention above are only implemented for traditional relational databases such as SQL Server, Oracle, and MySQL.
Developers often gain much more freedom and the ability to innovate simply by using a non-relational database, as old company policies aren’t in place for those. I have seen this drive technology choices in more than one project. I don’t blame the developers for this: developers need to understand their platforms and to be able to experiment with them.
The problem here is outdated policies and architecture for relational databases.
People commonly pretend that transactional replication doesn’t exist in production
While it’s pretty common for Availability Groups, clusters, and virtualization to combine to create super complex dev and test environments, there is one place where teams often draw the line: SQL Server replication.
Transactional replication is generally not included in dev and test environments. This is a natural result of what I’ve outlined above:
- When you’ve got complexity due to clusters/Availability Groups/virtualization, the complexity of implementing transactional replication is higher.
- Transactional replication does make the dataset restore process more difficult in dev and test environments, as a specific restore sequence is needed for all the databases involved. When data masking is required for PII implementation, this becomes even harder.
- As noted above, developers are often given quite minimal permissions in these environments and are put into the role of “consumers” of the database – making the specialist DBA team the only ones who understand or who can work with replication.
The gotcha here is that transactional replication is much closer to the software development process than something like Availability Groups. You may customize the schema of replicated tables, for example, and this is something you can benefit from capturing in source control.
My position is that that while it is an anti-pattern to mimic the high-availability features of production in development and test environments, it is a best practice to carefully consider whether transactional replication should be present in those environments in some way.
Implementing automation to create and manage transactional replication in development and test environments can be time consuming, so the key question here is whether having replication in dev and test environments will result in significant long-term efficiency and improved quality in the deployment of database changes for the database in question.
The pains of poor dev and test database environment architecture
You might be wondering – if development and test databases are unwieldly to manage, what’s the big deal? It can’t be that much of a pain, can it?
This has pretty significant impacts and limits what you can deliver to your customers. There are a lot of these pains.
Here are the top problems I see with unwieldy dev and test environments with shared development databases. (I’m far from the first person to see these problems — Troy Hunt wrote an excellent article in 2011 giving his take on these problems.)
Out of date datasets
Unless you specifically design your dev and test environments to support database creation on the fly, refreshing datasets will typically be a struggle. The more complex the database environment and the more users of a single database, the trickier it is to arrange a time when the database may be refreshed without losing valuable work.
This leads to out-of-date datasets which have minimal or no data at all for tables and columns which were added since the last data refresh. It’s hard for developers and testers to trust that these environments validate their changes properly. This adds significant risk to the deployment process.
Having your changes overwritten
Shared development databases are full of pitfalls. It’s incredibly easy for team members to accidentally overwrite each other’s changes.
Introducing features such as ‘locking’ of objects may mitigate some risks of this situation, but then only allows one person to work with an object as a time – this slows down the team and may put pressure on people to not do thorough testing or validation of their changes so that someone else can work with the object.
Accidentally reviewing or checking in other people’s code
Another common error with shared development databases is that it’s all too easy to pick up changes which someone else has made to the database and accidentally commit them to version control along with one’s own changes. Were those changes meant to be permanent?
Inability to experiment freely when developing
When working on changes, a developer needs to be free to validate ideas. Often, validating ideas requires making changes to database schema. When working in a shared database, changing schema and/or data can easily interfere with others who are doing work—confusing people, causing errors, or slowing others down.
The overhead of communication here is significant. Shared development databases discourage developers from experimenting freely.
Conversely, dedicated/individual development database environments which are easy to reset encourage developers to experiment freely and empower them to validate ideas locally without impacting the productivity of others.
Constantly clashing with distributed version control systems
A major benefit of distributed version control systems such as Git is the ability for developers to develop new features in branches.
Using a shared development database with feature branches creates an inherent conflict: the database cannot contain multiple states of the same objects. This leaves the team with difficult choices about how to collaborate in branches when the database contains a state reflecting parts of multiple branches at the same time.
There is no perfect answer to these choices, which adds risk that changes will be deployed further down the pipeline before they are ready.
Difficulty implementing automation
In modern software development processes, such as Pull Request workflows, it’s incredibly useful to be able to spawn or instantiate a fresh database environment for early validation of changes. This environment is much more useful when it contains a dataset.
Long-lived, highly complex development database environments don’t help implement this automation in any way.
If you instead look to implement your entire development approach with the goal to be able to recreate databases on the fly, adding automation into your CI/CD deployment pipelines becomes incredibly easy.
Limited ability to test when deployment order changes
When multiple teams are developing in a single environment, it’s common for the expected deployment order to frequently change. A bug may be found in a feature, or a team may simply change their priorities. Using shared, long-lived environments for testing creates significant complexity when deployment order changes. Typically with these environments, testing must be done against a database which contains additional changes which will not have gone to production at the time of deployment. This increases the chances for error.
Inability to troubleshoot properly
As I mentioned above, long-lived shared development environments tend to be “precious” and hard to replace, and this leads to low permissions for developers.
Low permissions locks developers out of activities such as tracing against the SQL Server instance. Traces are a simple and useful way to clearly understand what is occurring in a situation. A lack of permissions leads to wasted time, and sometimes causes misdiagnosis of an issue.
Long term patching and management pains
Nobody enjoys patching development and test environments. Long-lived development and test environments require significantly more maintenance than environments designed for “on the fly” database creation.
Let’s have another look at those requirements
We can’t just throw away the requirements we have for our environment.
But we can talk to our stakeholders about these requirements, and we can propose revisions.
My goal here is to create more flexible, modern development and test environments, but to still meet the essential needs of the stakeholder’s requirement.
Stakeholder | Initial Requirement | Reality | Revised / Essential Requirement |
Database Administrators | Databases in dev and test environments should be as production-like as possible for performance testing | Factors such as slow disk, low memory, and slower CPUs often mean dev and test environments are unsuitable for performance testing | At least one environment should be as production-like as possible to support performance testing. Staging is often suitable for this. |
Senior Managers | Dev and test environments should be resilient, so work is not blocked by the failure of a single component | The methods to achieve resiliency in production become cumbersome and unmanageable in dev and test, where databases need to frequently be refreshed | Dev and test environments should be resilient — but the best form of resilience is the ability to quickly reset and/or spin up new databases in the case of a failure or error |
IT Leadership | Dev and test environments should use platforms for cost-savings techniques | Most cost-savings techniques do not combine well with separate high-availability techniques used in production environments | Dev and test environments should use platforms for cost-savings techniques inasmuch as this can be done without impeding manageability |
Keeping these revised requirements in mind and having the goal of designing flexible environments which promote quality testing and enable automation with low maintenance, we can draw out some best practices for development and test environments.
Best practices for dev and test environment architecture
1. Critical databases should have at least one staging environment
This environment supports performance testing and any other testing which requires a “production-like” implementation including production’s high availability technology.
- Keep staging as production-like as possible
- Automate refresh/reset of staging environment as much as possible
- Monitor staging environment fully, use staging to validate monitoring and configuration changes just like code changes
- Use production datasets in staging
- Minimal permissions with automatic permission escalation for developers in staging
2. Development and test environments should be ephemeral
Development and test environments should be able to be quickly generated on demand, both manually and in an automation scenario such as Pull Requests.
- Dev and test environments should mimic production datasets (to an extent—see below), but NOT production availability techniques
- In the case when deployment order of changes is unknown or is potentially unpredictable, test environments should be able to be re-generated so that changes can be easily and quickly re-applied in the expected order following the re-shuffle. (See “Reordering Deployments in Database DevOps” for more detail on this.)
3. “Safe” datasets should be available for dev and test environments
No matter how you architect your dev and test environments, if they contain Personally Identifying Information or other sensitive data, you are putting your business at risk. Development and Test environments are an attractive target for a data breach.
However, you implement a modern architecture for your dev and test databases, it is important that the source of the data you use – be it the source of a snapshot, a database “image”, or something else – contains data which has been sanitized in some way to limit the risks of a data breach.
4. Developers should have full control/ high permissions over dev and test environments
This one is obvious for some organizations, but very scary for others. Developers need to have high permissions in order to experiment.
As long as developers are provided with safe datasets and a technology which allows them to easily recreate environments, most of the objections to high permissions for developers go away. If it is easy to fix any problems which accidentally occur, what is wrong with high permissions?
Development and test environments need to be a place where it’s OK – and even welcome for failures to occur, because we have the ability to quickly recover, learn from the error, and move on and do better.
What about User Acceptance Testing (UAT) environments?
User Acceptance Testing environments are common. UAT environments allow customers to review new features prior to deployment to production.
UAT environments are quite different from Staging environments. The goal is often to allow a simple interaction with a new feature and NOT to present a production-like experience.
In many organizations, UAT environments are not used constantly. In these organizations, customer approval is only required for significant changes which cannot be shipped to the production environment behind a feature flag.
The ability to generate UAT environments on the fly is very useful. This eliminates maintenance and support when the environment is not in use, and also allows generating multiple environments when needed if that is helpful.
What about cloud environments with PAAS databases?
Azure Platform as a Service (PAAS) databases (Azure SQL Database and Azure SQL Database Managed Instances) have some special requirements which may limit your options for development environments:
- Code for Azure SQL Database’s “external table” feature only works against Azure SQL Databases, thereby making it impossible to develop against non-Azure SQL Databases reliably
- Azure SQL Database has a “serverless” mode which can make dedicated development database environments more affordable than they would be otherwise, but Azure SQL Database Managed Instances do not have this feature
- Azure SQL Database and Managed Instances do not allow third party tools to access the file system, and do not provide backups which can be restored to non-Azure SQL Database / Managed Instances. This limits portability of databases.
These factors often push teams toward shared development databases for Azure SQL Database Managed instances, and sometimes also for Azure SQL Databases if the size of the database is such that regularly refreshing multiple environments is painful.
Customers have many more options with SQL Server PAAS options hosted by Amazon and Google, as these offerings allow access to native database backups. This portability enables customers to freedom in architecting development and test environments.
Pulling this all together
To bring the database fully into modern development practices, organizations can no longer use the development and test database architectures that were put into place twenty years ago. These architectures slow down the development and testing process, provide only limited testing, and inhibit innovation.
To increase the tempo of changes delivered, decrease lead time for changes, and ensure high quality changes, follow these principles:
- Critical databases should have at least one staging environment
- Development and test environments should be ephemeral
- “Safe” datasets should be available for dev and test environments
- Developers should have full control/ high permissions over dev and test environments