Star ratings have always been an interesting thing on the Internet. They can be a way to encourage interaction with creators, shopping locations, etc. However, they can also be a bit of trouble. Since I started as the editor of this website, I have noticed a few things that have led to us … Read more
In the previous parts of this MySQL optimization series, we told you how query optimization works on a high level, then walked you through how you should optimize queries that insert, read, or update data. Remember the laws of physics? “What goes up, must come down”? Translated into database terms, that means “the data that … Read more
In part 2 of this series, I showed an example implementation of distributing a long-running workload in parallel, in order to finish faster. In reality, though, this involves more than just restoring databases. And I have significant skew to deal with: one database that is many times larger than all the rest and has a … Read more
Over the years, Power BI has evolved into a complex and varied ecosystem of tools and solutions, which in turn demands several supporting roles: there are, of course, developers, data engineers and data scientists, but there is a need for one more, i.e. a capacity administrator. Of course some of these roles may be covered … Read more
Containerization has removed boundaries that limit developers from working on one application using different systems, thus boosting developer collaboration and speeding up the application deployment process. Containerization involves bundling and packaging applications into containers that have all the necessary dependencies and tools for compiling an application on any operating system. Containers enable the coexistence of legacy … Read more
Over the past years, “traditional” ETL development has morphed into data engineering, which has a more disciplined software engineering approach. One of the benefits of having a more code-based approach in data pipelines is that it has become easier to build metadata driven pipelines. What does this mean exactly? Say for example you need to … Read more
The presentation layer of a headless CMS is separated from the content management system itself, making it a backend-only system for managing, creating, and storing material. Content presentation (how the content is shown on websites or applications) and content creation are handled by the content management system in a standard CMS. Headless CMSes have evolved … Read more
Once, when I was on a team that was designing an important IT system for a multinational bank, the testers arranged for perfectly normal office workers from the bank to try the system out. This was long before the days of instant video. The software team watched from behind a two-way mirror. … Read more
In the previous articles in this series, I demonstrated various ways to retrieve document data from a MongoDB database, using both MongoDB Shell and MongoDB Compass. In this article, my focus shifts from retrieving data to updating data, which is an essential skill to have when working with MongoDB. Whether you access the data directly in … Read more
Before I started as the editor of Simple Talk, I worked on SQL Server. Only. (Ok, I used Redgate’s tools too). But when I started here, one of the goals was to stretch the topics farther and farther into more and more data platforms. And it is not just me in my niche job that … Read more
There are many packages and tools that you can use to facilitate your API development with Rust. Rust has a rich third-party ecosystem of crates for building APIs, including web packages like Actix and Rocket and ORMs like Diesel and SeaORM. This article delves into using Actix and Diesel to build web applications. You’ll learn … Read more
IAsyncEnumerable is a powerful interface introduced in C# 8.0 that allows you to work with sequences of data asynchronously. It is a great fit for building ETLs that asynchronously stream data to get it ready for transfer. You can think of IAsyncEnumerable as the asynchronous counterpart of IEnumerable because both interfaces allow you to easily … Read more
The first two articles in this series demonstrated how PostgreSQL is a capable tool for ELT – taking raw input and transforming it into usable data for querying and analyzing. We used sample data from the Advent of Code 2023 to demonstrate some of the ELT techniques in PostgreSQL. In the first article, we discussed … Read more
In the first part of this two-part series, I covered the mostly non-technical aspects of building a data culture. While the lion’s share of the work will be getting people to work together and embrace ever deeper use of data, for a reader of Simple-Talk, a lot of this transition will be technical. In this … Read more
Let’s start by defining a subset and why you would require one. When dealing with the development, testing and releasing of new versions of an existing production database, developers like to use their existing production data. In doing so, the development team will be hit with the difficulties of managing and accommodating the … Read more
In my previous post, I showed how to borrow a snake draft concept from fantasy football, or a packing technique from the shipping industry, to distribute different portions of a workload to run in parallel. In the previous example, we determined a distribution order for databases based on size – though you can rank by … Read more
I recently had a restore job where I needed to split the work up into multiple parallel processes (which I’ll refer to here as “threads”). I wanted to balance the work so that the duration was something significantly less than the sum of the restore times. Imagine a job that loops through and restores each … Read more
Finally, mirroring is available for Fabric! You can mirror an Azure SQL database to Fabric. It works for Cosmos DB and Snowflake as well, but in this article, I will focus on Azure SQL. Is it 100%? No, but it is definitely a feature that is really great even now. Before getting into a step-by-step of the … Read more
Rust is emerging as a frontrunner for ensuring memory safety without sacrificing performance. Its growing popularity isn’t solely based on the “fearless concurrency” mantra but also on its expanding ecosystem that fosters integration with various technologies. One domain where Rust proves to be formidable is database interaction, and a pivotal player in this realm is the … Read more
In the first article in this transforming data series, I discussed how powerful PostgreSQL can be in ingesting and transforming data for analysis. Over the last few decades, this was traditionally done with a methodology called Extract-Transform-Load (ETL) which usually requires external tools. The goal of ETL is to do the transformation work outside of … Read more