New Series - Testing LLMs as Technical Editors

I spend a lot of time editing articles from various authors on a wide variety of topics. I do my best to make sure everything that is posted to SimpleTalk.com is as correct as I can get it. However, I have spent my entire career as a specialist of sorts. I spent my time with:

SQL Server Tools and Techniques

I suppose that wasn’t really bulleted-list worthy, but it is true. I did study and work with SQL Server as a DBA, Developer, and Architect, using the relational and dimensional capabilities. Maybe that should have been the bulleted list. This is a fun article series, so I am taking a few liberties with the writing too.

The problem

Because I am not a generalist, I often lean on other people to do some of the technical edits. The problem with finding technical editors is that you really need to trust them more than actual writers.

This probably seems counterintuitive, but you don’t see a technical editor’s mistakes in the same way you see a writer’s. If a writer says that 1+1=3, it is likely a typo, especially if they have also said it was 2 in other places. You see a lot more of their work. But a tech editor might actually think 1+1=3 and just respond with “nice work”. (Naturally I am thinking less about elementary math and more about things like “shrinking your SQL Server database every day is a good thing to do” or something else completely wrong. It is definitely possible that they don’t know what they are doing.)

As an editor, you put a lot of trust into your technical editor, especially since as the editor of a site, you put your stamp of approval on the content.

The test

So, I got this idea to test out a few LLMs (ChatGPT and the Web and Office versions of Copilot, at the very least) and see how they handle a load of bad advice. I put out a question on X, asking:

“A request! Send me your most realistic, but worst, SQL Server management advice. I want to test (and write an article) about using AI to fact check writing.”

And thanks to Anders Pedersen, Koen Verbeeck, Jennifer Stirrup, Matt Oates, Marc Brooks, Michael Fisher, Rui Romano, and Wes Crokett, I got a set of advice that is all delightfully bad. I will list that advice in the test data section.

With this advice I will write a couple of articles. The body of advice will stay the same, but the header will change.

One that explicitly states that this is a set of best practice advice that you can’t get along without.
One that explicitly states that this is a set of worst practices.

This should show me how much each LLM is paying attention to the advice itself, and how it replies to the same advice framed in different ways. After I finish this process, I will consider using one of these LLMs to generate a paper of bad advice (again, the worst advice) and repeat this process.
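As a rough illustration of the setup, here is a minimal Python sketch of how two article variants can share one body and differ only in the header. The header wording, the function names, and the two sample tips are placeholders chosen for this sketch; they are not the text of the real test articles.

```python
# Hypothetical sketch: headers and helper names are placeholders,
# not the wording used in the real test articles.
BEST_HEADER = "Best practices you can't get along without"
WORST_HEADER = "Worst practices to avoid"


def build_article(header: str, tips: list[str]) -> str:
    """Return one article: a header line followed by the shared body of tips."""
    body = "\n".join(f"- {tip}" for tip in tips)
    return f"{header}\n\n{body}\n"


def build_variants(tips: list[str]) -> dict[str, str]:
    """Build both framings from the same list of tips."""
    return {
        "best": build_article(BEST_HEADER, tips),
        "worst": build_article(WORST_HEADER, tips),
    }


# Two of the collected tips, used here only as sample input.
tips = [
    "Use auto-shrink to keep your data files lean and mean",
    "Always use NOLOCK in each and every SQL statement.",
]
variants = build_variants(tips)

# Everything after the first line is identical in both variants.
best_body = variants["best"].split("\n", 1)[1]
worst_body = variants["worst"].split("\n", 1)[1]
print(best_body == worst_body)
```

Keeping the body in one place means any difference in a tool's response can only come from the framing, which is the point of the comparison.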

What I am hoping for

Once I got the advice gathered, I realized I was mildly terrified by this process. Why? Because it is possible that either the LLM will give me perfect, nuanced answers as to why these suggestions are typically incorrect, or it will say that these suggestions all seem to be valid advice.

The first means I could easily be replaced by a machine. The second means the tools are useless for technical editing. I am more concerned by the latter, because I rarely get a “this is technically incorrect” when I ask for a review of a document. I did say rarely, which is encouraging, but this test should help me decide.

The test data

In this section, I will present the tips I was given, and then I will link to a .zip file with the two versions of the article that I mentioned: one saying these are good tips and one saying they are not. Here are those tips:

Testing in production is the best way to test because production is ‘clean’.
Don’t need a backup or restore plan – we run all of the ETL pipelines again every time we lose the database (sidebar note: this happened frequently)
Autogrow files so they don’t have to think about it
Store images for your website on the same server as your database
Need SQL Server Analysis Services, the engine, SSIS and DQ? Why not run them all on the same machine so they can talk to one another better?
Use auto-shrink to keep your data files lean and mean
Give all developers SA in production so you, the DBA, don’t have to be woke up at night to fix their deployments
Always use an index on a binary column as the first segment as it halves the search space. This is especially useful for active vs. inactive flags.
Always give your server less than or equal to 64 GB of RAM when your database is 1 TB in size.
Always use the object-first ORM approach to generate your database for optimal and efficient schemas. Avoid database-specific features like table-valued functions or materialized views to prevent vendor lock-in without added benefits.
Always use NOLOCK in each and every SQL statement.
You shouldn’t change the default MAXDOP, as it has been set for you to handle all situations.

So those are the raw details, and in the text you will note that I have expounded just a little bit on each of these to make a section with a paragraph or two, giving some background on why these are “good” ideas. Writing that text definitely did not turn my stomach at all!

Here is the zip file with the two versions of the article in both .txt and .docx formats (I did not want them entering the search engines and dropping the Simple Talk authority value!)

What next

In the coming articles, I will take the two articles and test them with the different LLM tools, one article at a time, and include some of the output. This should let us know at least a little bit about what each tool can do when the answers are wrong.
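The plan amounts to a small test matrix: every tool reviews every article version, one run at a time. The sketch below only enumerates those runs and does not call any LLM. The tool labels come from the list mentioned earlier and may not match the final set, and the article labels and function name are placeholders.

```python
from itertools import product

# Labels only; assumed from the tools named earlier, and the final set may differ.
TOOLS = ["ChatGPT", "Web Copilot", "Office Copilot"]
ARTICLES = ["best-practices version", "worst-practices version"]


def plan_runs(tools: list[str], articles: list[str]) -> list[tuple[str, str]]:
    """Return every (tool, article) pair, so each run covers one article."""
    return list(product(tools, articles))


runs = plan_runs(TOOLS, ARTICLES)
for tool, article in runs:
    print(f"{tool}: review the {article}")
```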

Depending on how this goes, I will probably try a more subtle approach (try an article that is mostly right, and add a stinker in there), and then maybe see what happens when I ask it to rewrite these articles. I know it will be fun, and I hope it will help people to see just how much they can trust the mainstream LLM tools to help them make sure their writing makes sense.

About the author

Louis Davidson

