MySQL Index Overviews: FULLTEXT B-Tree Indexes

Comments 0

Share to social media

If you’ve been following Simple Talk for a while, you know that we’re fans of databases and everything they involve. We’re also big fans of database performance, too. And if you’re a developer or a DBA, you surely know one of the primary answers to database performance – indexing! We’ve already told you about indexes when we talked about multiple approaches to improve your query performance and we’ve already told you about indexing nuances during some of our previous blog posts, too.

We’ve also walked you through the default – B-Tree – type of indexes in MySQL. Now it’s time to dive into its cousin – full-text indexing!

What is Fulltext Indexing?

Full-text indexes are special – they’re a type of index that enables fast and efficient searching through data using rather exotic search modes: in MySQL, those search modes include, but are not limited to the Boolean mode, natural language mode, and a search mode using query expansion. Some of the search modes made possible by using full-text indexes provide unique support for interpreting specific characters differently, searching for implied data, and other specific nuances.

Full-text indexing works similarly across multiple database management systems and hasn’t changed much during the years – take a look at Robert Sheldon’s blog on full-text indexes from more than two decades ago and you will surely find something that’s changed in your SQL Server installation, but you will also find a lot of things that remained the same; the same can be said about other database management systems.

Full-text search capabilities are a means to an end: a means to add full-text search functionality onto their application or website or perform other complex searches through bigger data sets. They allow your search engine to act a little like Google does: it allows your database to return relevant results without you asking for them by using the expansion mode, allows you to search with fuzzy matching using the * character as the wildcard, allows you to search for exact matches by wrapping your query in double quotes, and has other unique capabilities.

Grab the data used in this example from this link, and it is included in the Appendix as well.

Fulltext Indexes in MySQL

In MySQL, full-text indexes facilitate searches of 3 types:

  1. Searching using a natural language search mode is made possible when no search mode is specified or when searches are conducted in the NATURAL LANGUAGE MODE.
  2. Full-text searches with query expansion are possible using assumed knowledge and can be defined if we define WITH QUERY EXPANSION at the end of our query. Queries using such a search mode will run twice – once running the search in a default mode (the natural language mode), and the second time by adding the most relevant results from the first search and running the search again, thus coming back with an interesting result set in the end.
  3. Boolean full-text searches can be helpful if your application needs to search through data using various Boolean characters that have a varied meaning (“~” lowers the contribution of the value to the search results, “-“ tells your database that the value mustn’t be included, searching using double quotes only matches literal values, etc. Such searches necessitate the definition IN BOOLEAN MODE.)

It’s also worth knowing that when full-text indexes are in use, certain characters are represented differently. For example, if our column doesn’t have a full-text index on top of it and we’re using a wildcard to power our searches, the wildcard needs to be represented as the “%” character while if full-text indexes are in use the wildcard becomes the “*” character and the “%” character won’t work.

These things considered, full-text indexes in MySQL can be defined on CHAR, VARCHAR, and TEXT columns, and they should be defined like so:

If you want to define a full-text index on a table once it’s already created, make use of an ordinary ALTER clause with a FULLTEXT definition:

ALTER TABLE `people` ADD FULLTEXT(first_name);

D:\Misc\Redgate\Blogs\June 2024\MySQL Index Deep Dive - FULLTEXT Indexes\Images\Defining a FT Index - ALTER.png

Image 1 – Defining a Full-text Index when Modifying a Table

Fulltext indexes can also be created using the CREATE FULLTEXT INDEX clause:

D:\Misc\Redgate\Blogs\June 2024\MySQL Index Deep Dive - FULLTEXT Indexes\Images\Defining a FT Index - ALTER and CREATE.png

Image 2 – Creating Fulltext Indexes with CREATE FULLTEXT INDEX

ALTER TABLE people ADD FULLTEXT(first_name); would add a full-text index first_name on the column first_name, while CREATE FULLTEXT INDEX `ft_idx` ON `people`(first_name); would add a full-text index ft_idx on the first_name column.

That means that if an index name is not specified, MySQL would create an index titled exactly as the column is titled. If an index name is specified, MySQL will create an index with a name that has been provided.

On a related note, also keep in mind that MySQL will warn you if you use duplicate names for your indexes but won’t prevent you from creating a different type of index on the same column.

In other words, you can have an ordinary B-tree index and a full-text index on the same column because they would serve two distinct purposes: a B-tree index on the first_name column would satisfy queries like

while a full-text index would satisfy columns like

Let’s create a couple of indexes – you will see that MySQL will error out if we have a duplicate key name (that happens because indexes can be acted on depending on their names), but won’t if we run multiple types of indexes:

D:\Misc\Redgate\Blogs\June 2024\MySQL Index Deep Dive - FULLTEXT Indexes\Images\Duplicate Indexes.png

Image 3 – Multiple Indexes on a Column

We can confirm that one column has multiple indexes by running a simple DESCRIBE query on our table (in this case, note the MULmultiple key – definition below the Key column. In our cases, that means that we’ve defined multiple indexes):

D:\Misc\Redgate\Blogs\June 2024\MySQL Index Deep Dive - FULLTEXT Indexes\Images\DESCRIBE a Table.png

Image 4 – Multiple Indexes on a MySQL Table

Search Operations Using Fulltext Indexes in MySQL

Cool, you now know how to build full-text indexes in MySQL. Time to search for data!

As I’ve already told you before, full-text indexes in MySQL have a couple of search modes, these being the natural language search mode (this is the default search mode if no search mode is specified), Boolean search mode, and a mode with query expansion. Here’s the syntax for full-text searches:

In our case, we’ve defined a full-text index on the first_name column, which means that we can only run full-text search queries on that column. With that said, our query looks like so:

Let’s try it out:

D:\Misc\Redgate\Blogs\June 2024\MySQL Index Deep Dive - FULLTEXT Indexes\Images\Ordinary Full-text Index Example.png

Image 5 – Running a Sample Query on a Fulltext Index

Specifying a search mode is not necessary unless you want to employ a specific full-text search mode. In that case, you would make use of one of three modes by defining them after the query.

Boolean Mode

The purpose of a search in the Boolean mode is to search with “fuzzy matching” – such a search mode searches for given words and can act together with operators like less than (<), more than (>), the plus (+) and minus (-) signs (they denote “more importance” or “less importance” for a result set), double quotes to search for an exact match, and have other characteristics as outlined in the documentation.

For a search in the Boolean mode, everything would look like so:

You can see this and a few other examples in the following output:

D:\Misc\Redgate\Blogs\June 2024\MySQL Index Deep Dive - FULLTEXT Indexes\Images\Example in Boolean Mode.png

Image 6 – Searches in Boolean Mode

Such a query will return results after searching for a query in the Boolean mode. Here, keep in mind that there are a couple of “gotchas”, for example, ordinary wildcard characters like “%” and “_” won’t work with full-text indexes – your search won’t error out, but you will get an empty result set:

D:\Misc\Redgate\Blogs\June 2024\MySQL Index Deep Dive - FULLTEXT Indexes\Images\Boolean Mode Wildcard Examples.png

Image 7 – Fulltext Indexes and Wildcards

Boolean search is also named Boolean because it accepts Boolean symbols as part of search values. Have a look:

D:\Misc\Redgate\Blogs\June 2024\MySQL Index Deep Dive - FULLTEXT Indexes\Images\IN BOOLEAN MODE Example.png

Image 8 – Making Use of the Boolean Mode

Here we told our database to search for records containing “Apple” and “juice.” We had a record “Apples with juice” as well as “Apple” so it’d picked those up – very interesting!

The Boolean operators available in the Boolean search mode are as follows:

  • +” means “AND”
  • -“ means “NOT”
  • “ (an empty operator) implies OR (that means if you search for “Apple juice”, your database understands the search as “Apple OR Juice” which may result in funny search results)
  • ~” lowers the contribution of the row to the result set returned by MySQL
  • *” denotes a wildcard and should be appended to the word to take effect. Words beginning with the term preceding the * operator will be returned.
  • ( )” denotes subexpressions.
  • “” means “exact match” – your database will search for a literal phrase as it was typed.

Query Expansion Mode

A fulltext query expansion mode runs the search twice relying on implied knowledge meaning that a search for the term “database” may return “MySQL”, “MariaDB”, “Percona Server”, “relational”, “NoSQL”, “MongoDB”, and the like – in essence, a query is run once, and then the second time searching for the most relevant results related to the query from the first searching attempt, and only then results are returned.

Let’s now try a search using query expansion. I’ll use cars as an example:

D:\Misc\Redgate\Blogs\June 2024\MySQL Index Deep Dive - FULLTEXT Indexes\Images\Query Expansion Examples.png

Image 9 – a Full-text Search with Query Expansion

Full-text search didn’t interpret our query as us searching for “2008” – it’d interpreted our search query as if we were searching for things made in 2008 and returned 3 results. It returned 3 results because the database thought all 3 results would be relevant – that’s the entire point of searches with query expansion. Interesting, yeah?

Choosing the Search Mode

Finally, decide on the search mode you’re going to be using because using full-text indexes without a good reason isn’t the smartest idea: they take up space, and if you don’t know what you’re doing, you may crash your server too (look at the Full-text Server Crashing example below.) Use full-text indexes if all of the following apply:

  • You need to index CHAR, VARCHAR, or TEXT columns.
  • You need the index to find keywords in the text.
  • You are after one or more of the search capabilities that full-text indexes can provide (see examples above.)

Once you’ve made a conscious decision to employ full-text indexes inside of your database, you should also decide what search mode you are going to use. The search modes are something to think about because they’re the primary reason people choose full-text indexes in the first place and if chosen improperly, they can impact the behavior of your search engine or other appliance you may be using full-text indexes for:

  • Use the NATURAL LANGUAGE mode (or don’t define a mode) if you don’t use wildcards and have no need for “implied knowledge” searching, but still want to make use of a full-text index.
  • Use the BOOLEAN mode if you intend to use Boolean characters (see examples above) or the wildcard character (*)
  • Use the QUERY EXPANSION mode if your use case necessitates searches with implied knowledge e.g. “databases” are connected with database types, etc. See the example in the Query Expansion Mode heading.

For more examples, see the heading below denoting all of the available full-text search modes and the limitations imposed on them.

One should also account for the storage engine that is being used, and I’ll explain why in a moment.

Fulltext Search Modes and Limitations

As I’ve already mentioned and given examples above, full-text indexes in MySQL have three search modes you can put to the test:

  • Search in a natural language mode will be conducted if no mode has been defined or you explicitly define the IN NATURAL LANGUAGE MODE modifier. Searches using this modifier will not be able to use the features provided by the Boolean or query expansion modes, but they will provide an index to speed up search queries by helping them quickly find keywords in the text. For an example using this search mode, see below.
  • Search in a Boolean mode will be conducted if you specify the IN BOOLEAN MODE modifier. Once the modifier is in use, certain characters will have a special meaning. No operator, or a space, implies that you want to search for one value OR another value, “+” means AND, “-“ means NOT, and “~” lowers the value’s contribution to the search results. Double quotes (“”) would search for an exact match, and you would need to use the Boolean mode if you want to use the wildcard operator (*) to search with fuzzy matching.
  • Search in a query expansion mode will be conducted if you specify the WITH QUERY EXPANSION modifier. Such a full-text search mode is useful when we rely on implied knowledge. In other words, we can use this search mode when we’re searching for a “database” but want our result set to include the flavors of databases (“MySQL”, “PostgreSQL”, “SQL Server”, etc.) instead.
D:\Misc\Redgate\Blogs\June 2024\MySQL Index Deep Dive - FULLTEXT Indexes\Images\IN NATURAL LANGUAGE MODE Example.png

Image 1 – A Search in a Natural Language Mode (Second Example)

One thing to remember is that if you want to conduct a search using a full-text index, you must use the MATCH() AGAINST()syntax: if you use an ordinary “=” sign, your database won’t use the full-text index you may’ve defined (but may make good use of other types of indexes you may define instead.)

As previously mentioned, full-text indexes are frequently invited to our database once we’re after fuzzy matches for our data – in other words, if we’re searching for something exotic. A wildcard using a full-text search mode becomes the star sign (*), and searches using it must use the Boolean mode:

A query like this is likely to find multiple results ranging from “apple”, “Jimmy has two apples”, “my grandma likes applesauce”, “I like this applet”, or “I’ve downloaded this applet and it sucks!”, so you must be aware of such occurrences to not be disappointed (it’s also worth noting that “” would denote an exact match while ‘’ would denote a match):

D:\Misc\Redgate\Blogs\June 2024\MySQL Index Deep Dive - FULLTEXT Indexes\Images\Fulltext Things in Boolean Mode.png

Image 10 – Fulltext Search in Boolean Mode

Don’t forget the Boolean mode – without it, your query will still execute, albeit with a different result set because the wildcard won’t be acted on as it’s only available within the Boolean search mode:

D:\Misc\Redgate\Blogs\June 2024\MySQL Index Deep Dive - FULLTEXT Indexes\Images\Wildcard Example without Boolean Mode.png

Image 11 – Fulltext Search without Boolean Mode

Queries using the full-text search mode also have to be at least of 4 characters in length and:

  • Not consist of stopwords (stopwords are words that have a meaning for the database or for your use case if you define them yourself.) That means that your search may not be searching for any full-text stopwords from the InnoDB stopword list if you’re running InnoDB, or from the MyISAM stopword list if you’re running MyISAM (find a list of stopwords for InnoDB here, and a list of stopwords for MyISAM here.) This point has caveats – see below*
  • Not run on partitioned tables.

*MyISAM stopword list can be defined by setting the ft_stopword_file variable to the path name of the file that contains stopwords you need to be in place. To define a stopword list for the default storage engine – InnoDB – define a table with the structure of the INNODB_FT_DEFAULT_STOPWORD table (make sure the structure is exactly the same), then set the innodb_ft_server_stopword_table variable to the name of your created table.

Also, keep in mind that the parser for queries using the FULLTEXT index in MySQL determines the start and end of words by looking at characters like the space (“ “), comma (“,”), and the dot (“.”) That means that queries searching for anything including a space, a comma, or a dot may return unexpected results. See example below:

D:\Misc\Redgate\Blogs\June 2024\MySQL Index Deep Dive - FULLTEXT Indexes\Images\Apple Juice Matches.png

Image 12 – I thought we’re searching for Apple Juice?

Full-text search operations should adhere to the limitations discussed above, as well as not be searching through Chinese or Japanese data (such data will make the fulltext index unable to determine where words begin/end), as well as provide AGAINST() with values instead of columns.

See all restrictions imposed towards full-text search operations here.

Bugs in Full-text Indexes – BUG#104263

Older versions of MySQL (we’re talking <= MySQL 5.7 which is now deprecated and shouldn’t be used anymore) also had a full-text-related trick up their sleeve – they’ve been prone to server crashes if all three of the following were true:

  1. There were more than 100 million rows in a table and one or more of our columns had a full-text index on it.
  2. There was an “@” sign in a specific column (think email addresses and everything in that realm.)
  3. You were searching for data containing the “@” sign using a full-text index with a Boolean search mode.

If all of these three factors were true and we would execute an SQL query like so (here we search for an exact match of demo@demonstration.com – the query could be anything with an “@” in it):

Our SQL query would time out and we would have to terminate it before proceeding to run any other SQL queries in the database. Why? I assume because searching for an exact match (look closely – we’ve wrapped our value in quotes meaning that we’re searching for exactly this value and nothing else) and using an @ sign on bigger data sets causes MySQL to go bonkers. I’ve never got to the bottom of this – but hey, now you have something to show during your next conference talk! (Also, if you’re looking for a conference, visit our conference PowerShell and SQL on the River in August.)

This is a bug within MySQL – BUG#104263 to be exact. The bug exists only within MySQL 5.7 (if you find that it works on other versions, please let me know so I can test this out once again) and only works on bigger data sets and exact matches searching for anything containing the @ sign in the Boolean search mode.

Cool – now you’re also aware of a bug within MySQL!

Summary

Fulltext indexes in MySQL are a force to be reckoned with – they’re an extremely powerful friend you can call on whenever the need to implement various search modes (natural language search mode, query expansion search mode, or the Boolean search mode) occurs. If you want to implement a tad of Google-like search features into your search engine, look into full-text indexes.

However, they must be accounted for and used properly – your database won’t “protect” you from adding a full-text index on top of another index on the same column, and full-text indexes are only necessary to complete search operations of a specific nature: they facilitate complex text searches.

We hope that this blog was informational and useful for you, your colleagues, or the team you manage. If you elect to use full-text indexes in MySQL, do know both their benefits and disadvantages and choose wisely: if you want to deeper your knowledge of the database space, consider watching a video or two on the Database Dive YouTube channel and also, come back to the Redgate blog every now and then to hear about updates from the database space, and until next time.

Appendix – Table Structure & Data

In case you want to create the tables and indexes in this blog, all tables, table structures, and demo data are available here.

Load comments

About the author

Lukas Vileikis

See Profile

Lukas Vileikis is an ethical hacker and a frequent conference speaker. Since 2014, Lukas has found and responsibly disclosed security flaws in some of the most visited websites in Lithuania. He runs one of the biggest & fastest data breach search engines in the world - BreachDirectory.com, frequently speaks at conferences and blogs in multiple places including his blog over at lukasvileikis.com.