What are database indexes?

How SQL queries work

SQL queries are used to retrieve data from a database. We use the SELECT statement to retrieve data and usually a WHERE clause to filter the data.
Let’s see how SELECT statements work.

First let’s define our table. We have a posts table defined by the below schema:

    user_id INTEGER,
    title TEXT,
    body TEXT,
    FOREIGN KEY (user_id) REFERENCES users(id)

Let’s say we want to retrieve all posts. We can do this by using the SELECT statement.

SELECT * FROM posts;

This will return all the posts in the posts table.

Let’s take this one level up. We want to filter all posts where the title contains the word “Java” (since its my favorite language).

SELECT * FROM posts WHERE title = "Java";

This will return all the posts where the title is “Java”. The WHERE clause is used to filter the data.

All is pretty convenient till now. Queries can be much complex but this should be enough to understand what is happening behind the scenes.

Let’s examine how our query works. What does the system do in the background?

Let’s say this table is part of a huge database. Since it goes through the entire database, it will take a long time to run.
This is where the concept of indexing comes in.

What if the rows were sorted alphabetically by the title column?
Would it be faster to query the database?

We can achieve this by creating an index on the title column.

Let’s see how we can create an index on the title column.

CREATE INDEX posts_title_index ON posts (title);

This will create an index on the title column.
The index is basically a lookup table. It has two parts:

  1. A key – It is the value of the column we are indexing on – in this case title.
  2. A value – It is the pointer to the row that contains the entire post

Note: It is also possible to index multiple columns together. For example, if we want to find all the posts that contain the word “Java” and the word “Python”, we can index the title column and the body column together.

CREATE INDEX posts_title_body_index ON posts (title, body);

Here the index will have two keys. The first key is the title column and the second key is the body column. When both the keys match, the index will point to the row.

So if we run the query again, it will look up the key in the index and return the row id. The row id can then be used to return the entire row.

How did this make the lookup faster?

In our case, the query optimizer will find the index on the title column. It will use the index to find the rows. Now the speed depends on how the index is implemented.
For example,

There are many more ways to implement indexes and each one has its own advantages and disadvantages. To read further refer to – https://en.wikipedia.org/wiki/Index_(database)

Then why don’t we create an index for each column?

While it is possible to create an index for each column, it is not necessary. It has a performance cost. Wait, why?

We saw that the index makes querying faster but what about writing to the database?

Automatic Indexing

For optimization purposes, the database engine will automatically create indexes for some columns.

  1. The primary key column – This is the column that is used to uniquely identify a row. It is automatically indexed.
  2. The columns with unique constraints – These are automatically indexed. There can be many columns with unique constraints.
    Disclaimer: Each specific database implementation may have different rules for indexing although most of them follow the above two rules.

How to decide which columns to index?

There are many factors that can influence the speed of indexing and it is important to choose the columns that are most likely to be queried.

Decision criteria:

  1. The column should be indexed if it is likely to be queried often – For example, finding all the posts by a user is a common query.
  2. Multiple columns should be indexed only if they are likely to be queried together – For example, finding all the posts that contain the word “Java” and the word “Python” is not a common query.
  3. Don’t index columns that are changed frequently – This creates a lot of unnecessary index updates.
  4. Columns used to join tables are a good candidate for indexing. It is likely that they will be used in queries.

As you can see, the decision criteria are not always obvious but the general idea is to choose the columns that are most likely to be queried. Keeping this in mind, we should decide our querying strategy and indexing strategy together. Keep as few columns in queries as possible and look at the possibility of those columns having their own index.

Thanks for reading. This should give you an idea about indexes in SQL databases and how they can be an important part of your SQL performance. f you want to connect with me, you can find me on Twitter @abh1navv

Source: DEV Community

December 2, 2021
Category : News
Tags: beginners | database | programming | SQL

Leave a Reply

Your email address will not be published. Required fields are marked *

Sitemap | Terms | Privacy | Cookies | Advertising

Senior Software Developer

Creator of @LzoMedia I am a backend software developer based in London who likes beautiful code and has an adherence to standards & love's open-source.