3. April 2024 By Yannik Rust
Database indices: The key to performance optimisation
Databases are the backbone of modern applications, but as the amount of data increases, challenges arise, especially in terms of query speed. Database indexes, in particular the B-tree index, are the focus of this review to understand how they help speed up database queries.
Database indexes play a crucial role in the world of Oracle databases. As data volumes increase, database users often struggle with slow query speeds. To solve this problem, tables are indexed. However, it is not enough to simply create new indexes to speed up queries. To utilise the full potential of indexes, a deep understanding of how they work is required. In this article, we take a closer look at how indexes work using the example of Oracle databases.
Target for database queries
When a SELECT query is applied to one or more tables, one of the main objectives is to minimise query time. The data is stored in blocks, known as DB pages, on the hard drive. Reading and writing from the hard drive is a time-consuming process. Data that is queried frequently is stored in the limited buffer cache. The more frequently data has to be read from the hard disc, the longer a query takes.
If data is restricted in the SELECT statement by filter criteria (WHERE condition) and no indices are available, all data records in this table must be read from the hard drive and checked against the given filter criterion.
Such a so-called full table scan may not be a problem for small tables. With thousands or millions of data records, however, this procedure leads to longer waiting times. Especially if several tables are linked together, as is the case with normalised databases in the Star schema.
Indices are used to reduce waiting times and minimise the number of read operations on the hard disk.
The B-tree index
One of the most frequently used index types is the B-tree index. This is a hierarchical tree structure based on a specific set of rules to enable faster and more memory-efficient searching, inserting and deleting of data. The name "B-Tree" refers to the balanced structure of the tree ("balenced tree"). This means that all leaf nodes are on the same level and the degree of branching is kept relatively low to ensure quick access to the leaf nodes. The B-tree index stores data in sorted order, which makes it very efficient for range queries (WHERE X BETWEEN Y AND Z) and equality checks (WHERE X = Y). B-Tree is the standard index type for various database systems such as Oracle and MySQL. The following example shows a simple B-tree index based on a list of numbers.
Structure of a B-tree index
In this example, we have a sorted list of numerical values [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28], which represent the index. The tree is balanced because all the lowest leaves, shown here in green, are on the same level (3). The leaves, also known as leaf nodes, contain the actual data or references to data records in the database table. In the Oracle example, these leaves contain the ROWID, which enables the desired data record to be quickly identified on the hard drive.
Above the sheets are the nodes (branches), shown here in yellow. In this example, each node can only hold three values and therefore only have three child nodes/leaves. The nodes always represent only one area for the nodes or leaves stored in them. The top node is called the root, is shown here in blue and represents the start of the B-tree index.
To illustrate how this works, we will now look at a single block of numbers. For example, the second entry in the root (18-28) refers to all values between the numbers 18 and 28. As we need to store a total of 6 numbers and the maximum storage capacity has been set to three values per sheet, a new node must be created. This new node again contains two entries (18-23 | 24-28), which in turn refer to a range of numbers. In this example, only two of the three positions are occupied. The individual values are now saved on the lowest level (level 3). The limit of three values per node/leaf defined for the entire tree also applies here.
In the example, one position is still free at the root (blue) and at the second node (yellow). This means that the maximum number of values to be saved has not yet been reached. If the values 32 and 36 are now added, a new sheet is created on the right-hand side. This leaf creates a new entry in the node, which now has three entries. This tree, which is limited to three entries per node, can therefore store 27 entries on the third level (33). If the number of values to be saved exceeds 27, a new level must be created. The tree could now store up to 81 values on level 4 (34).
In an Oracle database, the size of the nodes/blocks can be defined when the database is created. Let's take a block size of 8 KB as an example. If data such as strings, timestamps or integers are stored with a size of 8 bytes, a block can store more than 400 entries. As can be seen in the table, this type of tree can already store up to 64 million entries at level 3 and up to 25 billion entries at level 4. In practice, B-trees are therefore usually no deeper than 4 levels.
Change in the B-Tree Index
A key aspect of the B-Tree Index is its ability to self-balance. When the underlying data changes, whether by adding new records or deleting existing records, the tree is automatically restructured to maintain its balance. This means that the leaf nodes of the B-tree always remain at the same level. This automatic balancing ensures that the search performance remains at a consistently high level even when the data changes dynamically. However, it should be noted that regular adjustments to the index can lead to a slowdown when inserting (INSERT), updating (UPDATE) and deleting (DELETE) data compared to tables without an index.
To illustrate this, let's look again at the previously indexed numerical values from 0 to 28 and add the values 13, 15 and 17. By inserting these values, the tree must be restructured up to the root. This means that the number ranges must be adjusted at each level and a new leaf must be added. The new structure with the changed values in bold is shown in the following illustration.
B-tree indices in Oracle
Despite its potential difficulties, the B-tree index remains an indispensable tool for optimising database queries and increasing performance. To create a B-tree index in Oracle, the following code must be executed: CREATE INDEX index_name ON table_name(attribute_name);
It is also possible to create indices using several attributes of a table. However, it should be noted that the order of the specified attributes is important. Performance optimisation can only be achieved through the defined index if the specified attributes are also used in the filter operations (WHERE) of the SELECT statement. If this is not the case and only some of the attributes are filtered in the WHERE statement, the query optimiser performs a full table scan or possibly uses other, smaller indices. This can slow down the query and prevent the desired improvement in performance. To create an index from multiple attributes in Oracle, the following code is used: CREATE INDEX index_name ON table_name(attribute_name_1, attribute_name_2);
Best Practice
When using B-tree indexes, there are various aspects to consider in order to avoid undesirable reductions in query speed. The most important rules and best practices for the use of B-tree indices are summarised below:
- Primary Key and Foreign Key: To improve the performance of queries with one or more joins over tables, it is advisable to create indexes for primary and foreign keys. This means that joins can be executed more quickly and a full table scan does not have to be performed for large tables.
- High cardinality: The attributes selected for the index should have a high cardinality, i.e. contain many unique values. For example, the date of birth attribute has a high cardinality and the gender attribute has a very low cardinality. If you create a B-tree index with low cardinality, the Query Optimiser will most likely not use this index. Other index types, such as the bitmap index, are more suitable here.
- Correct operators: In order to be able to use an index in a query, the correct comparison operators must be used in the WHERE condition. These are, in particular, equality operators (greater than, less than, equal to) and range operators (BETWEEN). Negating operators such as NOT or "!=" prevent the use of an index.
- Small query sets: An index is only used by the Query Optimiser if the result represents a small subset of the total data. It is ideal if the selected data makes up less than 1% of the available data volume. Between one per cent and ten per cent, the index can still be used; for more than ten per cent, a complete table scan may be faster.
- Use case: To create optimal indices, it is helpful to know the expected SQL queries and adapt the indices accordingly.
- Multiple attributes: For complex queries and joins, it can be useful to create an index from multiple attributes. Attributes with high cardinality should be placed first in order to enable faster traversal of the index tree.
- Number of indices: You should avoid creating too many indices, as this consumes memory and can slow down SQL operations such as INSERT, UPDATE and DELETE.
- Query plan: The query plan can be used to check whether an index is used in SELECT queries. This shows whether a complete table scan or an index is used for the attributes to be filtered.
Conclusion
To summarise, the reasons for using B-tree indices in database systems can be derived from the points discussed above:
- Efficiency: B-Trees provide an efficient search operation with a worst-case logarithmic complexity of O(log n), where n is the number of entries in the tree. This means that the search time grows logarithmically with the number of entries, which makes B-Tree very efficient for large data sets.
- Balancing: B-Trees are self-balancing, meaning that when data is added or removed, the tree is automatically restructured to maintain a balance between the left and right subtrees. This ensures that search operations take approximately the same time for all nodes in the tree and improves the overall performance of the database.
- Range queries: B-Tree supports range queries efficiently as the data is stored in consecutive blocks on the hard drive. This enables fast access to data within a range of values in a column.
- Hard disk space: B-Trees are optimised for the special requirements of hard disk space, which is crucial for databases. B-Tree indexes minimise the number of disk searches required to find a record and improve overall system performance.
Would you like to find out more about exciting topics from the adesso world? Then take a look at our previous blog posts.