Database indexes and their Big-O notation

Question

I'm trying to understand the performance of database indexes in terms of Big-O notation. Without knowing much about it, I would guess that:

Querying on a primary key or unique index will give you an O(1) lookup time.
Querying on a non-unique index will also give an O(1) time, albeit maybe the '1' is slower than for the unique index (?)
Querying on a column without an index will give an O(N) lookup time (full table scan).

Is this generally correct? Will querying on a primary key ever give worse performance than O(1)? My specific concern is for SQLite, but I'd be interested in knowing to what extent this varies between different databases too.

Nicholas Carey · Accepted Answer · 2011-01-14 18:50:53Z

Most relational databases structure indices as B-trees.

If a table has a clustering index, the data pages are stored as the leaf nodes of the B-tree. Essentially, the clustering index becomes the table.

For tables without a clustering index, the data pages of the table are stored in a heap. Any non-clustered indices are B-trees where the leaf node of the B-tree identifies a particular page in the heap.

The worst-case height of a B-tree is O(log n), and since the cost of a search depends on the height, B-tree lookups run in something like

O(log_t n)

where t is the minimization factor (also called the minimum degree): each non-root node must have at least t - 1 keys and at most 2t - 1 keys (i.e., at most 2t children).
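To put numbers on O(log_t n), here is a minimal Python sketch, assuming the textbook B-tree definition (every non-root node holds at least t - 1 keys, the root at least one). Under that rule a tree of height h must hold at least 2t^h - 1 keys, so the worst-case height is the largest h with 2t^h - 1 <= n. The function name btree_max_height and the fanout values are illustrative choices, not anything taken from a real database engine.

```python
def btree_max_height(n: int, t: int) -> int:
    """Worst-case height of a B-tree with minimum degree t holding n >= 1 keys.

    A tree of height h holds at least 2 * t**h - 1 keys, so the height is the
    largest h satisfying 2 * t**h - 1 <= n (root at height 0). A lookup reads
    at most h + 1 nodes.
    """
    if n < 1 or t < 2:
        raise ValueError("need n >= 1 and t >= 2")
    h = 0
    # Grow h while a tree one level taller could still be filled by only n keys.
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


for n in (10**3, 10**6, 10**9):
    print(n, btree_max_height(n, t=100), btree_max_height(n, t=2))
```

With t = 100 (an arbitrary fanout; in practice it depends on page size and key size), a billion keys still give a worst-case height of 4, i.e. at most 5 node reads per lookup. That large base is why B-tree lookups feel close to constant even though they are logarithmic.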

That's the way I understand it.

And different database systems, of course, may well use different data structures under the hood.

And if the query does not use an index, of course, then the search is an iteration over the heap or B-tree containing the data pages.

Searches are a little cheaper if the index used can satisfy the query on its own; otherwise, a lookaside to fetch the corresponding data page is required.
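In SQLite this difference shows up in EXPLAIN QUERY PLAN. Below is a minimal sketch using Python's built-in sqlite3 module; the table people and the index idx_people_email are made up for illustration. SQLite reports USING COVERING INDEX when the index alone answers the query, and plain USING INDEX when it must also go to the table for the remaining columns. Note that an ordinary SQLite table is itself a B-tree keyed by rowid, so the "lookaside" there is a second B-tree search rather than a heap page fetch. The exact plan wording varies slightly between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# In SQLite an INTEGER PRIMARY KEY column is the rowid, the key of the table's own B-tree.
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_people_email ON people(email)")


def plan(sql):
    """Return the human-readable detail column of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Every index entry also stores the rowid, so id and email come from the index alone.
covering = plan("SELECT id, email FROM people WHERE email = 'a@example.com'")
# name is not in the index, so each match needs a second B-tree search on the table.
non_covering = plan("SELECT name FROM people WHERE email = 'a@example.com'")

print(covering)
print(non_covering)
```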

Mark Wilkins · Accepted Answer · 2011-01-14 18:33:53Z

11

The indexed queries (unique or not) are more typically O(log n). Very simplistically, you can think of it as being similar to a binary search in a sorted array. More accurately, it depends on the index type. But a b-tree search, for example, is still O(log n).

If there is no index, then, yes, it is O(N).
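The binary-search analogy can be made concrete by counting probes. This is a minimal Python sketch, assuming a sorted in-memory list stands in for the index and a front-to-back pass stands in for the table scan; the function names are illustrative. A real B-tree probes even fewer nodes, because each node holds many keys rather than one.

```python
def binary_search_probes(sorted_keys, target):
    """Return (found, probes) for a classic binary search over a sorted list."""
    lo, hi = 0, len(sorted_keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return True, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, probes


def linear_scan_probes(keys, target):
    """Return (found, probes) for a full scan, as with an unindexed column."""
    probes = 0
    for k in keys:
        probes += 1
        if k == target:
            return True, probes
    return False, probes


keys = list(range(1_000_000))
print(binary_search_probes(keys, 999_999))
print(linear_scan_probes(keys, 999_999))
```

For a million keys the binary search never needs more than 20 probes (floor(log2 n) + 1), while the scan for the last key touches all million.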


gbn · Accepted Answer · 2011-01-14 18:54:47Z

6

If you SELECT only the columns you search on, then:

Primary or Unique will be O(log n): it's a b-tree search
non-unique index is also O(log n) + a bit: it's a b-tree search
no index = O(N)

If you require information from another "source" (index intersection, bookmark/key lookup, etc.) because the index is non-covering, then you could have O(n + log n) or O(log n + log n + log n) because of multiple index hits plus intermediate sorting.

If statistics show that you require a high percentage of rows (e.g., the index is not very selective), then the index may be ignored in favor of a scan, which is O(n).
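That trade-off can be sketched with a deliberately crude cost model in Python. The cost units below are hypothetical, not SQLite's or any other optimizer's real formula: the index plan pays one O(log n) seek plus one O(log n) key lookup per matching row, while the scan pays one unit per row. The function name choose_plan is made up for illustration.

```python
import math


def choose_plan(n_rows: int, matching_rows: int) -> str:
    """Toy planner: compare an index seek plus per-row lookups with a full scan.

    Hypothetical cost units:
      index plan: one O(log n) seek, then one O(log n) key lookup per matching row
      scan plan:  touch every row once, O(n)
    """
    log_n = math.log2(max(n_rows, 2))
    index_cost = log_n + matching_rows * log_n
    scan_cost = n_rows
    return "index" if index_cost < scan_cost else "scan"


n = 1_000_000
for m in (10, 10_000, 100_000, 500_000):
    print(f"{m:>7} of {n} rows -> {choose_plan(n, m)}")
```

With these made-up costs the switch to a scan happens once roughly 5% of a million-row table matches. Real planners also weigh things like sequential versus random I/O and caching, so the actual threshold differs.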


StaxMan · Accepted Answer · 2011-01-14 18:55:15Z

5

Other answers give a good starting point, but I would just add that to get O(1), the primary index itself would need to be hash-based (which is typically not the default choice); more commonly it is logarithmic (a B-tree).

You are correct that secondary indexes typically have the same complexity but worse actual performance, because the index and data are not clustered, so the constant factor (number of disk seeks) is bigger.


1 Comment

Alex78191 Over a year ago

How do you create a hash-based index?

dan04 · Accepted Answer · 2011-01-15 17:45:22Z

3

It depends on what your query is.

A condition of the form Column = Value allows the use of a hash-based index, which has O(1) lookup time. However, many databases, including SQLite, do not support them.
A condition using relational operators (<, >, <=, >=) can make use of an ordered index, typically implemented as a B-tree, which has O(log n) lookup time.
More complicated expressions which cannot use an index require O(n) time.
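The difference between the first two cases can be sketched in Python, with a dict standing in for a hash index and a sorted list searched with bisect standing in for an ordered (B-tree) index. This is a toy, assuming unique keys and a small in-memory "table"; all names are illustrative. The dict answers equality in expected O(1) but keeps no key order, so a range query against it would have to examine every entry. The sorted index finds the start of a range in O(log n) and then reads only the k matching entries.

```python
import bisect

# Unsorted (key, value) rows, standing in for a heap of data pages.
rows = [(7, "g"), (2, "b"), (9, "i"), (4, "d"), (5, "e")]

# Hash index: key -> row position. Expected O(1) for equality, no ordering.
hash_index = {key: pos for pos, (key, _) in enumerate(rows)}

# Ordered index: (key, row position) pairs sorted by key.
ordered_index = sorted((key, pos) for pos, (key, _) in enumerate(rows))
ordered_keys = [key for key, _ in ordered_index]


def lookup_eq(key):
    """Column = Value via the hash index."""
    pos = hash_index.get(key)
    return None if pos is None else rows[pos][1]


def lookup_range(lo, hi):
    """lo <= Column < hi via the ordered index: O(log n) to locate, then O(k) to read."""
    start = bisect.bisect_left(ordered_keys, lo)
    end = bisect.bisect_left(ordered_keys, hi)
    return [rows[pos][1] for _, pos in ordered_index[start:end]]


print(lookup_eq(4))        # d
print(lookup_range(3, 8))  # ['d', 'e', 'g']
```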

Since you're primarily interested in SQLite, you might want to read its Query Optimizer Overview which explains in more detail how indexes are selected.
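A quick way to see which of these cases SQLite picks is EXPLAIN QUERY PLAN. The sketch below uses Python's built-in sqlite3 module with a made-up table t and index idx_t_a. A lookup on the INTEGER PRIMARY KEY searches the table's own B-tree, a lookup on an indexed column searches the index (and then the table), and a lookup on an unindexed column is reported as a SCAN. Plan wording differs slightly across SQLite versions (older ones print SCAN TABLE t, for example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.execute("CREATE INDEX idx_t_a ON t(a)")


def plan(sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


queries = {
    "pk": "SELECT * FROM t WHERE id = 1",        # search on the rowid B-tree: O(log n)
    "indexed": "SELECT * FROM t WHERE a = 1",    # search idx_t_a, then the rowid B-tree
    "unindexed": "SELECT * FROM t WHERE b = 1",  # no usable index: full scan, O(n)
}
plans = {name: plan(sql) for name, sql in queries.items()}
for name, details in plans.items():
    print(f"{name:>9}: {details}")
```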


Johnny Svarog · Accepted Answer · 2022-12-07 09:39:51Z

2

That's a great question. It deserves a book to be written! The three major things here are:

the search algorithm applied,
the size of the read blocks to process, and
the type of the index (hashed, B-Tree, or other).

In general, O(1) complexity applies to hashed indexes and to data that the database engine already has in its cache.

Most DB engines use B-Tree indexes by default, and clustered indexes are no exception. That's why, when searching by a primary key, you should expect O(log N) complexity.

In this nice article you can find more details on what is going on under the hood:

edited Dec 7, 2022 at 9:39

answered Dec 4, 2022 at 16:44

