0

I have a table indexed on a text column, and I want all my queries to return results ordered by name without any performance hit. Table has around 1 million rows if it matters.

Table -

CREATE TABLE table (Name text)

Index -

CREATE INDEX "NameIndex" ON "Files" (
    "Name" COLLATE nocase   ASC
);

Query 1 -

select * from table where Name like "%a%"

Query plan, as expected a full scan -

SCAN TABLE table

Time -

Result: 179202 rows returned in 53ms

Query 2, now using order by to read from index -

select * from table where Name like "%a%" order by Name collate nocase

Query plan, scan using index -

SCAN TABLE table USING INDEX NameIndex

Time -

Result: 179202 rows returned in 672ms

Used DB Browser for SQLite to get the information above, with default Pragmas.

I'd assume scanning the index would be as performant as scanning the table, is it not the case or am I doing something wrong?

Another interesting thing I noticed, that may be relevant -

Query 3 -

select * from table where Name like "a%"
Result: 23026 rows returned in 9ms

Query 4 -

select * from table where name like "a%" order by name collate nocase
Result: 23026 rows returned in 101ms

And both has them same query plan -

SEARCH TABLE table USING INDEX NameIndex (Name>? AND Name<?)

Is this expected? I'd assume the performance be the same if the plan was the same.

Thanks!

EDIT - The reason the query is slower was because I used select * and not select name, causing SQLite to go between the table and the index.

The solution was to use clustered index, thanks @Tomalak for helping me find it - create table mytable (a text, b text, primary key (a,b)) without rowid The table will be ordered by default using a + b combination, meaning that full scan queries will be much faster (now 90ms).

1 Answer 1

2

A LIKE pattern that starts with % can never use an index. It will always result in a full table scan (or index scan, if the query can be covered by the index itself).

It's logical when you think about it. Indexes are not magic. They are sorted lists of values, exactly like a keyword index in a book, and that means they are only only quick for looking up a word if you know how the given word starts. If you're searching for the middle part of a word, you would have to look at every index entry in a book as well.


Conclusion from the ensuing discussion in the comments:

The best course of action to get a table that always sorts by a non-unique column without a performance penalty is to create it without ROWID, and turn it into a clustering index over a the column in question plus a second column that makes the combination unique:

CREATE TABLE MyTable (
    Name   TEXT COLLATE NOCASE,
    Id     INTEGER,
    Other  TEXT,
    Stuff  INTEGER,
    PRIMARY KEY(Name, Id)  -- this will sort the whole table by Name
) WITHOUT ROWID;

This will result in a performance penalty for INSERT/UPDATE/DELETE operations, but in exchange sorting will be free since the table is already ordered.

Sign up to request clarification or add additional context in comments.

11 Comments

@AdirHudayfi How does the performance change when you change the ordered queries so they can be fulfilled from the index? e.g. select Name from table where Name like "%a%" order by Name collate nocase
@AdirHudayfi No, you did not. You mentioned times for select * queries, that's not what I was talking about.
@AdirHudayfi That would be my conclusion. (That's why covering indexes are great, when you can make them, and SELECT *-queries are generally bad because they optimize poorly)
@AdirHudayfi An index always contains a copy of the column data (deduped, but still) so indexes that contain all columns will increase the database size and slow down INSERT/UPDATE/DELETE operations. If you want your table to always sort by one particular column, you can make a clustered index, i.e. you ensure that the base table itself is always ordered by a particular column. For that you create the table WITHOUT ROWID, and set up a primary key with your defined order. This will slow down INSERTs somewhat, but in exchange sorting will be free.
@AdirHudayfi You can make a two-part primary key that consists of Name and a unique number (or any other existing property that makes the combination unique), and then cluster by this key sorting by Name first.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.