I'm developing a search for entities by name in a Postgres database. A name usually consists of 1-3 words and may contain symbols such as &, !, (, ), -, etc.

I'm using a GIN trigram index with these queries: WHERE name ILIKE '%something%', WHERE name ILIKE 'a%', and WHERE name % 'abc' for fuzzy search (if the exact match via ILIKE finds nothing). The problem is that we need to support search by any character, not only letters and numbers, and the trigram index ignores such characters.

I've tried a text_pattern_ops index for this case, but without success: queries such as WHERE name ILIKE '%$%' are extremely slow.

So, is there any way to process such queries efficiently? Do I need full-text search for this purpose?

UPD:

Table is like:

id (int)   name (text)
123        Dolce&Gabbana

Queries are like:

SELECT name FROM brand WHERE name ILIKE '%&%' ORDER BY name;

UPD2:

Query plan for

EXPLAIN(analyze, verbose, buffers, settings)
SELECT name FROM brand
WHERE name ILIKE '%$%'
ORDER BY name

[image: query plan output]

Index was created as:

CREATE INDEX brand_trgm_idx ON brand USING gin (name gin_trgm_ops);

Table was created as:

CREATE TABLE brand (
    id              serial PRIMARY KEY,
    name            TEXT,
    collection_id   TEXT,
    created_at      TIMESTAMP DEFAULT now() NOT NULL,
    created_by      TEXT                    NOT NULL
);

Also tried:

CREATE INDEX brand_name_idx ON brand (name text_pattern_ops);

UPD3:

I checked EXPLAIN ANALYZE for the same database, but with ~1M entries:

[image: query plan output for ~1M entries]

  • Can you post a table with data and what you are searching for? Commented Apr 10, 2024 at 22:40
  • Table is like: id int Name text ... 123 Dolce&Gabbana. Queries are like: SELECT name FROM brand WHERE name ilike '%&%' ORDER BY name; Commented Apr 10, 2024 at 22:47
  • Please always add new information to the question; comments are unreadable for anything other than text. Also provide it as CREATE TABLE and INSERT INTO statements. Commented Apr 10, 2024 at 22:50
  • 4.4 milliseconds, what kind of performance do you expect? Maybe the query planner will consider using the index when you have more data. For now, with this amount of data, using an index would most likely be slower. Commented Apr 11, 2024 at 0:32
  • Be aware that an index with the operator class text_pattern_ops is only good for left-anchored patterns. Your example pattern '%&%' cannot use such an index. See: stackoverflow.com/a/13452528/939860 If you really hunt for single punctuation characters, you need a tailored index ... Commented Apr 11, 2024 at 2:52

1 Answer

trgm.h does #define KEEPONLYALNUM. If you removed that and recompiled, it would then keep punctuation characters other than spaces. However, '%$%' doesn't have any usable trigrams in it, just like '%a%' doesn't, as it is too short. So the one concrete example you showed us would still not use the index.

It is also rather hazardous to do this, as then upgrading your system could cause your changed binaries to silently be lost. It would be better to fork and rename, but that is a lot of work.
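
To see this on a stock build without recompiling anything, a minimal sketch along these lines may help. It assumes the pg_trgm extension can be installed and that the brand table and brand_trgm_idx index from the question exist; the '%gabb%' pattern is only an illustrative value. show_trgm() is the pg_trgm function that returns the trigrams extracted from a string.

```sql
-- Assumption: pg_trgm is available on this server.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigrams extracted from a stored value. On a default build, '&' is not
-- alphanumeric, so it acts as a word separator instead of appearing in a trigram.
SELECT show_trgm('Dolce&Gabbana');

-- A lone punctuation character has no alphanumeric content to build trigrams from.
SELECT show_trgm('$');

-- Compare the plans: a longer alphanumeric pattern gives the trigram index
-- something to work with, while the single-character pattern does not.
EXPLAIN SELECT name FROM brand WHERE name ILIKE '%gabb%';
EXPLAIN SELECT name FROM brand WHERE name ILIKE '%$%';
```

Whether the planner actually picks the index for the first EXPLAIN still depends on table size and statistics, so treat the plans as a diagnostic rather than a guarantee.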

1 Comment

Thank you for your answer! I think it's too risky for me, this will be used in production...
