Postgres queries containing non-alphabetical characters

Question

I'm developing a search for entities by name in a Postgres database. A name usually consists of 1-3 words and may contain symbols such as &, !, (, ), -, etc.

I'm using a GIN trigram index with these queries: WHERE name ILIKE '%something%', WHERE name ILIKE 'a%', and WHERE name % 'abc' for fuzzy search (if the exact match via ILIKE finds nothing). The problem is that we need to support searching by any character, not only letters and digits, and the trigram index ignores such characters.

I've tried a text_pattern_ops index for this case, but without success: queries such as WHERE name ILIKE '%$%' are extremely slow.

So, is there any way to efficiently process such queries? Do I need a full text search for this purpose?

UPD:

Table is like:

id (int)	Name (text)
123	Dolce&Gabbana

Queries are like:

SELECT name FROM brand WHERE name ILIKE '%&%' ORDER BY name;

UPD2:

Query plan for

EXPLAIN(analyze, verbose, buffers, settings)
SELECT name FROM brand
WHERE name ILIKE '%$%'
ORDER BY name

Index was created as:

CREATE INDEX brand_trgm_idx ON brand USING gin (name gin_trgm_ops);

Table was created as:

CREATE TABLE brand (
    id              serial PRIMARY KEY,
    name            TEXT,
    collection_id   TEXT,
    created_at      TIMESTAMP DEFAULT now() NOT NULL,
    created_by      TEXT                    NOT NULL
);

Also tried:

CREATE INDEX brand_name_idx ON brand (name text_pattern_ops);

UPD3:

Checked EXPLAIN ANALYZE for the same database, but with ~1M entries:

Can you post a table with data and what you are searching for? — nbk
– nbk, Commented Apr 10, 2024 at 22:40
Table is like: id int Name text ... 123 Dolce&Gabbana. Queries are like: SELECT name FROM brand WHERE name ilike '%&%' ORDER BY name; — Анастасия Разумовская
– Анастасия Разумовская, Commented Apr 10, 2024 at 22:47
Please always add new information to the question; comments are unreadable for anything other than text. Also provide it as CREATE TABLE and INSERT INTO statements. — nbk
– nbk, Commented Apr 10, 2024 at 22:50
4.4 milliseconds, what kind of performance do you expect? Maybe the query planner will consider using the index when you have more data. For now, with this amount of data, using an index would most likely be slower. — Frank Heikens
– Frank Heikens, Commented Apr 11, 2024 at 0:32
Be aware that an index with the operator class text_pattern_ops is only good for left-anchored patterns. Your example pattern '%&%' cannot use such an index. See: stackoverflow.com/a/13452528/939860 If you really hunt for single punctuation characters, you need a tailored index ... — Erwin Brandstetter
– Erwin Brandstetter, Commented Apr 11, 2024 at 2:52
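
The left-anchoring point in the last comment can be checked with EXPLAIN. This is a rough sketch, assuming the brand table and brand_name_idx from the question. Whether the planner actually picks the index also depends on table size and statistics, so no plan output is shown here.

```sql
-- Assumes the brand table and brand_name_idx (text_pattern_ops) from the question.

-- Left-anchored, case-sensitive pattern: the btree index is a candidate.
EXPLAIN SELECT name FROM brand WHERE name LIKE 'Dolce%';

-- Leading wildcard: the btree index cannot narrow the scan for this pattern.
EXPLAIN SELECT name FROM brand WHERE name LIKE '%&%';
```

Note that ILIKE can use a btree index only when the pattern starts with non-alphabetic characters, so ILIKE 'a%' would not benefit from brand_name_idx either.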

jjanes · Accepted Answer · 2024-04-11 01:00:25Z

3

pg_trgm's trgm.h does #define KEEPONLYALNUM, which is why non-alphanumeric characters are dropped. If you removed that define and recompiled, the extension would then keep punctuation characters other than spaces. However, '%$%' doesn't have any usable trigrams in it, just as '%a%' doesn't: the pattern is too short. So the one concrete example you showed us would still not use the index.
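
To see what the index has to work with, pg_trgm's show_trgm() function lists the trigrams extracted from a string. This is a minimal sketch, assuming the pg_trgm extension is installed from a stock (unpatched) build; the sample strings are only illustrative.

```sql
-- Assumes: CREATE EXTENSION IF NOT EXISTS pg_trgm; (stock build, KEEPONLYALNUM defined)

-- '&' is treated as a word separator, so the trigrams come from
-- "dolce" and "gabbana" only; none of them contains '&'.
SELECT show_trgm('Dolce&Gabbana');

-- A string made only of punctuation leaves no alphanumeric characters
-- to build trigrams from.
SELECT show_trgm('$');
```

Note that show_trgm() pads each word with spaces, whereas a LIKE/ILIKE pattern only contributes trigrams that are certain to appear in every match. That is why '%a%' gives the index nothing to use even though show_trgm('a') returns padded trigrams.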

It is also rather hazardous to do this, as then upgrading your system could cause your changed binaries to silently be lost. It would be better to fork and rename, but that is a lot of work.


1 Comment

Анастасия Разумовская Over a year ago

Thank you for your answer! I think it's too risky for me, this will be used in production...
