
I'm trying to remove about 50 million rows from a 40-billion-row table (one specific day, based on the event_date column). I'm using a lightweight delete, and the WHERE clause contains no columns that are part of the ORDER BY / PRIMARY KEY, so the whole table has to be scanned to find the rows with that specific event_date. I know I could partition the table, add a primary key, or create the table with a suitable ORDER BY, but without any of those workarounds, is there a way to make the operation finish only once all replicas have deleted their data? When the SQL operation exceeds 120 seconds, even with DELETE FROM ... ON CLUSTER SETTINGS alter_sync = 2, replication_wait_for_inactive_replica_timeout = 600, mutations_sync = 2 I still get the error

DB::Exception: Watching task /clickhouse/cluster_name/task_queue/ddl/query-0023446431 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 10 unfinished hosts (0 of them are currently active), they are going to execute the query in background.

I want the SQL statement to keep running until the deletes are completed on all replicas, even if that takes an unbounded amount of time; I don't want the operation to be cut off after 180 seconds (I know it keeps running in the background). How can I achieve this?

1 Answer

You could raise the timeout with SETTINGS distributed_ddl_task_timeout = 3600.
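A sketch of the statement with the raised timeout (table name, cluster name, and date are placeholders, not from the question):

```sql
-- Hypothetical names: "events" table, 'cluster_name' cluster.
-- distributed_ddl_task_timeout controls how long the initiator waits
-- for all hosts in the cluster before returning the "unfinished hosts" error.
DELETE FROM events ON CLUSTER 'cluster_name'
WHERE event_date = '2024-01-15'
SETTINGS
    distributed_ddl_task_timeout = 3600,  -- wait up to 1 hour for all hosts
    alter_sync = 2,                       -- wait for all replicas
    mutations_sync = 2;
```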

Because this delete requires a full scan, a regular mutation, ALTER TABLE .. DELETE WHERE ..., is a better fit than a lightweight delete,
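For example, a minimal sketch of the mutation form (same placeholder names as above):

```sql
-- Hypothetical names: "events" table, 'cluster_name' cluster.
-- mutations_sync = 2 makes the statement block until the mutation
-- has finished on all replicas.
ALTER TABLE events ON CLUSTER 'cluster_name'
    DELETE WHERE event_date = '2024-01-15'
    SETTINGS mutations_sync = 2;
```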

and poll SELECT hostName(), * FROM clusterAllReplicas('cluster_name', system.mutations) WHERE is_done = 0 FORMAT Vertical until it returns no rows.
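The polling can be scripted, for instance with a small shell loop (this assumes clickhouse-client on the path, a reachable server, and the placeholder cluster name — it is a sketch, not part of the original answer):

```shell
# Poll until no unfinished mutations remain on any replica of the cluster.
# 'cluster_name' is a placeholder; adjust connection flags as needed.
while true; do
  pending=$(clickhouse-client --query "
    SELECT count()
    FROM clusterAllReplicas('cluster_name', system.mutations)
    WHERE is_done = 0")
  if [ "$pending" -eq 0 ]; then
    echo "all mutations finished"
    break
  fi
  echo "$pending mutation(s) still running"
  sleep 30
done
```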

