
I'm trying to remove about 50 million rows from a 40-billion-row table (one specific day, based on the event_date column). I'm using a lightweight delete, and the WHERE clause contains no columns that are part of the ORDER BY / PRIMARY KEY, so the whole table has to be scanned to find the rows with that specific event_date. I know I could partition the table, add a primary key, or create the table with a suitable ORDER BY, but without any of those workarounds, is there a way to make the operation finish only once all replicas have deleted their data? When the SQL operation exceeds 120 seconds, even with DELETE FROM ... ON CLUSTER SETTINGS alter_sync = 2, replication_wait_for_inactive_replica_timeout = 600, mutations_sync = 2 I still get the error

DB::Exception: Watching task /clickhouse/cluster_name/task_queue/ddl/query-0023446431 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 10 unfinished hosts (0 of them are currently active), they are going to execute the query in background.

I want the SQL statement to keep running until the deletes are completed on all replicas, even if that takes an unbounded amount of time; I don't want the operation to be cut off after 180 seconds (I know it keeps running in the background). How can I achieve this?

1 Answer

You could raise the timeout with SETTINGS distributed_ddl_task_timeout = 3600.
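A sketch of the statement with the raised timeout (table name, cluster name, and date are placeholders, not from the question):

```sql
-- Hypothetical names: "events" table, 'cluster_name' cluster.
-- distributed_ddl_task_timeout controls how long the initiator waits
-- for all hosts in the cluster before returning the "unfinished hosts" error.
DELETE FROM events ON CLUSTER 'cluster_name'
WHERE event_date = '2024-01-15'
SETTINGS
    distributed_ddl_task_timeout = 3600,  -- wait up to 1 hour for all hosts
    alter_sync = 2,                       -- wait for all replicas
    mutations_sync = 2;
```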

Because this delete requires a full scan, a regular mutation, ALTER TABLE .. DELETE WHERE ..., is a better fit than a lightweight delete,
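For example, a minimal sketch of the mutation form (same placeholder names as above):

```sql
-- Hypothetical names: "events" table, 'cluster_name' cluster.
-- mutations_sync = 2 makes the statement block until the mutation
-- has finished on all replicas.
ALTER TABLE events ON CLUSTER 'cluster_name'
    DELETE WHERE event_date = '2024-01-15'
    SETTINGS mutations_sync = 2;
```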

and poll SELECT hostName(), * FROM clusterAllReplicas('cluster_name', system.mutations) WHERE is_done = 0 FORMAT Vertical until it returns no rows.
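The polling can be scripted, for instance with a small shell loop (this assumes clickhouse-client on the path, a reachable server, and the placeholder cluster name — it is a sketch, not part of the original answer):

```shell
# Poll until no unfinished mutations remain on any replica of the cluster.
# 'cluster_name' is a placeholder; adjust connection flags as needed.
while true; do
  pending=$(clickhouse-client --query "
    SELECT count()
    FROM clusterAllReplicas('cluster_name', system.mutations)
    WHERE is_done = 0")
  if [ "$pending" -eq 0 ]; then
    echo "all mutations finished"
    break
  fi
  echo "$pending mutation(s) still running"
  sleep 30
done
```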

