[Data] Support injecting the checkpointfilter/checkpointwriter at a custom position in the pipeline #60704

### Description

Ray Data supports checkpointing by using a primary key to filter out rows that have already been processed.

However, the mechanism currently only supports injecting the `checkpointfilter` on the source side (`plan_op_read` injects it into the read tasks).
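A minimal plain-Python sketch of the source-side filtering described above. It does not use Ray Data's actual checkpoint classes; `read_task`, `processed_ids`, and the `id` column are hypothetical names chosen for illustration. The filter can run inside the read task only because every row already carries its primary key at the moment it is read.

```python
from typing import Any, Dict, Iterable, Iterator, Set

Row = Dict[str, Any]


def read_task(rows: Iterable[Row], processed_ids: Set[Any], id_column: str = "id") -> Iterator[Row]:
    """Hypothetical read task with a checkpoint filter injected at the source.

    Rows whose primary key is already in `processed_ids` are dropped before
    any downstream operator sees them.
    """
    for row in rows:
        if row[id_column] not in processed_ids:
            yield row


source_rows = [{"id": 1, "text": "a"}, {"id": 2, "text": "b"}, {"id": 3, "text": "c"}]
remaining = list(read_task(source_rows, processed_ids={1, 3}))
print(remaining)  # [{'id': 2, 'text': 'b'}]
```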

In some environments, however, the primary key is not available on the source side. Allowing users to choose where the checkpointfilter is injected would be useful in those scenarios.
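To illustrate the proposal, here is a hypothetical sketch in which the checkpoint filter is just another stage that the user places after a named step. `make_checkpoint_filter`, `inject_after`, and `run` are illustrative names, not existing Ray Data APIs. In the example, the key column `user` only exists after the `parse` stage, so a filter injected at the source could not see it.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Set, Tuple

Row = Dict[str, Any]
StageFn = Callable[[Iterator[Row]], Iterator[Row]]
Stage = Tuple[str, StageFn]


def make_checkpoint_filter(processed: Set[tuple], key_columns: List[str]) -> StageFn:
    """Hypothetical filter stage that drops rows whose composite key is checkpointed."""
    def filter_rows(rows: Iterator[Row]) -> Iterator[Row]:
        for row in rows:
            if tuple(row[c] for c in key_columns) not in processed:
                yield row
    return filter_rows


def inject_after(stages: List[Stage], anchor: str, new_stage: Stage) -> List[Stage]:
    """Return a copy of `stages` with `new_stage` placed right after the stage named `anchor`."""
    names = [name for name, _ in stages]
    idx = names.index(anchor)  # raises ValueError if `anchor` is not a stage name
    return stages[: idx + 1] + [new_stage] + stages[idx + 1 :]


def run(source: Iterable[Row], stages: List[Stage]) -> List[Row]:
    """Chain the stages lazily, then materialize the result."""
    rows: Iterator[Row] = iter(source)
    for _, fn in stages:
        rows = fn(rows)
    return list(rows)


def parse(rows: Iterator[Row]) -> Iterator[Row]:
    # The primary key (`user`) only becomes available here, not at read time.
    for row in rows:
        user, text = row["raw"].split(":", 1)
        yield {"user": user, "text": text}


def infer(rows: Iterator[Row]) -> Iterator[Row]:
    # Stand-in for an expensive model call.
    for row in rows:
        yield {**row, "length": len(row["text"])}


pipeline = inject_after(
    [("parse", parse), ("infer", infer)],
    anchor="parse",
    new_stage=("checkpoint_filter", make_checkpoint_filter({("u1",)}, ["user"])),
)
result = run([{"raw": "u1:hello"}, {"raw": "u2:world"}], pipeline)
print(result)  # [{'user': 'u2', 'text': 'world', 'length': 5}]
```

Modeling the filter as an ordinary stage keeps the placement decision with the user, who knows at which step the primary key becomes available.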

### Use case

In our offline inference scenario, we select files based on a sampling strategy and then split each compressed file into many samples. The primary key is therefore `file_name + sample_id`, so the checkpointfilter has to be injected after the samples are extracted.
 
<img width="1070" height="633" alt="Image" src="https://github.com/user-attachments/assets/9a3dfd55-eba6-4cdb-81f9-485199d8e895" />
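The sketch below models this use case with gzip-compressed, newline-delimited files, which is an assumption made for illustration. `extract_samples` and `filter_processed` are hypothetical helpers, not Ray Data APIs. Because the filter runs after extraction on `(file_name, sample_id)`, a file that was only partly processed resumes from its first unprocessed sample instead of being skipped or reprocessed as a whole.

```python
import gzip
from typing import Any, Dict, Iterable, Iterator, Set, Tuple

Row = Dict[str, Any]
Key = Tuple[str, int]


def extract_samples(files: Iterable[Row]) -> Iterator[Row]:
    """Split each gzip-compressed file into one row per newline-delimited sample."""
    for f in files:
        lines = gzip.decompress(f["data"]).decode("utf-8").splitlines()
        for sample_id, sample in enumerate(lines):
            yield {"file_name": f["file_name"], "sample_id": sample_id, "sample": sample}


def filter_processed(rows: Iterable[Row], processed: Set[Key]) -> Iterator[Row]:
    """Checkpoint filter placed after extraction, keyed on (file_name, sample_id)."""
    for row in rows:
        if (row["file_name"], row["sample_id"]) not in processed:
            yield row


files = [
    {"file_name": "a.gz", "data": gzip.compress(b"s0\ns1\ns2")},
    {"file_name": "b.gz", "data": gzip.compress(b"t0\nt1")},
]
# a.gz was only partially processed before the previous run stopped.
processed = {("a.gz", 0), ("a.gz", 1)}
todo = list(filter_processed(extract_samples(files), processed))
print([(r["file_name"], r["sample_id"]) for r in todo])
# [('a.gz', 2), ('b.gz', 0), ('b.gz', 1)]
```

A filter keyed on `file_name` alone at the source would have to either drop `a.gz` entirely (losing sample 2) or reread it in full (repeating samples 0 and 1).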

Similarly, in a RAG pipeline a PDF file may be split into many chunks, and we may use `file_name + chunk_id` to identify each chunk.
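The RAG case also involves the writer side: a chunk's key should be recorded only after the chunk has been processed, so that an interrupted run can resume. The following sketch uses a hypothetical in-memory `InMemoryCheckpoint` with `filter` and `write` methods and fixed-size character chunking; neither reflects Ray Data's actual checkpoint storage or chunking.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional, Set, Tuple

Row = Dict[str, Any]
Key = Tuple[str, int]


def chunk_documents(docs: Iterable[Row], chunk_size: int) -> Iterator[Row]:
    """Split each document's text into fixed-size character chunks."""
    for doc in docs:
        text = doc["text"]
        for chunk_id, start in enumerate(range(0, len(text), chunk_size)):
            yield {
                "file_name": doc["file_name"],
                "chunk_id": chunk_id,
                "chunk": text[start : start + chunk_size],
            }


class InMemoryCheckpoint:
    """Hypothetical checkpoint store keyed on (file_name, chunk_id)."""

    def __init__(self) -> None:
        self.keys: Set[Key] = set()

    def filter(self, rows: Iterable[Row]) -> Iterator[Row]:
        # Skip chunks that a previous run already finished.
        for row in rows:
            if (row["file_name"], row["chunk_id"]) not in self.keys:
                yield row

    def write(self, row: Row) -> None:
        # Record the key only after the chunk has been fully processed.
        self.keys.add((row["file_name"], row["chunk_id"]))


def run_job(docs: List[Row], ckpt: InMemoryCheckpoint, chunk_size: int,
            stop_after: Optional[int] = None) -> List[Row]:
    done: List[Row] = []
    for row in ckpt.filter(chunk_documents(docs, chunk_size)):
        if stop_after is not None and len(done) >= stop_after:
            break  # simulate the job being interrupted
        # Embedding and indexing of row["chunk"] would happen here.
        ckpt.write(row)
        done.append(row)
    return done


docs = [{"file_name": "a.pdf", "text": "abcdefgh"}, {"file_name": "b.pdf", "text": "xyz"}]
ckpt = InMemoryCheckpoint()
first = run_job(docs, ckpt, chunk_size=3, stop_after=2)   # interrupted run
second = run_job(docs, ckpt, chunk_size=3)                # resumed run
print([(r["file_name"], r["chunk_id"]) for r in second])  # [('a.pdf', 2), ('b.pdf', 0)]
```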
