This repository was archived by the owner on Mar 11, 2021. It is now read-only.

Deferring Text Processing #461

Open
knowtheory wants to merge 2 commits into documentcloud:master from knowtheory:skip_text

@knowtheory
Member

As a general matter I don't recommend this, but I've tested it in anger and it does what it's supposed to.

DocumentCloud's DocumentImport action splits into two main tasks: processing images and processing text. We can already run each of those independently (notably when reprocessing text or reprocessing images).

The changes here expose an option to skip text processing entirely. I don't generally recommend this, and some other process still needs to circle back around and process the text later, but this would serve as a basis for doing so.
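To make the shape of this concrete, here is a minimal, language-neutral sketch in Python. It is not the code in this PR: the names `ImportOptions`, `run_import`, and `deferred_text` are hypothetical, and the real image and text work is reduced to recording task names. It only illustrates an import that can run either task independently and, when text is skipped, notes the document so a later pass can pick it up.

```python
from dataclasses import dataclass


@dataclass
class ImportOptions:
    # Hypothetical option names; both tasks run by default.
    process_images: bool = True
    process_text: bool = True


def run_import(document_id, options, deferred_text):
    """Run the enabled tasks and return their names.

    When text is skipped, the document id is appended to deferred_text
    so that some later process can circle back and process the text.
    """
    ran = []
    if options.process_images:
        ran.append("images")  # placeholder for the real image processing
    if options.process_text:
        ran.append("text")    # placeholder for the real text processing
    else:
        deferred_text.append(document_id)
    return ran


deferred = []
full = run_import(1, ImportOptions(), deferred)
skipped = run_import(2, ImportOptions(process_text=False), deferred)
```

In this sketch the deferred list stands in for whatever bookkeeping the follow-up process would need; the PR itself does not specify one.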

Additionally, I added a mechanism for flagging incoming import jobs as api_document_import so they can be handled separately, with priority given to document_import and large_document_import jobs.
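A rough sketch of the routing idea, again in Python and under stated assumptions: the three job type names come from the description above, but `job_type`, `next_job`, the `PRIORITY` table, and the rule that an API upload is always tagged api_document_import regardless of size are hypothetical. How the actual workers pick jobs is not shown here.

```python
# Lower number means picked first. The relative ranking of the two
# non-API job types is an assumption; the description only says both
# take priority over API imports.
PRIORITY = {
    "document_import": 0,
    "large_document_import": 0,
    "api_document_import": 1,
}


def job_type(from_api, large=False):
    """Choose the job type tag for an incoming import (hypothetical rule)."""
    if from_api:
        return "api_document_import"
    return "large_document_import" if large else "document_import"


def next_job(pending):
    """Pick the next (job_type, document_id) pair from pending.

    pending is in arrival order; min() returns the first item with the
    lowest priority value, so ties are served first come, first served.
    """
    return min(pending, key=lambda job: PRIORITY[job[0]])


pending = [
    (job_type(from_api=True), 1),
    (job_type(from_api=False), 2),
    (job_type(from_api=False, large=True), 3),
]
chosen = next_job(pending)
```

Here the API upload arrived first but is passed over in favor of the regular import, which is the behavior the separate flag is meant to enable.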
