Detect incomplete or corrupted downloaded files #980

@aalhossary

Description

  • Downloaded files are quite often incomplete or corrupted, which can lead to unexpected problems later on.

  • I propose creating one or two files beside the target file target.ext, named target.ext.size and, optionally, target.ext.sha.

  • When we need to parse a file, we currently check only that it is present (e.g. in ScopInstallation.ensureClaInstalled() for the SCOP classification DB file). In my proposal, we check not only its presence but also its size and, where available, its hash.

  1. Hash: Some web folders publish a hash alongside the download file itself. If none is published, we skip the hash check :(.
  2. Size: We can issue an HTTP HEAD request before the GET or POST call that downloads the file itself, read the size from the response (the Content-Length header), and save it.
  • The four functions that store and check the size and hash can be implemented centrally in one place (most probably in BioJava-core) as public static methods and called only where they are needed.

  • This way, the existing code should need few changes: one or two lines calling storeSize() and storeHash() where the file is downloaded, and a call to checkSize() and, optionally, checkHash() from within the ensureXXXInstalled() functions.
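
A minimal sketch of how the four helpers could look, to make the proposal concrete. Everything here is an assumption rather than existing BioJava code: the class name DownloadIntegrity, the sidecar() and fetchRemoteSize() helpers, the choice of SHA-256 for the .sha file (the proposal does not name an algorithm), and the decision that a missing .size file fails the check while a missing .sha file skips it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper class; not part of BioJava. */
final class DownloadIntegrity {

    private DownloadIntegrity() {}

    /** Sidecar path beside the target, e.g. target.ext -> target.ext.size. */
    static Path sidecar(Path target, String suffix) {
        return target.resolveSibling(target.getFileName() + suffix);
    }

    /** HTTP HEAD request (http/https URLs only); returns -1 if no Content-Length is sent. */
    static long fetchRemoteSize(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("HEAD");
            return conn.getContentLengthLong();
        } finally {
            conn.disconnect();
        }
    }

    static void storeSize(Path target, long expectedSize) throws IOException {
        Files.write(sidecar(target, ".size"),
                Long.toString(expectedSize).getBytes(StandardCharsets.UTF_8));
    }

    /** True only if target and target.size both exist and the recorded size matches. */
    static boolean checkSize(Path target) throws IOException {
        Path sizeFile = sidecar(target, ".size");
        if (!Files.isRegularFile(target) || !Files.isRegularFile(sizeFile)) {
            return false;
        }
        String recorded = new String(Files.readAllBytes(sizeFile), StandardCharsets.UTF_8).trim();
        try {
            return Long.parseLong(recorded) == Files.size(target);
        } catch (NumberFormatException e) {
            return false; // unreadable sidecar: treat the download as unverified
        }
    }

    /** Saves the hash published by the server (hex string) next to the target. */
    static void storeHash(Path target, String publishedHex) throws IOException {
        Files.write(sidecar(target, ".sha"),
                publishedHex.trim().toLowerCase().getBytes(StandardCharsets.UTF_8));
    }

    /** Returns true when no .sha file exists: no published hash means the check is skipped. */
    static boolean checkHash(Path target) throws IOException {
        Path hashFile = sidecar(target, ".sha");
        if (!Files.isRegularFile(hashFile)) {
            return true;
        }
        if (!Files.isRegularFile(target)) {
            return false;
        }
        String recorded = new String(Files.readAllBytes(hashFile), StandardCharsets.UTF_8).trim();
        return recorded.equalsIgnoreCase(sha256Hex(target));
    }

    /** Streams the file through SHA-256 (assumed algorithm) and returns lowercase hex. */
    static String sha256Hex(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

At the download site this would amount to something like storeSize(target, fetchRemoteSize(url)) before the transfer, skipped when the server reports -1, and inside an ensureXXXInstalled() method the presence test would become checkSize(target) && checkHash(target). Whether a failed check should trigger an automatic re-download is left open here.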

Labels

enhancement: Improvement of existing code or method