hasher/operations: fix in-flight hash caching, fingerprinting, and modtime-fallback for hash-less remotes #9500
Draft
davispw wants to merge 15 commits into
The Google Photos Library API does not support permanently deleting media items (https://issuetracker.google.com/issues/109759781). To work around this, rclone now moves deleted and overwritten items into a designated "trash" album (default: "rclone_Trash") and removes them from the active album. Users can then review and permanently delete items from the trash album via the Google Photos web UI.

- Move deleted items to trash album in Remove()
- Move old item to trash album after Update() re-uploads
- Add TrashAlbumName to Options (defaults to "rclone_Trash")
- Add api.BatchAddItems and api.BatchRemoveItems request types
- Add unit tests: TestRemoveTrashWorkaround, TestUpdateTrashWorkaround
The Google Photos Library API has no delete endpoint for media items (https://issuetracker.google.com/issues/109759781). Add documentation covering the trash album workaround added in the previous commit, including the trash_album_name option and instructions for users to review and permanently delete items from the trash album via the Google Photos web UI.
When uploading media, rclone can now read description metadata from EXIF/IPTC/XMP tags and pass it to the Google Photos API as the media item description, visible in the Google Photos UI. This is controlled by two new options:

- read_exif_description: enable the feature (default: false)
- exif_description_fields: ordered list of tag names to try (default: Description,Caption-Abstract,ImageDescription,Title,ObjectName)

The first non-empty matching tag value is used. The feature uses the github.com/bep/imagemeta library and reads only the first 512 KiB of the upload stream to extract metadata before uploading.

Add unit test TestEXIFDescriptionMapping.
…identical byte deduplication
…ption fields

This adds a custom HandleXMP parser to extract the nested Dublin Core title and description tags (dc:title and dc:description). The default imagemeta parser skips or ignores them because they are nested tags rather than attributes. With this change, Lightroom-exported titles and descriptions map to the Google Photos description on upload.
…upported
Add unit tests covering the fix that makes equal() use hash comparison
when a backend does not support modtime but does advertise a common hash
type. Tests are written TDD-style (this commit introduces them before
the fix) and cover all three new branches:
A. No common hash → size-only fallback (existing behaviour preserved)
B. Common hash, same content → equal
C. Common hash, different content → not equal (the behaviour fixed by
the next commit)
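The three branches listed above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not rclone's actual `equal()`: the `object` type and the `commonHash` and `equalWithoutModTime` helpers are hypothetical names introduced here, and the sketch assumes modtime is unsupported and only decides what happens once sizes are compared.

```go
package main

import "fmt"

// object is a hypothetical stand-in for an rclone object: a size plus a map
// from hash type name to hex digest (empty when the backend has no hashes).
type object struct {
	size   int64
	hashes map[string]string
}

// commonHash returns the first hash type that both objects provide, if any.
// The candidate list is an assumption made for this sketch.
func commonHash(src, dst object) (string, bool) {
	for _, name := range []string{"md5", "sha1"} {
		_, inSrc := src.hashes[name]
		_, inDst := dst.hashes[name]
		if inSrc && inDst {
			return name, true
		}
	}
	return "", false
}

// equalWithoutModTime models the comparison when modtime is not supported.
func equalWithoutModTime(src, dst object) bool {
	if src.size != dst.size {
		return false
	}
	name, ok := commonHash(src, dst)
	if !ok {
		// Branch A: no common hash, so fall back to the size-only result.
		return true
	}
	// Branches B and C: a common hash exists, so the digests decide.
	return src.hashes[name] == dst.hashes[name]
}

func main() {
	src := object{size: 5, hashes: map[string]string{"md5": "aaa"}}
	same := object{size: 5, hashes: map[string]string{"md5": "aaa"}}
	changed := object{size: 5, hashes: map[string]string{"md5": "bbb"}}
	noHash := object{size: 5}

	fmt.Println("A (no common hash):", equalWithoutModTime(src, noHash))
	fmt.Println("B (same content):", equalWithoutModTime(src, same))
	fmt.Println("C (different content):", equalWithoutModTime(src, changed))
}
```

Branch C is the case that a size-only comparison gets wrong: both objects are five bytes long, so only the digest reveals the change.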
…less remotes
fix(hasher): cache hashes under source size for remotes with delayed size reporting
chore: merge all PR branches for combined testing (davispw-head)
Summary
Four related fixes that together make the `hasher` backend overlay work reliably with Google Photos, and improve correctness for any remote without native hash support or reliable modtime or size.

- `equal()` modtime fallback: `equal()` now falls back to comparing hashes when modtime is not supported but a common hash type exists.
- `hasher` now computes and caches hashes in-flight during file updates for remotes without native hashes.
- Hashes are now cached under the local source size rather than the remote-reported size.
- `Update` now executes the trash workaround immediately during async uploads, preventing duplicates in the album.

Previously required (all of the following, and still had edge cases)

Even with all of the above, overwriting a file caused the hasher cache entry to be pruned without replacement, so the next sync always re-downloaded the full file to re-verify the hash.
Now required
- `--checksum` is no longer needed (Part 1 below handles this automatically).
- `--gphotos-read-size` is no longer needed when using `--ignore-size` (Part 3 below handles this automatically), avoiding extra HEAD requests and saving substantial API transaction quota.
- `--gphotos-batch-mode=async` now works without panicking or creating duplicates in albums (fixed in Part 4 below).

Part 1:
`equal()` no longer silently skips when modtime is unsupported

Problem

When `modifyWindow == fs.ModTimeNotSupported` and file sizes match, `equal()` in `fs/operations/operations.go` returned `true` immediately, assuming the files were identical. A file whose content changed but whose size did not would be silently skipped during a sync. This made `--checksum` necessary to force a hash comparison.

Fix

When a common hash type exists (e.g. MD5 via the `hasher` overlay), we fall through to compare hashes. For backends with no common hash, the previous behaviour is preserved. `--checksum` is no longer needed.

Part 2:
`Update` now caches its hash in-flight

Problem

When `hasher` wraps a hash-less remote (e.g. Google Photos), `Put` already computed the hash in-flight via a `hashingReader` and cached it immediately. But `Update` (overwrite) only pruned the existing cache entry without computing a replacement. The next sync found no cached hash and had to re-download the entire file to verify it.

Fix

In `backend/hasher/object.go`, `Update` now:

- Checks whether the underlying remote lacks hashes (`o.Object.Fs().Hashes().IsEmpty()`).
- Wraps the input in a `hashingReader` to compute the hash in-flight.
- Caches the result with `putHashes(ctx, hashes, src.Size())`.

For remotes that do support hashes natively (S3, B2, etc.), the existing behaviour (prune + let remote verification handle it) is unchanged.
Part 3: Hash stored under local source size, not remote-reported size
Problem (Primary Motivation: Async Upload Loops)
The hasher cache is keyed on an object's fingerprint: `"size,modtime,hash"`.

The Async Batch `0` Size Loop (Primary Issue):

In `batch_mode = async`, the upload returns immediately, before the item is fully processed and committed on Google's servers. At this point, the returned size is `0`. The hasher cache would record the computed hash under fingerprint `0,-,-`. On the next sync, the remote file is listed with its real size (e.g., `12345`). Because the fingerprint `12345,-,-` has no cached hash (it was stored under `0,-,-`), the lookup misses. The miss triggers a new upload, leading to an infinite upload loop.

The `-1` Listing Mismatch:

Without `batch_mode = async`, Google Photos returns a size of `-1` unless `--gphotos-read-size` is explicitly used. Storing the hash under `-1,-,-` and later listing the file with its real size (`12345`) similarly caused cache misses.

Fix
`putHashes` accepts an optional `localSize` override (named `expectedSize` in the code signature, as a variadic parameter). `Update` and `Put` now pass `src.Size()`, the local file size, as the fingerprint key. A new `fingerprintWithSize(ctx, size)` helper separates this from the standard `fingerprint(ctx)` path used for lookups.

Additionally, hasher now implements ignore-size fingerprint matching in `backend/hasher/kv.go` when the global `IgnoreSize` flag is enabled. When `ci.IgnoreSize` is active, the size component of the fingerprint is ignored during database lookup, and a record matches if its modtime and fast-hash components match (which, for Google Photos, are both stable as `-`).

This means:

- The hash is stored under the local-size fingerprint, e.g. `12345,-,-`.
- With `--ignore-size`, hasher ignores the size component, allowing the lookup to query the database with fingerprint `-1,-,-` (or any other size) and successfully obtain a cache hit ✓
- There is no need for `--gphotos-read-size`, saving substantial API transaction quota.

Impact on other backends
For remotes where `o.Object.Size() == src.Size()` (S3, B2, and any backend with accurate, immediate size reporting), the stored fingerprint is identical to what it was before, so there is no behavioural change. The fix only makes a meaningful difference on backends where the remote size lags, is zero during async processing, or is permanently `-1`.

Part 4: Google Photos Trash Workaround Async Fix
Problem
When using the `hasher` backend overlay to sync metadata changes (like updating an EXIF description or tags, which changes the file hash), rclone performs an `Update` operation. If Google Photos is configured with `batch_mode = async`, the new upload is committed in the background, so the newly uploaded media item's details are not returned immediately (the batcher returns `nil` synchronously).

Because `info` is `nil` in async mode, `o.id` does not get updated to the new media item's ID synchronously. The overwrite check `if oldID != "" && oldID != o.id` then evaluated `oldID != oldID` (which is `false`), completely skipping the trash/update workaround and leaving duplicate files in the Google Photos album.

Fix
The check in `backend/googlephotos/googlephotos.go` is simplified to test only whether `oldID` is non-empty.

Since `Object.Update` is only called by rclone when replacing/overwriting a file that already exists at the destination, a non-empty `oldID` always guarantees that an overwrite is occurring, and thus the old item should always be moved to the trash album.

Automated Tests
- `fs/operations/operations_test.go`:
  - `TestEqualHashFallback`
- `backend/hasher/hasher_internal_test.go`:
  - `UpdateInFlightHashing/UnderlyingLacksHashes`: `Update` for hash-less remotes
  - `UpdateInFlightHashing/UnderlyingSupportsHashes`: `Update` for hash-native remotes
- `backend/googlephotos/googlephotos_workaround_test.go`:
  - `TestUpdateTrashWorkaroundAsync`: `Update` under async batch mode

Manual Test Plan (Add, Update, Remove Operations)
Prerequisites
Step 1: Add operation — hashes cached in-flight
Expected: both files transferred.
Step 2: Skip operation (Re-sync with no changes) — zero transfers (cache hit)
Expected: nothing transferred. Without the ignore-size fingerprint fix, rclone would re-upload the files or re-download them to verify.
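The cache hit this step relies on comes from the ignore-size fingerprint matching described in Part 3. A minimal Go sketch of that matching rule follows. It is an illustration only: the real lookup lives in `backend/hasher/kv.go` and is not reproduced here, and `fingerprintsMatch` is a hypothetical helper that assumes the `"size,modtime,hash"` layout with the size as the first comma-separated field.

```go
package main

import (
	"fmt"
	"strings"
)

// fingerprintsMatch compares two "size,modtime,hash" fingerprints.
// With ignoreSize set, the leading size field is dropped from both sides
// and only the remaining "modtime,hash" parts must agree.
func fingerprintsMatch(stored, lookup string, ignoreSize bool) bool {
	if !ignoreSize {
		return stored == lookup
	}
	_, storedRest, okStored := strings.Cut(stored, ",")
	_, lookupRest, okLookup := strings.Cut(lookup, ",")
	return okStored && okLookup && storedRest == lookupRest
}

func main() {
	stored := "12345,-,-" // written at upload time under the local size
	listed := "-1,-,-"    // what a listing without read-size reports

	fmt.Println("strict match:", fingerprintsMatch(stored, listed, false))
	fmt.Println("ignore-size match:", fingerprintsMatch(stored, listed, true))
}
```

Under the strict rule the two fingerprints differ and the lookup misses. Once the size field is ignored they agree, so the cached hash is found and nothing needs to be transferred.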
Step 3: Update operation (Overwrite one photo) — change detected, one file transferred
Expected: exactly one file transferred (`photo_a.jpg`, whose content changed). `photo_b.jpg` is skipped (unchanged, cache hit).

Step 4: Skip operation (Re-sync after overwrite) — zero transfers (new hash cached correctly)

Expected: nothing transferred. Verifies that the hash written during Step 3's overwrite (stored under the `src.Size()` fingerprint key) is correctly found on lookup.

Step 5: Remove operation (Delete one photo) — change detected, remote photo deleted/trashed

Expected: exactly one deletion/removal processed (`photo_b.jpg` removed).

Step 6: Skip operation (Re-sync after deletion) — zero transfers and no re-creation

Expected: nothing transferred, and `photo_b.jpg` is NOT recreated. Hasher's cache was successfully pruned of the deleted file's hash.

Cleanup
Warning

Running `rclone purge` on `rclone_Trash` deletes the album container but leaves the files themselves orphaned ("zombie files") in your main Google Photos library. You must delete the files from your library first.

- In the Google Photos web UI, open the `rclone_Trash` album, select all photos, and click "Move to trash" (or delete them).

Dependencies (gh-stack)
This PR is part of a stack. It depends on:
- `gphotos-trash-album`: trash workaround + error propagation
- `gphotos-exif-description`: EXIF/XMP description mapping
- `gphotos-async-panic`: async batch-mode panic fix (necessary for the async batch mode recommended in this PR's documentation to function without crashing)