Environment
- Server version: @solid/community-server v7 (deployed via npm install -g @solid/community-server@7)
- Node.js version: v20 (from node:20-slim base image)
- npm version: bundled with the above
- Storage backend: css:config/storage/backend/file.json
- Locker: css:config/util/resource-locker/memory.json
- Filesystem under --rootFilePath: Azure Files share mounted at /data (SMB)
Description
If the $.<ext> content file for a resource goes missing from the file-backend store while the parent directory still exists, the resource enters a state that cannot be repaired via the HTTP API:
- GET/HEAD continue to return 200 indefinitely with a stable etag (CSS appears to keep the response cached, including across revision restarts of the host container, so the in-memory theory doesn't fully explain it).
- Every PUT returns H500 with ENOENT: no such file or directory, open '<path>$.<ext>'.
- DELETE on the resource returns 205 but does not clear the cache: the next HEAD returns the same 200/etag.
- DELETE on the .meta companion is refused with H409 "Not allowed to delete metadata resources directly."
- DELETE on the $.<ext> path is refused with H501 "Identifiers cannot contain a dollar sign before their extension."
- DELETE on any parent container is refused with H409 "Can only delete empty containers.", even though the leaf .well-known/ directory on disk is empty after the body file vanishes.
- PUT with If-None-Match: * returns H412 (CSS still thinks the resource exists).
Net effect: a single missing $.<ext> body file permanently breaks all writes to that LDP resource, and there's no client-side recovery path. The resource is "alive" enough to fail every write, but "dead" enough that no DELETE chain can remove it.
How the body file went missing isn't fully clear in my deployment — the suspected trigger is an SMB-side write being torn while a long-poll lock from the in-memory resource-locker was held during a previous run. But the more general concern is the handling: CSS doesn't reconcile or recover from this state at all.
Reproduction
The corruption itself is hard to deterministically reproduce (it took an unlucky combination of concurrent writers + an SMB volume + a crashed client), but the failure mode + unrepairability can be reproduced trivially by simulating the on-disk shape:
- Start CSS with --rootFilePath pointing at any directory (/data in the commands below).
- Create the resource through CSS normally:
curl -X PUT -H 'Content-Type: text/turtle' \
-H 'If-None-Match: *' \
--data-binary '<#a> <#b> <#c> .' \
http://localhost:3456/foo/.well-known/test
- Out-of-band, delete the backing file (simulating the SMB tear / disk-pressure eviction / whatever caused it in production):
rm '/data/foo/.well-known/test$.ttl'
# leave the parent .well-known/ directory intact and empty
- Try to write again through CSS:
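# HTTP header lines end in CRLF; strip the CR so it doesn't end up inside the If-Match value
ETAG=$(curl -sI http://localhost:3456/foo/.well-known/test | grep -i '^etag' | awk '{print $2}' | tr -d '\r')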
curl -X PUT -H 'Content-Type: text/turtle' \
-H "If-Match: $ETAG" \
--data-binary '<#a> <#b> <#d> .' \
http://localhost:3456/foo/.well-known/test
→ H500 ENOENT: open '/data/foo/.well-known/test$.ttl'
- Try every DELETE chain — observe each H409/H501/H205-without-cleanup outcome listed above.
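The DELETE attempts in the last step can be scripted. This is a minimal sketch rather than the exact commands I ran: delete_targets and run_delete_chain are hypothetical helper names, the arguments are the base URL, resource path and extension from the steps above, and only the immediate parent container is tried.

```shell
# Print the URLs the DELETE chain is attempted against, one per line:
# the resource, its .meta companion, the $.<ext> path, and the parent container.
delete_targets() {
  base=$1 resource=$2 ext=$3
  printf '%s%s\n' "$base" "$resource"
  printf '%s%s.meta\n' "$base" "$resource"
  printf '%s%s$.%s\n' "$base" "$resource" "$ext"
  printf '%s%s/\n' "$base" "${resource%/*}"
}

# Send a DELETE to each target and print "<status>  <url>".
run_delete_chain() {
  delete_targets "$1" "$2" "$3" | while IFS= read -r url; do
    code=$(curl -s -o /dev/null -w '%{http_code}' -X DELETE "$url")
    printf '%s  %s\n' "$code" "$url"
  done
}
```

For the reproduction above this would be invoked as: run_delete_chain http://localhost:3456 /foo/.well-known/test ttl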
Expected
CSS should either:
- Self-heal: detect the missing body file on the first failing write and treat it as a fresh resource (PUT succeeds, body file gets recreated), or
- Surface a recoverable error: emit a distinct error code (e.g. H410 Gone or a specific BodyFileMissingError) and accept a subsequent PUT with If-Match: * or similar to repair, or
- At minimum, let DELETE on the resource actually clean up the orphaned .meta and parent-container state so the operator can rebuild from scratch.
Actual
None of the above. The resource is permanently stuck until an operator with disk-level access to the storage volume uploads a body file by hand. In a managed environment (Azure Files / NFS / etc.) that's a non-trivial recovery step requiring storage-account credentials. For a serverless / hosted CSS, there's no recovery path at all.
Workaround (in case anyone else hits this)
In my Azure Files-backed deployment I recovered by uploading a fresh body file directly into the share at the expected $.ttl path:
az storage file upload \
--account-name <storage-account> \
--account-key <key> \
--share-name <share> \
--source ./recovery-body.ttl \
--path "<full/path/to/resource>\$.ttl"
After upload, the very next HEAD against CSS picks up the new file (new etag), and PUT works again.
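If the share is mounted somewhere you can reach, a quick on-disk check can confirm you are in this state before uploading anything. A minimal sketch, assuming the resource is stored with the $.<ext> suffix as in my deployment; body_path and body_state are hypothetical helper names, not CSS tooling.

```shell
# Build the expected on-disk body path: <root><resource>$.<ext>
body_path() {
  printf '%s%s$.%s\n' "$1" "$2" "$3"
}

# Classify the on-disk state of a resource:
#   present        - body file exists
#   missing-body   - parent directory exists but the body file does not (the state in this report)
#   missing-parent - the parent directory itself is gone
body_state() {
  file=$(body_path "$1" "$2" "$3")
  dir=${file%/*}
  if [ -f "$file" ]; then
    echo present
  elif [ -d "$dir" ]; then
    echo missing-body
  else
    echo missing-parent
  fi
}
```

With the paths from the reproduction, body_state /data /foo/.well-known/test ttl prints missing-body once the file has been removed and the directory left in place.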
Suggested fix areas
- FileDataAccessor.writeData and friends in src/storage/accessors/FileDataAccessor.ts: if the existing-file open fails with ENOENT but the parent dir is present, fall through to the "create new" path rather than propagating ENOENT.
- Whatever layer is producing the cached GET response after revision restart: verify it isn't serving stale etags from a place that survives container restart. (If CSS truly is reading the response from disk each time, the body must be coming from somewhere, possibly .internal/?)
- The DELETE path on resources whose body file is already missing: it should still clean up .meta + parent state, not return 205 and leave everything in place.
Happy to test a candidate fix against the same Azure Files-backed deployment if useful.