Commit 2c187f1

content(blog): tag filesystem
1 parent de8778f commit 2c187f1

1 file changed

+153
-0
lines changed

@@ -0,0 +1,153 @@
# Escape the Hierarchy Trap: How Tag File Systems Work

We’ve been living in a tree for 50 years.

Ever since the dawn of modern computing, we’ve organized our digital lives into **Hierarchical File Systems**. You have a root, you have branches (folders), and you have leaves (files). It’s neat, it’s tidy, and it’s also **fundamentally broken** for the way we actually think.

Why? Because a file often belongs in more than one place.

Is that `invoice.pdf` in `/Work/Invoices`, or is it in `/Clients/ACME/2024`? In a hierarchy, you have to choose. You either duplicate the file (wasteful), use symlinks (messy), or just hope your future self remembers your arbitrary decision.

Enter: **The Tag File System (TFS)**.

---

## What is a Tag File System?

In a Tag File System, where a file physically lives no longer determines how you find it. Instead of being "inside" a folder, a file is "associated" with one or more **tags**.

Think of it like Gmail labels vs. Outlook folders. In Outlook, an email is in a folder. In Gmail, an email has labels. You can view your "Taxes" label and your "Important" label, and the same email appears in both.

### The Semantic Shift
- **Hierarchical:** Path-based (`/home/user/photos/cats/oscar.jpg`)
- **Tag-based:** Attribute-based (`oscar.jpg` + `type:photo` + `subject:cat` + `name:oscar`)

---

## How it Works Under the Hood

How do you actually build this? You can't just delete folders and expect your OS to keep working. Most Tag File Systems are implemented as **overlays** on top of an ordinary file system.

### 1. The Database Approach
A common way to implement a TFS (like the excellent [TMSU](https://tmsu.org/)) is to use a sidecar database, usually **SQLite**.

The database maps each file, identified by its path or a content hash, to a list of tags.

```mermaid
graph LR
    subgraph Database
        Files[Files Table]
        Tags[Tags Table]
        Map[File_Tags Mapping]
    end

    FileA[cat.jpg] --> Files
    Tag1[#pets] --> Tags
    Tag2[#cute] --> Tags

    Files --- Map
    Tags --- Map
```
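
As a rough illustration of the three-table layout in the diagram, here is a sketch that models Files, Tags, and File_Tags as plain Go slices instead of real SQLite tables. The type names (`FileRow`, `TagRow`, `FileTagRow`, `DB`) and the `TagsOf` and `exampleDB` helpers are assumptions made up for this example; this is not TMSU's actual schema.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical row types standing in for the Files, Tags and File_Tags tables.
type FileRow struct {
	ID   int
	Path string
}

type TagRow struct {
	ID   int
	Name string
}

type FileTagRow struct {
	FileID int
	TagID  int
}

// DB holds the three "tables" as in-memory slices.
type DB struct {
	Files    []FileRow
	Tags     []TagRow
	FileTags []FileTagRow
}

// TagsOf does by hand what a join across Files -> File_Tags -> Tags would do
// for one path, and returns the tag names in sorted order.
func (db *DB) TagsOf(path string) []string {
	fileID, found := 0, false
	for _, f := range db.Files {
		if f.Path == path {
			fileID, found = f.ID, true
			break
		}
	}
	if !found {
		return nil // unknown file: nothing to join against
	}

	// Index tag names by ID so each mapping row is a single lookup.
	tagNames := make(map[int]string, len(db.Tags))
	for _, t := range db.Tags {
		tagNames[t.ID] = t.Name
	}

	var names []string
	for _, ft := range db.FileTags {
		if ft.FileID == fileID {
			names = append(names, tagNames[ft.TagID])
		}
	}
	sort.Strings(names)
	return names
}

// exampleDB mirrors the diagram: cat.jpg tagged with "pets" and "cute".
func exampleDB() *DB {
	return &DB{
		Files:    []FileRow{{ID: 1, Path: "cat.jpg"}},
		Tags:     []TagRow{{ID: 1, Name: "pets"}, {ID: 2, Name: "cute"}},
		FileTags: []FileTagRow{{FileID: 1, TagID: 1}, {FileID: 1, TagID: 2}},
	}
}

func main() {
	fmt.Println(exampleDB().TagsOf("cat.jpg")) // prints [cute pets]
}
```

The mapping table is what makes the relationship many-to-many: one file can appear in any number of mapping rows, and so can one tag.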

### 2. The FUSE Magic
To make this usable by your favorite apps (like Photoshop or VLC), these systems use **FUSE (Filesystem in Userspace)**.

FUSE allows an ordinary user-space program to "pretend" to be a file system. When you browse a FUSE-mounted Tag File System, the folders you see aren't real. If you list a directory such as `query/cats+cute/`, the FUSE driver:
1. Receives the directory-listing request that the kernel forwards to it (for example, when you run `ls`).
2. Queries the SQLite database for files tagged with both "cats" and "cute".
3. Returns those files as if they were actually sitting in that folder.
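
The three steps above can be sketched without any real FUSE binding. In the example below, a plain Go map stands in for the SQLite database, and the `index` type and the `parseQueryPath` and `listDir` functions are hypothetical names invented for this illustration. The `query/` prefix and `+` separator simply follow the example path in the text; a real tool may lay out its virtual directories differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// index is a hypothetical stand-in for the database: tag -> set of file names.
type index map[string]map[string]bool

// parseQueryPath turns a virtual path such as "query/cats+cute/" into the
// tags it names. It reports false for paths that are not tag queries.
func parseQueryPath(p string) ([]string, bool) {
	p = strings.Trim(p, "/")
	if !strings.HasPrefix(p, "query/") {
		return nil, false
	}
	expr := strings.TrimPrefix(p, "query/")
	if expr == "" || strings.Contains(expr, "/") {
		return nil, false
	}
	tags := strings.Split(expr, "+")
	for _, t := range tags {
		if t == "" {
			return nil, false // reject things like "cats+" or "+cute"
		}
	}
	return tags, true
}

// listDir is roughly what a directory-listing handler would return for a
// virtual path: the names of the files that carry every tag in that path.
func listDir(idx index, p string) []string {
	tags, ok := parseQueryPath(p)
	if !ok {
		return nil
	}
	var names []string
	for name := range idx[tags[0]] {
		match := true
		for _, t := range tags[1:] {
			if !idx[t][name] {
				match = false
				break
			}
		}
		if match {
			names = append(names, name)
		}
	}
	sort.Strings(names) // map iteration order is random; sort for a stable listing
	return names
}

func main() {
	idx := index{
		"cats": {"oscar.jpg": true, "tom.jpg": true},
		"cute": {"oscar.jpg": true, "puppy.jpg": true},
	}
	fmt.Println(listDir(idx, "query/cats+cute/")) // prints [oscar.jpg]
}
```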

---

## Let's Build a Simple Tagger in Go

If we wanted to build a tiny version of this, we'd start with a way to track these relationships. Here is a conceptual implementation using Go and a pair of in-memory maps (a real implementation would more likely use a SQL database).

```go
package main

import (
	"fmt"
	"sort"
)

// FileID identifies a file; a real system might use a path or a content hash.
type FileID string

type TagSystem struct {
	// Maps Tag -> Set of FileIDs
	Tags map[string]map[FileID]bool
	// Maps FileID -> Set of Tags (for quick lookup)
	Files map[FileID]map[string]bool
}

func NewTagSystem() *TagSystem {
	return &TagSystem{
		Tags:  make(map[string]map[FileID]bool),
		Files: make(map[FileID]map[string]bool),
	}
}

// TagFile associates a tag with a file, creating either entry on first use.
func (ts *TagSystem) TagFile(file FileID, tag string) {
	if ts.Tags[tag] == nil {
		ts.Tags[tag] = make(map[FileID]bool)
	}
	if ts.Files[file] == nil {
		ts.Files[file] = make(map[string]bool)
	}
	ts.Tags[tag][file] = true
	ts.Files[file][tag] = true
}

// Query returns the files that carry every given tag, sorted by ID.
func (ts *TagSystem) Query(tags ...string) []FileID {
	if len(tags) == 0 {
		return nil
	}

	// Start with the first tag's files
	results := make(map[FileID]bool)
	for f := range ts.Tags[tags[0]] {
		results[f] = true
	}

	// Intersect with subsequent tags (AND logic)
	for _, tag := range tags[1:] {
		for f := range results {
			if !ts.Tags[tag][f] {
				delete(results, f)
			}
		}
	}

	var final []FileID
	for f := range results {
		final = append(final, f)
	}
	// Map iteration order is random, so sort for stable output.
	sort.Slice(final, func(i, j int) bool { return final[i] < final[j] })
	return final
}

func main() {
	tfs := NewTagSystem()

	tfs.TagFile("vacation_01.jpg", "2024")
	tfs.TagFile("vacation_01.jpg", "beach")
	tfs.TagFile("work_notes.pdf", "2024")
	tfs.TagFile("work_notes.pdf", "boring")

	fmt.Println("Files from 2024 at the beach:", tfs.Query("2024", "beach"))
	// Output: Files from 2024 at the beach: [vacation_01.jpg]
}
```

---

## The Trade-offs

Tag file systems sound like paradise, so why aren't we all using them as our primary file system?

1. **The "Clean Room" Problem:** Hierarchies are low-maintenance. You just throw a file in a folder. Tags require **discipline**. If you don't tag a file, no tag query will ever return it, and it vanishes into a black hole.
2. **Standardization:** There is no "Tagging Standard." If you move your files from Linux (TMSU) to macOS, tags kept in a sidecar database don't come with you unless they are also embedded in the file metadata (like EXIF or ID3 tags).
3. **Performance:** Querying a database with 10 million files and complex tag intersections can be slower than a simple directory lookup.
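
To make the performance point a little more concrete, here is one common way to keep tag intersections cheap in an in-memory index: walk the rarest tag's file set and only probe the others. This is a sketch under assumptions, not how any particular tool does it. The `tagIndex` type and `intersect` function are hypothetical names, and a SQL engine would make its own planning decisions.

```go
package main

import (
	"fmt"
	"sort"
)

// tagIndex maps a tag to the set of files carrying it (hypothetical type).
type tagIndex map[string]map[string]bool

// intersect returns the files that carry every given tag. It iterates over
// the smallest tag set and probes the rest, so the work scales with the
// rarest tag rather than the most common one.
func intersect(idx tagIndex, tags ...string) []string {
	if len(tags) == 0 {
		return nil
	}

	// Copy the tags so the caller's slice is not reordered, then sort
	// them by how many files each one matches, rarest first.
	ordered := append([]string(nil), tags...)
	sort.Slice(ordered, func(i, j int) bool {
		return len(idx[ordered[i]]) < len(idx[ordered[j]])
	})

	var out []string
	for f := range idx[ordered[0]] {
		inAll := true
		for _, t := range ordered[1:] {
			if !idx[t][f] {
				inAll = false
				break
			}
		}
		if inAll {
			out = append(out, f)
		}
	}
	sort.Strings(out) // stable output despite random map iteration order
	return out
}

func main() {
	idx := tagIndex{
		"2024":  {"a.jpg": true, "b.jpg": true, "c.pdf": true},
		"beach": {"a.jpg": true},
	}
	fmt.Println(intersect(idx, "2024", "beach")) // prints [a.jpg]
}
```

With a popular tag like `2024` and a rare one like `beach`, the loop visits one candidate instead of three. The gap grows as the popular tag does.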

## Summary

Tag File Systems represent a move from **location-based** computing to **meaning-based** computing. While we might not be ready to ditch folders entirely, adding a tagging layer to your workflow (especially for large media libraries or research papers) can save you from the "Where did I put that?" nightmare.