Commit b1de6b2

Merge branch 'jk/fast-export-anonymize'

Sometimes users want to report a bug they experience on their
repository, but they are not at liberty to share the contents of
the repository. "fast-export" was taught an "--anonymize" option
to replace blob contents, names of people and paths and log
messages with bland and simple strings to help them.

* jk/fast-export-anonymize:
  docs/fast-export: explain --anonymize more completely
  teach fast-export an --anonymize option
2 parents d9dd4ce + 75d3d65 commit b1de6b2

3 files changed: +462 -11 lines changed

Documentation/git-fast-export.txt

Lines changed: 61 additions & 0 deletions
@@ -105,6 +105,11 @@ marks the same across runs.
 in the commit (as opposed to just listing the files which are
 different from the commit's first parent).
 
+--anonymize::
+Anonymize the contents of the repository while still retaining
+the shape of the history and stored tree. See the section on
+`ANONYMIZING` below.
+
 --refspec::
 Apply the specified refspec to each ref exported. Multiple of them can
 be specified.
@@ -141,6 +146,62 @@ referenced by that revision range contains the string
 'refs/heads/master'.
 
 
+ANONYMIZING
+-----------
+
+If the `--anonymize` option is given, git will attempt to remove all
+identifying information from the repository while still retaining enough
+of the original tree and history patterns to reproduce some bugs. The
+goal is that a git bug which is found on a private repository will
+persist in the anonymized repository, and the latter can be shared with
+git developers to help solve the bug.
+
+With this option, git will replace all refnames, paths, blob contents,
+commit and tag messages, names, and email addresses in the output with
+anonymized data. Two instances of the same string will be replaced
+equivalently (e.g., two commits with the same author will have the same
+anonymized author in the output, but bear no resemblance to the original
+author string). The relationship between commits, branches, and tags is
+retained, as well as the commit timestamps (but the commit messages and
+refnames bear no resemblance to the originals). The relative makeup of
+the tree is retained (e.g., if you have a root tree with 10 files and 3
+trees, so will the output), but their names and the contents of the
+files will be replaced.
+
+If you think you have found a git bug, you can start by exporting an
+anonymized stream of the whole repository:
+
+---------------------------------------------------
+$ git fast-export --anonymize --all >anon-stream
+---------------------------------------------------
+
+Then confirm that the bug persists in a repository created from that
+stream (many bugs will not, as they really do depend on the exact
+repository contents):
+
+---------------------------------------------------
+$ git init anon-repo
+$ cd anon-repo
+$ git fast-import <../anon-stream
+$ ... test your bug ...
+---------------------------------------------------
+
+If the anonymized repository shows the bug, it may be worth sharing
+`anon-stream` along with a regular bug report. Note that the anonymized
+stream compresses very well, so gzipping it is encouraged. If you want
+to examine the stream to see that it does not contain any private data,
+you can peruse it directly before sending. You may also want to try:
+
+---------------------------------------------------
+$ perl -pe 's/\d+/X/g' <anon-stream | sort -u | less
+---------------------------------------------------
+
+which shows all of the unique lines (with numbers converted to "X", to
+collapse "User 0", "User 1", etc into "User X"). This produces a much
+smaller output, and it is usually easy to quickly confirm that there is
+no private data in the stream.
+
+
 Limitations
 -----------
 
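The workflow described in the new ANONYMIZING section can be sketched as a few shell helpers. This is a minimal sketch, not part of the commit: the function names (`export_anonymized`, `import_stream`, `collapse_numbers`, `consistent_labels`) are hypothetical, `sed` stands in for the `perl` one-liner from the documentation, and `consistent_labels` only illustrates the idea that identical input strings receive identical replacements. It makes no claim about how git implements the mapping internally.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the commands shown in the documentation.
# None of these function names exist in git itself.

# Export an anonymized stream of every ref in the repository at $1
# into the file $2.
export_anonymized() {
	git -C "$1" fast-export --anonymize --all >"$2"
}

# Create a scratch repository at $2 and load the stream file $1 into it.
import_stream() {
	git init "$2" &&
	git -C "$2" fast-import <"$1"
}

# Read a stream on stdin, replace every run of ASCII digits with "X",
# and print each distinct line once (sed substitute for the perl
# one-liner in the documentation).
collapse_numbers() {
	LC_ALL=C sed 's/[0-9][0-9]*/X/g' | LC_ALL=C sort -u
}

# Illustration only: give each distinct input line a stable label, so
# that repeated inputs map to the same output ("User 0", "User 1", ...).
consistent_labels() {
	awk '{ if (!($0 in seen)) seen[$0] = n++; print "User " seen[$0] }'
}

# Example usage (paths are placeholders):
#   export_anonymized /path/to/private-repo anon-stream
#   import_stream anon-stream anon-repo
#   collapse_numbers <anon-stream | less
#   gzip anon-stream
```

Keeping the helpers as functions leaves the review step (`collapse_numbers`) usable on any stream file without touching a repository, which is convenient when checking a stream received from someone else.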