|
| 1 | +At the core level, git is character encoding agnostic. |
| 2 | + |
| 3 | + - The pathnames recorded in the index and in the tree objects |
| 4 | + are treated as uninterpreted sequences of non-NUL bytes. |
| 5 | + What readdir(2) returns is what is recorded and compared |
| 6 | + with the data git keeps track of, which in turn is expected |
| 7 | + to be what lstat(2) and creat(2) accept. There is no such |
| 8 | + thing as pathname encoding translation. |
| 9 | + |
| 10 | + - The contents of the blob objects are uninterpreted sequences |
| 11 | + of bytes. There is no encoding translation at the core |
| 12 | + level. |
| 13 | + |
| 14 | + - The commit log messages are uninterpreted sequences of non-NUL |
| 15 | + bytes. |
| 16 | + |
| 17 | +Although we encourage encoding commit log messages in UTF-8, |
| 18 | +both the core and git Porcelain are designed not to |
| 19 | +force UTF-8 on projects. If all participants of a particular |
| 20 | +project find it more convenient to use legacy encodings, git |
| 21 | +does not forbid it. However, there are a few things to keep in |
| 22 | +mind. |
| 23 | + |
| 24 | +. `git-commit-tree` (hence, `git-commit` which uses it) issues |
| 25 | + a warning if the commit log message given to it does not look |
| 26 | + like a valid UTF-8 string, unless you explicitly say your |
| 27 | + project uses a legacy encoding. The way to say this is to |
| 28 | + set `core.commitencoding` in the `.git/config` file, like this: |
| 29 | ++ |
| 30 | +------------ |
| 31 | +[core] |
| 32 | + commitencoding = ISO-8859-1 |
| 33 | +------------ |
| 34 | ++ |
| 35 | +Commit objects created with the above setting record the value |
| 36 | +of `core.commitencoding` in their `encoding` header. This is to |
| 37 | +help other people who look at them later. Lack of this header |
| 38 | +implies that the commit log message is encoded in UTF-8. |
| 39 | + |
| 40 | +. `git-log`, `git-show` and friends look at the `encoding` |
| 41 | + header of a commit object, and try to re-code the log |
| 42 | + message into UTF-8 unless otherwise specified. You can |
| 43 | + specify the desired output encoding with |
| 44 | + `core.logoutputencoding` in the `.git/config` file, like this: |
| 45 | ++ |
| 46 | +------------ |
| 47 | +[core] |
| 48 | + logoutputencoding = ISO-8859-1 |
| 49 | +------------ |
| 50 | ++ |
| 51 | +If you do not have this configuration variable, the value of |
| 52 | +`core.commitencoding` is used instead. |
| 53 | + |
| 54 | +Note that we deliberately chose not to re-code the commit log |
| 55 | +message when a commit is made to force UTF-8 at the commit |
| 56 | +object level, because re-coding to UTF-8 is not necessarily a |
| 57 | +reversible operation. |
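
The commit-time check and the display-time re-coding described in this patch can be sketched in Python. This is only an illustration under stated assumptions, not git's implementation: the function names `looks_like_utf8`, `resolve_output_encoding` and `recode_for_display` are hypothetical, and returning the raw bytes when conversion fails is a choice made for this sketch rather than something the patch specifies.

```python
from typing import Optional


def looks_like_utf8(message: bytes) -> bool:
    """Return True if the raw log message decodes cleanly as UTF-8."""
    try:
        message.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def resolve_output_encoding(log_output_encoding: Optional[str],
                            commit_encoding: Optional[str]) -> str:
    """Pick the display encoding: logoutputencoding if set, else
    commitencoding if set, else UTF-8."""
    return log_output_encoding or commit_encoding or "utf-8"


def recode_for_display(message: bytes,
                       encoding_header: Optional[str],
                       output_encoding: str) -> bytes:
    """Re-code a stored log message for display; the stored bytes are untouched."""
    # A missing `encoding` header implies the message is UTF-8.
    source_encoding = encoding_header or "utf-8"
    try:
        return message.decode(source_encoding).encode(output_encoding)
    except (UnicodeError, LookupError):
        # Assumption of this sketch: show the raw bytes if conversion fails
        # or the encoding name is unknown.
        return message


# A message written in ISO-8859-1 ("caf" followed by e-acute as byte 0xE9).
legacy = "caf\u00e9".encode("iso-8859-1")
print(looks_like_utf8(legacy))  # False
print(recode_for_display(legacy, "ISO-8859-1",
                         resolve_output_encoding(None, None)))  # b'caf\xc3\xa9'
```

Because the conversion happens only when the message is displayed, the bytes recorded in the commit object stay exactly as the committer wrote them, which is the property the note above is protecting.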