
Commit 3bece2f

archive/tar: refactor Reader support for sparse files
This CL is the first step (of two) for adding sparse file support to the Writer.
This CL only refactors the logic of sparse-file handling in the Reader so that
common logic can be easily shared by the Writer.

As a result of this CL, there are some new publicly visible API changes:
	type SparseEntry struct { Offset, Length int64 }
	type Header struct { ...; SparseHoles []SparseEntry }

A new type is defined to represent a sparse fragment and a new field
Header.SparseHoles is added to represent the sparse holes in a file.
The API intentionally represents sparse files using hole fragments,
rather than data fragments, so that the zero value of SparseHoles
naturally represents a normal file (i.e., a file without any holes).
The Reader now populates SparseHoles for sparse files.

It is necessary to export the sparse hole information; otherwise it would
be impossible for the Writer to specify that it is trying to encode
a sparse file, and what it looks like.

Some unexported helper functions were added to common.go:
	func validateSparseEntries(sp []SparseEntry, size int64) bool
	func alignSparseEntries(src []SparseEntry, size int64) []SparseEntry
	func invertSparseEntries(src []SparseEntry, size int64) []SparseEntry

The validation logic that used to be in newSparseFileReader is now moved
to validateSparseEntries so that the Writer can use it in the future.
alignSparseEntries is currently unused by the Reader, but will be used
by the Writer in the future. Since TAR represents sparse files by
only recording the data fragments, we add the invertSparseEntries
function to convert a list of data fragments to a normalized list
of hole fragments (and vice-versa).

Some other high-level changes:
* skipUnread is deleted; most of its logic is moved to the
Discard methods on regFileReader and sparseFileReader.
* readGNUSparsePAXHeaders was rewritten to be simpler.
* regFileReader and sparseFileReader were completely rewritten
with simpler, easier-to-understand logic.
* A bug was fixed in sparseFileReader.Read where it failed to
report an error if the logical size of the file ends before
consuming all of the underlying data.
* The tests for sparse-file support were completely rewritten.

Updates golang#13548

Change-Id: Ic1233ae5daf3b3f4278fe1115d34a90c4aeaf0c2
Reviewed-on: https://go-review.googlesource.com/56771
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
1 parent b2174a1 commit 3bece2f
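
The commit message says that hole fragments and data fragments are interchangeable once the total size is known. The following standalone sketch illustrates that round trip. It copies invertSparseEntries from the common.go diff; `invertCopy` is a hypothetical wrapper added here only so the example does not overwrite its input, and the sample values are the ones used in the doc comment in common.go.

```go
package main

import "fmt"

// SparseEntry mirrors the exported type added by this commit.
type SparseEntry struct{ Offset, Length int64 }

func (s SparseEntry) endOffset() int64 { return s.Offset + s.Length }

// invertSparseEntries is copied from the common.go diff. It reuses the
// backing array of src, so the input slice is overwritten.
func invertSparseEntries(src []SparseEntry, size int64) []SparseEntry {
	dst := src[:0]
	var pre SparseEntry
	for _, cur := range src {
		if cur.Length == 0 {
			continue // Skip empty fragments
		}
		pre.Length = cur.Offset - pre.Offset
		if pre.Length > 0 {
			dst = append(dst, pre) // Only add non-empty fragments
		}
		pre.Offset = cur.endOffset()
	}
	pre.Length = size - pre.Offset // Possibly the only empty fragment
	return append(dst, pre)
}

// invertCopy is a hypothetical helper (not part of the commit) that inverts
// a copy of src so the caller's slice stays intact.
func invertCopy(src []SparseEntry, size int64) []SparseEntry {
	return invertSparseEntries(append([]SparseEntry{}, src...), size)
}

func main() {
	holes := []SparseEntry{{0, 2}, {7, 11}, {21, 4}}

	datas := invertCopy(holes, 25)
	fmt.Println(datas) // [{2 5} {18 3} {25 0}]

	// Inverting again recovers the original holes.
	fmt.Println(invertCopy(datas, 25)) // [{0 2} {7 11} {21 4}]

	// No holes at all: a single data fragment spans the whole file.
	fmt.Println(invertCopy(nil, 25)) // [{0 25}]
}
```

The trailing empty fragment `{25 0}` in the first result is the normalization described in the function's doc comment: the endOffset of the last fragment always equals the total size.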

6 files changed: +1175 -696 lines changed


src/archive/tar/common.go

Lines changed: 149 additions & 0 deletions
@@ -15,6 +15,7 @@ package tar
 import (
 	"errors"
 	"fmt"
+	"math"
 	"os"
 	"path"
 	"strconv"
@@ -30,6 +31,8 @@ var (
 	ErrWriteTooLong = errors.New("tar: write too long")
 	ErrFieldTooLong = errors.New("tar: header field too long")
 	ErrWriteAfterClose = errors.New("tar: write after close")
+	errMissData = errors.New("tar: sparse file references non-existent data")
+	errUnrefData = errors.New("tar: sparse file contains unreferenced data")
 )

 // Header type flags.
@@ -68,6 +71,131 @@ type Header struct {
 	AccessTime time.Time // access time
 	ChangeTime time.Time // status change time
 	Xattrs map[string]string
+
+	// SparseHoles represents a sequence of holes in a sparse file.
+	//
+	// The regions must be sorted in ascending order, not overlap with
+	// each other, and not extend past the specified Size.
+	// The file is sparse if either len(SparseHoles) > 0 or
+	// the Typeflag is set to TypeGNUSparse.
+	SparseHoles []SparseEntry
+}
+
+// SparseEntry represents a Length-sized fragment at Offset in the file.
+type SparseEntry struct{ Offset, Length int64 }
+
+func (s SparseEntry) endOffset() int64 { return s.Offset + s.Length }
+
+// A sparse file can be represented as either a sparseDatas or a sparseHoles.
+// As long as the total size is known, they are equivalent and one can be
+// converted to the other form and back. The various tar formats with sparse
+// file support represent sparse files in the sparseDatas form. That is, they
+// specify the fragments in the file that have data, and treat everything else as
+// having zero bytes. As such, the encoding and decoding logic in this package
+// deals with sparseDatas.
+//
+// However, the external API uses sparseHoles instead of sparseDatas because the
+// zero value of sparseHoles logically represents a normal file (i.e., there are
+// no holes in it). On the other hand, the zero value of sparseDatas implies
+// that the file has no data in it, which is rather odd.
+//
+// As an example, if the underlying raw file contains the 8-byte data:
+//	var compactFile = "abcdefgh"
+//
+// And the sparse map has the following entries:
+//	var spd sparseDatas = []SparseEntry{
+//		{Offset: 2, Length: 5}, // Data fragment for 2..6
+//		{Offset: 18, Length: 3}, // Data fragment for 18..20
+//	}
+//	var sph sparseHoles = []SparseEntry{
+//		{Offset: 0, Length: 2}, // Hole fragment for 0..1
+//		{Offset: 7, Length: 11}, // Hole fragment for 7..17
+//		{Offset: 21, Length: 4}, // Hole fragment for 21..24
+//	}
+//
+// Then the content of the resulting sparse file with a Header.Size of 25 is:
+//	var sparseFile = "\x00"*2 + "abcde" + "\x00"*11 + "fgh" + "\x00"*4
+type (
+	sparseDatas []SparseEntry
+	sparseHoles []SparseEntry
+)
+
+// validateSparseEntries reports whether sp is a valid sparse map.
+// It does not matter whether sp represents data fragments or hole fragments.
+func validateSparseEntries(sp []SparseEntry, size int64) bool {
+	// Validate all sparse entries. These are the same checks as performed by
+	// the BSD tar utility.
+	if size < 0 {
+		return false
+	}
+	var pre SparseEntry
+	for _, cur := range sp {
+		switch {
+		case cur.Offset < 0 || cur.Length < 0:
+			return false // Negative values are never okay
+		case cur.Offset > math.MaxInt64-cur.Length:
+			return false // Integer overflow with large length
+		case cur.endOffset() > size:
+			return false // Region extends beyond the actual size
+		case pre.endOffset() > cur.Offset:
+			return false // Regions cannot overlap and must be in order
+		}
+		pre = cur
+	}
+	return true
+}
+
+// alignSparseEntries mutates src and returns dst where each fragment's
+// starting offset is aligned up to the nearest block edge, and each
+// ending offset is aligned down to the nearest block edge.
+//
+// Even though the Go tar Reader and the BSD tar utility can handle entries
+// with arbitrary offsets and lengths, the GNU tar utility can only handle
+// offsets and lengths that are multiples of blockSize.
+func alignSparseEntries(src []SparseEntry, size int64) []SparseEntry {
+	dst := src[:0]
+	for _, s := range src {
+		pos, end := s.Offset, s.endOffset()
+		pos += blockPadding(+pos) // Round-up to nearest blockSize
+		if end != size {
+			end -= blockPadding(-end) // Round-down to nearest blockSize
+		}
+		if pos < end {
+			dst = append(dst, SparseEntry{Offset: pos, Length: end - pos})
+		}
+	}
+	return dst
+}
+
+// invertSparseEntries converts a sparse map from one form to the other.
+// If the input is sparseHoles, then it will output sparseDatas and vice-versa.
+// The input must have been already validated.
+//
+// This function mutates src and returns a normalized map where:
+//	* adjacent fragments are coalesced together
+//	* only the last fragment may be empty
+//	* the endOffset of the last fragment is the total size
+func invertSparseEntries(src []SparseEntry, size int64) []SparseEntry {
+	dst := src[:0]
+	var pre SparseEntry
+	for _, cur := range src {
+		if cur.Length == 0 {
+			continue // Skip empty fragments
+		}
+		pre.Length = cur.Offset - pre.Offset
+		if pre.Length > 0 {
+			dst = append(dst, pre) // Only add non-empty fragments
+		}
+		pre.Offset = cur.endOffset()
+	}
+	pre.Length = size - pre.Offset // Possibly the only empty fragment
+	return append(dst, pre)
+}
+
+type fileState interface {
+	// Remaining reports the number of remaining bytes in the current file.
+	// This count includes any sparse holes that may exist.
+	Remaining() int64
 }

 // FileInfo returns an os.FileInfo for the Header.
@@ -300,6 +428,17 @@ const (
 	paxUname = "uname"
 	paxXattr = "SCHILY.xattr."
 	paxNone = ""
+
+	// Keywords for GNU sparse files in a PAX extended header.
+	paxGNUSparseNumBlocks = "GNU.sparse.numblocks"
+	paxGNUSparseOffset = "GNU.sparse.offset"
+	paxGNUSparseNumBytes = "GNU.sparse.numbytes"
+	paxGNUSparseMap = "GNU.sparse.map"
+	paxGNUSparseName = "GNU.sparse.name"
+	paxGNUSparseMajor = "GNU.sparse.major"
+	paxGNUSparseMinor = "GNU.sparse.minor"
+	paxGNUSparseSize = "GNU.sparse.size"
+	paxGNUSparseRealSize = "GNU.sparse.realsize"
 )

 // FileInfoHeader creates a partially-populated Header from fi.
@@ -373,6 +512,9 @@ func FileInfoHeader(fi os.FileInfo, link string) (*Header, error) {
 			h.Size = 0
 			h.Linkname = sys.Linkname
 		}
+		if sys.SparseHoles != nil {
+			h.SparseHoles = append([]SparseEntry{}, sys.SparseHoles...)
+		}
 	}
 	if sysStat != nil {
 		return h, sysStat(fi, h)
@@ -390,3 +532,10 @@ func isHeaderOnlyType(flag byte) bool {
 		return false
 	}
 }
+
+func min(a, b int64) int64 {
+	if a < b {
+		return a
+	}
+	return b
+}

src/archive/tar/format.go

Lines changed: 10 additions & 4 deletions
@@ -50,6 +50,12 @@ const (
 	prefixSize = 155 // Max length of the prefix field in USTAR format
 )

+// blockPadding computes the number of bytes needed to pad offset up to the
+// nearest block edge where 0 <= n < blockSize.
+func blockPadding(offset int64) (n int64) {
+	return -offset & (blockSize - 1)
+}
+
 var zeroBlock block

 type block [blockSize]byte
@@ -192,11 +198,11 @@ func (h *headerUSTAR) Prefix() []byte { return h[345:][:155] }

 type sparseArray []byte

-func (s sparseArray) Entry(i int) sparseNode { return (sparseNode)(s[i*24:]) }
+func (s sparseArray) Entry(i int) sparseElem { return (sparseElem)(s[i*24:]) }
 func (s sparseArray) IsExtended() []byte { return s[24*s.MaxEntries():][:1] }
 func (s sparseArray) MaxEntries() int { return len(s) / 24 }

-type sparseNode []byte
+type sparseElem []byte

-func (s sparseNode) Offset() []byte { return s[00:][:12] }
-func (s sparseNode) NumBytes() []byte { return s[12:][:12] }
+func (s sparseElem) Offset() []byte { return s[00:][:12] }
+func (s sparseElem) Length() []byte { return s[12:][:12] }
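
blockPadding relies on a two's-complement identity: for a power-of-two block size, `-offset & (blockSize - 1)` is the distance up to the next block edge, and calling it with a negated argument gives the distance down to the previous edge. The sketch below copies blockPadding and alignSparseEntries from the diffs above to show both uses. The value `blockSize = 512` is an assumption here, since the constant's definition lies outside the lines shown in this commit, and the sample fragments are made up for illustration.

```go
package main

import "fmt"

// blockSize is assumed to be 512, the size of a tar block; its definition
// is not part of the hunks shown in this commit.
const blockSize = 512

// SparseEntry mirrors the exported type added by this commit.
type SparseEntry struct{ Offset, Length int64 }

func (s SparseEntry) endOffset() int64 { return s.Offset + s.Length }

// blockPadding is copied from the format.go diff.
func blockPadding(offset int64) (n int64) {
	return -offset & (blockSize - 1)
}

// alignSparseEntries is copied from the common.go diff. It overwrites src.
func alignSparseEntries(src []SparseEntry, size int64) []SparseEntry {
	dst := src[:0]
	for _, s := range src {
		pos, end := s.Offset, s.endOffset()
		pos += blockPadding(+pos) // Round-up to nearest blockSize
		if end != size {
			end -= blockPadding(-end) // Round-down to nearest blockSize
		}
		if pos < end {
			dst = append(dst, SparseEntry{Offset: pos, Length: end - pos})
		}
	}
	return dst
}

func main() {
	for _, off := range []int64{0, 1, 512, 513, 1000} {
		fmt.Println(off, blockPadding(off)) // padding: 0, 511, 0, 511, 24
	}

	// Illustrative fragments in a 2000-byte file.
	src := []SparseEntry{
		{Offset: 10, Length: 20},    // Too small to contain a whole block
		{Offset: 100, Length: 1000}, // Shrinks to 512..1023
		{Offset: 1500, Length: 500}, // Ends at size, so only the start moves
	}
	fmt.Println(alignSparseEntries(src, 2000)) // [{512 512} {1536 464}]
}
```

Aligning always shrinks a fragment, so a fragment that does not cover at least one whole block disappears. The exception for `end == size` lets the final fragment run to the end of the file even when the size is not a multiple of the block size.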
