Description
Background and motivation
In the initial Tar proposal, the assumption was that the PAX format allowed one special entry, known as Global Extended Attributes (GEA from now on), at the beginning of the archive, to override the extended attributes of all the subsequent entries in the archive. This was incorrect.
The FreeBSD tar spec does not explain much about the GEA entries, except that they exist.
Both the OpenGroup pax manual and the GNU tar manual explain the format of the name in the GEA entry, which is $TMPDIR/GlobalHead.%p.%n, and describe the suffix number as:
%n An integer that represents the
sequence number of the global extended
header record in the archive, starting
at 1.
But there is no mention of when to expect more than 1 entry and what they mean.
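To make the name template concrete, here is a minimal sketch of how a writer might expand $TMPDIR/GlobalHead.%p.%n. The class and method names are hypothetical and are not part of the proposal; the temporary directory, process ID and sequence number are passed in by the caller so the helper stays deterministic.

```csharp
using System;

public static class GlobalHeadNameSketch
{
    // Expands the "$TMPDIR/GlobalHead.%p.%n" template.
    // Hypothetical helper: name and signature are illustrative only.
    public static string Build(string tempDir, int processId, int sequenceNumber)
    {
        if (sequenceNumber < 1)
        {
            // The manuals state that the sequence number starts at 1.
            throw new ArgumentOutOfRangeException(nameof(sequenceNumber), "Sequence numbers start at 1.");
        }

        // Join with '/' explicitly instead of using a platform-specific separator.
        return $"{tempDir.TrimEnd('/')}/GlobalHead.{processId}.{sequenceNumber}";
    }
}
```

For example, GlobalHeadNameSketch.Build("/tmp", 1234, 1) returns "/tmp/GlobalHead.1234.1", and the second GEA entry written by the same process would end in ".2".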
Then I recently found this spec: IBM z/OS 2.5.0 pax manual, which has a clear and detailed description of how the GEA entry should work:
g
Represents global extended header records for the following files in the archive.
[...]
Each value shall affect all subsequent files that do not override that value in their own extended header record and until another global extended header record is reached that provides another value for the same field.
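The rule quoted above can be sketched as a small simulation that does not touch the Tar APIs at all. The Record type and the Resolve method are hypothetical names introduced only for this illustration, and the sketch assumes unique file names and ignores any other pax details.

```csharp
using System;
using System.Collections.Generic;

public static class GeaResolutionSketch
{
    // Simplified archive record: a global header (IsGlobal == true) or a
    // regular file carrying its own extended attributes. Hypothetical type.
    public sealed class Record
    {
        public Record(string name, bool isGlobal, Dictionary<string, string> attributes)
        {
            Name = name;
            IsGlobal = isGlobal;
            Attributes = attributes;
        }

        public string Name { get; }
        public bool IsGlobal { get; }
        public Dictionary<string, string> Attributes { get; }
    }

    // Returns the effective attributes of each regular file, keyed by file name.
    public static Dictionary<string, Dictionary<string, string>> Resolve(IEnumerable<Record> records)
    {
        var global = new Dictionary<string, string>();
        var result = new Dictionary<string, Dictionary<string, string>>();

        foreach (Record record in records)
        {
            if (record.IsGlobal)
            {
                // A later global header replaces only the fields it provides.
                foreach (KeyValuePair<string, string> pair in record.Attributes)
                {
                    global[pair.Key] = pair.Value;
                }
                continue;
            }

            // Start from the current global values, then let the file's own
            // extended header override them.
            var effective = new Dictionary<string, string>(global);
            foreach (KeyValuePair<string, string> pair in record.Attributes)
            {
                effective[pair.Key] = pair.Value;
            }
            result[record.Name] = effective;
        }

        return result;
    }
}
```

In this model, a file that follows two global headers sees the second header's value for any field both headers set, and the first header's value for fields the second one leaves alone.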
API Proposal
New:
namespace System.Formats.Tar
{
public sealed partial class PaxGlobalExtendedAttributesTarEntry : System.Formats.Tar.PosixTarEntry
{
public PaxGlobalExtendedAttributesTarEntry(System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, string>> globalExtendedAttributes) { }
public System.Collections.Generic.IReadOnlyDictionary<string, string> GlobalExtendedAttributes { get { throw null; } }
}
}
Remove:
public sealed partial class TarReader : System.IDisposable
{
- public System.Collections.Generic.IReadOnlyDictionary<string, string>? GlobalExtendedAttributes { get { throw null; } }
}
Modify:
public sealed partial class TarWriter : System.IDisposable
{
- public TarWriter(System.IO.Stream archiveStream, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, string>>? globalExtendedAttributes = null, bool leaveOpen = false) { }
+ public TarWriter(System.IO.Stream archiveStream, bool leaveOpen = false) { }
}
API Usage
Before
We could only add one GEA entry:
// Write
Dictionary<string, string> attributes = new();
attributes["SomeAttributeName"] = "I'm an extended attribute!";
using MemoryStream ms = new();
using (TarWriter writer = new(ms, attributes, leaveOpen: true))
{
// Add some more entries if desired
}
ms.Position = 0;
using TarReader reader = new(ms, leaveOpen: false);
reader.GetNextEntry(); // Advance the reader to detect format and GEA entry
// Access the values of the single GEA
Console.WriteLine(reader.GlobalExtendedAttributes["SomeAttributeName"]); // "I'm an extended attribute!"
After
With this proposed change, we could now add multiple GEA entries:
// Write
Dictionary<string, string> attributes1 = new();
attributes1["attr1"] = "I'm extended attribute 1!";
PaxGlobalExtendedAttributesTarEntry gea1 = new(attributes1);
Dictionary<string, string> attributes2 = new();
attributes2["attr2"] = "I'm extended attribute 2!";
PaxGlobalExtendedAttributesTarEntry gea2 = new(attributes2);
using MemoryStream ms = new();
using (TarWriter writer = new(ms, leaveOpen: true)) // Default format is PAX for this constructor
{
writer.WriteEntry(gea1);
// Add some more entries if desired, they'll be affected by gea1
writer.WriteEntry(gea2);
// Add some more entries if desired, they'll be affected by gea2
}
ms.Position = 0;
using TarReader reader = new(ms, leaveOpen: false);
PaxGlobalExtendedAttributesTarEntry readGea1 = reader.GetNextEntry() as PaxGlobalExtendedAttributesTarEntry;
Console.WriteLine(readGea1.GlobalExtendedAttributes["attr1"]); // "I'm extended attribute 1!"
// Multiple calls of GetNextEntry for the other entries, until reaching the next GEA entry
PaxGlobalExtendedAttributesTarEntry readGea2 = reader.GetNextEntry() as PaxGlobalExtendedAttributesTarEntry;
Console.WriteLine(readGea2.GlobalExtendedAttributes["attr2"]); // "I'm extended attribute 2!"
Alternative Designs
Reuse PaxTarEntry
We could avoid adding a new class to represent a GEA entry, and instead reuse the existing PaxTarEntry class. But there's a problem: it would be confusing to create a TarEntryType.GlobalExtendedAttributes entry, because the constructor expects an entryName argument, and in a GEA entry the name is created internally by TarWriter: it depends on the platform's $TMPDIR, on the process ID, and on the current GEA entry number, which is stored internally by TarWriter.
Having the entryName argument isn't really necessary, since we expose the name in its own property with a getter and a setter. So there's a clean way of reusing PaxTarEntry if we make the following modifications:
- None of the constructors should take an entryName as an argument, and the user should set it manually later. If the user attempts to pass an entry without a name to TarWriter.WriteEntry, an exception is thrown, except if the entry is of type TarEntryType.GlobalExtendedAttributes, because TarWriter is in charge of writing the name.
- The PaxTarEntry.Name property would throw if the user attempts to set it on a GEA entry.
public sealed partial class GnuTarEntry : System.Formats.Tar.PosixTarEntry
{
- public GnuTarEntry(System.Formats.Tar.TarEntryType entryType, string entryName) { }
+ public GnuTarEntry(System.Formats.Tar.TarEntryType entryType) { }
}
public sealed partial class PaxTarEntry : System.Formats.Tar.PosixTarEntry
{
- public PaxTarEntry(System.Formats.Tar.TarEntryType entryType, string entryName) { }
+ public PaxTarEntry(System.Formats.Tar.TarEntryType entryType) { }
- public PaxTarEntry(System.Formats.Tar.TarEntryType entryType, string entryName, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, string>> extendedAttributes) { }
+ public PaxTarEntry(System.Formats.Tar.TarEntryType entryType, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, string>> extendedAttributes) { }
}
public sealed partial class UstarTarEntry : System.Formats.Tar.PosixTarEntry
{
- public UstarTarEntry(System.Formats.Tar.TarEntryType entryType, string entryName) { }
+ public UstarTarEntry(System.Formats.Tar.TarEntryType entryType) { }
}
public sealed partial class V7TarEntry : System.Formats.Tar.TarEntry
{
- public V7TarEntry(System.Formats.Tar.TarEntryType entryType, string entryName) { }
+ public V7TarEntry(System.Formats.Tar.TarEntryType entryType) { }
}
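The two rules of this alternative can be sketched with stand-in types. Entry, EntryType and ValidateBeforeWrite below are hypothetical and only mimic the behavior described above; they are not the real PaxTarEntry or TarWriter members, and the exception types are an assumption.

```csharp
using System;

public static class ReusedEntrySketch
{
    public enum EntryType { RegularFile, GlobalExtendedAttributes }

    // Hypothetical stand-in for an entry whose constructor takes no name.
    public sealed class Entry
    {
        private string _name = string.Empty;

        public Entry(EntryType type) => Type = type;

        public EntryType Type { get; }

        public string Name
        {
            get => _name;
            set
            {
                // Rule 2: the name of a GEA entry cannot be set by the user.
                if (Type == EntryType.GlobalExtendedAttributes)
                {
                    throw new InvalidOperationException("The writer assigns the name of a GEA entry.");
                }
                _name = value;
            }
        }
    }

    // Hypothetical stand-in for the check a writer would run before writing.
    public static void ValidateBeforeWrite(Entry entry)
    {
        // Rule 1: every entry except a GEA entry needs a user-supplied name.
        if (entry.Type != EntryType.GlobalExtendedAttributes && string.IsNullOrEmpty(entry.Name))
        {
            throw new ArgumentException("Entry name must be set before writing.", nameof(entry));
        }
    }
}
```

The sketch shows the cost of this design: a name that used to be required by the constructor becomes a check that can only fail when the entry is written.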
Discarded design
I also considered not adding a new entry type, and instead writing the dictionary directly:
Dictionary<string, string> attributes = new();
attributes["hello"] = "world";
writer.WriteGlobalExtendedAttributes(attributes);
But then how should we give the user the GEA entries in a reader? GetNextEntry returns a TarEntry. Having a dictionary of dictionaries that could hold the GEA dictionaries would be too messy and confusing.
Risks
Low. The APIs are new in 7.0, so there is still time to improve them.