[API Proposal]: Allow multiple Global Extended Attributes entries in Tar archives with PAX format #69935

@carlossanlop

Background and motivation

In the initial Tar proposal, the assumption was that the PAX format allowed a single special entry, known as Global Extended Attributes (GEA from now on), at the beginning of the archive, to override the extended attributes of all the subsequent entries in the archive. This was incorrect.

The FreeBSD tar spec does not explain much about the GEA entries, except that they exist.

Both the OpenGroup pax manual and the GNU tar manual explain the format of the name in the GEA entry, which is $TMPDIR/GlobalHead.%p.%n, and describe the suffix number as:

  %n                 An integer that represents the
                        sequence number of the global extended
                        header record in the archive, starting
                        at 1.

But neither manual mentions when to expect more than one entry or what multiple entries mean.
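
For illustration, the name template quoted above can be expanded as in the sketch below. BuildGlobalHeadName is a hypothetical helper, not part of System.Formats.Tar, and it assumes %p is the process ID and %n is a sequence number starting at 1, as described in this issue.

```csharp
using System;
using System.Globalization;

static class GeaNameSketch
{
    // Hypothetical helper (not a System.Formats.Tar API): expands
    // $TMPDIR/GlobalHead.%p.%n for a given temp dir, process ID and sequence number.
    public static string BuildGlobalHeadName(string tmpDir, int processId, int sequenceNumber)
    {
        if (sequenceNumber < 1)
        {
            // The quoted manuals state that the sequence number starts at 1.
            throw new ArgumentOutOfRangeException(nameof(sequenceNumber));
        }

        // Avoid a doubled separator when tmpDir already ends with '/'.
        string dir = tmpDir.TrimEnd('/');
        string pid = processId.ToString(CultureInfo.InvariantCulture);
        string seq = sequenceNumber.ToString(CultureInfo.InvariantCulture);
        return $"{dir}/GlobalHead.{pid}.{seq}";
    }
}
```

A writer that emits several GEA entries would increment the sequence number for each one, for example BuildGlobalHeadName("/tmp", 1234, 2) for the second entry.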

Then I recently found this spec: IBM z/OS 2.5.0 pax manual, which has a clear and detailed description of how the GEA entry should work:

g
Represents global extended header records for the following files in the archive.
[...]
Each value shall affect all subsequent files that do not override that value in their own extended header record and until another global extended header record is reached that provides another value for the same field.
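
The rule quoted above can be sketched with plain dictionaries. This is only an illustration of the override order, not how TarReader or TarWriter is implemented; ApplyGlobalHeader and Resolve are hypothetical names.

```csharp
using System.Collections.Generic;

static class GeaSemanticsSketch
{
    // Hypothetical helper: a new global header replaces the values of the
    // fields it mentions and leaves every other global value in effect.
    public static void ApplyGlobalHeader(
        Dictionary<string, string> currentGlobals,
        IEnumerable<KeyValuePair<string, string>> globalHeader)
    {
        foreach (KeyValuePair<string, string> kvp in globalHeader)
        {
            currentGlobals[kvp.Key] = kvp.Value;
        }
    }

    // Hypothetical helper: the effective attributes of a file are the current
    // globals, overridden by the file's own extended header record.
    public static Dictionary<string, string> Resolve(
        IReadOnlyDictionary<string, string> currentGlobals,
        IReadOnlyDictionary<string, string> ownAttributes)
    {
        var effective = new Dictionary<string, string>();
        foreach (KeyValuePair<string, string> kvp in currentGlobals)
        {
            effective[kvp.Key] = kvp.Value;
        }
        foreach (KeyValuePair<string, string> kvp in ownAttributes)
        {
            effective[kvp.Key] = kvp.Value;
        }
        return effective;
    }
}
```

For example, after a first global header sets mtime=1 and uname=a, and a second one sets uname=b, a file whose own header sets mtime=5 resolves to mtime=5 and uname=b.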

API Proposal

New:

namespace System.Formats.Tar
{
    public sealed partial class PaxGlobalExtendedAttributesTarEntry : System.Formats.Tar.PosixTarEntry
    {
        public PaxGlobalExtendedAttributesTarEntry(System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, string>> globalExtendedAttributes) { }
        public System.Collections.Generic.IReadOnlyDictionary<string, string> GlobalExtendedAttributes { get { throw null; } }
    }
}

Remove:

    public sealed partial class TarReader : System.IDisposable
    {
-        public System.Collections.Generic.IReadOnlyDictionary<string, string>? GlobalExtendedAttributes { get { throw null; } }
    }

Modify:

    public sealed partial class TarWriter : System.IDisposable
    {
-        public TarWriter(System.IO.Stream archiveStream, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, string>>? globalExtendedAttributes = null, bool leaveOpen = false) { }
+        public TarWriter(System.IO.Stream archiveStream, bool leaveOpen = false) { }
    }

API Usage

Before

We could only add one GEA entry:

// Write
Dictionary<string, string> attributes = new();
attributes["SomeAttributeName"] = "I'm an extended attribute!";

using MemoryStream ms = new();
using (TarWriter writer = new(ms, attributes, leaveOpen: true))
{
   // Add some more entries if desired
}

ms.Position = 0;
using TarReader reader = new(ms, leaveOpen: false);
reader.GetNextEntry(); // Advance the reader to detect format and GEA entry
// Access the values of the single GEA
Console.WriteLine(reader.GlobalExtendedAttributes["SomeAttributeName"]); // "I'm an extended attribute!"

After

With this proposed change, we could now add multiple GEA entries:

// Write
Dictionary<string, string> attributes1 = new();
attributes1["attr1"] = "I'm extended attribute 1!";
PaxGlobalExtendedAttributesTarEntry gea1 = new(attributes1);

Dictionary<string, string> attributes2 = new();
attributes2["attr2"] = "I'm extended attribute 2!";
PaxGlobalExtendedAttributesTarEntry gea2 = new(attributes2);

using MemoryStream ms = new();
using (TarWriter writer = new(ms, leaveOpen: true)) // Default format is PAX for this constructor
{
  writer.WriteEntry(gea1);
  // Add some more entries if desired, they'll be affected by gea1
  writer.WriteEntry(gea2);
  // Add some more entries if desired, they'll be affected by gea2
}

ms.Position = 0;
using TarReader reader = new(ms, leaveOpen: false);
PaxGlobalExtendedAttributesTarEntry readGea1 = reader.GetNextEntry() as PaxGlobalExtendedAttributesTarEntry;
Console.WriteLine(readGea1.GlobalExtendedAttributes["attr1"]); // "I'm extended attribute 1!"

// Multiple calls of GetNextEntry for the other entries, until reaching the next GEA entry

PaxGlobalExtendedAttributesTarEntry readGea2 = reader.GetNextEntry() as PaxGlobalExtendedAttributesTarEntry;
Console.WriteLine(readGea2.GlobalExtendedAttributes["attr2"]); // "I'm extended attribute 2!"
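
When the position of the GEA entries is not known in advance, a reader loop can pick them out by type. The sketch below assumes the API shape proposed in this issue (PaxGlobalExtendedAttributesTarEntry and its GlobalExtendedAttributes property); CollectGlobalHeaders is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.Formats.Tar;
using System.IO;

static class GeaReaderSketch
{
    // Hypothetical helper: returns the attributes of every GEA entry in
    // the archive, in the order in which the entries appear.
    public static List<IReadOnlyDictionary<string, string>> CollectGlobalHeaders(Stream archive)
    {
        var headers = new List<IReadOnlyDictionary<string, string>>();

        // leaveOpen: true so that the caller keeps ownership of the stream.
        using TarReader reader = new(archive, leaveOpen: true);

        TarEntry? entry;
        while ((entry = reader.GetNextEntry()) is not null)
        {
            // Assumes GEA entries are surfaced as their own entry type, as proposed.
            if (entry is PaxGlobalExtendedAttributesTarEntry gea)
            {
                headers.Add(gea.GlobalExtendedAttributes);
            }
        }

        return headers;
    }
}
```

Pattern matching with `is` also avoids the null dereference that an `as` cast would cause if the entry turned out to be a regular entry.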

Alternative Designs

Reuse PaxTarEntry

We could avoid adding a new class to represent a GEA entry, and instead reuse the existing PaxTarEntry class. But there's a problem: it would be confusing to create a TarEntryType.GlobalExtendedAttributes entry, because the constructor expects an entryName argument, and in a GEA entry the name is created internally by TarWriter. The name depends on the platform's $TMPDIR, on the process ID, and on the current GEA entry number, which is stored internally by TarWriter.

Having the entryName argument isn't strictly necessary, since the name is already exposed through its own property with a getter and a setter. So there's a clean way of reusing PaxTarEntry if we make the following modifications:

  • None of the constructors should take an entryName argument; the user would set the name later through the property. If the user passes an entry without a name to TarWriter.WriteEntry, an exception is thrown, unless the entry is of type TarEntryType.GlobalExtendedAttributes, because TarWriter is in charge of writing that name.
  • The PaxTarEntry.Name property setter would throw if the user attempts to set it on a GEA entry.

    public sealed partial class GnuTarEntry : System.Formats.Tar.PosixTarEntry
    {
-        public GnuTarEntry(System.Formats.Tar.TarEntryType entryType, string entryName) { }
+        public GnuTarEntry(System.Formats.Tar.TarEntryType entryType) { }
    }
    public sealed partial class PaxTarEntry : System.Formats.Tar.PosixTarEntry
    {
-        public PaxTarEntry(System.Formats.Tar.TarEntryType entryType, string entryName) { }
+        public PaxTarEntry(System.Formats.Tar.TarEntryType entryType) { }
-        public PaxTarEntry(System.Formats.Tar.TarEntryType entryType, string entryName, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, string>> extendedAttributes) { }
+        public PaxTarEntry(System.Formats.Tar.TarEntryType entryType, System.Collections.Generic.IEnumerable<System.Collections.Generic.KeyValuePair<string, string>> extendedAttributes) { }
    }
    public sealed partial class UstarTarEntry : System.Formats.Tar.PosixTarEntry
    {
-        public UstarTarEntry(System.Formats.Tar.TarEntryType entryType, string entryName) { }
+        public UstarTarEntry(System.Formats.Tar.TarEntryType entryType) { }
    }
    public sealed partial class V7TarEntry : System.Formats.Tar.TarEntry
    {
-        public V7TarEntry(System.Formats.Tar.TarEntryType entryType, string entryName) { }
+        public V7TarEntry(System.Formats.Tar.TarEntryType entryType) { }
    }

Discarded design

I also considered not adding a new entry type, and instead writing the dictionary directly:

Dictionary<string, string> attributes = new();
attributes["hello"] = "world";
writer.WriteGlobalExtendedAttributes(attributes);

But then how should we give the user the GEA entries in a reader? GetNextEntry returns a TarEntry. Having a dictionary of dictionaries that could hold the GEA dictionaries would be too messy and confusing.

Risks

Low. The APIs are new in 7.0, so there is still time to improve them.

@bartonjs @jeffhandley @adamsitnik @jozkee @tmds
