Description
While working on incremental updates (see #112) and adding support for deleted objects, I encountered a behavior that may or may not be intended.
When I delete a page from a document, it gets removed from the pages-array as expected.
When checking the output file, I observed that only the page reference was removed from the pages-array; the page itself and all objects it references (e.g. content streams) are still present in the file.
If I understand correctly, the method PdfCrossReferenceTable.Compact() is intended to clean up these objects. Is that true?
At least it would clean up (i.e. remove) the page and the objects referenced by that page, if the pages-array were the only place where the page is referenced.
But a page could be referenced from multiple locations. Some places that come to mind:
- Outlines
- Named Destinations
- Link-Annotations (and Annotations in general, /P entry)
- GoTo Actions
In my case, the page that was not removed from the file was still referenced by (at least) 3 different outlines.
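To illustrate why a remaining outline reference would keep the page alive, here is a minimal sketch on a toy object graph. It assumes that cleanup keeps every object reachable from the catalog, which is my reading of Compact() and not confirmed behavior. The class, the method names and the object numbers are hypothetical and are not PDFsharp types.

```csharp
using System;
using System.Collections.Generic;

// Toy model, NOT PDFsharp: object number -> object numbers it references.
public static class ReachabilitySketch
{
    // Returns every object that can be reached from the root by following references.
    public static HashSet<int> Reachable(IReadOnlyDictionary<int, int[]> refs, int root)
    {
        var seen = new HashSet<int>();
        var stack = new Stack<int>();
        stack.Push(root);
        while (stack.Count > 0)
        {
            int id = stack.Pop();
            if (!seen.Add(id)) continue;                      // already visited
            if (!refs.TryGetValue(id, out var targets)) continue;
            foreach (int target in targets) stack.Push(target);
        }
        return seen;
    }

    // Hypothetical document after the page was removed from the pages-array:
    // 1 = catalog, 2 = pages node, 3 = removed page, 4 = its content stream,
    // 5 = remaining page, 6 = outline item that still points at page 3.
    public static Dictionary<int, int[]> SampleDocument(bool withOutline)
    {
        return new Dictionary<int, int[]>
        {
            [1] = withOutline ? new[] { 2, 6 } : new[] { 2 },
            [2] = new[] { 5 },                                // page 3 is no longer listed here
            [3] = new[] { 4 },
            [4] = Array.Empty<int>(),
            [5] = Array.Empty<int>(),
            [6] = new[] { 3 },
        };
    }
}
```

With the outline present, objects 3 and 4 are still reachable from the catalog, so a reachability-based cleanup would keep them. Without the outline they become unreachable and could be dropped.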
Simple test-case (add it to PdfSharp.Tests.IO.WriterTests):
[Fact]
public void Deleted_Page_Not_Really_Deleted()
{
    var sourceFile = IOUtility.GetAssetsPath("archives/grammar-by-example/GBE/ReferencePDFs/WPF 1.31/Table-Layout.pdf")!;
    var targetFile = Path.Combine(Path.GetTempPath(), "AA-Original.pdf");
    File.Copy(sourceFile, targetFile, true);

    using var fs = File.Open(targetFile, FileMode.Open, FileAccess.Read);
    using var doc = PdfReader.Open(fs, PdfDocumentOpenMode.Modify);
    doc.Pages.RemoveAt(0);

    targetFile = Path.Combine(Path.GetTempPath(), "AA-Deleted.pdf");
    doc.Save(targetFile);
}
Open the file AA-Deleted.pdf and observe that the page and its contents are still present.
Question:
Is this the intended behavior?
Are there other cleanup methods I'm not aware of?
IMHO the methods to remove pages are "high level" methods and the library should take care of the "low level" stuff, including cleaning up after itself to maintain the integrity of the document.
I do understand however, that this might not be an easy issue to solve.
In theory, the library would have to scan the whole document for references to deleted pages and then decide, based on the context (where the reference is found), how to deal with each one:
- delete Outlines and re-link the remaining ones
- delete Annotations
- etc...
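The scan-and-decide idea above could be sketched roughly as follows. This is only an illustration on a simplified, hypothetical object model (the `Obj` record, the `Kind` strings and the `Cleanup` enum are my own names, not PDFsharp API). Only the two decisions listed above are encoded; every other context is flagged for review.

```csharp
using System;
using System.Collections.Generic;

public static class DanglingReferenceSketch
{
    // Hypothetical, simplified stand-in for a PDF object; NOT a PDFsharp type.
    public sealed record Obj(int Id, string Kind, int[] Refs);

    public enum Cleanup { RemoveOutlineAndRelink, RemoveAnnotation, NeedsReview }

    // Maps the context in which the reference was found to a cleanup decision.
    public static Cleanup Decide(string kind) => kind switch
    {
        "Outline" => Cleanup.RemoveOutlineAndRelink,
        "Annot" => Cleanup.RemoveAnnotation,
        _ => Cleanup.NeedsReview,                 // e.g. named destinations, GoTo actions
    };

    // Scans all objects and lists those that still point at the deleted page.
    public static List<(int Id, Cleanup Action)> Plan(IEnumerable<Obj> objects, int deletedPageId)
    {
        var plan = new List<(int Id, Cleanup Action)>();
        foreach (var obj in objects)
        {
            if (Array.IndexOf(obj.Refs, deletedPageId) >= 0)
                plan.Add((obj.Id, Decide(obj.Kind)));
        }
        return plan;
    }
}
```

A real implementation would also have to follow indirect paths (for example a destination array nested inside an action dictionary), which this flat model deliberately leaves out.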