Getting started

Giorgio Bianchini edited this page Jul 26, 2022 · 1 revision

The first step when using MuPDFCore is to create a MuPDFCore.MuPDFContext object that is used internally by the MuPDF library to store various things:

    MuPDFContext context = new MuPDFContext();

This object is IDisposable, so you should always call its Dispose() method once you are done with it (or, better yet, wrap it in a using statement). In most cases, you will only need one instance of MuPDFContext for your whole application.

Amongst other things, MuPDF uses this context to store a cache of "assets" (e.g. images or fonts) that have been used while rendering documents and that may be needed again in future. This requires some memory: by default, the maximum size of this cache store is 256MB. If you want to restrict how much memory can be used, you can change this by providing a long value to the constructor, indicating the size of the store in bytes. A value of 0 means that the store can grow without limit. Furthermore, you can clear the cache completely by using the MuPDFContext.ClearCache method, or partially by using the MuPDFContext.ShrinkCache method.

Once you have obtained a MuPDFContext, you can use it to open a MuPDFDocument. A document can be opened from a file on disk:

    MuPDFDocument document = new MuPDFDocument(context, "path/to/file");

Or from a byte[] array (in this case, you will have to specify the format of the document):

    byte[] data;

    ...

    MuPDFDocument document = new MuPDFDocument(context, data, InputFileTypes.PDF);

Or from a MemoryStream (in this case too, you will have to specify the format of the document):

    MemoryStream stream;
    
    ...
    
    MuPDFDocument document = new MuPDFDocument(context, ref stream, InputFileTypes.PDF);

The MemoryStream is passed with the ref keyword to indicate that the MuPDFDocument will take care of appropriately disposing it once it finishes using it.

A MuPDFDocument is also IDisposable and should be properly disposed of to avoid memory leaks.

Important note: the constructor taking a byte[] and the one taking a MemoryStream do not copy the data bytes before sending them to the native MuPDF library functions. Rather, they pin them in place. This is undesirable because pinned objects interfere with the garbage collector's management of memory (it cannot move them while they are pinned). Therefore, these constructors are only suitable for short-lived objects. If you need to initialise a long-lived document object from memory, you should first copy the data to unmanaged memory and then use one of the constructors that take an IntPtr parameter, e.g.:

    byte[] data;

    ...
    
    //Allocate enough unmanaged memory
    IntPtr ptr = Marshal.AllocHGlobal(data.Length);
    
    //Copy the byte array to unmanaged memory
    Marshal.Copy(data, 0, ptr, data.Length);

    //Wrap the pointer in an IDisposable
    IDisposable dispIntPtr = new DisposableIntPtr(ptr);

    //Create the document
    MuPDFDocument document = new MuPDFDocument(context, ptr, data.Length, InputFileTypes.PDF, ref dispIntPtr);

The DisposableIntPtr class is a wrapper around a pointer that calls Marshal.FreeHGlobal on it once it is disposed. Passing it as the final optional parameter of the MuPDFDocument constructor (again by reference, to indicate that the document takes ownership of the object) makes sure that the memory is properly freed once the document is disposed.
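To make the ownership pattern concrete, here is a minimal sketch of a wrapper that copies a byte[] into unmanaged memory and frees it on disposal. The HGlobalBuffer class and its members are hypothetical names invented for this illustration; this is not the source of DisposableIntPtr, only an example of the same idea built on Marshal.AllocHGlobal and Marshal.FreeHGlobal.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical helper for illustration only; not part of MuPDFCore.
public sealed class HGlobalBuffer : IDisposable
{
    private IntPtr pointer;

    // Address of the unmanaged copy (IntPtr.Zero after disposal).
    public IntPtr Pointer => pointer;

    // Number of bytes copied from the managed array.
    public int Length { get; }

    public bool IsDisposed => pointer == IntPtr.Zero;

    public HGlobalBuffer(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        // Allocate at least one byte so that an empty array still yields a valid pointer.
        pointer = Marshal.AllocHGlobal(Math.Max(data.Length, 1));
        Marshal.Copy(data, 0, pointer, data.Length);
        Length = data.Length;
    }

    public void Dispose()
    {
        // Swap the pointer out atomically so that a second call does not free it again.
        IntPtr toFree = Interlocked.Exchange(ref pointer, IntPtr.Zero);
        if (toFree != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(toFree);
        }
    }
}
```

Under these assumptions, the Pointer and Length of such an object would play the roles of ptr and data.Length in the constructor call above. The sketch has no finaliser, so the memory is only released if Dispose is actually called.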

After having obtained a document, you can do many things with it: for example, you can render a page and save the results to a file on disk, or you can collect multiple pages and combine them in a new document. Code to do this can be found in the Program.cs file of the Demo project.

Furthermore, you can render a page directly to memory:

    byte[] pixelData = document.Render(0, 1, PixelFormats.RGBA);

This method renders page 0 (i.e. the first page of the document) at a 1x resolution (1pt in the document is equivalent to 1px in the image), preserving alpha (transparency) information, and returns the image as an array of the bytes that constitute the pixel data (four bytes per pixel). A variation of this method allows you to supply a rectangular region of the page that you would like to render, rather than the whole page.
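As a rough illustration of what "four bytes per pixel" means when reading the returned array, the sketch below computes the offset of a pixel and reads its four channels. It assumes that the rows are stored top to bottom without padding, that the channels appear in R, G, B, A order (as the name of the pixel format suggests), and that the caller already knows the width of the image in pixels. The RgbaBuffer class and its methods are hypothetical helpers, not part of MuPDFCore.

```csharp
using System;

// Hypothetical helper for illustration only; not part of MuPDFCore.
public static class RgbaBuffer
{
    public const int BytesPerPixel = 4;

    // Offset of the first byte of pixel (x, y), assuming tightly packed rows.
    public static int PixelOffset(int x, int y, int width)
    {
        if (width <= 0) throw new ArgumentOutOfRangeException(nameof(width));
        if (x < 0 || x >= width) throw new ArgumentOutOfRangeException(nameof(x));
        if (y < 0) throw new ArgumentOutOfRangeException(nameof(y));

        // checked: fail loudly instead of wrapping around for very large images.
        return checked((y * width + x) * BytesPerPixel);
    }

    // Reads the four channels of pixel (x, y), assuming R, G, B, A byte order.
    public static (byte R, byte G, byte B, byte A) ReadPixel(byte[] pixelData, int x, int y, int width)
    {
        if (pixelData == null) throw new ArgumentNullException(nameof(pixelData));

        int offset = PixelOffset(x, y, width);
        return (pixelData[offset], pixelData[offset + 1], pixelData[offset + 2], pixelData[offset + 3]);
    }
}
```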

Alternatively, if you already know where the image data should be put (e.g. because you are using some kind of graphics library that lets you manipulate the pixel data of its images), you can use the methods that take an IntPtr destination:

    IntPtr destination;

    ...

    document.Render(0, 1, PixelFormats.RGBA, destination);

In this case, you have to make sure that there is enough memory to hold the resulting image! Otherwise, an AccessViolationException will occur and your program will usually fail catastrophically. Since it may sometimes be hard to determine how much memory a particular image will need (especially because of subtle differences in the rounding routines, which can cause images to be 1px larger or smaller than expected), the GetRenderedSize method is provided, which returns the number of bytes that will be needed to render a certain page. For example:

    //Get the number of bytes that will be necessary to hold the rendered page at the given resolution.
    int sizeInBytes = document.GetRenderedSize(0, 1, PixelFormats.RGBA);

    //Allocate an appropriate amount of memory.
    IntPtr destination = Marshal.AllocHGlobal(sizeInBytes);

    //Again, we use a DisposableIntPtr to make sure that we are freeing the memory when we are done with it.
    using (DisposableIntPtr holder = new DisposableIntPtr(destination))
    {
        //Make sure that all the parameters match those of the call to GetRenderedSize, or the size of the
        //resulting image may be different than expected! Even a translation of 1px could have catastrophic
        //consequences.
        document.Render(0, 1, PixelFormats.RGBA, destination);
    }

Finally, none of these methods is inherently thread-safe! For example, you cannot render multiple pages of the same document (or multiple regions of a single page) simply by making multiple calls to MuPDFDocument.Render in parallel.
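One simple way to stay safe when several threads need rendered pages is to funnel every render call through a single lock, so that only one call runs at a time. The sketch below shows this idea with a hypothetical SerializedRenderer class that wraps any page-rendering delegate; it is not part of MuPDFCore, and it trades away parallel speed-up for correctness.

```csharp
using System;

// Hypothetical helper for illustration only; not part of MuPDFCore.
public sealed class SerializedRenderer
{
    private readonly object gate = new object();
    private readonly Func<int, byte[]> renderPage;

    // renderPage is the (non-thread-safe) function that renders a page by number.
    public SerializedRenderer(Func<int, byte[]> renderPage)
    {
        this.renderPage = renderPage ?? throw new ArgumentNullException(nameof(renderPage));
    }

    // Callers on different threads take turns: at most one render runs at any time.
    public byte[] Render(int pageNumber)
    {
        lock (gate)
        {
            return renderPage(pageNumber);
        }
    }
}
```

With the document from the examples above, the delegate could be something like page => document.Render(page, 1, PixelFormats.RGBA). The lock only protects calls that go through the wrapper, so any other code touching the same document would need to use the same lock.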