Question / Comment: How to view insertPDF progress #640

Closed
sant527 opened this issue Sep 3, 2020 · 11 comments

@sant527

sant527 commented Sep 3, 2020

I looked at the code of insertPDF in fitz.py. It calls:

_fitz.Document_insertPDF(self, docsrc, from_page, to_page, start_at, rotate, links, annots)

I have very large documents of about 100,000 pages. If possible, I would like to see some progress after every 1000 or 5000 inserted pages.

How can I send logging messages from the _fitz.Document_insertPDF function back to the Python code after every 1000 or 5000 inserted pages?

Is there anything I can change in the source code and then recompile?

Otherwise, the alternative is to insert 5000 pages at a time by specifying the start and end page.

@JorjMcKie
Collaborator

This is a C function, so that function would have to be modified.
Alternatively segment your input in appropriate pieces as you indicate.
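The segmenting approach can be sketched in Python. This is a minimal sketch, assuming the PyMuPDF 1.17.x names used in this thread (Document.insertPDF with from_page/to_page, and Document.pageCount); the helpers page_chunks and insert_in_chunks are hypothetical and not part of PyMuPDF.

```python
def page_chunks(first, last, size):
    """Yield inclusive (from_page, to_page) pairs covering first..last."""
    if size < 1:
        raise ValueError("size must be at least 1")
    start = first
    while start <= last:
        end = min(start + size - 1, last)
        yield start, end
        start = end + 1


def insert_in_chunks(dst, src, size=5000, report=print, **kwargs):
    """Append all pages of src to dst, `size` pages per insertPDF() call.

    Hypothetical helper: dst and src are assumed to be PyMuPDF 1.17.x
    Document objects. Extra kwargs (e.g. links=False) go to insertPDF().
    """
    total = src.pageCount
    done = 0
    for from_page, to_page in page_chunks(0, total - 1, size):
        # start_at is left at its default, so each chunk is appended at the end
        dst.insertPDF(src, from_page=from_page, to_page=to_page, **kwargs)
        done += to_page - from_page + 1
        report(f"Inserted {done} of {total} pages.")
    return done
```

Usage would be along the lines of `insert_in_chunks(doc0, doc1, size=5000, links=False, annots=False)`. One trade-off: every insertPDF call runs JM_merge_range, which creates its own graft map, so objects shared between chunks (fonts, images) may be copied more than once; saving with `garbage=4` should merge such duplicates.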

@sant527
Author

sant527 commented Sep 3, 2020

Where can I find the C source file for that function?

@JorjMcKie
Collaborator

JorjMcKie commented Sep 3, 2020

You have to modify fitz_wrap.c. Look for the function JM_merge_range.
Good luck!
After the modification, remember that MuPDF must be installed before you can build PyMuPDF.

@JorjMcKie
Collaborator

What is your system configuration?
I am considering making that modification and sending you a pre-release hotfix 😊.

@sant527
Author

sant527 commented Sep 4, 2020

I am using Arch Linux.

I tried changing JM_merge_range in fitz_wrap.c, but the printf output does not show:

void JM_merge_range(fz_context *ctx, pdf_document *doc_des, pdf_document *doc_src, int spage, int epage, int apage, int rotate, int links, int annots)
{
    int page, afterpage;
    pdf_graft_map *graft_map;
    afterpage = apage;
    graft_map = pdf_new_graft_map(ctx, doc_des);

    fz_try(ctx) {
        if (spage < epage) {
            for (page = spage; page <= epage; page++, afterpage++)
                page_merge(ctx, doc_des, doc_src, page, afterpage, rotate, links, annots, graft_map);
                // CODE ADDED
                // NOTE: the for loop has no braces, so only page_merge() is repeated;
                // this if runs once, after the loop has finished.
                if( page % 500 == 0){
                  printf("CURRENT PAGES %d / %d\n",page,epage);
                  fflush(stdout);
                }
        } else {
            for (page = spage; page >= epage; page--, afterpage++)
                page_merge(ctx, doc_des, doc_src, page, afterpage, rotate, links, annots, graft_map);
                // CODE ADDED (same missing-braces problem as above)
                if( page % 500 == 0){
                  printf("CURRENT PAGES %d / %d\n",page,epage);
                  fflush(stdout);
                }
        }
    }
    fz_always(ctx) {
        pdf_drop_graft_map(ctx, graft_map);
    }
    fz_catch(ctx) {
        fz_rethrow(ctx);
    }
}

This is from PyMuPDF 1.17.5 (https://files.pythonhosted.org/packages/source/P/PyMuPDF/PyMuPDF-1.17.5.tar.gz).

Then I run the following Python script:

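import fitz
doc0 = fitz.open("file1.pdf")  # open file 1 (100,000 pages)
doc1 = fitz.open("file2.pdf")  # open file 2
doc0.insertPDF(doc1, links=False, annots=False)  # append all pages of file 2
doc0.save("merged.pdf")  # save() needs a target file name; "merged.pdf" is a placeholder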

I expected the printf output from the C library to appear. Why is it not working?

@JorjMcKie
Collaborator

You found the right place to modify!
I have already implemented a similar modification, which will be part of the next version. Here is the respective code snippet. I write the output to Python's sys.stdout (via PySys_WriteStdout) to make sure it also shows in environments like IDLE:

void JM_merge_range(fz_context *ctx, pdf_document *doc_des, pdf_document *doc_src, int spage, int epage, int apage, int rotate, int links, int annots, int show_progress)
{
    int page, afterpage;
    pdf_graft_map *graft_map;
    afterpage = apage;
    graft_map = pdf_new_graft_map(ctx, doc_des);
    int counter = 0;  // copied page counter
    int total = fz_absi(epage - spage) + 1;  // total pages to copy

    fz_try(ctx) {
        if (spage < epage) {
            for (page = spage; page <= epage; page++, afterpage++) {
                page_merge(ctx, doc_des, doc_src, page, afterpage, rotate, links, annots, graft_map);
                counter ++;
                if (show_progress > 0 && counter % show_progress == 0) {
                    PySys_WriteStdout("Inserted %i of %i pages.\n", counter, total);
                }
            }
        } else {
            for (page = spage; page >= epage; page--, afterpage++) {
                page_merge(ctx, doc_des, doc_src, page, afterpage, rotate, links, annots, graft_map);
                counter ++;
                if (show_progress > 0 && counter % show_progress == 0) {
                    PySys_WriteStdout("Inserted %i of %i pages.\n", counter, total);
                }
            }
        }
    }

    fz_always(ctx) {
        pdf_drop_graft_map(ctx, graft_map);
    }
    fz_catch(ctx) {
        fz_rethrow(ctx);
    }
}
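To see what this loop reports, here is a Python mirror of its counting logic. It is a sketch for illustration only: progress_lines is a hypothetical name, and the page copying itself is left out. Counting copied pages, rather than testing the page number as in the earlier `page % 500` attempt, makes the messages independent of the start page and of the copy direction.

```python
def progress_lines(spage, epage, show_progress):
    """Return the messages the C loop above would emit (copying omitted)."""
    total = abs(epage - spage) + 1          # same as fz_absi(epage - spage) + 1
    step = 1 if spage < epage else -1       # the C code copies backwards otherwise
    lines = []
    counter = 0
    for _page in range(spage, epage + step, step):
        counter += 1                        # counts copied pages, not page numbers
        if show_progress > 0 and counter % show_progress == 0:
            lines.append(f"Inserted {counter} of {total} pages.")
    return lines
```

For example, copying pages 9 down to 0 with `show_progress=5` yields the same two messages as copying 0 up to 9.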

@sant527
Author

sant527 commented Sep 4, 2020

Thank you. Can I also show the source and destination file names?

For example, for:

doc0.insertPDF(doc1, links=False, annots=False)

With the C code changes, the output now looks like this:

Inserted 1348 of 8138 pages.

Can we show the file names here, like this:

FROM: doc1.name
DES: doc0.name
Inserted 1348 of 8138 pages.

@JorjMcKie
Collaborator

The next version will also show file information.
For now you can do a similar hack in fitz.py, which is generated in parallel to fitz_wrap.c.

  • look for def insertPDF(self, ...)

[Screenshot: def insertPDF(self, ...) in fitz.py, with the insertion point marked by a red dot]

  • At the line marked with a red dot, insert a print statement of your choice. The two file names are in docsrc.name (source file) and self.name (target file).
  • Either file name may be an empty string, in which case that document exists only in memory.
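A minimal sketch of such a print, written as a wrapper around insertPDF instead of an edit to the generated fitz.py. The name insert_with_names and the "(in memory)" placeholder are hypothetical; the sketch relies only on the name attribute and insertPDF described above.

```python
def insert_with_names(dst, src, report=print, **kwargs):
    """Report source and target file names, then call dst.insertPDF().

    Hypothetical helper for PyMuPDF 1.17.x Document objects. An empty
    name means the document exists only in memory. kwargs go to insertPDF().
    """
    src_name = src.name or "(in memory)"
    dst_name = dst.name or "(in memory)"
    report(f"FROM: {src_name}")
    report(f"DES: {dst_name}")
    return dst.insertPDF(src, **kwargs)
```

Keeping the print outside fitz.py means it survives a rebuild or upgrade of PyMuPDF.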

@sant527
Author

sant527 commented Sep 5, 2020

But this prints only once, and the message gets lost among the other output.

In fitz_wrap.c, the function is:

void JM_merge_range(fz_context *ctx, pdf_document *doc_des, pdf_document *doc_src, int spage, int epage, int apage, int rotate, int links, int annots, int show_progress)

I tried doc_des->name and doc_src->name, but the build fails with an error.

How can I show the names in this function?

@JorjMcKie
Collaborator

how to show in this function.

There is no way to do that currently.
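Because the C function cannot see the file names, one Python-side workaround is to label the progress lines as they arrive. The snippet above writes them with PySys_WriteStdout, which targets Python's sys.stdout, so temporarily redirecting sys.stdout can prefix every line with the names. This is a sketch under that assumption; PrefixWriter and labelled_stdout are hypothetical helpers, and plain C printf output would not be affected.

```python
import contextlib
import sys


class PrefixWriter:
    """Minimal text sink that prefixes every complete line with a label."""

    def __init__(self, target, prefix):
        self.target = target
        self.prefix = prefix
        self._buf = ""  # text received after the last newline

    def write(self, text):
        self._buf += text
        while "\n" in self._buf:
            line, self._buf = self._buf.split("\n", 1)
            self.target.write(f"{self.prefix}{line}\n")
        return len(text)

    def flush(self):
        # Emit any unterminated tail, then flush the real stream.
        if self._buf:
            self.target.write(self.prefix + self._buf)
            self._buf = ""
        self.target.flush()


@contextlib.contextmanager
def labelled_stdout(src_name, dst_name, target=None):
    """Redirect sys.stdout so each line is prefixed with both file names."""
    if target is None:
        target = sys.stdout
    label = f"[{src_name or '(in memory)'} -> {dst_name or '(in memory)'}] "
    writer = PrefixWriter(target, label)
    try:
        with contextlib.redirect_stdout(writer):
            yield writer
    finally:
        writer.flush()
```

With the modified JM_merge_range compiled in, usage would look like `with labelled_stdout(doc1.name, doc0.name): doc0.insertPDF(doc1, links=False, annots=False)`.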

@JorjMcKie
Collaborator

New version 1.17.7 is currently being uploaded.
