Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build time appears to be worse than linear in number of pages #75

Closed
jfbu opened this issue May 15, 2023 · 4 comments
Closed

Build time appears to be worse than linear in number of pages #75

jfbu opened this issue May 15, 2023 · 4 comments

Comments

@jfbu
Copy link

jfbu commented May 15, 2023

I am attaching a file, which models a dtx. It includes the source code of mathastext.sty in the implementation part, actually 2 copies of it. The whole thing is divided into 6 big macrocode chunks each of about 840 lines (roughly 17 pages in output). The following command allows to compare compile times when including only 1, or 2, etc..., up to 6 chunks. Sorry syntax is unix only.

for totalchunks in 1 2 3 4 5 6; do rm -f test_tagging_timings.{out,hd,aux,pdf}; echo "$totalchunks chunks will be included"; time lualatex-dev -interaction=batchmode '\def\dotag{'$totalchunks'}\input test_tagging_timings.tex'; echo ""; done

Here is output

1 chunks will be included
This is LuaHBTeX, Version 1.17.0 (TeX Live 2023) 
 restricted system commands enabled.

real	0m15.449s
user	0m15.153s
sys	0m0.286s

2 chunks will be included
This is LuaHBTeX, Version 1.17.0 (TeX Live 2023) 
 restricted system commands enabled.

real	0m33.178s
user	0m32.832s
sys	0m0.331s

3 chunks will be included
This is LuaHBTeX, Version 1.17.0 (TeX Live 2023) 
 restricted system commands enabled.

real	0m58.310s
user	0m57.917s
sys	0m0.369s

4 chunks will be included
This is LuaHBTeX, Version 1.17.0 (TeX Live 2023) 
 restricted system commands enabled.

real	1m30.088s
user	1m29.672s
sys	0m0.397s

5 chunks will be included
This is LuaHBTeX, Version 1.17.0 (TeX Live 2023) 
 restricted system commands enabled.

real	2m18.392s
user	2m17.914s
sys	0m0.446s

6 chunks will be included
This is LuaHBTeX, Version 1.17.0 (TeX Live 2023) 
 restricted system commands enabled.

real	3m27.865s
user	3m27.217s
sys	0m0.572s

Successive deltas are about 15, 17, 25, 31, 48, 69 seconds, next deltas are 2, 8, 6, 17, 21, so it looks as if build time is at least quadratic in number of pages. Here the total document ends up having 102 pages and in previous context where I observed this slowing down it becomes yet slower beyond that number of pages. I will after posting this extend to 9 chunks.

For context the whole document with 6 chunks needs about 2.5s with no tagging

 time lualatex-dev -interaction=batchmode test_tagging_timings.tex
This is LuaHBTeX, Version 1.17.0 (TeX Live 2023) 
 restricted system commands enabled.

real	0m2.699s
user	0m2.471s
sys	0m0.220s

I added dummy .txt extension to file else I could not attach it here. (I hope upload of .txt did not change contents).

test_tagging_timings.tex.txt

I am using lualatex as I understand else tagpdf uses label/ref system which may be source of other bottleneck.

EDIT: sorry I forgot to use a4paper or like option to document class, but this is peripheral. I launched extended run up to 9 chunks and I am currently waiting for the process to complete.

EDIT: I am doing this on a single-CPU old computer from 2011-2012. Your timings should be faster.

@jfbu
Copy link
Author

jfbu commented May 15, 2023

Posting here the file which allows up to 9 big macrocode chunks:

test_tagging_timings.tex.txt

@jfbu
Copy link
Author

jfbu commented May 15, 2023

for totalchunks in 1 2 3 4 5 6 7 8 9; do rm -f test_tagging_timings.{out,hd,aux,pdf}; echo "$totalchunks chunks will be included"; time lualatex-dev -interaction=batchmode '\def\dotag{'$totalchunks'}\input test_tagging_timings.tex'; echo ""; done

output:

1 chunks will be included
This is LuaHBTeX, Version 1.17.0 (TeX Live 2023) 
 restricted system commands enabled.

real	0m15.375s
user	0m15.078s
sys	0m0.289s

2 chunks will be included
This is LuaHBTeX, Version 1.17.0 (TeX Live 2023) 
 restricted system commands enabled.

real	0m35.364s
user	0m34.952s
sys	0m0.381s

3 chunks will be included
This is LuaHBTeX, Version 1.17.0 (TeX Live 2023) 
 restricted system commands enabled.

real	0m58.485s
user	0m58.082s
sys	0m0.379s

4 chunks will be included
This is LuaHBTeX, Version 1.17.0 (TeX Live 2023) 
 restricted system commands enabled.

real	1m34.108s
user	1m33.627s
sys	0m0.434s

5 chunks will be included
This is LuaHBTeX, Version 1.17.0 (TeX Live 2023) 
 restricted system commands enabled.

real	2m20.476s
user	2m19.979s
sys	0m0.455s

6 chunks will be included
This is LuaHBTeX, Version 1.17.0 (TeX Live 2023) 
 restricted system commands enabled.

real	3m26.085s
user	3m25.463s
sys	0m0.553s

7 chunks will be included
This is LuaHBTeX, Version 1.17.0 (TeX Live 2023) 
 restricted system commands enabled.

real	5m1.586s
user	5m0.595s
sys	0m0.773s

8 chunks will be included
This is LuaHBTeX, Version 1.17.0 (TeX Live 2023) 
 restricted system commands enabled.

real	6m34.787s
user	6m33.941s
sys	0m0.739s

9 chunks will be included
This is LuaHBTeX, Version 1.17.0 (TeX Live 2023) 
 restricted system commands enabled.

real	8m55.031s
user	8m53.833s
sys	0m1.019s

Deltas (15), 20, 23, 35, 46, 65, 95, 93, 140 seconds.

Second deltas 5, 3, 12, 11, 19, 30, -2, 47 (seconds).

Measurements are not easy on a multi-task computer which checks internet.

The timings do not allow to separate the delay at end of run (hidden from user experience here by using batchmode) right before the final output of lualatex to console.

Producing the 152 pages document requires about 9 minutes compared to 15 seconds for 17 pages. If one takes the delta between 1 and 2 or 1 and 9 chunks to take out common set-up cost, we have 20seconds versus about 520seconds, hence a build time multiplied by 26. So worth than linear for sure, but perhaps sub-quadratic, I will edit my title to be more cautious.

@jfbu jfbu changed the title Build time appears to be at least quadratic perhaps even at least cubic in number of pages Build time appears to be worth than linear in number of pages May 15, 2023
@u-fischer
Copy link
Member

Well looks as if the main problem is that I forgot to remove an unused prop and every structure still added something to it (and that is slow if the prop gets longer and longer).

@jfbu jfbu changed the title Build time appears to be worth than linear in number of pages Build time appears to be worse than linear in number of pages May 15, 2023
u-fischer added a commit that referenced this issue May 15, 2023
@jfbu
Copy link
Author

jfbu commented May 18, 2023

I updated with 0.98g and redid the timings, here are the results for info:

0.98f 0.98g notag
1 0m15.078s 0m13.221s 0m1.411s
2 0m34.952s 0m25.421s 0m1.628s
3 0m58.082s 0m38.200s 0m1.823s
4 1m33.627s 0m51.697s 0m2.017s
5 2m19.979s 1m6.258s 0m2.238s
6 3m25.463s 1m20.530s 0m2.454s
7 5m0.595s 1m36.242s 0m2.649s
8 6m33.941s 1m51.758s 0m2.860s
9 8m53.833s 2m10.137s 0m3.069s

I took the "user" time out of output from my unix time function, and the timings are not to be taken seriously although for the 0.98g update I tried to disconnect from internet and avoid screen saver trigger etc..., but I had to repeat once or twice the longer tests to avoid disparities.

The 0.98g results are very consistent with linear or quasi-linear growth (the longest test was done twice and I do not know if the slight slowing down means anything at all; besides the successive chunks do not have exactly the same size, each about 840 lines).

For people who see these timings, let me recall each chunk occupies about 840 lines of source macro code so gives as many one-line paragraphs for tagging so this is many many paragraphs and real-life documents have much less paragraphs per page. (here: one line = one paragraph).

edit: perhaps the codeline numbering adds one further paragraph to each line? not checked.

The "notag" results confirm the amount of added work done by tagging. On this computer, about a bit less than 0.017s per paragraph (for a total of a bit less than 7570 paragraphs here), or 5 hundreds of a second per 3 paragraphs, or about 60 paragraphs per second.

This is on an old single CPU 2.8Ghz computer so most people reading this will have a faster one. (the ratio 60/2.8 is about 21.5 so maybe if you multiply your CPU frequency by this factor you get an estimate of how many paragraphs per second can be handled). [but see edit above, not sure if my count of paragraphs is ok]

memo: this is done with lualatex-dev; with pdflatex-dev the notag builds are faster and the tagging builds slower.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants