CSS and Images not loading(in Docker) #66

Closed
vitosaver opened this issue Mar 17, 2022 · 28 comments

@vitosaver

vitosaver commented Mar 17, 2022

Hi,

First of all thanks for the great library.

When I generate a PDF, CSS and images are not loaded correctly.

I am using absolute URLs, and when I return the plain HTML, CSS and images load correctly.

Here is the code snippet that I am using:

using (Converter converter = new Converter(logger: _logger))
using (MemoryStream stream = new MemoryStream())
{
    converter.AddChromeArgument("--no-sandbox");

    ChromeHtmlToPdfLib.Enums.PaperFormat paperFormat = ChromeHtmlToPdfLib.Enums.PaperFormat.A4;
    if (Format.Equals("A5"))
    {
        paperFormat = ChromeHtmlToPdfLib.Enums.PaperFormat.A5;
    }
    converter.LogNetworkTraffic = true;

    converter.ConvertToPdf(html, stream, new ChromeHtmlToPdfLib.Settings.PageSettings(paperFormat)
    {
        MarginBottom = 0,
        MarginLeft = 0,
        MarginRight = 0,
        MarginTop = 0
    });

    string fileName = "Test.pdf";

    Response.Headers.Add("content-disposition", $"inline; filename=\"{fileName}\"");
    return File(stream.ToArray(), "application/pdf");
}

In terms of Docker configuration, I added the Chrome install script from the issue linked in the readme -> #39

# Suppress an apt-key warning about standard out not being a terminal. Use in this script is safe.
ENV APT_KEY_DONT_WARN_ON_DANGEROUS_USAGE=DontWarn

# export DEBIAN_FRONTEND="noninteractive"
ENV DEBIAN_FRONTEND=noninteractive

# Install deps + add Chrome Stable + purge all the things
RUN apt-get update && apt-get install -y \
	apt-transport-https \
	ca-certificates \
	curl \
	gnupg \
	--no-install-recommends \
	&& curl -sSL https://dl.google.com/linux/linux_signing_key.pub | apt-key add - \
	&& echo "deb [arch=amd64] https://dl.google.com/linux/chrome/deb/ stable main" > /etc/apt/sources.list.d/google-chrome.list \
	&& apt-get update && apt-get install -y \
	google-chrome-stable \
	--no-install-recommends \
	&& apt-get purge --auto-remove -y curl gnupg \
	&& rm -rf /var/lib/apt/lists/*

# Chrome Driver
RUN apt-get update && \
    apt-get install -y unzip && \
    wget https://chromedriver.storage.googleapis.com/2.31/chromedriver_linux64.zip && \
    unzip chromedriver_linux64.zip && \
    mv chromedriver /usr/bin && rm -f chromedriver_linux64.zip

I suspect that the PDF is generated before all resources are loaded, but I am not sure.

I am using NuGet package version 2.5.25.
I also tested in IIS Express without converter.AddChromeArgument("--no-sandbox"); but I get the same result.

Please let me know if you need more info about my issue.

UPDATE:
I tested a bit more and found the following:

  • It works when you set waitForWindowsStatus to true, but only on IIS Express.
  • If I set waitForWindowsStatus to true in development with Docker Desktop, it never works.
    • I believe this has something to do with concurrent connections, as I am requesting other resources from the same server while there is already an open connection.
  • If I set waitForWindowsStatus to true in production on Kubernetes, it works, but it always runs into WaitForWindowsStatusTimeout. Sometimes it then loads images and CSS and sometimes it doesn't; most often everything loads as expected on the second request. For example, with a timeout of 2000 ms it works sometimes, but with 120000 ms it waits the full 2 minutes and then shows the file as normal. It seems window.status never becomes true, although it should.
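
A note on the window.status wait described above: assuming the converter simply polls the page until window.status equals the string that was passed in (here 'true'), the page itself has to set that value, otherwise the wait can only end in the timeout. A minimal sketch of a helper that adds such a signal to an HTML string follows. The class and method names are hypothetical and not part of ChromeHtmlToPdf.

```csharp
using System;

public static class WindowStatusSignal
{
    // Script that sets window.status to 'true' once the window "load" event fires,
    // i.e. after stylesheets and images have finished loading.
    private const string Script =
        "<script>window.addEventListener('load', function () { window.status = 'true'; });</script>";

    // Inserts the script just before the last </body> tag, or appends it
    // when the HTML has no closing body tag.
    public static string Append(string html)
    {
        int index = html.LastIndexOf("</body>", StringComparison.OrdinalIgnoreCase);
        return index >= 0 ? html.Insert(index, Script) : html + Script;
    }
}
```

The returned string would then be passed to the converter instead of the original HTML.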
@Sicos1977
Owner

Sicos1977 commented Mar 17, 2022

Can you check whether you get these lines in your logging:

  • The 'Page.lifecycleEvent' with param name 'DomContentLoaded' has been fired, the dom content is now loaded and parsed, waiting for stylesheets, images and sub frames to finish loading
  • The 'Page.frameNavigated' event has been fired, waiting for the 'Page.lifecycleEvent' with name 'networkIdle'
  • The 'Page.lifecycleEvent' event with name 'networkIdle' has been fired, the page is now fully loaded

You should get them in this order; that way you know whether the full page has been loaded.
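
For long logs, the order check above can be automated. This is only a sketch: it assumes the log is available as one string and that the three messages contain exactly the phrases quoted above. The class name is hypothetical.

```csharp
using System;

public static class LifecycleLogCheck
{
    // Distinctive fragments of the three log lines, in the order they should appear.
    private static readonly string[] Markers =
    {
        "param name 'DomContentLoaded' has been fired",
        "'Page.frameNavigated' event has been fired",
        "name 'networkIdle' has been fired"
    };

    // Returns true only when every marker occurs, each one after the previous one.
    public static bool FiredInOrder(string log)
    {
        int position = 0;
        foreach (string marker in Markers)
        {
            int found = log.IndexOf(marker, position, StringComparison.Ordinal);
            if (found < 0)
            {
                return false;
            }
            position = found + marker.Length;
        }
        return true;
    }
}
```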

@vitosaver
Author

Not even close. Am I maybe missing a logging level or something?

Web.Pages.Order.PdfModel: Information: Resetting Chrome arguments to default
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--headless'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-gpu'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--hide-scrollbars'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--mute-audio'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-background-networking'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-background-timer-throttling'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-default-apps'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-extensions'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-hang-monitor'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-prompt-on-repost'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-sync'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-translate'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--metrics-recording-only'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--no-first-run'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-crash-reporter'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--remote-debugging-port="0"'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--window-size="1366,768"'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--no-sandbox'
Web.Pages.Order.PdfModel: Information: Starting Chrome from location '/usr/bin/google-chrome' with working directory '/usr/bin'
Web.Pages.Order.PdfModel: Information: "/usr/bin/google-chrome" --headless --disable-gpu --hide-scrollbars --mute-audio --disable-background-networking --disable-background-timer-throttling --disable-default-apps --disable-extensions --disable-hang-monitor --disable-prompt-on-repost --disable-sync --disable-translate --metrics-recording-only --no-first-run --disable-crash-reporter --remote-debugging-port="0" --window-size="1366,768" --no-sandbox
Web.Pages.Order.PdfModel: Information: Chrome process started
Web.Pages.Order.PdfModel: Information: Received Chrome error data: 'DevTools listening on ws://127.0.0.1:45725/devtools/browser/6a2ff928-8130-451d-acd4-18e50fcee280'
Web.Pages.Order.PdfModel: Information: Connecting to dev protocol on uri 'ws://127.0.0.1:45725/devtools/browser/6a2ff928-8130-451d-acd4-18e50fcee280'
Web.Pages.Order.PdfModel: Information: Creating new websocket connection to url 'ws://127.0.0.1:45725/devtools/browser/6a2ff928-8130-451d-acd4-18e50fcee280'
Web.Pages.Order.PdfModel: Information: Opening websocket connection with a timeout of 30 seconds
Web.Pages.Order.PdfModel: Information: Websocket opened
Web.Pages.Order.PdfModel: Information: Creating new websocket connection to url 'ws://127.0.0.1:45725/devtools/page/82AFEAB73D9BD8BAB6985B1515234043'
Web.Pages.Order.PdfModel: Information: Opening websocket connection with a timeout of 30 seconds
Web.Pages.Order.PdfModel: Information: Websocket opened
Web.Pages.Order.PdfModel: Information: Connected to dev protocol
Web.Pages.Order.PdfModel: Information: Chrome started
Web.Pages.Order.PdfModel: Information: Getting page frame tree
Web.Pages.Order.PdfModel: Information: Setting document content
Web.Pages.Order.PdfModel: Information: Document content set
Web.Pages.Order.PdfModel: Information: Waiting for window.status 'true' or a timeout of 1000 milliseconds
Web.Pages.Order.PdfModel: Information: Waiting timed out
Web.Pages.Order.PdfModel: Information: Converting to PDF
Web.Pages.Order.PdfModel: Information: Converted
Web.Pages.Order.PdfModel: Information: Stopping Chrome
Web.Pages.Order.PdfModel: Information: Chrome stopped

@Sicos1977
Owner

Sicos1977 commented Mar 17, 2022

Oh, you are using this method: internal void SetDocumentContent(string html). That one does not have the logic that waits until the page has been loaded; it just expects that everything is in the HTML that is given. All the logic that waits for the fully loaded page is in the NavigateTo method, and that one is never used in your case.

I never realised that using SetDocumentContent would also have the loading problem. This is something I have to change in the code to make everything work.

As a temporary workaround you could save your HTML to a file and load it through the ConvertToPdf method that expects a ConvertUri.
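
A minimal sketch of that workaround, under the assumption that the HTML is written to a temporary file which is removed again after the conversion. The TempHtmlFile helper is hypothetical; it only manages the file and leaves the actual conversion to a callback.

```csharp
using System;
using System.IO;
using System.Text;

public static class TempHtmlFile
{
    // Writes the HTML to a uniquely named temporary .html file, hands the path to
    // the callback, and deletes the file afterwards, even if the callback throws.
    public static void Use(string html, Action<string> convert)
    {
        string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".html");
        File.WriteAllText(path, html, Encoding.UTF8);
        try
        {
            convert(path);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

With the converter from the original snippet, the call would look roughly like `TempHtmlFile.Use(html, path => converter.ConvertToPdf(new ConvertUri(path), stream, pageSettings));`, where `pageSettings` stands for the PageSettings instance already in use.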

@Sicos1977 Sicos1977 self-assigned this Mar 17, 2022
@vitosaver
Author

Yes, I forgot to mention that.

I have now tested it with ConvertUri and it works as expected.

In development it still doesn't load CSS and images, but I believe that has something to do with concurrent connections, because in production it works like a charm.

Thanks for the help. I will keep an eye out for a fix for direct HTML injection, since I find that a more elegant solution. For now I am saving the HTML to a file and deleting it after PDF creation.

Thanks again!

@Sicos1977
Owner

Sicos1977 commented Mar 17, 2022

I'm already changing the code. It is not that much work; I just never realised people would use the SetDocumentContent method to load CSS and things like that.

@Sicos1977 Sicos1977 reopened this Mar 17, 2022
@Sicos1977
Owner

I just released a new NuGet package (.26). Can you try that one and see if it works as expected when setting the HTML the way you did before?

Just curious, what are you using ChromeHtmlToPdf for?

@vitosaver
Author

I think it is not working properly; it is stuck on:
Web.Pages.Order.PdfModel: Information: The 'Page.lifecycleEvent' with param name 'DomContentLoaded' has been fired, the dom content is now loaded and parsed, waiting for stylesheets, images and sub frames to finish loading

Tested in both the dev environment (5+ minutes of just loading) and the prod environment (after 1 minute I get a timeout from the server, as expected from the configuration).

I am using it to generate a webshop order confirmation.

Let me know if I can provide any additional information.

@Sicos1977
Owner

Can you provide me your logging?

@vitosaver
Author

Full log is below:

Web.Pages.Order.PdfModel: Information: Resetting Chrome arguments to default
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--headless'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-gpu'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--hide-scrollbars'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--mute-audio'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-background-networking'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-background-timer-throttling'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-default-apps'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-extensions'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-hang-monitor'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-prompt-on-repost'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-sync'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-translate'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--metrics-recording-only'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--no-first-run'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--disable-crash-reporter'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--remote-debugging-port="0"'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--window-size="1366,768"'
Web.Pages.Order.PdfModel: Information: Adding Chrome argument '--no-sandbox'
Web.Pages.Order.PdfModel: Information: Starting Chrome from location '/usr/bin/google-chrome' with working directory '/usr/bin'
Web.Pages.Order.PdfModel: Information: "/usr/bin/google-chrome" --headless --disable-gpu --hide-scrollbars --mute-audio --disable-background-networking --disable-background-timer-throttling --disable-default-apps --disable-extensions --disable-hang-monitor --disable-prompt-on-repost --disable-sync --disable-translate --metrics-recording-only --no-first-run --disable-crash-reporter --remote-debugging-port="0" --window-size="1366,768" --no-sandbox
Web.Pages.Order.PdfModel: Information: Chrome process started
Web.Pages.Order.PdfModel: Information: Received Chrome error data: 'DevTools listening on ws://127.0.0.1:41779/devtools/browser/eef161ab-9b43-453a-adcc-11121dabe0e7'
Web.Pages.Order.PdfModel: Information: Connecting to dev protocol on uri 'ws://127.0.0.1:41779/devtools/browser/eef161ab-9b43-453a-adcc-11121dabe0e7'
Web.Pages.Order.PdfModel: Information: Creating new websocket connection to url 'ws://127.0.0.1:41779/devtools/browser/eef161ab-9b43-453a-adcc-11121dabe0e7'
Web.Pages.Order.PdfModel: Information: Opening websocket connection with a timeout of 30 seconds
Web.Pages.Order.PdfModel: Information: Websocket opened
Web.Pages.Order.PdfModel: Information: Creating new websocket connection to url 'ws://127.0.0.1:41779/devtools/page/A2641680465E3F8F0CB507D9AF7AB6CC'
Web.Pages.Order.PdfModel: Information: Opening websocket connection with a timeout of 30 seconds
Web.Pages.Order.PdfModel: Information: Websocket opened
Web.Pages.Order.PdfModel: Information: Connected to dev protocol
Web.Pages.Order.PdfModel: Information: Chrome started
Web.Pages.Order.PdfModel: Information: Disabling caching
Web.Pages.Order.PdfModel: Information: Getting page frame tree
Web.Pages.Order.PdfModel: Information: The 'Page.lifecycleEvent' with param name 'DomContentLoaded' has been fired, the dom content is now loaded and parsed, waiting for stylesheets, images and sub frames to finish loading
Web.Pages.Order.PdfModel: Information: Setting document content
Web.Pages.Order.PdfModel: Information: Document content set
Web.Pages.Order.PdfModel: Information: The 'Page.lifecycleEvent' with param name 'DomContentLoaded' has been fired, the dom content is now loaded and parsed, waiting for stylesheets, images and sub frames to finish loading

@Sicos1977
Owner

Is it possible to send me the HTML you are using so that I can try it myself?

If so, please send it to sicos2002@hotmail.com

@vitosaver
Author

I just sent it

@Sicos1977
Owner

Thanks, I will look into it to see why it hangs.

@vitosaver
Author

Thanks, if you need anything just let me know

@Sicos1977
Owner

I think you get a timeout because you did not set a base URL in the HTML content, so Chrome does not know where to get the CSS files from. Can you try adding a base href (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base)?

@Sicos1977
Owner

Sicos1977 commented Mar 17, 2022

If I add this in your html then everything seems to be working as expected when using an HTML string as input --> <base href="https://www.blooming.hr">

So my guess seems right: Chrome does not know where to get the media and CSS files and probably just tries to load them from the working directory.
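
For HTML that references resources with relative URLs, adding the base element can be done in code before the string is handed to the converter. The following is only a sketch with hypothetical names; it does simple string matching rather than real HTML parsing, and it leaves HTML that already contains a base element untouched.

```csharp
using System;
using System.Net;

public static class HtmlBase
{
    // Returns the HTML with <base href="..."> inserted right after the opening
    // <head> tag. HTML that already has a <base> element is returned unchanged.
    public static string EnsureBaseHref(string html, string baseUrl)
    {
        if (html.IndexOf("<base ", StringComparison.OrdinalIgnoreCase) >= 0)
        {
            return html;
        }

        string tag = "<base href=\"" + WebUtility.HtmlEncode(baseUrl) + "\">";
        int position = 0;
        while ((position = html.IndexOf("<head", position, StringComparison.OrdinalIgnoreCase)) >= 0)
        {
            int after = position + 5;
            // Accept <head> or <head ...>, but skip tags such as <header>.
            if (after < html.Length && (html[after] == '>' || char.IsWhiteSpace(html[after])))
            {
                int close = html.IndexOf('>', after);
                if (close >= 0)
                {
                    return html.Insert(close + 1, tag);
                }
            }
            position = after;
        }

        // Fallback for fragments without a <head>: put the tag in front.
        return tag + html;
    }
}
```

A call such as `HtmlBase.EnsureBaseHref(html, "https://example.com/")` uses a placeholder URL; the real site root would go there.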

@Sicos1977
Owner

I think 2.5.27 fixes your latest issue. When using setDocumentContent, some of the events that I used to see whether the page is fully loaded are not fired. Could you try this version and see if it works now?

@vitosaver
Author

vitosaver commented Mar 17, 2022

I get an exception: FileNotFoundException: Could not find file '/app/d:\test.html'.

forgot to delete debug code here 😃

@Sicos1977
Owner

Sicos1977 commented Mar 17, 2022

:-) ... oops... will fix that :-)

@Sicos1977
Owner

2.5.28

@vitosaver
Author

Yes! The third time is a charm 😄

Everything is working as it should. To be honest, I can't believe that the base was causing problems, since we are passing absolute URLs, but okay.

@Sicos1977
Owner

I don't think that was the problem; it was just a wrong assumption on my part :-) That was before I figured out that Chrome was not firing the Page.frameNavigated event. But thanks for opening this issue, because it made ChromeHtmlToPdf better: I had never thought about adding the page loading logic to the setDocumentContent method.

@Sicos1977
Owner

Sicos1977 commented Mar 17, 2022

Also, I liked the shop you built; it runs very fast, even from the Netherlands. If I read the info on the webpage correctly, you are from Croatia? Netherlands over here.

@vitosaver
Author

Aha, okay, no problem. Thank you for providing and maintaining this great library. I have been using it for almost two years now and, to be honest, there is no better library than this: very simple and straightforward.

Thank you very much. Speed and design were top priorities when I was building the shop.

Yes, I am. I have been to Amsterdam a few times; it is a beautiful city.

If you get a chance to come this way, hit me up; maybe we can grab a beer 😉

@Sicos1977
Owner

As I already said in the readme, I needed a replacement for wkHtmlToPdf because that one sucked in the end: no support for HTML5 and no bugfixes anymore. It was a great tool but is now obsolete :-)

Since I'm an open source enthusiast, I decided to share the code so other people could also use it. I have a lot of tools on my GitHub page that are all related to document management, since I work in that industry.

@mertulusan

@vitovanjak hello, were you able to get it working on Docker? If so, do you have a forked repo to share with me?

@vitosaver
Author

Yes, it is working. I will be at my computer in 1-2 hours and will post my configuration then.

@Sicos1977
Owner

Is it also okay if I add your configuration to the main GitHub page for other users with the same question?

@vitosaver
Author

vitosaver commented Apr 13, 2022

@Sicos1977 of course, feel free...

Basic info:
Hosting: Kubernetes
Compile: GitLab CI
.NET version: .NET 6
Docker version: latest stable
ChromeHtmlToPdf version: 2.5.31

Docker file:
It is basically the auto-generated file from Visual Studio plus the script from #39

FROM mcr.microsoft.com/dotnet/aspnet:6.0 AS base

# Suppress an apt-key warning about standard out not being a terminal. Use in this script is safe.
ENV APT_KEY_DONT_WARN_ON_DANGEROUS_USAGE=DontWarn

# export DEBIAN_FRONTEND="noninteractive"
ENV DEBIAN_FRONTEND=noninteractive

# Install deps + add Chrome Stable + purge all the things
RUN apt-get update && apt-get install -y \
	apt-transport-https \
	ca-certificates \
	curl \
	gnupg \
	--no-install-recommends \
	&& curl -sSL https://dl.google.com/linux/linux_signing_key.pub | apt-key add - \
	&& echo "deb [arch=amd64] https://dl.google.com/linux/chrome/deb/ stable main" > /etc/apt/sources.list.d/google-chrome.list \
	&& apt-get update && apt-get install -y \
	google-chrome-stable \
	--no-install-recommends \
	&& apt-get purge --auto-remove -y curl gnupg \
	&& rm -rf /var/lib/apt/lists/*

# Chrome Driver
RUN apt-get update && \
    apt-get install -y unzip && \
    wget https://chromedriver.storage.googleapis.com/2.31/chromedriver_linux64.zip && \
    unzip chromedriver_linux64.zip && \
    mv chromedriver /usr/bin && rm -f chromedriver_linux64.zip

WORKDIR /app
EXPOSE 80
EXPOSE 443

FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /src
COPY Web/Web.csproj Web/
RUN dotnet restore "Web/Web.csproj"
COPY . .
WORKDIR "/src/Web"
RUN dotnet build "Web.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "Web.csproj" -c Release -o /app/publish

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "Web.dll"]

GitLab CI file:
If you are not familiar with GitLab CI: this basically runs the docker build command and pushes the image.

stages:
  - build

docker:
  image: docker:stable
  stage: build
  services:
    - docker:dind
  only:
    - master
  before_script:
    - docker login registry.gitlab.com -u ${CI_REGISTRY_USER} -p ${CI_REGISTRY_PASSWORD}
  script:
    - docker build -t ${CI_REGISTRY_IMAGE}:latest ./
    - docker push ${CI_REGISTRY_IMAGE}:latest
  after_script:
    - docker logout ${CI_REGISTRY}
  tags: 
    - docker

C# code:

 // Variable name must match the 'html' argument passed to ConvertToPdf below
 var html = "<HTML code>";
 using (Converter converter = new Converter())
 using (MemoryStream stream = new MemoryStream())
 {
     // This is necessary when running on Docker
     converter.AddChromeArgument("--no-sandbox");

     // Create PDF out of HTML string
     converter.ConvertToPdf(html, stream, new ChromeHtmlToPdfLib.Settings.PageSettings());

     // Return file to user
     return File(stream.ToArray(), "application/pdf");
 }

So as you can see, I didn't do anything crazy or out of the ordinary; I was just following the available documentation.

Issue #39 helped me the most, but I didn't copy the code from that issue itself; I copied it from the forked project that is linked there.

If you have any questions, feel free to ask.
