Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recreate vignettes/pdf/* #1331

Merged
merged 1 commit into from
Sep 16, 2024
Merged

Recreate vignettes/pdf/* #1331

merged 1 commit into from
Sep 16, 2024

Conversation

Enchufa2
Copy link
Member

Closes #1330. For some reason these intermediate files are flagged by various antivirus software. A simple solution is to avoid including them in the tarball. See:

As a side advantage, the tarball is now 1.8 MB instead of 3.3 MB.

@eddelbuettel
Copy link
Member

eddelbuettel commented Sep 11, 2024

Are you suggesting to no longer ship pdf vignettes?

I think that is a non-starter for official release as eg for CRAN. If someone wants to created a 'permanently cloning' that shadows this one without pdfs I cannot stop them, but the official release should include the official documentation.

@Enchufa2
Copy link
Member Author

Enchufa2 commented Sep 11, 2024

No, I'm just suggesting to avoid shipping the intermediate PDFs. The vignettes are still there in the tarball under inst/doc/Rcpp-*.pdf.

In other words, there are two sets of PDFs:

  • Intermediate ones, created from the Rmds, in vignettes/pdf/Rcpp-*.pdf.
  • Final ones, which are the ones that are used when you open a vignette, created from Sweave files, in vignettes/Rcpp-*.pdf

The first ones are not needed and are the ones flagged as virus.

@eddelbuettel
Copy link
Member

eddelbuettel commented Sep 11, 2024

Gotcha.

I think the reason I left them in is as they are, "intermediate and all" was to permit the R CMD check ... job to rebuild the vignettes which requires the intermediates as they are input to the Rnw files that R sees. Don't we fail a full check run if you omit them?

@Enchufa2
Copy link
Member Author

I see. Yes, you are right. Then one solution would be to put the .Rnw files under vignettes/Rnw and leave just the final PDFs under vignettes/. In this way, there's nothing to rebuild and no possible issue, right?

@eddelbuettel
Copy link
Member

It's been a while and I don't recall why one cannot ship 'entirely pre-made pdfs' as vignettes. I think there is a reason for the two step.

And in event, doesn't any pdf 'blob' run the risk of randomly upsetting the malware checker (in an error of the latter) ?

@kevinushey
Copy link
Contributor

To quote Office Space... “No way! Why should I change? He’s the one who sucks!”

IMHO, if people want to use broken virus scanning tools, then they can do the legwork to work around their own broken tools.

@Enchufa2 Enchufa2 marked this pull request as draft September 11, 2024 19:50
@Enchufa2
Copy link
Member Author

Enchufa2 commented Sep 11, 2024

I tend to agree. But again, we are shipping 3.3 MB and only 1.8 MB of useful stuff. Let me see if I can reduce this to save everybody's bandwidth, and be CRAN compliant; and, as a complement, if we can make the life of these people, who probably don't select which antivirus software runs in their corporate computers, easier, that's a plus. But if you don't like the change, no problem in closing this, but let me try, please.

@Enchufa2
Copy link
Member Author

Two observations:

  1. If I knit the Rmds again, the resulting intermediate PDFs are not flagged anymore. Comparing them with the current ones, the only difference I see is the Ghostscript version:
    • For current `vignettes/*.pdf`` files, 10.01.2
    • For current vignettes/pdf/*.pdf files, 10.00.0 <- these are the ones flagged
    • For regenerated files, in my computer, 10.02.1

So maybe Ghostscript 10.00.0 is doing something deemed wrong/insecure that was fixed later? And

  1. Vignettes were moved from inst/doc to vignettes back in 2013 because it is "preferred". This is true as long as the source is provided and checked on CRAN. But now that we provide prebuilt ones, doesn't it make more sense to move them back to inst/doc, including the Rmds and get rid of the Rnws? Again, with the goal of reducing the tarball footprint.

@eddelbuettel
Copy link
Member

Nice catch. Maybe they just needed a refresh.

What we currently use "works" (under the standard definition of CRAN and R CMD check being happy) and is used the same way by a few other packages I look after. That said, if you think you find a better way I would all ears but I don't think it is really that pressing a problem. (The largest CRAN source packages come in at over 40, 50 or even 60 mb so I do not think we pose a problem at 3.3mb and bi-annual releases.)

@Enchufa2
Copy link
Member Author

Ok, so first things first. PDFs recreated, they are no longer flagged by any antivirus. I have also refined some of the entries about vignettes in the dot files: dropped outdated ones, and refined some to effectively ignore some intermediate files that were not covered by some outdated entries.

I can propose a relocation of the vignettes separately, and if you agree with the approach, then we can open another PR for that.

@Enchufa2 Enchufa2 changed the title Exclude vignettes/pdf Recreate vignettes/pdf/* Sep 12, 2024
@Enchufa2
Copy link
Member Author

Enchufa2 commented Sep 12, 2024

BTW, did something change in pinp? I noticed the links are now red instead of blue. Do you want me to recreate also the vignettes/*.pdf files so that colors match?

@Enchufa2
Copy link
Member Author

Ok, I see they should be blue, but you didn't publish the fix to CRAN just yet. So installing pinp from your repo and recreating again...

@mvankessel-EMC
Copy link

Thank you @Enchufa2 for making the PR.

I've reached out to the organisation that flagged this for us. They confirm that Ghostscript is most likely the culprit, eventhough they agree that this looks like a false positive.

This is what I've been able to find regarding Ghostscript: this change log (2024-05-02). Which reports the following vulnerabilities in versions <10.03.1:

ID Description
CVE-2024-33869 An issue was discovered in Artifex Ghostscript before 10.03.1. Path traversal and command execution can occur (via a crafted PostScript document) because of path reduction in base/gpmisc.c. For example, restrictions on use of %pipe% can be bypassed via the aa/../%pipe%command# output filename.
CVE-2023-52722 An issue was discovered in Artifex Ghostscript before 10.03.1. psi/zmisc1.c, when SAFER mode is used, allows eexec seeds other than the Type 1 standard.
CVE-2024-33871 An issue was discovered in Artifex Ghostscript before 10.03.1. contrib/opvp/gdevopvp.c allows arbitrary code execution via a custom Driver library, exploitable via a crafted PostScript document. This occurs because the Driver parameter for opvp (and oprp) devices can have an arbitrary name for a dynamic library; this library is then loaded.
CVE-2024-29510 Artifex Ghostscript before 10.03.1 allows memory corruption, and SAFER sandbox bypass, via format string injection with a uniprint device.
CVE-2024-33870 An issue was discovered in Artifex Ghostscript before 10.03.1. There is path traversal (via a crafted PostScript document) to arbitrary files if the current directory is in the permitted paths. For example, there can be a transformation of ../../foo to ./../../foo and this will grant access if ./ is permitted.

@Enchufa2
Copy link
Member Author

Ok, so my suspicion was right. @mvankessel-EMC Can you please check whether the attached artifact Rcpp_1.0.13.1.tar.gz raises any flags? There is one set of PDFs with version 10.01.2 and the newest one with version 10.02.1, so maybe they still cause trouble. In such a case, I could look into installing the newest version of ghostscript to recreate them again.

@Enchufa2
Copy link
Member Author

Ghostscript 10.03.1 was in rawhide but not in F40 for some reason. So it was easier for me to just rebuild the package for F40, install it, and then rebuild all PDFs. In this way, we are in the safe side. Now we have:

$ strings vignettes/{,rmd/}*.pdf | grep Ghostscript
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/' pdf:Producer='GPL Ghostscript 10.03.1'/>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/' pdf:Producer='GPL Ghostscript 10.03.1'/>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/' pdf:Producer='GPL Ghostscript 10.03.1'/>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/' pdf:Producer='GPL Ghostscript 10.03.1'/>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/' pdf:Producer='GPL Ghostscript 10.03.1'/>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/' pdf:Producer='GPL Ghostscript 10.03.1'/>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/' pdf:Producer='GPL Ghostscript 10.03.1'/>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/' pdf:Producer='GPL Ghostscript 10.03.1'/>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/' pdf:Producer='GPL Ghostscript 10.03.1'/>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/' pdf:Producer='GPL Ghostscript 10.03.1'/>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/'><pdf:Producer>GPL Ghostscript 10.03.1</pdf:Producer>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/'><pdf:Producer>GPL Ghostscript 10.03.1</pdf:Producer>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/'><pdf:Producer>GPL Ghostscript 10.03.1</pdf:Producer>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/'><pdf:Producer>GPL Ghostscript 10.03.1</pdf:Producer>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/'><pdf:Producer>GPL Ghostscript 10.03.1</pdf:Producer>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/'><pdf:Producer>GPL Ghostscript 10.03.1</pdf:Producer>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/'><pdf:Producer>GPL Ghostscript 10.03.1</pdf:Producer>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/'><pdf:Producer>GPL Ghostscript 10.03.1</pdf:Producer>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/'><pdf:Producer>GPL Ghostscript 10.03.1</pdf:Producer>
<rdf:Description rdf:about="" xmlns:pdf='http://ns.adobe.com/pdf/1.3/'><pdf:Producer>GPL Ghostscript 10.03.1</pdf:Producer>

@mvankessel-EMC
Copy link

Ok, so my suspicion was right. @mvankessel-EMC Can you please check whether the attached artifact Rcpp_1.0.13.1.tar.gz raises any flags? There is one set of PDFs with version 10.01.2 and the newest one with version 10.02.1, so maybe they still cause trouble. In such a case, I could look into installing the newest version of ghostscript to recreate them again.

They confirmed it came out clean. Also on my local environment, it comes out clean. Thanks

.Rbuildignore Outdated Show resolved Hide resolved
.Rbuildignore Outdated Show resolved Hide resolved
@Enchufa2
Copy link
Member Author

Files restored. Is it ok to add those entries to vignettes/.gitignore to avoid the hassle of avoiding untracked files?

Copy link
Member

@eddelbuettel eddelbuettel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great -- thanks for doing this!

@eddelbuettel
Copy link
Member

Is it ok to add those entries to vignettes/.gitignore to avoid the hassle of avoiding untracked files?

Sure. I often think a central, top-level .gitignore is easier but I see we already had this one :) Some of those files are/were removed by cleanup too methinks but the addition to that vignettes/.gitignore is clean and simple.

@kevinushey
Copy link
Contributor

LGTM; thanks for doing the leg work! Although my initial objection still stands :-)

@Enchufa2
Copy link
Member Author

Anything else from my side?

@eddelbuettel
Copy link
Member

🚢

@eddelbuettel
Copy link
Member

@Enchufa2 If there is anything stopping you from merging let us know. Looked ready to me days ago...

@Enchufa2 Enchufa2 merged commit 8ba7810 into master Sep 16, 2024
18 checks passed
@Enchufa2 Enchufa2 deleted the pdfs branch September 16, 2024 22:21
@Enchufa2
Copy link
Member Author

I wouldn't touch the "Merge" button unless explicitly asked by you. Now I feel permission was granted. ;-)

@eddelbuettel
Copy link
Member

Would you mind undoing the 'merge'? We really prefer squash merges.

An easy equivalent fix is to

Equivalently you can rebase but this is ..... ugly 😢

image

@eddelbuettel
Copy link
Member

Ah, yes, and we forgot to flip it to 'squash - merge'. If you do it once the UI remembers...

@eddelbuettel
Copy link
Member

You could also, as a variant, roll back hard to #1327, then create a new branch to redo the pr, cherry-pick the commit in there, I approve again and you then squash merge. Difference is that you get a PR# on the commit line. Oh well.

@eddelbuettel
Copy link
Member

eddelbuettel commented Sep 16, 2024

I guess I could also manuall rebase the commit + merge commit into one for you. Should I ? Easier on non-merged commits, apparently.

@eddelbuettel
Copy link
Member

Ok -- fixed it:

image

But I forced-pushed so you now need to

$ git reset --hard dfa585d2           # to jump back before your merge
$ git pull --all 

@Enchufa2
Copy link
Member Author

Ooops, sorry!

@eddelbuettel
Copy link
Member

I should have pointed it out -- my bad. But we're good now.

'Squash merges' are nice for linear, more compact histories (for repos with enough activity). We do that at work by prefernce (and a setting in the repo outlawing merge or direct commits, but that is too uptight for us here). @kevinushey surely has similar setups at work.

Thanks for cleaning this up. Having "non-viral" builds via r-universe will be good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Suspected malware
4 participants