Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Desktop: Add OneNote Importer #10642

Open
wants to merge 153 commits into
base: dev
Choose a base branch
from
Open

Conversation

pedr
Copy link
Collaborator

@pedr pedr commented Jun 20, 2024

Summary

I'm adding a new package library that will be used to convert OneNote exports into Joplin Notes.

The onenote-converter is based on the implementations found in:
https://github.com/msiemens/one2html
https://github.com/msiemens/onenote.rs

We will be compiling the Rust code to wasm with the package wasm-pack to output a Node package that can be imported in our codebase


What is missing at this point:

  • Create resources from the SVG tags after the note is converted to HTML
  • Remove the feature from mobile/app-cli. Find a way to disable the option of importing there
  • Find a way to build the onenote-converter package (described bellow).
  • Fix run_ci.sh (changed to debug)
  • Fix yarn dist process (make sure the package is always build there)
  • Review code
  • [ ]

How the package is being integrated

Now Rust is a requirement, the message that is visible when Rust is not installed in the developer machine:

Error message Message that shows when you try to run yarn build from the package without having Rust:
js@mint:~/Desktop/joplin/fork/packages/onenote-converter$ yarn build
> wasm-pack build --target nodejs --release
Error: failed to start `cargo metadata`: No such file or directory (os error 2)
Caused by: failed to start `cargo metadata`: No such file or directory (os error 2)
Caused by: No such file or directory (os error 2)
Fatal error
----------------------------------------------------------------
Rust toolchain is missing, please install it: https://rustup.rs/
----------------------------------------------------------------

2024-08-12_17-13

pedr and others added 30 commits April 1, 2024 07:15
@pedr
Copy link
Collaborator Author

pedr commented Aug 15, 2024

  • Move onenote-converter to shim so it can be injected instead of being required directly (See packages/lib/services/ocr/drivers/OcrDriverTesseract.ts)

@pedr
Copy link
Collaborator Author

pedr commented Aug 17, 2024

Since we talked in the last meeting about finding a way to remove the requirement of adding Rust to the docker image for the server what I did was use shim.dynamicImport to get the file from the package/onenote-converter/pkg during run time.

Inside the Dockerfile.server I added a line to remove the import of @joplin/onenote-converter from the lib/package.json (otherwise the resolve step of yarn install fails)

@@ -79,7 +84,7 @@ ARG BUILD_DATE
ARG REVISION
ARG VERSION
LABEL org.opencontainers.image.created="$BUILD_DATE" \
org.opencontainers.image.title="Joplin Server" \
org.opencontainers.image.title="Joplin Server WITH RUST" \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that debugging code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry, I'm going to clean this up

for (const encodedLink of fileLinks) {
const link = decodeURI(encodedLink);

if (isDataUrl(link)) {
if (isDataUrl(link) || isMailTo(link) || isFilenameTooLong(link)) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That fix should be moved to a separate PR

}

public static setIdGenerator(generator: ()=> string) {
this.uuidGenerator = generator;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good pattern for this kind of setter is to return the previous generator so that it can be reset by the caller. That's more future proof.

parser.write(html);
parser.end();

return { svgs, html: body.join('') };
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the impact of this change on the app security? Looking at the code, I feel it will have the same SVG-related vulnerabilities that we already dealt with before. What are your thoughts on this? I guess it depends how the extracted SVGs are going to be used.

And as we've discussed before an HTML parser is not capable of parsing an SVG element safely. Maybe a different, more secure parser would be needed?

Comment on lines 108 to 116
export const isFilenameTooLong = (filename: string) => {
return filename.length > 255;
};

export const isLink = (text: string) => {
if (!text) return false;
const linkRegex = /^(https?|file|joplin):\/\/[^)\s]+$/;
return !!text.match(linkRegex);
};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you need these? It's not really generic functions so I'm not sure we need it there

@pedr
Copy link
Collaborator Author

pedr commented Aug 26, 2024

@laurent22

What's the impact of this change on the app security? Looking at the code, I feel it will have the same SVG-related vulnerabilities that we already dealt with before. What are your thoughts on this? I guess it depends how the extracted SVGs are going to be used.

And as we've discussed before an HTML parser is not capable of parsing an SVG element safely. Maybe a different, more secure parser would be needed?

The idea here is that we are removing the node out of HTML and using it as a resource linked in

With this we are protecting the user from exploits that happen when the exploit in inside the svg node (e.g.: a script tag inside). While the script tag would still be there it won't be executed unless the user opens the svg by itself, which I don't believe is possible inside Joplin.

Two other things are important to remember, this is a process that happens before any other sanitization and the original idea would be that this was only going to be used in the importer process of OneNote (there isn't any way to add svg to the Note of OneNote).

A simpler implementation (and probably more secure) using jsdom could be done like:

const result = []
const svgs = document.querySelectAll('svg');

for (const svg of svgs) {
    const id = titleGenerator();
    const parent = svg.parentNode;
    parentNode.replaceChild(<img src={id} />, svg);
    result.push({ id, svg: svg: serialize(svg) })
}

@pedr
Copy link
Collaborator Author

pedr commented Aug 26, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants