add main.py for converter html to markdown #2

Merged
merged 2 commits into from Aug 10, 2023

Conversation

@ghost ghost commented Mar 30, 2023

Hello everyone!

Initially I removed local files such as JS and CSS, then converted all the HTML files to Markdown. I made a fork where I'm making the necessary changes, so don't use this code.

This code makes no sense; better code could be:

import os
from markdownify import markdownify

directory = './dataset/'
# Walk the dataset tree and convert every .html file to Markdown.
for root, dirnames, filenames in os.walk(directory):
    for filename in filenames:
        if filename.endswith('.html'):
            fname = os.path.join(root, filename)
            # Read the HTML source (assumes UTF-8); "with" closes the file.
            with open(fname, "r", encoding="utf-8") as src:
                html = src.read()
            markdown = markdownify(html, heading_style="ATX")
            # Write the result next to the source with a .md extension
            # instead of overwriting the .html file with Markdown content.
            mdname = os.path.splitext(fname)[0] + '.md'
            with open(mdname, "w", encoding="utf-8") as out:
                out.write(markdown)

This code converts every HTML file under the directory to a Markdown document. license: https://brianli.com/python-convert-html-markdown/. I also answered this here: github.com/jnode-revisited/jnode/discussions/9 and github.com/jnode-revisited/dataset-jnode.org/pull/2
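
Since the loop above rewrites files in bulk, a dry run that only lists what would be converted may be useful before touching the dataset. The following is a minimal sketch using only the standard library; `plan_conversions` is a hypothetical helper name that is not part of this pull request, and the `.md` target paths assume the output is written alongside each source file.

```python
from pathlib import Path


def plan_conversions(directory):
    """Return sorted (source, target) path pairs for every .html file
    found under `directory`. Nothing is read or written; this is only
    a preview. The .md target naming is an assumption for illustration."""
    root = Path(directory)
    pairs = []
    for source in root.rglob("*.html"):
        if source.is_file():
            pairs.append((source, source.with_suffix(".md")))
    return sorted(pairs)


if __name__ == "__main__":
    # './dataset/' is the directory used in the snippet above.
    for source, target in plan_conversions("./dataset/"):
        print(f"{source} -> {target}")
```

Running it against `./dataset/` prints one `source -> target` line per HTML file, which makes it easy to spot unexpected files before the real conversion runs.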

Question/feedback

What do you all think of this idea?

@ghost ghost mentioned this pull request Mar 30, 2023
@tripleo1 tripleo1 merged commit 50e2fca into jnode-revisited:main Aug 10, 2023