Is it necessary to store file content in db.json for large blog? #3271

Closed
ahuigo opened this issue Sep 26, 2018 · 5 comments

Comments

ahuigo commented Sep 26, 2018

I have nearly 800 Markdown files, and they have made db.json grow to about 20 MB.

I don't think it is necessary to store file content in db.json.

tcrowe (Contributor) commented Sep 29, 2018

Yeah @ahuigo, what to do with large sites is an ongoing discussion. Any ideas? Just keep it in memory, or what? The db.json is a cache, so Hexo doesn't re-parse anything it has already parsed.
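
A minimal sketch of the caching idea described above: store a fingerprint of each source file next to its rendered output, and re-render only when the fingerprint changes. The names `content_hash` and `render_with_cache`, and the `{"hash", "rendered"}` entry layout, are assumptions made for illustration; this is not Hexo's actual db.json schema. It also shows why such a cache grows with the site: every entry carries the full rendered content.

```python
import hashlib
from pathlib import Path


def content_hash(text: str) -> str:
    """Return a stable fingerprint of the source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def render_with_cache(source_path: Path, cache: dict, render) -> str:
    """Return rendered output, calling `render` only when the source changed."""
    text = source_path.read_text(encoding="utf-8")
    digest = content_hash(text)
    key = str(source_path)
    entry = cache.get(key)
    if entry is not None and entry["hash"] == digest:
        return entry["rendered"]  # cache hit: nothing is re-parsed
    rendered = render(text)
    # The rendered content is stored in full, which is what makes the cache large.
    cache[key] = {"hash": digest, "rendered": rendered}
    return rendered
```

Here `render` is any callable that turns Markdown text into output, and `cache` is a plain dict that could be serialized to a JSON file between runs.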

tcrowe added question Needs help in usage #perfmatters labels Sep 29, 2018

ahuigo (Author) commented Sep 30, 2018

Some ideas for decreasing Hexo's build time.

  1. The db.json
    1. Store only each Markdown file's meta info (path, title, date, updated, category) and build info such as the last build time.
    2. Don't cache the whole file content in db.json. Read the content directly from the file system when we need it.
  2. We can find the modified files via git, find, or other tools: https://stackoverflow.com/questions/16085958/scripts-find-the-files-have-been-changed-in-last-24-hours
  3. Support incremental builds. We can build only the modified files each time we build the site.
    The build should not touch unmodified files.

For example:

# hexo g
# Sketch only: parse, parseBlog and the hexo_* functions are placeholders, not real Hexo APIs.
# db.json layout: {build_meta: {'last_time': '2018-09-29...'}, files_meta: {...}}
import os
from subprocess import getoutput

dbinfo = parse('db.json')
cmd = 'git diff-index --cached --name-status --diff-filter=ACMRD HEAD -- ./_posts'
output = getoutput(cmd).strip()
if output:
    # split the staged changes into modified and deleted posts
    modified_blogs = {}
    delete_blogs = []
    for line in output.split('\n'):
        status, path = line.split('\t')
        if status == 'D':
            delete_blogs.append(path)
            continue

        blog = parseBlog(path)
        modified_blogs[path] = blog['meta']

    # deleted files: remove the generated page and its tag/category entries
    for path in delete_blogs:
        file_meta = dbinfo['files_meta'].get(path)
        if file_meta is None:
            continue  # never built, so there is nothing to clean up
        html_path = f'public/{path}.html'
        if os.path.exists(html_path):
            os.remove(html_path)
        hexo_delete_tags(file_meta)
        hexo_delete_category(file_meta)

    # added and updated files (incremental build)
    for path, file_meta in modified_blogs.items():
        hexo_generate_html(path)
        hexo_add_update_tags(file_meta)
        hexo_add_update_category(file_meta)

    # save db.json
    hexo_update_db('db.json', modified_blogs, delete_blogs)
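
The pseudocode above relies on git to find changed posts. As a rough, self-contained alternative for idea 2, the sketch below detects added, modified, and deleted Markdown files by comparing modification times between two scans, and keeps only per-file metadata rather than file content. The function names `scan_posts` and `diff_posts` are hypothetical, and modification times are only a heuristic: a fresh checkout can change them without changing content.

```python
from pathlib import Path


def scan_posts(posts_dir: Path) -> dict:
    """Map each Markdown file's relative path to its modification time."""
    return {
        p.relative_to(posts_dir).as_posix(): p.stat().st_mtime
        for p in posts_dir.rglob("*.md")
    }


def diff_posts(previous: dict, current: dict):
    """Compare two scans and return (added_or_modified, deleted) path lists."""
    modified = sorted(
        path for path, mtime in current.items()
        if previous.get(path) != mtime  # new file or changed timestamp
    )
    deleted = sorted(path for path in previous if path not in current)
    return modified, deleted
```

The result of `scan_posts` from the previous build would be saved (for example as JSON) and passed as `previous` on the next run, so only the paths in `modified` need to be rebuilt.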

ahuigo (Author) commented Oct 6, 2018

I've written a script that generates a static blog: https://github.com/ahuigo/a/blob/master/tool/pre-commit It's only for my own use, not for Hexo.

stale bot added the stale label Mar 4, 2019
hexojs deleted a comment from stale bot Mar 4, 2019
curbengh removed the question Needs help in usage label Dec 9, 2019
stevenjoezhang (Member) commented

See also hexojs/warehouse#13

stevenjoezhang (Member) commented

I'll close this issue because Hexo's major performance overhead is not reading or writing db.json but processing the cross-references, e.g. finding the posts with a given tag or the tags of a post.

See #2579 (comment)
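
To illustrate why cross-reference lookups can dominate build time, here is a small sketch contrasting a per-query scan with a prebuilt index. It assumes posts are plain dicts with `path` and `tags` keys, and the function names are hypothetical; it does not describe how Hexo or warehouse resolve these relations internally.

```python
from collections import defaultdict


def posts_with_tag_scan(posts: list, tag: str) -> list:
    """Naive lookup: walks every post on every query."""
    return [post["path"] for post in posts if tag in post["tags"]]


def build_tag_index(posts: list) -> dict:
    """Build the tag -> post paths mapping once, in a single pass."""
    index = defaultdict(list)
    for post in posts:
        for tag in post["tags"]:
            index[tag].append(post["path"])
    return dict(index)
```

If every tag page calls the scanning version, the total work grows roughly with the number of posts times the number of tags, whereas the index is built in one pass and then answers each lookup with a dictionary access.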
