gingerchew/astro-word-loader

astro-word-loader

Takes a glob of Word files and converts them into an Astro content collection.

How to use:

import { defineCollection } from 'astro:content';
import wordLoader from 'astro-word-loader';

const wordDocs = defineCollection({
    loader: wordLoader({
        sources: ['./my-word/docs/*.docx'], // file paths are run through node:glob
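        styleMap: ["p[style-name='Section Title'] => h1:fresh"] // style mappings handed to mammoth
    })
});

// Astro only registers collections that are exported from the content config
export const collections = { wordDocs };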

styleMap is passed through to Mammoth as its style map option. To learn more about how style maps work, check out the Mammoth documentation.
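As a rough sketch of what a longer style map could look like: each array element is one Mammoth style mapping in the form `matcher => HTML path`. The "Subsection Title" style name is made up for illustration, and this assumes the loader hands the array to Mammoth unchanged.

```typescript
// Each string is one Mammoth style mapping: "<document matcher> => <HTML path>"
const styleMap: string[] = [
    // Paragraphs using the Word style "Section Title" become an <h1>;
    // ":fresh" forces a new element instead of reusing an open one
    "p[style-name='Section Title'] => h1:fresh",
    // "Subsection Title" is a hypothetical style name used for illustration
    "p[style-name='Subsection Title'] => h2:fresh",
    // Underlined text is ignored by Mammoth by default; map it to emphasis
    "u => em",
    // Strikethrough is wrapped in <s> by default; use <del> instead
    "strike => del",
];
```

The array would then go in the `styleMap` option shown in the example above.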

Special Notes

Each entry is given an id based on its file name, so if multiple Word documents share a file name (even in different folders), their ids will conflict.

E.g. sample-docx-files-sample2.docx is retrievable with getEntry by using the id sample-docx-files-sample2.
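To make the id rule concrete, here is a minimal sketch of how a file name could map to an id and how clashes could be spotted before building. `idFromPath` and `findConflicts` are hypothetical helpers written for this example, not the loader's actual code, and they assume the id is simply the file name without its `.docx` extension.

```typescript
// Hypothetical helper: drop the directories and the .docx extension
function idFromPath(filePath: string): string {
    const fileName = filePath.split(/[\\/]/).pop() ?? filePath;
    return fileName.replace(/\.docx$/i, '');
}

// Hypothetical helper: list every id that more than one path maps to
function findConflicts(paths: string[]): string[] {
    const counts = new Map<string, number>();
    for (const p of paths) {
        const id = idFromPath(p);
        counts.set(id, (counts.get(id) ?? 0) + 1);
    }
    return Array.from(counts.entries())
        .filter(([, count]) => count > 1)
        .map(([id]) => id);
}

// Two files named report.docx in different folders end up with the same id
const conflicts = findConflicts([
    './my-word/docs/report.docx',
    './my-word/archive/report.docx',
    './my-word/docs/sample-docx-files-sample2.docx',
]);
console.log(conflicts);
```

Under the same assumption, the id from the example above is what you would pass as the second argument to `getEntry`.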

Images are embedded as data URIs for now. This means that a document with a lot of images will tank performance, because every image is inlined into the generated HTML. I'm looking into possible solutions for this.
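For a sense of scale, the sketch below estimates how many characters a single inlined image adds to the HTML. It assumes base64 data URIs of the form `data:<mime>;base64,<payload>`; `dataUriLength` is a hypothetical helper for this estimate and is not part of the loader.

```typescript
// Hypothetical helper: estimate the length of a base64 data URI for an image
function dataUriLength(byteLength: number, mimeType: string): number {
    const prefix = `data:${mimeType};base64,`;
    // base64 turns every 3 bytes into 4 characters, padding the last group
    const encodedLength = 4 * Math.ceil(byteLength / 3);
    return prefix.length + encodedLength;
}

const imageBytes = 3000000; // a 3 MB PNG
const inlinedChars = dataUriLength(imageBytes, 'image/png');

// Roughly a third larger than the original file, and all of it sits in the HTML
console.log(inlinedChars, (inlinedChars / imageBytes).toFixed(2));
```

Because the payload is part of the markup, the browser cannot lazy-load or separately cache it, which is why image-heavy documents get slow.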

TODO

  • Images
    • Currently images in Word docs seem to be converted to data URIs
    • Add loading and decoding attributes
    • Sizing with Astro's <Image /> component?
    • Other performance enhancements
    • As of 12/18/24, sample4 takes too long to load for PageSpeed to generate recommendations
  • Better style map interface
  • Performance testing?
  • Other word processor formats? Pages, OpenOffice?

Thanks

  • Mammoth, and Mammoth.js specifically, for powering this loader