Take a glob of Word files and convert them into an Astro content collection.
Important: this is a beta release. See the Special Notes section to learn about some of the rough edges.
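```ts
// src/content.config.ts
import { defineCollection } from 'astro:content';
import wordLoader from 'astro-word-loader';

const wordDocs = defineCollection({
  loader: wordLoader({
    // File paths are expanded as glob patterns
    sources: ['./my-word/docs/*.docx'],
    // mammoth style mappings (mammoth's docs put style names in single quotes)
    styleMap: ["p[style-name='Section Title'] => h1:fresh"],
  }),
});

// Astro registers collections from this export
export const collections = { wordDocs };
```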
`styleMap` is passed through to mammoth as its style map option. To learn more about how `styleMap` works, see the mammoth documentation.
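To give a feel for mammoth's style-map syntax, here is a slightly larger `styleMap` you could pass to the loader. It is a sketch that assumes your Word documents use custom paragraph styles named "Section Title" and "Subsection Title"; the last entry uses mammoth's built-in `u` (underline) matcher. Each entry reads `<Word matcher> => <HTML output>`, and `:fresh` asks mammoth to start a new element rather than reusing one opened by a preceding matching paragraph.

```typescript
// A sketch of a styleMap for astro-word-loader, which forwards it to mammoth.
// "Section Title" and "Subsection Title" are assumed custom Word style names.
const styleMap: string[] = [
  // Paragraphs styled "Section Title" become fresh <h1> elements
  "p[style-name='Section Title'] => h1:fresh",
  // Paragraphs styled "Subsection Title" become fresh <h2> elements
  "p[style-name='Subsection Title'] => h2:fresh",
  // mammoth ignores underline by default; this maps underlined runs to <em>
  "u => em",
];
```

Passing this array as the `styleMap` option replaces the single-entry example in the config.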
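Special Notes
Each entry's id is derived from its file name, so multiple Word documents with the same file name (for example, in different folders) will conflict.
For example, `sample-docx-files-sample2.docx` can be retrieved with `getEntry` using the id `sample-docx-files-sample2` (e.g. `getEntry('wordDocs', 'sample-docx-files-sample2')` if the collection is registered as `wordDocs`).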
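If `sources` lists several globs, a quick pre-check can flag files that would share an id before you build. The helpers below are hypothetical and not part of astro-word-loader; they assume an id is simply the file's basename with `.docx` removed, which matches the example above but may not cover every rule the loader applies.

```typescript
// Hypothetical helpers, NOT part of astro-word-loader. They assume an entry's id
// is simply the file's basename with the ".docx" extension removed.
function fileNameToId(filePath: string): string {
  // Keep only the last path segment (handles both / and \ separators)
  const base = filePath.split(/[\\/]/).pop() ?? filePath;
  // Drop a trailing .docx extension, ignoring case
  return base.replace(/\.docx$/i, "");
}

// Returns each id that more than one file would map to, with those files
function findIdConflicts(paths: string[]): Map<string, string[]> {
  const byId = new Map<string, string[]>();
  for (const p of paths) {
    const id = fileNameToId(p);
    const group = byId.get(id);
    if (group) {
      group.push(p);
    } else {
      byId.set(id, [p]);
    }
  }
  const conflicts = new Map<string, string[]>();
  byId.forEach((files, id) => {
    if (files.length > 1) conflicts.set(id, files);
  });
  return conflicts;
}

const conflicts = findIdConflicts([
  "./my-word/docs/sample-docx-files-sample2.docx",
  "./my-word/archive/sample-docx-files-sample2.docx",
  "./my-word/docs/report.docx",
]);
// Both sample2 files claim the same id, so only that id is reported
console.log(conflicts);
```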
Images are currently embedded as base64 data URIs, so a document with many images produces very large HTML and can seriously hurt performance.
I'm looking into possible solutions for this.
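To see why inlined images get expensive, it helps to estimate the size of a data URI. Base64 emits 4 characters for every 3 input bytes (rounded up), plus a short `data:<mime>;base64,` prefix. The helper names below are illustrative only.

```typescript
// Hypothetical helper: character length of a base64 data URI for an image
// of `byteLength` bytes. Base64 emits 4 chars per 3 bytes, rounding up.
function dataUriLength(byteLength: number, mimeType: string): number {
  const prefix = `data:${mimeType};base64,`;
  const base64Chars = 4 * Math.ceil(byteLength / 3);
  return prefix.length + base64Chars;
}

// Hypothetical helper: total characters added to the HTML by inlining every image
function inlinedImageOverhead(images: { bytes: number; mime: string }[]): number {
  return images.reduce((sum, img) => sum + dataUriLength(img.bytes, img.mime), 0);
}

// Ten 500,000-byte JPEGs inlined into a single page
const photos = Array.from({ length: 10 }, () => ({ bytes: 500_000, mime: "image/jpeg" }));
console.log(inlinedImageOverhead(photos));
```

For ten 500 KB photos that works out to roughly 6.7 million characters, all shipped inside the HTML document itself instead of as separate, cacheable image files.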
- Images
  - Currently, images in Word docs seem to be converted to data URIs
  - Add `loading` and `decoding` attributes
  - Size images with Astro's `<Image />` component?
- Other performance enhancements
  - Currently (12/18/24), sample4 takes too long to load for PageSpeed to generate recommendations
- Better style map interface
- Performance testing?
- Other word processor formats? Pages, OpenOffice?
- Mammoth, and mammoth.js in particular, for powering this loader
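The roadmap item about adding `loading` and `decoding` attributes could start as a simple post-processing pass over the generated HTML. This is a hypothetical sketch, not something the loader does today; it is regex-based and assumes simple `<img ...>` tags whose attribute values contain no `>`.

```typescript
// Hypothetical post-processing step (not part of astro-word-loader today):
// add loading="lazy" and decoding="async" to <img> tags that lack them.
// Regex-based: assumes attribute values never contain a ">" character.
function addImageLoadingHints(html: string): string {
  return html.replace(/<img\b([^>]*)>/gi, (_tag: string, attrs: string) => {
    let extra = "";
    // Only add each attribute if the tag doesn't already set it
    if (!/\bloading\s*=/i.test(attrs)) extra += ' loading="lazy"';
    if (!/\bdecoding\s*=/i.test(attrs)) extra += ' decoding="async"';
    return `<img${extra}${attrs}>`;
  });
}

console.log(addImageLoadingHints('<p><img src="data:image/png;base64,AAAA" /></p>'));
```

Lazy loading defers offscreen images, but with data URIs the bytes are still part of the HTML, so this would mainly help once images are served as separate files.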