Creates and stores a Markdown rendition for every document in the Alfresco Repository.
- Generates `cm:markdown` (`text/markdown`) from the PDF version of a document, using either the newly created `cm:pdf` rendition or the original when it is already a PDF, so Markdown is produced for any source mimetype that Alfresco can render to PDF
- Works in two cases:
  - When Alfresco generates a `cm:pdf` rendition from another format (DOCX, ODT, etc.)
  - When the original upload is already a PDF
- The Markdown file is stored as a proper rendition:
  - Association: `rn:rendition`
  - Association name (rendition id): `cm:markdown`
  - Child node has aspect: `rn:rendition`
  - Mimetype: `text/markdown`
- Runs asynchronously after commit (does not block user transactions)
- Build the JAR file using `mvn clean package`
- Place it in `<ALFRESCO_HOME>/modules/platform/` or in the Docker/k8s deployment folder
- Restart the Alfresco Repository
- Ensure the Transform Service is configured (both Community and Enterprise versions are supported)
For instance, for a Docker deployment with https://github.com/Alfresco/alfresco-docker-installer/:
- Add the Markdown TEngine reference to the Alfresco configuration

      alfresco:
        environment:
          JAVA_OPTS: '
            -DlocalTransform.core-aio.url=http://transform-core-aio:8090/
            -DlocalTransform.md.url=http://transform-md:8090/
          '

- Add the Markdown TEngine service

      # Requires local Ollama running with "llava" model pulled
      transform-md:
        image: docker.io/angelborroy/alf-tengine-convert2md
        environment:
          SPRING_AI_OLLAMA_BASE_URL: http://host.docker.internal:11434

- Copy the `markdown-rendition-0.8.0.jar` to the `alfresco/modules/jars` deployment folder
- Triggers on either
  - Rendition path: fires when a child association `rn:rendition/cm:pdf` is created
  - Content path: fires on content create/update when the original node has mimetype `application/pdf`
- Transform
  - Uses the Alfresco Transform Service to convert PDF to Markdown
  - Requires a Transform Engine capable of `application/pdf` to `text/markdown` (like `alf-tengine-convert2md`)
- Execution model
  - Work is queued within the current transaction and executed post-commit on a background thread
  - Each job runs in a fresh Repository transaction with system privileges
- Storage semantics
  - The Markdown output is persisted as a child of the original node under `rn:rendition` with name `cm:markdown`
  - Surfaces in Share/ADF/REST like native renditions (`cm:doclib`, `cm:webpreview`, `cm:pdf` and `cm:markdown`)
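The trigger and execution rules above can be pictured with a small, self-contained Python sketch. It is only a model of the behaviour described here, not the module's code: the event names, the `markdown_source` helper and the toy `Transaction` class are hypothetical and do not correspond to Alfresco APIs.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List, Optional

PDF = "application/pdf"


def markdown_source(event: str, mimetype: Optional[str] = None,
                    rendition_name: Optional[str] = None) -> Optional[str]:
    """Return which path would trigger Markdown generation, or None.

    Event names are hypothetical labels for the two triggers described above.
    """
    if event == "rendition-created" and rendition_name == "cm:pdf":
        return "rendition"  # a cm:pdf rendition was just created
    if event in ("content-created", "content-updated") and mimetype == PDF:
        return "content"    # the original upload is already a PDF
    return None


@dataclass
class Transaction:
    """Toy stand-in for a repository transaction (hypothetical, for illustration)."""
    executor: ThreadPoolExecutor
    _post_commit: List[Callable[[], None]] = field(default_factory=list)

    def queue_after_commit(self, job: Callable[[], None]) -> None:
        # Work is only recorded here; nothing runs inside the user transaction.
        self._post_commit.append(job)

    def commit(self) -> List[Future]:
        # After commit, queued jobs are handed to a background thread.
        jobs, self._post_commit = self._post_commit, []
        return [self.executor.submit(job) for job in jobs]

    def rollback(self) -> None:
        # A rolled-back transaction never produces a rendition job.
        self._post_commit.clear()


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        tx = Transaction(pool)
        if markdown_source("content-created", mimetype=PDF):
            tx.queue_after_commit(lambda: print("transform PDF -> Markdown"))
        for future in tx.commit():
            future.result()  # wait only so the demo exits cleanly
```

The point of the design is that the user's transaction only pays for appending a job to a list; the transform itself happens after commit, so a slow or failing Transform Engine cannot block or roll back the upload.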
- Repository: Alfresco Content Services 25.x (Community & Enterprise)
- Transform: Any TEngine advertising `application/pdf` → `text/markdown`
- Clients: Share / ADF / Public REST will list and retrieve the `cm:markdown` rendition
- Fetch Markdown content

      GET http://localhost:8080/alfresco/api/-default-/public/alfresco/versions/1/nodes/{nodeId}/renditions/markdown/content
      Accept: text/markdown
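As a rough illustration, the same request can be issued from Python using only the standard library. This is a sketch under assumptions: it uses HTTP Basic authentication, the credentials and node id are placeholders you must supply, the response is assumed to be UTF-8 text, and the helper names `rendition_content_url` and `fetch_markdown` are made up for this example.

```python
import base64
import urllib.request
from urllib.parse import quote

# Base URL taken from the request shown above; adjust host and port as needed.
BASE_URL = "http://localhost:8080/alfresco/api/-default-/public/alfresco/versions/1"


def rendition_content_url(node_id: str, rendition_id: str = "markdown",
                          base_url: str = BASE_URL) -> str:
    """Build the content URL of a rendition for the given node."""
    return (f"{base_url}/nodes/{quote(node_id, safe='')}"
            f"/renditions/{quote(rendition_id, safe='')}/content")


def fetch_markdown(node_id: str, username: str, password: str,
                   base_url: str = BASE_URL) -> str:
    """Download the Markdown rendition as text (assumes Basic auth and UTF-8)."""
    request = urllib.request.Request(
        rendition_content_url(node_id, base_url=base_url),
        headers={"Accept": "text/markdown"},
    )
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


# Example call (placeholders, requires a running repository):
# print(fetch_markdown("<nodeId>", "<user>", "<password>"))
```

If the rendition has not been generated yet, `urlopen` raises `urllib.error.HTTPError`, so callers may want to catch it and retry later.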
- No `cm:markdown` shown
  - Confirm the Transform Engine supports PDF to Markdown
  - Ensure the source is a PDF (either via `cm:pdf` rendition or original mimetype)
  - Check Repository logs for transform failures
- Unexpected delays
  - Post-commit execution means the rendition appears shortly after the original transaction completes
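Because the rendition is produced after commit, a client that needs the Markdown right after an upload may have to poll for it. The sketch below shows one way to do that in Python. It assumes the rendition information returned by the public REST API (`GET .../nodes/{nodeId}/renditions/markdown`) carries an `entry.status` field that becomes `CREATED` once the content exists; the helper names, timeout and interval are illustrative choices, and `get_status` is a callable you provide.

```python
import time
from typing import Callable, Optional


def status_from_response(payload: dict) -> Optional[str]:
    """Extract the rendition status from a decoded JSON response, if present."""
    return payload.get("entry", {}).get("status")


def wait_for_rendition(get_status: Callable[[], Optional[str]],
                       timeout: float = 30.0, interval: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll get_status() until it returns "CREATED" or the timeout expires.

    Returns True if the rendition became available, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if get_status() == "CREATED":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Keeping the HTTP call behind `get_status` lets the polling logic be tested without a running repository. If the wait times out, the Repository and Transform Engine logs are the next place to look.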
Issues and PRs welcome!