Home
# Welcome to the docx2html5-responsive-converter Wiki!
Welcome to the official wiki for docx2html5-responsive-converter. This software converts Microsoft DOCX files into clean, responsive HTML5 documents using LibreOffice CLI and custom Python scripts.
The docx2html5-responsive-converter project provides a streamlined solution for converting DOCX documents into modern HTML5 pages. The conversion process is divided into two main stages:
- DOCX to XHTML Conversion: The tool uses LibreOffice’s command-line interface (CLI) to convert DOCX files into XHTML format.
- HTML Post-Processing and Optimization: Custom Python scripts (utilizing libraries such as lxml) then process the XHTML output to:
  - Clean up and optimize the HTML.
  - Inject responsive meta tags and Bootstrap CSS.
  - Ensure images and other media elements are optimized for responsiveness.
  - Extract alternative text from `<wp:docPr>` elements for accessibility.
- The code is modular and organized by functionality, making it easy to customize or extend for your specific needs.
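
The first stage can be sketched as a thin wrapper around the LibreOffice CLI. This is a minimal sketch, not the project's actual code: the function names `build_soffice_command` and `convert_to_xhtml` are hypothetical, and the `xhtml:XHTML Writer File` filter is an assumption, since the exact filter the project passes to `--convert-to` is not stated here.

```python
import shutil
import subprocess
from pathlib import Path


def build_soffice_command(docx_path, out_dir, soffice="soffice"):
    """Return the argument list for a headless DOCX -> XHTML conversion."""
    return [
        soffice,
        "--headless",                                # run without a GUI
        "--convert-to", "xhtml:XHTML Writer File",   # assumed output filter
        "--outdir", str(out_dir),                    # where the result is written
        str(docx_path),
    ]


def convert_to_xhtml(docx_path, out_dir, soffice="soffice"):
    """Run the conversion; raises if soffice is missing or exits with an error."""
    if shutil.which(soffice) is None:
        raise FileNotFoundError(f"{soffice!r} not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_soffice_command(docx_path, out_dir, soffice), check=True)
    # The output is expected to be named after the input file's stem.
    return Path(out_dir) / (Path(docx_path).stem + ".xhtml")
```

Keeping the command construction separate from the `subprocess.run` call makes it possible to inspect or log the exact command without LibreOffice installed.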
- Automated Conversion: Converts DOCX files to XHTML using LibreOffice CLI.
- Responsive HTML Output: Processes and optimizes the XHTML into responsive HTML5, ready for modern web browsers.
- Alt Text Extraction: Automatically extracts and maps alt text for images from DOCX files to enhance accessibility.
- Batch Conversion Support: (Optional) Process multiple DOCX files at once using the provided batch conversion functionality.
- Customizable Templates: Easily modify the HTML template, CSS styles, and meta tags used during post-processing.
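
To illustrate the alt text feature: a DOCX file is a ZIP archive, and the description of each drawing is stored in the `descr` attribute of a `<wp:docPr>` element inside `word/document.xml`. The sketch below reads those attributes with the standard library only (the project itself mentions lxml). The function name `extract_alt_texts` is hypothetical and is not the project's `extract_alt_text_from_docx`.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the wordprocessingDrawing elements, where <wp:docPr> lives.
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"


def extract_alt_texts(docx_file):
    """Return the alt text of each drawing, in document order.

    `docx_file` is a path or a binary file-like object. Drawings without a
    description yield an empty string.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return [
        doc_pr.get("descr", "")
        for doc_pr in root.iter(f"{{{WP_NS}}}docPr")
    ]
```

Because the list follows document order, it can be paired with the `<img>` elements of the converted HTML by position, assuming the converter preserves image order.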
Before using the converter, ensure you have the following installed:
- LibreOffice: Ensure LibreOffice is installed and that its CLI tool (`soffice.exe` on Windows) is accessible.
- Python 3.8 or Above: Download and install Python from the [official Python website](https://www.python.org/).
- Required Python Libraries: Install the necessary Python libraries by running:

```bash
pip install -r requirements.txt
```

## Installation

Clone the Repository:

```bash
git clone https://github.com/reddyapuru/docx2html5-responsive-converter.git
```

Navigate to the Repository Directory:

```bash
cd docx2html5-responsive-converter
```

## Usage

### Single File Conversion

To convert a single DOCX file, run the conversion script:

```bash
python libre-docx2html5.py
```

Follow the on-screen instructions and enter the full path to your DOCX file when prompted. The script will output a responsive HTML file based on your input.
### Batch Conversion

To convert multiple DOCX files at once, use the batch conversion function. It processes every DOCX file in the specified input folder (skipping temporary files such as those whose names start with `~$`) and saves the converted HTML files in the designated output folder.

Example usage in a Python script:

```python
# convert_docx_to_html_batch is provided by the project's conversion code;
# import it in whatever way suits your setup.
input_folder = "path/to/your/docx_files"
output_folder = "path/to/save/html_files"

batch_results = convert_docx_to_html_batch(input_folder, output_folder)
for docx_file, message in batch_results.items():
    print(f"{docx_file}: {message}")
```
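
The file selection described above, every DOCX file except Word's `~$` temporary files, could look roughly like this. It is a sketch under assumptions: `list_docx_files` is a hypothetical helper, and the project's own batch function may select files differently.

```python
from pathlib import Path


def list_docx_files(input_folder):
    """Return the DOCX files in `input_folder`, skipping Word temp files (~$...)."""
    return sorted(
        path
        for path in Path(input_folder).iterdir()
        if path.is_file()
        and path.suffix.lower() == ".docx"       # match .docx and .DOCX
        and not path.name.startswith("~$")       # temp files Word creates while a document is open
    )
```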
## Customization and Configuration

- HTML Template: Modify the `optimize_html` function to change the injected meta tags, Bootstrap CSS, and custom styles.
- Alt Text Mapping: Adjust the logic in the `extract_alt_text_from_docx` function if your DOCX files have custom attributes or structure for image descriptions.
- Batch Processing Filters: The batch conversion code automatically excludes temporary files (e.g., those starting with `~$`). You can further customize the filtering logic as needed.
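
As a rough idea of what customizing the injected head content involves, the sketch below adds a viewport meta tag and a stylesheet link to an XHTML document. It is not the project's `optimize_html`: the function name `add_responsive_head` is hypothetical, the Bootstrap CDN URL and version are assumptions, and it uses the standard library's `xml.etree.ElementTree` rather than lxml so that it runs without extra dependencies. It also assumes the input is well-formed, namespaced XHTML.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Assumed stylesheet URL; pin whichever Bootstrap version you actually use.
BOOTSTRAP_CSS = "https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css"


def add_responsive_head(xhtml_text, css_href=BOOTSTRAP_CSS):
    """Add a viewport meta tag and a stylesheet link to the document's <head>."""
    ET.register_namespace("", XHTML_NS)  # serialize without ns0: prefixes
    root = ET.fromstring(xhtml_text)
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None:
        # Create a <head> as the first child if the document lacks one.
        head = ET.Element(f"{{{XHTML_NS}}}head")
        root.insert(0, head)
    ET.SubElement(head, f"{{{XHTML_NS}}}meta", {
        "name": "viewport",
        "content": "width=device-width, initial-scale=1",
    })
    ET.SubElement(head, f"{{{XHTML_NS}}}link", {"rel": "stylesheet", "href": css_href})
    return ET.tostring(root, encoding="unicode")
```

Passing a different `css_href` swaps the stylesheet without touching the rest of the function.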
## Contributing

Contributions are welcome! If you have bug fixes, feature suggestions, or improvements:

- Fork the repository.
- Make your changes.
- Submit a pull request.

Please adhere to the coding style and include relevant tests with your contributions.
Our DOCX to Responsive HTML Converter is deployed on [Koyeb](https://www.koyeb.com), leveraging their robust and scalable cloud platform to provide fast and secure conversions. Here are the details of our deployment:
- Access URL: https://stormy-kirsteni-latest2all-b166eaa4.koyeb.app/
  - This URL serves as the public endpoint where users can upload their DOCX files for conversion.
- Backend: The application is built using Flask on Python 3.9 and employs a production-ready Docker container.
- Conversion Engine: We use LibreOffice in headless mode to convert DOCX files into responsive HTML, with additional processing (including image extraction and HTML optimization) performed by our custom scripts.
- Containerization: The app is containerized using Docker, ensuring consistent behavior across different environments.
- Continuous Deployment: Our source code is hosted on GitHub at [reddyapuru/docx2html5-responsive-converter](https://github.com/reddyapuru/docx2html5-responsive-converter), and Koyeb’s seamless integration with GitHub triggers automatic builds and deployments.
- Responsive Output: The final output is a fully responsive HTML file (packaged in a ZIP along with images) that looks great on any device.
- Privacy-First: All uploaded files and converted packages are automatically deleted after 10 minutes, ensuring your data remains secure and confidential.
- Scalable Infrastructure: Hosted on Koyeb’s free instance, the deployment can easily be scaled as needed, benefiting from Koyeb’s global infrastructure.
This deployment strategy ensures that our tool remains fast, reliable, and secure, providing a seamless conversion experience for users while maintaining the highest standards of privacy.
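
How the deployed service enforces the 10-minute retention window is not described here. One simple way to implement such a policy, shown purely as an illustration, is a periodic sweep that deletes files whose modification time is older than the window. The name `purge_expired` and the flat upload-folder layout are assumptions.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 10 * 60  # the 10-minute retention window


def purge_expired(folder, max_age=MAX_AGE_SECONDS, now=None):
    """Delete regular files in `folder` last modified more than `max_age` seconds ago.

    Returns the names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(folder).iterdir():
        if path.is_file() and now - path.stat().st_mtime > max_age:
            path.unlink()
            removed.append(path.name)
    return removed
```

A function like this would typically be called from a scheduled job or a background thread alongside the web application.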
## License

This project is licensed under the GPL-3.0 License. You are free to use, modify, and distribute the software under the terms of this license.

## Contact

For questions, suggestions, or support, please open an issue in the repository or refer to the contact details provided in the repository’s README.

Happy Converting!

# Sponsor Our DOCX2HTML5 Converter
At Latest2All, we are dedicated to providing innovative, high-quality tools—like our DOCX to Responsive HTML Converter—that streamline your document conversion process. If you find our service valuable, you can help us keep it free and continuously improve it by supporting our work through purchasing our e-book:
[Database Management Using AI: A Comprehensive Guide by A Purushotham Reddy](https://books.google.co.in/books?id=gBYrEQAAQBAJ&printsec=frontcover&dq=database+management+using+ai&hl=en&newbks=1&newbks_redir=0&sa=X&redir_esc=y#v=onepage&q=database%20management%20using%20ai&f=false)
- Sustainability: Your purchase helps fund the ongoing development and maintenance of our DOCX2HTML5 Converter, ensuring that we continue to offer a fast, secure, and free tool for everyone.
- Future Enhancements: Funds from e-book sales will be reinvested into adding new features and improving the converter, further enhancing its performance and user experience.
- Community Support: By supporting our project, you contribute to an open-source initiative that benefits developers and digital creators worldwide.
Database Management Using AI: A Comprehensive Guide provides an in-depth look at modern database management techniques powered by artificial intelligence. Whether you’re new to the field or a seasoned professional, this e-book covers essential concepts, best practices, and practical examples to help you leverage AI for efficient database management.
If you appreciate the convenience and innovation of our DOCX to Responsive HTML Converter, please consider purchasing the e-book to sponsor our work. Your support is invaluable in helping us keep our projects sustainable and continuously improve our services.
Thank you for your support!
docx2html5 sponsored by www.latest2all.com