Textract Plus

Extract text from any document with more power and a wider range of supported extensions. No more muss. No more fuss.

How To Use

Install the package:

pip install textract-plus
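
To confirm the install worked, one option is to ask Python's packaging metadata for the installed version. This is a minimal sketch: `installed_version` is a hypothetical helper written for this example, not part of Textract Plus, and it only assumes the distribution name `textract-plus` used in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name="textract-plus"):
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of textract-plus.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(f"textract-plus: {version or 'not installed'}")
```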

Import and extract:

import textractplus as tp
text = tp.process('/path/to/document')
print(text)
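
For more than one file, the same call can be wrapped in a small loop. The sketch below is an illustration only: `extract_folder` and its parameters are hypothetical names, the extractor is passed in as an argument so the helper does not depend on anything beyond the `tp.process(path)` call shown above, and it assumes the result may be either `bytes` (as in upstream textract) or `str`. Exception types are not documented here, so failures are caught broadly and reported per file.

```python
from pathlib import Path


def extract_folder(folder, extract, suffixes=(".txt", ".pdf", ".docx")):
    """Run `extract` on every file under `folder` whose suffix is in `suffixes`.

    Hypothetical helper for illustration; not part of textract-plus.
    Returns (texts, errors): texts maps path -> str, errors maps path -> exception.
    """
    texts, errors = {}, {}
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        try:
            result = extract(str(path))
        except Exception as exc:  # exception types are an unknown here, so catch broadly
            errors[str(path)] = exc
            continue
        # Assumption: the extractor may return bytes or str; normalize to str.
        if isinstance(result, bytes):
            result = result.decode("utf-8", errors="replace")
        texts[str(path)] = result
    return texts, errors


# Usage, assuming textract-plus is installed:
# import textractplus as tp
# texts, errors = extract_folder("/path/to/folder", tp.process)
```

Passing the extractor as an argument keeps the loop easy to test with a stand-in function and leaves one bad file from stopping the rest of the run.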

Currently supported extensions

Textract Plus supports a growing list of file types for text extraction, extending the set that textract supports. If you don't see your favorite file type here, please recommend it by either mentioning it on the issue tracker or by contributing a pull request.

.csv via python builtins
.tsv and .tab via python builtins
.doc via antiword
.docx via python-docx2txt
.eml via python builtins
.epub via ebooklib
.gif via tesseract-ocr
.jpg and .jpeg via tesseract-ocr
.json via python builtins
.html and .htm via beautifulsoup4
.mp3 via sox, SpeechRecognition, and pocketsphinx
.msg via msg-extractor
.odt via python builtins
.ogg via sox, SpeechRecognition, and pocketsphinx
.pdf via pdftotext (default) or pdfminer.six
.png via tesseract-ocr
.pptx via python-pptx
.ps via ps2ascii
.rtf via unrtf
.tiff and .tif via tesseract-ocr
.txt via python builtins
.wav via SpeechRecognition and pocketsphinx
.xlsx via xlrd
.xls via xlrd

Extended support

.dotx via docx2python
.docm via docx2python
.pptm via python-pptx
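
Before handing a file to the extractor, it can help to check its extension against the lists above. This is a minimal sketch: `SUPPORTED_EXTENSIONS` and `is_listed` are hypothetical names, and the set is transcribed by hand from this README rather than read from the library, so it can drift out of date as support grows.

```python
from pathlib import Path

# Transcribed by hand from the lists above; not read from the library itself.
SUPPORTED_EXTENSIONS = frozenset({
    ".csv", ".tsv", ".tab", ".doc", ".docx", ".eml", ".epub", ".gif",
    ".jpg", ".jpeg", ".json", ".html", ".htm", ".mp3", ".msg", ".odt",
    ".ogg", ".pdf", ".png", ".pptx", ".ps", ".rtf", ".tiff", ".tif",
    ".txt", ".wav", ".xlsx", ".xls",
    # Extended support
    ".dotx", ".docm", ".pptm",
})


def is_listed(path):
    """Return True if the file's extension (case-insensitive) is in the lists above."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ("report.PDF", "slides.pptm", "archive.zip"):
    print(name, is_listed(name))
```

Note that a listed extension only means the format is covered; several entries rely on external tools such as antiword or tesseract-ocr being installed.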

Repository: vaibhavhaswani/textract-plus
