vaibhavhaswani/textract-plus

 
 

Textract Plus

Extract text from any document with more power and wider file-extension coverage. No more muss. No more fuss.

Full documentation.

How To Use

  • Install the package:

pip install textract-plus

  • Import and Extract:

    import textractplus as tp
    text = tp.process('/path/to/document')
    print(text)
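
The snippet above can be wrapped in a small helper when the rest of a program expects a plain `str`. The following is only a sketch: `extract_text` and its `process` parameter are hypothetical names that are not part of textract-plus, and the bytes handling assumes that `tp.process` may return `bytes`, as upstream textract does. Check the full documentation for the actual return type.

```python
from pathlib import Path
from typing import Callable, Optional, Union


def extract_text(
    path: Union[str, Path],
    process: Optional[Callable[[str], Union[str, bytes]]] = None,
) -> str:
    """Return the extracted text of `path` as a str.

    Hypothetical convenience wrapper; not part of textract-plus itself.
    """
    if process is None:
        # Imported lazily so the helper can be exercised without the package.
        import textractplus as tp
        process = tp.process

    result = process(str(path))

    # Assumption: the extractor may return bytes (as upstream textract does).
    # Decode to str, replacing any bytes that are not valid UTF-8.
    if isinstance(result, bytes):
        return result.decode("utf-8", errors="replace")
    return result
```

Called as `extract_text('/path/to/document')`, it uses `tp.process`; passing a different callable as `process` makes it easy to swap in a stub during testing.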
    

Currently supporting extensions

Textract Plus supports a larger, still-growing list of file types for text extraction than textract. If you don't see your favorite file type here, please recommend it by mentioning it on the issue tracker or by contributing a pull request.

Extended support


About

A fork of textract with support for more file extensions and additional features.
