6 changes: 5 additions & 1 deletion README.md
@@ -27,6 +27,7 @@ A text extraction node module.
* DXF
* `application/javascript`
* All `text/*` mime-types.
* Support for more text formats is planned.

In almost all cases above, what textract cares about is the mime type. So `.html` and `.htm`, both possessing the same mime type, will be extracted. Other extensions that share mime types with those above should also extract successfully. For example, `application/vnd.ms-excel` is the mime type for `.xls`, but also for 5 other file types.
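
To illustrate the mime-type point above, here is a minimal sketch of extension-to-mime-type resolution. It is not textract's actual implementation: the `EXTENSION_TO_MIME` table, the `SUPPORTED_MIME_TYPES` set, and the `mimeTypeFor` and `isExtractable` helpers are all hypothetical names made up for this example, and the table covers only a handful of extensions.

```javascript
// Illustrative sketch only: these tables and helpers are hypothetical,
// not part of textract's API or internals.
const path = require('path');

// A small, hand-written extension -> mime type table (assumed subset).
const EXTENSION_TO_MIME = {
  '.html': 'text/html',
  '.htm': 'text/html',
  '.txt': 'text/plain',
  '.js': 'application/javascript',
  '.xls': 'application/vnd.ms-excel',
  '.xlt': 'application/vnd.ms-excel'
};

// Non-text mime types this sketch treats as extractable.
const SUPPORTED_MIME_TYPES = new Set([
  'application/javascript',
  'application/vnd.ms-excel'
]);

// Resolve a file path to a mime type, or null if the extension is unknown.
function mimeTypeFor(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return EXTENSION_TO_MIME[ext] || null;
}

// A file is extractable if its mime type is text/* or explicitly listed.
function isExtractable(filePath) {
  const mime = mimeTypeFor(filePath);
  if (mime === null) {
    return false;
  }
  return mime.startsWith('text/') || SUPPORTED_MIME_TYPES.has(mime);
}
```

In this sketch `page.html` and `page.htm` resolve to the same mime type, so both are treated identically, and `.xlt` is accepted because it shares `application/vnd.ms-excel` with `.xls`.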

@@ -167,4 +168,7 @@ textract.fromUrl(url, config, function( error, text ) {})
- `sudo port install tesseract-chi-sim`
- `sudo port install tesseract-eng`
- You will also want to disable textract's usage of textutil as the tests are based on output from antiword.
- Go into `/lib/extractors/{doc|doc-osx|rtf}` and modify the code under `if ( os.platform() === 'darwin' ) {`. Uncomment the commented lines in these sections.


* We are working continuously to make this project more efficient. Until then, keep extracting!