Issues: Unstructured-IO/unstructured
#3072 Add manual coordinate constraints to partition_pdf().
Labels: enhancement, pdf
Opened May 22, 2024 by ChiNoel-osu

#3459 Disable telemetry/tracking
Labels: bug, documentation
Opened Aug 1, 2024 by TaylorN15

#3457 Support to parse confluence wiki content
Labels: enhancement
Opened Aug 1, 2024 by zjffdu

#3012 Table Title and Table content separate chunks: Merge contents of parent_id and element.id
Labels: chunking, enhancement
Opened May 14, 2024 by weissenbacherpwc

#1473 pptx: shapes "off-slide" to the right and bottom are not excluded
Labels: enhancement, ppt, pptx
Opened Sep 20, 2023 by scanny

#1489 feat/group elements by parent_id
Labels: enhancement, good first issue
Opened Sep 21, 2023 by ron-unstructured

#1695 feat: ability to skip non-plain-text element types in chunk_by_title()
Labels: chunking, enhancement
Opened Oct 10, 2023 by cragwolfe

#1821 docx: partitioner finds text nested in revision-marks
Labels: docx, enhancement
Opened Oct 20, 2023 by scanny

#2129 feat/Guard against excessive memory usage when partitioning PDFs
Labels: enhancement, pdf
Opened Nov 20, 2023 by flash1293

#2225 feat/retain md image links
Labels: enhancement, html
Opened Dec 6, 2023 by shreyanid

#2233 feat/parse_html_embed_objects
Labels: enhancement, html
Opened Dec 7, 2023 by My3VM

#2288 (PDF) How to let partition_pdf and partition_via_api detect automatically language(s) of a PDF?
Labels: enhancement
Opened Dec 18, 2023 by piegu

#2351 Adding a progress bar when partitioning pdfs
Labels: enhancement
Opened Jan 4, 2024 by TheoLvs

#2432 feat/Option to flatten metadata extraction
Labels: enhancement
Opened Jan 19, 2024 by ron-unstructured

#2467 Add possibility to deactivate OCR
Labels: enhancement
Opened Jan 29, 2024 by thomascerbelaud

#2470 feat(docx): detect OutlineLevel of paragraph style and use for computing section hierarchy
Labels: enhancement
Opened Jan 29, 2024 by scanny

#2513 feat/clean_newline
Labels: enhancement, good first issue
Opened Feb 6, 2024 by manuelrech

#2547 bug/group_bullet_paragraph causes problems by returning a list
Labels: bug, langchain
Opened Feb 13, 2024 by rchen19

#2568 ocr metadata
Labels: enhancement, ocr
Opened Feb 21, 2024 by hakankaraoguz

#2631 feat/Use local model for hi_res partition
Labels: enhancement, models
Opened Mar 11, 2024 by AntoninLeroy

#2695 feat/ extract style or font for Text elements.
Labels: enhancement, pdf
Opened Mar 26, 2024 by LunaticMaestro

#2833 feat/add extract_image_block_output_dir to partition_via_api
Labels: enhancement, pdf
Opened Apr 2, 2024 by awalker4

#2886 Chunk overlap prefix is on even word boundary >= overlap character count.
Labels: chunking, enhancement
Opened Apr 12, 2024 by scanny

#2461 Enhancement: better element ID's
Labels: enhancement
Opened Jan 26, 2024 by cragwolfe