Stars
A modern Python REST client for Apache Tika server
thf24 / tika-client
Forked from stumpylog/tika-client
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS ev…
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
Parsee's PDF reader, specialized in extracting tables with numeric values and in accurately extracting and preserving text paragraphs. Full support for scans and images.
Datasets, case studies and benchmarks for extracting structured information from PDFs, HTML files or images, created by the Parsee.ai team. Datasets also on Hugging Face: https://huggingface.co/par…
Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized in PDFs and HTML files. Extensive support for tabular data extraction and multimodal queries.
Spring Batch examples in Kotlin (from simple to advanced)
Vue 3 compatible drag-and-drop component based on Sortable.js
AngularJS fixed header scrollable table directive
Makes 'SimFin' data (https://simfin.com/) easily accessible in R.
Tutorials for SimFin - Simple financial data for Python
Convert SimFin data set into quarterly table format with respect to daily data
Search engine implementing a web crawler, fuzzy search and a simple GUI. 1st semester project
Community maintained fork of pdfminer - we fathom PDF
Some examples of how the web API can be used to retrieve data from SimFin.
Python PDF Parser (Not actively maintained). Check out pdfminer.six.
Headless chrome/chromium automation library (unofficial port of puppeteer)
JavaScript API for Chrome and Firefox
Module that provides AngularJS-directives for formatting, validating and working with payments