A collection of very useful scripts containing various algorithms.
These scripts are provided as-is. There is no guarantee that they will work. You will need to understand them to use them in your projects.
Please use issues for requests, fixes, suggestions, and new scripts you want to share.
- Script Locators
- Field Formatters
- Validation Rules
- Zones
- Tables
- Locator Customization
- Text Content Locator
- Databases & Dictionaries
- Classification
- Page Extraction and Classification
- Geometry
- OCR
- Powerful Functions
- Output
- Benchmarking
- Images
- Documents
- Batches
- Project Manipulation
- File System functions
- Column Locator detects text columns in a document
- Dynamic Fuzzy Search Locator POWERFUL fuzzy search a document for values from a previous locator!
- Compare 2 documents POWERFUL script that detects all differences between two documents
- NLP (Natural Language Processing)
- Passport MRZ Locator
- Run Previous Locators from Script VERY POWERFUL your script locators now know which locators they depend on and run them on demand, only if needed, saving you valuable time. Just press Test on the locator and everything is automatically calculated!
- UK VAT Locator looks up VAT IDs online at the UK government. Only works inside the UK.
- Webservice
- Scripting Field Formatters
- Fuzzy Field Formatter useful to make a spellchecker!
- Name Suggestor Demo
- UK VAT Formatter
- Fuzzy Validation Rule useful for finding unusual spellings and suggesting potential corrections
- Move Zones by Script
- Perform Zone OCR in script
- Register Zones on difficult pages.
- Automatically Generate Zone Locators from external coordinate data
- How to Use Table Locators
- Table Benchmark Guide
- Advanced Table Locator Guide (new locator in 2023)
- Copy Zones into a Table
- Copy Subfields into a Table
- Fast Table Lassoing quickly and interactively select table columns and rows in the Validation Interface
- 3-way Line Item Matching demo a complete project showing Line Item Matching Locator, 3-way matching and interactive SQL database lookup in Validation
- Table Detection by Gridlines
- Table Extraction by Regex
- Table Header Pack Parser
- Insert Missing Rows into a Table automatically finds missing rows that the table locator missed
- Force Table Locator to use a particular algorithm the table locator has 5 internal algorithms that are all run and voted against each other. Here you decide which algorithm always wins
- Validate Table Rows with a Fuzzy Database
- Write Table to CSV
- Table Scripting Framework a powerful & generic approach to enhance table locators
- Reading unknown table layouts and tables in tables Powerful new algorithms for automatically analyzing unknown table layouts, including tables in tables.
- How to customize any locator
- Force Format Locator to search across multiple lines the format locator only searches within each line of text. This makes it search further.
- Make a locator multiline.
- Fuzzy search a database from script
- Database script functions
- Fuzzy search a dictionary
- Update database per document POWERFUL changes a fuzzy database instantly per document. If you know who the document is from you can search ONLY for their address, phone number, date of birth; the database will contain no one else
- Fuzzy Dictionary Substitution POWERFUL fuzzy search a document for words/phrases and return associated fields for these values
- Fast Table Lassoing demo video and script quickly and interactively select table columns and rows in the Validation Interface
- Must-read guide to classification. Everything you need to know to prepare your classification project.
- Convert your Xdocuments to text files XDoc2text
- Custom Classification
- Page Classification
- Page Locators VERY POWERFUL write locators at the page level
- Paragraph Classification
- String Classification VERY POWERFUL classify any string, even a word or phrase!
- Text Layout Classification VERY POWERFUL a completely new classification strategy. No configuration required. It classifies a page based on the position of every word on the page. It is very sensitive to subtle changes between similar documents. If your forms only vary slightly, this will detect that!
- Find a blank space in a document for imprinting a barcode or signature.
- Calculate Overlaps of fields, zones, rows etc. Fundamental to many geometry algorithms and custom table locators.
- Find Left Margin of a Page very precise, fuzzy detection of the left margin of a page with sub-pixel accuracy. Useful for comparing two pages and for paragraph detection
- Change_OCR_Characters.md
- Use Microsoft Document Imaging as OCR engine and for feeding locators.
- JSON Parser fully compliant JSON parser
- JSON Table Exporter Script to export all tables in a document in JSON format.
- Field Copy VERY POWERFUL This is the most important KT script! intelligently & recursively copy a field, locator, alternative, subfield, cell, row, xdoc into another. This script will dramatically simplify your own scripts and make them much more readable.
- File System Get All files, File_Exists, Dir_Exists, File_NameWithoutExtension etc
- Sorting Alternatives
- Fuzzy Match Text VERY POWERFUL fuzzy match any two pieces of text. 0%=no match, 100%=exact match
- IBAN validation
- JSON Parser fully compliant JSON parser
- Quicksort VERY POWERFUL sort alternatives fast by confidence, alphabetically, coordinates, page, textline, etc.
- String Regex Split a string via regex, e.g. "2004-12-23" into "2004","12","23"
- Numbers to Text Convert numbers to text, e.g. "1234" to "one thousand two hundred and thirty four". Useful for checking that numbers match their text form
- Write Fields to CSV
- Write Table to CSV
- Write Fields to Excel including colors, formats, images and more!
- Detect Page Size detects whether a page is A4, A3, US Letter, Foolscap, etc. Landscape vs Portrait. Works well on cropped images too
- Image Cleanup and Custom OCR Use VRS and remove lines and dots before OCR.
- Compare 2 documents POWERFUL script that detects all differences between two documents
- Text Deskew If a document is not deskewed before or during OCR the textlines can be messed up. This calculates the page skew AFTER OCR and then realigns all the words into their correct text lines.
- Convert PDF to TIFF VERY POWERFUL convert your PDF samples to TIFF while preserving the Text layer. Speeds up locator testing 10x!
- Gibberish/Nonsense/Bad OCR Detection check if a document is mostly unreadable OCR or a corrupted/encrypted PDF. Useful for language detection as well
- How to read Russian Invoices
- Merge documents based on field value. If two documents have the same field value, then merge them. KTM Only.
These are advanced scripting techniques to access project and locator settings via script. This gives you the power to create, delete and edit classes, fields, locators, and almost any setting in the project. This is very dangerous and can destroy your projects. Also note that the Project Builder will not be updated with the changes you make to the project, and this will cause GUI errors. Tread carefully: you are on your own, so don't expect support from Tech Support!
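To give a flavour of the kind of algorithm listed above, here is a minimal Python sketch of the check behind the "IBAN validation" entry. It is not the script from this collection. It assumes the standard mod-97 checksum (rearrange, map letters to numbers, divide by 97) and does not check country-specific lengths. The function name `iban_is_valid` is hypothetical.

```python
import string

# Characters allowed in an IBAN once spaces are removed.
_ALLOWED = set(string.ascii_uppercase + string.digits)


def iban_is_valid(iban: str) -> bool:
    """Return True if the IBAN passes the mod-97 checksum.

    Hypothetical helper for illustration only. It checks the general
    shape and the checksum, not the per-country length or bank codes.
    """
    s = "".join(iban.split()).upper()

    # General shape: 15 to 34 characters, letters and digits only.
    if not 15 <= len(s) <= 34 or not set(s) <= _ALLOWED:
        return False

    # Two-letter country code followed by two check digits.
    if not (s[:2].isalpha() and s[2:4].isdigit()):
        return False

    # Move the first four characters to the end, then replace each
    # letter with its number (A=10 ... Z=35). int(c, 36) does exactly
    # that mapping and leaves the digits 0-9 unchanged.
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(c, 36)) for c in rearranged)

    # A valid IBAN leaves a remainder of 1 when divided by 97.
    return int(numeric) % 97 == 1


if __name__ == "__main__":
    print(iban_is_valid("GB82 WEST 1234 5698 7654 32"))  # True
    print(iban_is_valid("GB82 WEST 1234 5698 7654 33"))  # False
```

A checksum pass only means the number is well formed. It does not prove the account exists, so a check like this fits best as a first filter in a validation rule.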