**Does it work for scanned PDF documents?**
Yes, Vision Parse is specifically designed to handle scanned PDF documents effectively. It uses advanced Vision LLMs to extract text, tables, images, and LaTeX equations from both regular and scanned PDF documents with high precision.
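For reference, here is a minimal usage sketch for converting a scanned PDF. It assumes the `VisionParser` class and `convert_pdf` method exposed by the package; the file path is a placeholder, so verify the names against your installed version:

```python
from vision_parse import VisionParser

# Scanned and digital PDFs are handled the same way: each page is rendered
# to an image and passed to the Vision LLM for extraction.
parser = VisionParser(model_name="llama3.2-vision:11b")

# "scanned_document.pdf" is a placeholder path.
markdown_pages = parser.convert_pdf("scanned_document.pdf")

for page_number, content in enumerate(markdown_pages, start=1):
    print(f"--- Page {page_number} ---\n{content}")
```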
**I am facing latency issues while running llama3.2-vision locally. How can I improve the performance of locally hosted vision models?**
This is a known limitation with locally hosted Ollama models. Here are some solutions:
- Use API-based Models: For better performance, consider using API-based models like OpenAI, DeepSeek, or Gemini, which are significantly faster and more accurate.
- Enable Concurrency: Set `enable_concurrency` to `True` so that multiple pages are processed in parallel, reducing overall latency. You can also increase the value of `OLLAMA_NUM_PARALLEL` to maximize the number of pages that can be processed in parallel.
- Disable Detailed Extraction: Disable the `detailed_extraction` parameter for simpler PDF documents, which can improve latency (see the sketch after this list).
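A minimal sketch combining both tweaks, assuming `VisionParser` accepts the `enable_concurrency` and `detailed_extraction` keyword arguments described above (check your installed version's signature):

```python
from vision_parse import VisionParser

parser = VisionParser(
    model_name="llama3.2-vision:11b",
    enable_concurrency=True,    # process multiple pages in parallel
    detailed_extraction=False,  # skip image/LaTeX/table detection for simple PDFs
)

markdown_pages = parser.convert_pdf("report.pdf")  # placeholder path

# OLLAMA_NUM_PARALLEL controls server-side parallelism, so set it in the
# Ollama server's environment (e.g. `export OLLAMA_NUM_PARALLEL=4` before
# running `ollama serve`), not in this client process.
```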
**The llama3.2-vision:11b model was hallucinating and unable to extract content accurately from the PDF document. How can I improve the extraction accuracy of locally hosted vision models?**
To improve extraction accuracy with the llama3.2-vision:11b model:
- Adjust Model Parameters: Lower the `temperature` and `top_p` values for more deterministic outputs and fewer hallucinations.
- Define Custom Prompts: Defining custom prompts tailored to your document structure guides the model to extract content more accurately.
- Enable Detailed Extraction: Enabling `detailed_extraction` helps the Vision LLM detect images, LaTeX equations, and structured and semi-structured tables, and then extract them with high accuracy.
- Consider Using Alternative Models: Try API-based models like gpt-4o or gemini-1.5-pro for better accuracy and performance. Avoid smaller models that are prone to hallucination. A configuration sketch follows this list.
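An illustrative configuration combining these points; the `custom_prompt` keyword and the prompt wording are assumptions for this sketch, so verify them against your installed version:

```python
from vision_parse import VisionParser

# Hypothetical document-specific prompt; tailor it to your own PDF structure.
custom_prompt = (
    "Extract every table as a markdown table, preserve LaTeX equations "
    "verbatim, and describe each embedded image in one sentence."
)

parser = VisionParser(
    model_name="llama3.2-vision:11b",
    temperature=0.7,            # lower values give more deterministic output
    top_p=0.5,                  # lower values reduce hallucinations
    custom_prompt=custom_prompt,
    detailed_extraction=True,   # detect images, LaTeX equations, and tables
)

markdown_pages = parser.convert_pdf("document.pdf")  # placeholder path
```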
**What are the recommended values for model parameters such as temperature, top_p, etc., to improve extraction accuracy?**
Here are the recommended values for model parameters to improve extraction accuracy:
- Set `temperature` to 0.7 and `top_p` to 0.5.
- For Ollama models, increase `num_ctx` to 16384 and `num_predict` to 8092 (depending on the model size), and set `repeat_penalty` to 1.3.
- For OpenAI models, increase `max_tokens` to 8192 (depending on the model size) and set `frequency_penalty` to 0.3. A configuration sketch follows the note below.
Note: The recommended values are generic and may need to be adjusted based on your document structure and the model's capabilities.
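A sketch of how these values might be passed. It assumes the `VisionParser` constructor forwards extra keyword arguments (`num_ctx`, `num_predict`, `repeat_penalty` for Ollama; `max_tokens`, `frequency_penalty`, and `api_key` for OpenAI) to the underlying model configuration, so check your installed version before relying on it:

```python
from vision_parse import VisionParser

# Locally hosted Ollama model.
ollama_parser = VisionParser(
    model_name="llama3.2-vision:11b",
    temperature=0.7,
    top_p=0.5,
    num_ctx=16384,       # larger context window
    num_predict=8092,    # output token budget, depending on model size
    repeat_penalty=1.3,  # discourage repetitive output
)

# API-based OpenAI model.
openai_parser = VisionParser(
    model_name="gpt-4o",
    api_key="your-openai-api-key",  # placeholder credential
    temperature=0.7,
    top_p=0.5,
    max_tokens=8192,        # depending on the model size
    frequency_penalty=0.3,
)
```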