
graeps/tedrasim-vlm


TEDRASIM – VLM Approach

TEDRASIM is a research demonstrator from a larger project exploring automated analysis of technical drawings.

Traditionally, technical drawings are created first in order to manufacture or model a three-dimensional object. TEDRASIM investigates the reverse direction: starting from a 3D scan or a photo of an object, the system automatically derives the corresponding technical drawing.

The overall TEDRASIM project explores two complementary approaches:

  1. Mesh-based pipeline – extracting projections and feature edges directly from 3D scan data
  2. Vision–Language Model (VLM) approach – using a fine-tuned multimodal model to predict a structured textual (JSON) description of the object, from which a 3D model is generated

This repository contains the machine learning–based VLM approach.

The mesh-based pipeline is implemented in a separate repository:

Note: This repository contains a simplified public version of the project; parts of the original pipeline and the core technical drawing similarity search system are not included.


Overview

The goal of this module is to generate technical drawings from images using Vision–Language Models (VLMs) fine-tuned on a custom dataset.

The project currently includes:

  1. Synthetic dataset generation
  2. Fine-tuning of a VLM
  3. A pipeline that converts images into 3D models
  4. TODO: A pipeline that converts 3D models into technical drawings
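
Synthetic dataset generation, the first item above, can be illustrated with a minimal sketch that samples random scenes from a small primitive set and emits them as JSON. The schema (`objects` with `id`/`type`/`size`, `relations` with an `on_top_of` type), the primitive names, and `random_scene` are hypothetical and chosen only for illustration. The repository's actual generator lives in `src/tedrasim/synth_dataset_generation/` and also renders the scenes, which this sketch does not do.

```python
import json
import random
from typing import Optional

# Hypothetical primitive vocabulary; the real dataset's primitive set may differ.
PRIMITIVES = ["cuboid", "cylinder", "sphere"]


def random_scene(n_objects: int, seed: Optional[int] = None) -> dict:
    """Sample a random stack of primitives in a hypothetical JSON scene-graph format."""
    rng = random.Random(seed)  # seeded RNG so a scene can be regenerated exactly
    objects, relations = [], []
    for i in range(n_objects):
        obj_id = f"o{i}"
        objects.append({
            "id": obj_id,
            "type": rng.choice(PRIMITIVES),
            # Bounding-box size (x, y, z) in arbitrary units, rounded for readability.
            "size": [round(rng.uniform(1.0, 5.0), 1) for _ in range(3)],
        })
        if i > 0:
            # Stack each new object on the one sampled before it.
            relations.append({"type": "on_top_of", "subject": obj_id, "object": f"o{i - 1}"})
    return {"objects": objects, "relations": relations}


if __name__ == "__main__":
    # Hypothetically, each such JSON would be paired with a rendered image as a training example.
    print(json.dumps(random_scene(3, seed=42), indent=2))
```

Seeding the generator makes every sampled scene reproducible, which is convenient when the same scene has to be re-rendered later.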

You can find the dataset used for the fine-tuning here.


Target Pipeline

The approach relies on the deliberately constrained nature of the dataset, which was designed for a hands-on experimental setup in a lab environment. All objects are constructed from a finite set of geometric primitives.

These primitives can be represented as a structured JSON scene graph that describes object types and their spatial relationships.

This JSON scene graph serves as the intermediate representation between the VLM and the geometric solver.
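
Because a VLM can emit malformed or out-of-vocabulary JSON, it can help to check its output before passing it to the solver. The following is a minimal sketch, assuming a hypothetical schema in which each object has an `id` and a `type` drawn from the finite primitive set, and each relation names a `subject` and an `object` by id. `ALLOWED_PRIMITIVES`, `ALLOWED_RELATIONS`, and `validate_scene` are illustrative names, not part of the repository.

```python
import json
from typing import List

# Hypothetical vocabularies; substitute the primitive and relation types of the real dataset.
ALLOWED_PRIMITIVES = {"cuboid", "cylinder", "sphere"}
ALLOWED_RELATIONS = {"on_top_of", "left_of", "in_front_of"}


def validate_scene(raw: str) -> List[str]:
    """Return a list of problems in a JSON scene string; an empty list means it passed."""
    try:
        scene = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(scene, dict):
        return ["top-level JSON value must be an object"]

    errors: List[str] = []
    ids = set()
    for obj in scene.get("objects", []):
        obj_id = obj.get("id")
        # Every object must be one of the finite set of primitives.
        if obj.get("type") not in ALLOWED_PRIMITIVES:
            errors.append(f"unknown primitive {obj.get('type')!r} in {obj_id!r}")
        if obj_id in ids:
            errors.append(f"duplicate id {obj_id!r}")
        ids.add(obj_id)

    for rel in scene.get("relations", []):
        if rel.get("type") not in ALLOWED_RELATIONS:
            errors.append(f"unknown relation type {rel.get('type')!r}")
        # Both ends of a relation must refer to declared objects.
        for end in ("subject", "object"):
            if rel.get(end) not in ids:
                errors.append(f"relation refers to missing object {rel.get(end)!r}")
    return errors
```

For example, `validate_scene('{"objects": [{"id": "a", "type": "cone"}]}')` returns `["unknown primitive 'cone' in 'a'"]`, so the prediction could be rejected or re-requested instead of reaching the solver.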

Target pipeline:

Photo of an object
→ VLM
→ structured JSON scene
→ solver
→ 3D model
→ technical drawing
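
The solver step can be sketched for the simplest case: axis-aligned boxes and a single hypothetical `on_top_of` relation. This toy `solve_stack` function is not the repository's solver in `json_to_3dmodel_pipeline/`; it only illustrates how spatial relations from the JSON scene can be turned into concrete coordinates, assuming each object carries an `id` and a `size` of (x, y, z).

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def solve_stack(objects: List[dict], relations: List[dict]) -> Dict[str, Vec3]:
    """Return the minimum corner (x, y, z) of each object's axis-aligned bounding box.

    Objects with an `on_top_of` relation are centred on their support and rest on
    its top face; unsupported objects stand on the ground plane z = 0, centred at
    the origin in x and y.
    """
    sizes = {o["id"]: o["size"] for o in objects}
    support = {r["subject"]: r["object"] for r in relations if r["type"] == "on_top_of"}
    positions: Dict[str, Vec3] = {}

    def place(obj_id: str, visiting: frozenset) -> Vec3:
        if obj_id in positions:
            return positions[obj_id]
        if obj_id in visiting:
            raise ValueError(f"cyclic on_top_of relation at {obj_id!r}")
        sx, sy, _ = sizes[obj_id]
        if obj_id in support:
            base = support[obj_id]
            # Place the supporting object first (recursively).
            bx, by, bz = place(base, visiting | {obj_id})
            bsx, bsy, bsz = sizes[base]
            cx, cy = bx + bsx / 2, by + bsy / 2  # centre of the support's top face
            z = bz + bsz
        else:
            cx, cy, z = 0.0, 0.0, 0.0
        positions[obj_id] = (cx - sx / 2, cy - sy / 2, z)
        return positions[obj_id]

    for obj_id in sizes:
        place(obj_id, frozenset())
    return positions
```

With a 4 × 4 × 2 box `a` and a 2 × 2 × 1 box `b` on top of it, `a` gets the corner (-2.0, -2.0, 0.0) and `b` gets (-1.0, -1.0, 2.0). A full solver would additionally need other relation types, non-box primitives, and handling of conflicting constraints.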

Please see the notebooks for an example.


Project Structure

tedrasim_vlm/
├── notebooks/              # pipeline demos and fine-tuning experiments
├── data_example/           # example dataset structure
│   ├── raw_data/
│   └── training/
│       ├── real_dataset/
│       └── synthetic_dataset/
│
├── src/tedrasim/
│   ├── json_to_3dmodel_pipeline/   # JSON → 3D model solver
│   ├── synth_dataset_generation/   # scene generation + rendering
│   ├── gui_apps/                   # annotation / evaluation tools
│   └── viz/                        # visualization utilities
│
├── assets/prompts/         # prompt templates for VLM training/inference
├── research/               # experimental notes and references
├── pyproject.toml
├── uv.lock
└── README.md

Fine-tuning

The base model is InternVL_3.5_8B. It was fine-tuned on our custom dataset (TODO: Link to Huggingface) with the notebook notebooks/finetune_internvl_3_5.ipynb, applying LoRA to the language-model (LLM) part of the model.

Fine-tuning was based on the tutorial:
https://github.com/Arseny5/InternVL-3.5-QLoRA-Fine-tune
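
For reference, the standard LoRA formulation freezes a pretrained weight matrix $W_0$ and learns only a low-rank update $BA$. The rank $r$, scaling $\alpha$, and dimensions in the worked example below are illustrative assumptions; the values used in notebooks/finetune_internvl_3_5.ipynb are not stated here. QLoRA, as named in the tutorial, additionally stores the frozen $W_0$ in 4-bit precision; whether this project's notebook quantizes the base model is not stated.

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)

\text{trainable parameters per adapted matrix: } r(d + k) \text{ instead of } dk

d = k = 4096,\; r = 16:\quad 16 \cdot 8192 = 131{,}072 \text{ vs. } 4096^2 = 16{,}777{,}216 \;(\approx 0.78\%)
```

Only $A$ and $B$ receive gradients, which is what makes fine-tuning an 8B-parameter model feasible on modest hardware.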


GUI Applications

src/tedrasim/gui_apps/ contains small applications for experimenting with the pipeline.

They allow:

  • sending requests to different models
  • visualizing predicted JSON scenes
  • interactively inspecting generated 3D models

Installation

The project uses uv for dependency management.

Create a virtual environment:

uv venv
source .venv/bin/activate

Install dependencies:

uv sync

Requirements

  • Python 3.10+
  • PyTorch
  • HuggingFace Transformers

Optional:

  • Blender is required for synthetic dataset generation (scene rendering)
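
A quick way to confirm the requirements above is a small standard-library check like the following. It is a sketch: it assumes PyTorch and Transformers are importable under their standard names `torch` and `transformers`, and that Blender, if installed, is on `PATH` as `blender`. `check_environment` is a hypothetical helper, not part of the project.

```python
import importlib.util
import shutil
import sys
from typing import Dict


def check_environment() -> Dict[str, bool]:
    """Report whether the documented requirements appear to be available."""
    return {
        "python>=3.10": sys.version_info >= (3, 10),
        # find_spec only checks that a package can be found; it does not import it.
        "torch": importlib.util.find_spec("torch") is not None,
        "transformers": importlib.util.find_spec("transformers") is not None,
        # Blender is optional (only needed for rendering synthetic scenes).
        "blender (optional)": shutil.which("blender") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'ok' if ok else 'MISSING':<8}{name}")
```

Run it inside the activated .venv after uv sync; a MISSING entry for torch or transformers usually means the environment was not activated.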

About

VLM-based pipeline that reconstructs structured 3D models from images via JSON scene graphs
