
Differential Binarization model #2095


Open · wants to merge 92 commits into master

Conversation

@mehtamansi29 (Collaborator) commented Feb 12, 2025

@sachinprasadhs sachinprasadhs added the WIP Pull requests which are work in progress and not ready yet for review. label Apr 11, 2025
@sachinprasadhs (Collaborator) left a comment

Took a high-level pass and left some comments.
Also, make all the file names follow the same format as the other files, e.g. db_utils and losses.py.

hertschuh and others added 25 commits July 22, 2025 12:14
The inputs to `generate` are `"prompts"`, not `"text"`.

Fixes keras-team#1685
* routine HF sync

* code reformat
Bumps the python group with 2 updates: torch and torchvision.


Updates `torch` from 2.6.0+cu126 to 2.7.0+cu126

Updates `torchvision` from 0.21.0+cu126 to 0.22.0+cu126

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.7.0+cu126
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python
- dependency-name: torchvision
  dependency-version: 0.22.0+cu126
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: python
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Modify TransformerEncoder masking documentation

* Added space before parenthesis
* Fix Mistral conversion script

This commit addresses several issues in the Mistral checkpoint conversion script:

- Adds `dropout` to the model initialization to match the Hugging Face model.
- Replaces `requests.get` with `hf_hub_download` for more reliable tokenizer downloads (see the sketch after this commit message).
- Adds support for both `tokenizer.model` and `tokenizer.json` to handle different Mistral versions.
- Fixes a `TypeError` in the `save_to_preset` function call.

* address format issues

* adapted to the latest hub style

* address format issues

---------

Co-authored-by: laxmareddyp <laxmareddyp@laxma-n2-highmem-256gbram.us-central1-f.c.gtech-rmi-dev.internal>
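
A minimal sketch of the tokenizer download approach described in the commit message above, assuming the standard huggingface_hub client; the repo id is illustrative, and the fallback mirrors the tokenizer.model / tokenizer.json handling mentioned in the bullets:

```python
from huggingface_hub import hf_hub_download

# Illustrative repo id; the conversion script derives it from the preset.
repo_id = "mistralai/Mistral-7B-v0.1"
try:
    # Older Mistral releases ship a SentencePiece tokenizer.model.
    tokenizer_path = hf_hub_download(repo_id=repo_id, filename="tokenizer.model")
except Exception:
    # Newer releases ship only tokenizer.json.
    tokenizer_path = hf_hub_download(repo_id=repo_id, filename="tokenizer.json")
print(tokenizer_path)
```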
Updates the requirements on [tensorflow-cpu](https://github.com/tensorflow/tensorflow), [tensorflow](https://github.com/tensorflow/tensorflow), [tensorflow-text](https://github.com/tensorflow/text), torch, torchvision and [tensorflow[and-cuda]](https://github.com/tensorflow/tensorflow) to permit the latest version.

Updates `tensorflow-cpu` to 2.19.0
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](tensorflow/tensorflow@v2.18.1...v2.19.0)

Updates `tensorflow` to 2.19.0
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](tensorflow/tensorflow@v2.18.1...v2.19.0)

Updates `tensorflow-text` to 2.19.0
- [Release notes](https://github.com/tensorflow/text/releases)
- [Commits](tensorflow/text@v2.18.0...v2.19.0)

Updates `torch` from 2.7.0+cu126 to 2.7.1+cu126

Updates `torchvision` from 0.22.0+cu126 to 0.22.1+cu126

Updates `tensorflow[and-cuda]` to 2.19.0
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](tensorflow/tensorflow@v2.18.0...v2.19.0)

---
updated-dependencies:
- dependency-name: tensorflow-cpu
  dependency-version: 2.19.0
  dependency-type: direct:production
  dependency-group: python
- dependency-name: tensorflow
  dependency-version: 2.19.0
  dependency-type: direct:production
  dependency-group: python
- dependency-name: tensorflow-text
  dependency-version: 2.19.0
  dependency-type: direct:production
  dependency-group: python
- dependency-name: torch
  dependency-version: 2.7.1+cu126
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python
- dependency-name: torchvision
  dependency-version: 0.22.1+cu126
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: python
- dependency-name: tensorflow[and-cuda]
  dependency-version: 2.19.0
  dependency-type: direct:production
  dependency-group: python
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* init

* update

* bug fixes

* add qwen causal lm test

* fix qwen3 tests
* support flash-attn at torch backend

* fix

* fix

* fix

* fix conflict

* fix conflict

* fix conflict

* fix conflict

* fix conflict

* fix conflict

* format
* init: Add initial project structure and files

* bug: Small bug related to weight loading in the conversion script

* finalizing: Add TIMM preprocessing layer

* incorporate reviews: Consolidate stage configurations and improve API consistency

* bug: Unexpected argument error in JAX with Keras 3.5

* small addition for the D-FINE to come: No changes to the existing HGNetV2

* D-FINE JIT compile: Remove non-essential conditional statement

* refactor: Address reviews and fix some nits
* Register qwen3 presets

* fix format
@sachinprasadhs sachinprasadhs removed the WIP Pull requests which are work in progress and not ready yet for review. label Aug 7, 2025
@sachinprasadhs (Collaborator) left a comment

I've added my review comments. In the issue description, please add the original implementation, the reference paper, a training Colab, and an end-to-end working demo of the trained model, including post-processing.

Comment on lines +39 to +42
if head_kernel_list is None:
    head_kernel_list = [3, 2, 2]
if image_shape is None:
    image_shape = (640, 640, 3)
Collaborator

Why do we need this? If head_kernel_list is common to all the implementations, you can make it a default argument; otherwise it can be part of the config for the specific checkpoint.

Collaborator Author

I added head_kernel_list this way because the Gemini code review suggested it (using mutable default arguments such as lists is a common pitfall in Python and can lead to unexpected behavior).
I have updated this part to use the config, as sketched below; the changes will be visible in the next commit.
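
A minimal sketch of the two options discussed here; `build_head` and its arguments are hypothetical stand-ins for the backbone constructor in this PR:

```python
# Option 1: an immutable tuple default sidesteps the mutable-default pitfall.
def build_head(head_kernel_list=(3, 2, 2), image_shape=(640, 640, 3)):
    # Convert to a list only if downstream code needs to mutate it.
    return {"head_kernel_list": list(head_kernel_list), "image_shape": image_shape}


# Option 2: keep the values in the per-checkpoint config instead.
DEFAULT_CONFIG = {"head_kernel_list": [3, 2, 2], "image_shape": (640, 640, 3)}


def build_head_from_config(config=None):
    # Per-checkpoint values override the defaults.
    return {**DEFAULT_CONFIG, **(config or {})}
```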

)(topdown_p2)
featuremap_p4 = layers.UpSampling2D((4, 4), dtype=dtype)(featuremap_p4)
featuremap_p3 = layers.UpSampling2D((2, 2), dtype=dtype)(featuremap_p3)
featuremap_p2 = layers.UpSampling2D((1, 1), dtype=dtype)(featuremap_p2)
Collaborator

Address this comment: the `UpSampling2D((1, 1))` layer can be removed since it doesn't do anything.
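
A sketch of the suggested simplification, reusing the names from the quoted diff above (so not self-contained on its own):

```python
featuremap_p4 = layers.UpSampling2D((4, 4), dtype=dtype)(featuremap_p4)
featuremap_p3 = layers.UpSampling2D((2, 2), dtype=dtype)(featuremap_p3)
# UpSampling2D((1, 1)) is an identity op, so featuremap_p2 is used as-is.
```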

Comment on lines +37 to +39
run_mixed_precision_check=False,
run_quantization_check=False,
run_data_format_check=False,
Collaborator

Enable these tests
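
A sketch under the assumption that these flags are passed to keras_hub's shared `run_backbone_test` helper; dropping the `run_*_check=False` overrides restores the default (enabled) checks. The class name, fixtures, and output shape below are placeholders:

```python
self.run_backbone_test(
    cls=DiffBinBackbone,           # placeholder backbone class name
    init_kwargs=self.init_kwargs,  # placeholder test fixtures
    input_data=self.input_data,
    expected_output_shape=(2, 160, 160, 3),  # placeholder shape
    # run_mixed_precision_check, run_quantization_check and
    # run_data_format_check default to True, so omitting them
    # enables all three checks.
)
```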

Comment on lines +16 to +19
if backend.backend() == "jax":
pytest.skip(
"JAX backend does not support this test due to NaN issues."
)
Collaborator

Maybe you should investigate and make it work on JAX as well.
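
One way to start that investigation, as a sketch: JAX can raise at the first op that produces a NaN instead of letting it propagate, which usually points at the offending layer or loss term.

```python
import jax
import jax.numpy as jnp

# Raise a FloatingPointError as soon as any op produces a NaN.
jax.config.update("jax_debug_nans", True)


def risky_log(x):
    # log of a negative value yields NaN; with jax_debug_nans enabled,
    # the failing op is reported immediately with a traceback.
    return jnp.log(x)


print(risky_log(jnp.array([1.0, 2.0])))  # fine
# print(risky_log(jnp.array([-1.0])))    # would raise under jax_debug_nans
```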

Comment on lines +21 to +22
The probability map output generated by `predict()` can be translated into
polygon representation using `model.postprocess_to_polygons()`.
Collaborator

If this postprocess_to_polygons is not implemented, you need to update the docstring accordingly.

Collaborator Author

Applied the changes with the relevant docstring; the changes will be visible in the next commit.

Comment on lines +46 to +52
`map_output` now holds an 8x224x224x3 tensor, where the last dimension
corresponds to the model's probability map, threshold map and binary map
outputs. Use `postprocess_to_polygons()` to obtain a polygon
representation:
```python
detector.postprocess_to_polygons(map_output[...,0])
```
Collaborator

same here.

import keras


def Polygon(coords):
Collaborator

Follow snake_case naming.
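
For example, a hypothetical rename sketch (the body here is a placeholder; the real implementation is in the PR):

```python
import keras


def polygon_from_coords(coords):
    """Return the polygon coordinates as a tensor (placeholder body)."""
    return keras.ops.convert_to_tensor(coords)
```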

Comment on lines +34 to +35
image_size=(640, 640),
annotation_size=(640, 360),
Collaborator

Where are we using these two arguments?

Collaborator Author

I'll remove the annotation_size argument, which I missed. Earlier I was using the diffbin_utils functions with the image-text data preprocessor. The changes will be visible in the next commit.



@keras_hub_export("keras_hub.models.ImageTextDetectorPreprocessor")
class ImageTextDetectorPreprocessor(Preprocessor):
Collaborator

I don't see any difference between ImageClassifierPreprocessor and this. If it is used only for preprocessing images, maybe you can just use ImageClassifierPreprocessor.

@mehtamansi29 (Collaborator Author) commented Aug 11, 2025

This is because earlier I was using the diffbin_utils functions (some of the cv2- and shapely-based operations were recreated with keras.ops), but that did not give results as accurate as the cv2 and shapely versions. So I used this ImageClassifierPreprocessor.

Collaborator

Then we can subclass ImageClassifierPreprocessor, like the other vision models do.
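
A minimal sketch of that pattern, following how other keras_hub vision models register their preprocessors; the DiffBin module paths and class names below are assumptions, not the ones in this PR:

```python
from keras_hub.src.api_export import keras_hub_export
# These two classes are assumed stand-ins for the ones added in this PR.
from keras_hub.src.models.diffbin.diffbin_backbone import DiffBinBackbone
from keras_hub.src.models.diffbin.diffbin_image_converter import (
    DiffBinImageConverter,
)
from keras_hub.src.models.image_classifier_preprocessor import (
    ImageClassifierPreprocessor,
)


@keras_hub_export("keras_hub.models.DiffBinImageTextDetectorPreprocessor")
class DiffBinImageTextDetectorPreprocessor(ImageClassifierPreprocessor):
    # Wire the preprocessor to its backbone and image converter,
    # mirroring the other vision models in keras_hub.
    backbone_cls = DiffBinBackbone
    image_converter_cls = DiffBinImageConverter
```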

@sachinprasadhs sachinprasadhs added the kokoro:force-run Runs Tests on GPU label Aug 7, 2025
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Aug 7, 2025