- UC Berkeley
- Berkeley, California
- https://tonylian.com/
- in/longlian
- @LongTonyLian
Stars
Official inference repo for FLUX.1 models
[ECCV 2024] ControlCap: Controllable Region-level Captioning
[CVPR 2024] The repository provides code for running inference and training for "Segment and Caption Anything" (SCA), links for downloading the trained model checkpoints, and example notebooks / gra…
Official repository for the paper "MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning" (https://arxiv.org/abs/2406.17770).
A minimalist, open-source online pastebin where the server has zero knowledge of pasted data. Data is encrypted and decrypted in the browser using 256-bit AES.
SGLang is a fast serving framework for large language models and vision language models.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World
When do we not need larger vision models?
[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
CLAIR: A (surprisingly) simple semantic text metric with large language models.
We release the DaTaSeg Objects365 Instance Segmentation Dataset introduced in the DaTaSeg paper, which can be used as an evaluation benchmark for weakly- or semi-supervised segmentation.
GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)
The official homepage of the COCO-Stuff dataset.
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
This repo contains documentation and code needed to use PACO dataset: data loaders and training and evaluation scripts for objects, parts, and attributes prediction models, query evaluation scripts…
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
VILA - a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)
A guidance language for controlling large language models.
Improved Implementation for Training GLIGEN: Open-Set Grounded Text-to-Image Generation
[CVPR 2024] Code release for "Unsupervised Universal Image Segmentation"