I am a PhD candidate at the University of Cambridge, working on unifying data understanding and data generation through latent-space modeling with diffusion-based generative and language models (DLMs). My work develops principled frameworks that bridge representation learning and controllable generation across heterogeneous data regimes: discrete natural language, mixed-type tabular data, structured relational/graph data, and emerging multimodal combinations. A central theme is designing architectures and training objectives that faithfully capture semantics, uncertainty, structure, and cross-domain correspondences while remaining computationally and statistically efficient.
- Latent Space Unification: Joint embedding/decoding frameworks for heterogeneous (text, tabular, graph, multimodal) data
- Diffusion + Language Model Hybrids (DLMs): Integrating discrete token modeling with DiT-based methods
- Mixed-Type & Tabular Data Generation: Generative handling of continuous, categorical, ordinal, and sparse relational fields
- Multimodal Alignment: Cross-domain latent factorization and conditional synthesis
- Private Graph Generation: Privacy-preserving graph synthesis via graph distillation
- Privacy & Decentralization: Federated / decentralized training
Contact: Academic Homepage