duoan/README.md

👋 Hi, I'm Duo An (Victor)

Senior Machine Learning Engineer @ Amazon AGI
Scaling multimodal foundation models — optimizing how they learn, generalize, and align through data and systems co-design.


🧭 About

I work at the intersection of machine learning and distributed systems,
designing large-scale learning pipelines and multimodal data systems that improve how foundation models learn from vast, diverse signals.

My focus areas:

  • 🧠 Training dynamics & optimization — improving convergence, stability, and efficiency of large-scale multimodal models
  • 🧩 Learning-centric systems — integrating data, architecture, and feedback to enhance representation learning and model alignment
  • ⚙️ Scalable orchestration — leveraging Ray, Spark, and Kubernetes to parallelize multimodal workloads across thousands of GPUs
  • 🔍 Evaluation & feedback loops — automating model-driven data refinement and continual quality signals for alignment and adaptation

My work centers on how models learn, not just how they’re trained.


🧰 Core Stack

Machine Learning

PyTorch, Transformers, Ray, FAISS, DeepSpeed

Systems & Infra

AWS, Spark, Kubernetes, CDK, Docker

Languages

Python, Scala, Rust, C++


⚖️ Principles

1. Models and systems co-evolve.
The best architectures emerge when data, compute, and learning dynamics are designed together.

2. Scale reveals behavior.
Many learning problems only appear — and can only be solved — at massive scale.

3. Data is part of the model.
Every batch defines what the model becomes.


📊 Snapshot

duoan's GitHub stats · Top Langs


🌐 Connect

LinkedIn · Gmail


“At scale, learning is a systems problem — and every system is a hypothesis about how intelligence forms.”
— Duo An

Popular repositories

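  1. mini-rpc Public

    A lightweight RPC framework built with Spring + Netty + Protostuff + ZooKeeper: Spring provides dependency injection and parameter configuration, Netty handles NIO-based data transport, Protostuff serializes objects, and ZooKeeper handles service registration and discovery. With this framework, services can be deployed to any node in a distributed environment, and clients invoke the server-side implementation through remote interfaces, letting services…

    Java · 235 stars · 143 forks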

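  2. OpenKettleWebUI Public archive

    A Kettle-based web platform for scheduling and controlling data processing. It supports both file repositories and database repositories, controls Kettle data transformations through the web platform, and can be integrated into existing systems as middleware.

    Java · 152 stars · 96 forks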

  3. ijcai18-mama-ads-competition Public

    Preliminary-round solution for the IJCAI-18 Alimama search ad conversion prediction competition.

    Jupyter Notebook · 74 stars · 22 forks

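  4. codes-scratch-crawler Public

    Reading notes on the book 《自己动手写网络爬虫》 (roughly, "Write Your Own Web Crawler"), with code I typed out myself. Mainly covers basic web crawler implementation, page deduplication algorithms, page fingerprinting algorithms, and text mining.

    Java · 47 stars · 22 forks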

  5. codes-scratch-zookeeper-netty Public

    A file synchronization service for cluster nodes, built with ZooKeeper (zk) + Netty.

    Java · 32 stars · 26 forks

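  6. codes-scratch-akka Public

    Notes from learning Akka, using both Maven and sbt builds, with implementations in both Java and Scala. An introduction to Akka that gives a clear understanding of the Akka workflow.

    Java · 13 stars · 10 forks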