Zhongbin IceInPot Guo
My research focuses on developing interpretable multimodal models with real-world applications. I actively explore both theoretical frameworks and empirical findings, with specific research interests in:
- Unified Multimodal Models: Understanding-Generation synergy and cross-modality reasoning.
- Spatio-Temporal Perception & Actions: spatio-temporal perception and vision-language-action (VLA) models.
- Interpretability & Trustworthiness: making multimodal models interpretable and trustworthy in real-world applications.
I am actively seeking Ph.D. opportunities and visiting scholar positions. If you are interested in my research or potential collaboration, please feel free to reach out!
Email /
Scholar /
Github /
RedNote
|
I'm a B.S. student at Beijing Institute of Technology in Beijing, China, advised by Ping Jian. During my undergraduate studies, I interned at 4Paradigm and the Meituan M17 Team, and I am now interning with the ByteDance Commercial AI Team, working on (unified) multimodal foundation model pretraining and related research.
I'm always glad to collaborate with graduate and undergraduate students. If you would like to work with me or are interested in my research, please drop me an email.
|
News
|
Apr '26
|
Three papers, including SiT-Bench, have been accepted to ACL 2026 (1 Main Conference paper and 2 Findings papers). More details coming soon!
|
|
Mar '26
|
We drop LongCat-Next, a unified multimodal model that natively lexicalizes vision and audio as discrete tokens under a single autoregressive objective and achieves performance competitive with both specialized understanding models and specialized generation models. I contributed to this work during my internship at the M17 Team.
|
|
Feb '26
|
TACS has been accepted by CVPR 2026.
|
|
Jan '26
|
TAMMs has been accepted by ICLR 2026, thanks to all co-authors!
|
|
Jan '26
|
We drop SiT-Bench, a benchmark for evaluating pure-text spatial reasoning abilities.
|
|
Nov '25
|
We present GEODE, a lightweight architecture that decouples spatial reasoning from numerical generation for enhanced spatial intelligence.
|
|
Sep '25
|
We present TAMMs, a unified framework for Temporal Change Description and Future Satellite Image Forecasting.
|
|
Publications and Preprints
|
|
* denotes equal contribution and project lead; † denotes corresponding author.
|
|
|
How Do LLMs and VLMs Understand Viewpoint Rotation Without Vision? An Interpretability Study
Zhen Yang, Ping Jian†, Zhongbin Guo, Zuming Zhang, Chengzhi Li, Yonghong Deng, Xinyue Zhang, Wenpeng Lu
ACL 2026 (Main Conference)
Paper /
Code
|
|
|
LongCat-Next: Lexicalizing Modalities as Discrete Tokens
Meituan LongCat Team (Zhongbin Guo participated during his internship at the M17 Team)
arXiv, 2026 | Technical Report
Paper /
Code /
Model
|
|
|
Understanding Temporal Logic Consistency in Video-Language Models through Cross-Modal Attention Discriminability
Chengzhi Li, Heyan Huang, Ping Jian†, Zhen Yang, Yaning Tian, Zhongbin Guo
CVPR 2026
|
|
|
Can LLMs See Without Pixels? Benchmarking Spatial Intelligence from Textual Descriptions
Zhongbin Guo*, Zhen Yang*, Yushan Li, Xinyue Zhang, Wenyu Gao, Jiacheng Wang, Chengzhi Li, Xiangrui Liu, Ping Jian†
ACL 2026 (Findings)
Paper /
Code /
Benchmark (Coming Soon)
|
|
|
Beyond Flatlands: Unlocking Spatial Intelligence by Decoupling 3D Reasoning from Numerical Regression
Zhongbin Guo, Jiahe Liu, Yushan Li, Wenyu Gao, Zhen Yang, Chengzhi Li, Xinyue Zhang, Ping Jian†
arXiv, 2025
Paper /
Code (Coming Soon)
|
|
|
TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting
Zhongbin Guo, Yuhao Wang, Ping Jian†, Chengzhi Li, Xinyue Chen, Zhen Yang, Ertai E
ICLR 2026
Paper /
Datasets
|
Honors and Awards
|
- Outstanding Student of Beijing Institute of Technology, 2025
- Outstanding Student Leader of Beijing Institute of Technology, 2024
- Second Prize Scholarship of Beijing Institute of Technology (five times)
- Special Prize, "Challenge Cup" Competition, Beijing, 2025
|
Misc
If you are curious about my ID: "IceInPot" comes from a pun on my Chinese name, Zhongbin Guo. In Chinese, "Ice in Pot" (锅中冰) is a perfect homophone for my name!
I serve as the President and Guzheng section leader of the BIT Folk Orchestra. My all-time favorite piece is Battling the Typhoon (战台风). You're more than welcome to watch our performance of Qian Li (千里) at the 2025 Spring Festival Reception of the BRICS Partnership on New Industrial Revolution Innovation Center.
Outside of research, I have a wide range of interests including badminton, basketball, skiing, and traveling.
I am also an avid fan of many sports and competitive games, following everything from F1 (cheering for both Max and Lewis 🏎️), the NBA (Lakers fan 💜💛), and the Premier League (dreaming of visiting Anfield one day ⚽) to tennis (supporting both Djokovic and Alcaraz 🎾) and CS2 (supporting m0NESY 🎮).
|
© Zhongbin Guo. All rights reserved for content and custom design.
Base template by Jon Barron.