Zhongbin IceInPot Guo
My research focuses on developing interpretable multimodal models with real-world applications. I actively explore both theoretical frameworks and empirical findings, with specific research interests in:
- Unified Multimodal Models: Understanding-Generation synergy and cross-modality reasoning.
- Spatio-Temporal Perception & Actions: Spatio-temporal perception and Vision-Language-Action models (VLAs).
- Interpretability & Trustworthiness: Interpretability and trustworthiness of multimodal models in real-world applications.
I am actively seeking Ph.D. opportunities and visiting scholar positions. If you are interested in my research or potential collaboration, please feel free to reach out!
Email / Scholar / GitHub
I'm an M.S. student at Beijing Institute of Technology (BIT) in Beijing, China, advised by Ping Jian. I also received my B.S. degree in Artificial Intelligence from BIT. During my undergraduate studies, I interned at 4Paradigm and the Meituan M17 Team, and I am now with the ByteDance Commercial AI Team, working on pretraining (unified) multimodal foundation models and related research.
I'm always glad to collaborate with graduate and undergraduate students. Please drop me an email if you'd like to work with me or are interested in my research.
News
Jan' 26
TAMMs has been accepted to ICLR 2026. Thanks to all co-authors!
Jan' 7
We release SiT-Bench for evaluating pure-text spatial reasoning abilities.
Nov' 25
We present GEODE, a lightweight architecture that decouples spatial reasoning from numerical generation for enhanced spatial intelligence.
Sep' 25
We present TAMMs, a unified framework for Temporal Change Description and Future Satellite Image Forecasting.
Publications and Preprints
* denotes equal contribution and project lead; † indicates corresponding author.
Can LLMs See Without Pixels? Benchmarking Spatial Intelligence from Textual Descriptions
Zhongbin Guo*, Zhen Yang*, Yushan Li, Xinyue Zhang, Wenyu Gao, Jiacheng Wang, Chengzhi Li, Xiangrui Liu, Ping Jian†
arXiv, 2026
Paper / Code / Benchmark (Coming Soon)
Beyond Flatlands: Unlocking Spatial Intelligence by Decoupling 3D Reasoning from Numerical Regression
Zhongbin Guo, Jiahe Liu, Yushan Li, Wenyu Gao, Zhen Yang, Chengzhi Li, Xinyue Zhang, Ping Jian†
arXiv, 2025
Paper / Code (Coming Soon)
TAMMs: Temporal-Aware Multimodal Model for Satellite Image Change Understanding and Forecasting
Zhongbin Guo, Yuhao Wang, Ping Jian†, Chengzhi Li, Xinyue Chen, Zhen Yang, Ertai E
ICLR 2026
Paper / Datasets
Honors and Awards
- Outstanding Student of Beijing Institute of Technology, 2025
- Outstanding Student Leader of Beijing Institute of Technology, 2024
- Second Prize Scholarship of Beijing Institute of Technology (four times)
- Special Prize of the "Challenge Cup" Competition, Beijing, 2025
Misc
If you are curious about my ID, "iceinpot" comes from a pun on my Chinese name, Zhongbin Guo: in Chinese, "Ice in Pot" (锅中冰) is a perfect homophone for my name!
I serve as the President and Guzheng section leader of the BIT Folk Orchestra. My all-time favorite piece is Battling the Typhoon (战台风). You're more than welcome to watch the video of our performance of Qian Li (千里) at the 2025 Spring Festival Reception of the BRICS Partnership on New Industrial Revolution Innovation Center.
Outside of research, I have a wide range of interests including badminton, basketball, skiing, and traveling.
I am also an avid fan of various sports and competitive games, following everything from F1 (cheering for both Max and Lewis 🏎️), the NBA (Lakers fan 💜💛), and the Premier League (dreaming of visiting Anfield one day ⚽) to tennis (supporting both Djokovic and Alcaraz 🎾) and CS2 (supporting m0NESY 🎮).
© Zhongbin Guo. All rights reserved for content and custom design.
Base template by Jon Barron.