For the most up-to-date list of my papers, click here.
For resources and datasets, click here.
LLM for Forecasting:
[2025] FOReCAst: The Future Outcome Reasoning and Confidence Assessment Benchmark. Zhangdie Yuan, Zifeng Ding, Andreas Vlachos. [Paper] [Resource]
[2024] PRobELM: Plausibility Ranking Evaluation for Language Models. Zhangdie Yuan, Eric Chamoun∗, Rami Aly∗, Chenxi Whitehouse∗, Andreas Vlachos. [Paper] [Resource]
LLM for Factuality:
[2025] Capturing Symmetry and Antisymmetry in Language Models through Symmetry-Aware Training Objectives. Zhangdie Yuan, Andreas Vlachos. [Paper] [Resource]
[2023] Zero-Shot Fact-Checking with Semantic Triples and Knowledge Graphs. Zhangdie Yuan, Andreas Vlachos. [Paper] [Resource]
[2022] Can Pretrained Language Models (Yet) Reason Deductively? Zhangdie Yuan∗, Songbo Hu∗, Ivan Vulić, Anna Korhonen, Zaiqiao Meng. [Paper] [Resource]
[2022] Varifocal Question Generation for Fact-Checking. Nedjma Djouhra Ousidhoum∗, Zhangdie Yuan∗, Andreas Vlachos. [Paper] [Resource]
LLM for Healthcare:
[2025] Large Language Models Are Almost Good Medical Coders: Limitations, Advances, and the Role of Verification. Zhangdie Yuan∗, Han-Chin Shing∗, Mitch Strong, Chaitanya Shivade. [Paper] [Resource]
LLM for Ontology:
[2024] Language Models as Hierarchy Encoders. Yuan He, Zhangdie Yuan, Jiaoyan Chen, Ian Horrocks. [Paper] [Resource]
Dialogue Systems:
[2024] DiaLight: Lightweight Multilingual Development and Evaluation of Task-Oriented Dialogue Systems with Large Language Models. Songbo Hu, Xiaobin Wang, Zhangdie Yuan, Anna Korhonen, Ivan Vulić. [Paper] [Resource]
[2023] A Systematic Study of Performance Disparities in Multilingual Task-Oriented Dialogue Systems. Songbo Hu, Han Zhou, Zhangdie Yuan, Milan Gritta, Guchun Zhang, Ignacio Iacobacci, Anna Korhonen, Ivan Vulić. [Paper] [Resource]
∗ Equal contribution