Yuhao Zhang

Hey there, welcome!

I am currently a scientist on the founding team of Samaya AI. We are on a journey to improve knowledge discovery by harnessing the power of large language models.

Before Samaya, I was a scientist at Amazon AWS AI where I worked on core AWS services relevant to enterprise search. I obtained my PhD degree from Stanford University, where I was jointly advised by Prof. Chris Manning in the Stanford NLP Group and Prof. Curtis Langlotz in the Stanford AIMI Center. My PhD work focused on natural language processing and its applications in medicine.

Before that, I obtained an M.S. degree in the Computer Science Department at Stanford University, and a bachelor’s degree from the Department of Electronic Engineering at Tsinghua University, China.

research interest

I care about NLP systems and their impact in real-world applications. My work has covered the following areas:

retrieval and retrieval-augmented generation;
information extraction;
summarization;
multimodal learning;
syntactic analysis and open-source NLP toolkits (I am a co-author of the widely used Stanza NLP library).

contact

You can reach me now at {first-name} ~at~ cs.stanford.edu. You can also find my various social accounts at the bottom of this page.

selected publications

For a complete list, see the publications page, or my Google Scholar page.

(*=equal contribution)

arXiv

Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models

Orion Weller, Benjamin Van Durme, Dawn Lawrie, and 3 more authors

arXiv preprint arXiv:2409.11136, 2024

@article{weller2024promptriever,
  title = {Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models},
  author = {Weller, Orion and Van Durme, Benjamin and Lawrie, Dawn and Paranjape, Ashwin and Zhang, Yuhao and Hessel, Jack},
  journal = {arXiv preprint arXiv:2409.11136},
  year = {2024},
}

EMNLP

Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models

Zhengxuan Wu, Yuhao Zhang, Peng Qi, and 6 more authors

In EMNLP, 2024

@inproceedings{wu2024dancing,
  title = {Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models},
  author = {Wu, Zhengxuan and Zhang, Yuhao and Qi, Peng and Xu, Yumo and Han, Rujun and Zhang, Yian and Chen, Jifan and Min, Bonan and Huang, Zhiheng},
  booktitle = {EMNLP},
  year = {2024},
}

EMNLP

RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering

Rujun Han, Yuhao Zhang, Peng Qi, and 6 more authors

In EMNLP, 2024

@inproceedings{han2024rag,
  title = {RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering},
  author = {Han, Rujun and Zhang, Yuhao and Qi, Peng and Xu, Yumo and Wang, Jenyuan and Liu, Lan and Wang, William Yang and Min, Bonan and Castelli, Vittorio},
  booktitle = {EMNLP},
  year = {2024},
}

ACL Findings

RobustQA: Benchmarking the Robustness of Domain Adaptation for Open-domain Question Answering

Rujun Han, Peng Qi, Yuhao Zhang, and 6 more authors

In Findings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2023

@inproceedings{chen2023improving,
  title = {RobustQA: Benchmarking the Robustness of Domain Adaptation for Open-domain Question Answering},
  author = {Han, Rujun and Qi, Peng and Zhang, Yuhao and Liu, Lan and Burger, Juliette and Wang, William and Huang, Zhiheng and Xiang, Bing and Roth, Dan},
  booktitle = {Findings of the Annual Meeting of the Association for Computational Linguistics (ACL)},
  year = {2023},
}

MLHC

Contrastive Learning of Medical Visual Representations from Paired Images and Text

Yuhao Zhang, Hang Jiang, Yasuhide Miura, and 2 more authors

In Proceedings of the 7th Machine Learning for Healthcare Conference, 2022

@inproceedings{zhang2022contrastive,
  title = {Contrastive Learning of Medical Visual Representations from Paired Images and Text},
  author = {Zhang, Yuhao and Jiang, Hang and Miura, Yasuhide and Manning, Christopher D and Langlotz, Curtis P},
  booktitle = {Proceedings of the 7th Machine Learning for Healthcare Conference},
  pages = {1--24},
  volume = {182},
  year = {2022},
  series = {Proceedings of Machine Learning Research},
  publisher = {PMLR},
  dataset = {https://github.com/yuhaozhang/convirt},
}

Thesis

Deep Understanding and Generation of Medical Text and Beyond

Yuhao Zhang

Stanford University PhD Thesis, 2021

@phdthesis{zhang2021deep,
  title = {Deep Understanding and Generation of Medical Text and Beyond},
  author = {Zhang, Yuhao},
  school = {Stanford University},
  year = {2021},
}

JAMIA

Biomedical and Clinical English Model Packages for the Stanza Python NLP Library

Yuhao Zhang, Yuhui Zhang, Peng Qi, and 2 more authors

Journal of the American Medical Informatics Association, 2021

@article{zhang2021biomedical,
  title = {Biomedical and Clinical English Model Packages for the Stanza Python NLP Library},
  author = {Zhang, Yuhao and Zhang, Yuhui and Qi, Peng and Manning, Christopher D and Langlotz, Curtis P.},
  journal = {Journal of the American Medical Informatics Association},
  volume = {28},
  number = {9},
  pages = {1892--1899},
  year = {2021},
  publisher = {Oxford University Press},
}

ACL

Stanza: A Python Natural Language Processing Toolkit for Many Human Languages

Peng Qi*, Yuhao Zhang*, Yuhui Zhang, and 2 more authors

In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL): System Demonstrations, 2020

@inproceedings{qi2020stanza,
  title = {Stanza: A Python Natural Language Processing Toolkit for Many Human Languages},
  author = {Qi*, Peng and Zhang*, Yuhao and Zhang, Yuhui and Bolton, Jason and Manning, Christopher D},
  booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL): System Demonstrations},
  year = {2020},
}

EMNLP-CoNLL

Universal Dependency Parsing from Scratch

Peng Qi*, Timothy Dozat*, Yuhao Zhang*, and 1 more author

In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, 2018

@inproceedings{qi2018universal,
  title = {Universal Dependency Parsing from Scratch},
  author = {Qi*, Peng and Dozat*, Timothy and Zhang*, Yuhao and Manning, Christopher D},
  booktitle = {Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies},
  year = {2018},
}