Haoyue Dai    

Hi! I am a PhD student at CMU Philosophy, where I am fortunate to be advised by Prof. Kun Zhang. I am part of the CMU-CLeaR (Causal Learning and Reasoning) group.

My research interests center on causality. On the causal discovery side, I care about identifiability with latent variables, under relaxed assumptions, and with heterogeneity. Specifically, I am curious about what we can really get from just observational data. On the machine learning side, I aim to develop methods for causal representation learning and causality-inspired explainability. On the application side, I strive to solve real-world problems from domains including physics, biology, and social science. My ultimate goal is to enable computers to reason about the world.

Prior to CMU, I completed my undergraduate studies in the IEEE Honor Class and the Zhiyuan Honor Program at Shanghai Jiao Tong University (2017-2021).

hyda [AT] cmu.edu  /  Google Scholar  /  GitHub

Gene Regulatory Network Inference in the Presence of Dropouts: a Causal View
Haoyue Dai, Ignavier Ng, Gongxu Luo, Peter Spirtes, Petar Stojanov, Kun Zhang
ICLR 2024, oral (AR: 1.2%). In Proceedings of the 12th Annual International Conference on Learning Representations.
arxiv / code

The first approach to handle dropouts fully nonparametrically: the conditional independence (CI) relations in the data with dropouts, after deleting the samples with zero values for the conditioned variables, are identical to the CI relations in the original data.
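
The deletion idea can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not taken from the paper: linear Gaussian data on a chain X -> Z -> Y, dropout that zeroes each entry completely at random at a fixed 30% rate, and partial correlation as a stand-in CI test (the paper's result is nonparametric). The helper names `partial_corr` and `apply_dropout` are hypothetical.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z out of both."""
    design = np.column_stack([np.ones_like(z), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

def apply_dropout(values, rate, rng):
    """Set each entry to zero independently with probability `rate` (assumed model)."""
    keep = rng.random(values.shape[0]) >= rate
    return values * keep

rng = np.random.default_rng(0)
n = 200_000

# Ground-truth chain X -> Z -> Y, so X is independent of Y given Z.
x = rng.normal(size=n)
z = x + 0.5 * rng.normal(size=n)
y = z + 0.5 * rng.normal(size=n)

# Observed data: every variable is independently zeroed out 30% of the time.
x_obs = apply_dropout(x, 0.3, rng)
y_obs = apply_dropout(y, 0.3, rng)
z_obs = apply_dropout(z, 0.3, rng)

# Naive test on the contaminated data: the zeros make z_obs a poor proxy
# for z, so a spurious dependence between x_obs and y_obs remains.
pc_naive = partial_corr(x_obs, y_obs, z_obs)

# Deletion: drop samples whose conditioned variable is zero,
# while keeping the zeros in x_obs and y_obs.
mask = z_obs != 0
pc_deleted = partial_corr(x_obs[mask], y_obs[mask], z_obs[mask])

print(f"naive:   {pc_naive:.3f}")
print(f"deleted: {pc_deleted:.3f}")
```

In this toy setting the naive partial correlation stays clearly away from zero, while the one computed after deletion is close to zero, matching the independence of X and Y given Z in the dropout-free data.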

Local Causal Discovery with Linear non-Gaussian Cyclic Models
Haoyue Dai*, Ignavier Ng*, Yujia Zheng, Zhengqing Gao, Kun Zhang
AISTATS 2024. In Proceedings of the 27th International Conference on Artificial Intelligence and Statistics.
arxiv / code

The first to tackle local causal discovery in cyclic models. Using independent subspace analysis, all the local causal structures and coefficients in the equivalence class are identified (intersecting cycles are allowed). A regression variant is given for acyclic cases.

Independence Testing-Based Approach to Causal Discovery under Measurement Error and Linear Non-Gaussian Models
Haoyue Dai, Peter Spirtes, Kun Zhang
NeurIPS 2022. In Proceedings of the 36th Annual Conference on Neural Information Processing Systems.
arxiv / demo / slides (gifs 1 2 3) / poster / code

Transformed Independent Noise (TIN) condition: use linear transformations of variables to pursue independence! It entails graphical criteria for causal discovery with latent variables, and specifically, in the presence of measurement error.

ML4C: Seeing Causality Through Latent Vicinity
Haoyue Dai, Rui Ding, Shi Han, Dongmei Zhang
SDM 2023. In Proceedings of the 2023 SIAM International Conference on Data Mining.
arxiv / code

The first supervised causal discovery approach on discrete observational data. Identifiability is guaranteed, and empirically, it remarkably outperforms other state-of-the-art algorithms in terms of accuracy, reliability, robustness, and tolerance.

ML4S: Learning Causal Skeleton from Vicinal Graphs
Pingchuan Ma, Rui Ding, Haoyue Dai, Yuanyuan Jiang, Shuai Wang, Shi Han, Dongmei Zhang
KDD 2022. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
paper / code

Skeleton (undirected causal graph) learning via supervision. Vicinal graphs are proposed to address the domain shift.

Cancer Immunotherapy Grand Challenge, Eric and Wendy Schmidt Center at the Broad Institute, 2023
Haoyue Dai, Petar Stojanov, Gongxu Luo, Ignavier Ng, Yujia Zheng, Xinshuai Dong, Yewen Fan, Biwei Huang, Kun Zhang
Ranked 9/972 in Challenge 2: proposing novel gene knockouts to maximize T cells' cancer-fighting capabilities.

Using Perturb-seq data from 67 gene knockouts, we discover the causal relationships among different genes and cell states, and accordingly propose novel gene knockouts (among the remaining ~20,000 genes) to shift as many T cells as possible into cancer-fighting states. The proposed knockouts are then validated through real-lab experiments.

NeurIPS 2022 CausalML Challenge: Causal Insights for Learning Paths in Education
Haoyue Dai*, Ignavier Ng*, Xinshuai Dong*, Yujia Zheng, Biwei Huang, Kun Zhang
Ranked 1st in the competition (1st, 1st, 1st, and 2nd in the four tasks, respectively)!
report / slides / code

From students' learning history on an online learning platform (time series with, e.g., changing mechanisms and missing data), we discover the causal relationships among knowledge constructs and estimate the conditional average treatment effects.

Services & Awards
  • Conference Reviewer: ICML (2022-), NeurIPS (2022-), UAI (2023-), ICLR (2023-), AISTATS (2023-)
  • Journal Reviewer: TNNLS (2021), JMLR (2023), JCGS (2023)
  • Shanghai Jiao Tong University Outstanding Graduate (2021)
  • Stars of Tomorrow (Excellent Intern) at Microsoft Research Asia (2021)
  • Zhiyuan College Honor Scholarship (2018-2021)
  • SJTU Academic Excellence Scholarship (2018-2021)
  • Chenhao Scholarship (2019)
  • Arawana Scholarship (2018)
