Causal AI Book
Causal AI is Robert Osazuwa Ness' book on causality. This page contains links to tutorials, notebooks, references and errata.
Chapter 1: Introduction
Book recommendations
This book takes an opinionated approach to causality that focuses on graphs, probabilistic machine learning, Bayesian decision-making, and deep learning tools such as PyTorch.
For books with alternative perspectives that focus on econometrics, social science, and practical data science themes, check out:
 Causal Inference: The Mixtape
 Causal Inference and Discovery in Python
 Causal Inference in Statistics: A Primer
Key references in the chapter
 D'Amour, A., Heller, K., Moldovan, D., Adlam, B., Alipanahi, B., Beutel, A., Chen, C., Deaton, J., Eisenstein, J., Hoffman, M.D. and Hormozdiari, F., 2020. Underspecification presents challenges for credibility in modern machine learning. arXiv preprint arXiv:2011.03395.
Chapter 2: Primer on probability modeling
Our course on probabilistic machine learning covers the Bayesian and probabilistic inference material in this chapter in more detail.
Book recommendations
 Murphy, K.P., 2022. Probabilistic machine learning: an introduction. MIT Press.
 Hsu, H.P., 1997. Schaum's outline of theory and problems of probability, random variables, and random processes. McGraw-Hill.
Chapter 3: Building a causal graphical model
 Chapter 3 notebooks
 See additional code and causal modeling ideas in the projects directory
Causal abstraction
 Beckers, S. and Halpern, J.Y., 2019, July. Abstracting causal models. In Proceedings of the AAAI conference on artificial intelligence (Vol. 33, No. 01, pp. 2678-2685).
 Beckers, S., Eberhardt, F. and Halpern, J.Y., 2020, August. Approximate causal abstractions. In Uncertainty in artificial intelligence (pp. 606-615). PMLR.
 Rischel, E.F. and Weichwald, S., 2021, December. Compositional abstraction error and a category of causal models. In Uncertainty in Artificial Intelligence (pp. 1013-1023). PMLR.
Independence of mechanism
 Schölkopf, B., Janzing, D., Peters, J., Sgouritsa, E., Zhang, K. and Mooij, J., 2012. On causal and anticausal learning. arXiv preprint arXiv:1206.6471.
 Rojas-Carulla, M., Schölkopf, B., Turner, R. and Peters, J., 2018. Invariant models for causal transfer learning. Journal of Machine Learning Research, 19(36), pp. 1-34.
 Besserve, M., Shajarisales, N., Schölkopf, B. and Janzing, D., 2018, March. Group invariance principles for causal generative models. In International Conference on Artificial Intelligence and Statistics (pp. 557-565). PMLR.
 Parascandolo, G., Kilbertus, N., Rojas-Carulla, M. and Schölkopf, B., 2018, July. Learning independent causal mechanisms. In International Conference on Machine Learning (pp. 4036-4044). PMLR.
Causal data fusion and transfer learning
 Bareinboim, E. and Pearl, J., 2016. Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences, 113(27), pp. 7345-7352.
 Rojas-Carulla, M., Schölkopf, B., Turner, R. and Peters, J., 2018. Invariant models for causal transfer learning. Journal of Machine Learning Research, 19(36), pp. 1-34.
 Magliacane, S., van Ommen, T., Claassen, T., Bongers, S., Versteeg, P. and Mooij, J.M., 2017. Causal transfer learning. arXiv preprint arXiv:1707.06422.
Causally invariant prediction
 Arjovsky, M., Bottou, L., Gulrajani, I. and Lopez-Paz, D., 2019. Invariant risk minimization. arXiv preprint arXiv:1907.02893.
 Heinze-Deml, C., Peters, J. and Meinshausen, N., 2018. Invariant causal prediction for nonlinear models. Journal of Causal Inference, 6(2), p. 20170016.
 Rosenfeld, E., Ravikumar, P. and Risteski, A., 2020. The risks of invariant risk minimization. arXiv preprint arXiv:2010.05761.
 Lu, C., Wu, Y., Hernández-Lobato, J.M. and Schölkopf, B., 2021. Nonlinear invariant risk minimization: A causal approach. arXiv preprint arXiv:2102.12353.
Chapter 4: Testing your causal graph
 Chapter 4 notebooks
 NetworkX's d_separation algorithm.
 pgmpy's get_independencies method enumerates all d-separations in a DAG.
 DAGitty (dagitty.net) provides an online application for building DAGs and evaluating d-separation, as well as an R package.
 The dsep function in the bnlearn R package evaluates simple d-separation statements as true or false.
 CausalFusion is an online app like DAGitty but with a valuable set of additional features. You need to apply for access.
 See chapter 8 of Schaum's Outline of Probability, Random Variables, and Random Processes for a good introduction to statistical hypothesis testing.
 Wikipedia page on statistical hypothesis test.
 Wikipedia page on the chi-squared test, G-test, and likelihood-ratio test for conditional independence.
 Conditional independence tests implemented in pgmpy and scipy.
 Conditional independence tests implemented in R's bnlearn.
 Wikipedia page on the multiple comparisons problem, which arises when running many hypothesis tests. Standard statistical remedies are to control the family-wise error rate or the false discovery rate.
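NetworkX, pgmpy, DAGitty and bnlearn all answer the same question: which conditional independences does a DAG imply? To show roughly what happens under the hood, here is a minimal, self-contained Python sketch; it is not the implementation used by any of those libraries. It assumes a DAG encoded as a dict mapping each node to its list of parents, and checks d-separation with the moralized ancestral graph criterion, which is equivalent to the path-blocking definition. It also includes a Holm-Bonferroni step-down correction, one standard way to control the family-wise error rate when you test many implied independences. The names `is_d_separated` and `holm_reject`, the example graph, and the p-values are all hypothetical.

```python
from collections import deque
from itertools import combinations


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def is_d_separated(parents, xs, ys, zs):
    """Check X _||_ Y | Z in a DAG encoded as {child: [parents]} (hypothetical encoding).

    X and Y are d-separated by Z iff they are disconnected in the moral graph
    of the ancestral subgraph of X | Y | Z after the nodes in Z are removed.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    # Build the undirected moral graph restricted to the ancestral set.
    adj = {v: set() for v in keep}
    for child in keep:
        pa = [p for p in parents.get(child, ()) if p in keep]
        for p in pa:
            adj[child].add(p)
            adj[p].add(child)
        # "Marry" every pair of co-parents of this child.
        for a, b in combinations(pa, 2):
            adj[a].add(b)
            adj[b].add(a)
    # Breadth-first search from X that never enters Z.
    frontier = deque(xs - zs)
    visited = set(frontier)
    while frontier:
        v = frontier.popleft()
        if v in ys:
            return False  # an unblocked connection reaches Y
        for w in adj[v]:
            if w not in visited and w not in zs:
                visited.add(w)
                frontier.append(w)
    return True


def holm_reject(pvalues, alpha=0.05):
    """Holm-Bonferroni step-down: one reject flag per p-value, FWER <= alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # after the first failure, all larger p-values are retained
    return reject


if __name__ == "__main__":
    # Hypothetical example DAG: C -> A, C -> B, A -> D <- B
    dag = {"A": ["C"], "B": ["C"], "D": ["A", "B"]}
    print(is_d_separated(dag, {"A"}, {"B"}, {"C"}))       # True: the fork is blocked by C
    print(is_d_separated(dag, {"A"}, {"B"}, {"C", "D"}))  # False: conditioning on collider D
    # Hypothetical p-values from three conditional independence tests
    print(holm_reject([0.01, 0.04, 0.03]))                # [True, False, False]
```

In a full workflow you would enumerate the d-separations implied by the DAG, run a conditional independence test (for example, one of the pgmpy or bnlearn tests listed above) for each, and pass the resulting p-values to the correction. A rejected hypothesis flags an implied independence that the data contradicts.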
Chapter 5: Building causal graphs with deep probabilistic machine learning
 Chapter 5 notebooks
 Typeface MNIST data (Kaggle)
 Schölkopf, B., Janzing, D., Peters, J., Sgouritsa, E., Zhang, K. and Mooij, J., 2012. On causal and anticausal learning. arXiv preprint arXiv:1206.6471.
Chapter 6: Structural Causal Models
Chapter 7: Interventions
Chapter 8: Counterfactuals
 Chapter 8 notebooks
 Pearl, J., 2010. Brief report: On the consistency rule in causal inference: axiom, definition, assumption, or theorem? Epidemiology, pp. 872-875.
 Beckers, S., 2021. Causal sufficiency and actual causation. Journal of Philosophical Logic, 50(6), pp. 1341-1374.
 Knobe, J. and Shapiro, S., 2021. Proximate cause explained. The University of Chicago Law Review, 88(1), pp. 165-236.
Chapter 9: The Counterfactual Inference Algorithm
 Chapter 9 notebooks
 ChiRho library for causal inference with probabilistic models (extension of Pyro)
Intractable likelihood methods in probabilistic inference
 Papamakarios, G. et al., 2021. Normalizing flows for probabilistic modeling and inference. The Journal of Machine Learning Research, 22(1), pp. 2617-2680.
 Matsubara, T. et al., 2022. Robust generalised Bayesian inference for intractable likelihoods. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(3), pp. 997-1022.
 Ritchie, D., Horsfall, P. and Goodman, N.D., 2016. Deep amortized inference for probabilistic programs. arXiv preprint arXiv:1610.05735.
 Murphy, K.P., 2023. Probabilistic machine learning: Advanced topics. MIT Press.
Chapter 10: Causal Hierarchy and Identification
Do-calculus, Pearl's Causal Hierarchy and Identification algorithms
 The Y0 repository for causal inference and identification
 Bareinboim, E., Correa, J.D., Ibeling, D. and Icard, T., 2022. On Pearl’s hierarchy and the foundations of causal inference. In Probabilistic and causal inference: the works of Judea Pearl (pp. 507-556).
 Shpitser, I. and Pearl, J., 2006, July. Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the National Conference on Artificial Intelligence (Vol. 21, No. 2, p. 1219). Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999.
 Shpitser, I. and Pearl, J., 2008. Complete identification methods for the causal hierarchy. Journal of Machine Learning Research, 9, pp. 1941-1979.
 Huang, Y. and Valtorta, M., 2006. Pearl's calculus of intervention is complete. arXiv preprint arXiv:1206.6831.
Potential outcomes, single world intervention graphs, and related concepts
 Malinsky, D., Shpitser, I. and Richardson, T., 2019, April. A potential outcomes calculus for identifying conditional path-specific effects. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 3080-3088). PMLR.
 Shpitser, I., Richardson, T.S. and Robins, J.M., 2022. Multivariate counterfactual systems and causal graphical models. In Probabilistic and Causal Inference: The Works of Judea Pearl (pp. 813-852).
 Robins, J.M. and Richardson, T.S., 2010. Alternative graphical causal models and the identification of direct effects. Causality and psychopathology: Finding the determinants of disorders and their cures, 84, pp. 103-158.
 Richardson, T.S. and Robins, J.M., 2013, July. Single world intervention graphs: a primer. In Second UAI workshop on causal structure learning, Bellevue, Washington.
 Robins, J., VanderWeele, T.J. and Richardson, T.S., 2007. Contribution to discussion of "Causal effects in the presence of noncompliance: a latent variable interpretation" by A. Forcina. Metron, LXIV(3), pp. 288-298.
 Geneletti, S. and Dawid, A.P., 2007. Defining and identifying the effect of treatment on the treated (Tech. Rep. No. 3). Imperial College London, Department of Epidemiology and Public Health.
Identification of effect of treatment on the treated
 Shpitser, I. and Tchetgen, E.T., 2016. Causal inference with a graphical hierarchy of interventions. Annals of Statistics, 44(6), p. 2433.
Partial Identification Bounds
 Mueller, S. and Pearl, J., 2022. Personalized Decision Making - A Conceptual Introduction. arXiv preprint arXiv:2208.09558.
 Li, A. and Pearl, J., 2022. Probabilities of causation with nonbinary treatment and effect. arXiv preprint arXiv:2208.09568.
 Li, A. and Pearl, J., 2022, June. Unit selection with causal diagram. In Proceedings of the AAAI conference on artificial intelligence (Vol. 36, No. 5, pp. 5765-5772).
Chapter 11: Building a Causal Effect Estimation Workflow
Chapter 12: Building a Causal Effect Estimation Workflow
Chapter 13: Causality and Large Language Models
Key references in the chapter
 Kıcıman, E., Ness, R., Sharma, A. and Tan, C., 2023. Causal reasoning and large language models: Opening a new frontier for causality. arXiv preprint arXiv:2305.00050.