Causal AI Book

Causal AI is Robert Osazuwa Ness' book on causality. This page contains links to tutorials, notebooks, references and errata.

Chapter 1: Introduction

Book recommendations

This book takes an opinionated approach to causality that focuses on graphs, probabilistic machine learning, Bayesian decision-making, and deep learning tools such as PyTorch.

For books with alternative perspectives that focus on econometrics, social science, and practical data science themes, check out:

Key references in the chapter

D'Amour, A., Heller, K., Moldovan, D., Adlam, B., Alipanahi, B., Beutel, A., Chen, C., Deaton, J., Eisenstein, J., Hoffman, M.D. and Hormozdiari, F., 2020. Underspecification presents challenges for credibility in modern machine learning. arXiv preprint arXiv:2011.03395.

Chapter 2: Primer on probability modeling

Our course on probabilistic machine learning covers in detail the elements of Bayesian and probabilistic inference covered in this chapter.

Chapter 2 notebooks

Book recommendations

Murphy, K.P., 2022. Probabilistic machine learning: an introduction. MIT press.
Hsu, H.P., 1997. Schaum's outline of theory and problems of probability, random variables, and random processes. McGraw-Hill.

Chapter 3: Building a causal graphical model

Chapter 3 notebooks
See additional code and causal modeling ideas in the projects directory

Causal abstraction

Beckers, S. and Halpern, J.Y., 2019, July. Abstracting causal models. In Proceedings of the AAAI conference on artificial intelligence (Vol. 33, No. 01, pp. 2678-2685).
Beckers, S., Eberhardt, F. and Halpern, J.Y., 2020, August. Approximate causal abstractions. In Uncertainty in artificial intelligence (pp. 606-615). PMLR.
Rischel, E.F. and Weichwald, S., 2021, December. Compositional abstraction error and a category of causal models. In Uncertainty in Artificial Intelligence (pp. 1013-1023). PMLR.

Independence of mechanism

Schölkopf, B., Janzing, D., Peters, J., Sgouritsa, E., Zhang, K. and Mooij, J., 2012. On causal and anticausal learning. arXiv preprint arXiv:1206.6471.
Rojas-Carulla, M., Schölkopf, B., Turner, R. and Peters, J., 2018. Invariant models for causal transfer learning. Journal of Machine Learning Research, 19(36), pp.1-34.
Besserve, M., Shajarisales, N., Schölkopf, B. and Janzing, D., 2018, March. Group invariance principles for causal generative models. In International Conference on Artificial Intelligence and Statistics (pp. 557-565). PMLR.
Parascandolo, G., Kilbertus, N., Rojas-Carulla, M. and Schölkopf, B., 2018, July. Learning independent causal mechanisms. In International Conference on Machine Learning (pp. 4036-4044). PMLR.

Causal data fusion and transfer learning

Bareinboim, E. and Pearl, J., 2016. Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences, 113(27), pp.7345-7352.
Rojas-Carulla, M., Schölkopf, B., Turner, R. and Peters, J., 2018. Invariant models for causal transfer learning. Journal of Machine Learning Research, 19(36), pp.1-34.
Magliacane, S., van Ommen, T., Claassen, T., Bongers, S., Versteeg, P. and Mooij, J.M., 2017. Causal transfer learning. arXiv preprint arXiv:1707.06422.

Causally invariant prediction

Arjovsky, M., Bottou, L., Gulrajani, I. and Lopez-Paz, D., 2019. Invariant risk minimization. arXiv preprint arXiv:1907.02893.
Heinze-Deml, C., Peters, J. and Meinshausen, N., 2018. Invariant causal prediction for nonlinear models. Journal of Causal Inference, 6(2), p.20170016.
Rosenfeld, E., Ravikumar, P. and Risteski, A., 2020. The risks of invariant risk minimization. arXiv preprint arXiv:2010.05761.
Lu, C., Wu, Y., Hernández-Lobato, J.M. and Schölkopf, B., 2021. Nonlinear invariant risk minimization: A causal approach. arXiv preprint arXiv:2102.12353.

Chapter 4: Testing your causal graph

Chapter 4 notebooks

Tools for d-separation

NetworkX's d_separation algorithm.
pgmpy's get_independencies method enumerates all d-separations in a DAG.
DAGitty (dagitty.net) provides an online application for building DAGs and evaluating d-separation. It also provides an R package.
The dsep function in the bnlearn R package evaluates simple d-separation statements as true or false.
Causalfusion is an online app like DAGitty but with a valuable set of additional features. You need to apply for access.
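To make concrete what these tools compute, here is a minimal pure-Python sketch of a d-separation check. It uses the standard ancestral moral graph criterion: restrict the DAG to the ancestors of the variables involved, connect co-parents, drop edge directions, delete the conditioning set, and test for connectivity. The function names and the four-node example DAG are illustrative assumptions, not the API of any library listed above.

```python
from itertools import combinations

def ancestors_of(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def d_separated(parents, xs, ys, zs):
    """True if every node in xs is d-separated from every node in ys given zs.

    `parents` maps each node to a list of its parents (hypothetical format).
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors_of(parents, xs | ys | zs)

    # Build the moral graph of the ancestral subgraph (undirected adjacency).
    adj = {node: set() for node in keep}
    for child in keep:
        pa = list(parents.get(child, ()))
        for p in pa:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(pa, 2):  # "marry" co-parents
            adj[a].add(b)
            adj[b].add(a)

    # Search from xs, never stepping onto a conditioned node.
    frontier = list(xs - zs)
    visited = set(frontier)
    while frontier:
        node = frontier.pop()
        if node in ys:
            return False
        for nbr in adj[node]:
            if nbr not in zs and nbr not in visited:
                visited.add(nbr)
                frontier.append(nbr)
    return True

# Illustrative DAG: A -> B -> C (a chain) and A -> D <- C (a collider at D).
dag = {"B": ["A"], "C": ["B"], "D": ["A", "C"]}

sep_given_nothing = d_separated(dag, {"A"}, {"C"}, set())      # chain is open
sep_given_b = d_separated(dag, {"A"}, {"C"}, {"B"})            # chain blocked
sep_given_b_d = d_separated(dag, {"A"}, {"C"}, {"B", "D"})     # collider opened
print(sep_given_nothing, sep_given_b, sep_given_b_d)
```

Conditioning on B blocks the chain, while additionally conditioning on the collider D opens the path through D. The libraries above answer the same kind of query, with their own interfaces and graph types.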

Background on statistical hypothesis testing

See chapter 8 of Schaum's Outline of Probability, Random Variables, and Random Processes for a good introduction to statistical hypothesis testing.
Wikipedia has good descriptions of the Chi-squared test, G-test, and likelihood-ratio test for conditional independence.
Conditional independence tests implemented in pgmpy and scipy.
The PyWhy suite has a library for advanced statistical independence tests called pywhy-stats
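As a rough illustration of how a G-test for conditional independence works, the sketch below computes the G statistic, 2 Σ O ln(O/E), within each stratum of the conditioning variable and sums the results. The contingency counts are made up for illustration, and the helper names are hypothetical; the sketch stops at the statistic and degrees of freedom and does not compute a p-value, which the libraries above handle.

```python
from math import log

def g_statistic(table):
    """G statistic for independence in one contingency table (list of rows)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing
                expected = row_totals[i] * col_totals[j] / n
                g += 2.0 * observed * log(observed / expected)
    return g

def conditional_g(strata):
    """Sum G and degrees of freedom over one X-by-Y table per value of Z."""
    g = sum(g_statistic(t) for t in strata)
    df = sum((len(t) - 1) * (len(t[0]) - 1) for t in strata)
    return g, df

# Made-up counts where Z is a common cause of X and Y.
# Rows index X in {0, 1}; columns index Y in {0, 1}.
table_z0 = [[64, 16], [16, 4]]
table_z1 = [[4, 16], [16, 64]]

# Marginal table: add the strata cell by cell, ignoring Z.
marginal = [[a + b for a, b in zip(r0, r1)] for r0, r1 in zip(table_z0, table_z1)]

g_marginal = g_statistic(marginal)                     # 1 degree of freedom
g_cond, df_cond = conditional_g([table_z0, table_z1])  # 2 degrees of freedom

print(f"marginal G = {g_marginal:.2f}")
print(f"conditional G = {g_cond:.2f} on {df_cond} degrees of freedom")
```

With these counts, X and Y look strongly dependent when Z is ignored (the marginal statistic is far above 3.841, the 5% chi-squared critical value for one degree of freedom), but the conditional statistic is zero, consistent with X and Y being independent given Z.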

Tools for causal discovery

The PyWhy suite contains the popular causal-learn library for causal discovery.
PyWhy also contains an experimental library called dodiscover, which focuses on being a user-friendly interface for causal discovery.
The bnlearn package is an R package, and it has a corresponding Python library

False discovery rate and causal discovery

Wikipedia page on the multiple comparisons problem that occurs when doing repeated hypothesis testing. Standard statistical remedies are to do a family-wise error rate correction or calculate a false discovery rate.
Pena, J.M., 2008, March. Learning gaussian graphical models of gene networks with false discovery rate control. In European conference on evolutionary computation, machine learning and data mining in bioinformatics (pp. 165-176). Berlin, Heidelberg: Springer Berlin Heidelberg.
The bnlearn package implements the interleaved incremental association algorithm with FDR in the iamb.fdr function
Gasse, M., Aussem, A. and Elghazel, H., 2014. A hybrid algorithm for Bayesian network structure learning with application to multi-label learning. Expert Systems with Applications, 41(15), pp.6755-6772.
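To illustrate the two remedies mentioned above, here is a small sketch comparing a Bonferroni family-wise correction with the Benjamini-Hochberg false discovery rate procedure. The p-values are made up and stand in for the many independence tests a discovery algorithm might run. This is not the procedure inside bnlearn's iamb.fdr function, only the generic corrections.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns one rejection flag per test."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value is at most alpha * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank
    # Reject the k_max smallest p-values.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected

# Hypothetical p-values from ten independence tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

uncorrected = [p <= 0.05 for p in p_values]
fwer = bonferroni(p_values)
fdr = benjamini_hochberg(p_values)

print("uncorrected rejections:", sum(uncorrected))
print("Bonferroni rejections: ", sum(fwer))
print("BH (FDR) rejections:   ", sum(fdr))
```

On these numbers, five tests pass an uncorrected 0.05 threshold, Bonferroni keeps one, and Benjamini-Hochberg keeps two. In causal discovery, each rejection typically becomes a claimed dependence and hence a candidate edge, which is why the choice of correction affects the learned graph.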

Go deeper on functional constraints (Verma constraints)

Tian, J. and Pearl, J., 2012. On the testable implications of causal models with hidden variables. arXiv preprint arXiv:1301.0608.
Bhattacharya, R. and Nabi, R., 2022, August. On testability of the front-door model via Verma constraints. In Uncertainty in Artificial Intelligence (pp. 202-212). PMLR.

Selected readings on causal discovery

Spirtes, P., 2001, January. An anytime algorithm for causal inference. In International Workshop on Artificial Intelligence and Statistics (pp. 278-285). PMLR.
Spirtes, P., Glymour, C. and Scheines, R., 2001. Causation, prediction, and search. MIT press. (Introduces the PC algorithm, though one might enjoy this intro by Brady Neal)
Chickering, D.M., 2002. Optimal structure identification with greedy search. Journal of machine learning research, 3(Nov), pp.507-554.
Friedman, N. and Koller, D., 2003. Being Bayesian about network structure. A Bayesian approach to structure discovery in Bayesian networks. Machine learning, 50, pp.95-125.
Heckerman, D., Meek, C. and Cooper, G., 2006. A Bayesian approach to causal discovery. Innovations in Machine Learning: Theory and Applications, pp.1-28.
Meek, C., 2013. Causal inference and causal explanation with background knowledge. arXiv preprint arXiv:1302.4972.
Cooper, G.F. and Yoo, C., 2013. Causal discovery from a mixture of experimental and observational data. arXiv preprint arXiv:1301.6686.
Ogarrio, J.M., Spirtes, P. and Ramsey, J., 2016, August. A hybrid causal search algorithm for latent variable models. In Conference on probabilistic graphical models (pp. 368-379). PMLR.
Ness, R.O., Sachs, K. and Vitek, O., 2016. From correlation to causality: statistical approaches to learning regulatory relationships in large-scale biomolecular investigations. Journal of Proteome Research, 15(3), pp.683-690.
Glymour, C., Zhang, K. and Spirtes, P., 2019. Review of causal discovery methods based on graphical models. Frontiers in genetics, 10, p.524.
Zheng, Y., Huang, B., Chen, W., Ramsey, J., Gong, M., Cai, R., Shimizu, S., Spirtes, P. and Zhang, K., 2024. Causal-learn: Causal discovery in python. Journal of Machine Learning Research, 25(60), pp.1-8.

Chapter 5: Building causal graphs with deep probabilistic machine learning

An explanation of the case study on independence of mechanism in semi-supervised learning

One way to understand this example is to combine the perspectives of Bayesian reasoning and d-separation. In both the causal learning and anti-causal cases, during training, we’re learning some set of parameters for a model of P(X, Y). Let’s decouple this set into θy, the parameter set that parameterizes the causal Markov kernel for Y, and θx, the parameter set that parameterizes the causal Markov kernel for X. Further, let’s take the Bayesian approach of treating θx and θy as random variables, and give them their own nodes in the DAG, as shown in the following figure.

The DAG on the left depicts causal learning, with the feature causing the label. The DAG on the right depicts anti-causal learning, with the label causing the feature. The theta-x and theta-y nodes are parameter oracles that set the parameters of X and Y, made explicit as nodes in the graph. Assume that only X is observed. X is d-separated from theta-y in the causal learning case but not in the anti-causal case.

We’ll interpret these “Greek” nodes as oracles that set the parameters of the causal Markov kernels of X and Y, and our job during unsupervised training is to infer these unknown parameters from X alone. We can see that in the causal learning case, X and θy are d-separated and thus X is not informative of θy. But in the anti-causal learning case, X and θy are d-connected, thus X is informative of θy.
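The d-separation argument above can be checked numerically with a toy model. The sketch below assumes binary X and Y, two-point priors on θx and θy, and simple Bernoulli kernels; all of these choices are illustrative assumptions, not the model from the book. It computes the posterior over θy after observing only X, by brute-force enumeration, in both the causal and anti-causal DAGs.

```python
from itertools import product

# Illustrative two-point priors over the parameter "oracles".
THETA_X = {0.6: 0.5, 0.9: 0.5}
THETA_Y = {0.1: 0.5, 0.9: 0.5}

def bern(p, value):
    """Probability that a Bernoulli(p) variable takes `value` (0 or 1)."""
    return p if value == 1 else 1.0 - p

def joint_causal(tx, ty, x, y):
    # DAG: theta_x -> X -> Y <- theta_y
    p_y1 = ty if x == 1 else 1.0 - ty
    return THETA_X[tx] * THETA_Y[ty] * bern(tx, x) * bern(p_y1, y)

def joint_anticausal(tx, ty, x, y):
    # DAG: theta_y -> Y -> X <- theta_x
    p_x1 = tx if y == 1 else 1.0 - tx
    return THETA_X[tx] * THETA_Y[ty] * bern(ty, y) * bern(p_x1, x)

def posterior_theta_y(joint, x_obs):
    """P(theta_y | X = x_obs), marginalizing over theta_x and the unobserved Y."""
    weights = {ty: 0.0 for ty in THETA_Y}
    for tx, ty, y in product(THETA_X, THETA_Y, (0, 1)):
        weights[ty] += joint(tx, ty, x_obs, y)
    total = sum(weights.values())
    return {ty: w / total for ty, w in weights.items()}

post_causal = posterior_theta_y(joint_causal, x_obs=1)
post_anticausal = posterior_theta_y(joint_anticausal, x_obs=1)

print("causal:      ", post_causal)
print("anti-causal: ", post_anticausal)
```

In the causal DAG the posterior over θy stays at the prior (0.5 for each value), because the only path from X to θy runs through the unobserved collider Y. In the anti-causal DAG, observing X = 1 shifts the posterior toward θy = 0.9, so unlabeled features carry information about the label mechanism.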

For more information, see:

Schölkopf, B., Janzing, D., Peters, J., Sgouritsa, E., Zhang, K. and Mooij, J., 2012. On causal and anticausal learning. arXiv preprint arXiv:1206.6471.

Vision as Inverse Graphics

Intro to Vision as Inverse Graphics from Max Planck Institute for Intelligent Systems
Romaszko, L., Williams, C.K., Moreno, P. and Kohli, P., 2017. Vision-as-inverse-graphics: Obtaining a rich 3d explanation of a scene from a single image. In Proceedings of the IEEE International Conference on Computer Vision Workshops (pp. 851-859).

Causal representation learning and disentanglement

Intro to Causal Representation Learning from Max Planck Institute for Intelligent Systems
Kumar, A., Sattigeri, P. and Balakrishnan, A., 2017. Variational inference of disentangled latent concepts from unlabeled observations. arXiv preprint arXiv:1711.00848.
Locatello, F., Bauer, S., Lucic, M., Raetsch, G., Gelly, S., Schölkopf, B. and Bachem, O., 2019, May. Challenging common assumptions in the unsupervised learning of disentangled representations. In international conference on machine learning (pp. 4114-4124). PMLR.
Yang, M., Liu, F., Chen, Z., Shen, X., Hao, J. and Wang, J., 2020. Causalvae: Structured causal disentanglement in variational autoencoder. arXiv preprint arXiv:2004.08697.
Wang, Y. and Jordan, M.I., 2021. Desiderata for representation learning: A causal perspective. arXiv preprint arXiv:2109.03795.
Ahuja, K., Hartford, J. and Bengio, Y., 2021. Properties from mechanisms: an equivariance perspective on identifiable representation learning. arXiv preprint arXiv:2110.15796.
Reddy, A.G. and Balasubramanian, V.N., 2022, June. On causally disentangled representations. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 36, No. 7, pp. 8089-8097).

Miscellaneous

Ali Rahimi quote comparing machine learning to alchemy can be found at the 11:59 mark in this video of his NIPS 2017 Test-of-Time Award presentation.

Chapter 6: Structural Causal Models

Chapter 6 notebooks
Javaloy, A., Sánchez-Martín, P., & Valera, I., 2023. Causal normalizing flows: from theory to practice. In: A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, & S. Levine, eds. Neural Information Processing Systems. Curran Associates, Inc., pp. 58833–58864. doi: 10.48550/arXiv.2306.05415.

Chapter 7: Interventions and Causal Effects

Chapter 7 notebooks

Chapter 8: Counterfactuals and Parallel Worlds

Verma, S., Dickerson, J. and Hines, K., 2020. Counterfactual explanations for machine learning: A review. arXiv preprint arXiv:2010.10596.
Pearl, J., 2010. Brief report: On the consistency rule in causal inference: axiom, definition, assumption, or theorem? Epidemiology, pp.872-875.

Probabilities of Causation

Beckers, S., 2021. Causal sufficiency and actual causation. Journal of Philosophical Logic, 50(6), pp.1341-1374.
Knobe, J. and Shapiro, S., 2021. Proximate cause explained. The University of Chicago Law Review, 88(1), pp.165-236.
Pearl, J., 2022. Probabilities of causation: three counterfactual interpretations and their identification. In Probabilistic and Causal Inference: The Works of Judea Pearl (pp. 317-372).

Uplift Modeling

Li, A. and Pearl, J., 2019, August. Unit selection based on counterfactual logic. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence.
Gutierrez, P. and Gérardy, J.Y., 2017, July. Causal inference and uplift modelling: A review of the literature. In International conference on predictive applications and APIs (pp. 1-13). PMLR.

Chapter 9: The Counterfactual Inference Algorithm

Chapter 9 notebooks
ChiRho library for causal inference with probabilistic models (extension of Pyro)

Intractable likelihood methods in probabilistic inference

Papamakarios, George, et al. "Normalizing flows for probabilistic modeling and inference." The Journal of Machine Learning Research 22.1 (2021): 2617-2680.
Matsubara, Takuo, et al. "Robust generalised Bayesian inference for intractable likelihoods." Journal of the Royal Statistical Society Series B: Statistical Methodology 84.3 (2022): 997-1022.
Ritchie, Daniel, Paul Horsfall, and Noah D. Goodman. "Deep amortized inference for probabilistic programs." arXiv preprint arXiv:1610.05735 (2016).
Murphy, Kevin P. Probabilistic machine learning: Advanced topics. MIT press, 2023.

Chapter 10: Causal Hierarchy and Identification

Chapter 10 notebook

Do-calculus, Pearl's Causal Hierarchy and Identification algorithms

The Y0 repository for causal inference and identification
Bareinboim, E., Correa, J.D., Ibeling, D. and Icard, T., 2022. On Pearl’s hierarchy and the foundations of causal inference. In Probabilistic and causal inference: the works of judea pearl (pp. 507-556).
Shpitser, I. and Pearl, J., 2006, July. Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the National Conference on Artificial Intelligence (Vol. 21, No. 2, p. 1219). Menlo Park, CA; Cambridge, MA; London; AAAI Press; MIT Press; 1999.
Shpitser, I. and Pearl, J., 2008. Complete identification methods for the causal hierarchy. Journal of Machine Learning Research, 9, pp.1941-1979.
Huang, Yimin, and Marco Valtorta. Pearl's calculus of intervention is complete. arXiv preprint arXiv:1206.6831 (2006).

Potential outcomes, single world intervention graphs, and related concepts

Malinsky, D., Shpitser, I. and Richardson, T., 2019, April. A potential outcomes calculus for identifying conditional path-specific effects. In The 22nd International Conference on Artificial Intelligence and Statistics (pp. 3080-3088). PMLR.
Shpitser, I., Richardson, T.S. and Robins, J.M., 2022. Multivariate counterfactual systems and causal graphical models. In Probabilistic and Causal Inference: The Works of Judea Pearl (pp. 813-852).
Robins, J.M. and Richardson, T.S., 2010. Alternative graphical causal models and the identification of direct effects. Causality and psychopathology: Finding the determinants of disorders and their cures, 84, pp.103-158.
Richardson, T.S. and Robins, J.M., 2013, July. Single world intervention graphs: a primer. In Second UAI workshop on causal structure learning, Bellevue, Washington.
J. Robins, T.J. vanderWeele and T.S. Richardson. (2007). Contribution to discussion of Causal Effects in the presence of non-compliance a latent variable interpretation. by A. Forcina. Metron, LXIV (3) pp. 288-298.
Geneletti, S., & Dawid, A. P. (2007). Defining and identifying the effect of treatment on the treated (Tech. Rep. No. 3). Imperial College London, Department of Epidemiology and Public Health.

Identification of effect of treatment on the treated

Shpitser, I. and Tchetgen, E.T., 2016. Causal inference with a graphical hierarchy of interventions. Annals of statistics, 44(6), p.2433.

Partial Identification Bounds

Mueller, S. and Pearl, J., 2022. Personalized Decision Making--A Conceptual Introduction. arXiv preprint arXiv:2208.09558.
Li, A. and Pearl, J., 2022. Probabilities of causation with nonbinary treatment and effect. arXiv preprint arXiv:2208.09568.
Li, A. and Pearl, J., 2022, June. Unit selection with causal diagram. In Proceedings of the AAAI conference on artificial intelligence (Vol. 36, No. 5, pp. 5765-5772).

Chapter 11: Building a Causal Effect Estimation Workflow

Notebooks

Chapter 12: Primer on Causality in Decisions, Bandits, and Reinforcement Learning

Notebooks

Causal decision theory, Newcomb's paradox, and introspection/deliberation

Causal decision theory - Wikipedia
Newcomb's paradox - Wikipedia
Agent causation - Wikipedia
Stern, R., 2018. Diagnosing Newcomb’s problem with causal graphs. Newcomb’s problem, pp.201-220.
Skyrms, B., 1982. Causal decision theory. The Journal of Philosophy, 79(11), pp.695-711.
Lewis, D., 1981. Causal decision theory. Australasian journal of philosophy, 59(1), pp.5-30.
Joyce, J.M., 2002. Levi on causal decision theory and the possibility of predicting one's own actions. Philosophical Studies, 110, pp.69-102.
Clarke, R., 1993. Toward a credible agent-causal account of free will. Noûs, 27(2), pp.191-203.

Causal Bandits and Reinforcement Learning, Counterfactual Risk Minimization

Bareinboim, E., Forney, A. and Pearl, J., 2015. Bandits with unobserved confounders: A causal approach. Advances in Neural Information Processing Systems, 28.
Buesing, L., Weber, T., Zwols, Y., Racaniere, S., Guez, A., Lespiau, J.B. and Heess, N., 2018. Woulda, coulda, shoulda: Counterfactually-guided policy search. arXiv preprint arXiv:1811.06272.
Swaminathan, A. and Joachims, T., 2015, June. Counterfactual risk minimization: Learning from logged bandit feedback. In International Conference on Machine Learning (pp. 814-823). PMLR.
Scott, S.L., 2010. A modern Bayesian look at the multi‐armed bandit. Applied Stochastic Models in Business and Industry, 26(6), pp.639-658. (This paper doesn't use causal reasoning, but its Bayesian framing of the bandit problem can extend to counterfactual reasoning by admitting counterfactual notation.)

Causality in Games, Counterfactual Regret Minimization in Multiagent Games

Zinkevich, M., Johanson, M., Bowling, M. and Piccione, C., 2007. Regret minimization in games with incomplete information. Advances in neural information processing systems, 20.
Brown, N., Lerer, A., Gross, S. and Sandholm, T., 2019, May. Deep counterfactual regret minimization. In International conference on machine learning (pp. 793-802). PMLR.
Hammond, L., Fox, J., Everitt, T., Carey, R., Abate, A. and Wooldridge, M., 2023. Reasoning about causality in games. Artificial Intelligence, 320, p.103919.
Soto, M.G., Sucar, L.E. and Escalante, H.J., 2020. Causal games and causal nash equilibrium. Res Comput Sci, 149, pp.123-133.

Chapter 13: Causality and Large Language Models

Notebooks

Key references in the chapter

Kıcıman, E., Ness, R., Sharma, A. and Tan, C., 2023. Causal reasoning and large language models: Opening a new frontier for causality. arXiv preprint arXiv:2305.00050.
