About

On Friday, September 20, 2019, as part of IBM Research's AI Week, we will host the first workshop on Practical Bayesian Methods for Big Data.

Bayesian methods have long benefited from their ability to coherently represent uncertainty and incorporate prior knowledge, but have traditionally struggled to scale to large data and large models. Deep learning approaches empirically demonstrate the benefits of learning large, over-parameterized models from large data, but struggle to produce well-calibrated uncertainties. Research attempting to scale up Bayesian methods and combine the benefits of both paradigms has recently garnered significant attention; examples include deep generative models and Bayesian neural networks. This workshop will advance and accelerate research on the statistical underpinnings of methods at this intersection, including recent advances in Bayesian approaches for learning neural-network-based models, deep learning methods for Bayesian modeling, methods for scaling up Bayesian inference to large models and data, and the use of classical statistical tools for measuring the robustness and reliability of deep learning models.

Call For Participation

We invite researchers to submit work in (but not limited to) the following areas:

  • Bayesian approaches for learning neural network based models.
  • Advances in deep generative modeling.
  • Deep learning methods for Bayesian modeling.
  • Methods for scaling up Bayesian inference to large models and data.
  • Methods for measuring robustness and reliability of statistical models.

Submissions

Submissions can be made via EasyChair. Each submission should be an extended abstract of at most 3 pages (excluding references), in PDF format using the NeurIPS style. Submissions of new ideas, recently published work, and extensions of existing work are welcome, as are parallel submissions and work currently under review. Author names do not need to be anonymized. Accepted submissions will be presented as contributed 15-minute talks or poster presentations. Final versions will be posted on the workshop website (they are archival but do not constitute proceedings).

Key Dates

  • Submission deadline: September 10, 2019.
  • Meeting Date: September 20, 2019.

Attendance

For each accepted paper or poster, at least one author must attend the workshop and present the paper/poster.

Schedule

8:45-9:00 AM Registration and Check-in
9:00-9:15 AM Welcome and Opening Remarks
9:15-10:00 AM Invited Talk:
Finale Doshi-Velez
Models for Bayesian Neural Networks
10:00-10:45 AM Invited Talk:
Natesh Pillai
Accelerating MCMC algorithms in computer intensive models and applications to large data sets
10:45-11:00 AM Coffee Break
11:00-11:45 AM Invited Talk:
Tamara Broderick
The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions
11:45-1:00 PM Lunch
1:00-1:45 PM Invited Talk:
Jan-Willem van de Meent
Integrating Deep Learning and Probabilistic Programming
1:45-2:45 PM Contributed Talks
10 minutes each, plus 2-minute Q/A
2:45-3:00 PM Coffee Break
3:00-3:45 PM Invited Talk:
Justin Solomon
Approximating and Manipulating Probability Distributions with Transport
3:45-4:30 PM Invited Talk:
Vikash Mansinghka
Probabilistic Programming and Artificial Intelligence

Invited Speakers

Finale Doshi-Velez Natesh Pillai Tamara Broderick
Jan-Willem van de Meent Justin Solomon Vikash Mansinghka

Organizing Committee

Nghia Hoang Mikhail Yurochkin Kristen Severson
Prasanna Sattigeri Akash Srivastava Soumya Ghosh

Contributed Talks

1:45-1:57 PM Revisiting Reweighted Wake-Sleep for Models with Stochastic Control Flow
Tuan Anh Le
1:57-2:09 PM Assessing the Robustness of Bayesian Dark Knowledge to Posterior Uncertainty
Meet Vadera
2:09-2:21 PM Learning Free Energies with Bayesian Networks
Jonathan Vandermause
2:21-2:33 PM Dual Neural Network Architecture for Determining Epistemic and Aleatoric Uncertainties
Ravinath Kausik
2:33-2:45 PM Neural Tree Kernel Learning

Invited Talks

Models for Bayesian Neural Networks, Finale Doshi-Velez

Bayesian neural networks (BNNs) are often described as a more practical, scalable alternative to other forms of distributions over functions (e.g., Gaussian processes). However, "easy" or "obvious" ways of specifying priors don't necessarily result in the properties that we expect, or want. In this talk, I will describe some of our group's work toward creating BNN models with nicer properties, as well as toward incorporating expert knowledge. Finally, I'll touch on some considerations for inference. This is joint work with Soumya Ghosh, Jiayu Yao, Yaniv Yacoby, Weiwei Pan, Melanie Pradier, Wanqian Yang, Moritz Graule, Lars Lorch, Anirudh Suresh, Srivatsan Srinivasan, Stefan Depeweg, and Miguel Hernandez-Lobato.
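The point about "obvious" priors can be made concrete by sampling from one. The sketch below (a toy one-hidden-layer network in NumPy; the architecture, width, activation, and prior scales are illustrative assumptions, not from the talk) draws functions from an independent-Gaussian weight prior, so that the induced function-space behavior can be inspected rather than assumed:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_bnn_prior_function(x, width=50, weight_scale=1.0):
    """Draw one function from a one-hidden-layer BNN prior.

    Weights get independent Gaussian priors (the "obvious" choice the
    talk cautions about), with the output layer scaled by 1/sqrt(width).
    """
    w1 = rng.normal(0.0, weight_scale, size=(1, width))
    b1 = rng.normal(0.0, weight_scale, size=width)
    w2 = rng.normal(0.0, weight_scale / np.sqrt(width), size=(width, 1))
    b2 = rng.normal(0.0, weight_scale)
    h = np.tanh(x[:, None] @ w1 + b1)
    return (h @ w2).ravel() + b2

x = np.linspace(-3, 3, 100)
draws = np.stack([sample_bnn_prior_function(x) for _ in range(200)])
# Inspect the implied prior over functions, e.g. its pointwise spread:
print(draws.std(axis=0).round(2))
```

Plotting a handful of `draws` against `x` is often the quickest way to see whether a weight-space prior yields the function-space smoothness and scale one actually wants.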

Accelerating MCMC algorithms in computer intensive models and applications to large data sets, Natesh Pillai

We discuss a new framework for accelerating MCMC algorithms for sampling from posterior distributions in the context of computationally intensive models. We proceed by constructing local surrogates of the forward model within the Metropolis-Hastings kernel, borrowing ideas from deterministic approximation theory, optimization, and experimental design. Our work departs from previous work in surrogate-based inference by exploiting useful convergence characteristics of local approximations. We prove the ergodicity of our approximate Markov chain and show that it samples asymptotically from the exact posterior distribution of interest. We describe variations of the algorithm that construct either local polynomial approximations or Gaussian process regressors, thus spanning two important classes of surrogate models. Our theoretical results reinforce the key observation underlying this paper: when the likelihood has some local regularity, the number of model evaluations per MCMC step can be greatly reduced, without incurring significant bias in the Monte Carlo average. Our numerical experiments demonstrate order-of-magnitude reductions in the number of forward model evaluations used in representative ODE or PDE inference problems, in both real and synthetic data examples. We will also give applications of our theory for problems involving intractable likelihoods and large data sets. Joint work with Andrew Davis, Patrick Conrad, Youssef Marzouk, Aaron Smith.
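The central economics of the talk, spending expensive model evaluations only when necessary, can be illustrated with a delayed-acceptance Metropolis-Hastings sketch: a cheap surrogate screens proposals, and the exact model is consulted only for survivors. This is a simplified relative of the local-approximation algorithm described above, not the paper's method; the Gaussian "expensive" model and mis-scaled surrogate below are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_exact_calls = 0

def exact_log_post(theta):
    """Stand-in for a log-posterior requiring an expensive forward model."""
    global n_exact_calls
    n_exact_calls += 1
    return -0.5 * theta ** 2            # true target: standard normal

def surrogate_log_post(theta):
    """Cheap approximation (deliberately mis-scaled)."""
    return -0.5 * (theta / 1.1) ** 2

def delayed_acceptance_mh(n_steps, step=1.0):
    theta, lp_exact = 0.0, exact_log_post(0.0)
    samples = []
    for _ in range(n_steps):
        prop = theta + step * rng.normal()
        # Stage 1: screen the proposal using only the surrogate.
        log_a1 = surrogate_log_post(prop) - surrogate_log_post(theta)
        if np.log(rng.random()) < log_a1:
            # Stage 2: correct with the exact model, so the chain still
            # targets the exact posterior despite the crude surrogate.
            lp_prop = exact_log_post(prop)
            log_a2 = (lp_prop - lp_exact) - log_a1
            if np.log(rng.random()) < log_a2:
                theta, lp_exact = prop, lp_prop
        samples.append(theta)
    return np.array(samples)

samples = delayed_acceptance_mh(5000)
print(n_exact_calls, "exact evaluations over", len(samples), "steps")
```

Every proposal the surrogate rejects costs nothing; the better the surrogate, the fewer exact evaluations per effective sample, which is the regime the abstract's local-regularity argument formalizes.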

The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions, Tamara Broderick

Discovering interaction effects on a response of interest is a fundamental problem faced in biology, medicine, economics, and many other scientific disciplines. In theory, Bayesian methods for discovering pairwise interactions enjoy many benefits such as coherent uncertainty quantification, the ability to incorporate background knowledge, and desirable shrinkage properties. In practice, however, Bayesian methods are often computationally intractable for even moderate-dimensional problems. Our key insight is that many hierarchical models of practical interest admit a particular Gaussian process (GP) representation; the GP allows us to capture the posterior with a vector of O(p) kernel hyper-parameters rather than O(p^2) interactions and main effects. With the implicit representation, we can run Markov chain Monte Carlo (MCMC) over model hyper-parameters in time and memory linear in p per iteration. We focus on sparsity-inducing models and show on datasets with a variety of covariate behaviors that our method: (1) reduces runtime by orders of magnitude over naive applications of MCMC, (2) provides lower Type I and Type II error relative to state-of-the-art LASSO-based approaches, and (3) offers improved computational scaling in high dimensions relative to existing Bayesian and LASSO-based approaches.
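The kernel representation in the abstract is model-specific, but the underlying trick has a textbook cousin: a polynomial kernel evaluated in O(p) time equals an inner product over all O(p^2) interaction features, which therefore never need to be materialized. A minimal sketch of that identity (illustrative, not the paper's kernel):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
p = 6
x, z = rng.normal(size=p), rng.normal(size=p)

# O(p) evaluation of the polynomial kernel (1 + <x, z>)^2.
k_fast = (1.0 + x @ z) ** 2

def features(v):
    """Explicit feature map the kernel corresponds to: intercept, main
    effects, squares, and all O(p^2) pairwise interactions."""
    phi = [1.0]
    phi += [np.sqrt(2.0) * vi for vi in v]                  # main effects
    phi += [vi * vi for vi in v]                            # squares
    phi += [np.sqrt(2.0) * v[i] * v[j]
            for i, j in combinations(range(p), 2)]          # interactions
    return np.array(phi)

# Same number, without ever building the quadratic feature space.
k_slow = features(x) @ features(z)
print(np.isclose(k_fast, k_slow))   # True
```

Working with the p-dimensional kernel side of this identity instead of the p^2-dimensional feature side is what lets MCMC over hyper-parameters run in time and memory linear in p per iteration.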

Integrating Deep Learning and Probabilistic Programming, Jan-Willem van de Meent

A clear lesson from ongoing advances in deep learning is that large overparameterized models can achieve unsurpassed performance in settings where a sufficiently large amount of data and computation are available. A much more open question is how we can improve the generalization properties of these models when we have a limited amount of data, or a limited amount of labels. In this talk I will discuss how we can combine the principles of probabilistic programming with those of deep learning to design models that incorporate inductive biases that aid generalization. I will provide examples of models that can be trained in an unsupervised or semi-supervised manner to learn structured representations of interpretable variables of interest. I will also discuss ongoing research to improve scalability of inference and evaluate generalization properties of learned models.

Approximating and Manipulating Probability Distributions with Transport, Justin Solomon

The theory of optimal transport defines a metric on the space of probability distributions that lifts the metric of the underlying geometric domain. The structure of this geometry on the space of probability distributions has several favorable properties for tasks in inference and learning. In this talk, I will introduce recent work applying transport to a variety of computational tasks in learning and statistics, including computation of coresets for efficient and approximate learning, distributionally-robust learning in the semi-supervised setting, and overcoming symmetry issues in Bayesian inference. I also will discuss computational techniques employed to overcome the computational cost of evaluating and manipulating transport distances in practice. [Joint work with E. Chien, S. Claici, C. Frogner, F. Mirzazadeh, P. Monteiller, and M. Yurochkin]
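As a small concrete taste of computing with transport distances: in one dimension the optimal transport plan simply matches sorted samples, so the empirical Wasserstein distance reduces to a quantile comparison. The Gaussian example below is an illustrative assumption, not taken from the talk:

```python
import numpy as np

def wasserstein2_1d(a, b):
    """W2 distance between two equal-size empirical 1-D distributions.

    In one dimension optimal transport is a sort: the optimal coupling
    matches order statistics, giving a closed-form empirical distance.
    """
    a, b = np.sort(a), np.sort(b)
    return np.sqrt(np.mean((a - b) ** 2))

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=2000)
y = rng.normal(2.0, 1.0, size=2000)
print(round(wasserstein2_1d(x, y), 2))  # close to 2.0: for two Gaussians
                                        # with equal variance, W2 is the
                                        # distance between the means
```

Beyond one dimension this sorting shortcut disappears, which is exactly why the computational techniques mentioned in the abstract (for evaluating and manipulating transport distances at scale) are needed.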

Probabilistic Programming and Artificial Intelligence, Vikash Mansinghka

Probabilistic programming is an emerging field at the intersection of programming languages, probability theory, and artificial intelligence. This talk will show how to use recently developed probabilistic programming languages to build systems for robust 3D computer vision, without requiring any labeled training data; for automatic modeling of complex real-world time series; and for machine-assisted analysis of experimental data in synthetic biology that is too small and messy for standard approaches from machine learning and statistics. This talk will use these applications to illustrate recent technical innovations in probabilistic programming that formalize and unify modeling approaches from multiple eras of AI, including generative models, neural networks, symbolic programs, causal Bayesian networks, and hierarchical Bayesian modeling. Specifically, it will present languages in which models are represented using executable code, and in which inference is programmable using novel constructs for Monte Carlo, optimization-based, and neural inference. It will also present techniques for Bayesian learning of probabilistic program structure and parameters from real-world data. Finally, this talk will review challenges and research opportunities in the development and use of general-purpose probabilistic programming languages that are performant enough and flexible enough for real-world AI engineering.