On Friday, September 20th, 2019, as part of IBM Research's AI Week, we will be hosting the first workshop on Practical Bayesian Methods for Big Data.
Bayesian methods have long benefited from their ability to both coherently represent uncertainty and incorporate prior knowledge, but have traditionally struggled to scale to both large data and large models. Deep learning approaches empirically demonstrate the benefits of learning large over-parameterized models from large data, but struggle to produce well-calibrated uncertainties. Research attempting to both scale up Bayesian methods and combine the benefits of the two paradigms has recently garnered significant attention. Examples include deep generative models and Bayesian neural networks. This workshop will advance and accelerate research on the statistical underpinnings of methods at this intersection, including recent advances in Bayesian approaches for learning neural-network-based models, deep learning methods for Bayesian modeling, methods for scaling up Bayesian inference to large models and data, and the use of classical statistical tools for measuring the robustness and reliability of deep learning models.
We invite researchers to submit work in (but not limited to) the following areas:
- Bayesian approaches for learning neural network based models.
- Advances in deep generative modeling.
- Deep learning methods for Bayesian modeling.
- Methods for scaling up Bayesian inference to large models and data.
- Methods for measuring robustness and reliability of statistical models.
Submissions
Submissions can be made via EasyChair.
Each submission should be an extended abstract of at most 3 pages (excluding references), in PDF format using the NeurIPS style. Submissions of new ideas, recently published work, and extensions of existing work are welcome. Parallel submissions and submissions of work under review are also permitted. Author names do not need to be anonymized.
Submissions will be accepted as contributed 15-minute talks or poster presentations. The final versions will be posted on the workshop website (they are archival but do not constitute proceedings).
Key Dates
- Abstract and submission deadline: September 10, 2019.
- Meeting Date: September 20, 2019.
Attendance
For each accepted paper or poster, at least one author must attend the workshop and present the paper/poster.
Models for Bayesian Neural Networks, Finale Doshi-Velez
Bayesian neural networks (BNN) are often described as a more practical, scalable alternative to other forms of distributions over functions (e.g. Gaussian processes).
However, "easy" or "obvious" ways of specifying priors don't necessarily result in the properties that we expect--or want. In this talk, I will talk about some of the work in
our group toward creating BNN models with nicer properties, as well as incorporating expert knowledge. Finally, I'll touch on some considerations for inference. This is joint
work with Soumya Ghosh, Jiayu Yao, Yaniv Yacoby, Weiwei Pan, Melanie Pradier, Wanqian Yang, Moritz Graule, Lars Lorch, Anirudh Suresh, Srivatsan Srinivasan, Stefan Depeweg, and Miguel Hernandez-Lobato.
Accelerating MCMC algorithms in computer intensive models and applications to large data sets, Natesh Pillai
We discuss a new framework for accelerating MCMC algorithms for sampling from posterior distributions in the context of computationally intensive models. We proceed by constructing
local surrogates of the forward model within the Metropolis-Hastings kernel, borrowing ideas from deterministic approximation theory, optimization, and experimental design. Our work
departs from previous work in surrogate-based inference by exploiting useful convergence characteristics of local approximations. We prove the ergodicity of our approximate Markov chain
and show that it samples asymptotically from the exact posterior distribution of interest. We describe variations of the algorithm that construct either local polynomial approximations
or Gaussian process regressors, thus spanning two important classes of surrogate models. Our theoretical results reinforce the key observation underlying this paper: when the likelihood
has some local regularity, the number of model evaluations per MCMC step can be greatly reduced, without incurring significant bias in the Monte Carlo average. Our numerical experiments
demonstrate order-of-magnitude reductions in the number of forward model evaluations used in representative ODE or PDE inference problems, in both real and synthetic data examples. We will
also give applications of our theory for problems involving intractable likelihoods and large data sets. Joint work with Andrew Davis, Patrick Conrad, Youssef Marzouk, Aaron Smith.
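For readers unfamiliar with the setting, the following is a minimal sketch of a plain random-walk Metropolis-Hastings sampler, the kind of kernel inside which the talk places its local surrogates. It is not the speaker's algorithm: it builds no surrogate at all, and the function name `random_walk_metropolis`, the Gaussian proposal, and the evaluation counter `n_evals` are illustrative assumptions. The counter only makes visible the cost that the abstract targets, namely one new evaluation of the (possibly expensive) log-density per step.

```python
import numpy as np

def random_walk_metropolis(log_density, x0, n_steps, step_size, rng):
    """Baseline 1-D random-walk Metropolis-Hastings (hypothetical helper).

    Returns the chain and the number of log-density evaluations used.
    """
    x = float(x0)
    logp = log_density(x)
    n_evals = 1  # one evaluation for the starting point
    samples = np.empty(n_steps)
    for t in range(n_steps):
        # Symmetric Gaussian proposal centred on the current state.
        proposal = x + step_size * rng.standard_normal()
        logp_prop = log_density(proposal)  # the expensive call in practice
        n_evals += 1
        # Accept with probability min(1, p(proposal) / p(x)).
        if np.log(rng.uniform()) < logp_prop - logp:
            x, logp = proposal, logp_prop
        samples[t] = x
    return samples, n_evals

# Toy target: standard normal log-density, up to an additive constant.
chain, evals = random_walk_metropolis(
    lambda x: -0.5 * x * x, 0.0, 20000, 1.0, np.random.default_rng(0)
)
print("evaluations per step:", evals / len(chain))
```

In this baseline the number of evaluations grows in lockstep with the number of steps; a surrogate-based scheme of the kind described above aims to answer most of those calls from a local approximation instead.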
The Kernel Interaction Trick: Fast Bayesian Discovery of Pairwise Interactions in High Dimensions, Tamara Broderick
Discovering interaction effects on a response of interest is
a fundamental problem faced in biology, medicine, economics, and many
other scientific disciplines. In theory, Bayesian methods for
discovering pairwise interactions enjoy many benefits such as coherent
uncertainty quantification, the ability to incorporate background
knowledge, and desirable shrinkage properties. In practice, however,
Bayesian methods are often computationally intractable for even
moderate-dimensional problems. Our key insight is that many
hierarchical models of practical interest admit a particular Gaussian
process (GP) representation; the GP allows us to capture the posterior
with a vector of O(p) kernel hyper-parameters rather than O(p^2)
interactions and main effects. With the implicit representation, we
can run Markov chain Monte Carlo (MCMC) over model hyper-parameters in
time and memory linear in p per iteration. We focus on
sparsity-inducing models and show on datasets with a variety of
covariate behaviors that our method: (1) reduces runtime by orders of
magnitude over naive applications of MCMC, (2) provides lower Type I
and Type II error relative to state-of-the-art LASSO-based approaches,
and (3) offers improved computational scaling in high dimensions
relative to existing Bayesian and LASSO-based approaches.
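The general idea of handling O(p^2) interaction terms implicitly through a kernel can be illustrated with a textbook identity. The sketch below uses the standard degree-2 polynomial kernel, which is an assumption chosen for illustration and not the kernel constructed in the talk; the function names `explicit_features` and `poly2_kernel` are hypothetical. It shows that an inner product over all main effects, squares, and pairwise interactions can be computed in O(p) time without ever forming the O(p^2) feature vector.

```python
import itertools
import numpy as np

def explicit_features(x):
    """Explicit feature map with O(p^2) entries:
    [1, sqrt(2)*x_i, x_i^2, sqrt(2)*x_i*x_j for i < j]."""
    p = len(x)
    pairs = [
        np.sqrt(2.0) * x[i] * x[j]
        for i, j in itertools.combinations(range(p), 2)
    ]
    return np.concatenate(([1.0], np.sqrt(2.0) * x, x ** 2, pairs))

def poly2_kernel(x, y):
    """Degree-2 polynomial kernel, computed in O(p) time."""
    return (1.0 + x @ y) ** 2

rng = np.random.default_rng(0)
x, y = rng.standard_normal(6), rng.standard_normal(6)

# Both routes give the same inner product; only the cost differs.
slow = explicit_features(x) @ explicit_features(y)
fast = poly2_kernel(x, y)
print(np.isclose(slow, fast))
```

The identity follows from expanding (1 + x·y)^2 term by term; the sqrt(2) factors in the feature map account for the cross terms that appear twice in the expansion.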
Integrating Deep Learning and Probabilistic Programming, Jan-Willem van de Meent
A clear lesson from ongoing advances in deep learning is that large overparameterized models can achieve unsurpassed performance in settings where a sufficiently large amount
of data and computation are available. A much more open question is how we can improve the generalization properties of these models when we have a limited amount of data,
or a limited number of labels. In this talk, I will discuss how we can combine the principles of probabilistic programming with those of deep learning to design models that
incorporate inductive biases that aid generalization. I will provide examples of models that can be trained in an unsupervised or semi-supervised manner to learn structured
representations of interpretable variables of interest. I will also discuss ongoing research to improve scalability of inference and evaluate generalization properties of learned models.
Approximating and Manipulating Probability Distributions with Transport, Justin Solomon
The theory of optimal transport defines a metric on the space of probability distributions that lifts the metric of the underlying geometric domain. The structure of this geometry on the space of probability distributions has several favorable properties for tasks in inference and learning. In this talk, I will introduce recent work applying transport to a variety of computational tasks in learning and statistics, including computation of coresets for efficient and approximate learning, distributionally robust learning in the semi-supervised setting, and overcoming symmetry issues in Bayesian inference. I will also discuss techniques for reducing the computational cost of evaluating and manipulating transport distances in practice. [Joint work with E. Chien, S. Claici, C. Frogner, F. Mirzazadeh, P. Monteiller, and M. Yurochkin]
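The statement that the transport metric "lifts" the metric of the underlying domain can be checked in the simplest case, the real line. The sketch below is an illustration under stated assumptions and is not taken from the talk: it covers only the 1-Wasserstein distance between two one-dimensional empirical distributions with equally many, equally weighted samples, where the optimal coupling matches sorted samples. The function name `wasserstein1_1d` is hypothetical.

```python
import numpy as np

def wasserstein1_1d(a, b):
    """1-Wasserstein distance between two 1-D empirical distributions
    with the same number of equally weighted samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample counts")
    # In 1-D the optimal plan pairs the i-th smallest points of each sample.
    return float(np.mean(np.abs(a - b)))

# Two point masses: the distance between the distributions equals
# the ground distance |2 - 5| between the points themselves.
print(wasserstein1_1d([2.0], [5.0]))

# Translating a sample by c moves it a transport distance of |c|.
sample = np.array([0.0, 1.0, 3.0])
print(wasserstein1_1d(sample, sample + 2.0))
```

Outside this one-dimensional, equal-weight case the distance generally requires solving an optimization problem, which is the computational cost the abstract refers to.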
Probabilistic Programming and Artificial Intelligence, Vikash Mansinghka
Probabilistic programming is an emerging field at the intersection of programming languages, probability theory, and artificial intelligence. This talk will show how to use recently
developed probabilistic programming languages to build systems for robust 3D computer vision, without requiring any labeled training data; for automatic modeling of complex real-world time series;
and for machine-assisted analysis of experimental data in synthetic biology that is too small and messy for standard approaches from machine learning and statistics.
This talk will use these applications to illustrate recent technical innovations in probabilistic programming that formalize and unify modeling approaches from multiple eras of AI, including
generative models, neural networks, symbolic programs, causal Bayesian networks, and hierarchical Bayesian modeling. Specifically, it will present languages in which models are represented using
executable code, and in which inference is programmable using novel constructs for Monte Carlo, optimization-based, and neural inference. It will also present techniques for Bayesian learning of
probabilistic program structure and parameters from real-world data. Finally, this talk will review challenges and research opportunities in the development and use of general-purpose probabilistic
programming languages that are performant enough and flexible enough for real-world AI engineering.