suuuucchhhh a great talk. really clear, thank you.
Loved the talk, helped understand the intuition of HMC. Thanks
Excellent talk; thank you. And yes, to respond to your question at the end, it was that clear.
Great talk, thank you!
Amazing talk.
Are your slides available? perhaps with the lecture transcript for each slide?
Biased inferences: the wide-data regime invites bias
Adding a prior regularizes the system; the math is there to help
Likelihood plus prior: the posterior quantifies our total information
Any statistical question is answered via manipulation of the posterior
Every such question reduces to computing an expectation (an integral)
We do numerical approximation,
since calculating it exactly is hard in high dimensions (D)
To find an expectation, identify where to focus the computation: where is the biggest contribution to the expectation?
It is not the density alone; consider the volume (that the density is integrated over)
In high D there are lots of corners, which is hard
Volume increases exponentially fast
2 competing forces:
1 volume wants to focus on large q, far from the mode
2 density wants to focus on the mode
For a Gaussian (i.e. normal) they balance out in the middle
The region of concentration is the typical set: a surface around the mode
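The volume-vs-density tension can be checked numerically. A minimal sketch (my own illustration, assuming a standard normal target in D dimensions): samples concentrate in a thin shell at radius roughly sqrt(D) from the mode, even though the density is highest at the mode itself.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 100
# Draw samples from a standard normal in D dimensions and measure
# each sample's distance from the mode at the origin.
samples = rng.standard_normal((10000, D))
radii = np.linalg.norm(samples, axis=1)

print(radii.mean())  # close to sqrt(D) = 10: samples sit on a shell, not at the mode
print(radii.std())   # small relative to the radius: the shell (typical set) is thin
```

The mode at radius 0 contributes essentially nothing to expectations; almost all the probability mass lives in that thin shell.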
Markov chain: a way of exploring sets like that
It's a random transition function τ
After each jump the points follow a new distribution
Iterating gives a Markov chain
If we can engineer the Markov chain to preserve our target distribution,
the Markov chain homes in on the typical set (and starts exploring that surface)
In many dimensions every point is far from the typical set
End result: a nice quantification of where the probability really is
To compute any expectation, average the function over the Markov chain history, i.e. Markov chain Monte Carlo (MCMC)
Run long enough and we always converge to the true expectation
(always the right answer)
Q1: how well can we do it?
Q2: how quickly do we converge to the true expectation? If transitions are expensive, as in wide data,
we exhaust computational resources long before we complete the exploration
Partial exploration means bias (missing probability); lots of MCMC algorithms are like that
Metropolis:
1 propose: add some noise
2 decide: accept or reject the proposal (based on where we came from)
If it moves toward the mode, we accept it
If it moves away from the mode, we (usually) reject it
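The two Metropolis steps above fit in a few lines. A minimal random-walk sketch (my own illustration, not the speaker's code), targeting a one-dimensional standard normal:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_target(q):
    return -0.5 * q**2  # standard normal, up to a constant

def metropolis(n_steps, step_size):
    q = 0.0
    chain = np.empty(n_steps)
    n_accept = 0
    for i in range(n_steps):
        # Step 1 (propose): perturb the current point with Gaussian noise.
        proposal = q + step_size * rng.standard_normal()
        # Step 2 (decide): accept with probability min(1, pi(proposal)/pi(q)),
        # so moves toward higher density always pass, moves away usually fail.
        if np.log(rng.uniform()) < log_target(proposal) - log_target(q):
            q = proposal
            n_accept += 1
        chain[i] = q
    return chain, n_accept / n_steps

chain, accept_rate = metropolis(50_000, step_size=2.0)
# Expectations are chain averages: chain.mean() estimates E[q],
# (chain**2).mean() estimates E[q^2].
```

In one dimension this works fine; the notes below explain why the same guess-and-check move breaks down as the dimension grows.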
In many dimensions volume is weird; it doesn't scale intuitively. Outside the typical set there is more volume
The only fix is to shrink the perturbation to a really small neighborhood
Then we won't go anywhere, just tiny transitions
We end up with very inefficient exploration, very poor MCMC
So avoid guess-and-check: the acceptance probability is very small
Use a transition that knows the shape of our surface (how do we stay on the contour?)
This needs automation
How do we extract info about the surface?
Hamiltonian Monte Carlo uses differential geometry
Use a vector field: assign a direction to each point; if the direction is right, there's no more guessing! All new points lead to other points in the same typical set
How: look at the density of the target function
Take the gradient of that function
The gradient is also a vector field
If we follow it, it leads to the mode (not useful)
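To see why the raw gradient is unhelpful, a tiny sketch (my own, assuming a 1-D standard normal): following the gradient of the log density is just gradient ascent, which collapses onto the mode instead of exploring the typical set.

```python
def grad_log_density(q):
    return -q  # gradient of log N(0, 1), up to a constant

q = 5.0
for _ in range(1000):
    # Plain gradient ascent on the log density: each step shrinks q toward 0.
    q = q + 0.1 * grad_log_density(q)

print(q)  # collapses onto the mode at 0 and stays there
```

A sampler that only did this would pile all its points at the mode, which (as the shell example above showed) carries almost none of the probability mass.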
We need to correct the gradient
Differential geometry corrects the gradient automatically
Physics analogy: a planet's orbit and its gravitational field
Without momentum we fall in; transverse motion keeps us from falling
Too much momentum and gravity won't catch us at all
Key: add momenta in the right way
For every parameter q, add a momentum p
to lift the target distribution onto this expanded (phase) space
Find the probabilistic structure π(q, p)
How: by a conditional distribution
(for the momenta)
We end up with a joint distribution over momenta and parameters
I always recover the target distribution:
I can project it down and get rid of the momenta
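The lift-and-project idea can be sketched in a few lines (my own illustration, assuming a 1-D Gaussian target and unit-Gaussian momenta): build π(q, p) = π(p | q) π(q) by attaching a momentum to each sample, then marginalize by simply dropping p — the q-marginal is untouched by construction.

```python
import numpy as np

rng = np.random.default_rng(2)

q = rng.standard_normal(100_000)      # samples from the target pi(q)
p = rng.standard_normal(q.shape[0])   # conditional momenta pi(p | q) = N(0, 1)
joint = np.stack([q, p], axis=1)      # samples from the joint pi(q, p)

q_projected = joint[:, 0]             # project down: discard the momenta
# q_projected is exactly the original q, so the target distribution
# is recovered no matter what we chose for pi(p | q).
```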
Use a symplectic integrator: we can bound its errors, and a correction takes us from the approximate solution back to the exact one
Calculate how accurate the solution is by integrating over all the deviations
There is a trade-off between the cost of the algorithm and the step size
We end up with lower and upper bounds (on the error): x = average acceptance probability,
y = cost
For almost all models the relationship is bounded between 2 lines
Around 0.6–0.8 the curve is nearly flat, near optimal
Choose the step size so that the average acceptance probability lies in 0.6–0.8
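Putting the pieces together, here is a hedged sketch of HMC with a leapfrog (symplectic) integrator on a 1-D standard normal — my own illustration, with an assumed step size and trajectory length, not the speaker's code. On a real model you would tune the step size until the reported acceptance rate falls in the 0.6–0.8 window; on this toy Gaussian the integrator is so accurate that acceptance sits near 1.

```python
import numpy as np

rng = np.random.default_rng(3)

def grad_neg_log_target(q):
    return q  # gradient of the potential U(q) = q^2/2 for N(0, 1)

def leapfrog(q, p, step_size, n_steps):
    # Symplectic leapfrog: half step in p, alternating full steps, half step in p.
    p = p - 0.5 * step_size * grad_neg_log_target(q)
    for _ in range(n_steps - 1):
        q = q + step_size * p
        p = p - step_size * grad_neg_log_target(q)
    q = q + step_size * p
    p = p - 0.5 * step_size * grad_neg_log_target(q)
    return q, p

def hmc(n_samples, step_size=0.2, n_leapfrog=20):
    q = 0.0
    chain = np.empty(n_samples)
    n_accept = 0
    for i in range(n_samples):
        p = rng.standard_normal()  # resample momentum from pi(p | q) = N(0, 1)
        q_new, p_new = leapfrog(q, p, step_size, n_leapfrog)
        # Accept/reject on the change in total energy H = U(q) + K(p);
        # this correction turns the approximate flow back into exact sampling.
        h_old = 0.5 * q**2 + 0.5 * p**2
        h_new = 0.5 * q_new**2 + 0.5 * p_new**2
        if np.log(rng.uniform()) < h_old - h_new:
            q = q_new
            n_accept += 1
        chain[i] = q
    return chain, n_accept / n_samples

chain, accept_rate = hmc(20_000)
```

Each trajectory glides along a level set of H, so proposals land far away yet are still accepted — exactly the guided, non-guessing transition the notes describe.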
Intuition for how to:
1 choose the kinetic energy
2 choose the integration time
3 choose the step size
Fully automated
Decouple the 2 steps of inference:
1 modeling step: we choose prior and likelihood
2 computation step: compute those expectations
If needed, make the step size smaller
If no step size works,
change your model: reimplement it a different way or rethink your priors
Autodiff ensures exact computation of the necessary gradients
1 control statements (if/else)
2 probability density functions (PDF, CDF)
3 linear algebra (addition, multiplication, decompositions)
4 ODEs (non-stiff and stiff)
The space is equipped with a Lie group that gives a flow
The typical set is preserved by the measure-preserving flow
Adiabatic Monte Carlo:
for multimodal distributions