Importance sampling is a Monte Carlo method for evaluating properties of a particular distribution while only having samples generated from a different distribution than the distribution of interest. Its introduction in statistics is generally attributed to a paper by Teun Kloek and Herman K. van Dijk in 1978,[1] but its precursors can be found in statistical physics as early as 1949.[2][3] Importance sampling is also related to umbrella sampling in computational physics. Depending on the application, the term may refer to the process of sampling from this alternative distribution, the process of inference, or both.

Basic theory


Let $X\colon \Omega \to \mathbb{R}$ be a random variable in some probability space $(\Omega, \mathcal{F}, P)$. We wish to estimate the expected value of $X$ under $P$, denoted $\mathbf{E}[X;P]$. If we have statistically independent random samples $x_1, \ldots, x_n$, generated according to $P$, then an empirical estimate of $\mathbf{E}[X;P]$ is just

$$\widehat{\mathbf{E}}_{n}[X;P] = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad \text{where } x_i \sim P,$$

and the precision of this estimate depends on the variance of $X$:

$$\operatorname{var}\big[\widehat{\mathbf{E}}_{n};P\big] = \frac{\operatorname{var}[X;P]}{n}.$$

The basic idea of importance sampling is to sample from a different distribution to lower the variance of the estimation of $\mathbf{E}[X;P]$, or when sampling directly from $P$ is difficult.

This is accomplished by first choosing a random variable $L \geq 0$ such that $\mathbf{E}[L;P] = 1$ and that $L \neq 0$ holds $P$-almost everywhere. With the variable $L$ we define a probability $P^{(L)}$ that satisfies

$$\mathbf{E}[X;P] = \mathbf{E}\!\left[\frac{X}{L};P^{(L)}\right].$$

The variable $X/L$ will thus be sampled under $P^{(L)}$ to estimate $\mathbf{E}[X;P]$ as above, and this estimation is improved when

$$\operatorname{var}\!\left[\frac{X}{L};P^{(L)}\right] < \operatorname{var}[X;P].$$

When $X$ is of constant sign over $\Omega$, the best variable $L$ would clearly be $L^* = \frac{X}{\mathbf{E}[X;P]} \geq 0$, so that $X/L^*$ is the searched constant $\mathbf{E}[X;P]$ and a single sample under $P^{(L^*)}$ suffices to give its value. Unfortunately we cannot take that choice, because $\mathbf{E}[X;P]$ is precisely the value we are looking for! However, this theoretical best case $L^*$ gives us an insight into what importance sampling does: for all $a \in \mathbb{R}$, the density of $P^{(L^*)}$ at $X = a$ can be written as

$$P^{(L^*)}(X \in [a; a+da]) = \frac{1}{\mathbf{E}[X;P]}\, a\, P(X \in [a; a+da]).$$

To the right, $a\,P(X \in [a; a+da])$ is one of the infinitesimal elements that sum up to $\mathbf{E}[X;P]$:

$$\mathbf{E}[X;P] = \int_{-\infty}^{+\infty} a\, P(X \in [a; a+da]);$$

therefore, a good probability change $P^{(L)}$ in importance sampling will redistribute the law of $X$ so that its samples' frequencies are sorted directly according to their weights in $\mathbf{E}[X;P]$. Hence the name "importance sampling."

Importance sampling is often used as a Monte Carlo integrator. When $P$ is the uniform distribution over $\Omega = [0,1]$, the expectation $\mathbf{E}[X;P]$ corresponds to the integral $\int_0^1 X(\omega)\,d\omega$ of the real function $X$.
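
For instance, the following minimal sketch (an illustration added here, not one of the article's own examples; it assumes numpy) integrates $x^4$ over $[0,1]$ both by plain Monte Carlo and by sampling from a Beta(5, 1) proposal. Since the Beta(5, 1) density $5x^4$ is proportional to the integrand, every weighted sample equals the answer, mirroring the zero-variance best case $L^*$ described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def integrand(x):
    return x ** 4            # integral over [0, 1] is exactly 1/5

n = 100_000

# Plain Monte Carlo: sample uniformly on [0, 1].
x = rng.uniform(0.0, 1.0, n)
plain_estimate = integrand(x).mean()

# Importance sampling: the Beta(5, 1) density q(y) = 5 * y**4 is
# proportional to the integrand, so integrand(y) * w is the constant 1/5.
y = rng.beta(5.0, 1.0, n)
w = 1.0 / (5.0 * y ** 4)     # likelihood ratio p(y)/q(y), with p = Uniform(0, 1)
is_estimate = (integrand(y) * w).mean()

print(plain_estimate, is_estimate)   # both ≈ 0.2; the IS estimate has zero variance
```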


Application to probabilistic inference


Such methods are frequently used to estimate posterior densities or expectations in state and/or parameter estimation problems in probabilistic models that are too hard to treat analytically. Examples include Bayesian networks and importance weighted variational autoencoders.[4]
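
As a hedged sketch of this inference use case (the toy model and numbers are assumptions for illustration), self-normalized importance sampling with the prior as proposal can approximate a posterior mean in a conjugate Gaussian model, where the exact answer is available for comparison:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: theta ~ N(0, 1) prior, one observation y | theta ~ N(theta, 1).
# The posterior is N(y/2, 1/2), so the exact posterior mean is y/2.
y_obs = 1.5
n = 200_000

theta = rng.normal(0.0, 1.0, n)          # proposal = prior
log_w = -0.5 * (y_obs - theta) ** 2      # log-likelihood, up to a constant
w = np.exp(log_w - log_w.max())          # stabilize before normalizing
w /= w.sum()                             # self-normalized weights

posterior_mean = np.sum(w * theta)
print(posterior_mean)                    # ≈ y_obs / 2 = 0.75
```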

Application to simulation


Importance sampling is a variance reduction technique that can be used in the Monte Carlo method. The idea behind importance sampling is that certain values of the input random variables in a simulation have more impact on the parameter being estimated than others. If these "important" values are emphasized by sampling more frequently, then the estimator variance can be reduced. Hence, the basic methodology in importance sampling is to choose a distribution which "encourages" the important values. This use of "biased" distributions will result in a biased estimator if it is applied directly in the simulation. However, the simulation outputs are weighted to correct for the use of the biased distribution, and this ensures that the new importance sampling estimator is unbiased. The weight is given by the likelihood ratio, that is, the Radon–Nikodym derivative of the true underlying distribution with respect to the biased simulation distribution.

The fundamental issue in implementing importance sampling simulation is the choice of the biased distribution which encourages the important regions of the input variables. Choosing or designing a good biased distribution is the "art" of importance sampling. The rewards for a good distribution can be huge run-time savings; the penalty for a bad distribution can be longer run times than for a general Monte Carlo simulation without importance sampling.

Consider $X$ to be the sample and $\frac{f(X)}{g(X)}$ to be the likelihood ratio, where $f$ is the probability density (mass) function of the desired distribution and $g$ is the probability density (mass) function of the biased/proposal/sample distribution. Then the problem can be characterized by choosing the sample distribution $g$ that minimizes the variance of the scaled sample:

$$g^* = \min_{g} \operatorname{var}_g\!\left(X \frac{f(X)}{g(X)}\right).$$

It can be shown that the following distribution minimizes the above variance:[5]

$$g^*(X) = \frac{|X| f(X)}{\int |x| f(x)\, dx}.$$

Notice that when $X \geq 0$, this variance becomes 0.
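
A quick numerical check of this zero-variance property, as a sketch over an assumed three-point discrete distribution: sampling from $g^*$ makes every scaled sample equal to the estimand, so the variance is exactly zero.

```python
import numpy as np

# Target f over x in {1, 2, 3}; the estimand is E_f[X] = 1.7.
x = np.array([1.0, 2.0, 3.0])
f = np.array([0.5, 0.3, 0.2])

g_star = np.abs(x) * f / np.sum(np.abs(x) * f)   # optimal biasing density

scaled = x * f / g_star     # the scaled sample X f(X)/g*(X)
print(scaled)               # [1.7, 1.7, 1.7]: constant, hence zero variance
```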

Mathematical approach


Consider estimating by simulation the probability $p_t$ of an event $\{X \geq t\}$, where $X$ is a random variable with cumulative distribution function $F$ and probability density function $f(x) = F'(x)$, where prime denotes derivative. A $K$-length independent and identically distributed (i.i.d.) sequence $X_i$ is generated from the distribution $F$, and the number $k_t$ of random variables that lie above the threshold $t$ is counted. The random variable $k_t$ is characterized by the binomial distribution

$$P(k_t = k) = \binom{K}{k} p_t^k (1 - p_t)^{K-k}, \qquad k = 0, 1, \ldots, K.$$

One can show that $\operatorname{E}\!\left[\frac{k_t}{K}\right] = p_t$ and $\operatorname{var}\!\left[\frac{k_t}{K}\right] = \frac{p_t(1-p_t)}{K}$, so in the limit $K \to \infty$ we are able to obtain $p_t$. Note that the variance is low if $p_t \approx 1$. Importance sampling is concerned with the determination and use of an alternate density function $f_*$ (for $X$), usually referred to as a biasing density, for the simulation experiment. This density allows the event $\{X \geq t\}$ to occur more frequently, so the required sequence length $K$ gets smaller for a given estimator variance. Alternatively, for a given $K$, use of the biasing density results in a variance smaller than that of the conventional Monte Carlo estimate. From the definition of $p_t$, we can introduce $f_*$ as below.

$$p_t = \operatorname{E}[1(X \geq t)] = \int 1(x \geq t) \frac{f(x)}{f_*(x)} f_*(x)\, dx = \operatorname{E}_*[1(X \geq t) W(X)]$$

where

$$W(\cdot) \equiv \frac{f(\cdot)}{f_*(\cdot)}$$

is a likelihood ratio and is referred to as the weighting function. The last equality in the above equation motivates the estimator

$$\hat{p}_t = \frac{1}{K} \sum_{i=1}^{K} 1(X_i \geq t) W(X_i), \qquad X_i \sim f_*.$$

This is the importance sampling estimator of $p_t$ and is unbiased. That is, the estimation procedure is to generate i.i.d. samples from $f_*$ and for each sample which exceeds $t$, the estimate is incremented by the weight $W$ evaluated at the sample value. The results are averaged over $K$ trials. The variance of the importance sampling estimator is easily shown to be

$$\operatorname{var}_* \hat{p}_t = \frac{1}{K}\left(\operatorname{E}_*\!\left[1(X \geq t)\, W^2(X)\right] - p_t^2\right).$$

The importance sampling problem now focuses on finding a biasing density $f_*$ such that the variance of the importance sampling estimator is less than the variance of the general Monte Carlo estimate. A biasing density function that minimizes the variance (and, under certain conditions, reduces it to zero) is called an optimal biasing density function.
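
As an illustrative sketch (the target, threshold, and biasing density are assumptions chosen for the example, assuming numpy), the estimator $\hat{p}_t$ can recover a tail probability of order $10^{-9}$ from $10^5$ samples, where plain Monte Carlo would almost surely return zero:

```python
import numpy as np

rng = np.random.default_rng(2)

# Estimate p_t = P(X >= t) for X ~ Exp(1) with t = 20, so p_t = e^{-20}.
# Bias with another exponential, f_*(x) = lam * exp(-lam * x) with lam = 1/t,
# whose mean sits exactly at the threshold.
t, n = 20.0, 100_000
lam = 1.0 / t

x = rng.exponential(1.0 / lam, n)            # samples from f_*
w = np.exp(-x) / (lam * np.exp(-lam * x))    # weighting function f(x)/f_*(x)
p_hat = np.mean((x >= t) * w)

print(p_hat, np.exp(-t))                     # estimate vs exact 2.06e-9
```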

Conventional biasing methods


Although there are many kinds of biasing methods, the following two methods are most widely used in the applications of importance sampling.

Scaling


Shifting probability mass into the event region $\{X \geq t\}$ by positive scaling of the random variable $X$ with a number greater than unity has the effect of increasing the variance (mean also) of the density function. This results in a heavier tail of the density, leading to an increase in the event probability. Scaling is probably one of the earliest biasing methods known and has been extensively used in practice. It is simple to implement and usually provides conservative simulation gains as compared to other methods.

In importance sampling by scaling, the simulation density is chosen as the density function of the scaled random variable $aX$, where usually $a > 1$ for tail probability estimation. By transformation,

$$f_*(x) = \frac{1}{a} f\!\left(\frac{x}{a}\right), \qquad a \neq 0,$$

and the weighting function is

$$W(x) = a\, \frac{f(x)}{f(x/a)}.$$
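
A minimal sketch of scaling for a standard normal tail probability (the scale factor $a = 4$ and threshold $t = 4$ are illustrative assumptions; SciPy supplies the normal density):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

# P(X >= t) for X ~ N(0, 1): simulate the scaled variable Y = a*X, whose
# density is f_*(x) = f(x/a)/a, and apply the weighting function above.
t, a, n = 4.0, 4.0, 100_000

y = a * rng.normal(0.0, 1.0, n)          # samples from the scaled density
w = a * norm.pdf(y) / norm.pdf(y / a)    # W(x) = a f(x) / f(x/a)
p_hat = np.mean((y >= t) * w)

print(p_hat, 1.0 - norm.cdf(t))          # estimate vs exact 3.17e-5
```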

While scaling shifts probability mass into the desired event region, it also pushes mass into the complementary region $X < t$, which is undesirable. If $X$ is a sum of $n$ random variables, the spreading of mass takes place in an $n$-dimensional space. The consequence of this is a decreasing importance sampling gain for increasing $n$, and is called the dimensionality effect. A modern version of importance sampling by scaling is so-called sigma-scaled sampling (SSS), which runs multiple Monte Carlo (MC) analyses with different scaling factors. In contrast to many other high-yield estimation methods (like worst-case distances, WCD), SSS does not suffer much from the dimensionality problem. Addressing multiple MC outputs also causes no degradation in efficiency. On the other hand, like WCD, SSS is only designed for Gaussian statistical variables, and in contrast to WCD, the SSS method is not designed to provide accurate statistical corners. Another SSS disadvantage is that the MC runs with large scale factors may become difficult, e.g. due to model and simulator convergence problems. In addition, in SSS we face a strong bias-variance trade-off: using large scale factors, we obtain quite stable yield results, but the larger the scale factors, the larger the bias error. If the advantages of SSS do not matter much in the application of interest, then other methods are often more efficient.

Translation


Another simple and effective biasing technique employs translation of the density function (and hence random variable) to place much of its probability mass in the rare event region. Translation does not suffer from a dimensionality effect and has been successfully used in several applications relating to simulation of digital communication systems. It often provides better simulation gains than scaling. In biasing by translation, the simulation density is given by

$$f_*(x) = f(x - c), \qquad c > 0,$$

where $c$ is the amount of shift and is to be chosen to minimize the variance of the importance sampling estimator.
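
A sketch of biasing by translation for a standard normal tail (setting the shift $c$ equal to the threshold $t$ is a common heuristic assumed here, not a prescription from the text):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

# P(X >= t) for X ~ N(0, 1) with the translated density f_*(x) = f(x - c):
# about half the samples land in the rare-event region {x >= t}.
t, n = 4.0, 100_000
c = t                                   # shift the mode onto the threshold

x = rng.normal(c, 1.0, n)               # samples from f_*(x) = f(x - c)
w = norm.pdf(x) / norm.pdf(x - c)       # weighting function f(x)/f_*(x)
p_hat = np.mean((x >= t) * w)

print(p_hat, 1.0 - norm.cdf(t))         # estimate vs exact 3.17e-5
```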

Effects of system complexity


The fundamental problem with importance sampling is that designing good biased distributions becomes more complicated as the system complexity increases. In this context, complex systems are systems with long memory, since complex processing of a few inputs is much easier to handle. This dimensionality or memory can cause problems in three ways:

  • long memory (severe intersymbol interference (ISI)),
  • unknown memory (Viterbi decoders),
  • possibly infinite memory (adaptive equalizers).

In principle, the importance sampling ideas remain the same in these situations, but the design becomes much harder. A successful approach to combat this problem is essentially breaking down a simulation into several smaller, more sharply defined subproblems. Then importance sampling strategies are used to target each of the simpler subproblems. Examples of techniques to break the simulation down are conditioning and error-event simulation (EES) and regenerative simulation.

Evaluation of importance sampling


In order to identify successful importance sampling techniques, it is useful to be able to quantify the run-time savings due to the use of the importance sampling approach. The performance measure commonly used is $\sigma^2_{MC} / \sigma^2_{IS}$, and this can be interpreted as the speed-up factor by which the importance sampling estimator achieves the same precision as the MC estimator. This has to be computed empirically, since the estimator variances are not likely to be analytically tractable when their mean is intractable. Other useful concepts in quantifying an importance sampling estimator are the variance bounds and the notion of asymptotic efficiency. One related measure is the so-called effective sample size (ESS).[6]
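
One widely used ESS rule of thumb (a common diagnostic; [6] surveys discrepancy-based alternatives) is $(\sum_i w_i)^2 / \sum_i w_i^2$, which reads as the number of unweighted samples the weighted set is roughly worth:

```python
import numpy as np

def effective_sample_size(weights):
    """(sum w)^2 / sum(w^2): roughly how many unweighted samples
    the weighted sample set is worth."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

print(effective_sample_size([1.0, 1.0, 1.0, 1.0]))  # 4.0: uniform weights
print(effective_sample_size([1.0, 0.0, 0.0, 0.0]))  # 1.0: one sample dominates
```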

Variance cost function


Variance is not the only possible cost function for a simulation, and other cost functions, such as the mean absolute deviation, are used in various statistical applications. Nevertheless, the variance is the primary cost function addressed in the literature, probably due to the use of variances in confidence intervals and in the performance measure $\sigma^2_{MC} / \sigma^2_{IS}$.

An associated issue is the fact that the ratio $\sigma^2_{MC} / \sigma^2_{IS}$ overestimates the run-time savings due to importance sampling, since it does not include the extra computing time required to compute the weight function. Hence, some people evaluate the net run-time improvement by various means. Perhaps a more serious overhead to importance sampling is the time taken to devise and program the technique and analytically derive the desired weight function.

Multiple and adaptive importance sampling


When different proposal distributions $q_n(x)$, $n = 1, \ldots, N$, are jointly used for drawing the samples $x_1, \ldots, x_N$, different proper weighting functions can be employed (e.g., see[7][8][9][10]). In an adaptive setting, the proposal distributions $q_{n,t}(x)$, $n = 1, \ldots, N$, $t = 1, \ldots, T$, are updated at each iteration $t$ of the adaptive importance sampling algorithm. Hence, since a population of proposal densities is used, several suitable combinations of sampling and weighting schemes can be employed.[11][12][13][14][15][16][17]
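
As a sketch of one such proper weighting scheme (the target and proposals here are assumptions for illustration), the so-called deterministic mixture or balance heuristic discussed in [9][10] weights each sample by the target density over the mixture of all proposals:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)

def target_pdf(x):
    return norm.pdf(x, loc=3.0)          # target f = N(3, 1), so E_f[X] = 3

locs = [0.0, 2.0, 4.0]                   # proposals q_n = N(loc_n, 1)
n_per = 10_000

xs, ws = [], []
for loc in locs:
    x = rng.normal(loc, 1.0, n_per)
    # Deterministic-mixture weight: target over the average of all proposals.
    mixture = np.mean([norm.pdf(x, loc=l) for l in locs], axis=0)
    xs.append(x)
    ws.append(target_pdf(x) / mixture)

x = np.concatenate(xs)
w = np.concatenate(ws)
print(np.mean(w * x))                    # ≈ 3.0, the target mean
```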


Notes

  1. ^ Kloek, T.; van Dijk, H. K. (1978). "Bayesian Estimates of Equation System Parameters: An Application of Integration by Monte Carlo" (PDF). Econometrica. 46 (1): 1–19. doi:10.2307/1913641. JSTOR 1913641.
  2. ^ Goertzel, G. (1949). "Quota Sampling and Importance Functions in Stochastic Solution of Particle Problems". Technical Report ORNL-434, Oak Ridge National Laboratory. Aecd; 2793. hdl:2027/mdp.39015086443671.
  3. ^ Kahn, H.; Harris, T. E. (1949). "Estimation of Particle Transmission by Random Sampling". Monte Carlo Method. Applied Mathematics Series. 12. National Bureau of Standards: 27–30.
  4. ^ Burda, Yuri; Grosse, Roger; Salakhutdinov, Ruslan (2016). "Importance Weighted Autoencoders". Proceedings of the 4th International Conference on Learning Representations (ICLR). arXiv:1509.00519.
  5. ^ Rubinstein, R. Y.; Kroese, D. P. (2011). Simulation and the Monte Carlo Method. Vol. 707. John Wiley & Sons.
  6. ^ Martino, Luca; Elvira, Víctor; Louzada, Francisco (2017). "Effective sample size for importance sampling based on discrepancy measures". Signal Processing. 131: 386–401. arXiv:1602.03572. doi:10.1016/j.sigpro.2016.08.025. S2CID 26317735.
  7. ^ Veach, Eric; Guibas, Leonidas J. (1995). "Optimally combining sampling techniques for Monte Carlo rendering". Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques – SIGGRAPH '95. New York, NY, USA: ACM. pp. 419–428. CiteSeerX 10.1.1.127.8105. doi:10.1145/218380.218498. ISBN 978-0-89791-701-8. S2CID 207194026.
  8. ^ Owen, Art; Zhou, Yi (2000). "Safe and Effective Importance Sampling". Journal of the American Statistical Association. 95 (449): 135–143. CiteSeerX 10.1.1.36.4536. doi:10.1080/01621459.2000.10473909. ISSN 0162-1459. S2CID 119761472.
  9. ^ Elvira, V.; Martino, L.; Luengo, D.; Bugallo, M. F. (2015). "Efficient Multiple Importance Sampling Estimators". IEEE Signal Processing Letters. 22 (10): 1757–1761. arXiv:1505.05391. Bibcode:2015ISPL...22.1757E. doi:10.1109/LSP.2015.2432078. ISSN 1070-9908. S2CID 14504598.
  10. ^ Elvira, Víctor; Martino, Luca; Luengo, David; Bugallo, Mónica F. (2017). "Improving population Monte Carlo: Alternative weighting and resampling schemes". Signal Processing. 131: 77–91. arXiv:1607.02758. doi:10.1016/j.sigpro.2016.07.012. S2CID 205171823.
  11. ^ Cappé, O.; Guillin, A.; Marin, J. M.; Robert, C. P. (2004). "Population Monte Carlo". Journal of Computational and Graphical Statistics. 13 (4): 907–929. doi:10.1198/106186004X12803. ISSN 1061-8600. S2CID 119690181.
  12. ^ Martino, L.; Elvira, V.; Luengo, D.; Corander, J. (2017). "Layered adaptive importance sampling". Statistics and Computing. 27 (3): 599–623. arXiv:1505.04732. doi:10.1007/s11222-016-9642-5. ISSN 0960-3174. S2CID 2508031.
  13. ^ Cappé, Olivier; Douc, Randal; Guillin, Arnaud; Marin, Jean-Michel; Robert, Christian P. (2008). "Adaptive importance sampling in general mixture classes". Statistics and Computing. 18 (4): 447–459. arXiv:0710.4242. doi:10.1007/s11222-008-9059-x. ISSN 0960-3174. S2CID 483916.
  14. ^ Cornuet, Jean-Marie; Marin, Jean-Michel; Mira, Antonietta; Robert, Christian P. (2012). "Adaptive Multiple Importance Sampling". Scandinavian Journal of Statistics. 39 (4): 798–812. arXiv:0907.1254. doi:10.1111/j.1467-9469.2011.00756.x. ISSN 1467-9469. S2CID 17191248.
  15. ^ Martino, L.; Elvira, V.; Luengo, D.; Corander, J. (2015). "An Adaptive Population Importance Sampler: Learning From Uncertainty". IEEE Transactions on Signal Processing. 63 (16): 4422–4437. Bibcode:2015ITSP...63.4422M. CiteSeerX 10.1.1.464.9395. doi:10.1109/TSP.2015.2440215. ISSN 1053-587X. S2CID 17017431.
  16. ^ Bugallo, Mónica F.; Martino, Luca; Corander, Jukka (2015). "Adaptive importance sampling in signal processing". Digital Signal Processing. Special Issue in Honour of William J. (Bill) Fitzgerald. 47: 36–49. doi:10.1016/j.dsp.2015.05.014.
  17. ^ Bugallo, M. F.; Elvira, V.; Martino, L.; Luengo, D.; Miguez, J.; Djuric, P. M. (2017). "Adaptive Importance Sampling: The past, the present, and the future". IEEE Signal Processing Magazine. 34 (4): 60–79. Bibcode:2017ISPM...34...60B. doi:10.1109/msp.2017.2699226. ISSN 1053-5888. S2CID 5619054.
