不确定性量化方法学习-概率分布

Binary Variables
Multinomial Variables
The Gaussian Distribution
The Exponential Family

Binary Variables

Coin flipping: heads=1, tails=0 $$p(x=1|\mu) = \mu$$

Bernoulli Distribution

One coin flipping $$Bern(x|\mu) = \mu^x(1-\mu)^{1-x}$$

Bernoulli Family

来自瑞士巴塞尔的商人与学者家族

雅各布·伯努利（1654–1705），在概率方面的贡献突出，以伯努利分布而闻名。
约翰·伯努利（1667–1748），雅各布的弟弟，以最速降线而闻名，教授过欧拉。变分法开创者。把洛必达法则卖给洛必达。
丹尼尔·伯努利（1700–1782），约翰的小儿子，在流体力学方面贡献突出，以伯努利定律而闻名。

Probability Density Function

For discrete distributions, the pdf is also known as the probability mass function (pmf). $$ f(x | p)= \begin{cases}1-p, & x=0 \newline p, & x=1 \end{cases} $$

Cumulative Distribution Function

$$ F(x | p)= \begin{cases}1-p, & x=0 \newline p, & x=1 \end{cases} $$

Descriptive Statistics

$$ \begin{aligned} \mathbb{E}[x] &=\mu \newline \operatorname{var}[x] &=\mu(1-\mu) \end{aligned} $$

Examples

The Bernoulli distribution is a special case of the binomial distribution, where N = 1.

1p = 0.8;
2x = 0:1;
3y = binopdf(0:1,1,p);
4bar(x,y,1)
5xlabel('Observation')
6ylabel('Probability')

Binomial Distribution

N coin flips: $p(m heads|N,\mu)$

$$ \operatorname{Bin}(m \mid N, \mu)=\left(\begin{array}{l} N \newline m \end{array}\right) \mu^{m}(1-\mu)^{N-m} $$

Probability Density Function

For discrete distributions, the pdf is also known as the probability mass function (pmf). $$ f(x \mid N, p)=\left(\begin{array}{c} N \newline x \end{array}\right) p^{x}(1-p)^{N-x} \quad ; \quad x=0,1,2, \ldots, N $$

Cumulative Distribution Function

$$ F(x \mid N, p)=\sum_{i=0}^{x}\left(\begin{array}{c} N \newline i \end{array}\right) p^{i}(1-p)^{N-i} \quad ; \quad x=0,1,2, \ldots, N $$

Descriptive Statistics

$$ \begin{aligned} \mathbb{E}[m] &=N\mu \newline \operatorname{var}[m] &=N\mu(1-\mu) \end{aligned} $$

Examples

The Bernoulli distribution is a special case of the binomial distribution, where N = 1.

1x = 0:10;
2y = binopdf(x,10,0.25);
3bar(x,y,1)
4xlabel('Observation')
5ylabel('Probability')

MLE(maximum likelihood estimator) for binomial

max $\ln p(\mathcal{D}|\mu)$ The MLE is just the sample mean: $\mu_{ML} = 1/N \sum_{n=1}^{N}x_n$

Over-fitting problem

Underestimate the variance

Bayes treatment of Binary Variables

the beta distribution

$$ \operatorname{Beta}(\mu \mid a, b)=\frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \mu^{a-1}(1-\mu)^{b-1} $$

Gamma function is defined by

Gamma function

$$ \Gamma(z)=\int_{0}^{\infty} x^{z-1} e^{-x} d x $$ It is known as the Euler integral of the second kind.

Given that $\Gamma(1)=1$ and $\Gamma(n+1)=n \Gamma(n)$, $\Gamma(n)=1 \cdot 2 \cdot 3 \cdots(n-1)=(n-1) !$

we can use beta function here as well

Beta function

$$ \mathrm{B}(x, y)=\int_{0}^{1} t^{x-1}(1-t)^{y-1} d t $$

The beta function is also called the Euler integral of the first kind.

It is symmetric,$B(x,y) = B(y,x)$
It is related to Gamma function $B(x,y) =\Gamma(x)\Gamma(y)/\Gamma(x+y))$

then we can rewrite the Beta distribution with Beta function

$$ \operatorname{Beta}(\mu \mid a, b)=\frac{1}{B(x,y)} \mu^{a-1}(1-\mu)^{b-1} $$

The Beta distribution provides the conjugate prior for the Bernoulli distribution. ???

Descriptive Statistics

$$ \begin{aligned} \mathbb{E}[\mu] &=\frac{a}{a+b}\newline \operatorname{var}[\mu] &=\frac{ab}{(a+b)^2(a+b+1)} \end{aligned} $$

hyperparameters

a and b are also called hyperparameters.

 1X = 0:.01:1;
 2y1 = betapdf(X,0.1,0.1);
 3y2 = betapdf(X,1,1);
 4y3 = betapdf(X,2,3);
 5y4 = betapdf(X,8,4);
 6
 7figure
 8plot(X,y1,'Color','r','LineWidth',2)
 9hold on
10plot(X,y2,'LineStyle','-.','Color','b','LineWidth',2)
11plot(X,y3,'LineStyle',':','Color','g','LineWidth',2)
12plot(X,y4,'LineStyle',':','Color','k','LineWidth',2)
13legend({'a = b = 0.1','a = b = 1','a =2,b = 3','a =8,b = 4'},'Location','NorthEast');
14hold off

Prior·Likelihood = Posterior

Posterior =

$$ \begin{aligned} p\left(\mu \mid a_{0}, b_{0}, \mathcal{D}\right) & \propto p(\mathcal{D} \mid \mu) p\left(\mu \mid a_{0}, b_{0}\right) \newline &=\left(\prod_{n=1}^{N} \mu^{x_{n}}(1-\mu)^{1-x_{n}}\right) \operatorname{Beta}\left(\mu \mid a_{0}, b_{0}\right) \newline & \propto \mu^{m+a_{0}-1}(1-\mu)^{(N-m)+b_{0}-1} \newline & \propto \operatorname{Beta}\left(\mu \mid a_{N}, b_{N}\right) \end{aligned} $$

where $a_N = a_0 +m$, $b_N = b_0 +(N-m)$

 1X = 0:.01:1;
 2y1 = betapdf(X,2,2);
 3y2 = betapdf(X,3,2);
 4y3 = 1/1 * X;
 5y4 = y1.*y3;
 6
 7figure
 8subplot(2,2,1)
 9plot(X,y1,'Color','r','LineWidth',2);title('prior,a = 2, b = 2');ylim([0,2]);
10subplot(2,2,2)
11plot(X,y2,'LineStyle','-.','Color','b','LineWidth',2);title('Beta a = 3, b = 2');ylim([0,2]);
12subplot(2,2,3)
13plot(X,y3,'LineStyle',':','Color','g','LineWidth',2);title('likelihood function');ylim([0,2]);
14subplot(2,2,4)
15plot(X,y4,'LineStyle','--','Color','k','LineWidth',2);title('posterior');ylim([0,2]);

不确定性量化方法学习-概率分布

Table Of Contents

Binary Variables

Bernoulli Distribution

Bernoulli Family

Probability Density Function

Cumulative Distribution Function

Descriptive Statistics

Examples

Binomial Distribution

Probability Density Function

Cumulative Distribution Function

Descriptive Statistics

Examples

MLE(maximum likelihood estimator) for binomial

Over-fitting problem

Bayes treatment of Binary Variables

the beta distribution

Gamma function

Beta function

Descriptive Statistics

hyperparameters

Prior·Likelihood = Posterior

Mulitinomial Variables

1-of-K scheme

600 Words|This article has been read times

不确定性量化方法学习-概率分布

Table Of Contents

Binary Variables

Bernoulli Distribution

Bernoulli Family

Probability Density Function

Cumulative Distribution Function

Descriptive Statistics

Examples

Binomial Distribution

Probability Density Function

Cumulative Distribution Function

Descriptive Statistics

Examples

MLE(maximum likelihood estimator) for binomial

Over-fitting problem

Bayes treatment of Binary Variables

the beta distribution

Gamma function

Beta function

Descriptive Statistics

hyperparameters

Prior·Likelihood = Posterior

Mulitinomial Variables

1-of-K scheme

Related Posts

600 Words|This article has been read times