@daixuan1996 2015-01-03T08:25:09.000000Z

Probability and Statistics Notes CH4

Continuous Random Variables and Probability Distributions



Continuous random variables and probability density functions(概率密度函数)
- Continuous random variables(连续随机变量)
  - A random variable X is said to be continuous if its set of possible values is an entire interval of numbers; that is, if for some A < B, any number x between A and B is possible.
- Probability distributions for continuous variables
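  - In a probability histogram, the area of each rectangle equals the probability of the corresponding interval, so the total area of all rectangles is 1.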
  - Let X be a continuous rv. Then a probability distribution or probability density function (pdf, 概率密度函数) of X is a function f(x) such that for any two numbers a and b with $a \le b$,
    $P(a \le X \le b) = \int_a^b f(x)\,dx$
  - That is, the probability that X takes on a value in the interval [a,b] is the area under the graph of the density function, as illustrated in the figure below:
  - The graph of f(x) is often referred to as the density curve(密度曲线).
  - For f(x) to be a legitimate pdf, it must satisfy the following two conditions:
    1. $f(x) \ge 0$ for all x
    2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$, i.e., the area under the entire graph of f(x) is 1
  - A continuous rv X is said to have a uniform distribution(均匀分布) on the interval [A,B] if the pdf of X is
    $f(x;A,B) = \begin{cases} \frac{1}{B-A} & A \le x \le B \\ 0 & \text{otherwise} \end{cases}$
  - If X is a continuous rv, then for any number c, $P(X=c)=0$. Furthermore, for any two numbers a and b with a < b, $P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b)$.
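The two defining facts above (areas under f give probabilities, and a single point has probability 0) can be checked numerically. A minimal Python sketch, assuming a simple midpoint-rule integrator; `uniform_pdf`, `integrate` and the choice of a uniform distribution on [0, 10] are illustrative assumptions, not part of the notes:

```python
def uniform_pdf(x: float, A: float, B: float) -> float:
    """Density of the uniform distribution on [A, B]."""
    return 1.0 / (B - A) if A <= x <= B else 0.0


def integrate(f, lo: float, hi: float, n: int = 100_000) -> float:
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


def pdf(x: float) -> float:
    # Hypothetical example: X uniform on [0, 10].
    return uniform_pdf(x, 0.0, 10.0)


total_area = integrate(pdf, -5.0, 15.0)  # area under the whole density curve
p_2_to_5 = integrate(pdf, 2.0, 5.0)      # P(2 <= X <= 5), exactly 3/10
p_point = integrate(pdf, 3.0, 3.0)       # P(X = 3): an interval of width 0
print(total_area, p_2_to_5, p_point)
```

Shrinking the interval to the single point 3 gives 0, consistent with P(X = c) = 0; in practice a library quadrature routine would replace the hand-written midpoint rule.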
Cumulative Distribution Functions and Expected Values
- The cumulative distribution function F(x) for a continuous rv X is defined for every number x by
  $F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy$
  - For each x, F(x) is the area under the density curve to the left of x.
- Using F(x) to Compute Probabilities
  - Let X be a continuous rv with cdf F(x). Then for any number a, $P(X>a) = 1-F(a)$ and for any two numbers a and b with a < b, $P(a\le X \le b) = F(b) - F(a)$
- Obtaining f(x) from F(x)
  - If X is a continuous rv with pdf f(x) and cdf F(x), then at every x at which the derivative(导数) F'(x) exists, $F'(x) = f(x)$.
- Percentiles(百分位) of a Continuous Distribution
  - Let p be a number between 0 and 1. The (100p)th percentile of the distribution of a continuous rv X, denoted by $\eta(p)$, is defined by
    $p = F(\eta(p)) = \int_{-\infty}^{\eta(p)} f(y)\,dy$
  - The median of a continuous distribution, denoted by $\tilde{\mu}$, is the 50th percentile, so $\tilde{\mu}$ satisfies $0.5 = F(\tilde{\mu})$. That is, half the area under the density curve is to the left of $\tilde{\mu}$ and half is to the right of $\tilde{\mu}$.
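Since F is continuous and nondecreasing, the equation p = F(η(p)) can be solved numerically by bisection. A sketch assuming a hypothetical pdf f(x) = 3x² on [0, 1] (so F(x) = x³ there, and η(p) = p^(1/3) exactly); the function names are illustrative:

```python
def cdf(x: float) -> float:
    """cdf of the hypothetical pdf f(x) = 3x^2 on [0, 1]."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x ** 3


def percentile(F, p: float, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Solve F(eta) = p by bisection.

    Assumes F is continuous and nondecreasing with F(lo) <= p <= F(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < p:
            lo = mid  # eta(p) lies to the right of mid
        else:
            hi = mid  # eta(p) lies at or to the left of mid
    return 0.5 * (lo + hi)


median = percentile(cdf, 0.5, 0.0, 1.0)  # exact value: 0.5 ** (1/3)
p90 = percentile(cdf, 0.9, 0.0, 1.0)     # exact value: 0.9 ** (1/3)
print(median, p90)
```

Bisection is chosen here only because it needs nothing beyond monotonicity of F; a distribution with a closed-form inverse cdf would not need it.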
- Expected Values for Continuous Random Variables
  - The expected or mean value of a continuous rv X with pdf f(x) is $\mu_X = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
  - If X is a continuous rv with pdf f(x) and h(X) is any function of X, then $E[h(X)] = \mu_{h(X)} = \int_{-\infty}^{\infty} h(x) f(x)\,dx$
- The Variance of a Continuous Random Variable
  - The variance of a continuous random variable X with pdf f(x) and mean value μ is
    $\sigma_X^2 = V(X) = \int_{-\infty}^{+\infty} (x-\mu)^2 f(x)\,dx = E[(X-\mu)^2]$
  - As in the discrete case, the variance can be computed with the shortcut formula $V(X) = E(X^2) - [E(X)]^2$.
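The integrals defining E(X) and V(X) can likewise be approximated numerically. Take the hypothetical pdf f(x) = 3x² on [0, 1], for which E(X) = 3/4, E(X²) = 3/5 and V(X) = 3/5 - 9/16 = 3/80. A midpoint-rule sketch (helper names are illustrative) confirms that the definition and the shortcut formula agree:

```python
def pdf(x: float) -> float:
    """Hypothetical example pdf: f(x) = 3x^2 on [0, 1], 0 elsewhere."""
    return 3.0 * x * x if 0.0 <= x <= 1.0 else 0.0


def integrate(f, lo: float, hi: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


mean = integrate(lambda x: x * pdf(x), 0.0, 1.0)               # E(X)
second_moment = integrate(lambda x: x * x * pdf(x), 0.0, 1.0)  # E(X^2)
var_definition = integrate(lambda x: (x - mean) ** 2 * pdf(x), 0.0, 1.0)
var_shortcut = second_moment - mean ** 2                       # E(X^2) - [E(X)]^2
print(mean, var_definition, var_shortcut)
```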
The Normal Distribution(正态分布) ★☆
- Many numerical populations have distributions that can be fit very closely by an appropriate normal curve.
- A continuous rv X is said to have a normal distribution(正态分布) with parameters $\mu$ and $\sigma$ (or $\mu$ and $\sigma^2$), where $-\infty < \mu < +\infty$ and $0 < \sigma$, if the pdf of X is
  $f(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\sigma} e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < \infty$
  - The statement that X is normally distributed with parameters $\mu$ and $\sigma^2$ is often abbreviated $X \sim N(\mu, \sigma^2)$.
  - The cdf of the normal distribution is
    $F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt$
  - To compute the probability that X lies in [a, b]:
    $P(a \le X \le b) = \int_a^b \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt$
- The standard normal distribution(标准正态分布)
  - A random variable that has a normal distribution with a mean of zero and a standard deviation of one is said to have a standard normal probability distribution ($\mu = 0, \sigma = 1$).
  - The density function of standard normal distribution is
    $f(z;0,1) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < \infty$
  - The corresponding distribution function is
    $\Phi(z) = \int_{-\infty}^{z} f(t;0,1)\,dt = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt$
  - Properties of the standard normal distribution (Z denotes a standard normal rv)
    1. $\Phi(-z) = 1 - \Phi(z)$
    2. The density function $\varphi(z) = f(z;0,1)$ attains its maximum value $\frac{1}{\sqrt{2\pi}}$ at z = 0
    3. $\Phi(0) = 0.5$
    4. Its mean is $\mu = 0$ and its variance is $\sigma^2 = 1$
    5. For $z \ge 0$: $P(|Z| \le z) = 2\Phi(z) - 1$ and $P(|Z| \ge z) = 2[1 - \Phi(z)]$
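The identity Φ(z) = ½[1 + erf(z/√2)] (a standard fact, not stated in the notes) lets Python's `math.erf` evaluate Φ without a table. A sketch that spot-checks properties 1, 2, 3 and 5 and cross-checks against `statistics.NormalDist` from the standard library; the names `phi` and `Phi` are illustrative:

```python
import math
from statistics import NormalDist


def phi(z: float) -> float:
    """Standard normal density f(z; 0, 1)."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def Phi(z: float) -> float:
    """Standard normal cdf, written with the error function math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for z in (0.5, 1.0, 1.96):
    # Property 1: Phi(-z) = 1 - Phi(z)
    assert abs(Phi(-z) - (1.0 - Phi(z))) < 1e-12
    # Cross-check against the standard library implementation
    assert abs(Phi(z) - NormalDist().cdf(z)) < 1e-12

print(Phi(0.0))           # property 3: 0.5
print(phi(0.0))           # property 2: maximum value 1/sqrt(2*pi)
print(2 * Phi(1.96) - 1)  # property 5: P(|Z| <= 1.96), about 0.95
```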
- Percentiles of the Standard Normal Distribution
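  - For a standard normal rv Z, the 99th percentile $\eta$ satisfies $P(Z \le \eta) = 0.99$; from the standard normal table, $\eta \approx 2.33$.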
- $z_\alpha$ notation
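  - $z_\alpha$ denotes the value on the measurement axis for which $\alpha$ of the area under the z curve lies to the right of $z_\alpha$:
  - $P(Z \ge z_\alpha) = \alpha$ for a standard normal rv Z, so $z_\alpha$ is the $100(1-\alpha)$th percentile of the standard normal distribution.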
- Nonstandard normal distributions
  - If X has the normal distribution with mean $\mu$ and standard deviation $\sigma$ , then $Z = \frac{X-\mu}{\sigma}$ has a standard normal distribution.
  - So probabilities for X can be computed from $\Phi$:
    $P(a \le X \le b) = P\left(\frac{a-\mu}{\sigma} \le Z \le \frac{b-\mu}{\sigma}\right) = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)$
- Percentiles of an arbitrary normal distribution
  - (100p)th percentile for $N(\mu, \sigma^2)$ $= \mu$ + [(100p)th percentile for the standard normal] $\cdot\ \sigma$
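Standardizing reduces every normal probability and percentile to Φ and its inverse. A sketch using `statistics.NormalDist` from Python's standard library (its `cdf` and `inv_cdf` methods); the helper names and the example X ~ N(100, 15²) are assumptions for illustration:

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: mu = 0, sigma = 1


def normal_prob(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a <= X <= b) for X ~ N(mu, sigma^2), by standardizing to Z."""
    return STD.cdf((b - mu) / sigma) - STD.cdf((a - mu) / sigma)


def normal_percentile(p: float, mu: float, sigma: float) -> float:
    """(100p)th percentile of N(mu, sigma^2) = mu + sigma * (standard normal percentile)."""
    return mu + sigma * STD.inv_cdf(p)


def z_alpha(alpha: float) -> float:
    """z_alpha: the value with area alpha to its right under the z curve."""
    return STD.inv_cdf(1.0 - alpha)


# Hypothetical example: X ~ N(100, 15^2)
print(normal_prob(85, 115, 100, 15))     # Phi(1) - Phi(-1), about 0.683
print(normal_percentile(0.99, 100, 15))  # 100 + 15 * (99th z percentile)
print(z_alpha(0.05))                     # about 1.645
```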
- The Normal Distribution and Discrete Populations
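  - When a normal curve is used to approximate probabilities for an integer-valued rv, the endpoints are shifted by 0.5 to account for the discreteness of the underlying distribution; this adjustment is called a continuity correction(连续校正).
  - For example, $P(X \ge 125)$ is computed as $P(X \ge 124.5)$ and $P(X \le 125)$ as $P(X \le 125.5)$ under the normal curve.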
- The Normal Approximation to the Binomial Distribution
  - Let X be a binomial rv based on n trials with success probability p, and let $q = 1 - p$. Then if the binomial probability histogram is not too skewed, X has approximately a normal distribution with $\mu = np$ and $\sigma = \sqrt{npq}$. In particular, for x = a possible value of X,
    $P(X \le x) = B(x;n,p) \approx \Phi\left(\frac{x + 0.5 - np}{\sqrt{npq}}\right)$
    The 0.5 is the continuity correction.
  - In practice, the approximation is adequate provided that both np ≥ 10 and nq ≥ 10.
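To see the effect of the continuity correction, the sketch below compares the exact binomial cdf (summed with `math.comb`) with the normal approximation, with and without the 0.5. The helper names and the example n = 100, p = 0.5 (which satisfies np ≥ 10 and nq ≥ 10) are hypothetical:

```python
import math
from statistics import NormalDist


def binom_cdf(x: int, n: int, p: float) -> float:
    """Exact B(x; n, p) = P(X <= x) for a binomial rv."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))


def normal_approx_binom_cdf(x: int, n: int, p: float, continuity: bool = True) -> float:
    """Normal approximation to B(x; n, p) with mu = np and sigma = sqrt(npq)."""
    q = 1 - p
    mu, sigma = n * p, math.sqrt(n * p * q)
    shift = 0.5 if continuity else 0.0  # the continuity correction
    return NormalDist().cdf((x + shift - mu) / sigma)


# Hypothetical example: n = 100, p = 0.5, so np = nq = 50 >= 10
n, p, x = 100, 0.5, 55
exact = binom_cdf(x, n, p)
with_cc = normal_approx_binom_cdf(x, n, p)
without_cc = normal_approx_binom_cdf(x, n, p, continuity=False)
print(exact, with_cc, without_cc)
```

For this case the corrected approximation lands much closer to the exact value than the uncorrected one.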
The Gamma Distribution and Its Relatives
- For $\alpha > 0$, the gamma function is defined by
  $\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx$
  - The most important properties of the gamma function are the following:
    1. For any $\alpha > 1,\ \Gamma(\alpha)=(\alpha-1)\Gamma(\alpha-1)$
    2. For any positive integer n, $\Gamma(n) = (n-1)!$
    3. $\Gamma(\frac12) = \sqrt{\pi}$
  - If we let $\begin{equation} f(x;\alpha) = \begin{cases} \frac{x^{\alpha-1}e^{-x}}{\Gamma(\alpha)} & x \ge 0 \\ 0 & otherwise \end{cases} \end{equation}$ , then the function satisfies the two properties of a pdf.
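The three properties can be spot-checked with Python's `math.gamma`. The sketch also approximates the defining integral with a midpoint rule through a hypothetical helper, `gamma_by_integration`, which truncates the infinite upper limit at 60 (an assumption that is harmless for the moderate α used here):

```python
import math


def gamma_by_integration(alpha: float, upper: float = 60.0, n: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral defining Gamma(alpha).

    The upper limit is truncated at `upper`; intended for alpha >= 1, where the
    integrand x**(alpha - 1) * exp(-x) is bounded near 0.
    """
    h = upper / n
    return h * sum(((k + 0.5) * h) ** (alpha - 1) * math.exp(-(k + 0.5) * h) for k in range(n))


# Property 1: Gamma(alpha) = (alpha - 1) * Gamma(alpha - 1) for alpha > 1
a = 3.7
print(math.gamma(a), (a - 1) * math.gamma(a - 1))
# Property 2: Gamma(n) = (n - 1)! for positive integers n
print([math.gamma(k) for k in range(1, 6)], [math.factorial(k - 1) for k in range(1, 6)])
# Property 3: Gamma(1/2) = sqrt(pi)
print(math.gamma(0.5), math.sqrt(math.pi))
# The integral definition agrees with math.gamma
print(gamma_by_integration(4.5), math.gamma(4.5))
```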
- The Family of Gamma Distributions
  - A continuous random variable X is said to have a gamma distribution if the pdf of X is $f(x;\alpha,\beta) = \begin{cases} \frac{1}{\beta^{\alpha}\Gamma(\alpha)} x^{\alpha-1} e^{-x/\beta} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$
    where the parameters $\alpha$ and $\beta$ satisfy $\alpha>0$ , $\beta>0$ .
  - The standard gamma distribution has $\beta = 1$.
  - $E(X) = \mu = \alpha\beta$
  - $V(X) = \sigma^2 = \alpha\beta^2$
  - When X is a standard gamma rv, the cdf of X,
    $F(x;\alpha) = \int_0^x \frac{y^{\alpha-1} e^{-y}}{\Gamma(\alpha)}\,dy, \quad x > 0,$
    is called the incomplete gamma function(不完全伽玛函数).
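  - If X has a gamma distribution with parameters $\alpha$ and $\beta$, probabilities reduce to the standard case: $P(X \le x) = F(x;\alpha,\beta) = F\left(\frac{x}{\beta};\alpha\right)$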
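The incomplete gamma function has no closed form for general α, but it can be evaluated from its power series F(x; α) = x^α e^(-x)/Γ(α) · Σ_{k≥0} x^k / (α(α+1)···(α+k)). A sketch under that approach (the series is a standard identity, not part of the notes; `std_gamma_cdf` and `gamma_cdf` are hypothetical names), which also applies the rescaling F(x; α, β) = F(x/β; α):

```python
import math


def std_gamma_cdf(x: float, alpha: float, max_terms: int = 1000) -> float:
    """Incomplete gamma function F(x; alpha) via its power series.

    Adequate for moderate x; a production routine would switch to a
    different expansion for large x.
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / alpha  # k = 0 term of the series
    total = term
    for k in range(1, max_terms):
        term *= x / (alpha + k)  # x^k / (alpha (alpha+1) ... (alpha+k))
        total += term
        if term < 1e-17 * total:
            break
    log_prefactor = alpha * math.log(x) - x - math.lgamma(alpha)
    return math.exp(log_prefactor) * total


def gamma_cdf(x: float, alpha: float, beta: float) -> float:
    """F(x; alpha, beta) = F(x / beta; alpha)."""
    return std_gamma_cdf(x / beta, alpha)


print(gamma_cdf(3.0, 1.0, 2.0), 1 - math.exp(-1.5))          # alpha = 1: exponential cdf
print(std_gamma_cdf(1.5, 2.0), 1 - math.exp(-1.5) * (1 + 1.5))
```

With α = 1 the result should match 1 - e^(-x/β), the exponential cdf; with α = 2 it should match 1 - e^(-x)(1 + x).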
- The Exponential Distribution(指数分布)
  - X is said to have an exponential distribution with parameter $\lambda$ ($\lambda > 0$) if the pdf of X is $f(x;\lambda) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$
    In fact, the exponential distribution is the special case of the gamma distribution with $\alpha = 1$ and $\beta = 1/\lambda$.
  - $\mu = \alpha\beta = \frac1\lambda$
  - $\sigma^2 = \alpha\beta^2 = \frac{1}{\lambda^2}$
  - The cdf of X is $F(x;\lambda) = \begin{cases} 1 - e^{-\lambda x} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$
- Application of the Exponential Distribution
  - Suppose that the number of events occurring in any time interval of length t has a Poisson distribution with parameter $αt$ and that numbers of occurrences in nonoverlapping intervals are independent of one another. Then the distribution of elapsed(消逝) time between the occurrence of two successive events is exponential with parameter $λ = α$ .
  - Another important application of the exponential distribution is to model the distribution of component lifetime. A partial reason for the popularity of such applications is the “memoryless” property of the exponential distribution.
    For $t, t_0 \ge 0$: $P(X \ge t+t_0 \mid X \ge t_0) = \frac{P[(X \ge t+t_0) \cap (X \ge t_0)]}{P(X \ge t_0)} = \frac{P(X \ge t+t_0)}{P(X \ge t_0)} = \frac{1-F(t+t_0;\lambda)}{1-F(t_0;\lambda)} = e^{-\lambda t}$
  - Thus, the distribution of additional lifetime is exactly the same as the original distribution of lifetime, so at each point in time the component shows no effect of wear. In other words, the distribution of remaining lifetime is independent of current age.
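The memoryless computation can be checked numerically from the exponential cdf. A sketch with a hypothetical rate λ = 0.5, current age t0 = 3 and additional time t = 2 (helper names are illustrative); the conditional survival probability should equal e^(-λt) regardless of t0:

```python
import math


def expon_cdf(x: float, lam: float) -> float:
    """F(x; lambda) = 1 - exp(-lambda x) for x >= 0, else 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


def survival(x: float, lam: float) -> float:
    """P(X >= x) = 1 - F(x; lambda) for a continuous rv."""
    return 1.0 - expon_cdf(x, lam)


lam = 0.5          # hypothetical rate, so E(X) = 1 / lam = 2
t0, t = 3.0, 2.0   # current age t0, additional time t
conditional = survival(t + t0, lam) / survival(t0, lam)  # P(X >= t + t0 | X >= t0)
unconditional = survival(t, lam)                         # P(X >= t)
print(conditional, unconditional, math.exp(-lam * t))
```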
- The Chi-Squared Distribution(卡方分布)
  - Let $\nu$ be a positive integer. Then a random variable X is said to have a chi-squared distribution with parameter $\nu$ if the pdf of X is the gamma density with $\alpha = \nu/2$ and $\beta = 2$. The pdf of a chi-squared rv is thus $f(x;\nu) = \begin{cases} \frac{1}{2^{\nu/2}\Gamma(\nu/2)} x^{\nu/2-1} e^{-x/2} & x \ge 0 \\ 0 & x < 0 \end{cases}$
  - The parameter $\nu$ is called the number of degrees of freedom (df, 自由度数) of X. The symbol $\chi^2$ is often used in place of “chi-squared”.
Other Continuous Distributions
- The Weibull Distribution(威布尔分布)
  - A random variable X is said to have a Weibull distribution with parameters α and β (α > 0, β > 0) if the pdf of X is
    $f(x;\alpha,\beta) = \begin{cases} \frac{\alpha}{\beta^{\alpha}} x^{\alpha-1} e^{-(x/\beta)^\alpha} & x \ge 0 \\ 0 & x < 0 \end{cases}$
  - $\mu = \beta\ \Gamma\left(1+\frac{1}{\alpha}\right)$
  - $\sigma^2 = \beta^2\lbrace\Gamma\left(1+\frac{2}{\alpha}\right)-\lbrack\Gamma\left(1+\frac{1}{\alpha}\right)\rbrack^2\rbrace$
  - cdf: $F(x;\alpha,\beta) = \begin{cases} 1 - e^{-(x/\beta)^\alpha} & x \ge 0 \\ 0 & x < 0 \end{cases}$
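Because the Weibull cdf has a closed form, a numerical derivative can check the pdf $f(x;\alpha,\beta) = \frac{\alpha}{\beta^\alpha}x^{\alpha-1}e^{-(x/\beta)^\alpha}$ against it, and `math.gamma` evaluates the mean and variance formulas. A sketch with illustrative parameter values and helper names:

```python
import math


def weibull_pdf(x: float, alpha: float, beta: float) -> float:
    """Weibull pdf (alpha / beta**alpha) * x**(alpha - 1) * exp(-(x / beta)**alpha) for x > 0."""
    if x <= 0.0:
        # The single point x = 0 carries no probability; returning 0 there also
        # avoids evaluating 0 ** (alpha - 1) with a negative exponent when alpha < 1.
        return 0.0
    return alpha / beta ** alpha * x ** (alpha - 1) * math.exp(-((x / beta) ** alpha))


def weibull_cdf(x: float, alpha: float, beta: float) -> float:
    """Weibull cdf 1 - exp(-(x / beta)**alpha) for x >= 0, else 0."""
    return 1.0 - math.exp(-((x / beta) ** alpha)) if x >= 0.0 else 0.0


def weibull_mean(alpha: float, beta: float) -> float:
    return beta * math.gamma(1.0 + 1.0 / alpha)


def weibull_var(alpha: float, beta: float) -> float:
    g1 = math.gamma(1.0 + 1.0 / alpha)
    return beta ** 2 * (math.gamma(1.0 + 2.0 / alpha) - g1 ** 2)


# The pdf should be the derivative of the cdf: central-difference check.
alpha, beta, x, h = 2.5, 3.0, 2.0, 1e-6
slope = (weibull_cdf(x + h, alpha, beta) - weibull_cdf(x - h, alpha, beta)) / (2 * h)
print(slope, weibull_pdf(x, alpha, beta))

# alpha = 1 gives an exponential distribution with lambda = 1 / beta.
print(weibull_mean(1.0, 4.0), weibull_var(1.0, 4.0))  # 4.0 16.0
```

With α = 1 the Weibull distribution is exponential with λ = 1/β, so a mean of β and a variance of β² provide a sanity check.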
- The Lognormal Distribution(对数正态分布)
  - A nonnegative rv X is said to have a lognormal distribution if the rv Y = ln(X) has a normal distribution. The resulting pdf of a lognormal rv when ln(X) is normally distributed with parameters $μ$ and $σ$ is
    $f(x;\mu,\sigma) = \begin{cases} \frac{1}{\sqrt{2\pi}\sigma x} e^{-[\ln(x)-\mu]^2/(2\sigma^2)} & x > 0 \\ 0 & x \le 0 \end{cases}$
  - $E(X) = e^{\mu+\sigma^2/2}$
  - $V(X) = e^{2\mu+\sigma^2}·\left(e^{\sigma^2} -1\right)$
  - cdf:
    $F(x;\mu,\sigma) = P(X \le x) = P[\ln(X) \le \ln(x)] = P\left(Z \le \frac{\ln(x)-\mu}{\sigma}\right) = \Phi\left(\frac{\ln(x)-\mu}{\sigma}\right), \quad x > 0$
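Since F(x; μ, σ) = Φ((ln(x) - μ)/σ), lognormal probabilities need only the standard normal cdf. A sketch using `statistics.NormalDist`, with hypothetical parameters μ = 1 and σ = 0.5 for ln(X) and illustrative helper names. It also checks numerically that the pdf $\frac{1}{\sqrt{2\pi}\sigma x}e^{-[\ln(x)-\mu]^2/(2\sigma^2)}$ is the derivative of the cdf, and that E(X) = e^(μ+σ²/2):

```python
import math
from statistics import NormalDist


def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """pdf of X when ln(X) ~ N(mu, sigma^2); zero for x <= 0."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-z * z / 2.0) / (math.sqrt(2.0 * math.pi) * sigma * x)


def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """F(x; mu, sigma) = Phi((ln(x) - mu) / sigma) for x > 0."""
    if x <= 0.0:
        return 0.0
    return NormalDist().cdf((math.log(x) - mu) / sigma)


mu, sigma = 1.0, 0.5  # hypothetical parameters of ln(X)
print(lognormal_cdf(math.exp(mu), mu, sigma))  # the median of X is e^mu

# The pdf should be the derivative of the cdf: central-difference check.
x, h = 3.0, 1e-6
slope = (lognormal_cdf(x + h, mu, sigma) - lognormal_cdf(x - h, mu, sigma)) / (2 * h)
print(slope, lognormal_pdf(x, mu, sigma))

# Check E(X) = exp(mu + sigma^2 / 2) with a midpoint-rule integral of x f(x),
# truncating the upper limit at 60 (the tail beyond it is negligible here).
n, upper = 200_000, 60.0
step = upper / n
mean_numeric = step * sum(
    (k + 0.5) * step * lognormal_pdf((k + 0.5) * step, mu, sigma) for k in range(n)
)
print(mean_numeric, math.exp(mu + sigma ** 2 / 2))
```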
- The Beta Distribution(贝塔分布)
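  - A random variable X is said to have a beta distribution with parameters $\alpha$, $\beta$ (both positive), A, and B if the pdf of X is
    $f(x;\alpha,\beta,A,B) = \begin{cases} \frac{1}{B-A}\cdot\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{x-A}{B-A}\right)^{\alpha-1}\left(\frac{B-x}{B-A}\right)^{\beta-1} & A \le x \le B \\ 0 & \text{otherwise} \end{cases}$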
  - The case A=0, B=1 gives the standard beta distribution.
  - $\mu = A+(B-A)·\frac{\alpha}{\alpha+\beta}$
  - $\sigma^2 = \frac{(B-A)^2\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
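The mean and variance formulas can be verified by integrating against the four-parameter beta density $f(x) = \frac{1}{B-A}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{x-A}{B-A}\right)^{\alpha-1}\left(\frac{B-x}{B-A}\right)^{\beta-1}$ on [A, B] (the standard textbook form, assumed here). A midpoint-rule sketch with hypothetical parameters α = 2, β = 3, A = 1, B = 5 and illustrative helper names:

```python
import math


def beta_pdf(x: float, alpha: float, beta: float, A: float, B: float) -> float:
    """Four-parameter beta density on [A, B]."""
    if not A <= x <= B:
        return 0.0
    const = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    u = (x - A) / (B - A)  # position of x rescaled to [0, 1]; (B - x)/(B - A) = 1 - u
    return const / (B - A) * u ** (alpha - 1) * (1 - u) ** (beta - 1)


def integrate(f, lo: float, hi: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


alpha, beta, A, B = 2.0, 3.0, 1.0, 5.0  # hypothetical parameters
area = integrate(lambda x: beta_pdf(x, alpha, beta, A, B), A, B)
mean = integrate(lambda x: x * beta_pdf(x, alpha, beta, A, B), A, B)
var = integrate(lambda x: (x - mean) ** 2 * beta_pdf(x, alpha, beta, A, B), A, B)

mean_formula = A + (B - A) * alpha / (alpha + beta)
var_formula = (B - A) ** 2 * alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
print(area, mean, mean_formula, var, var_formula)
```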
Probability Plot(概率图)
- Sample Percentiles
  - Order the n sample observations from the smallest to the largest. Then the ith smallest observation in the list is taken to be the $[100(i-0.5)/n]$th sample percentile.
- Probability Plot
- Normal Probability Plot(正态概率图)
  - A plot of the n pairs ([100(i-.5)/n]th z percentile, ith smallest observation) on a two-dimensional coordinate system is called a normal probability plot.
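The pairs for a normal probability plot can be computed directly from this definition; if the population is normal, the plotted points should fall close to a straight line. A sketch using `NormalDist().inv_cdf` from Python's standard library for the z percentiles; the function name and the four-observation sample are hypothetical:

```python
from statistics import NormalDist


def normal_plot_pairs(sample):
    """Return (z percentile, ith smallest observation) pairs for a normal probability plot.

    The ith smallest observation is paired with the [100(i - 0.5)/n]th
    percentile of the standard normal distribution.
    """
    ordered = sorted(sample)
    n = len(ordered)
    std = NormalDist()
    return [(std.inv_cdf((i - 0.5) / n), obs) for i, obs in enumerate(ordered, start=1)]


# Hypothetical sample of n = 4 observations
for z, obs in normal_plot_pairs([3.1, 2.4, 5.0, 4.2]):
    print(f"{z:+.4f}  {obs}")
```

Plotting these pairs with any charting library (for example, z on the horizontal axis) gives the normal probability plot.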