@daixuan1996 2015-01-07

Probability and Statistics Notes, CH5

Joint Probability Distributions and Random Samples

联合概率分布与随机样本

Probability and Statistics


  1. Jointly Distributed Random Variables(联合分布随机变量)

    • The joint probability mass function for two discrete random variables

      • Let X and Y be two discrete random variables defined on the sample space Ω of an experiment. The joint probability mass function(联合概率质量函数) p(x,y) is defined for each pair of numbers (x,y) by
        p(x,y)=P(X=x and Y=y)

        Let A be any set consisting of pairs of (x, y) values. Then
        $$P[(X,Y)\in A]=\sum_{(x,y)\in A}p(x,y)$$
      • The marginal probability mass functions(边缘概率质量函数) of X and Y, denoted by $p_X(x)$ and $p_Y(y)$, respectively, are given by
        $$p_X(x)=\sum_y p(x,y),\qquad p_Y(y)=\sum_x p(x,y)$$
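        As a quick numeric sketch (plain Python with NumPy; the joint pmf values below are invented for illustration), the marginal pmf's are just the row and column sums of the joint pmf table:

        ```python
        import numpy as np

        # Hypothetical joint pmf p(x, y): rows index x in {0, 1, 2}, columns index y in {0, 1}.
        p = np.array([[0.10, 0.15],
                      [0.20, 0.25],
                      [0.05, 0.25]])
        assert np.isclose(p.sum(), 1.0)   # a valid joint pmf must sum to 1

        # Marginals: p_X(x) = sum over y, p_Y(y) = sum over x
        p_X = p.sum(axis=1)               # [0.25, 0.45, 0.30]
        p_Y = p.sum(axis=0)               # [0.35, 0.65]

        # P[(X, Y) in A] for the event A = {(x, y) : x + y <= 1}
        xs, ys = np.meshgrid([0, 1, 2], [0, 1], indexing="ij")
        print(p[(xs + ys) <= 1].sum())    # 0.10 + 0.15 + 0.20 = 0.45
        ```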
    • The joint probability density function for two continuous random variables

      • Let X and Y be two continuous random variables. Then f(x,y) is the joint probability density function(联合概率密度函数)for X and Y if for any two-dimensional set A
        $$P[(X,Y)\in A]=\iint_A f(x,y)\,dx\,dy$$

        In particular, if A is the two-dimensional rectangle {(x,y):a ≤ x ≤ b, c ≤ y ≤ d}, then
        $$P[(X,Y)\in A]=P(a\le X\le b,\ c\le Y\le d)=\int_a^b\int_c^d f(x,y)\,dy\,dx$$

        For f(x,y) to be a candidate for a joint pdf, it must satisfy f(x,y)≥0, and
        $$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x,y)\,dx\,dy=1$$
      • The marginal probability density functions(边缘概率密度函数) of X and Y, denoted by $f_X(x)$ and $f_Y(y)$, respectively, are given by
        $$f_X(x)=\int_{-\infty}^{+\infty} f(x,y)\,dy \quad\text{for } -\infty<x<+\infty$$

        $$f_Y(y)=\int_{-\infty}^{+\infty} f(x,y)\,dx \quad\text{for } -\infty<y<+\infty$$
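        As a sketch of how these requirements can be checked numerically, take the illustrative pdf f(x, y) = x + y on the unit square (an assumed example, not one from these notes) and integrate with SciPy, assuming SciPy is available:

        ```python
        from scipy.integrate import dblquad, quad

        f = lambda y, x: x + y              # dblquad integrates over y first, then x

        # Candidate check: total probability over the support must equal 1.
        total, _ = dblquad(f, 0, 1, 0, 1)
        print(total)                        # ~1.0

        # Marginal f_X(x) = integral over y of f(x, y) = x + 1/2 on [0, 1]
        f_X = lambda x: quad(lambda y: x + y, 0, 1)[0]
        print(f_X(0.25))                    # 0.75

        # A rectangle probability: P(X <= 1/2, Y <= 1/2)
        prob, _ = dblquad(f, 0, 0.5, 0, 0.5)
        print(prob)                         # 0.125
        ```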
    • Independent Random Variable

      • Two random variables X and Y are said to be independent if for every pair of x and y values,
        $p(x,y)=p_X(x)\cdot p_Y(y)$   when X and Y are discrete
        $f(x,y)=f_X(x)\cdot f_Y(y)$   when X and Y are continuous
        Otherwise, X and Y are said to be dependent.
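        A minimal sketch of this factorization criterion (all values invented): a joint pmf built as the outer product of two marginals passes the check, while shifting mass between two cells, keeping the total at 1, breaks it:

        ```python
        import numpy as np

        # Independent case: joint pmf constructed as the outer product of two marginals.
        p_X = np.array([0.2, 0.5, 0.3])
        p_Y = np.array([0.4, 0.6])
        p_indep = np.outer(p_X, p_Y)      # p(x, y) = p_X(x) * p_Y(y) for every pair

        factorizes = lambda p: np.allclose(p, np.outer(p.sum(axis=1), p.sum(axis=0)))
        print(factorizes(p_indep))        # True -> independent

        # Dependent case: move mass between two cells (total still 1, row sums unchanged).
        p_dep = p_indep.copy()
        p_dep[0, 0] += 0.05
        p_dep[0, 1] -= 0.05
        print(factorizes(p_dep))          # False -> dependent
        ```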
    • More than two random variables

      • If $X_1,X_2,\dots,X_n$ are all discrete rv's, the joint pmf of the variables is the function
        $$p(x_1,x_2,\dots,x_n)=P(X_1=x_1,\,X_2=x_2,\dots,X_n=x_n)$$

        If the variables are continuous, the joint pdf of $X_1,X_2,\dots,X_n$ is the function $f(x_1,x_2,\dots,x_n)$ such that for any n intervals $[a_1,b_1],\dots,[a_n,b_n]$,
        $$P(a_1\le X_1\le b_1,\dots,a_n\le X_n\le b_n)=\int_{a_1}^{b_1}\cdots\int_{a_n}^{b_n} f(x_1,\dots,x_n)\,dx_n\cdots dx_1$$
      • The rv's $X_1,X_2,\dots,X_n$ are said to be independent if for every subset of the variables and all possible values of these variables, the joint pmf or pdf of the subset is equal to the product of the marginal pmf's or pdf's.
      • Equivalently, the rv's $X_1,X_2,\dots,X_n$ are independent iff for all possible values of these variables, the joint pmf or pdf of the full set equals the product of the n marginal pmf's or pdf's.
    • Conditional Distributions

      • Let X and Y be two continuous rv's with joint pdf f(x,y) and marginal X pdf $f_X(x)$. Then for any x value for which $f_X(x)>0$, the conditional probability density function of Y given that X = x is
        $$f_{Y|X}(y\mid x)=\frac{f(x,y)}{f_X(x)},\qquad -\infty<y<+\infty$$

        If X and Y are discrete, the conditional probability mass function of Y when X = x is
        $$p_{Y|X}(y\mid x)=P(Y=y\mid X=x)=\frac{P(X=x,\ Y=y)}{P(X=x)}$$
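        In the discrete case, conditioning on X = x just renormalizes row x of the joint pmf table; a sketch with an invented table:

        ```python
        import numpy as np

        # Invented joint pmf: rows index x in {0, 1, 2}, columns index y in {0, 1}.
        p = np.array([[0.10, 0.15],
                      [0.20, 0.25],
                      [0.05, 0.25]])

        # p_{Y|X}(y|x): slice out row x of the joint pmf and divide by p_X(x).
        def cond_Y_given_X(x):
            row = p[x]
            return row / row.sum()        # defined whenever p_X(x) > 0

        print(cond_Y_given_X(0))          # [0.4, 0.6]; each conditional pmf sums to 1
        ```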
  2. Expected Values, Covariance(协方差), and Correlation(相关性)

    • Expected Values
      • Let X and Y be jointly distributed rv's with pmf p(x,y) or pdf f(x,y) according to whether the variables are discrete or continuous. Then the expected value of a function h(X,Y), denoted by E[h(X,Y)] or $\mu_{h(X,Y)}$, is given by
        $$E[h(X,Y)]=\sum_x\sum_y h(x,y)\,p(x,y)\quad\text{for discrete } X \text{ and } Y$$

        $$E[h(X,Y)]=\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} h(x,y)\,f(x,y)\,dx\,dy\quad\text{for continuous } X \text{ and } Y$$

      • The method of computing $E[h(X_1,\dots,X_n)]$, the expected value of a function $h(X_1,\dots,X_n)$ of n random variables, is similar to that for two random variables.
      • If X and Y are independent random variables, then E[XY]=E[X]E[Y]
        Note that the converse is not true: E[XY] = E[X]E[Y] does not by itself imply that X and Y are independent.
      • For any constants a and b, regardless of the relationship between X and Y: E[aX + bY] = aE[X] + bE[Y]
      • In general,

        $$E\left[\sum_{i=1}^n a_iX_i\right]=\sum_{i=1}^n a_iE[X_i]$$
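        A simulation sketch of the product rule and of the failed converse (distribution choices are arbitrary illustrations): below, Z = X0² is completely determined by X0, so the pair is dependent, yet E[X0·Z] = E[X0]E[Z]:

        ```python
        import numpy as np

        rng = np.random.default_rng(0)
        k = 1_000_000

        # Independent X and Y: E[XY] matches E[X]E[Y].
        X = rng.exponential(scale=2.0, size=k)        # E[X] = 2
        Y = rng.uniform(0, 1, size=k)                 # E[Y] = 1/2
        print((X * Y).mean(), X.mean() * Y.mean())    # both ~1.0

        # Converse fails: Z = X0**2 is determined by X0 (dependent),
        # yet E[X0 * Z] = E[X0**3] = 0 = E[X0] * E[Z] for X0 ~ N(0, 1).
        X0 = rng.normal(size=k)
        Z = X0 ** 2
        print((X0 * Z).mean(), X0.mean() * Z.mean())  # both ~0
        ```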

      • Covariance(协方差)

      • The covariance between two rv’s X and Y is defined as
        $$\mathrm{Cov}(X,Y)=E[(X-\mu_X)(Y-\mu_Y)]=\begin{cases}\displaystyle\sum_x\sum_y (x-\mu_X)(y-\mu_Y)\,p(x,y) & X,Y \text{ discrete}\\[6pt]\displaystyle\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} (x-\mu_X)(y-\mu_Y)\,f(x,y)\,dx\,dy & X,Y \text{ continuous}\end{cases}$$
        1. For a strong positive relationship, Cov(X,Y) should be quite positive.
        2. For a strong negative relationship, Cov(X,Y) should be quite negative.
        3. If X and Y are not strongly related, positive and negative products will tend to cancel one another, yielding a covariance near 0.
      • $$\mathrm{Cov}(X,Y)=E(XY)-\mu_X\mu_Y$$
        When X = Y, the covariance reduces to the variance: $\mathrm{Cov}(X,X)=E(X^2)-\mu_X^2=V(X)$.
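        A quick simulation check of the shortcut formula, with a pair constructed (arbitrarily) so that Cov(X, Y) = 2:

        ```python
        import numpy as np

        rng = np.random.default_rng(1)
        k = 1_000_000
        X = rng.normal(size=k)
        Y = 2 * X + rng.normal(size=k)                  # built so that Cov(X, Y) = 2

        # Shortcut: Cov(X, Y) = E(XY) - mu_X * mu_Y
        print((X * Y).mean() - X.mean() * Y.mean())     # ~2
        print(np.cov(X, Y, bias=True)[0, 1])            # same estimate from NumPy

        # Cov(X, X) = V(X)
        print((X * X).mean() - X.mean() ** 2, X.var())  # both ~1
        ```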
    • Correlation(相关性)

      • The correlation coefficient(相关系数) of X and Y, denoted by Corr(X,Y), $\rho_{X,Y}$, or just ρ, is defined by
        $$\rho_{X,Y}=\frac{\mathrm{Cov}(X,Y)}{\sigma_X\,\sigma_Y}$$
        1. If a and c are either both positive or both negative, then Corr(aX + b, cY + d) = Corr(X, Y).
        2. For any two rv's X and Y, -1 ≤ Corr(X,Y) ≤ 1.
        3. If X and Y are independent, then ρ = 0, but ρ = 0 does not imply independence.
        4. ρ = 1 or –1 iff Y=aX+b for some numbers a and b with a ≠ 0.
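        A simulation sketch of properties 1 and 4 (constants and distributions invented for illustration):

        ```python
        import numpy as np

        rng = np.random.default_rng(2)
        X = rng.normal(size=1_000_000)
        Y = X + rng.normal(size=1_000_000)           # correlated with X by construction

        corr = lambda u, v: np.corrcoef(u, v)[0, 1]  # sample correlation coefficient

        # Property 1: unchanged under scale/shift when a and c have the same sign.
        print(corr(X, Y), corr(3 * X + 1, 2 * Y - 5))   # identical values

        # Property 4: rho = -1 exactly when Y is a decreasing linear function of X.
        print(corr(X, -4 * X + 7))                      # -1.0 (up to float rounding)
        ```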
  3. Statistics(统计量) and Their Distributions

    • Consider selecting different samples of size n from the same population distribution. Because of uncertainty, before the data becomes available we view each observation in a sample as a random variable and denote the sample by $X_1,X_2,\dots,X_n$.
      The variation in observed sample values in turn implies that the value of any function of the sample observations, such as the sample mean, sample standard deviation, or sample fourth spread, also varies from sample to sample.

    • A statistic(统计量) is any quantity whose value can be calculated from sample data.

      • Prior to obtaining data, there is uncertainty as to what value of any particular statistic will result. Therefore, a statistic is a random variable.
      • A statistic will be denoted by an uppercase letter; a lowercase letter is used to represent the calculated or observed value of the statistic.
      • The probability distribution of a statistic is sometimes referred to as its sampling distribution(抽样分布). It describes how the statistic varies in value across all samples that might be selected.
    • Random Samples(随机样本)

      • The probability distribution of any particular statistic depends not only on the population distribution (normal, uniform, etc.) and the sample size n but also on the method of sampling.
      • A sampling method often encountered (at least approximately) in practice is random sampling(随机抽样).
      • The rv's $X_1,X_2,\dots,X_n$ are said to form a (simple) random sample(随机样本) of size n if
        1. The $X_i$'s are independent rv's, and
        2. Every $X_i$ has the same probability distribution.
          When conditions 1 and 2 are satisfied, we say that the $X_i$'s are independent and identically distributed (iid, 独立同分布).
      • What's more:
        1. Sampling with replacement or from an infinite population is random sampling.
        2. Sampling without replacement from a finite population is generally considered not random sampling. But if the sample size n is much smaller than the population size N (n/N ≤ 0.05), it is approximately random sampling.
    • Deriving the Sampling Distribution of a Statistic

      • Probability rules can be used to obtain the distribution of a statistic provided that it is a "fairly simple" function of the Xi's and either there are relatively few different X values in the population or else the population distribution has a "nice" form.
    • Simulation Experiments

      • This method is usually used when a derivation via probability rules is too difficult or complicated to be carried out. Such an experiment is virtually always done with the aid of a computer.
      • The following characteristics of an experiment must be specified:
        1. The statistic of interest
        2. The population distribution
        3. The sample size n
        4. The number of replications k
      • The larger the value of k, the better the approximation will tend to be. In practice, k = 500 or 1000 is usually enough if the statistic is "fairly simple".
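        A minimal simulation-experiment sketch in Python, with illustrative choices for the four ingredients (sample median as the statistic, an exponential population, n = 10, k = 1000):

        ```python
        import numpy as np

        rng = np.random.default_rng(3)

        # The four ingredients, with illustrative choices:
        statistic = np.median                     # 1. statistic of interest
        population = lambda size: rng.exponential(scale=2.0, size=size)  # 2. population
        n = 10                                    # 3. sample size
        k = 1000                                  # 4. number of replications

        values = np.array([statistic(population(n)) for _ in range(k)])
        # The k observed values approximate the sampling distribution of the statistic.
        print(values.mean(), values.std(ddof=1))
        ```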
  4. The Distribution of the Sample Mean ☆★

    • Let $X_1,X_2,\dots,X_n$ be a random sample from a distribution with mean μ and standard deviation σ. Then

      1. $E(\bar X)=\mu_{\bar X}=\mu$
      2. $V(\bar X)=\sigma^2_{\bar X}=\sigma^2/n$   and   $\sigma_{\bar X}=\sigma/\sqrt{n}$
        • In addition, with $T_0=X_1+\dots+X_n$ (the sample total), $E(T_0)=n\mu$, $V(T_0)=n\sigma^2$, and $\sigma_{T_0}=\sqrt{n}\,\sigma$.
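        A simulation sketch of these facts, assuming an exponential population with scale 2, for which μ = σ = 2 (an arbitrary choice):

        ```python
        import numpy as np

        rng = np.random.default_rng(8)
        n, k = 10, 200_000
        mu, sigma = 2.0, 2.0                          # exponential(scale=2): mu = sigma = 2

        samples = rng.exponential(scale=2.0, size=(k, n))
        xbar = samples.mean(axis=1)                   # sample mean of each replication
        T0 = samples.sum(axis=1)                      # sample total of each replication

        print(xbar.mean(), mu)                        # E(X-bar) = mu
        print(xbar.std(ddof=1), sigma / np.sqrt(n))   # sigma_X-bar = sigma / sqrt(n)
        print(T0.mean(), n * mu)                      # E(T0) = n*mu
        print(T0.var(ddof=1), n * sigma ** 2)         # V(T0) = n*sigma^2
        ```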
    • The Case of a Normal Population Distribution

      • Let $X_1,X_2,\dots,X_n$ be a random sample from a normal distribution with mean μ and standard deviation σ. Then for any n, $\bar X$ is normally distributed with mean μ and standard deviation $\sigma/\sqrt{n}$, as is $T_0$ with mean $n\mu$ and standard deviation $\sqrt{n}\,\sigma$.
    • The Central Limit Theorem - CLT(中心极限定理)

      • Let $X_1,X_2,\dots,X_n$ be a random sample from a distribution with mean μ and variance $\sigma^2$. Then if n is sufficiently large, $\bar X$ has approximately a normal distribution with $\mu_{\bar X}=\mu$ and $\sigma^2_{\bar X}=\sigma^2/n$, and $T_0$ also has approximately a normal distribution with $\mu_{T_0}=n\mu$ and $\sigma^2_{T_0}=n\sigma^2$. The larger the value of n, the better the approximation.
      • Rule of thumb: If n > 30, the Central Limit Theorem can be used.
      • The CLT is applicable whether the variable of interest is discrete or continuous.
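        A sketch of the CLT at work, assuming a (skewed) exponential population and SciPy for the normal cdf: an empirical probability for X̄ is compared with the normal approximation:

        ```python
        import numpy as np
        from scipy.stats import norm

        rng = np.random.default_rng(4)
        n, k = 50, 10_000                     # n > 30, so the rule of thumb applies
        mu, sigma = 2.0, 2.0                  # exponential(scale=2) has mu = sigma = 2

        xbars = rng.exponential(scale=2.0, size=(k, n)).mean(axis=1)

        # Compare an empirical probability for X-bar with the CLT's normal approximation.
        print((xbars <= 2.5).mean())                             # empirical P(X-bar <= 2.5)
        print(norm.cdf(2.5, loc=mu, scale=sigma / np.sqrt(n)))   # ~0.96, close to the above
        ```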
    • Other Applications of the Central Limit Theorem

      • The CLT can be used to justify the normal approximation to the binomial distribution.
      • Using the approximation only if both np ≥ 10 and n(1-p) ≥ 10 ensures that n is large enough to overcome any skewness in the underlying Bernoulli distribution.
      • Let $X_1,X_2,\dots,X_n$ be a random sample from a distribution for which only positive values are possible [$P(X_i>0)=1$]. Then if n is sufficiently large, the product $Y = X_1X_2\cdots X_n$ has approximately a lognormal distribution.
      • To verify this, note that $\ln(Y)=\ln(X_1)+\ln(X_2)+\dots+\ln(X_n)$ is a sum of iid rv's, so by the CLT ln(Y) is approximately normal, and a variable whose logarithm is normal is by definition lognormal.
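        A simulation sketch of the lognormal limit, with an invented choice of positive Xi's (uniform on [1, 2]): the standardized values of ln(Y) line up with standard normal quantiles:

        ```python
        import numpy as np

        rng = np.random.default_rng(5)
        n, k = 100, 10_000

        # Product of n positive rv's; uniform on [1, 2] is an arbitrary positive choice.
        X = rng.uniform(1.0, 2.0, size=(k, n))
        logY = np.log(X).sum(axis=1)      # ln(Y) is a sum of n iid terms, so the CLT applies

        # Standardized ln(Y) should match N(0, 1): compare a few quantiles.
        z = (logY - logY.mean()) / logY.std()
        print(np.quantile(z, [0.025, 0.5, 0.975]))   # approximately [-1.96, 0.00, 1.96]
        ```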
  5. The Distribution of a Linear Combination

    • Given a collection of n random variables $X_1,X_2,\dots,X_n$ and n numerical constants $a_1,a_2,\dots,a_n$, the rv

      $$Y=a_1X_1+\dots+a_nX_n=\sum_{i=1}^n a_iX_i$$

      is called a linear combination(线性组合) of the $X_i$'s.

      • Let $X_1,X_2,\dots,X_n$ have mean values $\mu_1,\mu_2,\dots,\mu_n$ and variances $\sigma_1^2,\dots,\sigma_n^2$, respectively.
        1. Whether or not the $X_i$'s are independent,
          $$E\left(\sum_{i=1}^n a_iX_i\right)=\sum_{i=1}^n a_iE(X_i)=\sum_{i=1}^n a_i\mu_i$$
        2. If $X_1,X_2,\dots,X_n$ are independent,
          $$V\left(\sum_{i=1}^n a_iX_i\right)=\sum_{i=1}^n a_i^2V(X_i)=\sum_{i=1}^n a_i^2\sigma_i^2$$

          and
          $$\sigma_{a_1X_1+\dots+a_nX_n}=\sqrt{a_1^2\sigma_1^2+\dots+a_n^2\sigma_n^2}$$
        3. For any $X_1,X_2,\dots,X_n$,
          $$V\left(\sum_{i=1}^n a_iX_i\right)=\sum_{i=1}^n\sum_{j=1}^n a_ia_j\,\mathrm{Cov}(X_i,X_j)$$
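        A simulation sketch of the general formula in case 3, using a correlated pair built (arbitrarily) from a shared component W, so that Cov(X1, X2) = 1:

        ```python
        import numpy as np

        rng = np.random.default_rng(6)
        k = 1_000_000
        W = rng.normal(size=k)                      # shared component
        X1 = W + rng.normal(size=k)                 # V(X1) = 2
        X2 = W + rng.normal(size=k)                 # V(X2) = 2, Cov(X1, X2) = 1
        a = np.array([3.0, -2.0])

        C = np.cov(np.stack([X1, X2]))              # estimated 2x2 covariance matrix
        print(a @ C @ a)                            # sum_i sum_j ai*aj*Cov(Xi, Xj); theory: 14
        print(np.var(3 * X1 - 2 * X2, ddof=1))      # direct estimate, also ~14
        ```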
    • The Difference Between Two Random Variables

      • $E(X_1-X_2)=E(X_1)-E(X_2)$ and, if $X_1$ and $X_2$ are independent, $V(X_1-X_2)=V(X_1)+V(X_2)$.
    • The Case of Normal Random Variables

      • If $X_1,X_2,\dots,X_n$ are independent, normally distributed rv's (with possibly different means and/or variances), then any linear combination of the $X_i$'s also has a normal distribution.
        In particular, the difference $X_1-X_2$ between two independent, normally distributed variables is itself normally distributed.
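        A closing sketch with invented parameters: for independent X1 ~ N(5, 2²) and X2 ~ N(3, 1²), the difference X1 - X2 is exactly N(2, √5), which a simulation confirms:

        ```python
        import numpy as np
        from scipy.stats import norm

        rng = np.random.default_rng(7)
        k = 1_000_000
        X1 = rng.normal(loc=5.0, scale=2.0, size=k)
        X2 = rng.normal(loc=3.0, scale=1.0, size=k)
        D = X1 - X2                                  # exactly N(2, sqrt(5)) in theory

        print(D.mean(), D.std())                     # ~2.0 and ~2.236
        print((D <= 0).mean(), norm.cdf(0, loc=2.0, scale=np.sqrt(5.0)))  # both ~0.186
        ```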

Copyright © 2015 by Xuan Dai. All rights reserved.
