@daixuan1996 2015-01-10T17:47:25.000000Z

Probability and Statistics Notes, CH7

Statistical Intervals Based on a Single Sample(基于单次采样的统计区间)

Probability and Statistics

  • A point estimate, because it is a single number, by itself provides no information about the precision and reliability of estimation.
  • An alternative to reporting a single sensible value for the parameter being estimated is to calculate and report an entire interval of plausible values: an interval estimate, or confidence interval (CI, 置信区间). A confidence interval is always calculated by first selecting a confidence level(置信度), which is a measure of the degree of reliability of the interval. Information about the precision of an interval estimate is conveyed by the width of the interval.

  1. Basic Properties of Confidence Intervals

    • Suppose that the parameter of interest is a population mean μ and that

      1. The population distribution is normal.
      2. The value of the population standard deviation σ is known.
        • Normality of the population distribution is often a reasonable assumption. However, if the value of μ is unknown, it is implausible that the value of σ would be available.
          [Image: example1-01]
          [Image: example1-02]
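        • After observing $X_1 = x_1, \ldots, X_n = x_n$, we compute the observed sample mean $\bar{x}$ and substitute it for $\bar{X}$ in the random interval $\left(\bar{X} - 1.96\,\sigma/\sqrt{n},\ \bar{X} + 1.96\,\sigma/\sqrt{n}\right)$, which contains μ with probability 0.95. The resulting fixed interval is called a 95% confidence interval for μ. This CI can be expressed either as
          $\left(\bar{x} - 1.96\,\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\dfrac{\sigma}{\sqrt{n}}\right)$ is a 95% CI for μ

          or as $\bar{x} - 1.96\,\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\,\dfrac{\sigma}{\sqrt{n}}$ with 95% confidence.
          A concise expression for the interval is $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$.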
    • Other Levels of Confidence

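      • A $100(1-\alpha)\%$ confidence interval for the mean μ of a normal population when the value of σ is known is given by
        $\left(\bar{x} - z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}\right)$

        or, equivalently, by $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.
        For example, the 99% CI is $\bar{x} \pm 2.58\,\sigma/\sqrt{n}$.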
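The interval for known σ can be sketched in a few lines of Python. This is only an illustration: the function name `z_interval` and the numbers passed to it are hypothetical, and the critical value $z_{\alpha/2}$ is taken from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """Two-sided CI for a normal mean when sigma is known."""
    alpha = 1 - conf
    # z_{alpha/2}: the point with upper-tail area alpha/2 under N(0, 1)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical numbers, chosen only for illustration
lo95, hi95 = z_interval(xbar=80.0, sigma=2.0, n=31, conf=0.95)
lo99, hi99 = z_interval(xbar=80.0, sigma=2.0, n=31, conf=0.99)
print(f"95% CI: ({lo95:.2f}, {hi95:.2f})")
print(f"99% CI: ({lo99:.2f}, {hi99:.2f})")
```

With the same data, the 99% interval comes out wider than the 95% interval, because its critical value (about 2.58) is larger than 1.96.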
    • Confidence Level, Precision, and Choice of Sample Size

      • Why settle for a confidence level of 95% when a level of 99% is achievable? Because the price paid for the higher confidence level is a wider interval.
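        The width of the interval may be thought of as specifying its precision or accuracy. Precision is then inversely related to the confidence level (or reliability): a higher confidence level gives a wider interval. It is positively related to the sample size n: a larger n gives a narrower interval.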
        An appealing strategy is to specify both the desired confidence level and interval width and then determine the necessary sample size n.
        [Image: example2]
      • The general formula for the sample size n necessary to ensure an interval width w is obtained from $w = 2 z_{\alpha/2}\,\sigma/\sqrt{n}$ as
        $n = \left(2 z_{\alpha/2}\,\dfrac{\sigma}{w}\right)^2$

        The smaller the desired width w, the larger n must be.
        The half-width $1.96\,\sigma/\sqrt{n}$ of the 95% CI is sometimes called the bound on the error of estimation associated with a 95% confidence level.
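A small sketch of the sample-size calculation, assuming σ is known. The helper name `sample_size_for_width` and the inputs are made up for illustration. Since n must be an integer, the result of the formula is rounded up.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_width(sigma, width, conf=0.95):
    """Smallest n whose z interval (sigma known) has width at most `width`."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((2 * z * sigma / width) ** 2)


# Hypothetical inputs: sigma = 25, desired total width w = 10
n_needed = sample_size_for_width(sigma=25.0, width=10.0, conf=0.95)
print(n_needed)

# Halving the width roughly quadruples the required n
n_half_width = sample_size_for_width(sigma=25.0, width=5.0, conf=0.95)
print(n_half_width)
```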
    • Deriving a Confidence Interval

      • Let $X_1, X_2, \ldots, X_n$ denote a sample on which the CI for a parameter θ is to be based. Suppose a random variable $h(X_1, X_2, \ldots, X_n; \theta)$ satisfying the following two properties can be found:
        1. The variable depends functionally on both $X_1, X_2, \ldots, X_n$ and θ.
        2. The probability distribution of the variable does not depend on θ or on any other unknown parameters.
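      • In order to determine a $100(1-\alpha)\%$ CI for θ, we proceed as follows. Find constants a and b such that
        $P\left(a < h(X_1, X_2, \ldots, X_n; \theta) < b\right) = 1 - \alpha$

        Because of the second property, a and b do not depend on θ. In the normal example, we had $a = -z_{\alpha/2}$ and $b = z_{\alpha/2}$. Suppose we can isolate θ in the inequality:
        $P\left(l(X_1, X_2, \ldots, X_n) < \theta < u(X_1, X_2, \ldots, X_n)\right) = 1 - \alpha$

        So a $100(1-\alpha)\%$ CI is $[\,l(X_1, X_2, \ldots, X_n),\ u(X_1, X_2, \ldots, X_n)\,]$. In the normal example, $l(X_1, X_2, \ldots, X_n) = \bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n}$ and $u(X_1, X_2, \ldots, X_n) = \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}$.
        In general, the form of the h function is suggested by examining the distribution of an appropriate estimator $\hat{\theta}$.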
        [Image: example3-01]
        [Image: example3-02]
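As a worked instance of this recipe, here is the normal example with known σ written out step by step. The choice of h as the standardized sample mean is the one implied by the normal example in these notes; the manipulation only adds and rearranges terms inside the probability statement.

```latex
\begin{aligned}
h(X_1,\ldots,X_n;\mu) &= \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)
  \quad\text{(distribution free of } \mu\text{)} \\[4pt]
1-\alpha &= P\!\left(-z_{\alpha/2} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) \\[4pt]
 &= P\!\left(-z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \bar{X}-\mu < z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) \\[4pt]
 &= P\!\left(\bar{X} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)
\end{aligned}
```

The last line identifies the lower and upper limits l and u for this case.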
  2. Large-Sample Confidence Intervals for a Population Mean and Proportion

    • A Large-Sample Interval for μ

      • Let $X_1, X_2, \ldots, X_n$ be a random sample from a population having a mean μ and standard deviation σ. Provided that n is large, the Central Limit Theorem (CLT) implies that $\bar{X}$ has approximately a normal distribution whatever the nature of the population distribution.
      • If n is sufficiently large, the standardized variable
        $Z = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$

        has approximately a standard normal distribution.
        This implies that
        $\bar{x} \pm z_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$

        is a large-sample confidence interval for μ with confidence level approximately $100(1-\alpha)\%$.
      • This formula is valid regardless of the shape of the population distribution. Generally speaking, n > 40 will be sufficient to justify the use of this interval.
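A minimal sketch of the large-sample interval computed directly from data. The function name `large_sample_mean_ci` is hypothetical, and the data are made up (and deliberately skewed) purely to show the mechanics; they are not a real random sample.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def large_sample_mean_ci(data, conf=0.95):
    """Approximate CI for mu: x_bar +/- z * s / sqrt(n); intended for large n."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation (divisor n - 1)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Made-up, skewed values repeated to reach n = 48 > 40
sample = [1.2, 0.4, 3.1, 0.9, 2.2, 0.3, 5.6, 1.1] * 6
low, high = large_sample_mean_ci(sample)
print(f"approximate 95% CI for mu: ({low:.3f}, {high:.3f})")
```

Note that σ never appears: the sample standard deviation s stands in for it, which is why the stated confidence level is only approximate.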
    • A Large-Sample Confidence Interval for a Population Proportion

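      • With $\hat{p}$ denoting the sample proportion and $\hat{q} = 1 - \hat{p}$, a confidence interval for a population proportion p with confidence level approximately $100(1-\alpha)\%$ has
        lower confidence limit $= \dfrac{\hat{p} + \dfrac{z_{\alpha/2}^2}{2n} - z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n} + \dfrac{z_{\alpha/2}^2}{4n^2}}}{1 + z_{\alpha/2}^2/n}$

        upper confidence limit $= \dfrac{\hat{p} + \dfrac{z_{\alpha/2}^2}{2n} + z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n} + \dfrac{z_{\alpha/2}^2}{4n^2}}}{1 + z_{\alpha/2}^2/n}$

        The traditional approximate confidence limits under a large sample size are
        $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$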
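The two sets of limits for p can be compared side by side with a short sketch. The function names `score_interval` and `traditional_interval` and the counts used are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def score_interval(successes, n, conf=0.95):
    """CI for a proportion p using the lower and upper limits given above."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = successes / n
    q_hat = 1 - p_hat
    center = p_hat + z**2 / (2 * n)
    spread = z * sqrt(p_hat * q_hat / n + z**2 / (4 * n**2))
    denom = 1 + z**2 / n
    return (center - spread) / denom, (center + spread) / denom


def traditional_interval(successes, n, conf=0.95):
    """Traditional large-sample CI: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = successes / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical counts: 16 successes in 48 trials
score_lo, score_hi = score_interval(16, 48)
trad_lo, trad_hi = traditional_interval(16, 48)
print(f"score:       ({score_lo:.3f}, {score_hi:.3f})")
print(f"traditional: ({trad_lo:.3f}, {trad_hi:.3f})")
```

The traditional interval is centered at $\hat{p}$, while the first interval is centered at a value pulled slightly toward 1/2, so the two differ most when n is small.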
      • For an interval with a desired degree of precision, equate the width of the CI for p to a prespecified width w. This gives a quadratic equation for the sample size n.
        The exact solution is lengthy; neglecting the terms in the numerator involving $w^2$ gives
        $n \approx \dfrac{4 z_{\alpha/2}^2\,\hat{p}\hat{q}}{w^2}$

        This expression is what results from equating the width of the traditional interval to w.
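A sketch of the approximate sample-size formula for a proportion. Because $\hat{p}$ is not known before sampling, the hypothetical helper below takes a guess `p_guess`; the default 0.5 is an assumption that maximizes $\hat{p}\hat{q}$ and therefore gives the largest (most conservative) n.

```python
from math import ceil
from statistics import NormalDist


def proportion_sample_size(width, p_guess=0.5, conf=0.95):
    """Approximate n from n = 4 z^2 p q / w^2 (traditional-interval width)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(4 * z**2 * p_guess * (1 - p_guess) / width**2)


# Conservative choice p_guess = 0.5, desired total width w = 0.10
n_conservative = proportion_sample_size(width=0.10)
# A smaller guess for p needs fewer observations
n_with_guess = proportion_sample_size(width=0.10, p_guess=0.3)
print(n_conservative, n_with_guess)
```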
    • One-Sided Confidence Intervals (Confidence Bounds, 置信界限)
      • Sometimes one may want a CI with only a lower bound or an upper bound.
        For example, under the $100(1-\alpha)\%$ confidence level and with a large sample, we have, approximately,
        $P\left(\dfrac{\bar{X} - \mu}{S/\sqrt{n}} < z_{\alpha}\right) \approx 1 - \alpha$

        Rearranging the inequality in the parentheses, for a given sample, we obtain
        $\mu > \bar{x} - z_{\alpha}\,\dfrac{s}{\sqrt{n}}$

        which is a one-sided CI (amounting to a lower confidence bound here).
        An upper confidence bound can be obtained similarly.
      • A large-sample upper confidence bound for μ is
        $\mu < \bar{x} + z_{\alpha}\,\dfrac{s}{\sqrt{n}}$

        and a large-sample lower confidence bound for μ is
        $\mu > \bar{x} - z_{\alpha}\,\dfrac{s}{\sqrt{n}}$

        A one-sided confidence bound for p results from replacing $z_{\alpha/2}$ by $z_{\alpha}$ and ± by either + or – in the CI formula for p.
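The only change from the two-sided case is that the whole of α goes into one tail, so the critical value is $z_{\alpha}$ rather than $z_{\alpha/2}$. A brief sketch, with hypothetical function names and made-up summary statistics:

```python
from math import sqrt
from statistics import NormalDist


def upper_bound(xbar, s, n, conf=0.95):
    """Large-sample upper confidence bound: mu < x_bar + z_alpha * s / sqrt(n)."""
    z_alpha = NormalDist().inv_cdf(conf)  # all of alpha in one tail
    return xbar + z_alpha * s / sqrt(n)


def lower_bound(xbar, s, n, conf=0.95):
    """Large-sample lower confidence bound: mu > x_bar - z_alpha * s / sqrt(n)."""
    z_alpha = NormalDist().inv_cdf(conf)
    return xbar - z_alpha * s / sqrt(n)


# Made-up summary statistics from a sample of size 64
print(f"mu < {upper_bound(50.0, 4.0, 64):.3f} with about 95% confidence")
print(f"mu > {lower_bound(50.0, 4.0, 64):.3f} with about 95% confidence")
```

For 95% confidence the one-sided critical value is about 1.645, compared with 1.96 for the two-sided interval.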
  3. Intervals Based on a Normal Population Distribution

    • Intro

      • Assumption: $X_1, X_2, \ldots, X_n$ constitutes a random sample from a normal distribution with both μ and σ unknown.
      • Theorem: Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with parameters μ and $\sigma^2$. Then the rv
        $\dfrac{(n-1)S^2}{\sigma^2} = \dfrac{\sum (X_i - \bar{X})^2}{\sigma^2}$

        has a chi-squared probability distribution with $n-1$ df(自由度).
      • Theorem: Suppose the rv's X and Y are independent, X follows a standard normal distribution, and Y follows a chi-squared distribution with k degrees of freedom. Then the random variable
        $T = \dfrac{X}{\sqrt{Y/k}}$

        has a t distribution with k degrees of freedom.
      • Theorem: When $\bar{X}$ is the mean of a random sample of size n from a normal distribution with mean μ, the rv
        $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$

        has a t distribution with $n-1$ degrees of freedom (df).
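The third theorem follows from the first two once T is rewritten as a ratio. The sketch below fills in that step; it relies on the standard fact, not stated in these notes, that $\bar{X}$ and $S^2$ are independent for a normal sample.

```latex
T = \frac{\bar{X}-\mu}{S/\sqrt{n}}
  = \frac{(\bar{X}-\mu)\big/(\sigma/\sqrt{n})}
         {\sqrt{\dfrac{(n-1)S^2/\sigma^2}{\,n-1\,}}}
  = \frac{Z}{\sqrt{Y/(n-1)}},
\qquad
Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1),
\quad
Y = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}
```

Taking $k = n-1$ in the second theorem then gives the t distribution with $n-1$ df.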
    • Properties of t Distributions

      • A t distribution is governed by only one parameter, the number of degrees of freedom of the distribution.
      • Let $t_\nu$ denote the density function curve for $\nu$ df.
        1. Each $t_\nu$ curve is bell-shaped and centered at 0.
        2. Each $t_\nu$ curve is more spread out than the standard normal curve.
        3. As $\nu$ increases, the spread of the corresponding $t_\nu$ curve decreases.
        4. As $\nu \to \infty$, the sequence of $t_\nu$ curves approaches the standard normal curve.
      • Notation: Let $t_{\alpha,\nu}$ = the number on the measurement axis for which the area under the t curve with $\nu$ df to the right of $t_{\alpha,\nu}$ is α; $t_{\alpha,\nu}$ is called a t critical value(临界值).
    • The One-Sample t Confidence Interval

      • The standardized variable T has a t distribution with $n-1$ df, and the area under the corresponding t density curve between $-t_{\alpha/2,n-1}$ and $t_{\alpha/2,n-1}$ is $1-\alpha$, so
        $P\left(-t_{\alpha/2,n-1} < T < t_{\alpha/2,n-1}\right) = 1 - \alpha$
      • Let $\bar{x}$ and s be the sample mean and sample standard deviation computed from the results of a random sample from a normal population with mean μ. Then a $100(1-\alpha)\%$ confidence interval for μ is
        $\left(\bar{x} - t_{\alpha/2,n-1}\,\dfrac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1}\,\dfrac{s}{\sqrt{n}}\right)$

        or, more compactly, $\bar{x} \pm t_{\alpha/2,n-1}\,\dfrac{s}{\sqrt{n}}$.
        An upper confidence bound with $100(1-\alpha)\%$ confidence level for μ is $\bar{x} + t_{\alpha,n-1}\,s/\sqrt{n}$. Replacing + by – gives a lower confidence bound for μ.
        [Image: example4]
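A minimal sketch of the one-sample t interval. Python's standard library has no t quantile function, so the hypothetical helper `t_interval` takes the critical value $t_{\alpha/2,n-1}$ as an argument, to be read from a t table. The data are made up, and 2.262 is the usual tabulated value for 9 df and upper-tail area 0.025.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(data, t_crit):
    """CI x_bar +/- t_crit * s / sqrt(n), with t_crit = t_{alpha/2, n-1}."""
    n = len(data)
    xbar, s = mean(data), stdev(data)
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Made-up sample of n = 10 values, assumed to come from a normal population
sample = [10.1, 9.8, 10.4, 10.0, 9.7, 10.3, 10.2, 9.9, 10.5, 9.6]
low, high = t_interval(sample, t_crit=2.262)  # t_{0.025, 9} from a t table
print(f"95% t interval for mu: ({low:.3f}, {high:.3f})")
```

Using 1.96 in place of 2.262 would give a narrower interval that overstates the precision available from only 10 observations.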
    • A Prediction Interval(预测区间) for a Single Future Value

      • A prediction interval (PI) for a single observation to be selected from a normal population distribution is
        $\bar{x} \pm t_{\alpha/2,n-1}\,s\sqrt{1 + \dfrac{1}{n}}$

        The prediction level(预测水平) is $100(1-\alpha)\%$.
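The prediction interval differs from the t confidence interval only in the factor $\sqrt{1 + 1/n}$ replacing $\sqrt{1/n}$. A short sketch under the same assumptions as before: the function name is hypothetical, the data are made up, and the t critical value is supplied from a table.

```python
from math import sqrt
from statistics import mean, stdev


def prediction_interval(data, t_crit):
    """PI for one future value: x_bar +/- t_crit * s * sqrt(1 + 1/n)."""
    n = len(data)
    xbar, s = mean(data), stdev(data)
    half_width = t_crit * s * sqrt(1 + 1 / n)
    return xbar - half_width, xbar + half_width


# Made-up sample of n = 10 values; t_{0.025, 9} = 2.262 from a t table
sample = [10.1, 9.8, 10.4, 10.0, 9.7, 10.3, 10.2, 9.9, 10.5, 9.6]
pi_low, pi_high = prediction_interval(sample, t_crit=2.262)
print(f"95% PI for a single future value: ({pi_low:.3f}, {pi_high:.3f})")
```

The PI is wider than the CI for μ by a factor of $\sqrt{n+1}$, and unlike the CI its width does not shrink to zero as n grows, because a single future observation always carries its own variability.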
    • Tolerance Intervals(容许区间)

      [Image: tolerance]

  4. Confidence Intervals for the Variance and Standard Deviation of a Normal Population

    • A $100(1-\alpha)\%$ confidence interval for the variance $\sigma^2$ of a normal population has lower limit
      $(n-1)s^2 / \chi^2_{\alpha/2,\,n-1}$

      and upper limit
      $(n-1)s^2 / \chi^2_{1-\alpha/2,\,n-1}$

      A confidence interval for σ has lower and upper limits that are the square roots of the corresponding limits in the interval for $\sigma^2$.
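A sketch of the variance interval, again with table values passed in because the standard library has no chi-squared quantile function. The helper name `variance_ci` is hypothetical and the data are made up; 19.023 and 2.700 are the usual tabulated values of $\chi^2_{0.025,9}$ and $\chi^2_{0.975,9}$.

```python
from math import sqrt
from statistics import variance


def variance_ci(data, chi2_upper, chi2_lower):
    """CI for sigma^2 of a normal population.

    chi2_upper = chi^2_{alpha/2, n-1} (larger value, gives the lower limit),
    chi2_lower = chi^2_{1-alpha/2, n-1} (smaller value, gives the upper limit).
    """
    n = len(data)
    s2 = variance(data)  # sample variance, divisor n - 1
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower


# Made-up sample of n = 10 values, assumed to come from a normal population
sample = [10.1, 9.8, 10.4, 10.0, 9.7, 10.3, 10.2, 9.9, 10.5, 9.6]
var_low, var_high = variance_ci(sample, chi2_upper=19.023, chi2_lower=2.700)
sd_low, sd_high = sqrt(var_low), sqrt(var_high)
print(f"95% CI for sigma^2: ({var_low:.4f}, {var_high:.4f})")
print(f"95% CI for sigma:   ({sd_low:.4f}, {sd_high:.4f})")
```

The interval is not symmetric about $s^2$, since the chi-squared distribution itself is skewed.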

    [Image: summary]


Copyright © 2015 by Xuan Dai. All rights reserved.
