@daixuan1996
2015-01-10T17:47:25.000000Z
Probability and Statistics
- A point estimate, because it is a single number, by itself provides no information about the precision and reliability of estimation.
- An alternative to reporting a single sensible value for the parameter being estimated is to calculate and report an entire interval of plausible values: an interval estimate, or confidence interval (CI). A confidence interval is always calculated by first selecting a confidence level, which measures the degree of reliability of the interval. The precision of an interval estimate is conveyed by the width of the interval.
Basic Properties of Confidence Intervals
Suppose that the parameter of interest is a population mean $\mu$ and that
- The population distribution is normal.
- The value of the population standard deviation $\sigma$ is known.
- Normality of the population distribution is often a reasonable assumption. However, if the value of $\mu$ is unknown, it is implausible that the value of $\sigma$ would be available.
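- After observing $X_1 = x_1, \ldots, X_n = x_n$, we compute the observed sample mean $\bar{x}$. The resulting fixed interval is called a 95% confidence interval for $\mu$. This CI can be expressed either as
$$\left(\bar{x} - 1.96 \cdot \frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right) \text{ is a 95\% CI for } \mu$$
or as
$$\bar{x} - 1.96 \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96 \cdot \frac{\sigma}{\sqrt{n}} \text{ with 95\% confidence.}$$
A concise expression for the interval is $\bar{x} \pm 1.96 \cdot \sigma/\sqrt{n}$.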
Other Levels of Confidence
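- A $100(1-\alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma$ is known is given by
$$\left(\bar{x} - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\right)$$
or, equivalently, by $\bar{x} \pm z_{\alpha/2} \cdot \sigma/\sqrt{n}$.
For example, the 99% CI is $\bar{x} \pm 2.58 \cdot \sigma/\sqrt{n}$.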
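The interval formula can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `z_interval` and the sample figures (mean 80.0, $\sigma$ = 2.0, n = 25) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided CI for the mean of a normal population with known sigma."""
    alpha = 1.0 - confidence
    # z_{alpha/2}: point with upper-tail area alpha/2 under the standard normal curve
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical numbers: sample mean 80.0, known sigma 2.0, n = 25
lo95, hi95 = z_interval(80.0, 2.0, 25, confidence=0.95)
lo99, hi99 = z_interval(80.0, 2.0, 25, confidence=0.99)
print(f"95% CI: ({lo95:.3f}, {hi95:.3f})")
print(f"99% CI: ({lo99:.3f}, {hi99:.3f})")
```

The 99% interval comes out wider than the 95% interval, since $z_{\alpha/2}$ grows from about 1.96 to about 2.58.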
Confidence Level, Precision, and Choice of Sample Size
- Why settle for a confidence level of 95% when a level of 99% is achievable? Because the price paid for the higher confidence level is a wider interval.
The width of the interval specifies its precision: the narrower the interval, the more precise the estimate. Precision is therefore inversely related to the confidence level (reliability) and positively related to the sample size n.
An appealing strategy is to specify both the desired confidence level and interval width and then determine the necessary sample size n.
- The general formula for the sample size n necessary to ensure an interval width w is obtained from $w = 2 \cdot z_{\alpha/2} \cdot \sigma/\sqrt{n}$ as
$$n = \left(2 z_{\alpha/2} \cdot \frac{\sigma}{w}\right)^2$$
The smaller the desired width w, the larger n must be.
The half-width $1.96\,\sigma/\sqrt{n}$ of the 95% CI is sometimes called the bound on the error of estimation associated with a 95% confidence level.
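As a quick illustration of the sample-size formula, here is a minimal sketch. The helper name `sample_size_for_width` and the inputs ($\sigma$ = 25, w = 10) are assumptions made for the example; the result is rounded up because n must be an integer.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_width(sigma, width, confidence=0.95):
    """Smallest n for which the z interval has total width at most `width`."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n_exact = (2.0 * z * sigma / width) ** 2
    return ceil(n_exact)  # round up: a fractional sample size is not possible


# Hypothetical inputs: sigma = 25, desired width = 10
n_needed = sample_size_for_width(sigma=25.0, width=10.0)
# Halving the desired width roughly quadruples the required n
n_half_width = sample_size_for_width(sigma=25.0, width=5.0)
print(n_needed, n_half_width)
```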
Deriving a Confidence Interval
- Let $X_1, X_2, \ldots, X_n$ denote a sample on which the CI for a parameter $\theta$ is to be based. Suppose a random variable $h(X_1, X_2, \ldots, X_n; \theta)$ satisfying the following two properties can be found:
- The variable depends functionally on both $X_1, X_2, \ldots, X_n$ and $\theta$.
- The probability distribution of the variable does not depend on $\theta$ or on any other unknown parameters.
- To determine a $100(1-\alpha)\%$ CI for $\theta$, we proceed as follows. Find constants $a$ and $b$ such that
$$P(a < h(X_1, X_2, \ldots, X_n; \theta) < b) = 1 - \alpha$$
Because of the second property, $a$ and $b$ do not depend on $\theta$. In the normal example, we had $a = -z_{\alpha/2}$ and $b = z_{\alpha/2}$. Suppose we can isolate $\theta$ in the inequality:
$$P(l(X_1, X_2, \ldots, X_n) < \theta < u(X_1, X_2, \ldots, X_n)) = 1 - \alpha$$
So a $100(1-\alpha)\%$ CI is $[l(X_1, X_2, \ldots, X_n),\ u(X_1, X_2, \ldots, X_n)]$. In the normal example, $l(X_1, X_2, \ldots, X_n) = \bar{X} - z_{\alpha/2} \cdot \sigma/\sqrt{n}$ and $u(X_1, X_2, \ldots, X_n) = \bar{X} + z_{\alpha/2} \cdot \sigma/\sqrt{n}$.
In general, the form of the h function is suggested by examining the distribution of an appropriate estimator $\hat{\theta}$.
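The probability statement can be checked by simulation for the normal example. The sketch below is an illustration only: it draws repeated samples with hypothetical values ($\mu$ = 10, $\sigma$ = 3, n = 20) and counts how often the random interval $[l, u]$ contains $\mu$. The function name `coverage_rate` is an assumption.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage_rate(mu, sigma, n, confidence=0.95, trials=4000, seed=0):
    """Fraction of simulated samples whose z interval contains the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        # l(X) < mu < u(X) is the event a < h(X; mu) < b rewritten to isolate mu
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials


# Hypothetical population: mu = 10, sigma = 3, samples of size 20
rate = coverage_rate(mu=10.0, sigma=3.0, n=20)
print(f"empirical coverage: {rate:.3f}")
```

The empirical coverage should land close to the nominal 0.95, up to simulation noise.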
Large-sample Confidence Intervals for a Population Mean and Proportion
A Large-Sample Interval for $\mu$
- Let $X_1, X_2, \ldots, X_n$ be a random sample from a population having a mean $\mu$ and standard deviation $\sigma$. Provided that n is large, the Central Limit Theorem (CLT) implies that $\bar{X}$ has approximately a normal distribution whatever the nature of the population distribution.
- If n is sufficiently large, the standardized variable
$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has approximately a standard normal distribution.
This implies that
$$\bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$
is a large-sample confidence interval for $\mu$ with confidence level approximately $100(1-\alpha)\%$.
- This formula is valid regardless of the shape of the population distribution. Generally speaking, n > 40 will be sufficient to justify the use of this interval.
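A minimal sketch of the large-sample interval, applied to deliberately non-normal data. The data are simulated from an exponential distribution purely as a hypothetical example, and the function name `large_sample_mean_ci` is an assumption.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def large_sample_mean_ci(data, confidence=0.95):
    """Approximate CI for the mean: xbar +/- z_{alpha/2} * s / sqrt(n)."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation (divisor n - 1)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical skewed data: 60 draws from an exponential distribution (n > 40)
rng = random.Random(1)
data = [rng.expovariate(0.5) for _ in range(60)]
lo, hi = large_sample_mean_ci(data)
print(f"approximate 95% CI for the mean: ({lo:.3f}, {hi:.3f})")
```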
A Large-Sample Confidence Interval for a Population Proportion
- A confidence interval for a population proportion p with confidence level approximately $100(1-\alpha)\%$ has
$$\text{lower confidence limit} = \frac{\hat{p} + \dfrac{z_{\alpha/2}^2}{2n} - z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n} + \dfrac{z_{\alpha/2}^2}{4n^2}}}{1 + z_{\alpha/2}^2/n}$$
$$\text{upper confidence limit} = \frac{\hat{p} + \dfrac{z_{\alpha/2}^2}{2n} + z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n} + \dfrac{z_{\alpha/2}^2}{4n^2}}}{1 + z_{\alpha/2}^2/n}$$
where $\hat{q} = 1 - \hat{p}$.
The traditional approximate confidence limits under a large sample size are
$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$$
- For an interval with a desired degree of precision, equate the width of the CI for p to a prespecified width w. This gives a quadratic equation for the sample size n.
The full solution is lengthy. Neglecting the terms in the numerator involving $w^2$ gives
$$n \approx \frac{4 z_{\alpha/2}^2\, \hat{p}\hat{q}}{w^2}$$
This expression is what results from equating the width of the traditional interval to w.
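The two proportion intervals and the approximate sample-size formula can be compared side by side. The sketch below is illustrative only: the function names and the counts (16 successes in 48 trials) are hypothetical. The first interval, with the $1 + z_{\alpha/2}^2/n$ denominator, is often called the score (Wilson) interval.

```python
from math import ceil, sqrt
from statistics import NormalDist


def _z(confidence):
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)


def score_interval(x, n, confidence=0.95):
    """Interval with the 1 + z^2/n denominator (score / Wilson form)."""
    z = _z(confidence)
    p_hat = x / n
    q_hat = 1.0 - p_hat
    center = p_hat + z * z / (2 * n)
    spread = z * sqrt(p_hat * q_hat / n + z * z / (4 * n * n))
    denom = 1.0 + z * z / n
    return (center - spread) / denom, (center + spread) / denom


def traditional_interval(x, n, confidence=0.95):
    """p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    z = _z(confidence)
    p_hat = x / n
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def approx_sample_size(p_hat, width, confidence=0.95):
    """n ~ 4 z^2 p_hat q_hat / w^2, rounded up to an integer."""
    z = _z(confidence)
    return ceil(4 * z * z * p_hat * (1.0 - p_hat) / width ** 2)


# Hypothetical data: 16 successes in 48 trials
score_lo, score_hi = score_interval(16, 48)
trad_lo, trad_hi = traditional_interval(16, 48)
print(f"score:       ({score_lo:.3f}, {score_hi:.3f})")
print(f"traditional: ({trad_lo:.3f}, {trad_hi:.3f})")
# Taking p_hat = 0.5 maximizes p_hat * q_hat, giving a conservative n
n_conservative = approx_sample_size(0.5, 0.10)
print(n_conservative)
```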
- Sometimes one may want a CI with only a lower bound or an upper bound.
For example, under the $100(1-\alpha)\%$ confidence level and with a large sample, we have, approximately,
$$P\left(\frac{\bar{X} - \mu}{S/\sqrt{n}} < z_{\alpha}\right) \approx 1 - \alpha$$
Rearranging the inequality in the parentheses, for a given sample, we obtain
$$\mu > \bar{x} - z_{\alpha} \cdot \frac{s}{\sqrt{n}}$$
which is a one-sided CI (amounting to a lower confidence bound here).
An upper confidence bound can be obtained similarly.
- A large-sample upper confidence bound for $\mu$ is
$$\mu < \bar{x} + z_{\alpha} \cdot \frac{s}{\sqrt{n}}$$
and a large-sample lower confidence bound for $\mu$ is
$$\mu > \bar{x} - z_{\alpha} \cdot \frac{s}{\sqrt{n}}$$
A one-sided confidence bound for p results from replacing $z_{\alpha/2}$ by $z_{\alpha}$ and $\pm$ by either + or - in the CI formula for p.
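A short sketch of the one-sided bounds for $\mu$. Note that $z_{\alpha}$ (not $z_{\alpha/2}$) is used, so all of $\alpha$ sits in one tail. The function name and the summary statistics (mean 50, s = 8, n = 64) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_bounds(xbar, s, n, confidence=0.95):
    """Return (lower bound, upper bound); each is meant to be used on its own."""
    z_alpha = NormalDist().inv_cdf(confidence)  # upper-tail area alpha = 1 - confidence
    margin = z_alpha * s / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical summary statistics: xbar = 50, s = 8, n = 64
lower, upper = large_sample_bounds(50.0, 8.0, 64)
print(f"with 95% confidence, mu > {lower:.3f}")
print(f"with 95% confidence, mu < {upper:.3f}")
```

Each statement holds at the 95% level separately; taken together, the two bounds form only a 90% two-sided interval.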
Intervals Based on a Normal Population Distribution
Intro
- Assumption: $X_1, X_2, \ldots, X_n$ constitutes a random sample from a normal distribution with both $\mu$ and $\sigma$ unknown.
- Theorem: Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with parameters $\mu$ and $\sigma^2$. Then the rv
$$\frac{(n-1)S^2}{\sigma^2} = \frac{\sum (X_i - \bar{X})^2}{\sigma^2}$$
has a chi-squared probability distribution with n - 1 degrees of freedom (df).
- Theorem: Suppose rv's X and Y are independent, X follows a standard normal distribution, and Y follows a chi-squared distribution with k degrees of freedom. Then the random variable
$$T = \frac{X}{\sqrt{Y/k}}$$
has a t distribution with k degrees of freedom.
- Theorem: When $\bar{X}$ is the mean of a random sample of size n from a normal distribution with mean $\mu$, the rv
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t distribution with n - 1 degrees of freedom (df).
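The last theorem follows from the first two. The sketch below relies on one standard fact not stated in these notes, namely that $\bar{X}$ and $S^2$ are independent for a normal sample. Dividing numerator and denominator by $\sigma$ gives:

```latex
\[
T = \frac{\bar{X}-\mu}{S/\sqrt{n}}
  = \frac{(\bar{X}-\mu)/(\sigma/\sqrt{n})}
         {\sqrt{\dfrac{(n-1)S^2/\sigma^2}{\,n-1\,}}}
  = \frac{Z}{\sqrt{V/(n-1)}},
\qquad Z \sim N(0,1), \quad V \sim \chi^2_{n-1}.
\]
```

This is exactly the form $X/\sqrt{Y/k}$ with $k = n - 1$, so T has a t distribution with n - 1 df.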
Properties of t Distributions
- A t distribution is governed by only one parameter, the number of degrees of freedom of the distribution.
- Let $t_{\nu}$ denote the density function curve for $\nu$ df.
- Each $t_{\nu}$ curve is bell-shaped and centered at 0.
- Each $t_{\nu}$ curve is more spread out than the standard normal curve.
- As $\nu$ increases, the spread of the corresponding $t_{\nu}$ curve decreases.
- As $\nu \to \infty$, the sequence of $t_{\nu}$ curves approaches the standard normal curve.
- Notation: Let $t_{\alpha,\nu}$ = the number on the measurement axis for which the area under the t curve with $\nu$ df to the right of $t_{\alpha,\nu}$ is $\alpha$; $t_{\alpha,\nu}$ is called a t critical value.
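The Python standard library has no t distribution, so critical values are normally read from a table or a statistics package. As a rough, self-contained sketch, $t_{\alpha,\nu}$ can be approximated by numerically integrating the t density and bisecting on the tail area. The function names are assumptions, and the code assumes $0 < \alpha < 0.5$.

```python
from math import exp, lgamma, log, pi


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + t * t / df))


def t_cdf(t, df, panels=2000):
    """P(T <= t), using symmetry about 0 and composite Simpson's rule on [0, |t|]."""
    if t == 0:
        return 0.5
    b = abs(t)
    h = b / panels  # panels must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t > 0 else 0.5 - area


def t_critical(alpha, df):
    """Approximate t_{alpha, df}: the point with upper-tail area alpha (0 < alpha < 0.5)."""
    lo, hi = 0.0, 1.0
    while 1 - t_cdf(hi, df) > alpha:  # widen the bracket until the tail is small enough
        hi *= 2
    for _ in range(60):  # bisection on the upper-tail area
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.025, 10), 3))
print(round(t_critical(0.025, 1000), 3))  # close to the normal value of about 1.96
```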
The One-Sample t Confidence Interval
- The standardized variable T has a t distribution with n - 1 df, and the area under the corresponding t density curve between $-t_{\alpha/2,n-1}$ and $t_{\alpha/2,n-1}$ is $1-\alpha$, so
$$P(-t_{\alpha/2,n-1} < T < t_{\alpha/2,n-1}) = 1 - \alpha$$
- Let $\bar{x}$ and s be the sample mean and sample standard deviation computed from the results of a random sample from a normal population with mean $\mu$. Then a $100(1-\alpha)\%$ confidence interval for $\mu$ is
$$\left(\bar{x} - t_{\alpha/2,n-1} \cdot \frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1} \cdot \frac{s}{\sqrt{n}}\right)$$
or, more compactly, $\bar{x} \pm t_{\alpha/2,n-1} \cdot s/\sqrt{n}$.
An upper confidence bound with $100(1-\alpha)\%$ confidence level for $\mu$ is $\bar{x} + t_{\alpha,n-1} \cdot s/\sqrt{n}$. Replacing + by - gives a lower confidence bound for $\mu$.
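A minimal sketch of the one-sample t interval for a small sample. The data are hypothetical, and the critical value $t_{0.025,9} = 2.262$ is taken from a standard t table rather than computed. The function name `t_interval` is an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(data, t_crit):
    """xbar +/- t_crit * s / sqrt(n); t_crit should be t_{alpha/2, n-1}."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation (divisor n - 1)
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical measurements, n = 10, so df = 9
data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 9.7, 10.3, 10.1]
T_CRIT_025_9 = 2.262  # table value of t_{0.025, 9}
lo, hi = t_interval(data, T_CRIT_025_9)
print(f"95% t interval for mu: ({lo:.3f}, {hi:.3f})")
```

With n this small, using 1.96 in place of 2.262 would give an interval that is too narrow for the stated confidence level.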
A Prediction Interval for a Single Future Value
- A prediction interval (PI) for a single observation to be selected from a normal population distribution is
$$\bar{x} \pm t_{\alpha/2,n-1} \cdot s\sqrt{1 + \frac{1}{n}}$$
The prediction level is $100(1-\alpha)\%$.
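To see how much wider a prediction interval is than the corresponding confidence interval, the sketch below computes both half-widths from the same summary statistics. The numbers ($\bar{x}$ = 10.1, s = 0.2582, n = 10) are hypothetical, and $t_{0.025,9} = 2.262$ is a table value.

```python
from math import sqrt


def prediction_interval(xbar, s, n, t_crit):
    """xbar +/- t_crit * s * sqrt(1 + 1/n) for one future observation."""
    half_width = t_crit * s * sqrt(1 + 1 / n)
    return xbar - half_width, xbar + half_width


# Hypothetical summary statistics with df = n - 1 = 9
xbar, s, n, t_crit = 10.1, 0.2582, 10, 2.262
pi_lo, pi_hi = prediction_interval(xbar, s, n, t_crit)
ci_half_width = t_crit * s / sqrt(n)  # half-width of the CI for mu, for comparison
print(f"95% PI: ({pi_lo:.3f}, {pi_hi:.3f})")
print(f"PI half-width {(pi_hi - pi_lo) / 2:.3f} vs CI half-width {ci_half_width:.3f}")
```

The extra 1 under the square root reflects the variability of the future observation itself, so the PI does not shrink to zero width as n grows.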
Tolerance Intervals
Confidence Intervals for the Variance and Standard Deviation of a Normal Population
Copyright © 2015 by Xuan Dai. All rights reserved.