@daixuan1996
2015-01-07T14:40:32.000000Z
Probability & Statistics(概率统计)
Jointly Distributed Random Variables(联合分布随机变量)
The joint probability mass function for two discrete random variables
- Let X and Y be two discrete random variables defined on the sample space Ω of an experiment. The joint probability mass function(联合概率质量函数) $p(x, y)$ is defined for each pair of numbers $(x, y)$ by
  $$p(x, y) = P(X = x \text{ and } Y = y)$$
  Let A be any set consisting of pairs of $(x, y)$ values. Then
  $$P[(X, Y) \in A] = \sum\sum_{(x, y) \in A} p(x, y)$$
- The marginal probability mass functions(边缘概率质量函数) of X and Y, denoted by $p_X(x)$ and $p_Y(y)$ respectively, are given by
  $$p_X(x) = \sum_y p(x, y), \qquad p_Y(y) = \sum_x p(x, y)$$
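A minimal Python sketch of these definitions, using a made-up joint pmf table (the values, and the event A, are assumptions for illustration only):

```python
# Joint pmf stored as a 2-D array; marginals are sums over the other variable.
import numpy as np

# Hypothetical joint pmf p(x, y) for X ∈ {0, 1, 2} (rows) and Y ∈ {0, 1} (columns)
p = np.array([[0.10, 0.15],
              [0.20, 0.25],
              [0.05, 0.25]])

assert np.isclose(p.sum(), 1.0)   # a valid pmf sums to 1

p_X = p.sum(axis=1)  # p_X(x) = sum over y of p(x, y)
p_Y = p.sum(axis=0)  # p_Y(y) = sum over x of p(x, y)

# P[(X, Y) ∈ A] for A = {(x, y) : x + y ≥ 2}: sum p(x, y) over the pairs in A
xs, ys = np.meshgrid([0, 1, 2], [0, 1], indexing="ij")
print(p[xs + ys >= 2].sum())      # 0.55
print(p_X, p_Y)                   # [0.25 0.45 0.3] [0.35 0.65]
```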
The joint probability density function for two continuous random variables
- Let X and Y be two continuous random variables. Then $f(x, y)$ is the joint probability density function(联合概率密度函数) for X and Y if for any two-dimensional set A
  $$P[(X, Y) \in A] = \iint_A f(x, y) \, dx \, dy$$
  In particular, if A is the two-dimensional rectangle $\{(x, y) : a \le x \le b,\ c \le y \le d\}$, then
  $$P[(X, Y) \in A] = P(a \le X \le b,\ c \le Y \le d) = \int_a^b \int_c^d f(x, y) \, dy \, dx$$
  For $f(x, y)$ to be a candidate for a joint pdf, it must satisfy $f(x, y) \ge 0$ and
  $$\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(x, y) \, dx \, dy = 1$$
- The marginal probability density functions(边缘概率密度函数) of X and Y, denoted by $f_X(x)$ and $f_Y(y)$ respectively, are given by
  $$f_X(x) = \int_{-\infty}^{+\infty} f(x, y) \, dy \quad \text{for } -\infty < x < +\infty$$
  $$f_Y(y) = \int_{-\infty}^{+\infty} f(x, y) \, dx \quad \text{for } -\infty < y < +\infty$$
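As a numerical check of these conditions, here is a sketch using the assumed pdf f(x, y) = x + y on the unit square (a common textbook example, not one from the text above); `scipy.integrate.dblquad` performs the double integration:

```python
# Verify the two pdf conditions and compute a rectangle probability numerically.
from scipy.integrate import dblquad

f = lambda y, x: x + y          # dblquad integrates the inner variable (y) first

# Total probability: the integral over the whole support must equal 1
total, _ = dblquad(f, 0, 1, 0, 1)
print(total)                    # ~1.0

# P(0 ≤ X ≤ 0.5, 0 ≤ Y ≤ 0.5) = ∫₀^0.5 ∫₀^0.5 (x + y) dy dx
prob, _ = dblquad(f, 0, 0.5, 0, 0.5)
print(prob)                     # 0.125
```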
Independent Random Variable
- Two random variables X and Y are said to be independent if for every pair of x and y values,
  $$p(x, y) = p_X(x) \cdot p_Y(y) \quad \text{when X and Y are discrete}$$
  $$f(x, y) = f_X(x) \cdot f_Y(y) \quad \text{when X and Y are continuous}$$
  Otherwise, X and Y are said to be dependent.
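A quick sketch of this test in the discrete case (both tables are assumptions): a joint pmf built as the outer product of its marginals passes, while a pmf concentrated on the diagonal fails:

```python
# X and Y are independent exactly when the joint pmf equals the outer
# product of its marginals.
import numpy as np

p_X = np.array([0.3, 0.7])
p_Y = np.array([0.4, 0.6])

p_indep = np.outer(p_X, p_Y)          # joint pmf built as p_X(x)·p_Y(y)
print(np.allclose(p_indep, np.outer(p_indep.sum(axis=1), p_indep.sum(axis=0))))  # True

p_dep = np.array([[0.30, 0.00],       # all mass on the diagonal: dependent
                  [0.00, 0.70]])
print(np.allclose(p_dep, np.outer(p_dep.sum(axis=1), p_dep.sum(axis=0))))        # False
```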
More than two random variables
- If $X_1, X_2, \dots, X_n$ are all discrete rv's, the joint pmf of the variables is the function
  $$p(x_1, x_2, \dots, x_n) = P(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n)$$
- If the variables are continuous, the joint pdf of $X_1, X_2, \dots, X_n$ is the function $f(x_1, x_2, \dots, x_n)$ such that for any n intervals $[a_1, b_1], \dots, [a_n, b_n]$,
  $$P(a_1 \le X_1 \le b_1, \dots, a_n \le X_n \le b_n) = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} f(x_1, \dots, x_n) \, dx_n \cdots dx_1$$
- The rv's $X_1, X_2, \dots, X_n$ are said to be independent if for every subset of the variables and all possible values of these variables, the joint pmf or pdf of the subset is equal to the product of the marginal pmf's or pdf's.
- Equivalently, the rv's $X_1, X_2, \dots, X_n$ are independent iff for all possible values of these variables, the joint pmf or pdf of all n variables is equal to the product of the marginal pmf's or pdf's.
Conditional Distributions
- Let X and Y be two continuous rv's with joint pdf $f(x, y)$ and marginal X pdf $f_X(x)$. Then for any x value for which $f_X(x) > 0$, the conditional probability density function of Y given that X = x is
  $$f_{Y|X}(y \mid x) = \frac{f(x, y)}{f_X(x)}, \quad -\infty < y < \infty$$
- If X and Y are discrete, the conditional probability mass function of Y when X = x is
  $$p_{Y|X}(y \mid x) = P(Y = y \mid X = x) = \frac{P(X = x,\ Y = y)}{P(X = x)} = \frac{p(x, y)}{p_X(x)}$$
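In the discrete case the conditional pmf is just each row of the joint table renormalized by the corresponding marginal; a sketch with a hypothetical table:

```python
# p_{Y|X}(y|x) = p(x, y) / p_X(x): divide each row of the joint pmf by p_X(x).
import numpy as np

p = np.array([[0.10, 0.15],     # rows: x = 0, 1, 2; columns: y = 0, 1
              [0.20, 0.25],
              [0.05, 0.25]])

p_X = p.sum(axis=1)
p_Y_given_X = p / p_X[:, None]  # row x holds the conditional pmf of Y given X = x

print(p_Y_given_X[1])           # p_{Y|X}(·|x=1) = [0.444..., 0.555...]
print(p_Y_given_X.sum(axis=1))  # each conditional pmf sums to 1
```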
Expected Values, Covariance(协方差), and Correlation(相关性)
- Let X and Y be jointly distributed rv's with pmf $p(x, y)$ or pdf $f(x, y)$ according to whether the variables are discrete or continuous. Then the expected value of a function h(X, Y), denoted by $E[h(X, Y)]$ or $\mu_{h(X, Y)}$, is given by
  $$E[h(X, Y)] = \sum_x \sum_y h(x, y) \cdot p(x, y) \quad \text{for discrete X and Y}$$
  $$E[h(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x, y) \cdot f(x, y) \, dx \, dy \quad \text{for continuous X and Y}$$
  A short sketch of this computation follows this list.
- The method of computing $E[h(X_1, \dots, X_n)]$, the expected value of a function $h(X_1, \dots, X_n)$ of n random variables, is similar to that for two random variables.
- If X and Y are independent random variables, then
  $$E[XY] = E[X] \cdot E[Y]$$
  Note that the converse is not true: E[XY] = E[X]E[Y] does not by itself imply that X and Y are independent.
- For any constants a and b, regardless of the relationship between X and Y,
  $$E[aX + bY] = aE[X] + bE[Y]$$
  In general,
  $$E\left[\sum_{i=1}^n a_i X_i\right] = \sum_{i=1}^n a_i E[X_i]$$
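Here is the sketch promised above: computing E[h(X, Y)] over a hypothetical joint pmf, with h(x, y) = max(x, y) chosen arbitrarily for illustration:

```python
# E[h(X, Y)] = ΣΣ h(x, y)·p(x, y) for discrete X and Y.
import numpy as np

x = np.array([0, 1, 2])
y = np.array([0, 1])
p = np.array([[0.10, 0.15],
              [0.20, 0.25],
              [0.05, 0.25]])

xx, yy = np.meshgrid(x, y, indexing="ij")
h = np.maximum(xx, yy)               # h(x, y) evaluated on every (x, y) pair
print((h * p).sum())                 # E[h(X, Y)] = 1.2
```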
Covariance(协方差)
- The covariance between two rv's X and Y is defined as
  $$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \begin{cases} \sum_x \sum_y (x - \mu_X)(y - \mu_Y) \, p(x, y) & \text{X, Y discrete} \\ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) \, f(x, y) \, dx \, dy & \text{X, Y continuous} \end{cases}$$
- For a strong positive relationship, Cov(X, Y) should be quite positive.
- For a strong negative relationship, Cov(X, Y) should be quite negative.
- If X and Y are not strongly related, positive and negative products will tend to cancel one another, yielding a covariance near 0.
- A useful shortcut formula (verified in the sketch below):
  $$\mathrm{Cov}(X, Y) = E(XY) - \mu_X \mu_Y$$
- When X = Y, the covariance reduces to the variance: Cov(X, X) = V(X).
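A sketch (same hypothetical joint pmf as earlier) computing Cov(X, Y) both from the definition and from the shortcut formula; the two agree:

```python
# Cov(X, Y) from the definition and from E(XY) − μ_X·μ_Y.
import numpy as np

x = np.array([0, 1, 2])
y = np.array([0, 1])
p = np.array([[0.10, 0.15],
              [0.20, 0.25],
              [0.05, 0.25]])

mu_X = (x * p.sum(axis=1)).sum()                 # E[X]
mu_Y = (y * p.sum(axis=0)).sum()                 # E[Y]

xx, yy = np.meshgrid(x, y, indexing="ij")
E_XY = (xx * yy * p).sum()                       # E[XY] = ΣΣ x·y·p(x, y)

cov_def = ((xx - mu_X) * (yy - mu_Y) * p).sum()  # definition
cov_short = E_XY - mu_X * mu_Y                   # shortcut formula
print(cov_def, cov_short)                        # both 0.0675
```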
Correlation(相关性)
- The correlation coefficient(相关系数) of X and Y, denoted by Corr(X, Y), $\rho_{X,Y}$, or just $\rho$, is defined by
  $$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y}$$
- If a and c are either both positive or both negative, then Corr(aX + b, cY + d) = Corr(X, Y).
- For any two rv's X and Y, −1 ≤ Corr(X, Y) ≤ 1.
- If X and Y are independent, then ρ = 0; but ρ = 0 does not imply independence.
- ρ = 1 or −1 iff Y = aX + b for some numbers a and b with a ≠ 0.
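A simulation-based sketch (the distributions and coefficients are assumptions) of two of the properties above: scale/shift invariance of Corr, and |ρ| = 1 for an exact linear relationship:

```python
# Check Corr(aX + b, cY + d) = Corr(X, Y) for a, c > 0, and ρ = −1 when
# Y is an exactly linear (decreasing) function of X.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)      # correlated but not perfectly

rho = np.corrcoef(x, y)[0, 1]
rho_scaled = np.corrcoef(3 * x + 7, 2 * y - 1)[0, 1]
print(rho, rho_scaled)                      # equal (up to floating point)

print(np.corrcoef(x, -4 * x + 2)[0, 1])     # exactly linear: ρ = −1
```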
Statistics(统计量) and Their Distributions
Consider selecting different samples of size n from the same population distribution. Because of uncertainty, before the data becomes available we view each observation in a sample as a random variable and denote the sample by $X_1, X_2, \dots, X_n$.
The variation in observed sample values in turn implies that the value of any function of the sample observations, such as the sample mean, sample standard deviation, or sample fourth spread, also varies from sample to sample.
A statistic(统计量) is any quantity whose value can be calculated from sample data.
- Prior to obtaining data, there is uncertainty as to what value any particular statistic will take. Therefore, a statistic is a random variable.
- A statistic will be denoted by an uppercase letter; a lowercase letter is used to represent the calculated or observed value of the statistic.
- The probability distribution of a statistic is sometimes referred to as its sampling distribution(抽样分布). It describes how the statistic varies in value across all samples that might be selected.
Random Samples(随机样本)
- The probability distribution of any particular statistic depends not only on the population distribution (normal, uniform, etc.) and the sample size n but also on the method of sampling.
- A sampling method often encountered (at least approximately) in practice is random sampling(随机抽样).
- The rv's $X_1, X_2, \dots, X_n$ are said to form a (simple) random sample(随机样本) of size n if
  1. the $X_i$'s are independent rv's;
  2. every $X_i$ has the same probability distribution.
  When conditions 1 and 2 are satisfied, we say that the $X_i$'s are independent and identically distributed (iid, 独立同分布).
- What's more:
- Sampling with replacement or from an infinite population is random sampling.
- Sampling without replacement from a finite population is not, strictly speaking, random sampling. But if the sample size n is much smaller than the population size N (n/N ≤ 0.05), it is approximately random sampling.
Deriving the Sampling Distribution of a Statistic
- Probability rules can be used to obtain the distribution of a statistic provided that it is a "fairly simple" function of the Xi's and either there are relatively few different X values in the population or else the population distribution has a "nice" form.
Simulation Experiments
- This method is usually used when a derivation via probability rules is too difficult or complicated to be carried out. Such an experiment is virtually always done with the aid of a computer.
- The following characteristics of an experiment must be specified:
- The statistic of interest
- The population distribution
- The sample size n
- The number of replications k
- The larger the value of k, the better the approximation will tend to be. In practice, k = 500 or 1000 is usually enough if the statistic is "fairly simple". A sketch of such an experiment follows below.
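A minimal sketch of such an experiment; the four choices (statistic = sample mean, population = exponential with mean 10, n = 25, k = 1000) are assumptions made for illustration:

```python
# Simulate the sampling distribution of the sample mean: k replications,
# each drawing a sample of size n and recording the statistic's value.
import numpy as np

rng = np.random.default_rng(42)
n, k = 25, 1000

xbars = np.array([rng.exponential(scale=10, size=n).mean() for _ in range(k)])

# The k observed values approximate the sampling distribution of the mean
print(xbars.mean())   # ≈ μ = 10
print(xbars.std())    # ≈ σ/√n = 10/5 = 2
```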
The Distribution of the Sample Mean ☆★
Let $X_1, X_2, \dots, X_n$ be a random sample from a distribution with mean value $\mu$ and standard deviation $\sigma$. Then
$$E(\bar{X}) = \mu_{\bar{X}} = \mu, \quad V(\bar{X}) = \sigma^2_{\bar{X}} = \frac{\sigma^2}{n}, \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
- In addition, with $T_0 = X_1 + \dots + X_n$ (the sample total),
  $$E(T_0) = n\mu, \quad V(T_0) = n\sigma^2, \quad \sigma_{T_0} = \sqrt{n}\,\sigma$$
The Case of a Normal Population Distribution
- Let $X_1, X_2, \dots, X_n$ be a random sample from a normal distribution with mean $\mu$ and standard deviation $\sigma$. Then for any n, $\bar{X}$ is normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, as is $T_0$ with mean $n\mu$ and standard deviation $\sqrt{n}\,\sigma$.
The Central Limit Theorem - CLT(中心极限定理)
- Let $X_1, X_2, \dots, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2$. Then if n is sufficiently large, $\bar{X}$ has approximately a normal distribution with $\mu_{\bar{X}} = \mu$ and $\sigma^2_{\bar{X}} = \sigma^2/n$, and $T_0$ also has approximately a normal distribution with $\mu_{T_0} = n\mu$ and $\sigma^2_{T_0} = n\sigma^2$. The larger the value of n, the better the approximation.
- Rule of thumb: if n > 30, the Central Limit Theorem can be used.
- The CLT is applicable whether the variable of interest is discrete or continuous.
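A quick sketch of the CLT at work (population and sizes are assumptions): sample means from a skewed exponential population already behave like the predicted $N(\mu, \sigma^2/n)$ distribution at n = 36:

```python
# Compare an empirical probability for X̄ with the normal prediction Φ(1).
import numpy as np

rng = np.random.default_rng(1)
mu = sigma = 10.0                # exponential(scale=10): mean = sd = 10
n, k = 36, 10_000

xbars = rng.exponential(scale=mu, size=(k, n)).mean(axis=1)

# P(X̄ ≤ μ + σ/√n) should be close to the standard normal value Φ(1) ≈ 0.8413
print((xbars <= mu + sigma / np.sqrt(n)).mean())
```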
Other Applications of the Central Limit Theorem
- The CLT can be used to justify the normal approximation to the binomial distribution.
- Using the approximation only if both np ≥ 10 and n(1−p) ≥ 10 ensures that n is large enough to overcome any skewness in the underlying Bernoulli distribution.
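A sketch comparing an exact binomial probability with its normal approximation (with continuity correction); n = 50 and p = 0.4 are arbitrary choices satisfying both conditions:

```python
# Exact binomial probability vs. normal approximation with continuity correction.
from scipy.stats import binom, norm
import numpy as np

n, p = 50, 0.4                        # n·p = 20 ≥ 10 and n·(1−p) = 30 ≥ 10
mu, sigma = n * p, np.sqrt(n * p * (1 - p))

print(binom.cdf(25, n, p))            # exact P(X ≤ 25), ≈ 0.94
print(norm.cdf(25.5, mu, sigma))      # normal approximation, also ≈ 0.94
```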
- Let $X_1, X_2, \dots, X_n$ be a random sample from a distribution for which only positive values are possible [$P(X_i > 0) = 1$]. Then if n is sufficiently large, the product $Y = X_1 X_2 \cdots X_n$ has approximately a lognormal distribution.
- To verify this, note that
  $$\ln(Y) = \ln(X_1) + \ln(X_2) + \dots + \ln(X_n)$$
  is a sum of iid rv's, so by the CLT $\ln(Y)$ is approximately normal, which is exactly what it means for Y to be approximately lognormal.
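A sketch (assuming a Uniform(0, 1] population) checking this through the logarithm: ln(Y) matches the normal distribution the CLT predicts:

```python
# For Xi ~ Uniform(0, 1], E[ln Xi] = −1 and V(ln Xi) = 1, so ln(Y) ≈ N(−n, n).
import numpy as np

rng = np.random.default_rng(7)
n, k = 50, 10_000

u = 1.0 - rng.random((k, n))     # Uniform on (0, 1], avoids log(0)
logY = np.log(u).sum(axis=1)     # ln(Y) = Σ ln(Xi) for each of k products

print(logY.mean(), logY.std())   # ≈ −50 and ≈ √50 ≈ 7.07
```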
The Distribution of a Linear Combination
Given a collection of n random variables $X_1, X_2, \dots, X_n$ and n numerical constants $a_1, a_2, \dots, a_n$, the rv $Y = a_1 X_1 + \dots + a_n X_n$ is called a linear combination of the $X_i$'s.
- Let $X_1, X_2, \dots, X_n$ have mean values $\mu_1, \mu_2, \dots, \mu_n$, respectively, and variances $\sigma_1^2, \dots, \sigma_n^2$, respectively.
- Whether or not the $X_i$'s are independent,
  $$E\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i E(X_i) = \sum_{i=1}^n a_i \mu_i$$
- If $X_1, X_2, \dots, X_n$ are independent,
  $$V\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i^2 V(X_i) = \sum_{i=1}^n a_i^2 \sigma_i^2$$
  and
  $$\sigma_{a_1 X_1 + \dots + a_n X_n} = \sqrt{a_1^2 \sigma_1^2 + \dots + a_n^2 \sigma_n^2}$$
- For any $X_1, X_2, \dots, X_n$ (independent or not),
  $$V\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n \sum_{j=1}^n a_i a_j \mathrm{Cov}(X_i, X_j)$$
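A sketch verifying the general variance formula, written in matrix form as $a^\top \Sigma a$ with $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j)$; the coefficients and covariance matrix below are assumptions:

```python
# V(Σ aᵢXᵢ) = ΣΣ aᵢaⱼ Cov(Xᵢ, Xⱼ) = aᵀΣa, checked against a simulation.
import numpy as np

rng = np.random.default_rng(3)
a = np.array([2.0, -1.0, 0.5])
Sigma = np.array([[4.0, 1.0, 0.0],     # Cov(Xi, Xj); the Xi's are dependent
                  [1.0, 9.0, -2.0],
                  [0.0, -2.0, 1.0]])

print(a @ Sigma @ a)                   # theoretical variance: 23.25

X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
print((X @ a).var())                   # simulated variance, close to above
```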
The Difference Between Two Random Variables
$E(X_1 - X_2) = E(X_1) - E(X_2)$ and, if $X_1$ and $X_2$ are independent, $V(X_1 - X_2) = V(X_1) + V(X_2)$.
The Case of Normal Random Variables
- If $X_1, X_2, \dots, X_n$ are independent, normally distributed rv's (with possibly different means and/or variances), then any linear combination of the $X_i$'s also has a normal distribution.
- In particular, the difference $X_1 - X_2$ between two independent, normally distributed variables is itself normally distributed.
Copyright © 2015 by Xuan Dai. All rights reserved.