@daixuan1996
2015-01-07T14:40:32.000000Z
Probability & Statistics(概率统计)
Jointly Distributed Random Variables(联合分布随机变量)
The joint probability mass function for two discrete random variables
- Let X and Y be two discrete random variables defined on the sample space Ω of an experiment. The joint probability mass function(联合概率质量函数) $p(x, y)$ is defined for each pair of numbers $(x, y)$ by
  $$p(x, y) = P(X = x \text{ and } Y = y)$$
  Let A be any set consisting of pairs of $(x, y)$ values. Then
  $$P[(X, Y) \in A] = \sum\sum_{(x, y) \in A} p(x, y)$$
- The marginal probability mass functions(边缘概率质量函数) of X and Y, denoted by $p_X(x)$ and $p_Y(y)$ respectively, are given by
  $$p_X(x) = \sum_y p(x, y), \qquad p_Y(y) = \sum_x p(x, y)$$
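A minimal Python sketch of these definitions, using a made-up joint pmf table (the values, and the event A, are assumptions for illustration only):

```python
# Joint pmf stored as a 2-D array; marginals are sums over the other variable.
import numpy as np

# Hypothetical joint pmf p(x, y) for X ∈ {0, 1, 2} (rows) and Y ∈ {0, 1} (columns)
p = np.array([[0.10, 0.15],
              [0.20, 0.25],
              [0.05, 0.25]])

assert np.isclose(p.sum(), 1.0)   # a valid pmf sums to 1

p_X = p.sum(axis=1)  # p_X(x) = sum over y of p(x, y)
p_Y = p.sum(axis=0)  # p_Y(y) = sum over x of p(x, y)

# P[(X, Y) ∈ A] for A = {(x, y) : x + y ≥ 2}: sum p(x, y) over the pairs in A
xs, ys = np.meshgrid([0, 1, 2], [0, 1], indexing="ij")
print(p[xs + ys >= 2].sum())      # 0.55
print(p_X, p_Y)                   # [0.25 0.45 0.3] [0.35 0.65]
```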
The joint probability density function for two continuous random variables
- Let X and Y be two continuous random variables. Then $f(x, y)$ is the joint probability density function(联合概率密度函数) for X and Y if for any two-dimensional set A
  $$P[(X, Y) \in A] = \iint_A f(x, y) \, dx \, dy$$
  In particular, if A is the two-dimensional rectangle $\{(x, y) : a \le x \le b,\ c \le y \le d\}$, then
  $$P[(X, Y) \in A] = P(a \le X \le b,\ c \le Y \le d) = \int_a^b \int_c^d f(x, y) \, dy \, dx$$
  For $f(x, y)$ to be a candidate for a joint pdf, it must satisfy $f(x, y) \ge 0$ and
  $$\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(x, y) \, dx \, dy = 1$$
- The marginal probability density functions(边缘概率密度函数) of X and Y, denoted by $f_X(x)$ and $f_Y(y)$ respectively, are given by
  $$f_X(x) = \int_{-\infty}^{+\infty} f(x, y) \, dy \quad \text{for } -\infty < x < +\infty$$
  $$f_Y(y) = \int_{-\infty}^{+\infty} f(x, y) \, dx \quad \text{for } -\infty < y < +\infty$$
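As a numerical check of these conditions, here is a sketch using the assumed pdf f(x, y) = x + y on the unit square (a common textbook example, not one from the text above); `scipy.integrate.dblquad` performs the double integration:

```python
# Verify the two pdf conditions and compute a rectangle probability numerically.
from scipy.integrate import dblquad

f = lambda y, x: x + y          # dblquad integrates the inner variable (y) first

# Total probability: the integral over the whole support must equal 1
total, _ = dblquad(f, 0, 1, 0, 1)
print(total)                    # ~1.0

# P(0 ≤ X ≤ 0.5, 0 ≤ Y ≤ 0.5) = ∫₀^0.5 ∫₀^0.5 (x + y) dy dx
prob, _ = dblquad(f, 0, 0.5, 0, 0.5)
print(prob)                     # 0.125
```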
Independent Random Variable
- Two random variables X and Y are said to be independent if for every pair of x and y values,
  $$p(x, y) = p_X(x) \cdot p_Y(y) \quad \text{when X and Y are discrete}$$
  $$f(x, y) = f_X(x) \cdot f_Y(y) \quad \text{when X and Y are continuous}$$
  Otherwise, X and Y are said to be dependent.
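A quick sketch of this test in the discrete case (both tables are assumptions): a joint pmf built as the outer product of its marginals passes, while a pmf concentrated on the diagonal fails:

```python
# X and Y are independent exactly when the joint pmf equals the outer
# product of its marginals.
import numpy as np

p_X = np.array([0.3, 0.7])
p_Y = np.array([0.4, 0.6])

p_indep = np.outer(p_X, p_Y)          # joint pmf built as p_X(x)·p_Y(y)
print(np.allclose(p_indep, np.outer(p_indep.sum(axis=1), p_indep.sum(axis=0))))  # True

p_dep = np.array([[0.30, 0.00],       # all mass on the diagonal: dependent
                  [0.00, 0.70]])
print(np.allclose(p_dep, np.outer(p_dep.sum(axis=1), p_dep.sum(axis=0))))        # False
```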
More than two random variables
- If $X_1, X_2, \dots, X_n$ are all discrete rv's, the joint pmf of the variables is the function
  $$p(x_1, x_2, \dots, x_n) = P(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n)$$
- If the variables are continuous, the joint pdf of $X_1, X_2, \dots, X_n$ is the function $f(x_1, x_2, \dots, x_n)$ such that for any n intervals $[a_1, b_1], \dots, [a_n, b_n]$,
  $$P(a_1 \le X_1 \le b_1, \dots, a_n \le X_n \le b_n) = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} f(x_1, \dots, x_n) \, dx_n \cdots dx_1$$
- The rv's $X_1, X_2, \dots, X_n$ are said to be independent if for every subset of the variables and all possible values of these variables, the joint pmf or pdf of the subset is equal to the product of the marginal pmf's or pdf's.
- Equivalently, the rv's $X_1, X_2, \dots, X_n$ are independent iff for all possible values of these variables, the joint pmf or pdf of all n variables is equal to the product of the marginal pmf's or pdf's.
Conditional Distributions
- Let X and Y be two continuous rv's with joint pdf $f(x, y)$ and marginal X pdf $f_X(x)$. Then for any x value for which $f_X(x) > 0$, the conditional probability density function of Y given that X = x is
  $$f_{Y|X}(y \mid x) = \frac{f(x, y)}{f_X(x)}, \quad -\infty < y < \infty$$
- If X and Y are discrete, the conditional probability mass function of Y when X = x is
  $$p_{Y|X}(y \mid x) = P(Y = y \mid X = x) = \frac{P(X = x,\ Y = y)}{P(X = x)} = \frac{p(x, y)}{p_X(x)}$$
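In the discrete case the conditional pmf is just each row of the joint table renormalized by the corresponding marginal; a sketch with a hypothetical table:

```python
# p_{Y|X}(y|x) = p(x, y) / p_X(x): divide each row of the joint pmf by p_X(x).
import numpy as np

p = np.array([[0.10, 0.15],     # rows: x = 0, 1, 2; columns: y = 0, 1
              [0.20, 0.25],
              [0.05, 0.25]])

p_X = p.sum(axis=1)
p_Y_given_X = p / p_X[:, None]  # row x holds the conditional pmf of Y given X = x

print(p_Y_given_X[1])           # p_{Y|X}(·|x=1) = [0.444..., 0.555...]
print(p_Y_given_X.sum(axis=1))  # each conditional pmf sums to 1
```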
Expected Values, Covariance(协方差), and Correlation(相关性)
- Let X and Y be jointly distributed rv's with pmf $p(x, y)$ or pdf $f(x, y)$ according to whether the variables are discrete or continuous. Then the expected value of a function h(X, Y), denoted by $E[h(X, Y)]$ or $\mu_{h(X, Y)}$, is given by
  $$E[h(X, Y)] = \sum_x \sum_y h(x, y) \cdot p(x, y) \quad \text{for discrete X and Y}$$
  $$E[h(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x, y) \cdot f(x, y) \, dx \, dy \quad \text{for continuous X and Y}$$
  A short sketch of this computation follows this list.
- The method of computing $E[h(X_1, \dots, X_n)]$, the expected value of a function $h(X_1, \dots, X_n)$ of n random variables, is similar to that for two random variables.
- If X and Y are independent random variables, then
  $$E[XY] = E[X] \cdot E[Y]$$
  Note that the converse is not true: E[XY] = E[X]E[Y] does not by itself imply that X and Y are independent.
- For any constants a and b, regardless of the relationship between X and Y,
  $$E[aX + bY] = aE[X] + bE[Y]$$
  In general,
  $$E\left[\sum_{i=1}^n a_i X_i\right] = \sum_{i=1}^n a_i E[X_i]$$
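Here is the sketch promised above: computing E[h(X, Y)] over a hypothetical joint pmf, with h(x, y) = max(x, y) chosen arbitrarily for illustration:

```python
# E[h(X, Y)] = ΣΣ h(x, y)·p(x, y) for discrete X and Y.
import numpy as np

x = np.array([0, 1, 2])
y = np.array([0, 1])
p = np.array([[0.10, 0.15],
              [0.20, 0.25],
              [0.05, 0.25]])

xx, yy = np.meshgrid(x, y, indexing="ij")
h = np.maximum(xx, yy)               # h(x, y) evaluated on every (x, y) pair
print((h * p).sum())                 # E[h(X, Y)] = 1.2
```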
Covariance(协方差)
- The covariance between two rv's X and Y is defined as
  $$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \begin{cases} \sum_x \sum_y (x - \mu_X)(y - \mu_Y) \, p(x, y) & \text{X, Y discrete} \\ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y) \, f(x, y) \, dx \, dy & \text{X, Y continuous} \end{cases}$$
- For a strong positive relationship, Cov(X, Y) should be quite positive.
- For a strong negative relationship, Cov(X, Y) should be quite negative.
- If X and Y are not strongly related, positive and negative products will tend to cancel one another, yielding a covariance near 0.
- A useful shortcut formula (verified in the sketch below):
  $$\mathrm{Cov}(X, Y) = E(XY) - \mu_X \mu_Y$$
- When X = Y, the covariance reduces to the variance: Cov(X, X) = V(X).
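A sketch (same hypothetical joint pmf as earlier) computing Cov(X, Y) both from the definition and from the shortcut formula; the two agree:

```python
# Cov(X, Y) from the definition and from E(XY) − μ_X·μ_Y.
import numpy as np

x = np.array([0, 1, 2])
y = np.array([0, 1])
p = np.array([[0.10, 0.15],
              [0.20, 0.25],
              [0.05, 0.25]])

mu_X = (x * p.sum(axis=1)).sum()                 # E[X]
mu_Y = (y * p.sum(axis=0)).sum()                 # E[Y]

xx, yy = np.meshgrid(x, y, indexing="ij")
E_XY = (xx * yy * p).sum()                       # E[XY] = ΣΣ x·y·p(x, y)

cov_def = ((xx - mu_X) * (yy - mu_Y) * p).sum()  # definition
cov_short = E_XY - mu_X * mu_Y                   # shortcut formula
print(cov_def, cov_short)                        # both 0.0675
```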
Correlation(相关性)
- The correlation coefficient(相关系数) of X and Y, denoted by Corr(X, Y), $\rho_{X,Y}$, or just $\rho$, is defined by
  $$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y}$$
- If a and c are either both positive or both negative, then Corr(aX + b, cY + d) = Corr(X, Y).
- For any two rv's X and Y, −1 ≤ Corr(X, Y) ≤ 1.
- If X and Y are independent, then ρ = 0; but ρ = 0 does not imply independence.
- ρ = 1 or −1 iff Y = aX + b for some numbers a and b with a ≠ 0.
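A simulation-based sketch (the distributions and coefficients are assumptions) of two of the properties above: scale/shift invariance of Corr, and |ρ| = 1 for an exact linear relationship:

```python
# Check Corr(aX + b, cY + d) = Corr(X, Y) for a, c > 0, and ρ = −1 when
# Y is an exactly linear (decreasing) function of X.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)      # correlated but not perfectly

rho = np.corrcoef(x, y)[0, 1]
rho_scaled = np.corrcoef(3 * x + 7, 2 * y - 1)[0, 1]
print(rho, rho_scaled)                      # equal (up to floating point)

print(np.corrcoef(x, -4 * x + 2)[0, 1])     # exactly linear: ρ = −1
```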
Statistics(统计量) and Their Distributions
Consider selecting different samples of size n from the same population distribution. Because of uncertainty, before the data becomes available we view each observation in a sample as a random variable and denote the sample by $X_1, X_2, \dots, X_n$.
The variation in observed sample values in turn implies that the value of any function of the sample observations, such as the sample mean, sample standard deviation, or sample fourth spread, also varies from sample to sample.
A statistic(统计量) is any quantity whose value can be calculated from sample data.
- Prior to obtaining data, there is uncertainty as to what value any particular statistic will take. Therefore, a statistic is a random variable.
- A statistic will be denoted by an uppercase letter; a lowercase letter is used to represent the calculated or observed value of the statistic.
- The probability distribution of a statistic is sometimes referred to as its sampling distribution(抽样分布). It describes how the statistic varies in value across all samples that might be selected.
Random Samples(随机样本)
- The probability distribution of any particular statistic depends not only on the population distribution (normal, uniform, etc.) and the sample size n but also on the method of sampling.
- A sampling method often encountered (at least approximately) in practice is random sampling(随机抽样).
- The rv's $X_1, X_2, \dots, X_n$ are said to form a (simple) random sample(随机样本) of size n if
  1. the $X_i$'s are independent rv's;
  2. every $X_i$ has the same probability distribution.
  When conditions 1 and 2 are satisfied, we say that the $X_i$'s are independent and identically distributed (iid, 独立同分布).
- What's more:
- Sampling with replacement or from an infinite population is random sampling.
- Sampling without replacement from a finite population is not, strictly speaking, random sampling. But if the sample size n is much smaller than the population size N (n/N ≤ 0.05), it is approximately random sampling.
Deriving the Sampling Distribution of a Statistic
- Probability rules can be used to obtain the distribution of a statistic provided that it is a "fairly simple" function of the Xi's and either there are relatively few different X values in the population or else the population distribution has a "nice" form.
Simulation Experiments
- This method is usually used when a derivation via probability rules is too difficult or complicated to be carried out. Such an experiment is virtually always done with the aid of a computer.
- The following characteristics of an experiment must be specified:
- The statistic of interest
- The population distribution
- The sample size n
- The number of replications k
- The larger the value of k, the better the approximation will tend to be. In practice, k = 500 or 1000 is usually enough if the statistic is "fairly simple". A sketch of such an experiment follows below.
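A minimal sketch of such an experiment; the four choices (statistic = sample mean, population = exponential with mean 10, n = 25, k = 1000) are assumptions made for illustration:

```python
# Simulate the sampling distribution of the sample mean: k replications,
# each drawing a sample of size n and recording the statistic's value.
import numpy as np

rng = np.random.default_rng(42)
n, k = 25, 1000

xbars = np.array([rng.exponential(scale=10, size=n).mean() for _ in range(k)])

# The k observed values approximate the sampling distribution of the mean
print(xbars.mean())   # ≈ μ = 10
print(xbars.std())    # ≈ σ/√n = 10/5 = 2
```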
The Distribution of the Sample Mean ☆★
Let $X_1, X_2, \dots, X_n$ be a random sample from a distribution with mean value $\mu$ and standard deviation $\sigma$. Then
$$E(\bar{X}) = \mu_{\bar{X}} = \mu, \quad V(\bar{X}) = \sigma^2_{\bar{X}} = \frac{\sigma^2}{n}, \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
- In addition, with $T_0 = X_1 + \dots + X_n$ (the sample total),
  $$E(T_0) = n\mu, \quad V(T_0) = n\sigma^2, \quad \sigma_{T_0} = \sqrt{n}\,\sigma$$
The Case of a Normal Population Distribution
- Let $X_1, X_2, \dots, X_n$ be a random sample from a normal distribution with mean $\mu$ and standard deviation $\sigma$. Then for any n, $\bar{X}$ is normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, as is $T_0$ with mean $n\mu$ and standard deviation $\sqrt{n}\,\sigma$.
The Central Limit Theorem - CLT(中心极限定理)
- Let $X_1, X_2, \dots, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2$. Then if n is sufficiently large, $\bar{X}$ has approximately a normal distribution with $\mu_{\bar{X}} = \mu$ and $\sigma^2_{\bar{X}} = \sigma^2/n$, and $T_0$ also has approximately a normal distribution with $\mu_{T_0} = n\mu$ and $\sigma^2_{T_0} = n\sigma^2$. The larger the value of n, the better the approximation.
- Rule of thumb: if n > 30, the Central Limit Theorem can be used.
- The CLT is applicable whether the variable of interest is discrete or continuous.
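A quick sketch of the CLT at work (population and sizes are assumptions): sample means from a skewed exponential population already behave like the predicted $N(\mu, \sigma^2/n)$ distribution at n = 36:

```python
# Compare an empirical probability for X̄ with the normal prediction Φ(1).
import numpy as np

rng = np.random.default_rng(1)
mu = sigma = 10.0                # exponential(scale=10): mean = sd = 10
n, k = 36, 10_000

xbars = rng.exponential(scale=mu, size=(k, n)).mean(axis=1)

# P(X̄ ≤ μ + σ/√n) should be close to the standard normal value Φ(1) ≈ 0.8413
print((xbars <= mu + sigma / np.sqrt(n)).mean())
```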
Other Applications of the Central Limit Theorem
- The CLT can be used to justify the normal approximation to the binomial distribution.
- Using the approximation only if both np ≥ 10 and n(1−p) ≥ 10 ensures that n is large enough to overcome any skewness in the underlying Bernoulli distribution.
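A sketch comparing an exact binomial probability with its normal approximation (with continuity correction); n = 50 and p = 0.4 are arbitrary choices satisfying both conditions:

```python
# Exact binomial probability vs. normal approximation with continuity correction.
from scipy.stats import binom, norm
import numpy as np

n, p = 50, 0.4                        # n·p = 20 ≥ 10 and n·(1−p) = 30 ≥ 10
mu, sigma = n * p, np.sqrt(n * p * (1 - p))

print(binom.cdf(25, n, p))            # exact P(X ≤ 25), ≈ 0.94
print(norm.cdf(25.5, mu, sigma))      # normal approximation, also ≈ 0.94
```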
- Let $X_1, X_2, \dots, X_n$ be a random sample from a distribution for which only positive values are possible [$P(X_i > 0) = 1$]. Then if n is sufficiently large, the product $Y = X_1 X_2 \cdots X_n$ has approximately a lognormal distribution.
- To verify this, note that
  $$\ln(Y) = \ln(X_1) + \ln(X_2) + \dots + \ln(X_n)$$
  is a sum of iid rv's, so by the CLT $\ln(Y)$ is approximately normal, which is exactly what it means for Y to be approximately lognormal.
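A sketch (assuming a Uniform(0, 1] population) checking this through the logarithm: ln(Y) matches the normal distribution the CLT predicts:

```python
# For Xi ~ Uniform(0, 1], E[ln Xi] = −1 and V(ln Xi) = 1, so ln(Y) ≈ N(−n, n).
import numpy as np

rng = np.random.default_rng(7)
n, k = 50, 10_000

u = 1.0 - rng.random((k, n))     # Uniform on (0, 1], avoids log(0)
logY = np.log(u).sum(axis=1)     # ln(Y) = Σ ln(Xi) for each of k products

print(logY.mean(), logY.std())   # ≈ −50 and ≈ √50 ≈ 7.07
```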
The Distribution of a Linear Combination
Given a collection of n random variables $X_1, X_2, \dots, X_n$ and n numerical constants $a_1, a_2, \dots, a_n$, the rv $Y = a_1 X_1 + \dots + a_n X_n$ is called a linear combination of the $X_i$'s.
- Let $X_1, X_2, \dots, X_n$ have mean values $\mu_1, \mu_2, \dots, \mu_n$, respectively, and variances $\sigma_1^2, \dots, \sigma_n^2$, respectively.
- Whether or not the $X_i$'s are independent,
  $$E\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i E(X_i) = \sum_{i=1}^n a_i \mu_i$$
- If $X_1, X_2, \dots, X_n$ are independent,
  $$V\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n a_i^2 V(X_i) = \sum_{i=1}^n a_i^2 \sigma_i^2$$
  and
  $$\sigma_{a_1 X_1 + \dots + a_n X_n} = \sqrt{a_1^2 \sigma_1^2 + \dots + a_n^2 \sigma_n^2}$$
- For any $X_1, X_2, \dots, X_n$ (independent or not),
  $$V\left(\sum_{i=1}^n a_i X_i\right) = \sum_{i=1}^n \sum_{j=1}^n a_i a_j \mathrm{Cov}(X_i, X_j)$$
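A sketch verifying the general variance formula, written in matrix form as $a^\top \Sigma a$ with $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j)$; the coefficients and covariance matrix below are assumptions:

```python
# V(Σ aᵢXᵢ) = ΣΣ aᵢaⱼ Cov(Xᵢ, Xⱼ) = aᵀΣa, checked against a simulation.
import numpy as np

rng = np.random.default_rng(3)
a = np.array([2.0, -1.0, 0.5])
Sigma = np.array([[4.0, 1.0, 0.0],     # Cov(Xi, Xj); the Xi's are dependent
                  [1.0, 9.0, -2.0],
                  [0.0, -2.0, 1.0]])

print(a @ Sigma @ a)                   # theoretical variance: 23.25

X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
print((X @ a).var())                   # simulated variance, close to above
```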
The Difference Between Two Random Variables
$E(X_1 - X_2) = E(X_1) - E(X_2)$ and, if $X_1$ and $X_2$ are independent, $V(X_1 - X_2) = V(X_1) + V(X_2)$.
The Case of Normal Random Variables
- If $X_1, X_2, \dots, X_n$ are independent, normally distributed rv's (with possibly different means and/or variances), then any linear combination of the $X_i$'s also has a normal distribution.
- In particular, the difference $X_1 - X_2$ between two independent, normally distributed variables is itself normally distributed.
Copyright © 2015 by Xuan Dai. All rights reserved.