@daixuan1996
2014-12-25T00:42:00.000000Z
Probability and Statistics
Populations, Samples, Processes
- A population is the set of all elements of interest in a particular study.
- A sample is a subset of the population.
- Data are the facts and figures that are collected, analyzed, and summarized for presentation and interpretation.
- The elements are the entities on which data are collected.
- A variable is a characteristic of interest for the elements.
- The set of measurements collected for a particular element is called an observation.
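
As a small illustration of these terms, here is a sketch in Python. The student records, field names, and values are all invented for this example. In practice a sample would usually be drawn at random rather than by taking the first few elements.

```python
# Hypothetical data: every name and number here is invented for illustration.
population = [
    {"name": "Ann", "height_cm": 162, "major": "Math"},
    {"name": "Bo", "height_cm": 175, "major": "Physics"},
    {"name": "Cy", "height_cm": 168, "major": "Math"},
    {"name": "Di", "height_cm": 181, "major": "CS"},
]

# Each dict describes one element; "name" identifies the element,
# and the remaining keys are the variables measured on it.
variables = ["height_cm", "major"]

# A sample is a subset of the population (here simply the first two elements).
sample = population[:2]

# The observation for one element is the set of its measurements.
observation = {v: sample[0][v] for v in variables}
```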
Pictorial and Tabular Methods in Descriptive Statistics
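- Notation, e.g. {1, 2, 3}
- Stem-and-Leaf Display
    When drawing a stem-and-leaf display, state what the stem and the leaf stand for in the upper right corner.
- Dotplots: suitable for small data sets
- Histograms
    - Discrete
        frequency (count) and relative frequency (proportion of observations)
    - Continuous
        class frequency and class relative frequency
        $\text{number of classes} \approx \sqrt{\text{number of observations}}$
- Histogram shapes ($\mu$ is the mean, $\tilde{\mu}$ the median)
    - symmetric ($\mu = \tilde{\mu}$)
    - positively skewed (typically $\mu > \tilde{\mu}$)
    - negatively skewed (typically $\mu < \tilde{\mu}$)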
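
The frequency ideas above can be sketched in Python. This is a minimal sketch, not a plotting routine: the function names are made up for this note, and the equal-width classes with the largest value placed in the last class are an assumption, since the notes only give the rule of thumb for the number of classes.

```python
import math
from collections import Counter

def relative_frequencies(values):
    """Relative frequency of each distinct value in discrete data."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in counts.items()}

def class_frequencies(values):
    """Class frequencies for continuous data, using about sqrt(n) classes.

    Assumption: equal-width classes spanning min(values)..max(values).
    """
    n = len(values)
    k = max(1, round(math.sqrt(n)))
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * k
    for v in values:
        # The largest value is put in the last class, not a (k+1)-th one.
        index = min(int((v - lo) / width), k - 1)
        counts[index] += 1
    return counts
```

Dividing each class frequency by the number of observations gives the class relative frequencies.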
Measures of Location
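- Mean
    $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
- Median
    $\tilde{x}$
- Trimmed Means
    e.g. $\bar{x}_{tr(10)}$ is the mean of the middle 80% of the data, after dropping the smallest 10% and the largest 10% of the observations.
- Quartiles and Percentiles
    These work like the median, only with more cut points: the median splits the ordered data into two equal parts, quartiles into four, and percentiles into one hundred. For example, the 99th percentile is the value that separates the smallest 99% of the data from the largest 1%.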
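
The measures of location can be tried out with Python's standard `statistics` module. This is a minimal sketch: `trimmed_mean` is a helper written for this note (rounding the number of dropped observations down is an assumption), the data are made up, and `statistics.quantiles` needs Python 3.8 or later. Quartile conventions differ between textbooks and libraries, so the outer cut points may not match a hand calculation.

```python
import statistics

def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the observations from each end."""
    if not 0 <= percent < 50:
        raise ValueError("percent must be in [0, 50)")
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)  # dropped per end, rounded down
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Made-up data with one unusually large value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 45]

mean = statistics.mean(data)                 # pulled upward by the value 45
median = statistics.median(data)             # middle of the ordered data
tr10 = trimmed_mean(data, 10)                # drops the 2 and the 45
quartiles = statistics.quantiles(data, n=4)  # three cut points
```

Here the mean (10) sits well above the median (6.5), while the 10% trimmed mean (6.625) stays close to the median because the extreme value has been dropped.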
Measures of Variability
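- Range: the largest value minus the smallest value
- Sample Variance
    $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{S_{xx}}{n-1}$
    Dividing by $n-1$ rather than $n$ makes $s^2$ an unbiased estimator of the population variance.
    $S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}$
- Sample Standard Deviation
    $s = \sqrt{s^2}$
- Population Variance
    $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
- Population Standard Deviation
    $\sigma = \sqrt{\sigma^2}$
- Boxplots: describe several prominent features of a data set
    - center
    - spread
    - extent and nature of any departure from symmetry
    - identification of "outliers"

    $f_s = \text{upper fourth} - \text{lower fourth}$
    An observation farther than $1.5 f_s$ from its nearest fourth is an outlier. An outlier is extreme if it is more than $3 f_s$ from the nearest fourth, and mild if it lies between $1.5 f_s$ and $3 f_s$ from it.
    A boxplot marks, from one end to the other: smallest x, lower fourth, median, upper fourth, largest x, and any outliers.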
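
The variance shortcut and the outlier rule can be checked with a short Python sketch. The function names are made up for this note. The definition of the fourths used here (medians of the lower and upper halves of the ordered data, with the overall median counted in both halves when n is odd) is an assumption, since the notes do not spell it out.

```python
import statistics

def s_xx(values):
    """Shortcut form: sum of x_i^2 minus (sum of x_i)^2 / n."""
    n = len(values)
    return sum(x * x for x in values) - sum(values) ** 2 / n

def sample_variance(values):
    """s^2 = S_xx / (n - 1)."""
    return s_xx(values) / (len(values) - 1)

def fourths(values):
    """Lower and upper fourths (assumed definition, see the lead-in)."""
    ordered = sorted(values)
    half = (len(ordered) + 1) // 2  # includes the median when n is odd
    return statistics.median(ordered[:half]), statistics.median(ordered[-half:])

def classify(values):
    """Label each observation 'ok', 'mild' or 'extreme', in input order."""
    lower, upper = fourths(values)
    fs = upper - lower  # fourth spread
    labels = []
    for x in values:
        # Distance beyond the nearest fourth; 0 for values between the fourths.
        distance = max(lower - x, x - upper, 0)
        if distance > 3 * fs:
            labels.append("extreme")
        elif distance > 1.5 * fs:
            labels.append("mild")
        else:
            labels.append("ok")
    return labels
```

For the made-up data `[2, 4, 4, 5, 6, 7, 8, 9, 10, 20, 45]` the fourths are 4.5 and 9.5, so $f_s = 5$: the value 20 is a mild outlier and 45 an extreme one. The shortcut form of $S_{xx}$ is convenient by hand, but in floating point it can lose precision when the mean is large relative to the spread.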
Copyright © 2014 by Xuan Dai. All rights reserved.