@Channelchan
2018-01-30T17:55:30.000000Z
Regression Parameters
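Use Orange to explore the 1_UP_DOWN dataset and build a multi-model machine-learning performance comparison. Evaluate which model and parameters work best for this data, then output the full classification performance sorted by precision and submit a screenshot.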
Use DataView to read the daily close_adj, pe, pb, ps and pcf factors for a single stock (000001.SZ), with start date 2017-01-03 and end date 2017-12-29.
             close_adj      pb      pcf      pe      ps
2017-01-03  959.585597  0.8822  -0.6374  7.1933  1.6356
2017-01-04  959.585597  0.8822  -0.6374  7.1933  1.6356
2017-01-05  960.633180  0.8832  -0.6381  7.2011  1.6374
2017-01-06  956.442850  0.8793  -0.6353  7.1697  1.6302
2017-01-09  958.538015  0.8813  -0.6367  7.1854  1.6338
Use DataView to read all stocks together with their daily close_adj, pe, pb, ps and pcf factors, convert the index to datetime format, build a Panel, and convert it to a DataFrame for display.
                         close_adj      pb        pcf        pe       ps
major       symbol
2017-01-03  000001.SZ   959.585597  0.8822    -0.6374    7.1933   1.6356
            000002.SZ  2752.580748  2.2792     3.5916   12.6296   1.1703
            000008.SZ   205.675439  4.3392  -323.3723  141.3076  20.2589
            000009.SZ    76.551081  4.9545    53.6505   28.4941   4.6209
            000012.SZ   200.848529  2.9722    11.8065   37.8691   3.1838
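A minimal sketch of the reshaping step, assuming the data arrives as one wide table per factor with integer trade dates such as 20170103 as the index. The values are synthetic stand-ins for real DataView output, and `to_long_frame` is a hypothetical helper name. Because `pd.Panel` is no longer available in current pandas releases, the sketch builds the (major, symbol) MultiIndex DataFrame directly instead of going through a Panel.

```python
import numpy as np
import pandas as pd

FIELDS = ["close_adj", "pb", "pcf", "pe", "ps"]

# Synthetic stand-in for the real data: one wide frame per factor,
# integer trade dates as the index and symbols as the columns.
rng = np.random.default_rng(0)
int_dates = [20170103, 20170104, 20170105]
symbols = ["000001.SZ", "000002.SZ"]
wide_by_field = {
    field: pd.DataFrame(rng.random((len(int_dates), len(symbols))),
                        index=int_dates, columns=symbols)
    for field in FIELDS
}


def to_long_frame(wide_frames):
    """Stack per-factor wide frames into one (major, symbol) indexed frame."""
    stacked = {}
    for field, frame in wide_frames.items():
        frame = frame.copy()
        # 20170103 -> Timestamp("2017-01-03")
        frame.index = pd.to_datetime(frame.index.astype(str), format="%Y%m%d")
        stacked[field] = frame.stack()
    long_frame = pd.DataFrame(stacked)
    long_frame.index.names = ["major", "symbol"]
    return long_frame.sort_index()


long_df = to_long_frame(wide_by_field)
print(long_df.head())
```

Each row of `long_df` is one (date, symbol) pair and each column is one factor, which is the layout shown in the sample output.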
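Adding data and resampling: resample the multi-indexed table obtained in task 3 with resample().last to reduce the data to weekly frequency, add a weekly return column W_Return on the column axis, then add a column that is 1 for up and -1 for down.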
                         close_adj      pb        pcf        pe       ps \
major       symbol
2017-01-08  000001.SZ   956.442850  0.8793    -0.6353    7.1697   1.6302
            000002.SZ  2740.630325  2.2693     3.5761   12.5748   1.1652
            000008.SZ   208.758369  4.4042  -328.2194  143.4257  20.5625
            000009.SZ    76.917004  4.9782    53.9070   28.6303   4.6430
            000012.SZ   200.496163  2.9670    11.7858   37.8026   3.1783
                        W_Return  Classify
major       symbol
2017-01-08  000001.SZ   0.003286       1.0
            000002.SZ   0.056686       1.0
            000008.SZ  -0.028481      -1.0
            000009.SZ  -0.041865      -1.0
            000012.SZ   0.051845       1.0
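One possible way to do the weekly step is sketched below on synthetic data. It groups by week and symbol with `pd.Grouper`, since a plain resample() would not keep the symbols separate on a MultiIndex frame. Two points are assumptions rather than something the task states: W_Return is taken as the forward (next-week) return, because the sample output already has a value for the first week, and a flat week is labelled -1. `to_weekly` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Synthetic daily panel: AAA rises by 1 per day, BBB falls by 1 per day.
dates = pd.bdate_range("2017-01-03", "2017-01-20")
index = pd.MultiIndex.from_product([dates, ["AAA", "BBB"]],
                                   names=["major", "symbol"])
steps = np.repeat(np.arange(len(dates)), 2)
close = np.where(index.get_level_values("symbol") == "AAA",
                 100.0 + steps, 200.0 - steps)
daily = pd.DataFrame({"close_adj": close, "pe": 10.0}, index=index)


def to_weekly(daily_frame):
    """Keep the last daily row of each week per symbol, then label the week."""
    weekly = daily_frame.groupby(
        [pd.Grouper(level="major", freq="W"), pd.Grouper(level="symbol")]
    ).last()
    week_close = weekly["close_adj"]
    # Assumption: forward return, i.e. next week's close over this week's close.
    forward = week_close.groupby(level="symbol").shift(-1) / week_close - 1
    # The final week has no next week, so its return is NaN and is dropped.
    weekly = weekly.assign(W_Return=forward).dropna(subset=["W_Return"])
    # Assumption: a return of exactly zero is labelled -1.
    return weekly.assign(
        Classify=np.where(weekly["W_Return"] > 0, 1.0, -1.0)
    )


weekly = to_weekly(daily)
print(weekly)
```

The week labels are week-ending Sundays, which is why the first label in the sample output is 2017-01-08 although the data starts on 2017-01-03.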
Classification with a Decision Tree
TrainTest/Processing
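Set X and y: X = ['pe', 'pb', 'ps', 'pcf'], y = ['Classify']. Split the data with TrainTestSplit without shuffling the order, then import minmax_scale from sklearn.preprocessing and use the pandas groupby method to scale the data on each time cross-section, e.g. DataFrame().groupby(level=0).transform(minmax_scale).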
                             pe        pb        ps       pcf
major       symbol
2017-01-08  000001.SZ  0.000598  0.025787  0.001830  0.954595
            000002.SZ  0.001048  0.066552  0.001249  0.954708
            000008.SZ  0.011953  0.129161  0.025501  0.945778
            000009.SZ  0.002386  0.145995  0.005597  0.956063
            000012.SZ  0.003150  0.087013  0.003766  0.954929
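The split and the cross-sectional scaling could look roughly like this. The panel is synthetic, `test_size=0.2` is an arbitrary choice for the illustration, and `scale_cross_section` is a hypothetical helper. It wraps `minmax_scale` so that each factor is scaled to [0, 1] within every date.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import minmax_scale

FEATURES = ["pe", "pb", "ps", "pcf"]

# Synthetic weekly panel: 10 weeks x 5 symbols, random factors and labels.
rng = np.random.default_rng(0)
weeks = pd.date_range("2017-01-08", periods=10, freq="W")
symbols = [f"SYM{i}" for i in range(5)]
index = pd.MultiIndex.from_product([weeks, symbols], names=["major", "symbol"])
data = pd.DataFrame(rng.normal(size=(len(index), 4)),
                    index=index, columns=FEATURES)
data["Classify"] = np.where(rng.random(len(index)) > 0.5, 1.0, -1.0)

X, y = data[FEATURES], data["Classify"]

# shuffle=False keeps chronological order: the earliest 80% of rows train.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)


def scale_cross_section(features):
    """Min-max scale every factor within each date (index level 0)."""
    return features.groupby(level=0).transform(
        # Wrapping the array in a Series keeps the row labels explicit.
        lambda col: pd.Series(minmax_scale(col), index=col.index)
    )


X_train_scaled = scale_cross_section(X_train)
X_test_scaled = scale_cross_section(X_test)
print(X_train_scaled.head())
```

Because the scaling only uses values from the same date, it does not carry information from later weeks into earlier ones.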
Plot
Plot the tree diagram and the feature-importance chart for a DecisionTree with max_depth=3.
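As a text-only counterpart to the two plots, the sketch below fits a depth-3 tree on synthetic data, prints its structure with `export_text`, and ranks the feature importances. The labelling rule (up when pe is low) is invented purely so the tree has something to find.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

FEATURES = ["pe", "pb", "ps", "pcf"]

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.random((200, 4)), columns=FEATURES)
# Synthetic labels: 1 ("up") when pe is below 0.5, otherwise -1 ("down").
y = np.where(X["pe"] < 0.5, 1.0, -1.0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Text rendering of the fitted tree, one line per node.
tree_text = export_text(clf, feature_names=FEATURES)
print(tree_text)

# Feature importances as a labelled Series, most important first.
importances = pd.Series(clf.feature_importances_, index=FEATURES)
importances = importances.sort_values(ascending=False)
print(importances)
```

For the graphical versions, `sklearn.tree.plot_tree(clf, feature_names=FEATURES)` draws the same tree with matplotlib, and `importances.plot.barh()` gives the importance chart.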
Performance
Check the score and classification_report, output the predictions as a Series, and print the stocks predicted as 1.
Accuracy of Decision Tree classifier on training set: 0.53
Accuracy of Decision Tree classifier on test set: 0.60
             precision    recall  f1-score   support
       Down       0.60      0.98      0.74      3649
         Up       0.53      0.03      0.05      2484
avg / total       0.57      0.60      0.46      6133
major symbol
2017-09-17 603288.SH 1.0
2017-09-24 002027.SZ 1.0
002074.SZ 1.0
002085.SZ 1.0
002174.SZ 1.0
002236.SZ 1.0
002415.SZ 1.0
002466.SZ 1.0
002508.SZ 1.0
002555.SZ 1.0
002568.SZ 1.0
002653.SZ 1.0
002841.SZ 1.0
300033.SZ 1.0
600233.SH 1.0
600519.SH 1.0
603288.SH 1.0
603833.SH 1.0
2017-10-15 000559.SZ 1.0
002027.SZ 1.0
002085.SZ 1.0
002174.SZ 1.0
002236.SZ 1.0
002415.SZ 1.0
002466.SZ 1.0
002468.SZ 1.0
002508.SZ 1.0
002568.SZ 1.0
002653.SZ 1.0
002841.SZ 1.0
...
2017-12-03 002508.SZ 1.0
002841.SZ 1.0
300033.SZ 1.0
600887.SH 1.0
2017-12-10 002027.SZ 1.0
002236.SZ 1.0
002294.SZ 1.0
002466.SZ 1.0
002508.SZ 1.0
002555.SZ 1.0
002841.SZ 1.0
300033.SZ 1.0
600887.SH 1.0
2017-12-17 002027.SZ 1.0
002236.SZ 1.0
002294.SZ 1.0
002466.SZ 1.0
002508.SZ 1.0
002555.SZ 1.0
002841.SZ 1.0
300033.SZ 1.0
600887.SH 1.0
2017-12-24 002027.SZ 1.0
002236.SZ 1.0
002294.SZ 1.0
002466.SZ 1.0
002508.SZ 1.0
002841.SZ 1.0
300033.SZ 1.0
600887.SH 1.0
Length: 125, dtype: float64
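The performance step can be sketched as follows, again on a synthetic weekly panel rather than the real factor data. The class names "Down" and "Up" follow the report above. The noisy labelling rule and the 75/25 split are assumptions made for the illustration.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["pe", "pb", "ps", "pcf"]

# Synthetic weekly panel: 12 weeks x 6 symbols.
rng = np.random.default_rng(0)
weeks = pd.date_range("2017-01-08", periods=12, freq="W")
symbols = [f"SYM{i}" for i in range(6)]
index = pd.MultiIndex.from_product([weeks, symbols], names=["major", "symbol"])
X = pd.DataFrame(rng.random((len(index), 4)), index=index, columns=FEATURES)
# Invented rule plus noise, so the classifier is imperfect.
noisy = X["pe"] + 0.3 * rng.normal(size=len(index))
y = pd.Series(np.where(noisy > 0.5, 1.0, -1.0), index=index, name="Classify")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
print(f"Accuracy of Decision Tree classifier on training set: {train_acc:.2f}")
print(f"Accuracy of Decision Tree classifier on test set: {test_acc:.2f}")

# Predictions as a Series that keeps the (major, symbol) index.
pred = pd.Series(clf.predict(X_test), index=X_test.index, name="prediction")
report = classification_report(
    y_test, pred, labels=[-1.0, 1.0], target_names=["Down", "Up"],
    zero_division=0,
)
print(report)

# Rows predicted as "up".
picked = pred[pred == 1.0]
print(picked)
```

Keeping the MultiIndex on `pred` is what makes the final filter print dates and symbols, as in the listing above.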
TimeSeriesSplit
Apply TimeSeriesSplit to a single stock, 000001.SZ, with:
X_reg = df[['pe', 'pb', 'ps', 'pcf']]
y_reg = df['close_adj']
Set max_train_size=5 and output train_index.
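A small sketch of the split on synthetic data. The 20 rows and `n_splits=3` are assumptions chosen to keep the printout short. Only `max_train_size=5` comes from the task.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Synthetic single-stock frame with the same column names as the task.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2017-01-03", periods=20)
df = pd.DataFrame(rng.random((20, 5)), index=dates,
                  columns=["pe", "pb", "ps", "pcf", "close_adj"])

X_reg = df[["pe", "pb", "ps", "pcf"]]
y_reg = df["close_adj"]

# max_train_size=5 caps every training window at the 5 most recent rows.
tscv = TimeSeriesSplit(n_splits=3, max_train_size=5)

train_indices = []
for train_index, test_index in tscv.split(X_reg):
    train_indices.append(train_index)
    print("train_index:", train_index, "test_index:", test_index)
```

Each training window ends right before its test window, so the model never sees rows from the future of the period it is evaluated on.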
Preprocessing
Apply MinMaxScaler() to the X_train obtained from TimeSeriesSplit and output X_train_scaled.
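The scaling of each training window might look like this. The data is synthetic, and `n_splits=3` is an assumption. The scaler is fitted on the training rows only and then reused for the test rows.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X_reg = rng.random((20, 4)) * 100.0  # synthetic pe, pb, ps, pcf values

tscv = TimeSeriesSplit(n_splits=3, max_train_size=5)

scaled_folds = []
for train_index, test_index in tscv.split(X_reg):
    X_train, X_test = X_reg[train_index], X_reg[test_index]
    scaler = MinMaxScaler()
    # Learn min and max from the training window only.
    X_train_scaled = scaler.fit_transform(X_train)
    # Reuse the training min and max; test values may fall outside [0, 1].
    X_test_scaled = scaler.transform(X_test)
    scaled_folds.append((X_train_scaled, X_test_scaled))
    print(X_train_scaled)
```

Fitting the scaler inside the loop avoids using the range of future rows when scaling earlier ones.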
CV
Optimize with RidgeCV over a parameter range from 0.0001 to 0.1, collect r2 and alpha, and output the mean of r2 and the mean of alpha.
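A hedged sketch of this step: for every TimeSeriesSplit fold, fit `RidgeCV` on the scaled training window, record the chosen `alpha_` and the test-set R², then average both. The synthetic data, `n_splits=5`, and the four log-spaced alpha candidates are assumptions. The task only gives the range 0.0001 to 0.1.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import MinMaxScaler

# Synthetic regression data: 4 factors and a noisy linear target.
rng = np.random.default_rng(0)
X_reg = rng.random((60, 4))
y_reg = X_reg @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=60)

alphas = np.logspace(-4, -1, 4)  # 0.0001, 0.001, 0.01, 0.1
tscv = TimeSeriesSplit(n_splits=5, max_train_size=5)

r2_scores, best_alphas = [], []
for train_index, test_index in tscv.split(X_reg):
    scaler = MinMaxScaler().fit(X_reg[train_index])
    X_train_scaled = scaler.transform(X_reg[train_index])
    X_test_scaled = scaler.transform(X_reg[test_index])

    # RidgeCV picks the alpha with the best internal cross-validation error.
    model = RidgeCV(alphas=alphas).fit(X_train_scaled, y_reg[train_index])
    r2_scores.append(model.score(X_test_scaled, y_reg[test_index]))
    best_alphas.append(model.alpha_)

mean_r2 = float(np.mean(r2_scores))
mean_alpha = float(np.mean(best_alphas))
print("mean r2:", mean_r2)
print("mean alpha:", mean_alpha)
```

With only five training rows per fold the R² values can swing widely, so the mean is better read as a rough indication than as a stable estimate.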
Use PolynomialFeatures with degree=5 to expand X_reg into a higher-dimensional feature space
Set the parameter interaction_only=True
Output the expanded result
Use a Pipeline that combines the degree=5 polynomial expansion with a Lasso(alpha=0.1) model, then instantiate and run it
Fit it on X_reg, y_reg
Compute the score of the pipeline model on X_reg, y_reg
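One way the pipeline could be assembled, shown on synthetic data. The step names "poly" and "lasso" and the `max_iter=10000` setting are assumptions added for the sketch. The degree and alpha come from the task.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic regression data: 4 factors and a noisy linear target.
rng = np.random.default_rng(0)
X_reg = rng.random((100, 4))
y_reg = X_reg @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=100)

pipeline = Pipeline([
    ("poly", PolynomialFeatures(degree=5)),
    # max_iter is raised (an assumption) to give coordinate descent room
    # to converge on the 126 polynomial columns.
    ("lasso", Lasso(alpha=0.1, max_iter=10000)),
])

pipeline.fit(X_reg, y_reg)

# score() on a regression pipeline returns R^2, here on the training data.
score = pipeline.score(X_reg, y_reg)
print(f"R^2 on the training data: {score:.3f}")
```

Since the score is computed on the same rows used for fitting, it describes in-sample fit only.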
Collect all answers into a single compressed archive named (name_group).