@Channelchan 2018-04-19T10:24:42.000000Z · 17,589 characters · 51,801 reads

Factor Preprocessing and Multi-Factor Combination

Contents

  1. Factor comparison and selection
  2. Common factor preprocessing methods: sign adjustment, winsorization, industry/market-cap neutralization, standardization
  3. Multi-factor combination methods
    from jaqs_fxdayu.data import DataView
    import warnings

    warnings.filterwarnings("ignore")
    dataview_folder = './Factor'
    dv = DataView()
    dv.load_dataview(dataview_folder)
    dv.add_formula("momentum", "Return(close_adj, 20)", is_quarterly=False, add_data=True)
Dataview loaded successfully.
symbol 000001.SZ 000002.SZ 000008.SZ 000009.SZ 000012.SZ 000024.SZ 000027.SZ 000039.SZ 000046.SZ 000059.SZ ... 601992.SH 601997.SH 601998.SH 603000.SH 603160.SH 603288.SH 603699.SH 603858.SH 603885.SH 603993.SH
trade_date
20140102 -0.100735 -0.085812 -0.057592 -0.006342 -0.100442 -0.051708 -0.068143 0.012426 -0.074534 -0.089580 ... -0.140442 NaN -0.065375 0.104574 NaN NaN NaN NaN NaN -0.084892
20140103 -0.111690 -0.102975 -0.052910 -0.040881 -0.116740 -0.078923 -0.082474 0.048699 -0.091097 -0.111111 ... -0.167112 NaN -0.075426 0.105497 NaN NaN NaN NaN NaN -0.091437
20140106 -0.121896 -0.137255 -0.095643 -0.059129 -0.165380 -0.111576 -0.106164 0.011311 -0.098121 -0.134470 ... -0.214003 NaN -0.085575 0.132137 NaN NaN NaN NaN NaN -0.123726
20140107 -0.118271 -0.138051 -0.109342 -0.060228 -0.174342 -0.122535 -0.104991 0.039841 -0.095745 -0.139847 ... -0.200000 NaN -0.088020 0.076545 NaN NaN NaN NaN NaN -0.118594
20140108 -0.115124 -0.144175 -0.159346 -0.063224 -0.179235 -0.160665 -0.093103 0.066347 -0.081023 -0.156604 ... -0.216033 NaN -0.085575 0.118630 NaN NaN NaN NaN NaN -0.127941
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

917 rows × 472 columns
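The `momentum` factor added above is a 20-day return on the adjusted close. As a sketch of what `Return(close_adj, 20)` computes (assuming it is the plain 20-period percentage change; the price series below is made up):

```python
import pandas as pd

# Toy adjusted-close series; Return(x, 20) corresponds to x.pct_change(20) in pandas terms.
close_adj = pd.Series(range(100, 130), dtype=float)  # 30 fake daily closes: 100.0, 101.0, ..., 129.0
momentum = close_adj.pct_change(20)                  # (close_t - close_{t-20}) / close_{t-20}

print(momentum.iloc[20])  # (120 - 100) / 100 = 0.2
```

The first 20 values are NaN, matching the NaN-heavy head of the real `momentum` table above.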

    import numpy as np

    def mask_index_member():
        df_index_member = dv.get_ts('index_member')
        mask_index_member = ~(df_index_member > 0)  # filter condition: not an index constituent
        return mask_index_member

    def limit_up_down():
        # Tradability conditions: not suspended, not at the daily price limit
        trade_status = dv.get_ts('trade_status')
        mask_sus = trade_status == u'停牌'  # '停牌' = suspended
        # Limit-up
        dv.add_formula('up_limit', '(close - Delay(close, 1)) / Delay(close, 1) > 0.095', is_quarterly=False, add_data=True)
        # Limit-down
        dv.add_formula('down_limit', '(close - Delay(close, 1)) / Delay(close, 1) < -0.095', is_quarterly=False, add_data=True)
        can_enter = np.logical_and(dv.get_ts('up_limit') < 1, ~mask_sus)   # not limit-up and not suspended
        can_exit = np.logical_and(dv.get_ts('down_limit') < 1, ~mask_sus)  # not limit-down and not suspended
        return can_enter, can_exit

    mask = mask_index_member()
    can_enter, can_exit = limit_up_down()
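The limit-up/limit-down formulas above compare the one-day return against a ±9.5% threshold (a small buffer under the ±10% A-share daily limit). A minimal pandas sketch of the same logic, with made-up prices (`Delay(close, 1)` corresponds to `shift(1)`):

```python
import pandas as pd

close = pd.Series([10.0, 11.0, 11.05, 9.9])      # fake closes: +10%, +0.45%, -10.4%
ret = (close - close.shift(1)) / close.shift(1)  # one-day return; shift(1) plays the role of Delay(close, 1)
up_limit = ret > 0.095                           # hit the limit-up threshold
down_limit = ret < -0.095                        # hit the limit-down threshold

print(up_limit.tolist())    # [False, True, False, False]
print(down_limit.tolist())  # [False, False, False, True]
```

Note that the first day's return is NaN, and a NaN comparison evaluates to False, so day one is never flagged.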

Next, we compare and screen five factors: pb, pe, ps, float_mv, and momentum.

    from jaqs_fxdayu.research.signaldigger import multi_factor

    ic = dict()
    factors_dict = {signal: dv.get_ts(signal) for signal in ["pb", "pe", "ps", "float_mv", "momentum"]}
    for period in [5, 15, 30]:
        ic[period] = multi_factor.get_factors_ic_df(factors_dict,
                                                    price=dv.get_ts("close_adj"),
                                                    high=dv.get_ts("high_adj"),  # optional
                                                    low=dv.get_ts("low_adj"),    # optional
                                                    n_quantiles=5,               # number of quantile groups
                                                    mask=mask,                   # filter condition
                                                    can_enter=can_enter,         # entry allowed
                                                    can_exit=can_exit,           # exit allowed
                                                    period=period,               # holding period
                                                    benchmark_price=dv.data_benchmark,  # benchmark price; if omitted, returns are absolute
                                                    commission=0.0008,
                                                    )
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%   (printed 15 times, once per factor-period pair)
    import pandas as pd

    ic_mean_table = pd.DataFrame(data=np.nan, columns=[5, 15, 30], index=["pb", "pe", "ps", "float_mv", "momentum"])
    ic_std_table = pd.DataFrame(data=np.nan, columns=[5, 15, 30], index=["pb", "pe", "ps", "float_mv", "momentum"])
    ir_table = pd.DataFrame(data=np.nan, columns=[5, 15, 30], index=["pb", "pe", "ps", "float_mv", "momentum"])
    for signal in ["pb", "pe", "ps", "float_mv", "momentum"]:
        for period in [5, 15, 30]:
            ic_mean_table.loc[signal, period] = ic[period][signal].mean()
            ic_std_table.loc[signal, period] = ic[period][signal].std()
            ir_table.loc[signal, period] = ic[period][signal].mean() / ic[period][signal].std()
    print(ic_mean_table)
    print(ic_std_table)
    print(ir_table)
                5         15        30
pb       -0.045915 -0.077320 -0.117015
pe       -0.038581 -0.064546 -0.095852
ps       -0.030755 -0.054824 -0.084486
float_mv -0.000019  0.008395  0.027459
momentum -0.049655 -0.062483 -0.058243
                5         15        30
pb        0.229864  0.258433  0.245398
pe        0.211099  0.222946  0.215375
ps        0.178680  0.198704  0.193410
float_mv  0.220951  0.228495  0.223991
momentum  0.203428  0.212723  0.208409
                5         15        30
pb       -0.199748 -0.299188 -0.476838
pe       -0.182763 -0.289515 -0.445044
ps       -0.172122 -0.275909 -0.436822
float_mv -0.000084  0.036740  0.122589
momentum -0.244090 -0.293731 -0.279466
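The three tables above are, in order, mean IC, IC standard deviation, and IR = mean(IC)/std(IC) per factor and holding period. The IC for one date is the cross-sectional correlation between factor values and subsequent holding-period returns; a toy sketch of one day's rank IC (assuming `get_factors_ic_df` uses Spearman rank correlation, the common convention; the numbers are made up):

```python
import pandas as pd

factor = pd.Series([0.5, 1.2, 0.8, 2.0, 1.5])        # one date's factor values (made up)
fwd_ret = pd.Series([0.01, 0.03, 0.02, 0.05, 0.04])  # returns over the following holding period
ic_today = factor.corr(fwd_ret, method="spearman")   # cross-sectional rank IC for this date

ic_series = pd.Series([0.05, -0.02, 0.08, 0.03])     # a fake daily IC time series
ir = ic_series.mean() / ic_series.std()              # IR = mean(IC) / std(IC)

print(ic_today)  # 1.0 -- the two rank orderings agree exactly
```

A strongly negative mean IC (as for pb/pe/ps above) is still useful: flipping the factor's sign turns it into a positive predictor, which is exactly the sign adjustment applied in the preprocessing step below.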

Visual comparison

    %matplotlib inline
    ic_mean_table.plot(kind="barh", xerr=ic_std_table, figsize=(15, 5))
<matplotlib.axes._subplots.AxesSubplot at 0x7f3dcfae95c0>

output_7_1.png-7.3kB

    %matplotlib inline
    ir_table.plot(kind="barh", figsize=(15, 5))
<matplotlib.axes._subplots.AxesSubplot at 0x7f3dd0abfd30>

output_9_1.png-5.9kB

Factor preprocessing

We keep momentum, ps, pe, and pb for further processing and try to build a combined factor from them.

    from jaqs_fxdayu.research.signaldigger import process

    factor_dict = dict()
    index_member = dv.get_ts("index_member")
    for name in ["pb", "pe", "ps", "momentum"]:
        signal = -1 * dv.get_ts(name)  # flip the sign (these factors have negative IC)
        signal = process.winsorize(factor_df=signal, alpha=0.05, index_member=index_member)  # winsorize; the result must be assigned back
        signal = process.standardize(signal, index_member)  # z-score standardization: keeps both rank and distribution information
        # signal = process.rank_standardize(signal, index_member)  # cross-sectional rank normalized to 0-1 (keeps rank information only)
        # # Industry and market-cap neutralization
        # signal = process.neutralize(signal,
        #                             group=dv.get_ts("sw1"),          # industry classification scheme
        #                             float_mv=dv.get_ts("float_mv"),  # free-float market cap; if None, no market-cap neutralization
        #                             index_member=index_member,       # restrict processing to index constituents
        #                             )
        factor_dict[name] = signal
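The two `process` calls above can be illustrated with plain pandas on a single cross-section. This is a sketch, not the library's implementation: here winsorization clips values to the 2.5%/97.5% quantiles (i.e. alpha = 0.05 split across both tails, an assumption about the convention), and standardization is the usual z-score:

```python
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0, 100.0])              # one date's factor values, with an outlier
clipped = x.clip(x.quantile(0.025), x.quantile(0.975))  # winsorize: pull the tails in to the quantile bounds
z = (clipped - clipped.mean()) / clipped.std()          # z-score: mean 0, std 1

print(clipped.max() < 100.0)  # True -- the outlier has been pulled in
```

Note that `x.clip(...)` returns a new Series rather than modifying `x` in place, which is why the winsorized result must be assigned back in the loop above.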

Multi-factor combination

The screened factors are usually combined as follows:
* When the factors are strongly correlated with one another, first apply Gram-Schmidt orthogonalization and use the orthogonalized residuals as the factors (this step is optional: orthogonalization can break a factor's economic rationale and discard some information)
* Weight the factors. Common schemes: equal weighting, rolling mean IC over a time window, rolling IC-IR over a time window, weights that maximize the IC-IR of the previous holding period, and weights that maximize the IC of the previous holding period
* Note: because computing IC requires the next period's stock returns, the dynamic weighting schemes actually use IC values from the previous period and earlier (shifted back by holding_period) to compute the current weights
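The shift described in the note can be sketched as follows (hypothetical IC series, not the library's internal code): the weight applied on day t may only use ICs whose forward-return window has already closed, i.e. the IC series shifted back by the holding period.

```python
import pandas as pd

holding_period = 3
window = 5
# Fake daily IC series for two factors
ic_df = pd.DataFrame({"pb": [0.02, 0.03, 0.01, 0.04, 0.05, 0.02, 0.03, 0.04],
                      "pe": [0.01, 0.01, 0.02, 0.02, 0.03, 0.01, 0.02, 0.02]})

# The IC dated t needs returns up to t + holding_period, so shift before averaging
usable_ic = ic_df.shift(holding_period)
raw_w = usable_ic.rolling(window).mean()               # rolling mean IC as the raw weight
weights = raw_w.div(raw_w.abs().sum(axis=1), axis=0)   # normalize so absolute weights sum to 1

print(weights.dropna())  # only the last row has a full shifted window
```

Without the `shift(holding_period)`, the weight for day t would peek at returns realized after t, a classic lookahead bias.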

    # When the factors are strongly correlated, apply Gram-Schmidt orthogonalization and use the residuals as factors
    new_factors = multi_factor.orthogonalize(factors_dict=factor_dict,
                                             standardize_type="rank",    # standardization applied to the input factors: "rank" or "z_score"
                                             winsorization=False,        # whether to winsorize the input factors
                                             index_member=index_member)  # restrict processing to index constituents

    new_factors
{'momentum': symbol      000001.SZ  000002.SZ  000008.SZ  000009.SZ  000012.SZ  000024.SZ  \
 trade_date                                                                     
 20140102     0.715719   0.511706        NaN   0.354515   0.290970   0.709030   
 20140103     0.668896   0.491639        NaN   0.351171   0.264214   0.655518   
 20140106     0.722408   0.488294        NaN   0.354515   0.230769   0.645485   
 20140107     0.725753   0.488294        NaN   0.384615   0.190635   0.605351   
 20140108     0.745819   0.498328        NaN   0.367893   0.200669   0.471572   
 ...               ...        ...        ...        ...        ...        ...   
 [917 rows x 472 columns],
 'pb': symbol      000001.SZ  000002.SZ  000008.SZ  000009.SZ  000012.SZ  000024.SZ  \
 trade_date                                                                     
 20140102     0.244147   0.247492        NaN   0.719064   0.418060   0.411371   
 20140103     0.244147   0.220736        NaN   0.698997   0.404682   0.364548   
 20140106     0.311037   0.204013        NaN   0.688963   0.284281   0.331104   
 20140107     0.331104   0.204013        NaN   0.698997   0.284281   0.290970   
 20140108     0.357860   0.210702        NaN   0.688963   0.304348   0.173913   
 ...               ...        ...        ...        ...        ...        ...   

 [917 rows x 472 columns],
 'pe': symbol      000001.SZ  000002.SZ  000008.SZ  000009.SZ  000012.SZ  000024.SZ  \
 trade_date                                                                     
 20140102     0.441472   0.404682        NaN   0.869565   0.892977   0.347826   
 20140103     0.321070   0.451505        NaN   0.916388   0.909699   0.461538   
 20140106     0.327759   0.471572        NaN   0.886288   0.913043   0.421405   
 20140107     0.301003   0.451505        NaN   0.896321   0.909699   0.454849   
 20140108     0.301003   0.454849        NaN   0.919732   0.909699   0.508361   
 ...               ...        ...        ...        ...        ...        ...   

 [917 rows x 472 columns]}
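A minimal numpy sketch of the sequential, Gram-Schmidt-style orthogonalization used above: regress each factor on the previously kept ones and retain the residual. The data here is synthetic, and the library's `orthogonalize` additionally handles standardization, NaNs, and the per-date cross-sections:

```python
import numpy as np

rng = np.random.default_rng(0)
f1 = rng.normal(size=200)                   # first factor, kept as-is
f2 = 0.8 * f1 + 0.2 * rng.normal(size=200)  # second factor, strongly correlated with f1

# Project f2 onto f1 and keep the residual as the "new" f2
beta = f1 @ f2 / (f1 @ f1)
f2_orth = f2 - beta * f1

print(abs(f1 @ f2_orth) < 1e-8)  # True -- the residual is orthogonal to f1
```

The residual carries only the part of f2 not already explained by f1, which is why orthogonalization removes the redundancy between correlated factors but can also strip out economically meaningful shared exposure.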

Using the pre-orthogonalization factors, we build weighted combinations with each scheme in turn: equal weight, rolling mean IC over a time window, rolling IC-IR over a time window, maximize-previous-period IC-IR, and maximize-previous-period IC, and then test the combined factors' performance.

    # rollback_period is the rolling-window length in days, i.e. how many past periods are used
    # to compute the current factor weights. A window of at least half a year is usually
    # recommended for relatively stable results.
    # Multi-factor combination: dynamic-weighting configuration
    props = {
        'price': dv.get_ts("close_adj"),
        'high': dv.get_ts("high_adj"),  # optional
        'low': dv.get_ts("low_adj"),    # optional
        'ret_type': 'return',  # alternatives: upside_ret / downside_ret, which target potential upside / downside space instead
        'benchmark_price': dv.data_benchmark,  # if None, absolute returns are used; otherwise relative returns
        'period': 30,          # 30-day holding period
        'mask': mask,
        'can_enter': can_enter,
        'can_exit': can_exit,
        'forward': True,
        'commission': 0.0008,
        "covariance_type": "shrink",  # covariance estimation method; can also be "simple"
        "rollback_period": 120}       # rolling-window length in days
    comb_factors = dict()
    for method in ["equal_weight", "ic_weight", "ir_weight", "max_IR", "max_IC"]:
        comb_factors[method] = multi_factor.combine_factors(factor_dict,
                                                            standardize_type="rank",
                                                            winsorization=False,
                                                            weighted_method=method,
                                                            props=props)
        print(method)
        print(comb_factors[method].dropna(how="all").head())
equal_weight
symbol      000001.SZ  000002.SZ  000008.SZ  000009.SZ  ...
trade_date                                                                     
20140102     0.762542   0.819398        NaN   0.143813   ...
20140103     0.745819   0.822742        NaN   0.187291   ...
20140106     0.712375   0.842809        NaN   0.190635   ...
20140107     0.705686   0.849498        NaN   0.190635   ...
20140108     0.678930   0.842809        NaN   0.204013   ...
[5 rows x 472 columns]
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%
ic_weight
symbol      000001.SZ  000002.SZ  000008.SZ  000009.SZ  ...
trade_date                                                                     
20140812     0.775920   0.826087        NaN   0.297659   ...
20140813     0.755853   0.789298        NaN   0.311037   ...
20140814     0.762542   0.799331        NaN   0.307692   ...
20140815     0.762542   0.852843        NaN   0.153846   ...
20140818     0.765886   0.913043        NaN   0.083612   ...

[5 rows x 472 columns]
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%
ir_weight
symbol      000001.SZ  000002.SZ  000008.SZ  000009.SZ  ...
trade_date                                                  
20140812     0.769231   0.859532        NaN   0.311037   ...
20140813     0.732441   0.819398        NaN   0.331104   ...
20140814     0.732441   0.819398        NaN   0.331104   ...
20140815     0.739130   0.872910        NaN   0.170569   ...
20140818     0.735786   0.933110        NaN   0.073579   ...

[5 rows x 472 columns]
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%
max_IR
symbol      000001.SZ  000002.SZ  000008.SZ  000009.SZ  ...
trade_date                                                                     
20140813     0.334448   0.468227        NaN   0.678930   ...
20140814     0.374582   0.478261        NaN   0.678930   ...
20140815     0.414716   0.655518        NaN   0.384615   ...
20140818     0.421405   0.739130        NaN   0.163880   ...
20140819     0.505017   0.765886        NaN   0.120401   ...

[5 rows x 472 columns]
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%
max_IC
symbol      000001.SZ  000002.SZ  000008.SZ  000009.SZ  ...
trade_date                                                 
20140221     0.030100   0.324415        NaN   0.903010   ...
20140224     0.020067   0.163880        NaN   0.956522   ...
20140225     0.193980   0.672241        NaN   0.451505   ...
20140226     0.341137   0.903010        NaN   0.120401   ...
20140227     0.471572   0.799331        NaN   0.170569   ...

[5 rows x 472 columns]

Compare the factors before and after combination under a 30-day holding period (restricted to dates after September 2014 so the comparison is aligned).

    period = 30
    ic_30 = multi_factor.get_factors_ic_df(comb_factors,
                                           price=dv.get_ts("close_adj"),
                                           high=dv.get_ts("high_adj"),  # optional
                                           low=dv.get_ts("low_adj"),    # optional
                                           n_quantiles=5,               # number of quantile groups
                                           mask=mask,                   # filter condition
                                           can_enter=can_enter,         # entry allowed
                                           can_exit=can_exit,           # exit allowed
                                           period=period,               # holding period
                                           benchmark_price=dv.data_benchmark,  # benchmark price; if omitted, returns are absolute
                                           commission=0.0008,
                                           )
    ic_30 = pd.concat([ic_30, -1 * ic[30].drop("float_mv", axis=1)], axis=1)
    ic_30.head()
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%
Nan Data Count (should be zero) : 0;  Percentage of effective data: 49%
Nan Data Count (should be zero) : 0;  Percentage of effective data: 49%
Nan Data Count (should be zero) : 0;  Percentage of effective data: 49%
Nan Data Count (should be zero) : 0;  Percentage of effective data: 57%
            equal_weight  ic_weight  ir_weight  max_IR  max_IC        pb        pe        ps  momentum
trade_date
20140102             NaN        NaN        NaN     NaN     NaN       NaN       NaN       NaN       ...
20140103       -0.047022        NaN        NaN     NaN     NaN -0.053452 -0.018868 -0.004815       ...
20140106       -0.075316        NaN        NaN     NaN     NaN -0.085169 -0.053065 -0.018863       ...
20140107        0.027397        NaN        NaN     NaN     NaN  0.026080  0.023327  0.056947       ...
20140108        0.131598        NaN        NaN     NaN     NaN  0.084470  0.081776  0.158562       ...
    ic_30_mean = dict()
    ic_30_std = dict()
    ir_30 = dict()
    for name in ic_30.columns:
        ic_30_mean[name] = ic_30[name].loc[20140901:].mean()
        ic_30_std[name] = ic_30[name].loc[20140901:].std()
        ir_30[name] = ic_30_mean[name] / ic_30_std[name]
    import datetime

    trade_date = pd.Series(ic_30.index)
    trade_date = trade_date.apply(lambda x: datetime.datetime.strptime(str(x), '%Y%m%d'))
    ic_30.index = trade_date
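The same yyyymmdd conversion can be done without the per-element `strptime` apply, using pandas' vectorized parser (equivalent result on a toy series):

```python
import pandas as pd

idx = pd.Series([20140102, 20140103, 20140106])           # integer trade dates
dates = pd.to_datetime(idx.astype(str), format='%Y%m%d')  # vectorized parse to Timestamps

print(dates.iloc[0])  # 2014-01-02 00:00:00
```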

Visual comparison

    pd.Series(ic_30_mean).plot(kind="barh", xerr=pd.Series(ic_30_std), figsize=(15, 5))
<matplotlib.axes._subplots.AxesSubplot at 0x7f3dd0055780>

output_23_1.png-8.2kB

    print(ic_30_mean["equal_weight"])
    print(ic_30_mean["ic_weight"])
    print(ic_30_mean["pe"])
0.12073068936069978
0.10512971417144427
0.10438725026905876
    pd.Series(ir_30).plot(kind="barh", figsize=(15, 5))
<matplotlib.axes._subplots.AxesSubplot at 0x7f3dcfc3d0f0>

output_25_1.png-7kB

    print(ir_30["equal_weight"])
    print(ir_30["ic_weight"])
    print(ir_30["pe"])
0.5686486313274169
0.48476009429523187
0.47457282064911344
    ic_30[["equal_weight", "ic_weight", "pe"]].plot(kind="line", figsize=(15, 5))
<matplotlib.axes._subplots.AxesSubplot at 0x7f3dcfbb4780>

output_27_1.png-83kB

    ic_30.loc[datetime.date(2017, 1, 3):][["equal_weight", "ic_weight", "pe"]].plot(kind="line", figsize=(15, 5))
<matplotlib.axes._subplots.AxesSubplot at 0x7f3dcfd1f5c0>

output_28_1.png-60.6kB

View the full report for the equal-weight combined factor

    import matplotlib.pyplot as plt
    from jaqs_fxdayu.research.signaldigger.analysis import analysis
    from jaqs_fxdayu.research import SignalDigger

    obj = SignalDigger()
    obj.process_signal_before_analysis(signal=comb_factors["equal_weight"],
                                       price=dv.get_ts("close_adj"),
                                       high=dv.get_ts("high_adj"),  # optional
                                       low=dv.get_ts("low_adj"),    # optional
                                       n_quantiles=5,               # number of quantile groups
                                       mask=mask,                   # filter condition
                                       can_enter=can_enter,         # entry allowed
                                       can_exit=can_exit,           # exit allowed
                                       period=30,                   # holding period
                                       benchmark_price=dv.data_benchmark,  # benchmark price; if omitted, returns are absolute
                                       commission=0.0008,
                                       )
    obj.create_full_report()
    plt.show()
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%


Value of signals of Different Quantiles Statistics
               min       max      mean       std  count    count %
quantile                                                          
1         0.000000  0.538462  0.103261  0.060651  51770  20.146791
2         0.180602  0.628763  0.307896  0.059901  51396  20.001245
3         0.371237  0.695652  0.509498  0.059105  51370  19.991127
4         0.565217  0.846154  0.708293  0.057892  51396  20.001245
5         0.755853  1.000000  0.904357  0.056350  51032  19.859591
Figure saved: /home/xinger/Desktop/qtc/多因子选股模型/returns_report.pdf
Information Analysis
                 ic
IC Mean       0.125
IC Std.       0.209
t-stat(IC)   17.758
p-value(IC)   0.000
IC Skew      -0.171
IC Kurtosis  -0.764
Ann. IR       0.597
Figure saved: /home/xinger/Desktop/qtc/多因子选股模型/information_report.pdf



<matplotlib.figure.Figure at 0x7f3dd00771d0>

output_30_2.png-418.5kB

output_30_3.png-163.1kB

    print(analysis(obj.signal_data, is_event=False, period=30))
{'ic':                 return_ic  upside_ret_ic  downside_ret_ic
IC Mean      1.247252e-01      -0.012185     2.604833e-01
IC Std.      2.090651e-01       0.202819     1.871202e-01
t-stat(IC)   1.775782e+01      -1.788293     4.143583e+01
p-value(IC)  1.404327e-60       0.074071    1.873265e-209
IC Skew     -1.714002e-01       0.297758    -5.568998e-01
IC Kurtosis -7.644106e-01      -0.779115     6.324959e-02
Ann. IR      5.965856e-01      -0.060079     1.392064e+00, 'ret':              long_ret  long_short_ret  top_quantile_ret  bottom_quantile_ret  \
t-stat       8.345612       11.909596         33.564165           -18.003623   
p-value      0.000000        0.000000          0.000000             0.000000   
skewness     0.050514        0.302533          2.210412             1.401058   
kurtosis     6.671800        1.616995         14.291204             6.855082   
Ann. Ret     0.054801        0.086260          0.119795            -0.090039   
Ann. Vol     0.069935        0.077139          0.283879             0.400644   
Ann. IR      0.783600        1.118236          0.421994            -0.224736   
occurance  916.000000      916.000000      51032.000000         51770.000000   

              tmb_ret  all_sample_ret  
t-stat      12.542852        7.774882  
p-value      0.000000        0.000000  
skewness     0.228420        1.562832  
kurtosis     1.431146       10.060084  
Ann. Ret     0.210238        0.014646  
Ann. Vol     0.178517        0.336206  
Ann. IR      1.177695        0.043562  
occurance  916.000000   256964.000000  , 'space':                long_space  top_quantile_space  bottom_quantile_space  \
Up_sp Mean       0.127680            0.126570               0.133127   
Up_sp Std        0.089336            0.142698               0.160436   
Up_sp IR         1.429214            0.886984               0.829783   
Up_sp Pct5       0.029497            0.000000               0.000000   
Up_sp Pct25      0.074793            0.034881               0.033683   
Up_sp Pct50      0.105318            0.085674               0.086263   
Up_sp Pct75      0.147014            0.166145               0.172919   
Up_sp Pct95      0.331937            0.396358               0.424710   
Up_sp Occur    916.000000        51032.000000           51770.000000   
Down_sp Mean    -0.080231           -0.067920              -0.106705   
Down_sp Std      0.079146            0.079477               0.110319   
Down_sp IR      -1.013709           -0.854579              -0.967242   
Down_sp Pct5    -0.289758           -0.249430              -0.355477   
Down_sp Pct25   -0.090315           -0.086358              -0.139463   
Down_sp Pct50   -0.055103           -0.041721              -0.074544   
Down_sp Pct75   -0.035151           -0.016211              -0.032333   
Down_sp Pct95   -0.013951           -0.000800              -0.000800   
Down_sp Occur  916.000000        51032.000000           51770.000000   

                tmb_space  all_sample_space  
Up_sp Mean       0.235616          0.128369  
Up_sp Std        0.128020          0.144884  
Up_sp IR         1.840460          0.886010  
Up_sp Pct5       0.103259          0.000000  
Up_sp Pct25      0.154600          0.035285  
Up_sp Pct50      0.199942          0.086829  
Up_sp Pct75      0.276546          0.170993  
Up_sp Pct95      0.507797          0.398458  
Up_sp Occur    916.000000     256964.000000  
Down_sp Mean    -0.203198         -0.086153  
Down_sp Std      0.108915          0.097361  
Down_sp IR      -1.865656         -0.884883  
Down_sp Pct5    -0.422703         -0.302457  
Down_sp Pct25   -0.274500         -0.110766  
Down_sp Pct50   -0.164386         -0.055024  
Down_sp Pct75   -0.130797         -0.022274  
Down_sp Pct95   -0.084976         -0.000453  
Down_sp Occur  916.000000     256964.000000  }

Next, test the equal-weight combined factor's absolute-return performance.

    obj.process_signal_before_analysis(signal=comb_factors["equal_weight"],
                                       price=dv.get_ts("close_adj"),
                                       high=dv.get_ts("high_adj"),  # optional
                                       low=dv.get_ts("low_adj"),    # optional
                                       n_quantiles=5,               # number of quantile groups
                                       mask=mask,                   # filter condition
                                       can_enter=can_enter,         # entry allowed
                                       can_exit=can_exit,           # exit allowed
                                       period=30,                   # holding period
                                       # benchmark_price=dv.data_benchmark,  # omitted, so holding-period returns are absolute
                                       commission=0.0008,
                                       )
    obj.create_full_report()
    plt.show()
Nan Data Count (should be zero) : 0;  Percentage of effective data: 59%


Value of signals of Different Quantiles Statistics
               min       max      mean       std  count    count %
quantile                                                          
1         0.000000  0.538462  0.103261  0.060651  51770  20.146791
2         0.180602  0.628763  0.307896  0.059901  51396  20.001245
3         0.371237  0.695652  0.509498  0.059105  51370  19.991127
4         0.565217  0.846154  0.708293  0.057892  51396  20.001245
5         0.755853  1.000000  0.904357  0.056350  51032  19.859591
Figure saved: /home/xinger/Desktop/qtc/多因子选股模型/returns_report.pdf
Information Analysis
                 ic
IC Mean       0.126
IC Std.       0.209
t-stat(IC)   17.882
p-value(IC)   0.000
IC Skew      -0.174
IC Kurtosis  -0.764
Ann. IR       0.601
Figure saved: /home/xinger/Desktop/qtc/多因子选股模型/information_report.pdf



<matplotlib.figure.Figure at 0x7f3db201da90>

output_33_2.png-419.4kB
output_33_3.png-163.4kB

Save the Quantile 5 stock selections to an Excel file

    excel_data = obj.signal_data[obj.signal_data['quantile'] == 5]["quantile"].unstack().replace(np.nan, 0).replace(5, 1)
    print(excel_data.head())
    excel_data.to_excel('./equal_weight_quantile_5.xlsx')
symbol      000001.SZ  000002.SZ  000012.SZ  000024.SZ  ...
trade_date                                                                     
20140103          0.0        1.0        0.0        0.0        ...
20140106          0.0        1.0        0.0        0.0        ...
20140107          0.0        1.0        0.0        0.0        ...
20140108          0.0        1.0        0.0        0.0        ...
20140109          0.0        1.0        0.0        0.0        ...

[5 rows x 245 columns]
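The `unstack().replace(...)` chain above turns the long-format quantile signal into a date × symbol 0/1 membership matrix. A toy sketch with a made-up two-date, two-symbol MultiIndex:

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([[20140103, 20140106], ["000001.SZ", "000002.SZ"]],
                                 names=["trade_date", "symbol"])
signal = pd.DataFrame({"quantile": [5, 2, 1, 5]}, index=idx)

top = signal[signal["quantile"] == 5]["quantile"].unstack()  # wide format: dates x symbols, NaN where not top quantile
membership = top.replace(np.nan, 0).replace(5, 1)            # 1 = selected into quantile 5, 0 = not

print(membership)
```

The resulting matrix has a 1 wherever the stock sat in the top quantile on that date, which is the format written to `equal_weight_quantile_5.xlsx` above.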