[关闭]
@Channelchan 2018-04-19T09:31:16.000000Z 字数 7532 阅读 53303

因子分析(by+alphalens)

目录

  1. Alphalens是什么?
  2. 使用Alphalens的数据准备工作?
  3. 如何用Alphalens测试选股因子效果?
  4. 什么是信息系数?
  5. 如何通过信息系数的可视化呈现进一步观测因子效果?
  6. 更多:因子在不同板块的选股能力比较分析

Alphalen是什么?

Alphalens是一个Python第三方库,专门用于选股因子alpha(α)的绩效分析。

下载方式: pip install alphalens

官方网站: http://quantopian.github.io/alphalens/index.html

使用Alphalens的数据准备工作?

下面以沪深300成分股为例,构造选股因子(factor)并

  1. from jaqs_fxdayu.data import DataView # 可以视为一个轻量级的数据库,数据格式基于pandas,方便数据的调用和处理
  2. from jaqs_fxdayu.data import RemoteDataService # 数据服务,用于下载数据
  3. import os
  4. import warnings
  5. warnings.filterwarnings("ignore")
  6. dataview_folder = './Factor'
  7. if not (os.path.isdir(dataview_folder)):
  8. os.makedirs(dataview_folder)
  9. # 数据下载
  10. def save_dataview():
  11. data_config = {
  12. "remote.data.address": "tcp://data.tushare.org:8910",
  13. "remote.data.username": "18566262672",
  14. "remote.data.password": "eyJhbGciOiJIUzI1NiJ9.eyJjcmVhdGVfdGltZSI6IjE1MTI3MDI3NTAyMTIiLCJpc3MiOiJhdXRoMCIsImlkIjoiMTg1NjYyNjI2NzIifQ.O_-yR0zYagrLRvPbggnru1Rapk4kiyAzcwYt2a3vlpM"
  15. }
  16. ds = RemoteDataService()
  17. ds.init_from_config(data_config)
  18. dv = DataView()
  19. props = {'start_date': 20140101, 'end_date': 20140201, 'universe': '000300.SH',
  20. 'fields': "pb,pe,ps,float_mv,sw1",
  21. 'freq': 1}
  22. dv.init_from_config(props, ds)
  23. dv.prepare_data()
  24. dv.save_dataview(dataview_folder) # 保存数据文件到指定路径,方便下次直接加载
  25. # save_dataview()
  1. # 加载数据
  2. dv = DataView()
  3. dv.load_dataview(dataview_folder)
Dataview loaded successfully.
  1. dv.fields
['pb',
 'pe',
 'ps',
 'float_mv',
 'sw1',
 'open_adj',
 'high_adj',
 'low_adj',
 'close_adj',
 'open',
 'high',
 'low',
 'close',
 'vwap',
 'vwap_adj',
 'adjust_factor',
 'index_member',
 'index_weight',
 'sw1',
 'volume',
 '_daily_adjust_factor',
 '_limit']
  1. import pandas as pd
  2. from datetime import datetime
  3. factor = dv.get_ts("pb")
  4. factor.index = pd.Index(map(lambda x: datetime.strptime(str(x),"%Y%m%d") , factor.index)) #索引调整为datetime日期格式
  5. factor = factor.stack()#处理成MultiIndex格式(alphalens分析因子必须的格式)
  6. factor.head()
            symbol   
2014-01-02  000001.SZ    1.0563
            000002.SZ    1.2891
            000008.SZ    4.8981
            000009.SZ    3.5794
            000012.SZ    2.3725
dtype: float64
  1. def change_index(df):
  2. df.index = pd.Index(map(lambda x: datetime.strptime(str(x),"%Y%m%d") , df.index)) #索引调整为datetime日期格式
  3. return df
  1. prices = dv.get_ts("close_adj") #获取价格
  2. prices = change_index(prices)
  3. prices.head()
symbol 000001.SZ 000002.SZ 000008.SZ 000009.SZ 000012.SZ 000024.SZ 000027.SZ 000039.SZ 000046.SZ 000059.SZ ... 601992.SH 601997.SH 601998.SH 603000.SH 603160.SH 603288.SH 603699.SH 603858.SH 603885.SH 603993.SH
2014-01-02 714.073976 933.813512 45.579438 42.049039 129.513427 105.238518 43.384556 349.062344 130.893481 7.614983 ... 6.813780 NaN 4.611372 79.319265 NaN NaN NaN NaN NaN 6.541190
2014-01-03 696.557852 916.282595 45.326219 40.930713 127.447568 103.445262 42.353478 354.474163 128.550869 7.339742 ... 6.422065 NaN 4.539693 78.055898 NaN NaN NaN NaN NaN 6.438341
2014-01-06 681.377212 874.208394 43.097891 40.572849 120.296520 97.911786 41.401715 342.748554 126.501083 6.988046 ... 6.133433 NaN 4.468013 77.762796 NaN NaN NaN NaN NaN 6.191504
2014-01-07 679.041729 868.364755 42.490165 40.483383 119.660872 95.759878 41.243088 353.121208 124.451297 6.865717 ... 6.143741 NaN 4.456067 79.743757 NaN NaN NaN NaN NaN 6.191504
2014-01-08 686.632049 867.196028 41.679864 39.767655 119.343047 93.146848 41.718969 351.542761 126.208256 6.835135 ... 5.947883 NaN 4.468013 81.866214 NaN NaN NaN NaN NaN 6.098940

5 rows × 472 columns

  1. import alphalens
  2. #计算目标股票池每只股票的持有期收益,和对应因子值的quantile分类
  3. factor_data = alphalens.utils.get_clean_factor_and_forward_returns(factor, prices, quantiles=5, periods=(1,5,10))
  4. factor_data.head()
1 5 10 factor factor_quantile
date asset
2014-01-02 000001.SZ -0.024530 -0.033524 -0.044154 1.0563 1
000002.SZ -0.018773 -0.066333 -0.100125 1.2891 2
000008.SZ -0.005556 -0.102222 -0.054444 4.8981 4
000009.SZ -0.026596 -0.075532 -0.057447 3.5794 4
000012.SZ -0.015951 -0.093252 -0.039264 2.3725 3

如何用Alphalens测试选股因子效果?

  1. import matplotlib.pyplot as plt
  2. mean_return_by_q, std_err_by_q = alphalens.performance.mean_return_by_quantile(factor_data, by_date=True, demeaned=True)
  3. alphalens.plotting.plot_cumulative_returns_by_quantile(mean_return_by_q, 1)
  4. alphalens.plotting.plot_cumulative_returns_by_quantile(mean_return_by_q, 5)
  5. alphalens.plotting.plot_cumulative_returns_by_quantile(mean_return_by_q, 10)
  6. plt.show()

output_11_0.png-104.7kB
output_11_1.png-79.7kB
output_11_2.png-71.9kB

什么是信息系数?

度量变量的预测值与实际值之间的相关关系。信息系数是用来评估金融分析师预测技能的一种表现方法。

系数在-1到1之间,越大表示正相关程度强。

这里,我们可以用信息系数来评价一个选股因子效力的好坏。也即,用因子值大小与下一期股票收益大小的相关程度来评估因子。

经验来看,通常|mean(IC)|>0.02可以作为判定一个选股因子有收益预测效力的标准

什么是spearman相关系数?

chart1.png-3.4kB
其中d为秩次差。

什么是秩次差?

A = [1,3,5,7,9]
B = [3,2,4,5,1]

A的排序是1,2,3,4,5

B的排序是3,2,4,5,1

d为排序相减

chart2.png-2.4kB

chart3.png-3.4kB

接下来我们要计算的因子IC值,即为当期因子值与下期股票收益值的spearman相关系数。

  1. ic = alphalens.performance.factor_information_coefficient(factor_data)
  2. ic.head()
1 5 10
date
2014-01-02 0.325391 0.367524 0.338448
2014-01-03 0.086336 0.231268 0.277532
2014-01-06 0.216727 0.159411 0.203103
2014-01-07 0.376174 0.151324 0.135532
2014-01-08 -0.041003 0.161179 0.002910

如何通过信息系数的可视化呈现进一步观测因子效果?

  1. alphalens.plotting.plot_ic_hist(ic) #因子IC分布
  2. alphalens.plotting.plot_ic_ts(ic) #因子IC时间序列曲线
  3. plt.show()

output_16_0.png-42kB

output_16_1.png-325kB

  1. # 月均IC热度图
  2. mean_monthly_ic = alphalens.performance.mean_information_coefficient(factor_data, by_time='M')
  3. alphalens.plotting.plot_monthly_ic_heatmap(mean_monthly_ic)#因子月均IC表现
  4. plt.show()

output_17_0.png-45.6kB

将Quantile1的选股结果保存成excel

  1. import numpy as np
  2. excel_data = factor_data[factor_data['factor_quantile']==1]["factor_quantile"].unstack().replace(np.nan, 0)
  3. excel_data.to_excel('./pb_quantile_1_by_alphalens.xlsx')
  4. excel_data.head()
asset 000001.SZ 000002.SZ 000024.SZ 000027.SZ 000039.SZ 000059.SZ 000063.SZ 000069.SZ 000157.SZ 000338.SZ ... 601898.SH 601901.SH 601918.SH 601939.SH 601985.SH 601988.SH 601991.SH 601992.SH 601997.SH 601998.SH
date
2014-01-02 1.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 ... 1.0 0.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0
2014-01-03 1.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 ... 1.0 0.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0
2014-01-06 1.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 ... 1.0 0.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0
2014-01-07 1.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 ... 1.0 0.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0
2014-01-08 1.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 ... 1.0 0.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0

5 rows × 192 columns

更多:因子在不同板块的选股能力比较分析

  1. sectors = dv.get_ts("sw1")
  2. sectors = change_index(sectors)
  3. sectors.head()
symbol 000001.SZ 000002.SZ 000008.SZ 000009.SZ 000012.SZ 000024.SZ 000027.SZ 000039.SZ 000046.SZ 000059.SZ ... 601992.SH 601997.SH 601998.SH 603000.SH 603160.SH 603288.SH 603699.SH 603858.SH 603885.SH 603993.SH
2014-01-02 480000 430000 460000 510000 610000 430000 410000 640000 430000 220000 ... 610000 480000 480000 720000 270000 340000 640000 370000 420000 240000
2014-01-03 480000 430000 460000 510000 610000 430000 410000 640000 430000 220000 ... 610000 480000 480000 720000 270000 340000 640000 370000 420000 240000
2014-01-06 480000 430000 460000 510000 610000 430000 410000 640000 430000 220000 ... 610000 480000 480000 720000 270000 340000 640000 370000 420000 240000
2014-01-07 480000 430000 460000 510000 610000 430000 410000 640000 430000 220000 ... 610000 480000 480000 720000 270000 340000 640000 370000 420000 240000
2014-01-08 480000 430000 460000 510000 610000 430000 410000 640000 430000 220000 ... 610000 480000 480000 720000 270000 340000 640000 370000 420000 240000

5 rows × 472 columns

  1. factor_data = alphalens.utils.get_clean_factor_and_forward_returns(factor,
  2. prices,
  3. groupby=sectors.stack(),
  4. quantiles=5,
  5. periods=(1, 5, 10))
  1. factor_data.head()
1 5 10 factor group factor_quantile
date asset
2014-01-02 000001.SZ -0.024530 -0.033524 -0.044154 1.0563 480000 1
000002.SZ -0.018773 -0.066333 -0.100125 1.2891 430000 2
000008.SZ -0.005556 -0.102222 -0.054444 4.8981 460000 4
000009.SZ -0.026596 -0.075532 -0.057447 3.5794 510000 4
000012.SZ -0.015951 -0.093252 -0.039264 2.3725 610000 3
  1. ic_by_sector = alphalens.performance.mean_information_coefficient(factor_data, by_group=True)
  2. ic_by_sector.head()
1 5 10
group
110000 -0.025498 -0.038887 -0.052598
210000 -0.054027 -0.094159 -0.118142
220000 -0.023304 -0.032256 -0.036310
230000 -0.053635 -0.073254 -0.094852
240000 -0.035358 -0.049735 -0.065802
  1. from matplotlib import pyplot as plt
  2. alphalens.plotting.plot_ic_by_group(ic_by_sector)
  3. plt.show()

output_25_0.png-28.6kB

添加新批注
在作者公开此批注前,只有你和作者可见。
回复批注