@Channelchan
2017-03-08T11:24:43.000000Z
np.cov(X,Y)
np.corrcoef(X,Y)
np.cov(X,Y)[0,1]/(np.std(X)*np.std(Y))
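As a minimal sketch of what the expressions above compute, the snippet below builds the Pearson correlation directly from its definition and compares it with `np.corrcoef`. The data are synthetic and the helper name `pearson_corr` is made up for illustration; neither comes from the original post.

```python
import numpy as np

def pearson_corr(x, y):
    """Pearson correlation from its definition (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

rng = np.random.default_rng(0)   # synthetic data, not market prices
X = rng.normal(size=250)
Y = 0.5 * X + rng.normal(size=250)

r_manual = pearson_corr(X, Y)
r_numpy = np.corrcoef(X, Y)[0, 1]
print(r_manual, r_numpy)         # the two values should agree
```

Because the normalizing factors cancel, the correlation itself does not depend on whether the sample or population convention is used, as long as the same convention is applied to the covariance and to both standard deviations.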
Correlation of stocks 000001 and 000005 with the market index (the code uses tushare's 'sh', the Shanghai Composite Index)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tushare as ts
asset1 = ts.get_k_data('000001', start='2016-01-01', end='2016-12-31', ktype='D',autype='qfq')
asset1.index = pd.to_datetime(asset1['date'], format='%Y-%m-%d')
asset1 = asset1['close']
asset2 = ts.get_k_data('000005', start='2016-01-01', end='2016-12-31', ktype='D',autype='qfq')
asset2.index = pd.to_datetime(asset2['date'], format='%Y-%m-%d')
asset2 = asset2['close']
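benchmark = ts.get_hist_data('sh', start='2016-01-01', end='2016-12-31', ktype='D')[::-1]
# The index here is expected to hold date strings; convert it so the inner join below aligns with the stock series
benchmark.index = pd.to_datetime(benchmark.index)
benchmark = benchmark['close']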
new = pd.concat([asset1, asset2, benchmark],join='inner', axis=1)
new.columns = ['asset1', 'asset2', 'benchmark']
print("Correlation coefficients")
print("000001 and benchmark: ", np.corrcoef(new['asset1'], new['benchmark'])[0, 1])
print("000005 and benchmark: ", np.corrcoef(new['asset2'], new['benchmark'])[0, 1])
print("000001 and 000005: ", np.corrcoef(new['asset1'], new['asset2'])[0, 1])
# np.cov defaults to ddof=1 and np.std to ddof=0, so this value differs slightly from np.corrcoef
print("000001 and 000005: ", np.cov(new['asset1'], new['asset2'])[0, 1] / (np.std(new['asset1']) * np.std(new['asset2'])))
Correlation coefficients
000001 and benchmark: 0.904350480115
000005 and benchmark: 0.329516731028
# The next two results differ because of degrees of freedom: np.cov defaults to ddof=1, while np.std defaults to ddof=0
000001 and 000005: 0.138377116304
000001 and 000005: 0.138946569458
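The small gap between the last two numbers can be reproduced on synthetic data. This is a sketch that relies on NumPy's defaults (`np.cov` divides by n-1, `np.std` divides by n): the mixed expression overstates the correlation by a factor of n/(n-1), and passing `ddof=1` to `np.std` removes the gap. The variable names and the sample size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)   # synthetic data for illustration
n = 244
x = rng.normal(size=n)
y = 0.2 * x + rng.normal(size=n)

r = np.corrcoef(x, y)[0, 1]
cov_xy = np.cov(x, y)[0, 1]      # sample covariance, divides by n - 1

# Mixed conventions: covariance uses n - 1, standard deviations use n
mixed = cov_xy / (np.std(x) * np.std(y))
# Matched conventions: everything uses n - 1
matched = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

print(mixed / r)      # close to n / (n - 1)
print(matched - r)    # close to 0
```

In the output above, the ratio of the two reported values is about 1.0041, which is consistent with n/(n-1) for roughly 244 observations.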
Scatter plot of the highly correlated pair
plt.scatter(new['asset1'], new['benchmark'])
plt.show()
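Correlation changes over time, so a value computed today says nothing certain about the future. We therefore compute a rolling correlation coefficient over windows of different lengths and examine the distribution of those coefficients, which gives a basis for interval estimates of future correlation.
Rolling correlation with a 60-day window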
rolling_correlation = new['asset1'].rolling(window=60).corr(new['benchmark'])
plt.subplot(2,1,1)
plt.plot(rolling_correlation)
plt.xlabel('Day')
plt.ylabel('60day Rolling Correlation')
plt.subplot(2,1,2)
plt.hist(rolling_correlation.dropna())
plt.show()
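The idea of varying the window and reading an interval off the distribution of rolling correlations might be sketched as follows. The two series are synthetic random walks standing in for the price data, and the window lengths and the 5%/95% quantiles are arbitrary choices for illustration, not values from the original post.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500
common = rng.normal(size=n)
# Two synthetic "price" series driven partly by a shared factor
a = pd.Series(100 + np.cumsum(common + rng.normal(size=n)))
b = pd.Series(100 + np.cumsum(common + rng.normal(size=n)))

rows = []
for window in (20, 60, 120):
    rolling_corr = a.rolling(window=window).corr(b).dropna()
    rows.append({
        'window': window,
        'mean': rolling_corr.mean(),
        # Empirical quantiles give a rough range of past rolling values
        'q05': rolling_corr.quantile(0.05),
        'q95': rolling_corr.quantile(0.95),
    })

result = pd.DataFrame(rows).set_index('window')
print(result)
```

Comparing the rows shows how the spread of the rolling correlation depends on the window length. The quantile range describes how the correlation behaved in the sample; treating it as an estimate for the future assumes that behavior persists.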
Determining related Strategies