@Channelchan 2017-05-06T01:56:30.000000Z

Python Financial Data Processing

Panel

Three-dimensional data processing (pd.Panel requires pandas < 0.25; it was removed in later releases)

Panel

299 (items) x 488 (major_axis) x 5 (minor_axis)

Data Format Conversion

Dict/Series/DataFrame

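import pandas as pd
import talib as ta
# TA-Lib's function API only accepts np.ndarray input, so we iterate over the columns of prices
mom20 = pd.DataFrame({name: ta.ROC(item.values, 20) for name, item in prices.iteritems()}, index=prices.index)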

MultiIndex

stack()

mom20 = mom20.stack()
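
To make the effect of stack() concrete, here is a minimal sketch on a small made-up table. The names `wide`, `long` and `restored`, the symbols AAA/BBB and the values are all hypothetical; only `stack()`, `dropna()` and `unstack()` are taken from pandas itself.

```python
import pandas as pd

# Hypothetical wide table: one row per date, one column per symbol.
wide = pd.DataFrame(
    {"AAA": [1.0, 2.0, 3.0], "BBB": [4.0, None, 6.0]},
    index=pd.date_range("2017-01-02", periods=3, name="datetime"),
)
wide.columns.name = "codes"

# stack() moves the column labels into an inner index level, giving a
# Series indexed by (datetime, codes). Missing values are dropped
# explicitly so the result does not depend on the pandas version's default.
long = wide.stack().dropna()

print(long.index.names)
print(long.loc[(pd.Timestamp("2017-01-03"), "AAA")])

# unstack() reverses the operation and restores the wide layout.
restored = long.unstack()
print(restored)
```

The long layout is convenient when each (date, symbol) pair is treated as one observation, which is the shape the factor table further down uses.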

Iterators

iterrows/iteritems
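
The two iterators differ only in the direction they walk the table. The sketch below uses a made-up `prices` frame (hypothetical symbols and values). It calls `items()` rather than `iteritems()` because `iteritems()` is no longer available in pandas 2.x; on the older pandas used elsewhere in this post the two behave the same for a DataFrame.

```python
import pandas as pd

# Hypothetical close prices: rows are dates, columns are symbols.
prices = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.1], "BBB": [20.0, 19.0, 19.0]},
    index=pd.date_range("2017-01-02", periods=3),
)

# Column-wise iteration: yields (column name, Series indexed by date).
last_price = {name: column.iloc[-1] for name, column in prices.items()}

# Row-wise iteration: yields (index label, Series indexed by column name).
best_per_day = {date: row.idxmax() for date, row in prices.iterrows()}

print(last_price)
print(best_per_day)
```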

Reducing Dimensions to Compute ATR

Functions defined below: atr(), adx()

def panel_2_frame(panel, function, *args, **kwargs):
    # Apply function to every item (one DataFrame per symbol) and collect the results as columns
    if not isinstance(panel, pd.Panel):
        raise TypeError("type of panel should be pandas.Panel")
    return pd.DataFrame(
        {name: function(frame, *args, **kwargs) for name, frame in panel.iteritems()}
    )


def atr(pn, period=10):
    # Forward-fill gaps on a copy so the caller's Panel is left unchanged
    if pn.isnull().values.any():
        pn = pn.fillna(method='ffill')
    return panel_2_frame(pn, ta.abstract.ATR, period)


def adx(pn, period=10):
    # Same pattern as atr(), using TA-Lib's abstract ADX
    if pn.isnull().values.any():
        pn = pn.fillna(method='ffill')
    return panel_2_frame(pn, ta.abstract.ADX, period)
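
The same 3D-to-2D reduction can be sketched without pd.Panel or TA-Lib, assuming the per-symbol tables are held in a plain dict of DataFrames. Everything named here (`simple_atr`, `frames_to_frame`, the symbols and the prices) is hypothetical. Note that `simple_atr` uses a plain rolling mean of the true range, which is not the Wilder smoothing that TA-Lib's ATR applies, so its values are not expected to match `ta.abstract.ATR`.

```python
import pandas as pd


def simple_atr(frame, period=10):
    """Rolling mean of the true range (simple average, not Wilder smoothing)."""
    prev_close = frame["close"].shift(1)
    # True range: the largest of the three candidate ranges per bar.
    # On the first bar prev_close is NaN, so it falls back to high - low.
    true_range = pd.concat(
        [
            frame["high"] - frame["low"],
            (frame["high"] - prev_close).abs(),
            (frame["low"] - prev_close).abs(),
        ],
        axis=1,
    ).max(axis=1)
    return true_range.rolling(period).mean()


def frames_to_frame(frames, function, *args, **kwargs):
    """Apply function to each per-symbol frame and collect the Series as columns."""
    return pd.DataFrame(
        {name: function(frame, *args, **kwargs) for name, frame in frames.items()}
    )


dates = pd.date_range("2017-01-02", periods=4)
frames = {
    "AAA": pd.DataFrame(
        {"high": [11.0, 12.0, 13.0, 14.0],
         "low": [9.0, 10.0, 11.0, 12.0],
         "close": [10.0, 11.0, 12.0, 13.0]},
        index=dates,
    ),
    "BBB": pd.DataFrame(
        {"high": [21.0, 22.0, 25.0, 24.0],
         "low": [19.0, 20.0, 21.0, 22.0],
         "close": [20.0, 21.0, 24.0, 23.0]},
        index=dates,
    ),
}

# Result: rows are dates, columns are symbols (one dimension fewer than the input).
atr_table = frames_to_frame(frames, simple_atr, period=2)
print(atr_table)
```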

Adding a Dimension to Compute MACD

zip

pd.Panel({item: DataFrame})

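columns = ['macd', 'macdsignal', 'macdhist']
# ta.MACD returns three arrays; zip pairs each one with its column name
print(list(zip(columns, ta.MACD(prices.iloc[:, 0].values))))


def MACD(series):
    # Build one DataFrame per symbol with the three MACD outputs as columns
    return pd.DataFrame(dict(zip(columns, ta.MACD(series.values))), index=series.index)


# Combine the per-symbol DataFrames into a Panel (items = symbols)
panel = pd.Panel.from_dict({name: MACD(item) for name, item in prices.iteritems()})
print(panel)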
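
On pandas versions without pd.Panel, one possible stand-in for the 3D container is a dict of DataFrames joined with `pd.concat` into two-level columns. The sketch below assumes that layout. `macd_frame`, `by_symbol`, `cube` and the sample prices are hypothetical, and the MACD here is built from `ewm` with the usual 12/26/9 spans, so its values are not expected to match `ta.MACD` exactly, particularly at the start of the series.

```python
import pandas as pd


def macd_frame(series, fast=12, slow=26, signal=9):
    """MACD line, signal line and histogram from exponentially weighted means."""
    fast_ema = series.ewm(span=fast, adjust=False).mean()
    slow_ema = series.ewm(span=slow, adjust=False).mean()
    macd = fast_ema - slow_ema
    macdsignal = macd.ewm(span=signal, adjust=False).mean()
    return pd.DataFrame(
        {"macd": macd, "macdsignal": macdsignal, "macdhist": macd - macdsignal}
    )


# Hypothetical close prices: a steadily rising symbol and a flat one.
prices = pd.DataFrame(
    {"AAA": [float(x) for x in range(1, 41)], "BBB": [5.0] * 40},
    index=pd.date_range("2017-01-02", periods=40),
)

# One DataFrame per symbol: 2D input, 3D result.
by_symbol = {name: macd_frame(column) for name, column in prices.items()}

# concat on a dict puts the keys in the outer column level: (symbol, field).
cube = pd.concat(by_symbol, axis=1)

# Cross-section of one field across all symbols, similar in spirit to minor_xs.
hist = cube.xs("macdhist", axis=1, level=1)
print(hist.tail())
```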

Computing the Theoretical Maximum Return

map(function, sequence, *sequence_1)

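# Daily close-to-close returns; the first row is all NaN and is dropped
returns = pn.minor_xs('close').pct_change()[1:].fillna(0)

# Compute both quantiles from the raw returns before adding any helper columns,
# so helper columns never leak into the cross-section
q_20 = returns.quantile(0.2, axis=1)
q_80 = returns.quantile(0.8, axis=1)


def qt20_mean(row, quant):
    # Mean return of the symbols below that day's 20% quantile
    return row[row < quant].mean()


def qt80_mean(row, quant):
    # Mean return of the symbols above that day's 80% quantile
    return row[row > quant].mean()


# map walks the rows and the matching quantile values in parallel
rows = [row for _, row in returns.iterrows()]
df_returns = returns.copy()
df_returns['qt20_mean'] = list(map(qt20_mean, rows, q_20))
df_returns['qt80_mean'] = list(map(qt80_mean, rows, q_80))
print(df_returns.qt80_mean - df_returns.qt20_mean)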
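
One way to read the calculation above is as the daily spread between the average return of the best-performing fifth and the worst-performing fifth of the universe, picked with hindsight. The sketch below works that through on made-up numbers. The `returns` table and the helper `tail_spread` are hypothetical, and `map` is called with three sequences to match the `map(function, sequence, *sequence_1)` form.

```python
import pandas as pd

# Hypothetical daily returns: rows are dates, columns are symbols.
returns = pd.DataFrame(
    {
        "A": [-0.04, 0.01],
        "B": [-0.02, 0.02],
        "C": [0.00, 0.03],
        "D": [0.02, 0.04],
        "E": [0.04, 0.10],
    },
    index=pd.date_range("2017-01-03", periods=2),
)

# Per-date cross-sectional quantiles (linear interpolation by default).
q_20 = returns.quantile(0.2, axis=1)
q_80 = returns.quantile(0.8, axis=1)


def tail_spread(row, low, high):
    """Mean return above the upper quantile minus mean return below the lower one."""
    return row[row > high].mean() - row[row < low].mean()


# map consumes the rows and both quantile series in parallel.
rows = [row for _, row in returns.iterrows()]
spread = pd.Series(list(map(tail_spread, rows, q_20, q_80)), index=returns.index)
print(spread)
```

On the first date only E lies above the 80% quantile and only A below the 20% quantile, so the spread is 0.04 - (-0.04) = 0.08.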

Conditional Processing

Factors: RSI(10)<40
Filter: mom2>0

mom2 = pd.DataFrame({name: ta.ROC(item.values, timeperiod=2) for name, item in prices.iteritems()}, index=prices.index)
RSI = pd.DataFrame({name: ta.RSI(item.values, timeperiod=10) for name, item in prices.iteritems()}, index=prices.index)
RSI = RSI[RSI<40]
RSI_F = RSI[mom2>0]
factor = RSI_F.stack()
factor = factor.reset_index()
factor.columns = ["datetime", "codes", "factor"]
factor["factor"] = 1
factor = factor.set_index(["datetime", "codes"])
print(factor)
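
The factor and filter conditions can also be combined into a single boolean mask before stacking. The sketch below shows that variant on made-up RSI and momentum tables (the names `rsi`, `signal`, `flags` and all values are hypothetical). It produces the same kind of (datetime, codes) table with a constant factor of 1.

```python
import pandas as pd

dates = pd.date_range("2017-01-02", periods=3)

# Hypothetical indicator tables: rows are dates, columns are symbols.
rsi = pd.DataFrame({"AAA": [35.0, 55.0, 30.0], "BBB": [38.0, 39.0, 70.0]}, index=dates)
mom2 = pd.DataFrame({"AAA": [1.2, 0.5, -0.3], "BBB": [-0.1, 2.0, 1.0]}, index=dates)

# Factor condition AND filter condition, evaluated element-wise.
signal = (rsi < 40) & (mom2 > 0)

# Long layout: one boolean per (date, symbol) pair.
flags = signal.stack()
flags.index.names = ["datetime", "codes"]

# Keep only the pairs where the signal fires and mark them with 1.
factor = flags[flags].astype(int).to_frame("factor")
print(factor)
```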

import alphalens
import matplotlib.pyplot as plt
factor_data = alphalens.utils.get_clean_factor_and_forward_returns(factor, prices)
mean_return_by_q, std_err_by_q = alphalens.performance.mean_return_by_quantile(factor_data, by_date=True, demeaned=False)
# print(mean_return_by_q)
alphalens.plotting.plot_cumulative_returns_by_quantile(mean_return_by_q, 5)
plt.show()

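Table of Contents

Python Financial Data Processing

Three-Dimensional Data Processing

Data Format Conversion

MultiIndex

Iterators

Reducing Dimensions to Compute ATR

Adding a Dimension to Compute MACD

Computing the Theoretical Maximum Return

Conditional Processing

Computing the Maximum Factor Return

Groupby

Resample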
