WorldQuant 101 Alphas: Factor Construction and Factor Testing

Labeling expressions
alpha101
bigexpr

(iQuant) #1
Author: bigquant
Reading time: 5 minutes
Published by BigQuant Quant Academy (宽客学院). Difficulty: ☆☆☆

Introduction: This article shows how to use bigexpr expressions to construct the 101 alphas published by WorldQuant as factors, and how to test those factors.

Background

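WorldQuant's paper "101 Formulaic Alphas" gives explicit formulas for 101 alpha factors. Unlike traditional approaches, these alphas were constructed through data mining, and reportedly 80% of them remain effective and are used in live trading.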

On the BigQuant strategy research platform, expressions let you construct factors and label data quickly, without hand-writing lengthy code.

Introduction to Expressions

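Factors, also called features, are a central concept in machine learning and deep learning, and feature selection is key to developing AI algorithms. A simple base factor such as the 5-day return, $close\_0/close\_5-1$, is easy to construct. A factor such as the correlation between daily returns and volume over the past 5 days is harder and would normally require a good deal of code. That is why we designed the bigexpr expression engine.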

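bigexpr is an expression evaluation engine developed by BigQuant. By writing a short expression you can apply a wide range of computations to your data without writing code.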

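bigexpr is used widely on the platform: both M.advanced_auto_labeler and M.derived_feature_extractor are driven by bigexpr, so you can define labeling targets and extract derived features with expressions alone.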

The factor mentioned above, the correlation between daily returns and volume over the past 5 days, can be defined as:

$$correlation(close\_0/shift(close\_0,1)-1,volume\_0,5)$$
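Here $correlation$ computes the correlation coefficient, $close\_0$ is the current day's closing price, $shift(close\_0,1)$ is the previous day's closing price, and $volume\_0$ is the current day's volume. The factor needs no hand-written computation code; a single expression is enough.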
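
To make the expression concrete, here is a minimal plain-Python sketch of what it computes for a single stock. The helper names (`pearson`, `return_volume_correlation`) and the sample data are made up for illustration; how bigexpr itself handles missing values and window alignment is not shown in this post, so treat those details as assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)

def return_volume_correlation(close, volume, window=5):
    """Per day: correlation of the last `window` daily returns with volume."""
    # Daily return is close_0 / shift(close_0, 1) - 1; the first day has none.
    returns = [None] + [close[i] / close[i - 1] - 1 for i in range(1, len(close))]
    out = []
    for i in range(len(close)):
        start = i - window + 1
        if start < 1:  # not enough valid returns yet (assumed to yield no value)
            out.append(None)
            continue
        out.append(pearson(returns[start:i + 1], volume[start:i + 1]))
    return out

# Hypothetical sample data for one stock
close = [10.0, 10.2, 10.1, 10.4, 10.3, 10.6, 10.8]
volume = [1000, 1500, 900, 1800, 950, 1700, 1600]
factor = return_volume_correlation(close, volume, window=5)
```

With a 5-day window, the first five days have no value because only four returns exist by day five.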

Function Reference

The expression engine provides many simple functions. Some of them are explained below:

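  • Functions fall into two groups, cross-sectional and time-series; most time-series function names start with $ts\_$
  • Most function names are self-explanatory
  • $abs(x)$ and $log(x)$ are the absolute value and the natural logarithm of $x$
  • $rank(x)$ is the ascending cross-sectional rank of a stock's $x$ value, normalized to the closed interval [0,1]
  • $delay(x,d)$ is the value of $x$ from $d$ days ago
  • $delta(x,d)$ is the latest value of $x$ minus its value $d$ days ago
  • $correlation(x,y,d)$ and $covariance(x,y,d)$ are the Pearson correlation coefficient and the covariance of $x$ and $y$ over a time window of length $d$
  • $ts\_min(x,d)$, $ts\_max(x,d)$, $ts\_argmax(x,d)$, $ts\_argmin(x,d)$, $ts\_rank(x,d)$, $sum(x,d)$, $stddev(x,d)$ and similar functions do what their names suggest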
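
The semantics described above can be sketched in plain Python. These are illustrative re-implementations, not bigexpr's code: in particular, mapping the lowest value to 0 and the highest to 1 in `rank`, ignoring ties, and returning `None` where history is too short are all assumptions.

```python
def rank(values):
    """Cross-sectional ascending rank scaled to [0, 1] (ties not handled)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for position, i in enumerate(order):
        ranks[i] = position / (n - 1) if n > 1 else 0.0
    return ranks

def delay(series, d):
    """Value of the series d days ago; None where history is too short."""
    return [series[i - d] if i >= d else None for i in range(len(series))]

def delta(series, d):
    """Latest value minus the value d days ago."""
    return [series[i] - series[i - d] if i >= d else None
            for i in range(len(series))]

def ts_max(series, d):
    """Maximum over the trailing d-day window."""
    return [max(series[i - d + 1:i + 1]) if i >= d - 1 else None
            for i in range(len(series))]

# rank works across stocks on one day; the others work along one stock's history.
todays_values = [30, 10, 20]      # hypothetical values of x for three stocks
history = [1, 3, 6, 10]           # hypothetical history of x for one stock
ranked = rank(todays_values)
lagged = delay(history, 1)
change = delta(history, 2)
rolling_high = ts_max([3, 1, 4, 1, 5], 3)
```

The split between the two groups matters when reading the alphas: `rank` compares stocks with each other on the same day, while `delay`, `delta` and the `ts_` functions only look at one stock's own past.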

Factor Reference

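The BigQuant platform provides more than 2,000 built-in factors, including basic information, price-volume, valuation, financial statement, and technical indicator factors. A few of each kind are listed below.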

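Basic information factors

Selected factors:
  • list_days # days since listing
  • list_board_0 # listing board
  • company_found_date_0 # days since the company was founded
  • industry_sw_level1_0 # Shenwan (SW) level-1 industry class
  • st_status_0 # ST status
  • in_sse50_0 # whether the stock is an SSE 50 index constituent
  • in_csi300_0 # whether the stock is a CSI 300 index constituent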

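Price-volume factors

Selected factors:
  • open_0 # current day's opening price
  • open_1 # previous day's opening price
  • close_0 # current day's closing price
  • high_0 # current day's high
  • low_0 # current day's low
  • volume_0 # current day's volume
  • amount_0 # current day's turnover (traded value)
  • adjust_factor_0 # price adjustment factor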

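Valuation factors

Selected factors:
  • market_cap_0 # total market capitalization
  • rank_market_cap_0 # rank of total market capitalization
  • pe_ttm_0 # price-to-earnings ratio (TTM)
  • rank_pe_ttm_0 # ascending percentile rank of the price-to-earnings ratio (TTM)
  • pe_lyr_0 # price-to-earnings ratio (LYR)
  • pb_lf_0 # price-to-book ratio (LF)
  • ps_ttm_0 # price-to-sales ratio (TTM)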

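Financial statement factors

Selected factors:
  • fs_net_profit_0 # net profit attributable to shareholders of the parent company
  • fs_net_profit_yoy_0 # year-over-year growth of net profit attributable to shareholders of the parent company
  • fs_net_profit_qoq_0 # quarter-over-quarter growth of net profit attributable to shareholders of the parent company
  • fs_roe_0 # return on equity
  • fs_roa_0 # return on assets
  • fs_gross_profit_margin_0 # gross profit margin
  • fs_net_profit_margin_0 # net profit margin
  • fs_eps_0 # earnings per share
  • fs_bps_0 # book value per share
  • fs_cash_ratio_0 # cash ratio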

Data Labeling

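Like factor construction, data labeling is an essential part of a machine learning workflow. For more detailed documentation, see: Custom Labeling (自定义标注).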

Before expressions were available, labeling was done mainly through fast_auto_labeler; since their introduction it is done mainly through advanced_auto_labeler. The whole labeling logic lives in label_expr, which is a list.
A concrete example:

conf.label_expr = [
    # Compute the return relative to the benchmark
    'shift(close, -5) / shift(open, -1) - shift(benchmark_close, -5) / shift(benchmark_open, -1)',
    # Outlier handling: clip at the 1% and 99% quantiles
    'clip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))',
    # Map the score to classes; 20 classes are used here
    'all_wbins(label, 20)',
    # Filter out one-line limit-up days (next-day high equals low): set the label to NaN, which later processing and training ignore
    'where(shift(high, -1) == shift(low, -1), NaN, label)'
]

m1 = M.advanced_auto_labeler.v1(
    instruments=conf.instruments, start_date=conf.start_date, end_date=conf.split_date,
    label_expr=conf.label_expr, benchmark='000300.SHA')

The example works as follows:

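  • label_expr is a list; its four elements define the labeling steps. For detailed documentation, see: Expression Engine (表达式引擎)
  • The relative return over a future period is the raw basis for the label, and a bigexpr expression computes it directly
  • clip and all_quantile handle extreme values
  • The raw values are then discretized; either equal-width or equal-frequency binning can be used, and each has its trade-offs
  • The where function filters out samples that hit a one-line limit-up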
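
The last three steps (clip, bin, filter) can be sketched in plain Python on a handful of made-up values. This only illustrates the idea: the quantile interpolation, the 0-based bin numbering, and the helper names are assumptions, and the real all_quantile and all_wbins functions may differ in such details.

```python
import math

def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def clip(values, low, high):
    """Limit every value to the range [low, high]."""
    return [min(max(v, low), high) for v in values]

def equal_width_bins(values, bins):
    """Map each value to a bin index 0..bins-1 over [min, max]."""
    low, high = min(values), max(values)
    if high == low:
        return [0] * len(values)
    width = (high - low) / bins
    return [min(int((v - low) / width), bins - 1) for v in values]

def build_labels(relative_returns, next_high, next_low, bins=20):
    """Clip at the 1%/99% quantiles, bin, then blank out one-line bars."""
    clipped = clip(relative_returns,
                   quantile(relative_returns, 0.01),
                   quantile(relative_returns, 0.99))
    binned = equal_width_bins(clipped, bins)
    # Next-day high == low is treated as untradable, so the label becomes NaN.
    return [float("nan") if h == l else float(b)
            for b, h, l in zip(binned, next_high, next_low)]

# Hypothetical relative returns and next-day high/low prices for six samples
relative_returns = [-0.30, -0.02, 0.00, 0.01, 0.03, 0.50]
next_high = [10.5, 8.2, 11.0, 7.7, 9.4, 12.1]
next_low = [10.1, 8.0, 11.0, 7.5, 9.0, 11.8]
labels = build_labels(relative_returns, next_high, next_low)
```

Clipping before binning keeps a single extreme return from stretching the equal-width bins so far that most samples land in one or two classes.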

Single-Factor Testing

Taking the factor 'shift(close_0,15) / close_0' as an example, this section shows how to run a single-factor test and develop an AI strategy based on one factor.

 (Note: if you would rather not write code, see the reply by Yoga (会飞的鱼) in post #3.)
import math

import numpy as np

## Basic configuration
class conf:
    start_date = '2014-01-01'
    end_date = '2017-07-17'
    split_date = '2015-01-01'
    instruments = D.instruments(start_date, end_date)
    hold_days = 5
    features = ['shift(close_0,15) / close_0']
    # Data labeling
    label_expr = [
        # Compute the relative return over a future period (hold_days)
        'shift(close, -5) / shift(open, -1) - shift(benchmark_close, -5) / shift(benchmark_open, -1)',
        # Outlier handling: clip at the 1% and 99% quantiles
        'clip(label, all_quantile(label, 0.01), all_quantile(label, 0.99))',
        # Map the score to classes; 20 classes with equal-width binning
        'all_wbins(label, 20)',
        # Filter out one-line limit-up days (next-day high equals low): set the label to NaN, which later processing and training ignore
        'where(shift(high, -1) == shift(low, -1), NaN, label)'
    ]

## Backtest https://bigquant.com/docs/#/develop?id=%E5%9B%9E%E6%B5%8B%E6%9C%BA%E5%88%B6
# Backtest engine: prepare data; runs only once
def prepare(context):
    # context.start_date / end_date: in a backtest these are the parameters passed to the trader; in live trading the system replaces them with live dates
    ## Predict on out-of-sample data
    n0 = M.general_feature_extractor.v5(
        instruments=D.instruments(),
        start_date=context.start_date, end_date=context.end_date,
        features=conf.features)
    n1 = M.derived_feature_extractor.v1(
        data=n0.data,
        features=conf.features)
    n2 = M.transform.v2(data=n1.data, transforms=None, drop_null=True)
    n3 = M.stock_ranker_predict.v5(model=context.options['model'], data=n2.data)
    context.instruments = n3.instruments
    context.options['predictions'] = n3.predictions

# Backtest engine: initialization; runs only once
def initialize(context):
    # Prediction data is passed in through options; read_df loads it into memory as a DataFrame
    context.ranker_prediction = context.options['predictions'].read_df()
    # The system sets default commission and slippage; use the call below to change the commission
    context.set_commission(PerOrder(buy_cost=0.0003, sell_cost=0.0013, min_cost=5))
    # Number of stocks to buy: the top 3 of the predicted ranking
    stock_count = 3
    # Per-stock weights; this scheme gives higher-ranked stocks more capital: roughly [0.4693, 0.2961, 0.2346] for 3 stocks
    context.stock_weights = T.norm([1 / math.log(i + 2) for i in range(0, stock_count)])
    # Maximum fraction of capital that any single stock may take
    context.max_cash_per_instrument = 0.2

# Backtest engine: daily data handler; runs once per day
def handle_data(context, data):
    # Filter by date to get today's predictions
    ranker_prediction = context.ranker_prediction[
        context.ranker_prediction.date == data.current_dt.strftime('%Y-%m-%d')]
    # 1. Capital allocation
    # The average holding period is hold_days and stocks are bought every day, so each day is expected to use 1/hold_days of the capital
    # In practice purchases are not exact, so during the first hold_days days equal amounts are used; afterwards the remaining cash is used as far as possible (capped here at 1.5x the equal amount)
    is_staging = context.trading_day_index < context.options['hold_days'] # whether still building positions (first hold_days days)
    cash_avg = context.portfolio.portfolio_value / context.options['hold_days']
    cash_for_buy = min(context.portfolio.cash, (1 if is_staging else 1.5) * cash_avg)
    cash_for_sell = cash_avg - (context.portfolio.cash - cash_for_buy)
    positions = {e.symbol: p.amount * p.last_sale_price
                 for e, p in context.perf_tracker.position_tracker.positions.items()}
    # 2. Generate sell orders: selling starts only after hold_days days; held stocks are eliminated from the bottom of the StockRanker ranking
    if not is_staging and cash_for_sell > 0:
        equities = {e.symbol: e for e, p in context.perf_tracker.position_tracker.positions.items()}
        instruments = list(reversed(list(ranker_prediction.instrument[ranker_prediction.instrument.apply(
                lambda x: x in equities and not context.has_unfinished_sell_order(equities[x]))])))
        # print('rank order for sell %s' % instruments)
        for instrument in instruments:
            context.order_target(context.symbol(instrument), 0)
            cash_for_sell -= positions[instrument]
            if cash_for_sell <= 0:
                break
    # 3. Generate buy orders: following the StockRanker ranking, buy the top stock_count stocks
    buy_cash_weights = context.stock_weights
    buy_instruments = list(ranker_prediction.instrument[:len(buy_cash_weights)])
    max_cash_per_instrument = context.portfolio.portfolio_value * context.max_cash_per_instrument
    for i, instrument in enumerate(buy_instruments):
        cash = cash_for_buy * buy_cash_weights[i]
        if cash > max_cash_per_instrument - positions.get(instrument, 0):
            # Make sure the position never exceeds the per-stock capital cap
            cash = max_cash_per_instrument - positions.get(instrument, 0)
        if cash > 0:
            price = data.current(context.symbol(instrument), 'price')
            lots = int(cash/price/100)
            context.order_lots(context.symbol(instrument), lots)

            
## Train the model on the training set
# Data labeling
m1 = M.advanced_auto_labeler.v1(
    instruments=conf.instruments, start_date=conf.start_date, end_date=conf.split_date,
    label_expr=conf.label_expr, benchmark='000300.SHA', cast_label_int=True)
# Extract base features
m2_1 = M.general_feature_extractor.v5(
    instruments=D.instruments(),
    start_date=conf.start_date, end_date=conf.split_date,
    features=conf.features)

# Extract derived features
m2_2 = M.derived_feature_extractor.v1(
    data=m2_1.data,
    features=conf.features)

# Feature transformation
m3 = M.transform.v2(data=m2_2.data, transforms=None, drop_null=True)

# Join the labels with the feature data
m4 = M.join.v2(data1=m1.data, data2=m3.data, on=['date', 'instrument'], sort=False)

# Train the model
m5 = M.stock_ranker_train.v4(training_ds=m4.data, features=conf.features)

## Backtest on the test set
m6 = M.trade.v3(
    instruments=None,
    start_date=conf.split_date,
    end_date=conf.end_date,
    prepare=prepare,
    initialize=initialize,
    handle_data=handle_data,
    order_price_field_buy='open',       
    order_price_field_sell='close',      
    capital_base=50001,               
    benchmark='000300.SHA',             
    options={'hold_days': conf.hold_days, 'model': m5.model_id},
    m_deps=np.random.rand()
)
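
The capital allocation in initialize and handle_data can be tried outside the platform with the plain-Python sketch below. It assumes that T.norm divides each weight by the sum of all weights, which is consistent with the value 0.339160 quoted in the original post for five stocks; the function names are made up for this illustration.

```python
import math

def rank_weights(stock_count):
    """Weights proportional to 1 / ln(i + 2), normalized to sum to 1."""
    raw = [1 / math.log(i + 2) for i in range(stock_count)]
    total = sum(raw)
    return [w / total for w in raw]

def daily_buy_budget(portfolio_value, cash, hold_days, trading_day_index):
    """Cash to deploy today.

    While positions are being built (first hold_days days) the budget is
    1/hold_days of the portfolio; afterwards up to 1.5x that amount,
    never more than the available cash.
    """
    is_staging = trading_day_index < hold_days
    cash_avg = portfolio_value / hold_days
    return min(cash, (1 if is_staging else 1.5) * cash_avg)

weights_3 = rank_weights(3)   # the stock_count used in the code above
weights_5 = rank_weights(5)
budget_day0 = daily_buy_budget(50001, 50001, hold_days=5, trading_day_index=0)
budget_later = daily_buy_budget(52000, 9000, hold_days=5, trading_day_index=30)
```

With three stocks the top-ranked one receives a little under half of the day's budget, so max_cash_per_instrument = 0.2 is what actually limits how much of the portfolio one stock can take.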

List of the 101 Alphas

Alpha_1   	'where(mean(amount_0,20)<volume_0,((-1*ts_rank(abs(delta(close_0,7)),60))*sign(delta(close_0,7))),-1)'

Alpha_2   'rank(ts_argmax(signedpower(where(close_0/shift(close_0,1)-1<0,std(close_0/shift(close_0,1)-1,20),close_0),2),5))-0.5'

Alpha_3     '-1*correlation(rank(delta(log(volume_0),2)),rank(((close_0-open_0)/open_0)),6)'

Alpha_4	    '-1*correlation(rank(open_0),rank(volume_0),10)'   

Alpha_5	    '-1*ts_rank(rank(low_0),9)'

Alpha_6	    'rank((open_0-(sum(amount_0/volume_0*adjust_factor_0,10)/10)))*(-1*abs(rank((close_0-amount_0/volume_0*adjust_factor_0))))'

Alpha_7	    '-1*correlation(open_0,volume_0,10)' 

Alpha_8	    'where(mean(amount_0,20)<volume_0,((-1*ts_rank(abs(delta(close_0,7)),60))*sign(delta(close_0,7))),-1)'  

Alpha_9	    '(-1*rank(((sum(open_0,5)*sum(close_0/shift(close_0,1)-1,5))-delay((sum(open_0,5)*sum(close_0/shift(close_0,1)-1,5)),10))))'

Alpha_10	'where(0<ts_min(delta(close_0,1),5),delta(close_0,1),where(ts_max(delta(close_0,1),5)<0,delta(close_0,1),-1*delta(close_0,1)))'

Alpha_11	'rank(where(0<ts_min(delta(close_0,1),4),delta(close_0,1),where(ts_max(delta(close_0,1),4)<0,delta(close_0,1),-1*delta(close_0,1))))'

Alpha_12	'(rank(ts_max((amount_0/volume_0*adjust_factor_0-close_0),3))+rank(ts_min((amount_0/volume_0*adjust_factor_0-close_0),3)))*rank(delta(volume_0,3))'

Alpha_13	'sign(delta(volume_0,1))*(-1*delta(close_0,1))'

Alpha_14	'-1*rank(covariance(rank(close_0),rank(volume_0),5))'

Alpha_15	'(-1*rank(delta(close_0/shift(close_0,1)-1,3)))*correlation(open_0,volume_0,10)'

Alpha_16	'-1*sum(rank(correlation(rank(high_0),rank(volume_0),3)),3)'

Alpha_17	'-1*rank(covariance(rank(high_0),rank(volume_0),5))'

Alpha_18	'((-1*rank(ts_rank(close_0,10)))*rank(delta(delta(close_0,1),1)))*rank(ts_rank((volume_0/mean(amount_0,20)),5))'

Alpha_19	'-1*rank(((std(abs((close_0-open_0)),5)+(close_0-open_0))+correlation(close_0,open_0,10)))'

Alpha_20	'(-1*sign(((close_0-delay(close_0,7))+delta(close_0,7))))*(1+rank((1+sum(close_0/shift(close_0,1)-1,250))))'

Alpha_21	'((-1*rank((open_0-delay(high_0,1))))*rank((open_0-delay(close_0,1))))*rank((open_0-delay(low_0,1)))'

Alpha_22	'where(sum(close_0,8)/8+stddev(close_0,8)<sum(close_0,2)/2,-1,where(mean(close_0,2)<mean(close_0,8)-std(close_0,8),1,where((1<volume_0/mean(amount_0,20)) |(volume_0/mean(amount_0,20)==1),1,-1)))'

Alpha_23	'-1*(delta(correlation(high_0,volume_0,5),5)*rank(std(close_0,20)))'

Alpha_24	'where(sum(high_0,20)/20<high_0,-1*delta(high_0,2),0)'

Alpha_25	'where((delta(mean(close_0,100),100)/delay(close_0,100)<0.05)  |(delta(mean(close_0,100),100)/delay(close_0,100)==0.05) ,-1*(close_0-ts_min(close_0,100)),-1*delta(close_0,2))'

Alpha_26	'rank(-1*(close_0/shift(close_0,1)-1)*mean(amount_0,20)*amount_0/volume_0*adjust_factor_0*(high_0-close_0))'

Alpha_27	'-1*ts_max(correlation(ts_rank(volume_0,5),ts_rank(high_0,5),5),3)'

Alpha_28	'where(0.5<rank((sum(correlation(rank(volume_0),rank(amount_0/volume_0*adjust_factor_0),6),2)/2.0)),-1,1)'

Alpha_29	'scale(correlation(mean(amount_0,20),low_0,5)+(high_0+low_0)*0.5-close_0)'   

Alpha_30    'min(product(rank(rank(scale(log(sum(ts_min(rank(rank((-1*rank(delta((close_0-1),5))))),2),1))))),1),5)+ts_rank(delay((-1*(close_0/shift(close_0,1)-1)),6),5)'

Alpha_31	'((1.0-rank(((sign((close_0-delay(close_0,1)))+sign((delay(close_0,1)-delay(close_0,2)))) +sign((delay(close_0,2)-delay(close_0,3))))))*sum(volume_0,5))/sum(volume_0,20)'

Alpha_32	'(rank(rank(rank(decay_linear((-1*rank(rank(delta(close_0,10)))),10))))+rank((-1*delta(close_0,3))))+sign(scale(correlation(mean(amount_0,20),low_0,12)))'

Alpha_33	'scale(((sum(close_0,7)/7)-close_0))+20*scale(correlation(amount_0/volume_0*adjust_factor_0,delay(close_0,5),230))'

Alpha_34	'rank((-1*((1-(open_0/close_0)))))'

Alpha_35	'rank(((1-rank((std(close_0/shift(close_0,1),2)/stddev(close_0/shift(close_0,1)-1,5))))+(1-rank(delta(close_0,1)))))' 

Alpha_36	'ts_rank(volume_0,32)*(1-ts_rank(((close_0+high_0)-low_0),16))*(1-ts_rank(close_0/shift(close_0,1)-1,32))'

Alpha_37	'((((2.21*rank(correlation((close_0-open_0),delay(volume_0,1),15)))+(0.7*rank((open_0-close_0))))+(0.73*rank(ts_rank(delay((-1*close_0/shift(close_0,1)-1),6),5))))+rank(abs(correlation(amount_0/volume_0*adjust_factor_0,mean(amount_0,20),6))))+(0.6*rank((((sum(close_0,200)/200)-open_0)*(close_0-open_0))))' 

Alpha_38	'rank(correlation(delay((open_0-close_0),1),close_0,200))+rank((open_0-close_0))'

Alpha_39	'(-1*rank(ts_rank(close_0,10)))*rank((close_0/open_0))'

Alpha_40	'((-1*rank((delta(close_0,7)*(1-rank(decay_linear((volume_0/mean(amount_0,20)),9))))))*(1 +rank(sum(close_0/shift(close_0,1),250))))'

Alpha_41	'((-1*rank(std(high_0,10)))*correlation(high_0,volume_0,10))'

Alpha_42	'(((high_0*low_0)**0.5)-amount_0/volume_0*adjust_factor_0)'

Alpha_43	'(rank((amount_0/volume_0*adjust_factor_0-close_0))/rank((amount_0/volume_0*adjust_factor_0+close_0)))'

Alpha_44	'(ts_rank((volume_0/mean(amount_0,20)),20)*ts_rank((-1*delta(close_0,7)),8))'

Alpha_45	'(-1*correlation(high_0,rank(volume_0),5))'

Alpha_46	'(-1*((rank((sum(delay(close_0,5),20)/20))*correlation(close_0,volume_0,2))*rank(correlation(sum(close_0,5),sum(close_0,20),2))))'

Alpha_47	'where((0.25<(((delay(close_0,20)-delay(close_0,10))/10)-((delay(close_0,10)-close_0)/10))),-1,where(((((delay(close_0,20)-delay(close_0,10))/10)-((delay(close_0,10)-close_0)/10))<0),1,((-1*1)*(close_0-delay(close_0,1)))))'

Alpha_48	'(((rank((1/close_0))*volume_0)/mean(amount_0,20))*((high_0*rank((high_0-close_0)))/(sum(high_0,5) /5)))-rank((amount_0/volume_0*adjust_factor_0-delay(amount_0/volume_0*adjust_factor_0,5)))' 

Alpha_49	'((correlation(delta(close_0,1),delta(delay(close_0,1),1),250)*delta(close_0,1))/close_0)/group_mean(industry_sw_level1_0,((correlation(delta(close_0,1),delta(delay(close_0,1),1),250)*delta(close_0,1))/close_0))/sum(((delta(close_0,1)/delay(close_0,1))**2),250)'    

Alpha_50	'where(((((delay(close_0,20)-delay(close_0,10))/10)-((delay(close_0,10)-close_0)/10))<(-1*0.1)),1,(close_0-delay(close_0,1))*(-1))'

Alpha_51	'(-1*ts_max(rank(correlation(rank(volume_0),rank(amount_0/volume_0*adjust_factor_0),5)),5))'

Alpha_52	'where((((delay(close_0,20)-delay(close_0,10))/10)-((delay(close_0,10)-close_0)/10))<(-1*0.05),1,-1*(close_0-delay(close_0,1)))'

Alpha_53	'(((-1*ts_min(low_0,5))+delay(ts_min(low_0,5),5))*rank(((sum(close_0/shift(close_0,1),240)-sum(close_0/shift(close_0,1),20))/220)))*ts_rank(volume_0,5)'

Alpha_54	'(-1*delta((((close_0-low_0)-(high_0-close_0))/(close_0-low_0)),9))'

Alpha_55	'((-1*((low_0-close_0)*(open_0**5)))/((low_0-high_0)*(close_0** 5)))' 

Alpha_56	'-1*correlation(rank(((close_0-ts_min(low_0,12))/(ts_max(high_0,12)-ts_min(low_0,12)))),rank(volume_0),6)'

Alpha_57	'0-1*(1*(rank((sum(close_0/shift(close_0,1)-1,10)/sum(sum(close_0/shift(close_0,1)-1,2),3)))*rank(((close_0/shift(close_0,1)-1)*market_cap_0))))' 

Alpha_58	'(0-(1*((close_0-amount_0/volume_0*adjust_factor_0)/decay_linear(rank(ts_argmax(close_0,30)),2))))' 

Alpha_59	'(-1*ts_rank(decay_linear(correlation( amount_0/volume_0*adjust_factor_0/group_mean(industry_sw_level1_0,amount_0/volume_0*adjust_factor_0),volume_0,4),8),5))'

Alpha_60	'(0-(1*((2*scale(rank(((((close_0-low_0)-(high_0-close_0))/(high_0-low_0))*volume_0))))-scale(rank(ts_argmax(close_0,10))))))'

Alpha_61	'(rank((amount_0/volume_0*adjust_factor_0-ts_min(amount_0/volume_0*adjust_factor_0,16)))<rank(correlation(amount_0/volume_0*adjust_factor_0,mean(amount_0,180),18)))'

Alpha_62	'(rank(correlation(amount_0/volume_0*adjust_factor_0,sum(mean(amount_0,20),22),10))<rank(((rank(open_0)+rank(open_0))<(rank(((high_0+low_0)/2))+rank(high_0)))))*-1'

Alpha_63	'((rank(decay_linear(delta(close_0/group_mean(industry_sw_level1_0,close_0),2),8))-rank(decay_linear(correlation(((amount_0/volume_0*adjust_factor_0*0.318108)+(open_0*(1-0.318108))),sum(mean(amount_0,180),37),14),12)))*-1)'

Alpha_64	'((rank(correlation(sum(((open_0*0.178404)+(low_0*(1-0.178404))),13),sum(mean(amount_0,20),13),17))<rank(delta(((((high_0+low_0)/2)*0.178404)+(amount_0/volume_0*adjust_factor_0*(1-0.178404))),4)))*-1)'

Alpha_65	'((rank(correlation(((open_0*0.00817205)+(amount_0/volume_0*adjust_factor_0*(1-0.00817205))),sum(mean(amount_0,60),9),6))<rank((open_0-ts_min(open_0,14))))*-1)'

Alpha_66	'((rank(decay_linear(delta(amount_0/volume_0*adjust_factor_0,4),7))+ts_rank(decay_linear(((((low_0* 0.96633)+(low_0*(1-0.96633)))-amount_0/volume_0*adjust_factor_0)/(open_0-((high_0+low_0)/2))),11),7))*-1)'

Alpha_67	'((rank((high_0-ts_min(high_0,2)))**rank(correlation( amount_0/volume_0*adjust_factor_0 /group_mean(industry_sw_level1_0,amount_0/volume_0*adjust_factor_0),mean(amount_0,20)/group_mean(industry_sw_level1_0,mean(amount_0,20)),6)))*-1)'

Alpha_68	'((ts_rank(correlation(rank(high_0),rank(mean(amount_0,15)),9),14)<rank(delta(((close_0*0.518371)+(low_0*(1-0.518371))),1)))*-1)'

Alpha_69	'((rank(ts_max(delta(amount_0/volume_0*adjust_factor_0/group_mean(industry_sw_level1_0,amount_0/volume_0*adjust_factor_0),3),5))**ts_rank(correlation(((close_0*0.490655)+(amount_0/volume_0*adjust_factor_0*(1-0.490655))),mean(amount_0,20),5),9))*-1)'

Alpha_70	'((rank(delta(amount_0/volume_0*adjust_factor_0,1))**ts_rank(correlation(  close_0/group_mean(industry_sw_level1_0,close_0),mean(amount_0,50),18),18))*-1)'

Alpha_71	'max(ts_rank(decay_linear(correlation(ts_rank(close_0,3),ts_rank(mean(amount_0,180),12),18),4),16),ts_rank(decay_linear((rank(((low_0+open_0)-(amount_0/volume_0*adjust_factor_0 +amount_0/volume_0*adjust_factor_0)))**2),16 ),4))'

Alpha_72	'(rank(decay_linear(correlation(((high_0+low_0)/2),mean(amount_0,40),9),10)) /rank(decay_linear(correlation(ts_rank(amount_0/volume_0*adjust_factor_0,4),ts_rank(volume_0,19),7),3)))'

Alpha_73	'(max(rank(decay_linear(delta(amount_0/volume_0*adjust_factor_0,5),3)),ts_rank(decay_linear(((delta(((open_0* 0.147155)+(low_0*(1-0.147155))),2 ) /((open_0* 0.147155)+(low_0*(1-0.147155))))*-1),3),17))*-1)'      

Alpha_74	'(rank(correlation(close_0,sum(mean(amount_0,30),37),15))<rank(correlation(rank(high_0*0.0261661+amount_0/volume_0*adjust_factor_0*(1-0.0261661)),rank(volume_0),11)))*-1'

Alpha_75	'rank(correlation(amount_0/volume_0*adjust_factor_0,volume_0,4 ))<rank(correlation(rank(low_0),rank(mean(amount_0,50)),12))'

Alpha_76	'max(rank(decay_linear(delta(amount_0/volume_0*adjust_factor_0,1),12)),ts_rank(decay_linear(ts_rank(correlation( low_0/group_mean(industry_sw_level1_0,low_0),mean(amount_0,81),8 ),20),17),19))*-1'

Alpha_77	'min(rank(decay_linear(((((high_0+low_0)/2)+high_0)-(amount_0/volume_0*adjust_factor_0+high_0)),20 )),rank(decay_linear(correlation(((high_0+low_0)/2),mean(amount_0,40),3),6)))'

Alpha_78	'rank(correlation(sum(((low_0*0.352233)+(amount_0/volume_0*adjust_factor_0*(1-0.352233))),20),sum(mean(amount_0,20),20),7))**rank(correlation(rank(amount_0/volume_0*adjust_factor_0),rank(volume_0),6))'

Alpha_79	'rank(delta((close_0*0.60733+open_0*(1-0.60733))/ group_mean(industry_sw_level1_0,(close_0*0.60733+open_0*(1-0.60733))),1))<rank(correlation(ts_rank(amount_0/volume_0*adjust_factor_0,4),ts_rank(mean(amount_0,150),9),115))'

Alpha_80	'(rank(sign(delta((open_0*0.868128+high_0*(1-0.868128))/group_mean(industry_sw_level1_0,(open_0*0.868128+high_0*(1-0.868128))),4)))**ts_rank(correlation(high_0,mean(amount_0,10),5),6))*-1'

Alpha_81	'(rank(log(product(rank((rank(correlation(amount_0/volume_0*adjust_factor_0,sum(mean(amount_0,10),50),8))**4)),15)))<rank(correlation(rank(amount_0/volume_0*adjust_factor_0),rank(volume_0),5)))*-1'

Alpha_82	'min(rank(decay_linear(delta(open_0,1),15)),ts_rank(decay_linear(correlation( volume_0/group_mean(industry_sw_level1_0,volume_0),((open_0*0.634196) +(open_0*(1-0.634196))),17),7),13))*-1'

Alpha_83	'(rank(delay(((high_0-low_0)/(sum(close_0,5)/5)),2))*rank(rank(volume_0)))/(((high_0-low_0)/(sum(close_0,5)/5))/(amount_0/volume_0*adjust_factor_0-close_0))'

Alpha_84	'signedpower(ts_rank((amount_0/volume_0*adjust_factor_0-ts_max(amount_0/volume_0*adjust_factor_0,15)),20),delta(close_0,5))'

Alpha_85	'rank(correlation(((high_0*0.876703)+(close_0*(1-0.876703))),mean(amount_0,30),10))**rank(correlation(ts_rank(((high_0+low_0)/2),4),ts_rank(volume_0,10),7))'

Alpha_86	'(ts_rank(correlation(close_0,sum(mean(amount_0,20),15),6),20)<rank(((open_0+close_0)-(amount_0/volume_0*adjust_factor_0+open_0))))*-1'

Alpha_87	'max(rank(decay_linear(delta(((close_0*0.369701)+(amount_0/volume_0*adjust_factor_0*(1-0.369701))),2),3)),ts_rank(decay_linear(abs(correlation( mean(amount_0,81) /group_mean(industry_sw_level1_0,mean(amount_0,81)) ,close_0,14)),5),14))*-1'

Alpha_88	'min(rank(decay_linear(((rank(open_0)+rank(low_0))-(rank(high_0)+rank(close_0))),8)),ts_rank(decay_linear(correlation(ts_rank(close_0,8),ts_rank(mean(amount_0,60),21),8),7),3))'

Alpha_89	'ts_rank(decay_linear(correlation(((low_0*0.967285)+(low_0*(1-0.967285))),mean(amount_0,10),7),6),4)-ts_rank(decay_linear(delta( amount_0/volume_0*adjust_factor_0/group_mean(industry_sw_level1_0,amount_0/volume_0*adjust_factor_0),3),10),15)'

Alpha_90	'(rank((close_0-ts_max(close_0,5)))**ts_rank(correlation(mean(amount_0,40)/group_mean(industry_sw_level1_0,mean(amount_0,40)),low_0,5),3))*-1'

Alpha_91	'(ts_rank(decay_linear(decay_linear(correlation(close_0/group_mean(industry_sw_level1_0,close_0),volume_0,10),16),4),5)-rank(decay_linear(correlation(amount_0/volume_0*adjust_factor_0,mean(amount_0,30),4),3)))*-1'

Alpha_92	'min(ts_rank(decay_linear(((((high_0+low_0)/2)+close_0)<(low_0+open_0)),15),19),ts_rank(decay_linear(correlation(rank(low_0),rank(mean(amount_0,30)),8),7),7))'

Alpha_93	'ts_rank(decay_linear(correlation((amount_0/volume_0*adjust_factor_0)/group_mean(industry_sw_level1_0,amount_0/volume_0*adjust_factor_0) ,mean(amount_0,81),17),20),8)/rank(decay_linear(delta(((close_0*0.524434)+(amount_0/volume_0*adjust_factor_0*(1-0.524434))),3),16))'

Alpha_94	'(rank((amount_0/volume_0*adjust_factor_0-ts_min(amount_0/volume_0*adjust_factor_0,12)))**ts_rank(correlation(ts_rank(amount_0/volume_0*adjust_factor_0,20),ts_rank(mean(amount_0,60),4),18),3))*-1'

Alpha_95	'rank((open_0-ts_min(open_0,12)))<ts_rank((rank(correlation(sum(((high_0+low_0)/ 2),19),sum(mean(amount_0,40),19),13))**5),12)'

Alpha_96	'max(ts_rank(decay_linear(correlation(rank(amount_0/volume_0*adjust_factor_0),rank(volume_0),4),4),8),ts_rank(decay_linear(ts_argmax(correlation(ts_rank(close_0,7),ts_rank(mean(amount_0,60),4),4),13),14),13))*-1'     

Alpha_97	'(rank(decay_linear(delta(((low_0*0.721001)+(amount_0/volume_0*adjust_factor_0*(1-0.721001)))/group_mean(industry_sw_level1_0,(low_0*0.721001)+(amount_0/volume_0*adjust_factor_0*(1-0.721001))),3),20)) -ts_rank(decay_linear(ts_rank(correlation(ts_rank(low_0,8),ts_rank(mean(amount_0,60),17),5),16),16),7))*-1'

Alpha_98	'rank(decay_linear(correlation(amount_0/volume_0*adjust_factor_0,sum(mean(amount_0,5),26),5),7))-rank(decay_linear(ts_rank(ts_argmin(correlation(rank(open_0),rank(mean(amount_0,15)),21),9),7),8))'

Alpha_99	'(rank(correlation(sum(((high_0+low_0)/2),20),sum(mean(amount_0,60),20),9)) <rank(correlation(low_0,volume_0,6)))*-1'

Alpha_100	'-1*(((1.5*scale(rank(((((close_0-low_0)-(high_0-close_0))/(high_0-low_0))*volume_0))/group_mean(industry_sw_level2_0,rank(((((close_0-low_0)-(high_0-close_0))/(high_0-low_0))*volume_0)))))-scale((correlation(close_0,rank(mean(amount_0,20)),5)-rank(ts_argmin(close_0,30)))/group_mean(industry_sw_level2_0,(correlation(close_0,rank(mean(amount_0,20)),5)-rank(ts_argmin(close_0,30))))))*(volume_0/mean(amount_0,20)))'

Alpha_101	'(close_0-open_0)/((high_0-low_0)+0.001)' 


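These are the 101 alphas published by WorldQuant together with their expressions. If you are interested, you can experiment with them using the code from the Single-Factor Testing section; the only change needed is to swap in the factor expression you want to test. We hope you can develop consistently profitable strategies and discover new alphas.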
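
As a small illustration of swapping factors, the sketch below keeps a couple of expressions from the list in a dictionary and builds the single-element list that conf.features expects. The dictionary, the helper names and the sample prices are made up for this example; the plain-Python alpha_101 function simply evaluates the Alpha_101 formula for one bar so the numbers can be checked by hand.

```python
# Expressions copied from the list above; the dict itself is only an illustration.
ALPHAS = {
    "Alpha_13": "sign(delta(volume_0,1))*(-1*delta(close_0,1))",
    "Alpha_101": "(close_0-open_0)/((high_0-low_0)+0.001)",
}

def features_for(name):
    """Return a single-element feature list, the shape conf.features uses."""
    return [ALPHAS[name]]

def alpha_101(open_0, high_0, low_0, close_0):
    """Plain-Python evaluation of the Alpha_101 expression for one bar."""
    return (close_0 - open_0) / ((high_0 - low_0) + 0.001)

features = features_for("Alpha_101")
# Hypothetical bar: opened at 10.0, ranged 9.9 to 10.5, closed at 10.3
value = alpha_101(open_0=10.0, high_0=10.5, low_0=9.9, close_0=10.3)
```

In the single-factor test, assigning `features` to conf.features is the only line that changes between runs.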

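Note: some of these factors are boolean-like, taking only the values 1 and -1. Passing such a factor on its own to StockRanker may raise an error and cause model training to fail.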

Summary: with the methods above, you can quickly construct factors and label data with expressions on the strategy research platform.


   Published by BigQuant Quant Academy (宽客学院). Copyright belongs to BigQuant; please credit the source when reposting.



(Apollo) #2

Looking forward to more great work.


(会飞的鱼) #3
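  1. New > Visual Strategy - AI Stock Selection Strategy (新建 > 可视化策略-AI选股策略)
  2. Select the Input Feature List module (输入特征列表)
  3. Open the code editor window and enter the features
  4. Run the strategy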

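Notes:

  • Putting all 282 factors into the feature list for testing is not recommended
  • Some factors may fail to train under the StockRanker algorithm; you can switch to another machine learning algorithm
  • For how to use BigStudio, see the link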

(iQuant) #4

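Tip:
Some of the WorldQuant 101 alphas take very few distinct values; a factor built with a where expression, for example, has only two. If such a factor is passed to an AI algorithm on its own, training raises an error.

[Screenshot: training error]

[Screenshot: factor extraction result]

This is only one of the situations that make model training fail. Others include: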

  • No data kept


  • We cannot build a tree with gain = negative infinity


  • No features available for trading



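You may ask what to do about this. Can these factors not go into a model at all? They can, as long as the factor is not used alone: with multiple factors, model training works without problems.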
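
One way to spot such factors before training is to count the distinct values each one takes. The sketch below is plain Python on made-up columns; the helper names and the threshold of three distinct values are assumptions for illustration, not a rule from StockRanker.

```python
import math

def distinct_values(column):
    """Count the distinct non-NaN values in a factor column."""
    return len({v for v in column
                if not (isinstance(v, float) and math.isnan(v))})

def low_cardinality_factors(factor_columns, min_distinct=3):
    """Names of factors with fewer than `min_distinct` distinct values."""
    return [name for name, column in factor_columns.items()
            if distinct_values(column) < min_distinct]

# Hypothetical extracted factor values
factors = {
    "where_factor": [1, -1, -1, 1, float("nan"), 1],
    "alpha_101": [0.12, -0.40, 0.33, 0.05, -0.08, 0.27],
}
flagged = low_cardinality_factors(factors)
```

Factors that end up in `flagged` are the ones to combine with other factors rather than test alone.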


(11117) #5

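A question for those with experience: given the same feature factors and preprocessing, which works better, ranking labels with a learning-to-rank model, or binary classification labels with an ordinary classification model? I see the former as the aggressive model and the latter as the conservative one.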


(maxchen) #6

"自定义因子" (Custom Factors) should be changed to "自定义标注" (Custom Labeling).


(Lingking) #7

How can the following be converted into a factor?

VAR1:=EMA(EMA(CLOSE,9),9);
A:=(VAR1-REF(VAR1,1))/REF(VAR1,1)*1000;
A-REF(A,1)>5;


#8

ta_ema(ta_ema(close_0, 9),9)/shift(ta_ema(ta_ema(close_0, 9),9),1)*1000-shift(ta_ema(ta_ema(close_0, 9),9)/shift(ta_ema(ta_ema(close_0, 9),9),1)*1000,1)>5
Give this a try; it is a boolean factor.
You could also use sign(ta_ema(ta_ema(close_0, 9),9)/shift(ta_ema(ta_ema(close_0, 9),9),1)*1000-shift(ta_ema(ta_ema(close_0, 9),9)/shift(ta_ema(ta_ema(close_0, 9),9),1)*1000,1)-5) to turn it into a numeric factor. Both are worth trying.
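
The expression above uses VAR1/shift(VAR1,1)*1000 where the original formula has (VAR1-REF(VAR1,1))/REF(VAR1,1)*1000. The two differ by a constant 1000, which cancels in the day-over-day difference. The plain-Python sketch below checks this on made-up prices. The `ema` helper uses the recursive form with smoothing 2/(n+1) seeded with the first value, which is an assumption and may differ from ta_ema during the warm-up period.

```python
def ema(series, n):
    """Recursive EMA with smoothing 2 / (n + 1), seeded with the first value."""
    alpha = 2 / (n + 1)
    out = [series[0]]
    for x in series[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out

def shift(series, d):
    """Value d days ago; None where history is too short."""
    return [series[i - d] if i >= d else None for i in range(len(series))]

def first_difference(series):
    """series minus its own previous value, skipping missing entries."""
    return [None if (x is None or y is None) else x - y
            for x, y in zip(series, shift(series, 1))]

def accel_original(close):
    """A := (VAR1 - REF(VAR1,1)) / REF(VAR1,1) * 1000;  returns A - REF(A,1)."""
    var1 = ema(ema(close, 9), 9)
    a = [None if p is None else (v - p) / p * 1000
         for v, p in zip(var1, shift(var1, 1))]
    return first_difference(a)

def accel_reply(close):
    """Form used in the reply: VAR1 / shift(VAR1,1) * 1000, then differenced."""
    var1 = ema(ema(close, 9), 9)
    b = [None if p is None else v / p * 1000
         for v, p in zip(var1, shift(var1, 1))]
    return first_difference(b)

# Hypothetical prices: flat at 10, then a jump to 20
close = [10.0] * 5 + [20.0] * 5
original = accel_original(close)
reply = accel_reply(close)
signal = [None if d is None else d > 5 for d in reply]
```

On this data both forms agree up to floating-point rounding, and the signal turns on only on the day of the jump.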


(Lingking) #10

9),1)*1000,1)>3: invalid function: a_ema

KeyError Traceback (most recent call last)
in ()
58 features=m3.data,
59 date_col='date',
---> 60 instrument_col='instrument'
61 )
62

KeyError: "[ 'a_ema(ta_ema(close_0, 9),9)/shift(ta_ema(ta_ema(close_0, 9),9),1)*1000-shift(ta_ema(ta_ema(close_0, 9),9)/shift(ta_ema(ta_ema(close_0, 9),9),1)*1000,1)>3'] not in index"


(copen) #11

It is ta_ema, not a_ema.
