Building Factors with Custom Functions

Tags: custom functions, derived factors

(iQuant) #1

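Let's first review the ways a factor can currently be built.

Taking the total trading amount over the past 5 days as the example:

  1. Build from a default factor in the factor library

    $$(avg\_amount\_5)*5$$

  2. Build with operators
    $$amount\_0+amount\_1+amount\_2+amount\_3+amount\_4$$

  3. Build with the expression engine
    The expression engine is quite flexible; the following three forms give the same result.
    $$sum(amount\_0,5)$$
    $$amount\_0+shift(amount\_0,1)+shift(amount\_0,2)+shift(amount\_0,3)+shift(amount\_0,4)$$
    $$amount\_0+delay(amount\_0,1)+delay(amount\_0,2)+delay(amount\_0,3)+delay(amount\_0,4)$$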
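The equivalence of these forms can be checked outside the platform. The following is a minimal pandas sketch, not the expression engine itself: the `amount` values are made up, and `rolling` and `shift` stand in for the engine's `sum`, `shift`, and `delay` under the assumption that they behave like their pandas counterparts.

```python
import pandas as pd

# Synthetic daily trading amounts for a single stock (values are made up).
amount = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0], name="amount_0")

# Way 1: 5-day average times 5, the analogue of (avg_amount_5)*5.
via_avg = amount.rolling(5).mean() * 5

# Way 2 and the shift/delay form: today's value plus the previous four days.
via_shift = sum(amount.shift(k) for k in range(5))

# Way 3: rolling 5-day sum, the analogue of sum(amount_0, 5).
via_sum = amount.rolling(5).sum()

print(pd.DataFrame({"avg*5": via_avg, "shift": via_shift, "sum": via_sum}))
```

The first four rows are NaN in all three columns because a full 5-day window is not yet available.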

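However, during strategy development some factors are hard to build, for example a stock's excess return relative to the SSE 50 or the CSI 800. Such factors are difficult to implement with the approaches above, so it is worth introducing how to build factors with custom functions. This post uses a stock's excess return relative to the CSI 800 as the example.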

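Workflow

  1. Drag in the instrument list module (instruments)

    The time span here is about 1 year; to reduce running time, only 10 stocks are entered.

  2. Drag in the input feature list module (input_features) and enter the features

  3. Drag in the base feature extraction module (general_feature_extractor) and connect it.
    The base feature extraction result is as follows:

  4. Drag in another input feature list module and enter the feature name for the stock's excess return relative to the CSI 800

  5. Drag in the derived feature extraction module (derived_feature_extractor) to compute the stock's excess return relative to the CSI 800


    This is the most important part of the post; once you understand it, you can build factors very freely and flexibly.
    Here is a detailed walkthrough: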

bigquant_run = {
    'relative_ret':  relative_ret
}

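Code explanation:
Custom expression functions are defined here as a dictionary, e.g. {'user_rank': user_rank}. Each key is the function name (a string) and each value is a reference to the function. For more documentation, see: Advanced Feature Extraction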
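To make the role of the dictionary concrete, here is a toy sketch of how a name-to-function mapping could be used to resolve an expression such as `user_rank(close_0)`. This is not the platform's implementation: `evaluate` and the body of `user_rank` are hypothetical, and the call convention (the DataFrame first, then the column) is only inferred from the `relative_ret(df, close_0)` signature used in this post.

```python
import pandas as pd

def user_rank(df, column):
    """Hypothetical custom function: percentile rank within each date."""
    return column.groupby(df["date"]).rank(pct=True)

# Same shape as bigquant_run: name used in the expression -> function reference.
user_functions = {"user_rank": user_rank}

def evaluate(expression, df, functions):
    """Toy evaluator for expressions of the form name(column)."""
    name, _, rest = expression.partition("(")
    column_name = rest.rstrip(")").strip()
    return functions[name.strip()](df, df[column_name])

df = pd.DataFrame({
    "date": ["2017-01-03", "2017-01-03", "2017-01-04", "2017-01-04"],
    "instrument": ["A", "B", "A", "B"],
    "close_0": [10.0, 20.0, 12.0, 11.0],
})
df["user_rank(close_0)"] = evaluate("user_rank(close_0)", df, user_functions)
print(df)
```

The point of the design is that the feature name typed into the input feature list only has to match a key in the dictionary; the function behind it can do anything that returns one value per row.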

def relative_ret(df, close_0):
    return df.groupby('instrument', group_keys=False).apply(calcu_relative_ret)

Code explanation:
To compute the feature, first group the data by instrument code, then compute each stock's feature data separately.
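Why the grouping matters can be seen on a small synthetic table. The sketch below uses made-up prices and computes the daily return as `close / previous close - 1` (the same quantity `pct_change` gives on gap-free data); without grouping, the first row of the second stock is compared with the last row of the first stock.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-03", "2017-01-04", "2017-01-03", "2017-01-04"]),
    "instrument": ["600000.SHA", "600000.SHA", "600010.SHA", "600010.SHA"],
    "close_0": [10.0, 11.0, 100.0, 90.0],
})

# Naive: treats the whole column as one series, so row 2 (600010.SHA)
# is divided by row 1 (600000.SHA).
naive = df["close_0"] / df["close_0"].shift(1) - 1

# Per instrument: each stock's first row has no previous close and becomes NaN.
prev_close = df.groupby("instrument")["close_0"].shift(1)
grouped = df["close_0"] / prev_close - 1

print(pd.DataFrame({"naive": naive, "grouped": grouped}))
```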

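def calcu_relative_ret(df):
    # m7 is the base feature extraction module in the generated code below;
    # use the module number from your own canvas.
    dates = m7.data.read_df().date
    start_date = min(dates).strftime('%Y-%m-%d')
    end_date = max(dates).strftime('%Y-%m-%d')
    # Fetch CSI 800 index data (D, pd and datetime are used without imports, as in the generated code)
    hs800_df = D.history_data(
        '000906.SHA',
        start_date=(pd.to_datetime(start_date) - datetime.timedelta(days=10)).strftime('%Y-%m-%d'),  # fetch a few extra days
        end_date=end_date)[['date', 'close']].rename(columns={'close': 'hs800_close'})
    # Merge with this stock's data while keeping the original row index
    df = df[['date', 'close_0']].reset_index().merge(hs800_df, on='date', how='left').set_index('index')
    # Excess return = stock daily return minus index daily return
    return df['close_0'].pct_change() - df['hs800_close'].pct_change()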

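Code explanation:
Using the start and end dates, fetch the CSI 800 index data through the platform's data API, then merge each stock's data (the result of grouping by instrument code) with the CSI 800 data, and finally compute the excess return within a single DataFrame.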
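The merge-then-subtract logic can be tried without the platform. The sketch below is a variation under stated assumptions, not the author's code: `index_df` is a made-up stand-in for the `D.history_data(...)` result, `excess_return` is a hypothetical helper, and the index data is merged once for all stocks rather than fetched inside the per-stock function.

```python
import pandas as pd

dates = pd.to_datetime(["2017-01-03", "2017-01-04", "2017-01-05"])

# Stand-in for the index data: made-up closes, one row per date.
index_df = pd.DataFrame({"date": dates, "hs800_close": [1000.0, 1010.0, 1020.1]})

stocks = pd.DataFrame({
    "date": list(dates) * 2,
    "instrument": ["600000.SHA"] * 3 + ["600010.SHA"] * 3,
    "close_0": [10.0, 10.3, 10.3, 5.0, 5.0, 5.1],
})

def excess_return(stock_df, index_df):
    """Daily stock return minus daily index return, computed per instrument."""
    merged = stock_df.merge(index_df, on="date", how="left")
    merged.index = stock_df.index  # a many-to-one left merge keeps the left row order
    grouped = merged.groupby("instrument")
    stock_ret = merged["close_0"] / grouped["close_0"].shift(1) - 1
    index_ret = merged["hs800_close"] / grouped["hs800_close"].shift(1) - 1
    return stock_ret - index_ret

stocks["relative_ret"] = excess_return(stocks, index_df)
print(stocks)
```

Merging once keeps the index lookup out of the per-stock loop; whether that matters on the platform depends on how expensive the data call is.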

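View the stock excess return feature:

  6. Drag in another derived feature extraction module to extract the remaining features

    View the final feature data; it can be merged directly with the label data to train a model.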
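As an illustration of that last step, the sketch below joins a feature table with a label table on `date` and `instrument`. It is plain pandas rather than a platform module, and the `label` column and all values are hypothetical.

```python
import pandas as pd

dates = pd.to_datetime(["2017-01-03", "2017-01-04", "2017-01-05"])
keys = {
    "date": list(dates) * 2,
    "instrument": ["600000.SHA"] * 3 + ["600010.SHA"] * 3,
}

# Feature table: the first row of each stock has no previous close, hence NaN.
features = pd.DataFrame({**keys, "relative_ret": [None, 0.02, -0.01, None, -0.01, 0.01]})

# Hypothetical label table with one label per (date, instrument).
labels = pd.DataFrame({**keys, "label": [1, 0, 1, 0, 1, 0]})

# Join on the two key columns and drop rows where the feature is missing.
train = features.merge(labels, on=["date", "instrument"], how="inner")
train = train.dropna(subset=["relative_ret"])
print(train)
```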

Feel free to clone the strategy and study it:

    In [1]:
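    # This code was auto-generated by the visual strategy environment on 2018-07-21 09:07
    # This code cell can only be edited in visual mode. You can also copy the code, paste it into a new code cell or strategy, and then modify it.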
    
    
    m1 = M.instruments.v2(
        start_date='2017-01-01',
        end_date='2018-01-01',
        market='CN_STOCK_A',
        instrument_list="""600000.SHA
    600010.SHA
    600015.SHA
    600016.SHA
    600018.SHA
    600028.SHA
    600029.SHA
    600030.SHA
    600036.SHA
    600048.SHA
    
     """,
        max_count=0
    )
    
    m2 = M.input_features.v1(
        features="""return_5
    sum(mf_net_amount_l_0, 5)
    close_0
    market_cap_float_0
    close_0/close_5-1
    
     
    """
    )
    
    m7 = M.general_feature_extractor.v7(
        instruments=m1.data,
        features=m2.data,
        start_date='',
        end_date='',
        m_cached=False
    )
    
    m4 = M.input_features.v1(
        features='relative_ret(close_0)'
    )
    
    def calcu_relative_ret(df):
        # First fetch the CSI 800 index data
        start_date = min(m7.data.read_df().date).strftime('%Y-%m-%d')
        end_date = max(m7.data.read_df().date).strftime('%Y-%m-%d')
        hs800_df = D.history_data(
            '000906.SHA',
            start_date=(pd.to_datetime(start_date) - datetime.timedelta(days=10)).strftime('%Y-%m-%d'),  # fetch a few extra days
            end_date=end_date)[['date', 'close']].rename(columns={'close': 'hs800_close'})

        # Merge with the stock data
        df = df[['date', 'close_0']].reset_index().merge(hs800_df, on='date', how='left').set_index('index')

        # Return the excess return
        return df['close_0'].pct_change() - df['hs800_close'].pct_change()


    # Group by instrument code to compute each stock's excess return
    def relative_ret(df, close_0):
        return df.groupby('instrument', group_keys=False).apply(calcu_relative_ret)
    
    
    m8_user_functions_bigquant_run = {
        'relative_ret':  relative_ret
    }
    
    
    m8 = M.derived_feature_extractor.v3(
        input_data=m7.data,
        features=m4.data,
        date_col='date',
        instrument_col='instrument',
        user_functions=m8_user_functions_bigquant_run
    )
    
    m9 = M.derived_feature_extractor.v3(
        input_data=m8.data,
        features=m2.data,
        date_col='date',
        instrument_col='instrument',
        user_functions={}
    )
    
    [2018-07-21 09:07:06.514970] INFO: bigquant: instruments.v2 started running..
    [2018-07-21 09:07:06.528198] INFO: bigquant: cache hit
    [2018-07-21 09:07:06.529552] INFO: bigquant: instruments.v2 finished running [0.014616s].
    [2018-07-21 09:07:06.535163] INFO: bigquant: input_features.v1 started running..
    [2018-07-21 09:07:06.586750] INFO: bigquant: cache hit
    [2018-07-21 09:07:06.588032] INFO: bigquant: input_features.v1 finished running [0.052862s].
    [2018-07-21 09:07:06.597259] INFO: bigquant: general_feature_extractor.v7 started running..
    [2018-07-21 09:07:08.381895] INFO: base feature extraction: year 2017, feature rows=2434
    [2018-07-21 09:07:09.516820] INFO: base feature extraction: year 2018, feature rows=0
    [2018-07-21 09:07:09.525082] INFO: base feature extraction: total rows: 2434
    [2018-07-21 09:07:09.526209] INFO: bigquant: general_feature_extractor.v7 finished running [2.92896s].
    [2018-07-21 09:07:09.529049] INFO: bigquant: input_features.v1 started running..
    [2018-07-21 09:07:09.533583] INFO: bigquant: cache hit
    [2018-07-21 09:07:09.534437] INFO: bigquant: input_features.v1 finished running [0.005389s].
    [2018-07-21 09:07:09.733215] INFO: bigquant: derived_feature_extractor.v3 started running..
    [2018-07-21 09:07:10.531858] INFO: derived_feature_extractor: extraction finished relative_ret(close_0), 0.775s
    [2018-07-21 09:07:10.552099] INFO: derived_feature_extractor: /y_2017, 2434
    [2018-07-21 09:07:10.574013] INFO: bigquant: derived_feature_extractor.v3 finished running [0.8408s].
    [2018-07-21 09:07:10.577004] INFO: bigquant: derived_feature_extractor.v3 started running..
    [2018-07-21 09:07:10.602979] INFO: derived_feature_extractor: extraction finished close_0/close_5-1, 0.002s
    [2018-07-21 09:07:10.614650] INFO: derived_feature_extractor: extraction finished sum(mf_net_amount_l_0, 5), 0.011s
    [2018-07-21 09:07:10.634869] INFO: derived_feature_extractor: /y_2017, 2434
    [2018-07-21 09:07:10.655059] INFO: bigquant: derived_feature_extractor.v3 finished running [0.07805s].
    
    In [3]:
    m9.data.read_df().tail()
    
    Out[3]:
             close_0     close_5        date  instrument  market_cap_float_0  mf_net_amount_l_0  return_5  relative_ret(close_0)  close_0/close_5-1  sum(mf_net_amount_l_0, 5)
    2429   16.315989   16.049822  2017-12-29  600028.SHA        5.857692e+11         54136712.0  1.026801              -0.012250           0.016584                 -5988248.0
    2430   19.869745   19.486353  2017-12-29  600029.SHA        8.370998e+10         15215392.0  1.015332              -0.003320           0.019675                -55193096.0
    2431   95.301025   95.774895  2017-12-29  600030.SHA        1.776454e+11         14522800.0  0.995052              -0.005263          -0.004948                -59748208.0
    2432  134.327225  135.576996  2017-12-29  600036.SHA        5.986520e+11        -12134416.0  0.979082               0.009463          -0.009218                238420608.0
    2433  270.211182  249.778244  2017-12-29  600048.SHA        1.660692e+11         -3875136.0  1.085956               0.027934           0.081804                 14542224.0

    Index Enhancement Strategy Based on an AI Ranking Algorithm
    (iQuant) #4

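    If you want the CSI 800 5-day return factor instead of the excess return factor, a small modification is enough:

    def calcu_relative_ret(df):
        # First fetch the CSI 800 index data.
        # m7 is the base feature extraction module in the listing above;
        # use the module number from your own canvas.
        start_date = min(m7.data.read_df().date).strftime('%Y-%m-%d')
        end_date = max(m7.data.read_df().date).strftime('%Y-%m-%d')
        hs800_df = D.history_data(
            '000906.SHA',
            start_date=(pd.to_datetime(start_date) - datetime.timedelta(days=10)).strftime('%Y-%m-%d'),  # fetch a few extra days
            end_date=end_date)[['date', 'close']].rename(columns={'close': 'hs800_close'})

        # Merge with the stock data
        df = df[['date', 'close_0']].reset_index().merge(hs800_df, on='date', how='left').set_index('index')

        # Return the CSI 800 5-day return factor
        return df['hs800_close'] / df['hs800_close'].shift(5) - 1


    A factor built this way has the same value for every stock on a given date, and it can also be included in model training.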
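The "same value for every stock on a given date" property can be checked on synthetic data. The sketch below is a variation, assuming made-up index closes and two hypothetical tickers `A` and `B`: it computes the 5-day index return on the index series first and then merges it onto the stock rows, so the value does not depend on any individual stock's rows.

```python
import pandas as pd

dates = pd.date_range("2017-01-02", periods=7, freq="B")

# Made-up index closes; the 5-day return is computed on the index itself.
index_df = pd.DataFrame({
    "date": dates,
    "hs800_close": [100.0, 101.0, 102.0, 103.0, 104.0, 110.0, 90.9],
})
index_df["hs800_ret_5"] = index_df["hs800_close"] / index_df["hs800_close"].shift(5) - 1

stocks = pd.DataFrame({
    "date": list(dates) * 2,
    "instrument": ["A"] * 7 + ["B"] * 7,
    "close_0": [10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6,
                5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6],
})

# Attach the index factor to every stock row by date.
merged = stocks.merge(index_df[["date", "hs800_ret_5"]], on="date", how="left")

# Count distinct factor values per date (NaN counted as one value).
distinct = merged.groupby("date")["hs800_ret_5"].nunique(dropna=False)
print(distinct)
```

Because the factor carries no cross-sectional information on its own, it acts as a market-state input: it varies over time but cannot by itself rank stocks within a day.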


    (zichuan) #5

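    @iQuant Hello, does the instrument list module not include Hong Kong stocks? If so, how should I build factors with custom functions for Hong Kong stocks? Thanks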