[量化学堂-Python编程]Pandas查看和选择数据

pandas
标签: #<Tag:0x00007f5bff532d68>

(iQuant) #1

导语:本节主要讲解Pandas库中 DataFrame 的数据查看与选择


Pandas 是基于 Numpy 构建的,让以 Numpy 为中心的应用变得更加简单。平台获取的数据主要是以 Pandas 中DataFrame 的形式。除此之外,Pandas 还包括 一维数组Series 以及三维的Panel。

下面将进行详细介绍:

Series:一维数组,与Numpy中的一维array类似。二者与Python基本的数据结构List也很相近,其区别是:List中的元素可以是不同的数据类型,而Array和Series中则只允许存储相同的数据类型,这样可以更有效的使用内存,提高运算效率。

DataFrame:二维的表格型数据结构。很多功能与R中的data.frame类似。可以将DataFrame理解为Series的容器。以下的内容主要以DataFrame为主。

克隆策略
In [1]:
# 首先导入库
import pandas as pd

1、Series

由一组数据(各种Numpy数据类型),以及一组与之相关的标签数据(即索引)组成。仅由一组数据即可产生最简单的Series,可以通过传递一个list对象来创建一个Series,Pandas会默认创建整型索引。

创建一个Series:

In [2]:
s = pd.Series([1,3,5,np.nan,6,8])
s
Out[2]:
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

获取 Series 的索引:

In [3]:
s.index
Out[3]:
RangeIndex(start=0, stop=6, step=1)

2、DataFrame

DataFrame是一个表格型的数据结构,它含有一组有序的列,每一列的数据结构都是相同的,而不同的列之间则可以是不同的数据结构(数值、字符、布尔值等)。或者以数据库进行类比,DataFrame中的每一行是一个记录,名称为Index的一个元素,而每一列则为一个字段,是这个记录的一个属性。DataFrame既有行索引也有列索引,可以被看做由Series组成的字典(共用同一个索引)。

2.1 创建一个DataFrame,包括一个numpy array, 时间索引和列名字:

In [4]:
dates = pd.date_range('20130101',periods=6)
dates
Out[4]:
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')
In [5]:
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
df
Out[5]:
A B C D
2013-01-01 -1.971026 -1.149504 -0.886253 -1.423737
2013-01-02 1.248603 0.158524 0.047291 0.834145
2013-01-03 0.572176 -0.093579 0.892276 0.376732
2013-01-04 0.343039 -0.820224 0.220198 -0.026627
2013-01-05 0.835942 -0.209759 -0.740630 0.234057
2013-01-06 -1.240034 1.753316 0.209350 0.027492

2.2 查看数据

我们以平台获取的数据为例进行讲解:

In [6]:
# 获取沪深300的数据
df = D.history_data(instruments=['000300.SHA'], start_date='2017-01-01', end_date='2017-01-20',
               fields=['open', 'high', 'low', 'close', 'volume', 'amount'])
df
Out[6]:
date instrument open high low close volume amount
0 2017-01-03 000300.SHA 3313.953369 3345.262939 3313.953369 3342.227295 8656789600 8.761277e+10
1 2017-01-04 000300.SHA 3340.557617 3369.565918 3338.152588 3368.311768 9005082700 9.801436e+10
2 2017-01-05 000300.SHA 3368.340088 3373.411865 3361.461914 3367.789307 8833635700 9.118934e+10
3 2017-01-06 000300.SHA 3366.852295 3368.870605 3346.891357 3347.666504 8745911400 9.107226e+10
4 2017-01-09 000300.SHA 3345.748291 3366.221924 3344.347168 3363.901367 8370794200 8.791612e+10
5 2017-01-10 000300.SHA 3361.639404 3370.505127 3354.087891 3358.271484 8381916000 8.253397e+10
6 2017-01-11 000300.SHA 3355.800781 3366.631592 3334.297852 3334.495361 9107143800 8.675045e+10
7 2017-01-12 000300.SHA 3332.685791 3343.127930 3315.962891 3317.624023 8123913500 7.503189e+10
8 2017-01-13 000300.SHA 3317.962402 3338.174072 3309.374023 3319.912109 8418058900 8.455610e+10
9 2017-01-16 000300.SHA 3314.150635 3322.781982 3264.207031 3319.445557 15777913500 1.532625e+11
10 2017-01-17 000300.SHA 3305.600098 3327.376709 3292.204346 3326.356201 6945332800 7.475451e+10
11 2017-01-18 000300.SHA 3323.338867 3350.961914 3320.406250 3339.365234 7062322700 7.568068e+10
12 2017-01-19 000300.SHA 3330.952148 3346.495850 3324.683594 3329.289062 6615980800 6.386971e+10
13 2017-01-20 000300.SHA 3326.977295 3358.412109 3326.374023 3354.889160 6530046200 7.083133e+10

查看前几条数据:

In [7]:
df.head() # 默认是查询5条
Out[7]:
date instrument open high low close volume amount
0 2017-01-03 000300.SHA 3313.953369 3345.262939 3313.953369 3342.227295 8656789600 8.761277e+10
1 2017-01-04 000300.SHA 3340.557617 3369.565918 3338.152588 3368.311768 9005082700 9.801436e+10
2 2017-01-05 000300.SHA 3368.340088 3373.411865 3361.461914 3367.789307 8833635700 9.118934e+10
3 2017-01-06 000300.SHA 3366.852295 3368.870605 3346.891357 3347.666504 8745911400 9.107226e+10
4 2017-01-09 000300.SHA 3345.748291 3366.221924 3344.347168 3363.901367 8370794200 8.791612e+10

查看后几条数据:

In [8]:
df.tail()
Out[8]:
date instrument open high low close volume amount
9 2017-01-16 000300.SHA 3314.150635 3322.781982 3264.207031 3319.445557 15777913500 1.532625e+11
10 2017-01-17 000300.SHA 3305.600098 3327.376709 3292.204346 3326.356201 6945332800 7.475451e+10
11 2017-01-18 000300.SHA 3323.338867 3350.961914 3320.406250 3339.365234 7062322700 7.568068e+10
12 2017-01-19 000300.SHA 3330.952148 3346.495850 3324.683594 3329.289062 6615980800 6.386971e+10
13 2017-01-20 000300.SHA 3326.977295 3358.412109 3326.374023 3354.889160 6530046200 7.083133e+10

查看 DataFrame 的索引

In [9]:
df.index
Out[9]:
RangeIndex(start=0, stop=14, step=1)

查看 DataFrame 的列名

In [10]:
df.columns
Out[10]:
Index(['date', 'instrument', 'open', 'high', 'low', 'close', 'volume',
       'amount'],
      dtype='object')

查看 DataFrame 的值

In [11]:
df.values
Out[11]:
array([[Timestamp('2017-01-03 00:00:00'), '000300.SHA', 3313.953369140625,
        3345.262939453125, 3313.953369140625, 3342.227294921875,
        8656789600, 87612768256.0],
       [Timestamp('2017-01-04 00:00:00'), '000300.SHA', 3340.5576171875,
        3369.56591796875, 3338.152587890625, 3368.311767578125, 9005082700,
        98014355456.0],
       [Timestamp('2017-01-05 00:00:00'), '000300.SHA', 3368.340087890625,
        3373.411865234375, 3361.4619140625, 3367.789306640625, 8833635700,
        91189338112.0],
       [Timestamp('2017-01-06 00:00:00'), '000300.SHA', 3366.852294921875,
        3368.87060546875, 3346.891357421875, 3347.66650390625, 8745911400,
        91072258048.0],
       [Timestamp('2017-01-09 00:00:00'), '000300.SHA', 3345.748291015625,
        3366.221923828125, 3344.34716796875, 3363.9013671875, 8370794200,
        87916118016.0],
       [Timestamp('2017-01-10 00:00:00'), '000300.SHA', 3361.639404296875,
        3370.505126953125, 3354.087890625, 3358.271484375, 8381916000,
        82533965824.0],
       [Timestamp('2017-01-11 00:00:00'), '000300.SHA', 3355.80078125,
        3366.631591796875, 3334.2978515625, 3334.495361328125, 9107143800,
        86750453760.0],
       [Timestamp('2017-01-12 00:00:00'), '000300.SHA', 3332.685791015625,
        3343.1279296875, 3315.962890625, 3317.6240234375, 8123913500,
        75031887872.0],
       [Timestamp('2017-01-13 00:00:00'), '000300.SHA', 3317.96240234375,
        3338.174072265625, 3309.3740234375, 3319.912109375, 8418058900,
        84556103680.0],
       [Timestamp('2017-01-16 00:00:00'), '000300.SHA', 3314.150634765625,
        3322.781982421875, 3264.20703125, 3319.445556640625, 15777913500,
        153262522368.0],
       [Timestamp('2017-01-17 00:00:00'), '000300.SHA', 3305.60009765625,
        3327.376708984375, 3292.204345703125, 3326.356201171875,
        6945332800, 74754506752.0],
       [Timestamp('2017-01-18 00:00:00'), '000300.SHA', 3323.3388671875,
        3350.9619140625, 3320.40625, 3339.365234375, 7062322700,
        75680677888.0],
       [Timestamp('2017-01-19 00:00:00'), '000300.SHA', 3330.9521484375,
        3346.495849609375, 3324.68359375, 3329.2890625, 6615980800,
        63869710336.0],
       [Timestamp('2017-01-20 00:00:00'), '000300.SHA', 3326.977294921875,
        3358.412109375, 3326.3740234375, 3354.88916015625, 6530046200,
        70831325184.0]], dtype=object)

使用 describe() 函数对于数据的快速统计汇总:

In [12]:
df.describe()
Out[12]:
open high low close volume amount
count 14.000000 14.000000 14.000000 14.000000 1.400000e+01 1.400000e+01
mean 3336.040283 3353.414062 3324.742920 3342.110596 8.612489e+09 8.736257e+10
std 20.855013 16.710600 25.615406 18.314083 2.248143e+09 2.110282e+10
min 3305.600098 3322.781982 3264.207031 3317.624023 6.530046e+09 6.386971e+10
25% 3319.306519 3343.661682 3314.455750 3327.089417 7.327720e+09 7.519409e+10
50% 3331.818970 3354.687012 3325.528809 3340.796265 8.399987e+09 8.565328e+10
75% 3353.287659 3368.310852 3342.798523 3357.425903 8.811705e+09 9.028322e+10
max 3368.340088 3373.411865 3361.461914 3368.311768 1.577791e+10 1.532625e+11

对数据的转置:

In [13]:
df.T
Out[13]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13
date 2017-01-03 00:00:00 2017-01-04 00:00:00 2017-01-05 00:00:00 2017-01-06 00:00:00 2017-01-09 00:00:00 2017-01-10 00:00:00 2017-01-11 00:00:00 2017-01-12 00:00:00 2017-01-13 00:00:00 2017-01-16 00:00:00 2017-01-17 00:00:00 2017-01-18 00:00:00 2017-01-19 00:00:00 2017-01-20 00:00:00
instrument 000300.SHA 000300.SHA 000300.SHA 000300.SHA 000300.SHA 000300.SHA 000300.SHA 000300.SHA 000300.SHA 000300.SHA 000300.SHA 000300.SHA 000300.SHA 000300.SHA
open 3313.95 3340.56 3368.34 3366.85 3345.75 3361.64 3355.8 3332.69 3317.96 3314.15 3305.6 3323.34 3330.95 3326.98
high 3345.26 3369.57 3373.41 3368.87 3366.22 3370.51 3366.63 3343.13 3338.17 3322.78 3327.38 3350.96 3346.5 3358.41
low 3313.95 3338.15 3361.46 3346.89 3344.35 3354.09 3334.3 3315.96 3309.37 3264.21 3292.2 3320.41 3324.68 3326.37
close 3342.23 3368.31 3367.79 3347.67 3363.9 3358.27 3334.5 3317.62 3319.91 3319.45 3326.36 3339.37 3329.29 3354.89
volume 8656789600 9005082700 8833635700 8745911400 8370794200 8381916000 9107143800 8123913500 8418058900 15777913500 6945332800 7062322700 6615980800 6530046200
amount 8.76128e+10 9.80144e+10 9.11893e+10 9.10723e+10 8.79161e+10 8.2534e+10 8.67505e+10 7.50319e+10 8.45561e+10 1.53263e+11 7.47545e+10 7.56807e+10 6.38697e+10 7.08313e+10

按列对 DataFrame 进行排序

In [14]:
df.sort(columns='open')
Out[14]:
date instrument open high low close volume amount
10 2017-01-17 000300.SHA 3305.600098 3327.376709 3292.204346 3326.356201 6945332800 7.475451e+10
0 2017-01-03 000300.SHA 3313.953369 3345.262939 3313.953369 3342.227295 8656789600 8.761277e+10
9 2017-01-16 000300.SHA 3314.150635 3322.781982 3264.207031 3319.445557 15777913500 1.532625e+11
8 2017-01-13 000300.SHA 3317.962402 3338.174072 3309.374023 3319.912109 8418058900 8.455610e+10
11 2017-01-18 000300.SHA 3323.338867 3350.961914 3320.406250 3339.365234 7062322700 7.568068e+10
13 2017-01-20 000300.SHA 3326.977295 3358.412109 3326.374023 3354.889160 6530046200 7.083133e+10
12 2017-01-19 000300.SHA 3330.952148 3346.495850 3324.683594 3329.289062 6615980800 6.386971e+10
7 2017-01-12 000300.SHA 3332.685791 3343.127930 3315.962891 3317.624023 8123913500 7.503189e+10
1 2017-01-04 000300.SHA 3340.557617 3369.565918 3338.152588 3368.311768 9005082700 9.801436e+10
4 2017-01-09 000300.SHA 3345.748291 3366.221924 3344.347168 3363.901367 8370794200 8.791612e+10
6 2017-01-11 000300.SHA 3355.800781 3366.631592 3334.297852 3334.495361 9107143800 8.675045e+10
5 2017-01-10 000300.SHA 3361.639404 3370.505127 3354.087891 3358.271484 8381916000 8.253397e+10
3 2017-01-06 000300.SHA 3366.852295 3368.870605 3346.891357 3347.666504 8745911400 9.107226e+10
2 2017-01-05 000300.SHA 3368.340088 3373.411865 3361.461914 3367.789307 8833635700 9.118934e+10
In [15]:
df.sort_values(by='open')
Out[15]:
date instrument open high low close volume amount
10 2017-01-17 000300.SHA 3305.600098 3327.376709 3292.204346 3326.356201 6945332800 7.475451e+10
0 2017-01-03 000300.SHA 3313.953369 3345.262939 3313.953369 3342.227295 8656789600 8.761277e+10
9 2017-01-16 000300.SHA 3314.150635 3322.781982 3264.207031 3319.445557 15777913500 1.532625e+11
8 2017-01-13 000300.SHA 3317.962402 3338.174072 3309.374023 3319.912109 8418058900 8.455610e+10
11 2017-01-18 000300.SHA 3323.338867 3350.961914 3320.406250 3339.365234 7062322700 7.568068e+10
13 2017-01-20 000300.SHA 3326.977295 3358.412109 3326.374023 3354.889160 6530046200 7.083133e+10
12 2017-01-19 000300.SHA 3330.952148 3346.495850 3324.683594 3329.289062 6615980800 6.386971e+10
7 2017-01-12 000300.SHA 3332.685791 3343.127930 3315.962891 3317.624023 8123913500 7.503189e+10
1 2017-01-04 000300.SHA 3340.557617 3369.565918 3338.152588 3368.311768 9005082700 9.801436e+10
4 2017-01-09 000300.SHA 3345.748291 3366.221924 3344.347168 3363.901367 8370794200 8.791612e+10
6 2017-01-11 000300.SHA 3355.800781 3366.631592 3334.297852 3334.495361 9107143800 8.675045e+10
5 2017-01-10 000300.SHA 3361.639404 3370.505127 3354.087891 3358.271484 8381916000 8.253397e+10
3 2017-01-06 000300.SHA 3366.852295 3368.870605 3346.891357 3347.666504 8745911400 9.107226e+10
2 2017-01-05 000300.SHA 3368.340088 3373.411865 3361.461914 3367.789307 8833635700 9.118934e+10

2.3 选择数据

2.3.1 通过下标选取数据:

df['open'],df.open

以上两个语句是等效的,都是返回 df 名称为 open 列的数据,返回的为一个 Series。

df[0:3], df['2016-07-05':'2016-07-08']

下标索引选取的是 DataFrame 的记录,与 List 相同 DataFrame 的下标也是从0开始,区间索引的话,为一个左闭右开的区间,即[0:3]选取的为0-2三条记录。

与此等价,还可以用起始的索引名称和结束索引名称选取数据,如:df['a':'b']。有一点需要注意的是使用起始索引名称和结束索引名称时,也会包含结束索引的数据。具体看 下方示例:

以上两种方式返回的都是DataFrame。

选择一列数据:

In [16]:
df['open'] # 返回Series
Out[16]:
0     3313.953369
1     3340.557617
2     3368.340088
3     3366.852295
4     3345.748291
5     3361.639404
6     3355.800781
7     3332.685791
8     3317.962402
9     3314.150635
10    3305.600098
11    3323.338867
12    3330.952148
13    3326.977295
Name: open, dtype: float32
In [17]:
df[['open']] # 返回DataFrame
Out[17]:
open
0 3313.953369
1 3340.557617
2 3368.340088
3 3366.852295
4 3345.748291
5 3361.639404
6 3355.800781
7 3332.685791
8 3317.962402
9 3314.150635
10 3305.600098
11 3323.338867
12 3330.952148
13 3326.977295

选择多列

In [18]:
df[['open', 'high']]
Out[18]:
open high
0 3313.953369 3345.262939
1 3340.557617 3369.565918
2 3368.340088 3373.411865
3 3366.852295 3368.870605
4 3345.748291 3366.221924
5 3361.639404 3370.505127
6 3355.800781 3366.631592
7 3332.685791 3343.127930
8 3317.962402 3338.174072
9 3314.150635 3322.781982
10 3305.600098 3327.376709
11 3323.338867 3350.961914
12 3330.952148 3346.495850
13 3326.977295 3358.412109

选择多行

In [19]:
df[0:3]
Out[19]:
date instrument open high low close volume amount
0 2017-01-03 000300.SHA 3313.953369 3345.262939 3313.953369 3342.227295 8656789600 8.761277e+10
1 2017-01-04 000300.SHA 3340.557617 3369.565918 3338.152588 3368.311768 9005082700 9.801436e+10
2 2017-01-05 000300.SHA 3368.340088 3373.411865 3361.461914 3367.789307 8833635700 9.118934e+10

2.3.2 使用标签选取数据:

df.loc[行标签,列标签]

df.loc['a':'b'] #选取 ab 两行数据

df.loc[:,'open'] #选取 open 列的数据

df.loc 的第一个参数是行标签,第二个参数为列标签(可选参数,默认为所有列标签),两个参数既可以是列表也可以是单个字符,如果两个参数都为列表则返回的是 DataFrame,否则,则为 Series。

PS:loc为location的缩写。

In [20]:
df.loc[3,'open']
Out[20]:
3366.8523
In [21]:
df.loc[1:3]
Out[21]:
date instrument open high low close volume amount
1 2017-01-04 000300.SHA 3340.557617 3369.565918 3338.152588 3368.311768 9005082700 9.801436e+10
2 2017-01-05 000300.SHA 3368.340088 3373.411865 3361.461914 3367.789307 8833635700 9.118934e+10
3 2017-01-06 000300.SHA 3366.852295 3368.870605 3346.891357 3347.666504 8745911400 9.107226e+10
In [22]:
df.loc[:, 'open']
Out[22]:
0     3313.953369
1     3340.557617
2     3368.340088
3     3366.852295
4     3345.748291
5     3361.639404
6     3355.800781
7     3332.685791
8     3317.962402
9     3314.150635
10    3305.600098
11    3323.338867
12    3330.952148
13    3326.977295
Name: open, dtype: float32

2.3.3. 使用位置选取数据:

df.iloc[行位置,列位置]

df.iloc[1,1] #选取第二行,第二列的值,返回的为单个值

df.iloc[[0,2],:] #选取第一行及第三行的数据

df.iloc[0:2,:] #选取第一行到第三行(不包含)的数据

df.iloc[:,1] #选取所有记录的第二列的值,返回的为一个Series

df.iloc[1,:] #选取第一行数据,返回的为一个Series

PS:iloc 则为 integer & location 的缩写

In [23]:
df
Out[23]:
date instrument open high low close volume amount
0 2017-01-03 000300.SHA 3313.953369 3345.262939 3313.953369 3342.227295 8656789600 8.761277e+10
1 2017-01-04 000300.SHA 3340.557617 3369.565918 3338.152588 3368.311768 9005082700 9.801436e+10
2 2017-01-05 000300.SHA 3368.340088 3373.411865 3361.461914 3367.789307 8833635700 9.118934e+10
3 2017-01-06 000300.SHA 3366.852295 3368.870605 3346.891357 3347.666504 8745911400 9.107226e+10
4 2017-01-09 000300.SHA 3345.748291 3366.221924 3344.347168 3363.901367 8370794200 8.791612e+10
5 2017-01-10 000300.SHA 3361.639404 3370.505127 3354.087891 3358.271484 8381916000 8.253397e+10
6 2017-01-11 000300.SHA 3355.800781 3366.631592 3334.297852 3334.495361 9107143800 8.675045e+10
7 2017-01-12 000300.SHA 3332.685791 3343.127930 3315.962891 3317.624023 8123913500 7.503189e+10
8 2017-01-13 000300.SHA 3317.962402 3338.174072 3309.374023 3319.912109 8418058900 8.455610e+10
9 2017-01-16 000300.SHA 3314.150635 3322.781982 3264.207031 3319.445557 15777913500 1.532625e+11
10 2017-01-17 000300.SHA 3305.600098 3327.376709 3292.204346 3326.356201 6945332800 7.475451e+10
11 2017-01-18 000300.SHA 3323.338867 3350.961914 3320.406250 3339.365234 7062322700 7.568068e+10
12 2017-01-19 000300.SHA 3330.952148 3346.495850 3324.683594 3329.289062 6615980800 6.386971e+10
13 2017-01-20 000300.SHA 3326.977295 3358.412109 3326.374023 3354.889160 6530046200 7.083133e+10
In [24]:
df.iloc[1,1] # 选取第二行,第二列的值,返回的为单个值
Out[24]:
'000300.SHA'
In [25]:
df.iloc[[0,2],:] # 选取第一行及第三行的数据
Out[25]:
date instrument open high low close volume amount
0 2017-01-03 000300.SHA 3313.953369 3345.262939 3313.953369 3342.227295 8656789600 8.761277e+10
2 2017-01-05 000300.SHA 3368.340088 3373.411865 3361.461914 3367.789307 8833635700 9.118934e+10
In [26]:
df.iloc[0:2,:] # 选取第一行到第三行(不包含)的数据
Out[26]:
date instrument open high low close volume amount
0 2017-01-03 000300.SHA 3313.953369 3345.262939 3313.953369 3342.227295 8656789600 8.761277e+10
1 2017-01-04 000300.SHA 3340.557617 3369.565918 3338.152588 3368.311768 9005082700 9.801436e+10
In [27]:
df.iloc[:,1] # 选取所有记录的第一列的值,返回的为一个Series
Out[27]:
0     000300.SHA
1     000300.SHA
2     000300.SHA
3     000300.SHA
4     000300.SHA
5     000300.SHA
6     000300.SHA
7     000300.SHA
8     000300.SHA
9     000300.SHA
10    000300.SHA
11    000300.SHA
12    000300.SHA
13    000300.SHA
Name: instrument, dtype: object
In [28]:
df.iloc[1,:] # 选取第一行数据,返回的为一个Series
Out[28]:
date          2017-01-04 00:00:00
instrument             000300.SHA
open                      3340.56
high                      3369.57
low                       3338.15
close                     3368.31
volume                 9005082700
amount                9.80144e+10
Name: 1, dtype: object

2.3.4. 更广义的切片方式是使用.ix,它自动根据给到的索引类型判断是使用位置还是标签进行切片

df.ix[1,1]

df.ix['a':'b']

In [29]:
df
Out[29]:
date instrument open high low close volume amount
0 2017-01-03 000300.SHA 3313.953369 3345.262939 3313.953369 3342.227295 8656789600 8.761277e+10
1 2017-01-04 000300.SHA 3340.557617 3369.565918 3338.152588 3368.311768 9005082700 9.801436e+10
2 2017-01-05 000300.SHA 3368.340088 3373.411865 3361.461914 3367.789307 8833635700 9.118934e+10
3 2017-01-06 000300.SHA 3366.852295 3368.870605 3346.891357 3347.666504 8745911400 9.107226e+10
4 2017-01-09 000300.SHA 3345.748291 3366.221924 3344.347168 3363.901367 8370794200 8.791612e+10
5 2017-01-10 000300.SHA 3361.639404 3370.505127 3354.087891 3358.271484 8381916000 8.253397e+10
6 2017-01-11 000300.SHA 3355.800781 3366.631592 3334.297852 3334.495361 9107143800 8.675045e+10
7 2017-01-12 000300.SHA 3332.685791 3343.127930 3315.962891 3317.624023 8123913500 7.503189e+10
8 2017-01-13 000300.SHA 3317.962402 3338.174072 3309.374023 3319.912109 8418058900 8.455610e+10
9 2017-01-16 000300.SHA 3314.150635 3322.781982 3264.207031 3319.445557 15777913500 1.532625e+11
10 2017-01-17 000300.SHA 3305.600098 3327.376709 3292.204346 3326.356201 6945332800 7.475451e+10
11 2017-01-18 000300.SHA 3323.338867 3350.961914 3320.406250 3339.365234 7062322700 7.568068e+10
12 2017-01-19 000300.SHA 3330.952148 3346.495850 3324.683594 3329.289062 6615980800 6.386971e+10
13 2017-01-20 000300.SHA 3326.977295 3358.412109 3326.374023 3354.889160 6530046200 7.083133e+10
In [30]:
df.ix[1,1]
Out[30]:
'000300.SHA'
In [31]:
df.ix[1,'open']
Out[31]:
3340.5576

2.3.5 通过逻辑指针进行数据切片:

df[逻辑条件]

df[df.one >= 2] #单个逻辑条件

df[(df.one >=1 ) & (df.one < 3) ] #多个逻辑条件组合

In [32]:
df
Out[32]:
date instrument open high low close volume amount
0 2017-01-03 000300.SHA 3313.953369 3345.262939 3313.953369 3342.227295 8656789600 8.761277e+10
1 2017-01-04 000300.SHA 3340.557617 3369.565918 3338.152588 3368.311768 9005082700 9.801436e+10
2 2017-01-05 000300.SHA 3368.340088 3373.411865 3361.461914 3367.789307 8833635700 9.118934e+10
3 2017-01-06 000300.SHA 3366.852295 3368.870605 3346.891357 3347.666504 8745911400 9.107226e+10
4 2017-01-09 000300.SHA 3345.748291 3366.221924 3344.347168 3363.901367 8370794200 8.791612e+10
5 2017-01-10 000300.SHA 3361.639404 3370.505127 3354.087891 3358.271484 8381916000 8.253397e+10
6 2017-01-11 000300.SHA 3355.800781 3366.631592 3334.297852 3334.495361 9107143800 8.675045e+10
7 2017-01-12 000300.SHA 3332.685791 3343.127930 3315.962891 3317.624023 8123913500 7.503189e+10
8 2017-01-13 000300.SHA 3317.962402 3338.174072 3309.374023 3319.912109 8418058900 8.455610e+10
9 2017-01-16 000300.SHA 3314.150635 3322.781982 3264.207031 3319.445557 15777913500 1.532625e+11
10 2017-01-17 000300.SHA 3305.600098 3327.376709 3292.204346 3326.356201 6945332800 7.475451e+10
11 2017-01-18 000300.SHA 3323.338867 3350.961914 3320.406250 3339.365234 7062322700 7.568068e+10
12 2017-01-19 000300.SHA 3330.952148 3346.495850 3324.683594 3329.289062 6615980800 6.386971e+10
13 2017-01-20 000300.SHA 3326.977295 3358.412109 3326.374023 3354.889160 6530046200 7.083133e+10
In [33]:
df[df.open > 3340]
Out[33]:
date instrument open high low close volume amount
1 2017-01-04 000300.SHA 3340.557617 3369.565918 3338.152588 3368.311768 9005082700 9.801436e+10
2 2017-01-05 000300.SHA 3368.340088 3373.411865 3361.461914 3367.789307 8833635700 9.118934e+10
3 2017-01-06 000300.SHA 3366.852295 3368.870605 3346.891357 3347.666504 8745911400 9.107226e+10
4 2017-01-09 000300.SHA 3345.748291 3366.221924 3344.347168 3363.901367 8370794200 8.791612e+10
5 2017-01-10 000300.SHA 3361.639404 3370.505127 3354.087891 3358.271484 8381916000 8.253397e+10
6 2017-01-11 000300.SHA 3355.800781 3366.631592 3334.297852 3334.495361 9107143800 8.675045e+10
In [34]:
df[(df.open > 3310) & (df.close < 3350)]
Out[34]:
date instrument open high low close volume amount
0 2017-01-03 000300.SHA 3313.953369 3345.262939 3313.953369 3342.227295 8656789600 8.761277e+10
3 2017-01-06 000300.SHA 3366.852295 3368.870605 3346.891357 3347.666504 8745911400 9.107226e+10
6 2017-01-11 000300.SHA 3355.800781 3366.631592 3334.297852 3334.495361 9107143800 8.675045e+10
7 2017-01-12 000300.SHA 3332.685791 3343.127930 3315.962891 3317.624023 8123913500 7.503189e+10
8 2017-01-13 000300.SHA 3317.962402 3338.174072 3309.374023 3319.912109 8418058900 8.455610e+10
9 2017-01-16 000300.SHA 3314.150635 3322.781982 3264.207031 3319.445557 15777913500 1.532625e+11
11 2017-01-18 000300.SHA 3323.338867 3350.961914 3320.406250 3339.365234 7062322700 7.568068e+10
12 2017-01-19 000300.SHA 3330.952148 3346.495850 3324.683594 3329.289062 6615980800 6.386971e+10
In [35]:
df[df>3360]
Out[35]:
date instrument open high low close volume amount
0 2017-01-03 000300.SHA NaN NaN NaN NaN 8656789600 8.761277e+10
1 2017-01-04 000300.SHA NaN 3369.565918 NaN 3368.311768 9005082700 9.801436e+10
2 2017-01-05 000300.SHA 3368.340088 3373.411865 3361.461914 3367.789307 8833635700 9.118934e+10
3 2017-01-06 000300.SHA 3366.852295 3368.870605 NaN NaN 8745911400 9.107226e+10
4 2017-01-09 000300.SHA NaN 3366.221924 NaN 3363.901367 8370794200 8.791612e+10
5 2017-01-10 000300.SHA 3361.639404 3370.505127 NaN NaN 8381916000 8.253397e+10
6 2017-01-11 000300.SHA NaN 3366.631592 NaN NaN 9107143800 8.675045e+10
7 2017-01-12 000300.SHA NaN NaN NaN NaN 8123913500 7.503189e+10
8 2017-01-13 000300.SHA NaN NaN NaN NaN 8418058900 8.455610e+10
9 2017-01-16 000300.SHA NaN NaN NaN NaN 15777913500 1.532625e+11
10 2017-01-17 000300.SHA NaN NaN NaN NaN 6945332800 7.475451e+10
11 2017-01-18 000300.SHA NaN NaN NaN NaN 7062322700 7.568068e+10
12 2017-01-19 000300.SHA NaN NaN NaN NaN 6615980800 6.386971e+10
13 2017-01-20 000300.SHA NaN NaN NaN NaN 6530046200 7.083133e+10

本文由BigQuant宽客学院推出,版权归BigQuant所有,转载请注明出处。


【宽客学院】Pandas快速入门
社区干货与精选整理(持续更新中...)
(江旭奇) #2

pandas挺好用的


(量化新手) #3

很好的教程