本文重点知识:
- 创建带有日期的索引:
dates = pd.date_range('20190924', periods=6)
- head()、tail()
- 按轴排序:索引排序
sort_index
,默认是ascending=True
升序- axis=0:行索引,可以用
index
- axis=1:列索引,可以用
columns
- axis=0:行索引,可以用
- 按值排序:
df.sort_values(by='columns')
,默认升序
创建数据
1 | import numpy as np |
1 | s = pd.Series([1, 3, 5, np.nan, 6, 89]) |
1 | 0 1.0 |
1 | dates = pd.date_range('20190924', periods=6) |
1 | DatetimeIndex(['2019-09-24', '2019-09-25', '2019-09-26', '2019-09-27', |
1 | df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list("ABCD")) |
A | B | C | D | |
---|---|---|---|---|
2019-09-24 | 0.500005 | 0.166578 | -0.758513 | -0.679173 |
2019-09-25 | 0.090209 | 0.117906 | -0.402183 | 2.261182 |
2019-09-26 | -0.809052 | -0.173144 | 0.324912 | -1.885101 |
2019-09-27 | 0.310374 | -1.502089 | 1.524401 | 0.269953 |
2019-09-28 | -0.846488 | -0.235878 | 1.398896 | -0.229573 |
2019-09-29 | 0.975853 | -0.998395 | -0.514480 | -0.882704 |
1 | # 同时创建多个不同的列 |
A | B | C | D | E | F | |
---|---|---|---|---|---|---|
0 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
1 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
2 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
3 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
1 | df2.dtypes |
1 | A float64 |
查看数据
查看数据的相关信息
- 头、尾几行数据
- index、columns
- describe ,T
1 | # 前几行数据,默认是5行 |
A | B | C | D | |
---|---|---|---|---|
2019-09-24 | 0.500005 | 0.166578 | -0.758513 | -0.679173 |
2019-09-25 | 0.090209 | 0.117906 | -0.402183 | 2.261182 |
2019-09-26 | -0.809052 | -0.173144 | 0.324912 | -1.885101 |
1 | # 查看末几行,默认是5行 |
A | B | C | D | |
---|---|---|---|---|
2019-09-28 | -0.846488 | -0.235878 | 1.398896 | -0.229573 |
2019-09-29 | 0.975853 | -0.998395 | -0.514480 | -0.882704 |
1 | # 查看数据的行索引 |
1 | DatetimeIndex(['2019-09-24', '2019-09-25', '2019-09-26', '2019-09-27', |
1 | # 查看数据的列属性 |
1 | Index(['A', 'B', 'C', 'D'], dtype='object') |
1 | df.to_numpy() |
1 | array([[ 0.50000505, 0.16657823, -0.75851253, -0.67917279], |
1 | df.describe() |
A | B | C | D | |
---|---|---|---|---|
count | 6.000000 | 6.000000 | 6.000000 | 6.000000 |
mean | 0.036817 | -0.437504 | 0.262172 | -0.190903 |
std | 0.730717 | 0.668113 | 0.997562 | 1.400993 |
min | -0.846488 | -1.502089 | -0.758513 | -1.885101 |
25% | -0.584237 | -0.807766 | -0.486406 | -0.831821 |
50% | 0.200292 | -0.204511 | -0.038636 | -0.454373 |
75% | 0.452597 | 0.045144 | 1.130400 | 0.145071 |
max | 0.975853 | 0.166578 | 1.524401 | 2.261182 |
1 | df.T |
2019-09-24 00:00:00 | 2019-09-25 00:00:00 | 2019-09-26 00:00:00 | 2019-09-27 00:00:00 | 2019-09-28 00:00:00 | 2019-09-29 00:00:00 | |
---|---|---|---|---|---|---|
A | 0.500005 | 0.090209 | -0.809052 | 0.310374 | -0.846488 | 0.975853 |
B | 0.166578 | 0.117906 | -0.173144 | -1.502089 | -0.235878 | -0.998395 |
C | -0.758513 | -0.402183 | 0.324912 | 1.524401 | 1.398896 | -0.514480 |
D | -0.679173 | 2.261182 | -1.885101 | 0.269953 | -0.229573 | -0.882704 |
1 | # sort_index |
A | B | C | D | |
---|---|---|---|---|
2019-09-24 | 0.500005 | 0.166578 | -0.758513 | -0.679173 |
2019-09-25 | 0.090209 | 0.117906 | -0.402183 | 2.261182 |
2019-09-26 | -0.809052 | -0.173144 | 0.324912 | -1.885101 |
2019-09-27 | 0.310374 | -1.502089 | 1.524401 | 0.269953 |
2019-09-28 | -0.846488 | -0.235878 | 1.398896 | -0.229573 |
2019-09-29 | 0.975853 | -0.998395 | -0.514480 | -0.882704 |
1 | # 按照行索引进行降序 |
A | B | C | D | |
---|---|---|---|---|
2019-09-29 | 0.975853 | -0.998395 | -0.514480 | -0.882704 |
2019-09-28 | -0.846488 | -0.235878 | 1.398896 | -0.229573 |
2019-09-27 | 0.310374 | -1.502089 | 1.524401 | 0.269953 |
2019-09-26 | -0.809052 | -0.173144 | 0.324912 | -1.885101 |
2019-09-25 | 0.090209 | 0.117906 | -0.402183 | 2.261182 |
2019-09-24 | 0.500005 | 0.166578 | -0.758513 | -0.679173 |
1 | # 按照某个属性进行排序 |
A | B | C | D | |
---|---|---|---|---|
2019-09-27 | 0.310374 | -1.502089 | 1.524401 | 0.269953 |
2019-09-29 | 0.975853 | -0.998395 | -0.514480 | -0.882704 |
2019-09-28 | -0.846488 | -0.235878 | 1.398896 | -0.229573 |
2019-09-26 | -0.809052 | -0.173144 | 0.324912 | -1.885101 |
2019-09-25 | 0.090209 | 0.117906 | -0.402183 | 2.261182 |
2019-09-24 | 0.500005 | 0.166578 | -0.758513 | -0.679173 |
1 | df.sort_values(by=["B","C"]) |
A | B | C | D | |
---|---|---|---|---|
2019-09-27 | 0.310374 | -1.502089 | 1.524401 | 0.269953 |
2019-09-29 | 0.975853 | -0.998395 | -0.514480 | -0.882704 |
2019-09-28 | -0.846488 | -0.235878 | 1.398896 | -0.229573 |
2019-09-26 | -0.809052 | -0.173144 | 0.324912 | -1.885101 |
2019-09-25 | 0.090209 | 0.117906 | -0.402183 | 2.261182 |
2019-09-24 | 0.500005 | 0.166578 | -0.758513 | -0.679173 |
选择数据
查看指定的行列数据
1 | # 指定列属性查看数据 |
B | C | |
---|---|---|
2019-09-24 | 0.166578 | -0.758513 |
2019-09-25 | 0.117906 | -0.402183 |
2019-09-26 | -0.173144 | 0.324912 |
2019-09-27 | -1.502089 | 1.524401 |
2019-09-28 | -0.235878 | 1.398896 |
2019-09-29 | -0.998395 | -0.514480 |
1 | # 指定行标签查看指定的行数据 |
A | B | C | D | |
---|---|---|---|---|
2019-09-25 | 0.090209 | 0.117906 | -0.402183 | 2.261182 |
2019-09-26 | -0.809052 | -0.173144 | 0.324912 | -1.885101 |
1 | df["20190924":"20190927"] |
A | B | C | D | |
---|---|---|---|---|
2019-09-24 | 0.500005 | 0.166578 | -0.758513 | -0.679173 |
2019-09-25 | 0.090209 | 0.117906 | -0.402183 | 2.261182 |
2019-09-26 | -0.809052 | -0.173144 | 0.324912 | -1.885101 |
2019-09-27 | 0.310374 | -1.502089 | 1.524401 | 0.269953 |
loc
根据标签(不是自带的数字索引)查看数据
1 | df.loc[dates[0]] |
1 | A 0.500005 |
1 | dates[0] |
1 | Timestamp('2019-09-24 00:00:00', freq='D') |
1 | # 选择行和列 |
A | B | |
---|---|---|
2019-09-24 | 0.500005 | 0.166578 |
2019-09-25 | 0.090209 | 0.117906 |
2019-09-26 | -0.809052 | -0.173144 |
2019-09-27 | 0.310374 | -1.502089 |
2019-09-28 | -0.846488 | -0.235878 |
2019-09-29 | 0.975853 | -0.998395 |
1 | # 索引通过标签来实现 |
A | B | |
---|---|---|
2019-09-24 | 0.500005 | 0.166578 |
2019-09-25 | 0.090209 | 0.117906 |
2019-09-26 | -0.809052 | -0.173144 |
2019-09-27 | 0.310374 | -1.502089 |
1 | # 指定的行或者列可以是切片形式 |
A | B | |
---|---|---|
2019-09-24 | 0.500005 | 0.166578 |
2019-09-25 | 0.090209 | 0.117906 |
2019-09-26 | -0.809052 | -0.173144 |
2019-09-27 | 0.310374 | -1.502089 |
iloc数字索引
记忆放法:iloc
记为intloc
,int为整型,表示通过数字来进行索引
1 | df |
A | B | C | D | |
---|---|---|---|---|
2019-09-24 | 0.500005 | 0.166578 | -0.758513 | -0.679173 |
2019-09-25 | 0.090209 | 0.117906 | -0.402183 | 2.261182 |
2019-09-26 | -0.809052 | -0.173144 | 0.324912 | -1.885101 |
2019-09-27 | 0.310374 | -1.502089 | 1.524401 | 0.269953 |
2019-09-28 | -0.846488 | -0.235878 | 1.398896 | -0.229573 |
2019-09-29 | 0.975853 | -0.998395 | -0.514480 | -0.882704 |
1 | df.iloc[1:3] |
A | B | C | D | |
---|---|---|---|---|
2019-09-25 | 0.090209 | 0.117906 | -0.402183 | 2.261182 |
2019-09-26 | -0.809052 | -0.173144 | 0.324912 | -1.885101 |
1 | df.iloc[1:3, 0:2] # 切片形式,连续性 |
A | B | |
---|---|---|
2019-09-25 | 0.090209 | 0.117906 |
2019-09-26 | -0.809052 | -0.173144 |
1 | df.iloc[[1, 2, 4], [0, 2]] # 行索引是离散的值 |
A | C | |
---|---|---|
2019-09-25 | 0.090209 | -0.402183 |
2019-09-26 | -0.809052 | 0.324912 |
2019-09-28 | -0.846488 | 1.398896 |
1 | df.iloc[:, 1:3] |
B | C | |
---|---|---|
2019-09-24 | 0.166578 | -0.758513 |
2019-09-25 | 0.117906 | -0.402183 |
2019-09-26 | -0.173144 | 0.324912 |
2019-09-27 | -1.502089 | 1.524401 |
2019-09-28 | -0.235878 | 1.398896 |
2019-09-29 | -0.998395 | -0.514480 |
获取具体位置的元素
1 | df.iloc[1,2] |
1 | -0.4021832300071616 |
1 | df.iat[1,2] # 等同上面 |
1 | -0.4021832300071616 |
布尔索引
1 | df |
A | B | C | D | |
---|---|---|---|---|
2019-09-24 | 0.500005 | 0.166578 | -0.758513 | -0.679173 |
2019-09-25 | 0.090209 | 0.117906 | -0.402183 | 2.261182 |
2019-09-26 | -0.809052 | -0.173144 | 0.324912 | -1.885101 |
2019-09-27 | 0.310374 | -1.502089 | 1.524401 | 0.269953 |
2019-09-28 | -0.846488 | -0.235878 | 1.398896 | -0.229573 |
2019-09-29 | 0.975853 | -0.998395 | -0.514480 | -0.882704 |
1 | df[df.A > 0] # 将属性A中大于0的行全部选择出出来 |
A | B | C | D | |
---|---|---|---|---|
2019-09-24 | 0.500005 | 0.166578 | -0.758513 | -0.679173 |
2019-09-25 | 0.090209 | 0.117906 | -0.402183 | 2.261182 |
2019-09-27 | 0.310374 | -1.502089 | 1.524401 | 0.269953 |
2019-09-29 | 0.975853 | -0.998395 | -0.514480 | -0.882704 |
1 | df[df > 0] |
A | B | C | D | |
---|---|---|---|---|
2019-09-24 | 0.500005 | 0.166578 | NaN | NaN |
2019-09-25 | 0.090209 | 0.117906 | NaN | 2.261182 |
2019-09-26 | NaN | NaN | 0.324912 | NaN |
2019-09-27 | 0.310374 | NaN | 1.524401 | 0.269953 |
2019-09-28 | NaN | NaN | 1.398896 | NaN |
2019-09-29 | 0.975853 | NaN | NaN | NaN |