concat
函数
直接将值和索引粘合在一起,默认是在axis=0
上面工作,得到的是新的Series
;改成axis=1,变成一个DF型数据
axis
axis=0
:默认是Series
axis=1
:得到DF
数据,缺值用NaN
补充
join
outer
:合并,缺值用nan
inner
:求交集,非交集部分直接删除
keys
:用于层次化索引ignore_index
:不保留连接轴上的索引,产生新的索引
1 | import pandas as pd |
1 | s1 = pd.Series([0,1], index=['a','b']) |
1 | s2 = pd.Series([2,3,4], index=['c','d','e']) |
1 | s3 = pd.Series([5, 6], index=['f', 'g']) |
1 | # 默认是得到S型数据 |
1 | a 0 |
1 | # 转成DF型数据:axis=1 |
0 | 1 | |
---|---|---|
a | 0.0 | NaN |
b | 1.0 | NaN |
c | NaN | 2.0 |
d | NaN | 3.0 |
e | NaN | 4.0 |
1 | pd.concat([s1,s2,s3], axis=1,sort=True) |
0 | 1 | 2 | |
---|---|---|---|
a | 0.0 | NaN | NaN |
b | 1.0 | NaN | NaN |
c | NaN | 2.0 | NaN |
d | NaN | 3.0 | NaN |
e | NaN | 4.0 | NaN |
f | NaN | NaN | 5.0 |
g | NaN | NaN | 6.0 |
1 | s4 = pd.concat([s1, s3]) |
1 | a 0 |
1 | # join='inner':交集;join='outer':并集 |
0 | 1 | |
---|---|---|
a | 0 | 0 |
b | 1 | 1 |
1 | pd.concat([s1,s4],axis=1,join_axes=[['a','c','b','e']]) |
0 | 1 | |
---|---|---|
a | 0.0 | 0.0 |
c | NaN | NaN |
b | 1.0 | 1.0 |
e | NaN | NaN |
1 | # keys参数生成层次化索引 |
1 | one a 0 |
1 | result.unstack() |
a | b | f | g | |
---|---|---|---|---|
one | 0.0 | 1.0 | NaN | NaN |
two | 0.0 | 1.0 | NaN | NaN |
three | NaN | NaN | 5.0 | 6.0 |
1 | # 沿着axis=1合并,keys成为DF的列头 |
one | two | three | |
---|---|---|---|
a | 0.0 | NaN | NaN |
b | 1.0 | NaN | NaN |
c | NaN | 2.0 | NaN |
d | NaN | 3.0 | NaN |
e | NaN | 4.0 | NaN |
f | NaN | NaN | 5.0 |
g | NaN | NaN | 6.0 |
merge函数
可根据⼀个或多个键将不同DataFrame
中的⾏连接起来,它实现的就是数据库的join
操作 ,就是数据库风格的合并
常用参数表格
参数 | 说明 |
---|---|
left | 参与合并的左侧DF |
right | 参与合并的右侧DF |
how | 默认是inner,inner、outer、right、left |
on | 用于连接的列名,默认是相同的列名 |
left_on \right_on | 左侧、右侧DF 中用作连接键的列 |
sort | 根据连接键对合并后的数据进行排序,默认是T |
suffixes | 重复列名,直接指定后缀,用元组的形式(’_left’, ‘_right’) |
left_index、right_index | 将左侧、右侧的行索引index 作为连接键(用于index的合并) |
1 | df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a','c'], |
key | data1 | |
---|---|---|
0 | b | 0 |
1 | b | 1 |
2 | a | 2 |
3 | c | 3 |
4 | a | 4 |
5 | a | 5 |
6 | c | 6 |
1 | df2 = pd.DataFrame({'key': ['a', 'b', 'd'], |
key | data2 | |
---|---|---|
0 | a | 0 |
1 | b | 1 |
2 | d | 2 |
两个DF有相同的列属性怎么处理
- 如果不指定on参数,自动按照重叠的列名进行合并
- 最好指定
key
:pd.merge(df1, df2, on='key')
1 | # 将两个df数据中相同的值进行合并 |
key | data1 | data2 | |
---|---|---|---|
0 | b | 0 | 1 |
1 | b | 1 | 1 |
2 | a | 2 | 0 |
3 | a | 4 | 0 |
4 | a | 5 | 0 |
两个DF没有相同的列属性怎么处理
- 若没有相同的列属性,需要指定
left_on
和right_on
pd.merge(df3,df4,left_on='lkey',right_on='rkey')
1 | df3 = pd.DataFrame({'lkey': ['b', 'b', 'a', 'c', 'a', 'a','c'], |
lkey | data1 | |
---|---|---|
0 | b | 0 |
1 | b | 1 |
2 | a | 2 |
3 | c | 3 |
4 | a | 4 |
5 | a | 5 |
6 | c | 6 |
1 | df4 = pd.DataFrame({'rkey': ['a', 'b', 'd'], |
rkey | data2 | |
---|---|---|
0 | a | 0 |
1 | b | 1 |
2 | d | 2 |
1 | pd.merge(df3,df4,left_on='lkey',right_on='rkey') # 相同的列属性指定新生成的df数据中的新列名 |
lkey | data1 | rkey | data2 | |
---|---|---|---|---|
0 | b | 0 | b | 1 |
1 | b | 1 | b | 1 |
2 | a | 2 | a | 0 |
3 | a | 4 | a | 0 |
4 | a | 5 | a | 0 |
交集和并集
通过参数how
来实现,参数表格
选项 | 说明 |
---|---|
inner | 两个表中公有的键 |
outer | 两个表中所有的键,不存在的值用NaN 补足 |
left | 左表中所有的键 |
right | 右表中所有的键 |
- 交集:
how=inner
,默认取值,内连接 - 并集:
how=outer
,外连接
1 | pd.merge(df1, df2, how="outer") # 外键求并集,默认是inner求交集 |
key | data1 | data2 | |
---|---|---|---|
0 | b | 0.0 | 1.0 |
1 | b | 1.0 | 1.0 |
2 | a | 2.0 | 0.0 |
3 | a | 4.0 | 0.0 |
4 | a | 5.0 | 0.0 |
5 | c | 3.0 | NaN |
6 | c | 6.0 | NaN |
7 | d | NaN | 2.0 |
重复列名处理
1 | left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'], |
1 | right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'], |
1 | pd.merge(left, right, on=["key1", "key2"], how="outer") # 使用相同的列名key1和key2 |
key1 | key2 | lval | rval | |
---|---|---|---|---|
0 | foo | one | 1.0 | 4.0 |
1 | foo | one | 1.0 | 5.0 |
2 | foo | two | 2.0 | NaN |
3 | bar | one | 3.0 | 6.0 |
4 | bar | two | NaN | 7.0 |
1 | pd.merge(left, right, on='key1') # 通过key1进行连接,key2重复了,默认是在key2的后面添加_x、_y |
key1 | key2_x | lval | key2_y | rval | |
---|---|---|---|---|---|
0 | foo | one | 1 | one | 4 |
1 | foo | one | 1 | one | 5 |
2 | foo | two | 2 | one | 4 |
3 | foo | two | 2 | one | 5 |
4 | bar | one | 3 | one | 6 |
5 | bar | one | 3 | two | 7 |
1 | pd.merge(left, right, on='key1',suffixes=('_left', '_right')) # 指定后缀连接符 |
key1 | key2_left | lval | key2_right | rval | |
---|---|---|---|---|---|
0 | foo | one | 1 | one | 4 |
1 | foo | one | 1 | one | 5 |
2 | foo | two | 2 | one | 4 |
3 | foo | two | 2 | one | 5 |
4 | bar | one | 3 | one | 6 |
5 | bar | one | 3 | two | 7 |