pandas之concat and merge

时间:2022-05-30
本文章向大家介绍pandas之concat and merge,主要内容包括其使用实例、应用技巧、基本知识点总结和需要注意事项,具有一定的参考价值,需要的朋友可以参考一下。

pandas 表横竖合concat,参考官方链接

竖合

# Merge, join, and concatenate
import pandas as pd
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3', 'A4'],
                    'B': ['B0', 'B1', 'B2', 'B3', 'B4'],
                    'C': ['C0', 'C1', 'C2', 'C3', 'C4'],
                    'D': ['D0', 'D1', 'D2', 'D3', 'D4'],
                    'E': ['E0', 'E1', 'E2', 'E3', 'E4']
                   },
                           index=[0, 1, 2, 3,4])
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'F': ['F4', 'F5', 'F6', 'F7'],
                    'D': ['D4', 'D5', 'D6', 'D7'],
                   },
                   index=[4, 5, 6, 7])

df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                            'B': ['B8', 'B9', 'B10', 'B11'],
                            'C': ['C8', 'C9', 'C10', 'C11'],
                            'D': ['D8', 'D9', 'D10', 'D11']})

frames = [df1, df2, df3]
# 可以将三个表数据尾部追加到一起,但是每个表都有自己的index,df3是默认的1,2,3
# 合并之后index
'''concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
   keys=None, levels=None, names=None, verify_integrity=False, copy=True)
'''
result = pd.concat(frames)
print result

保留了各自的index以及各自数据,缺失的用 NaN填充,concat函数默认参数

join='inner'相当于求交集,ignore_index=True,忽略各自的index,合并后数据集新的index

result = pd.concat(frames, join='inner', ignore_index=True)
print result

可以用下面的形式修改index

# 指定'A'列数据为index
result.index = result['A']
# 使用reser_inex放弃之前的index,用0,1,2取代
rl = result.reset_index(drop=True)

横合

result = pd.concat([df1, df2], axis=1)
print result

使用join = inner参数

result = pd.concat([df1, df2], axis=1, join='inner')
print result

如果我们只想用df1中数据,等于以df1数据为准

result = pd.concat([df1, df4], axis=1, join_axes=[df1.index])

类似于数据库操作merge

函数原型

pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
         left_index=False, right_index=False, sort=True,
         suffixes=('_x', '_y'), copy=True, indicator=False,
         validate=None)

对比数据库操作

Merge method

SQL Join Name

Description

left

LEFT OUTER JOIN

Use keys from left frame only

right

RIGHT OUTER JOIN

Use keys from right frame only

outer

FULL OUTER JOIN

Use union of keys from both frames

inner

INNER JOIN

Use intersection of keys from both frames

left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3','K4'],
                     'A': ['A0', 'A1', 'A2', 'A3', 'A4'],
                     'B': ['B0', 'B1', 'B2', 'B3', 'B4']})

right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3', 'K5'],
                      'C': ['C0', 'C1', 'C2', 'C3', 'C5'],
                      'D': ['D0', 'D1', 'D2', 'D3', 'D5']}) 

left

result = pd.merge(left, right,how='left', on='key')
print result
result = pd.merge(left, right,how='right', on='key')

inner

result = pd.merge(left, right,how='inner', on='key')
print result

outer

result = pd.merge(left, right,how='outer', on='key')
print result

复杂一点的例子

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],       
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

left

result = pd.merge(left, right, how = 'left', on=['key1', 'key2'])
print result
result = pd.merge(left, right, how = 'right', on=['key1', 'key2'])
print result

inner

result = pd.merge(left, right, how = 'inner', on=['key1', 'key2'])
print result

outer

result = pd.merge(left, right, how = 'outer', on=['key1', 'key2'])
print result

通过left_index,right_index代替on

left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']},
                    index=['K0', 'K1', 'K2'])

right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                      'D': ['D0', 'D2', 'D3']},
                      index=['K0', 'K2', 'K3'])
result = pd.merge(left, right, left_index=True, right_index=True, how='outer')
print result

下面两个表,以left表‘key’与right表index为标准merge

left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3'],
                     'key': ['K0', 'K1', 'K0', 'K1']})

right = pd.DataFrame({'C': ['C0', 'C1'],
                      'D': ['D0', 'D1']},
                     index=['K0', 'K1'])
result = pd.merge(left, right, left_on='key', right_index=True,how='left', sort=False);
print result