# Sort by the table column in descending order and show the first 5 rows
diamonds >> arrange(X.table, ascending=False) >> head(5)

       carat   cut color clarity  depth  table  price     x     y     z
24932   2.01  Fair     F     SI1   58.6   95.0  13387  8.32  8.31  4.87
50773   0.81  Fair     F     SI2   68.8   79.0   2301  5.26  5.20  3.58
51342   0.79  Fair     G     SI1   65.3   76.0   2362  5.52  5.13  3.35
52860   0.50  Fair     E     VS2   79.0   73.0   2579  5.21  5.18  4.09
49375   0.70  Fair     H     VS1   62.0   73.0   2100  5.65  5.54  3.47

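# Within each cut, sort by price and keep the 3 cheapest rows,
# then ungroup and keep only rows with carat < 0.23
(diamonds >> group_by(X.cut) >> arrange(X.price) >>
 head(3) >> ungroup() >> mask(X.carat < 0.23))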

    carat      cut color clarity  depth  table  price     x     y     z
8    0.22     Fair     E     VS2   65.1   61.0    337  3.87  3.78  2.49
1    0.21  Premium     E     SI1   59.8   61.0    326  3.89  3.84  2.31
12   0.22  Premium     F     SI1   60.4   61.0    342  3.88  3.84  2.33

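The rename() function

The rename() function replaces existing column names with new ones. Each keyword argument gives the new column name, and its value identifies the existing column, either as an X symbol or as a string.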

diamonds >> rename(CUT=X.cut, COLOR='color') >> head(2)

   carat      CUT COLOR clarity  depth  table  price     x     y     z
0   0.23    Ideal     E     SI2   61.5   55.0    326  3.95  3.98  2.43
1   0.21  Premium     E     SI1   59.8   61.0    326  3.89  3.84  2.31
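For comparison, the same renaming can be sketched in plain pandas. This is a minimal sketch on a small hand-made DataFrame (the three rows' worth of values are copied from the output above; the variable names `df` and `renamed` are only for illustration). Note that the mapping direction is reversed: dfply writes new=old, while pandas `DataFrame.rename` takes a dict of old -> new.

```python
import pandas as pd

# A small stand-in for the diamonds data (illustrative subset of columns)
df = pd.DataFrame({
    "carat": [0.23, 0.21],
    "cut": ["Ideal", "Premium"],
    "color": ["E", "E"],
})

# pandas maps old name -> new name; dfply's rename() maps new name = old column
renamed = df.rename(columns={"cut": "CUT", "color": "COLOR"})

print(list(renamed.columns))
print(list(df.columns))  # the original DataFrame keeps its column names
```

As with the dfply pipeline, the rename returns a new DataFrame and leaves the original untouched.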

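The gather() function

Converting a DataFrame between "wide" and "long" formats is a common pattern in data transformation. The gather() function melts the specified columns of a DataFrame into two columns: a key column (variable) that holds the original column names, and a value column (value) that holds their values.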

diamonds >> gather('variable', 'value', ['price', 'depth','x','y','z']) >> head(5)

   carat      cut color clarity  table variable  value
0   0.23    Ideal     E     SI2   55.0    price  326.0
1   0.21  Premium     E     SI1   61.0    price  326.0
2   0.23     Good     E     VS1   65.0    price  327.0
3   0.29  Premium     I     VS2   58.0    price  334.0
4   0.31     Good     J     SI2   58.0    price  335.0
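The wide-to-long reshaping shown above corresponds closely to pandas `DataFrame.melt`. The following is a minimal sketch on a hand-made two-row DataFrame (the names `wide` and `long_df` are illustrative, and only two columns are melted to keep it short). It also suggests why price appears as 326.0 in the output above: the melted columns share a single value column, so integers and floats end up in one float column.

```python
import pandas as pd

# A small wide-format stand-in for the diamonds data
wide = pd.DataFrame({
    "carat": [0.23, 0.21],
    "cut": ["Ideal", "Premium"],
    "price": [326, 326],      # integers
    "depth": [61.5, 59.8],    # floats
})

# Melt price and depth into (variable, value) pairs; carat and cut stay as-is
long_df = wide.melt(
    id_vars=["carat", "cut"],
    value_vars=["price", "depth"],
    var_name="variable",
    value_name="value",
)

print(long_df)
```

Each original row contributes one output row per melted column, so two rows and two melted columns give four rows.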

If no columns are specified, every column of the DataFrame is melted, leaving only the two columns variable and value.

diamonds >> gather('variable', 'value') >> head(5)

  variable value
0    carat  0.23
1    carat  0.21
2    carat  0.23
3    carat  0.29
4    carat  0.31
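The no-columns case can be approximated in pandas by calling `DataFrame.melt` without `id_vars` or `value_vars`, which melts every column. This is a sketch on a small hand-made DataFrame (the names `wide` and `long_all` are illustrative), not a statement about how dfply implements gather() internally.

```python
import pandas as pd

# A small stand-in with numeric and text columns
wide = pd.DataFrame({
    "carat": [0.23, 0.21],
    "cut": ["Ideal", "Premium"],
    "price": [326, 326],
})

# With no id_vars, every column is melted into (variable, value) pairs
long_all = wide.melt(var_name="variable", value_name="value")

print(long_all)
```

The result has one row per original cell (rows times columns), and the value column mixes numbers and strings because all columns are stacked into it.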
