Pandas Exercises for Data Analysis (Continuously updated)
Date: 2019-11-26
This article presents pandas exercises for data analysis, with worked examples, practical tips, key concepts, and points to watch out for.
# 1. How to import pandas and check the version?
import pandas as pd
print(pd.__version__)
pd.show_versions(as_json=True)  # prints the report itself and returns None, so no print() wrapper
0.23.4
{'system': {'commit': None, 'python': '3.7.0.final.0', 'python-bits': 64, 'OS': 'Windows', 'OS-release': '10', 'machine': 'AMD64', 'processor': 'Intel64 Family 6 Model 142 Stepping 10, GenuineIntel', 'byteorder': 'little', 'LC_ALL': 'None', 'LANG': 'None', 'LOCALE': 'None.None'}, 'dependencies': {'pandas': '0.23.4', 'pytest': '3.8.0', 'pip': '19.2.1', 'setuptools': '40.2.0', 'Cython': '0.28.5', 'numpy': '1.17.2', 'scipy': '1.1.0', 'pyarrow': None, 'xarray': None, 'IPython': '6.5.0', 'sphinx': '1.7.9', 'patsy': '0.5.0', 'dateutil': '2.7.3', 'pytz': '2018.5', 'blosc': None, 'bottleneck': '1.2.1', 'tables': '3.4.4', 'numexpr': '2.6.8', 'feather': None, 'matplotlib': '2.2.3', 'openpyxl': '2.5.6', 'xlrd': '1.1.0', 'xlwt': '1.3.0', 'xlsxwriter': '1.1.0', 'lxml': '4.2.5', 'bs4': '4.6.3', 'html5lib': '1.0.1', 'sqlalchemy': '1.2.11', 'pymysql': None, 'psycopg2': None, 'jinja2': '2.10', 's3fs': None, 'fastparquet': None, 'pandas_gbq': None, 'pandas_datareader': None}}
# 2. How to create a series from a list, numpy array and dict?
import numpy as np
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))
# ser1 = pd.Series(mylist)
# ser2 = pd.Series(myarr)
ser3 = pd.Series(mydict)
print(ser3.head(3))
a 0
b 1
c 2
dtype: int64
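Here is a minimal sketch of all three constructors side by side. It is shortened to five letters for brevity; the variable names follow the exercise. Watch the index: a list or array gets a default integer index, while a dict's keys become the index labels.

```python
import numpy as np
import pandas as pd

mylist = list('abcde')
myarr = np.arange(5)
mydict = dict(zip(mylist, myarr))

ser1 = pd.Series(mylist)  # from a list: default 0..n-1 index, object dtype
ser2 = pd.Series(myarr)   # from an array: default index, dtype taken from the array
ser3 = pd.Series(mydict)  # from a dict: keys become the index labels

print(ser1.index.tolist())  # [0, 1, 2, 3, 4]
print(ser3.index.tolist())  # ['a', 'b', 'c', 'd', 'e']
```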
# 3. How to convert the index of a series into a column of a dataframe?
mylist = list('abcedfghijklmnopqrstuvwxyz')
myarr = np.arange(26)
mydict = dict(zip(mylist, myarr))
ser = pd.Series(mydict)
df = ser.to_frame().reset_index()
print(df.head())
index 0
0 a 0
1 b 1
2 c 2
3 e 3
4 d 4
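By default, `reset_index()` names the new columns `index` and `0`, as the output above shows. The sketch below gives them meaningful names instead. The labels `letter` and `value` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

ser = pd.Series(dict(zip(list('abcde'), np.arange(5))))

# Name the index, then name the values column as it is moved into the frame.
df = ser.rename_axis('letter').reset_index(name='value')
print(df.columns.tolist())  # ['letter', 'value']
```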
# 4. How to combine many series to form a dataframe?
import numpy as np
ser1 = pd.Series(list('abcedfghijklmnopqrstuvwxyz'))
ser2 = pd.Series(np.arange(26))
# Solution 1
df = pd.concat([ser1, ser2], axis=1)
# Solution 2
# df = pd.DataFrame({'col1':ser1, 'col2': ser2})
print(df.head())
0 1
0 a 0
1 b 1
2 c 2
3 e 3
4 d 4
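`pd.concat(..., axis=1)` labels the columns 0 and 1 because the input series are unnamed. Passing `keys` gives readable column names. Here is a sketch using the hypothetical names `letter` and `number`:

```python
import numpy as np
import pandas as pd

ser1 = pd.Series(list('abcde'))
ser2 = pd.Series(np.arange(5))

# With axis=1, each key becomes the column name for the matching series.
df = pd.concat([ser1, ser2], axis=1, keys=['letter', 'number'])
print(df.columns.tolist())  # ['letter', 'number']
```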
# 5. How to assign a name to a series?
# Give a name to the series ser calling it ‘alphabets’.
ser = pd.Series(list('abcedfghijklmnopqrstuvwxyz'))
ser.name = 'alphabets'
ser.head()
0 a
1 b
2 c
3 e
4 d
Name: alphabets, dtype: object
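A series carries two separate names: its own `name` (which the code above sets) and the name of its index. This minimal sketch shows both, using `position` as an illustrative index name:

```python
import pandas as pd

ser = pd.Series(list('abcde'))

named = ser.rename('alphabets')          # scalar argument sets Series.name (returns a copy)
indexed = named.rename_axis('position')  # sets the name of the index

print(named.name)          # alphabets
print(indexed.index.name)  # position
```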
# 6. How to get the items of series A not present in series B?
# From ser1 remove items present in ser2.
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])
# print(ser1.isin(ser2))
ser1[~ser1.isin(ser2)]
0 1
1 2
2 3
dtype: int64
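The boolean mask keeps the original order, index labels, and any duplicates. If only the distinct values are needed, `np.setdiff1d` returns them sorted as a plain array. The sketch below contrasts the two, with a duplicate `1` added to `ser1` for illustration:

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5, 1])
ser2 = pd.Series([4, 5, 6, 7, 8])

kept = ser1[~ser1.isin(ser2)]    # Series: order, labels and duplicates preserved
uniq = np.setdiff1d(ser1, ser2)  # ndarray: sorted unique values only

print(kept.tolist())  # [1, 2, 3, 1]
print(uniq.tolist())  # [1, 2, 3]
```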
# 7. How to get the items not common to both series A and series B?
# Get all items of ser1 and ser2 not common to both.
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])
ser_u = pd.Series(np.union1d(ser1, ser2))
ser_i = pd.Series(np.intersect1d(ser1, ser2))
ser_u[~ser_u.isin(ser_i)]
0 1
1 2
2 3
5 6
6 7
7 8
dtype: int64
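NumPy also computes the symmetric difference directly with `np.setxor1d`, which collapses the union and intersection steps into one call. It returns a sorted array of unique values. Wrapping that array in a Series therefore gives a fresh 0..n-1 index instead of the gapped labels shown above.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# Values found in exactly one of the two inputs (symmetric difference).
not_common = pd.Series(np.setxor1d(ser1, ser2))
print(not_common.tolist())  # [1, 2, 3, 6, 7, 8]
```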
# 8. How to get the minimum, 25th percentile, median, 75th, and max of a numeric series?
# Compute the minimum, 25th percentile, median, 75th, and maximum of ser.
ser = pd.Series(np.random.normal(10, 5, 25))
np.percentile(ser, q=[0, 25, 50, 75, 100])
array([ 1.6294664 , 6.63669818, 9.88911315, 12.63793738, 19.94314505])
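The same five numbers can be computed without leaving pandas via `Series.quantile`. Like `np.percentile`, it uses linear interpolation by default. This sketch uses a small fixed series so the result is easy to check by hand:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

q = ser.quantile([0, 0.25, 0.5, 0.75, 1])       # Series indexed by the requested quantiles
p = np.percentile(ser, q=[0, 25, 50, 75, 100])  # plain ndarray

print(q.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
```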
# 9. How to get frequency counts of unique items of a series?
# Calculate the frequency counts of each unique value of ser.
ser = pd.Series(np.take(list('abcdefgh'), np.random.randint(8, size=30)))
ser.value_counts()
c 6
h 6
b 4
f 4
g 4
a 3
e 2
d 1
dtype: int64
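`value_counts` can also report relative frequencies with `normalize=True`. Here is a sketch on a fixed series so the proportions are predictable:

```python
import pandas as pd

ser = pd.Series(list('aabbbc'))

counts = ser.value_counts()               # absolute counts, most frequent first
props = ser.value_counts(normalize=True)  # fractions that sum to 1

print(counts['b'], props['b'])  # 3 0.5
```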
# 10. How to keep only the top 2 most frequent values as they are and replace everything else with ‘Other’?
# From ser, keep the top 2 most frequent items as they are and replace everything else with ‘Other’.
np.random.seed(100)  # seed the global RNG; np.random.RandomState(100) alone builds a generator and discards it
ser = pd.Series(np.random.randint(1, 5, [12]))
print("Top 2 Freq:", ser.value_counts().head(2))
ser[~ser.isin(ser.value_counts().index[:2])] = 'Other'
ser
Top 2 Freq: 3 5
2 3
dtype: int64
0 3
1 2
2 Other
3 2
4 Other
5 3
6 2
7 Other
8 Other
9 3
10 3
11 3
dtype: object
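Assigning the string 'Other' into an integer series forces an in-place dtype change, and pandas 2.1+ emits a FutureWarning for that pattern. A sketch of a non-mutating alternative uses `Series.where`, with the twelve values from the output above as fixed input:

```python
import pandas as pd

ser = pd.Series([3, 2, 4, 2, 1, 3, 2, 4, 1, 3, 3, 3])

top2 = ser.value_counts().index[:2]          # the two most frequent values
result = ser.where(ser.isin(top2), 'Other')  # keep them, replace the rest (object dtype)

print(result.tolist())
```

`ser` itself is left untouched, so the original counts stay available for later checks.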
# 11. How to bin a numeric series to 10 groups of equal size?
# Bin the series ser into 10 equal deciles and replace the values with the bin name.
# Input
ser = pd.Series(np.random.random(20))
print(ser.head())
# Solution
pd.qcut(ser, q=[0, .10, .20, .3, .4, .5, .6, .7, .8, .9, 1],
labels=['1st', '2nd', '3rd', '4th', '5th', '6th', '7th', '8th', '9th', '10th']).head()
0 0.733123
1 0.512086
2 0.325354
3 0.634904
4 0.802665
dtype: float64
0 8th
1 5th
2 3rd
3 7th
4 9th
dtype: category
Categories (10, object): [1st < 2nd < 3rd < 4th ... 7th < 8th < 9th < 10th]
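For equal-frequency deciles, `q` can simply be the integer 10. `labels=False` returns the bin number (0 to 9) instead of a label. This sketch uses evenly spaced data so each decile visibly holds the same count:

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(20, dtype=float))

# q=10 is shorthand for the edges [0, .1, .2, ..., 1]; labels=False yields integer bin codes.
codes = pd.qcut(ser, q=10, labels=False)
print(codes.value_counts().sort_index().tolist())  # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
```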
# 12. How to convert a numpy array to a dataframe of a given shape?
# Reshape the series ser into a dataframe with 7 rows and 5 columns
ser = pd.Series(np.random.randint(1, 10, 35))
pd.DataFrame(ser.values.reshape(7, 5))
   0  1  2  3  4
0  8  7  9  5  5
1  2  4  1  5  9
2  5  1  7  6  3
3  6  2  7  3  5
4  2  6  1  9  5
5  7  8  1  4  5
6  6  2  2  3  2
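Passing `-1` for one dimension lets NumPy infer it from the total size. A size mismatch raises a `ValueError` rather than silently truncating. This sketch assumes pandas 0.24 or later for `Series.to_numpy()`:

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.arange(35))

# 35 values into 7 rows: NumPy infers the column count (5) from -1.
df = pd.DataFrame(ser.to_numpy().reshape(7, -1))
print(df.shape)  # (7, 5)

try:
    ser.to_numpy().reshape(6, -1)  # 35 is not divisible by 6
except ValueError as exc:
    print("cannot reshape:", exc)
```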
# 13. How to find the positions of numbers that are multiples of 3 from a series?
# Find the positions of numbers that are multiples of 3 from ser.
ser = pd.Series(np.random.randint(1, 10, 7))
print(ser)
np.argwhere(ser % 3 == 0)
0 8
1 9
2 5
3 8
4 6
5 7
6 7
dtype: int32
array([[1],
[4]], dtype=int64)
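`np.argwhere` returns a 2-D array with one row per match, hence the nested output above. For a flat array of positions, `np.flatnonzero` is a closer fit. When the series has a non-default index, selecting with the mask and reading `.index` gives labels instead. Here is a sketch using the seven values printed above:

```python
import numpy as np
import pandas as pd

ser = pd.Series([8, 9, 5, 8, 6, 7, 7])
mask = ser % 3 == 0

positions = np.flatnonzero(mask.to_numpy())  # 1-D array of integer positions
labels = ser[mask].index                     # index labels (identical here: default RangeIndex)

print(positions.tolist())  # [1, 4]
```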
# 14. How to extract items at given positions from a series?
# From ser, extract the items at positions in list pos.
ser = pd.Series(list('abcdefghijklmnopqrstuvwxyz'))
pos = [0, 4, 8, 14, 20]
# ser[pos]
ser.take(pos)
0 a
4 e
8 i
14 o
20 u
dtype: object
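With the default integer index, `ser[pos]`, `ser.take(pos)` and `ser.iloc[pos]` happen to agree. They diverge once the index is not 0..n-1. `take` and `iloc` are always positional, while `ser[pos]` on an integer index looks up labels. This sketch uses a shifted index; the range 100 to 125 is an arbitrary choice:

```python
import pandas as pd

ser = pd.Series(list('abcdefghijklmnopqrstuvwxyz'), index=range(100, 126))
pos = [0, 4, 8, 14, 20]

by_take = ser.take(pos)  # positional
by_iloc = ser.iloc[pos]  # positional, the more common spelling

print(by_iloc.tolist())        # ['a', 'e', 'i', 'o', 'u']
print(by_iloc.index.tolist())  # [100, 104, 108, 114, 120]
```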
# 15. How to stack two series vertically and horizontally?
# Stack ser1 and ser2 vertically and horizontally (to form a dataframe).
ser1 = pd.Series(range(5))
ser2 = pd.Series(list('abcde'))
# Vertical
pd.concat([ser1, ser2])  # Series.append was deprecated in pandas 1.4 and removed in 2.0
# Horizontal
df = pd.concat([ser1, ser2], axis=1)
print(df)
0 1
0 0 a
1 1 b
2 2 c
3 3 d
4 4 e
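When stacked vertically, the two series keep their own labels, so the index repeats 0 to 4. `ignore_index=True` renumbers the result. Mixing ints and strings produces an object-dtype series. A sketch:

```python
import pandas as pd

ser1 = pd.Series(range(5))
ser2 = pd.Series(list('abcde'))

vertical = pd.concat([ser1, ser2], ignore_index=True)  # index 0..9 instead of 0..4 twice
print(vertical.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(vertical.dtype)           # object
```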
# 16. How to get the positions of items of series A in another series B?
# Get the positions of items of ser2 in ser1 as a list.
ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])
# Solution 1
[np.where(i == ser1)[0].tolist()[0] for i in ser2]
# Solution 2
[pd.Index(ser1).get_loc(i) for i in ser2]
[5, 4, 0, 8]
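Both solutions loop in Python. `Index.get_indexer` does the lookup in one vectorised call and returns -1 for absent values instead of raising. It requires the index values to be unique, which holds for `ser1` here. A sketch:

```python
import pandas as pd

ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])

idx = pd.Index(ser1)
positions = idx.get_indexer(ser2)      # vectorised lookup; -1 marks a missing value
print(positions.tolist())              # [5, 4, 0, 8]
print(idx.get_indexer([99]).tolist())  # [-1]
```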
# 17. How to compute the mean squared error on a truth and predicted series?
# Compute the mean squared error of truth and pred series.
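A minimal sketch, assuming the standard definition of mean squared error: the average of the squared differences between truth and prediction. The `truth` and `pred` values are hypothetical.

```python
import numpy as np
import pandas as pd

truth = pd.Series([3.0, -0.5, 2.0, 7.0])  # hypothetical ground truth
pred = pd.Series([2.5, 0.0, 2.0, 8.0])    # hypothetical predictions

mse = np.mean((truth - pred) ** 2)  # squared errors: 0.25, 0.25, 0.0, 1.0
print(mse)  # 0.375
```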
Original article: https://www.cnblogs.com/ohou/p/11933928.html