项目介绍

一直想写一份适合经济学等社科背景、学术科研向的 Python 教程。因为学经济学的多少会对 Stata 有所了解，有一些写~代码~命令的经历，这份教程应该：

简洁好理解，花最少的时间了解 Python 的核心用法；
实用易操作，最好是能够看完上手即用。

在构思了一段时间之后，偶然发现 Ties de Kok 的 Get started with Python for research tutorial项目已经搭建出了我想要的框架。于是打算在这个项目的基础上进行完善，首先将其主要内容“汉化”成中文，之后对用法进行扩充、加入典型用法和案例。

原作者简介：Ties de Kok (Personal Website)为华盛顿大学福斯特商学院的助理教授，他专注于将计算机科学与实证会计研究相结合，研究兴趣是财务会计、资本市场、计算机科学、自然语言处理和经验管理会计。

变量

Python 中基础的数据类型有int，float和str。

a = 5
b = 3.5
c = 'A string'

type(a),type(b),type(c)

(int, float, str)

转换类型

int(3.6),str(5)

(3, '5')

查看类型

type(a),type(b),type(c)

(int, float, str)

打印输出

print("Hello")

Hello

print("Hello"+"World")

HelloWorld

apples = 'apples'
print("I have", 2, apples)

I have 2 apples

`format` 格式化输出

"{} {}".format("Hello", "World")

'Hello World'

print("{:.2f}".format(3.1415926))

3.14

数字	格式	输出	描述
3.1415926	{:.2f}	3.14	保留小数点后两位
3.1415926	{:+.2f}	+3.14	带符号保留小数点后两位
-1	{:+.2f}	-1.00	带符号保留小数点后两位
2.71828	{:.0f}	3	不带小数
5	{:0>2d}	05	数字补零 (填充左边, 宽度为 2)
5	{:x<4d}	5xxx	数字补 x (填充右边, 宽度为 4)
10	{:x<4d}	10xx	数字补 x (填充右边, 宽度为 4)
1000000	{:,}	1,000,000	以逗号分隔的数字格式
0.25	{:.2%}	25.00%	百分比格式
1000000000	{:.2e}	1.00e+09	指数记法
13	{:>10d}	13	右对齐 (默认, 宽度为 10)
13	{:<10d}	13	左对齐 (宽度为 10)
13	{:^10d}	13	中间对齐 (宽度为 10)

f-strings 格式化输出(Python 3.6)

Python 3.6 新增了 f-strings，这个特性叫做字面量格式化字符串，F 字符串是开头有一个 f 的字符串文字，Python 会计算其中的用大括号包起来的表达式，并将计算后的值替换进去。

year, p_version = 2018, '3.6'
f'The year {year} is pretty awesome with F-strings from Python {p_version}'

'The year 2018 is pretty awesome with F-strings from Python 3.6'

算术运算符

以下假设变量：a=10, b=20：

运算符	描述	实例
+	加 - 两个对象相加	a + b 输出结果 30
-	减 - 得到负数或是一个数减去另一个数	a - b 输出结果 -10
*	乘 - 两个数相乘或是返回一个被重复若干次的字符串	a * b 输出结果 200
/	除 - x 除以 y	b / a 输出结果 2
%	取模 - 返回除法的余数	b % a 输出结果 0
**	幂 - 返回 x 的 y 次幂	a**b 为 10 的 20 次方，输出结果 100000000000000000000

字符运算符

用单引号，双引号或三引号定义字符串（用于多行）

hello = 'world'
saying = "hello world"
paragraph = """ This is
a paragraph
"""

数据结构

Python 有四种基本的数据结构：列表（list）、元组（tuple）、字典（dict）和集合（set）。

列表

列表放在中括号（[]）中

pets = ['dogs', 'cat', 'bird']
pets.append('lizard')
pets

['dogs', 'cat', 'bird', 'lizard']

元组

元组括在括号（()）中，元组不支持任意添加或删除元素，但是它们更快并且消耗更少的内存。

pets = ('dogs', 'cat', 'bird')
pets

('dogs', 'cat', 'bird')

字典

字典使用大括号（{}）构建,字典是无序的，但具有键（Key），值（value）对。

person = {'name': 'fred', 'age': 29}
print(person['name'], person['age'])

fred 29

person['money'] = 50
del person['age']
person

{'name': 'fred', 'money': 50}

集合

集就像一个列表，但只能包含唯一值。

pets_1 = set(['dogs', 'cat', 'bird'])
pets_2 = set(['dogs', 'horse', 'zebra', 'zebra'])
pets_2

{'dogs', 'horse', 'zebra'}

集合可以用于完成许多有用的操作：

pets_1.union(pets_2) # 并集

{'bird', 'cat', 'dogs', 'horse', 'zebra'}

pets_1.intersection(pets_2) # 交集

{'dogs'}

pets_1.difference(pets_2) # 补集

{'bird', 'cat'}

组合

基本的数据结构几乎可以处理任何 Python 项目。

combo = ('apple', 'orange')
mix = {'fruit' : [combo, ('banana', 'pear')]}
mix['fruit'][0]

('apple', 'orange')

切片

如果对象是有序的（例如列表或元组），则可以选择索引：

pets = ['dogs', 'cat', 'bird', 'lizzard']

favorite_pet = pets[0]
favorite_pet

'dogs'

reptile = pets[-1]
reptile

'lizzard'

pets[1:3]

['cat', 'bird']

pets[:2]

['dogs', 'cat']

切片也适用于字符串：

fruit = 'banana'
fruit[:2]

'ba'

函数

Python 可以接收输入的参数，通过定义逻辑和操作来处理输入（并可能返回某些内容）。

def add_5(number):
    return number + 5

定义函数的操作不会执行代码，仅在调用函数后才会执行：

add_5(5)

也可以添加具有默认值的参数：

def add(number, add=5):
    return number + add

add(10, add=3)

lambda 函数

pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
pairs.sort(key=lambda pair: pair[1])
print(pairs)

[(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')]

空格（代码块）

Python 使用缩进来区分代码块。注意，子代码块有自己的作用域（local scope），注意下方示例的a：

def example():
    a = 'Layer 1'
    print(a)

    def layer_2():
        a = 'Layer 2'
        print(a)

    layer_2()

example()

Layer 1
Layer 2

条件

grade = 95
if grade == 90:
    print('A')
elif grade < 90:
    print('B')
elif grade >= 80:
    print('C')
else:
    print('D')

循环

for num in range(0, 6, 2):
    print(num)

0
2
4

list_fruit = ['Apple', 'Banana', 'Orange']
for fruit in list_fruit:
    print(fruit)

Apple
Banana
Orange

for num in range(100):
    print(num)
    if num == 2:
        break

0
1
2

也可以使用while循环：

count = 0
while count < 4:
    print(count)
    count += 1

遍历列表中的元组：

tuple_in_list = [(1, 2), (3, 4)]
for a, b in tuple_in_list:
    print(a + b)

3
7

在字典中循环：

dictionary = {'one' : 1, 'two' : 2, 'three' : 3}
for key, value in dictionary.items():
    print(key, value + 10)

one 11
two 12
three 13

推导式

推导式可以使用循环很方便的生成列表和字典。

列表推导式

new_list = [x + 5 for x in range(0,6)]
new_list

[5, 6, 7, 8, 9, 10]

传统方式的等效操作为：

new_list = []
for x in range(0,6):
    new_list.append(x + 5)
new_list

[5, 6, 7, 8, 9, 10]

字典推导式

new_dict = {'num_{}'.format(x) : x + 5 for x in range(0,6)}
new_dict

{'num_0': 5, 'num_1': 6, 'num_2': 7, 'num_3': 8, 'num_4': 9, 'num_5': 10}

传统方式的等效操作为：

new_dict = {}
for x in range(0,6):
    new_dict['num_{}'.format(x)] = x + 5
new_dict

{'num_0': 5, 'num_1': 6, 'num_2': 7, 'num_3': 8, 'num_4': 9, 'num_5': 10}

异常处理

Python 异常报错如下：

num_list = [1, 2, 3]
num_list.remove(4)

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-223-926b625f5509> in <module>
      1 num_list = [1, 2, 3]
----> 2 num_list.remove(4)


ValueError: list.remove(x): x not in list

可以使用try和except捕获异常：

try:
    num_list.remove(4)
except:
    print('ERROR!')

ERROR!

将错误类型指定为：

try:
    num_list.remove(4)
except ValueError as e:
    print('Error: ', e)
except Exception as e:
    print('Other error: ', e)
finally:
    print('Done')

Error:  list.remove(x): x not in list
Done

导入库

import math
math.sin(1)

0.8414709848078965

import math as math_lib
math_lib.sin(1)

0.8414709848078965

from math import sin
sin(1)

0.8414709848078965

系统操作

import os

获取当前路径

os.chdir(r'D:\PyStaData\Python\Python_for_Research\PythonforResearch')
os.getcwd()

'D:\PyStaData\Python\Python_for_Research\PythonforResearch'

输出当前路径文件（夹）

os.listdir()[:5]

['.ipynb_checkpoints', '0_语法基础.ipynb', 'data', 'images']

结合列表推导式进行文件类型过滤：

[file for file in os.listdir() if file[-5:] == 'ipynb'][:5]

['0_语法基础.ipynb']

更改路径

os.chdir(r'D:\PyStaData\Python\Python_for_Research\PythonforResearch\data')
os.getcwd()

'D:\PyStaData\Python\Python_for_Research\PythonforResearch\data'

r'path'表示原始字符串，原始字符串不将视为特殊字符。

文件输入与输出

打开文件的四种模式：

w -> write only r -> read only w+ -> read and write + completely overwrite file a+ -> read and write + append at the bottom

with open('new_file.txt', 'w') as file:
    file.write('Content of new file. nHi there!')

with open('new_file.txt', 'r') as file:
    file_content = file.read()

file_content

'Content of new file. nHi there!'

print(file_content)

Content of new file.
Hi there!

with open('new_file.txt', 'a+') as file:
    file.write('n' + 'New line')

with open('new_file.txt', 'r') as file:
    print(file.read())

Content of new file.
Hi there!
New line

最好使用with，因为它会自动关闭文件。

PythonforResearch | 0_语法基础