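0. Background

IO (Input/Output): wherever data is exchanged, for example with a disk or over a network, an IO interface is needed.
Basic concepts: input, output, stream.
Problem: the speeds of input and output do not match.
Solutions: synchronous IO, or asynchronous IO (callback mode: "call me when it's done"; polling mode: "done yet? … done yet?").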
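To make the three styles concrete, here is a minimal sketch that uses a `concurrent.futures` thread pool as a stand-in for the operating system's IO machinery. This is an assumption for illustration only; real asynchronous IO is usually driven by an event loop such as `asyncio`. `slow_read()` is a hypothetical placeholder for a slow disk or network read.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_read():
    """Hypothetical stand-in for a slow disk or network read."""
    time.sleep(0.1)
    return "data"


with ThreadPoolExecutor(max_workers=1) as pool:
    # Synchronous: submit the work and block until the result is ready
    sync_result = pool.submit(slow_read).result()

    # Asynchronous, callback mode ("call me when it's done"):
    # the callback runs once the future completes
    callback_results = []
    fut = pool.submit(slow_read)
    fut.add_done_callback(lambda f: callback_results.append(f.result()))

    # Asynchronous, polling mode ("done yet? ... done yet?"):
    # keep asking until the future reports it has finished
    fut2 = pool.submit(slow_read)
    polls = 0
    while not fut2.done():
        polls += 1          # free to do other work between checks
        time.sleep(0.01)
    polled_result = fut2.result()

# Leaving the with block waits for all submitted work to finish
print(sync_result, callback_results, polled_result, polls)
```

The synchronous call waits idle; the callback version is notified; the polling version spends its own effort asking.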
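1. Reading and writing files

Reading and writing files on disk is a service provided by the operating system, and the OS does not allow programs to operate on the disk directly. So reading or writing a file really means the program asks the OS to open a file object (a file descriptor) and then reads or writes data through it.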
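A small sketch of the file-descriptor idea, assuming a scratch file in a temporary directory (the path and contents are made up for the demo): `os.open()` hands back the raw descriptor, an int, while the built-in `open()` wraps a descriptor in a Python file object whose `fileno()` exposes it.

```python
import os
import tempfile

# Scratch file in a fresh temporary directory (path made up for the demo)
path = os.path.join(tempfile.mkdtemp(), "fd_demo.txt")

# Low level: ask the OS for a file descriptor (a small int) and write bytes to it
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
os.write(fd, b"hello fd\n")
os.close(fd)                     # hand the descriptor back to the OS

# High level: open() wraps a descriptor in a Python file object
with open(path, "r") as f:
    fd_number = f.fileno()       # the underlying descriptor, also an int
    content = f.read()

print(fd_number, repr(content))
```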
1.1. Reading files

Contents of the sample file rwtest:

```
# This is for read and write test
This is 1st line
This is 2nd line
This is 3rd line
```

Opening a file that does not exist raises FileNotFoundError. Opening an existing file in 'r' mode and calling read() returns the whole file as one str; call close() when you are done:

```python
>>> f = open('test', 'r')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'test'
>>> f = open('rwtest', 'r')
>>> f.read()
'# This is for read and write test\nThis is 1st line\nThis is 2nd line\nThis is 3rd line'
>>> f.close()
```

(1). read() - read a small file in one go. The with statement closes the file automatically, even if an error occurs:

```python
>>> with open('rwtest', 'r') as f:
...     content = f.read()
...     print(type(content))
...     print(content)
...
<class 'str'>
# This is for read and write test
This is 1st line
This is 2nd line
This is 3rd line
```

(2). read(size) - to limit memory use, read repeatedly, at most size units at a time (characters in text mode, bytes in binary mode):

```python
>>> with open('rwtest', 'r') as f:
...     content = f.read(8)
...     print(type(content))
...     print(content)
...
<class 'str'>
# This i
```

(3). readline() - read the file one line at a time. The trailing '\n' is read too, which is why every line printed below is followed by a blank line; strip it (for example with line.rstrip('\n')) when you don't want it:

```python
>>> with open('rwtest', 'r') as f:
...     line = f.readline()
...     print(type(line))
...     print(line)
...     while line:
...         print(line)
...         line = f.readline()
...
<class 'str'>
# This is for read and write test

# This is for read and write test

This is 1st line

This is 2nd line

This is 3rd line
```

(4). readlines() - read all lines of the file into a list, one line per element:

```python
>>> with open('rwtest', 'r') as f:
...     lines = f.readlines()
...     print(type(lines))
...     print(lines)
...
<class 'list'>
['# This is for read and write test\n', 'This is 1st line\n', 'This is 2nd line\n', 'This is 3rd line']
```
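read(size) is most useful in a loop. The sketch below recreates the rwtest file from the sessions above in a temporary directory (an assumption, so it runs anywhere), reads it in pieces of at most 8 characters with a hypothetical read_in_chunks() helper, and then shows a common way to read line by line without the trailing '\n': iterate over the file object and strip each line.

```python
import os
import tempfile

# Recreate the sample file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "rwtest")
with open(path, "w") as f:
    f.write("# This is for read and write test\n"
            "This is 1st line\nThis is 2nd line\nThis is 3rd line")


def read_in_chunks(path, size=8):
    """Yield the file's text in pieces of at most `size` characters."""
    with open(path, "r") as f:
        while True:
            chunk = f.read(size)   # text mode: size counts characters
            if not chunk:          # read() returns '' at end of file
                break
            yield chunk


chunks = list(read_in_chunks(path))

# Line by line without the trailing '\n': iterate the file object directly
with open(path, "r") as f:
    stripped = [line.rstrip("\n") for line in f]

print(chunks[0], stripped)
```

Only one chunk is held in memory at a time inside the generator, which is the point of read(size) for large files.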
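1.2. Writing files

Mode 'w' creates the file, or overwrites it if it already exists; mode 'a' appends to the end of an existing file. write() does not add a newline for you, which is why Hello ends up on the same line as the last line of rwtest:

```python
>>> with open('test', 'w') as f:
...     f.write('Hello')
...
>>> with open('test', 'r') as f:
...     print(f.read())
...
Hello
>>> with open('rwtest', 'a') as f:
...     f.write('Hello')
...
>>> with open('rwtest', 'r') as f:
...     print(f.read())
...
# This is for read and write test
This is 1st line
This is 2nd line
This is 3rd lineHello
```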
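A short sketch contrasting 'w' and 'a', using a scratch file in a temporary directory (an assumption for the demo). It also shows that neither write() nor writelines() inserts newlines, so you add '\n' yourself.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test")  # scratch file for the demo

# 'w' creates the file, or truncates it if it already exists
with open(path, "w") as f:
    f.write("Hello")
with open(path, "w") as f:
    f.write("Hi")            # replaces "Hello" entirely

# 'a' appends at the end; write() adds no newline, so add it yourself
with open(path, "a") as f:
    f.write("\nsecond line")
    f.writelines(["\nthird", "\nfourth"])  # writelines() adds no newlines either

with open(path, "r") as f:
    result = f.read()

print(repr(result))
```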
1.3. Binary files

Images, videos and similar files are binary files. Open them in 'rb' mode to read the raw bytes:
```python
>>> with open('WechatIMG191.png', 'rb') as f:
...     print(f.read())
...
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01,......'
```
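Because 'rb' returns raw bytes, you can inspect a file's header directly. Every PNG file begins with the same 8-byte signature, the `b'\x89PNG\r\n\x1a\n'` visible at the start of the output above. The sketch uses a hypothetical is_png() helper and writes two small sample files in a temporary directory instead of relying on a real image.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte header of every PNG file


def is_png(path):
    """Return True if the file starts with the PNG signature."""
    with open(path, "rb") as f:       # 'rb': raw bytes, no decoding
        return f.read(8) == PNG_SIGNATURE


# Hypothetical sample files written just for the demo
tmp = tempfile.mkdtemp()
fake_png = os.path.join(tmp, "fake.png")
with open(fake_png, "wb") as f:       # 'wb': write bytes
    f.write(PNG_SIGNATURE + b"\x00\x00\x00\rIHDR")
text_file = os.path.join(tmp, "note.txt")
with open(text_file, "w") as f:
    f.write("not an image")

print(is_png(fake_png), is_png(text_file))
```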
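1.4. Character encoding

To read a text file that is not UTF-8 encoded (a GBK file, for example), pass the encoding= parameter to open(), e.g. encoding='gbk'.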
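A minimal sketch, assuming a GBK-encoded sample file created in a temporary directory: the text reads back correctly with encoding='gbk', while decoding the same bytes as UTF-8 raises UnicodeDecodeError.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "gbk.txt")  # sample file for the demo

# Write some Chinese text using the GBK encoding
with open(path, "w", encoding="gbk") as f:
    f.write("中国")

# Reading it back requires the matching encoding
with open(path, "r", encoding="gbk") as f:
    text = f.read()

# The GBK bytes are not valid UTF-8, so decoding them as UTF-8 fails
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print(text, utf8_failed)
```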
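2. StringIO and BytesIO

A Python feature: file-like objects. If data only needs to be reused for a short time, does not need to be persisted, and speed matters, you can do the IO in memory instead of on disk.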
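What "file-like" buys you: code written against the file interface does not care whether the data lives on disk or in memory. The count_lines() helper below is hypothetical; it only iterates over whatever it is given, so it works the same on a StringIO and on a real file (created in a temporary directory for the demo).

```python
import io
import os
import tempfile


def count_lines(fileobj):
    """Count lines in anything that can be iterated like a text file."""
    return sum(1 for _ in fileobj)


# Works on an in-memory file-like object...
mem = io.StringIO("a\nb\nc\n")
n_mem = count_lines(mem)

# ...and on a real file, with no change to count_lines()
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w") as f:
    f.write("a\nb\nc\n")
with open(path) as f:
    n_disk = count_lines(f)

print(n_mem, n_disk)
```

This is also why StringIO is handy in tests: a function that expects a file can be fed an in-memory one.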
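2.1. StringIO

A StringIO object keeps a position pointer. Writing moves the pointer to just past the data written, and reading also starts at the pointer, so after writing you have to move the pointer back yourself (with seek()) before you can read the data.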
```python
>>> from io import StringIO
>>> s = StringIO()
>>> type(s)
<class '_io.StringIO'>
>>> s.write('Hello\nWorld\n!')
13
>>> s.seek(0)      # move the position pointer back to the start
0
>>> s.read()
'Hello\nWorld\n!'
>>> s.getvalue()   # get the entire contents, regardless of the pointer
'Hello\nWorld\n!'
>>> f = StringIO('Hello\nWorld\n!')
>>> while True:
...     line = f.readline()
...     if line == '':
...         break
...     print(line.strip())
...
Hello
World
!
```
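The position pointer explains the most common StringIO surprise: reading right after writing returns an empty string. This sketch makes the pointer visible with tell(), and also shows that writing after seek(0) overwrites existing text rather than inserting it.

```python
from io import StringIO

buf = StringIO()
buf.write("Hello")
pos_after_write = buf.tell()    # 5: the pointer sits just past the written text
read_before_seek = buf.read()   # '': nothing lies after the pointer

buf.seek(0)                     # rewind to the start
read_after_seek = buf.read()    # 'Hello'

# Writing also happens at the pointer, so writing after seek(0) overwrites
buf.seek(0)
buf.write("J")
overwritten = buf.getvalue()    # 'Jello'

print(pos_after_write, repr(read_before_seek), read_after_seek, overwritten)
```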
2.2. BytesIO

StringIO stores str data; for bytes content such as images and video you need a BytesIO object.
```python
>>> from io import BytesIO
>>> f = BytesIO()
>>> f.write('中国'.encode('utf-8'))
6
>>> f.getvalue()
b'\xe4\xb8\xad\xe5\x9b\xbd'
>>> f.close()
```
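A BytesIO can also be created from existing bytes and read back like a binary file. In this sketch the 6 UTF-8 bytes of '中国' are read 3 at a time (each of these two characters takes 3 bytes in UTF-8), and writing a str into the BytesIO fails with TypeError because it only accepts bytes-like data.

```python
from io import BytesIO

data = "中国".encode("utf-8")      # 6 bytes
f = BytesIO(data)                 # initialise with existing bytes; pointer at 0
first_char_bytes = f.read(3)      # b'\xe4\xb8\xad', the UTF-8 bytes of '中'
rest = f.read()                   # the remaining 3 bytes

# BytesIO only accepts bytes-like objects; writing a str raises TypeError
try:
    f.write("text")
    wrote_str = True
except TypeError:
    wrote_str = False

print(first_char_bytes.decode("utf-8"), rest.decode("utf-8"), wrote_str)
```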
3. Working with files and directories

Python's os module calls the interface functions provided by the operating system directly.
```python
>>> import os
>>> os.name
'posix'
>>> os.uname()
posix.uname_result(sysname='Darwin', nodename='AarondeMacBookPro.local', ...)    (remaining fields omitted, too long)
>>> os.environ
environ({'PATH': '/Users/...', ..., 'HOME': '...'})
>>> os.environ.get('PATH')
'/Users/...'
>>> os.path.abspath('.')
'/Users/...'
>>> os.path.join('/user/user1', 'testdir')
'/user/user1/testdir'
>>> os.path.split('/user/user1/testdir')
('/user/user1', 'testdir')
>>> os.path.splitext('/user/user1/test.txt')
('/user/user1/test', '.txt')
>>> os.listdir()
['rwtest', 'WechatIMG191.png', '.DS_Store', 'test', 'test4.py', 'Bitcoin.py', 'test1.py', 'testpackage', 'test2.py', 'test3.py', '.idea']
>>> [x for x in os.listdir('.') if os.path.isfile(x) and os.path.splitext(x)[1] == '.py']
['test4.py', 'Bitcoin.py', 'test1.py', 'test2.py', 'test3.py']
```
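The calls above only query the file system. A minimal sketch of changing it, run inside a temporary directory so it touches nothing else (the testdir and a.txt names are just for the demo): create a directory, create and rename a file, filter by extension, then clean up. Note the os.path.join() inside the filter: os.path.isfile(x) in the session above works only because it lists the current directory '.'.

```python
import os
import tempfile

base = tempfile.mkdtemp()                    # scratch directory for the demo

# Create a directory, then a file inside it
sub = os.path.join(base, "testdir")          # build paths with os.path.join
os.mkdir(sub)
with open(os.path.join(sub, "a.txt"), "w") as f:
    f.write("x")

# Rename a.txt -> a.py, then list only the .py files in that directory
os.rename(os.path.join(sub, "a.txt"), os.path.join(sub, "a.py"))
py_files = [x for x in os.listdir(sub)
            if os.path.isfile(os.path.join(sub, x))
            and os.path.splitext(x)[1] == ".py"]

# Clean up: remove() deletes a file, rmdir() deletes an empty directory
os.remove(os.path.join(sub, "a.py"))
os.rmdir(sub)
still_exists = os.path.exists(sub)

print(py_files, still_exists)
```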
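4. Serialization

Serialization (pickling): turning a variable in memory into a form that can be stored or transmitted.
Deserialization (unpickling): reading the serialized data back into a variable in memory.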
4.1. pickle - Python only

```python
>>> import pickle
>>> d = dict(name='Tom', age=18)
>>> pickle.dumps(d)
b'\x80\x03}q\x00(X\x04\x00\x00\x00nameq\x01X\x03\x00\x00\x00Tomq\x02X\x03\x00\x00\x00ageq\x03K\x12u.'
>>> f = open('dump.txt', 'wb')
>>> pickle.dump(d, f)
>>> f.close()
>>> f = open('dump.txt', 'rb')
>>> d = pickle.load(f)
>>> f.close()
>>> d
{'name': 'Tom', 'age': 18}
```
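Unlike JSON, pickle can round-trip Python-specific types such as sets, tuples and datetime.date objects. A sketch using with blocks so the files are always closed; the sample data and the temporary path are made up. Only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.

```python
import datetime
import os
import pickle
import tempfile

# pickle round-trips Python types that JSON has no notion of
data = {"name": "Tom", "tags": {"a", "b"}, "pos": (1, 2),
        "born": datetime.date(2000, 1, 1)}

path = os.path.join(tempfile.mkdtemp(), "dump.pkl")
with open(path, "wb") as f:      # pickle output is bytes: open in 'wb'
    pickle.dump(data, f)
with open(path, "rb") as f:      # and read it back with 'rb'
    loaded = pickle.load(f)

print(loaded)
```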
4.2. JSON - a general-purpose, standard serialization format

```python
>>> import json
>>> d = dict(name='Tom', age=18)
>>> json.dumps(d)
'{"name": "Tom", "age": 18}'
>>> json_str = json.dumps(d)
>>> json.loads(json_str)
{'name': 'Tom', 'age': 18}
```
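json also has dump()/load() for file objects, mirroring pickle. One detail that matters for Chinese text: by default json.dumps() escapes non-ASCII characters as \uXXXX sequences, and ensure_ascii=False keeps them readable. A sketch with made-up sample data and a temporary file:

```python
import json
import os
import tempfile

d = {"name": "Tom", "age": 18, "city": "北京"}

# By default non-ASCII characters are escaped as \uXXXX
escaped = json.dumps(d)
readable = json.dumps(d, ensure_ascii=False)   # keep the characters as-is

# dump()/load() work with file objects, like pickle.dump()/pickle.load()
path = os.path.join(tempfile.mkdtemp(), "d.json")
with open(path, "w", encoding="utf-8") as f:   # JSON is text: 'w', not 'wb'
    json.dump(d, f, ensure_ascii=False)
with open(path, "r", encoding="utf-8") as f:
    loaded = json.load(f)

print(escaped)
print(readable)
```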
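In practice we usually define objects with classes. json.dumps() cannot serialize a class instance directly (it raises TypeError), so you pass a default= function that converts the object into a dict, and give json.loads() an object_hook= function that turns the dict back into an object: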
```python
import json


class People(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age


def people2dict(people):
    # Convert a People instance into a JSON-serializable dict
    return {
        'name': people.name,
        'age': people.age
    }


def dict2people(p):
    # Rebuild a People instance from a decoded dict
    return People(p['name'], p['age'])


p = People('Tom', 18)
print(p)
print(json.dumps(p, default=people2dict))
print(p.__dict__)
# Shortcut: ordinary instances already keep their attributes in __dict__
print(json.dumps(p, default=lambda obj: obj.__dict__))

json_str = json.dumps(p, default=people2dict)
print(json.loads(json_str, object_hook=dict2people))
```

Output:

```
<__main__.People object at 0x1083f4550>
{"name": "Tom", "age": 18}
{'name': 'Tom', 'age': 18}
{"name": "Tom", "age": 18}
<__main__.People object at 0x10840c3d0>
```
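An alternative sketch of the same idea: subclass json.JSONEncoder, override its default() method, and pass the class with cls=. Because object_hook is called for every decoded dict, including nested and unrelated ones, the hypothetical people_hook() below only converts dicts whose keys are exactly name and age. PeopleEncoder, people_hook and the team data are illustrative names, not from the original; a __repr__ is added so the restored object prints readably.

```python
import json


class People(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __repr__(self):
        # Readable repr instead of <__main__.People object at 0x...>
        return "People(name=%r, age=%r)" % (self.name, self.age)


class PeopleEncoder(json.JSONEncoder):
    """Encoder that knows how to turn People into a dict."""
    def default(self, obj):
        if isinstance(obj, People):
            return {"name": obj.name, "age": obj.age}
        return super().default(obj)   # base class raises TypeError


def people_hook(d):
    # Called for EVERY decoded dict, so only convert dicts that have
    # exactly the People fields (a hypothetical matching rule)
    if set(d) == {"name", "age"}:
        return People(d["name"], d["age"])
    return d


team = {"lead": People("Tom", 18), "size": 1}
json_str = json.dumps(team, cls=PeopleEncoder)
restored = json.loads(json_str, object_hook=people_hook)
print(restored)   # {'lead': People(name='Tom', age=18), 'size': 1}
```

The encoder class keeps the conversion logic in one place, which helps once several custom types need to be serialized.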