Python Generators and Iterators: An In-Depth Look
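
1. The Iterator Protocol

An iterator is an object that implements both the __iter__() and __next__() methods.

1.1 A Custom Iterator

    class Counter:
        def __init__(self, start, end):
            self.current = start
            self.end = end

        def __iter__(self):
            # An iterator returns itself from __iter__()
            return self

        def __next__(self):
            if self.current >= self.end:
                raise StopIteration
            self.current += 1
            return self.current - 1

    counter = Counter(0, 5)
    for num in counter:
        print(num)  # Prints 0 1 2 3 4, one per line

2. Generator Basics

A generator is a special kind of iterator, defined with the yield keyword.

2.1 Generator Functions

    def fibonacci(n):
        a, b = 0, 1
        count = 0
        while count < n:
            yield a
            a, b = b, a + b
            count += 1

    for num in fibonacci(10):
        print(num, end=" ")
    # Output: 0 1 1 2 3 5 8 13 21 34

2.2 Generator Expressions

    # List comprehension - builds the entire list immediately
    squares_list = [x**2 for x in range(1000000)]

    # Generator expression - evaluated lazily
    squares_gen = (x**2 for x in range(1000000))

3. Advantages of Generators

3.1 Memory Efficiency

    # Inefficient: loads every line into memory at once
    def read_large_file_bad(file_path):
        with open(file_path) as f:
            return f.readlines()

    # Efficient: yields one line at a time
    def read_large_file_good(file_path):
        with open(file_path) as f:
            for line in f:
                yield line.strip()

3.2 Lazy Evaluation

    def infinite_sequence():
        num = 0
        while True:
            yield num
            num += 1

    # Lazy evaluation makes infinite sequences usable
    gen = infinite_sequence()
    print(next(gen))  # 0
    print(next(gen))  # 1
    print(next(gen))  # 2

4. Advanced Generator Features

4.1 The send() Method

    def echo_generator():
        while True:
            received = yield
            print(f"Received: {received}")

    gen = echo_generator()
    next(gen)          # Prime the generator (advance to the first yield)
    gen.send("Hello")  # Output: Received: Hello
    gen.send("World")  # Output: Received: World

4.2 Two-Way Communication

    def running_average():
        total = 0
        count = 0
        average = None
        while True:
            value = yield average
            total += value
            count += 1
            average = total / count

    avg = running_average()
    next(avg)            # Prime the generator
    print(avg.send(10))  # 10.0
    print(avg.send(20))  # 15.0
    print(avg.send(30))  # 20.0

4.3 The throw() Method

    def generator_with_exception():
        try:
            while True:
                value = yield
                print(f"Processing: {value}")
        except ValueError as e:
            print(f"Caught exception: {e}")

    gen = generator_with_exception()
    next(gen)     # Prime the generator
    gen.send(10)  # Output: Processing: 10
    try:
        # Pass an exception instance; the (type, value) form is deprecated since Python 3.12
        gen.throw(ValueError("This is an error"))  # Output: Caught exception: This is an error
    except StopIteration:
        # The generator returns after handling the error, so throw() raises StopIteration
        pass

4.4 The close() Method

    def closable_generator():
        try:
            while True:
                value = yield
                print(f"Processing: {value}")
        finally:
            print("Generator closed, running cleanup")

    gen = closable_generator()
    next(gen)    # Prime the generator
    gen.send(1)  # Output: Processing: 1
    gen.close()  # Output: Generator closed, running cleanup

5. The yield from Syntax

yield from lets a generator delegate to another generator.

    def generator1():
        yield 1
        yield 2

    def generator2():
        yield 3
        yield 4

    def combined():
        yield from generator1()
        yield from generator2()

    for num in combined():
        print(num)  # Prints 1 2 3 4, one per line

6. Case Study: Data Pipeline

    def read_csv(file_path):
        """Read a CSV file and yield each row as a list of fields."""
        with open(file_path) as f:
            next(f)  # Skip the header row
            for line in f:
                yield line.strip().split(",")

    def filter_by_age(rows, min_age):
        """Keep rows whose age (second column) is at least min_age."""
        for row in rows:
            if int(row[1]) >= min_age:
                yield row

    def extract_names(rows):
        """Yield the name (first column) of each row."""
        for row in rows:
            yield row[0]

    # Build the data pipeline; nothing is read until iteration starts
    pipeline = extract_names(
        filter_by_age(
            read_csv("users.csv"),
            min_age=18
        )
    )

    for name in pipeline:
        print(name)

7. Case Study: Batch Processing

    def batch_processor(iterable, batch_size):
        """Split an iterable into batches of at most batch_size items."""
        batch = []
        for item in iterable:
            batch.append(item)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # Yield the remaining items
            yield batch

    data = range(25)
    for batch in batch_processor(data, batch_size=10):
        print(f"Processing batch: {batch}")

8. Case Study: Tree Traversal

    class TreeNode:
        def __init__(self, value, left=None, right=None):
            self.value = value
            self.left = left
            self.right = right

    def inorder_traversal(node):
        """In-order traversal: left subtree, node, right subtree."""
        if node:
            yield from inorder_traversal(node.left)
            yield node.value
            yield from inorder_traversal(node.right)

    # Build the tree
    root = TreeNode(1,
        TreeNode(2, TreeNode(4), TreeNode(5)),
        TreeNode(3)
    )

    for value in inorder_traversal(root):
        print(value, end=" ")  # Output: 4 2 5 1 3

9. Case Study: Sliding Window

    from collections import deque

    def sliding_window(iterable, window_size):
        """Yield consecutive windows of window_size items."""
        window = deque(maxlen=window_size)
        for item in iterable:
            window.append(item)
            if len(window) == window_size:
                yield list(window)

    data = [1, 2, 3, 4, 5, 6, 7, 8]
    for window in sliding_window(data, 3):
        print(window)
    # Output:
    # [1, 2, 3]
    # [2, 3, 4]
    # [3, 4, 5]
    # ...

10. Iterator Tools: The itertools Module

10.1 Infinite Iterators

    from itertools import count, cycle, repeat

    # count: counts forever from a start value with a given step
    for i in count(10, 2):
        if i > 20:
            break
        print(i)  # 10, 12, 14, 16, 18, 20

    # cycle: repeats the items of an iterable forever
    counter = 0
    for item in cycle(["A", "B", "C"]):
        if counter >= 5:
            break
        print(item)
        counter += 1

    # repeat: repeats a single element (here, 3 times)
    for item in repeat("Hello", 3):
        print(item)

10.2 Combining Iterators

    from itertools import chain, zip_longest, product

    # chain: concatenates several iterables
    for item in chain([1, 2], [3, 4], [5, 6]):
        print(item)

    # zip_longest: zips until the longest iterable is exhausted
    for item in zip_longest([1, 2], ["a", "b", "c"], fillvalue=0):
        print(item)  # (1, 'a'), (2, 'b'), (0, 'c')

    # product: Cartesian product
    for item in product([1, 2], ["a", "b"]):
        print(item)  # (1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')

11. Performance Comparison

    import sys

    # sys.getsizeof() reports only the container object itself,
    # not the elements it references.

    # List comprehension
    list_comp = [x**2 for x in range(10000)]
    print(f"List size: {sys.getsizeof(list_comp)} bytes")

    # Generator expression
    gen_exp = (x**2 for x in range(10000))
    print(f"Generator size: {sys.getsizeof(gen_exp)} bytes")

12. Best Practices

1. Prefer generators when processing large datasets.
2. Use a generator expression in place of a simple list comprehension when the results are only iterated over.
3. Use yield from to simplify generator delegation.
4. Make good use of the tools provided by the itertools module.
5. Remember that a generator can be iterated only once.

13. Summary

Generators and iterators are powerful tools for working with sequence data in Python. Through lazy evaluation, generators offer a memory-efficient solution that is especially well suited to large datasets and infinite sequences. Mastering the advanced features, such as send(), throw(), and yield from, makes it possible to build more flexible and powerful data processing pipelines.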
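
As a small illustration of best practice 5 (a generator can be iterated only once), the following minimal sketch shows what happens when an exhausted generator is reused. The function name squares is a hypothetical helper introduced only for this example; it is not part of the earlier examples.

```python
def squares(limit):
    """Hypothetical helper: yield the squares of 0 .. limit - 1."""
    for x in range(limit):
        yield x * x

gen = squares(4)
first_pass = list(gen)   # Consumes every value the generator produces
second_pass = list(gen)  # The same generator object is now exhausted

# Calling the generator function again creates a fresh, independent generator
fresh_pass = list(squares(4))

print(first_pass)   # [0, 1, 4, 9]
print(second_pass)  # []
print(fresh_pass)   # [0, 1, 4, 9]
```

If the same data has to be traversed more than once, one option is to call the generator function again, as above; another is to materialize the values into a list, at the cost of the memory savings that motivated the generator in the first place.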