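Sometimes I ask myself how I could not have known a simpler way to do "this" in Python 3. As I searched for answers over time, I kept finding code that was more concise, more efficient, and less buggy. Overall (not just in this post), the number of "those" things turned out to be larger than I had imagined, but here is the first batch of non-obvious features that led me to more efficient, simpler, and more maintainable code.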
Dictionaries
keys() and items() in dictionaries
You can do many interesting things with a dictionary's keys and items, because they behave like sets:
```python
aa = {'mike': 'male', 'kathy': 'female', 'steve': 'male', 'hillary': 'female'}
bb = {'mike': 'male', 'ben': 'male', 'hillary': 'female'}

aa.keys() & bb.keys()  # {'mike', 'hillary'}  # these are set-like
aa.keys() - bb.keys()  # {'kathy', 'steve'}
# If you want to get the common key-value pairs in the two dictionaries
aa.items() & bb.items()  # {('mike', 'male'), ('hillary', 'female')}
```
Very concise!
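Here is a small runnable sketch of the other set operators on key views, reusing the aa and bb dictionaries from above. It also shows that a view is live, so it reflects changes made to the dictionary after the view was created. One caveat: items() only supports these operations when the values are hashable, as the strings here are.

```python
aa = {'mike': 'male', 'kathy': 'female', 'steve': 'male', 'hillary': 'female'}
bb = {'mike': 'male', 'ben': 'male', 'hillary': 'female'}

common = aa.keys() & bb.keys()          # keys present in both dicts
either = aa.keys() | bb.keys()          # keys present in at least one dict
only_one = aa.keys() ^ bb.keys()        # keys present in exactly one dict
with_set = aa.keys() & {'mike', 'ben'}  # views also combine with plain sets

# A view is live: it reflects keys added after it was created
keys_view = bb.keys()
bb['kathy'] = 'female'
```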
Checking whether a key exists in a dictionary
How many times have you written code like this?
```python
dictionary = {}
for k, v in ls:
    if k not in dictionary:
        dictionary[k] = []
    dictionary[k].append(v)
```
This code isn't really that bad, but why should you always need that if statement?
```python
from collections import defaultdict

dictionary = defaultdict(list)  # defaults to list
for k, v in ls:
    dictionary[k].append(v)
```
This is clearer, with no redundant and obscure if statement.
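The snippets above depend on an undefined ls. Here is a runnable sketch with a hypothetical list of (key, value) pairs standing in for it. It also shows dict.setdefault, a plain-dict way to avoid the if, and one defaultdict caveat: merely indexing a missing key inserts it with the default, while .get() does not.

```python
from collections import defaultdict

# Hypothetical (key, value) pairs standing in for the article's ls
ls = [('fruit', 'apple'), ('veg', 'carrot'), ('fruit', 'pear')]

grouped = defaultdict(list)
for k, v in ls:
    grouped[k].append(v)  # a missing key starts out as []
snapshot = dict(grouped)

# Plain-dict alternative that also avoids the if: setdefault
plain = {}
for k, v in ls:
    plain.setdefault(k, []).append(v)

# Caveat: indexing a missing key inserts it, while .get() does not
_ = grouped['nuts']
maybe = grouped.get('seeds')
```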
Updating one dictionary with another
```python
from itertools import chain

a = {'x': 1, 'y': 2, 'z': 3}
b = {'y': 5, 's': 10, 'x': 3, 'z': 6}

# Update a with b
c = dict(chain(a.items(), b.items()))
c  # {'x': 3, 'y': 5, 'z': 6, 's': 10}
```
This looks fine, but it isn't concise enough. Let's see if we can do better:
```python
c = a.copy()
c.update(b)
```
Clearer and more readable!
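A quick runnable check of the copy-and-update approach follows. It also shows the dictionary-unpacking literal {**a, **b} (Python 3.5+, PEP 448) as a one-expression alternative; Python 3.9 also added a | b for the same merge. In every form, values from b win on duplicate keys and a is left unchanged.

```python
a = {'x': 1, 'y': 2, 'z': 3}
b = {'y': 5, 's': 10, 'x': 3, 'z': 6}

c = a.copy()
c.update(b)  # values from b win on duplicate keys; a is left untouched

# Python 3.5+ (PEP 448): unpack both dicts into a new dict literal
d = {**a, **b}
```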
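Getting the maximum value from a dictionary
If you want the maximum value in a dictionary, you might go straight for something like this: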
```python
aa = {k: sum(range(k)) for k in range(10)}
aa  # {0: 0, 1: 0, 2: 1, 3: 3, 4: 6, 5: 10, 6: 15, 7: 21, 8: 28, 9: 36}
max(aa.values())  # 36
```
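This works, but if you also need the key, you then have to look it up again from the value. Instead, we can use zip to pair each value with its key and get back a (value, key) tuple like this: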
```python
max(zip(aa.values(), aa.keys()))  # (36, 9) => value, key pair
```
Similarly, if you want to iterate over a dictionary from the largest value to the smallest, you can do this:
```python
sorted(zip(aa.values(), aa.keys()), reverse=True)
# [(36, 9), (28, 8), (21, 7), (15, 6), (10, 5), (6, 4), (3, 3), (1, 2), (0, 1), (0, 0)]
```
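An alternative to the zip trick is to pass a key function to max and sorted, so that only the values are compared. The sketch below is one way to do it. The difference shows up on ties: the zip version then falls back to comparing keys, which raises TypeError when the keys are not mutually comparable (the mixed dictionary here is a made-up example), while key=mixed.get simply returns the first maximal key.

```python
aa = {k: sum(range(k)) for k in range(10)}

# Key of the largest value; only values are compared
best_key = max(aa, key=aa.get)
best_pair = (best_key, aa[best_key])

# (key, value) pairs ordered from the largest value to the smallest
ranked = sorted(aa.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical dict with a tie on the value and keys of different types
mixed = {'a': 1, 2: 1}
tie_winner = max(mixed, key=mixed.get)  # first key with the largest value

try:
    max(zip(mixed.values(), mixed.keys()))
    zip_failed = False
except TypeError:
    zip_failed = True  # the value tie forces a str-vs-int key comparison
```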
Unpacking an arbitrary number of items from a list
With the magic of *, we can collect an arbitrary number of items into a list:
```python
def compute_average_salary(person_salary):
    person, *salary = person_salary
    return person, sum(salary) / len(salary)  # / is true division in Python 3

person, average_salary = compute_average_salary(["mike", 40000, 50000, 60000])
person  # 'mike'
average_salary  # 50000.0
```
That isn't very exciting on its own, but what if I told you that you can also do this:
```python
def compute_average_salary(person_salary_age):
    person, *salary, age = person_salary_age
    return person, sum(salary) / len(salary), age

person, average_salary, age = compute_average_salary(["mike", 40000, 50000, 60000, 42])
age  # 42
```
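Looks neat!
If you have a dictionary with string keys and list values, then instead of iterating over the dictionary and processing each value in turn, you can use a flatter representation (a list of lists), like this: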
```python
# Instead of doing this
for k, v in dictionary.items():
    process(v)

# we are separating head and the rest, and process the values
# as a list similar to the above. head becomes the key value
for head, *rest in ls:
    process(rest)

# if not very clear, consider the following example
aa = {k: list(range(k)) for k in range(5)}  # range is lazy, so wrap it in list()
aa  # {0: [], 1: [0], 2: [0, 1], 3: [0, 1, 2], 4: [0, 1, 2, 3]}
for k, v in aa.items():
    print(sum(v))
# 0
# 0
# 1
# 3
# 6

# Instead
aa = [[ii] + list(range(jj)) for ii, jj in enumerate(range(5))]
for head, *rest in aa:
    print(sum(rest))
# 0
# 0
# 1
# 3
# 6
```
You can unpack a list into head, *rest, tail, and so on.
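The following runnable sketch collects the common starred-unpacking shapes in one place, reusing the made-up salary record from earlier. It also shows how to turn a dict of lists into the list-of-lists form above. Two details worth knowing: the starred name always receives a list, possibly an empty one, and unpacking raises ValueError when there are too few items for the non-starred names.

```python
record = ["mike", 40000, 50000, 60000, 42]

head, *rest = record            # 'mike', [40000, 50000, 60000, 42]
*init, tail = record            # ['mike', 40000, 50000, 60000], 42
first, *middle, last = record   # 'mike', [40000, 50000, 60000], 42

# The starred name always gets a list, even an empty one
only, *nothing = ["solo"]       # 'solo', []

# A dict of lists turned into [key, *values] rows, then summed per row
aa = {k: list(range(k)) for k in range(5)}
rows = [[k] + v for k, v in aa.items()]
sums = [sum(values) for _, *values in rows]

# Too few items for the non-starred names raises ValueError
try:
    x, y, *z = [1]
    too_few = False
except ValueError:
    too_few = True
```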
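Using collections as a counter
collections is one of my favorite modules in Python. Whenever you need a data structure beyond the built-in defaults, this is the place to look.
A basic part of my day-to-day work is counting large numbers of not-very-important words. Some might say you can simply use the words as dictionary keys and their counts as values, and before I discovered Counter in collections I might have agreed (yes, this whole introduction is building up to Counter).
Suppose you read the Python article from Wikipedia into a string (python_string) and split it into a list of lower-cased words, keeping their order: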
```python
import re

word_list = [k.lower().strip() for k in re.split(r'[;,:(\.\s)]\s*', python_string)]
word_list[:10]
# ['python', 'is', 'a', 'widely', 'used', 'general-purpose', 'high-level', 'programming', 'language', '[17][18][19]']
```
So far so good, but what if you want to count the words in this list?
```python
from collections import defaultdict  # again, collections!

dictionary = defaultdict(int)
for word in word_list:
    dictionary[word] += 1
```
That's not too bad, but with Counter you can save that time for more meaningful things:
```python
from collections import Counter

counter = Counter(word_list)
# Getting the most common 10 words
counter.most_common(10)
# [('the', 164), ('and', 161), ('a', 138), ('python', 138), ('of', 131), ('is', 102), ('to', 91), ('in', 88), ('', 56)]

# just like a dictionary; keys() is a view, so convert it to a list before slicing.
# Keys come out in first-seen order (Python 3.7+)
list(counter.keys())[:10]
# ['python', 'is', 'a', 'widely', 'used', 'general-purpose', 'high-level', 'programming', 'language', '[17][18][19]']
```
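Since python_string is not reproduced here, the following is a self-contained sketch on a short made-up sentence, with re.findall(r"\w+", ...) as a simple stand-in tokenizer. It also shows two ways Counter differs from a plain dict: a missing key counts as zero instead of raising KeyError, and update() adds counts rather than replacing them.

```python
import re
from collections import Counter

# A short stand-in text; the article's python_string is not reproduced here
text = "the cat and the hat and the bat"
words = re.findall(r"\w+", text.lower())

counter = Counter(words)
top_two = counter.most_common(2)  # [('the', 3), ('and', 2)]
missing = counter["dog"]          # 0: missing keys count as zero, no KeyError

# update() adds to the existing counts, unlike dict.update()
counter.update(["cat", "cat"])
```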
Pretty neat, but now look at the methods available on a Counter:
```python
dir(counter)
# abridged (the exact list varies by Python version):
# ['__add__', '__and__', ..., '__or__', ..., '__sub__', ...,
#  'clear', 'copy', 'elements', 'fromkeys', 'get', 'items', 'keys',
#  'most_common', 'pop', 'popitem', 'setdefault', 'subtract', ..., 'update', 'values']
```
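Did you notice the __add__ and __sub__ methods? Yes, Counter supports addition and subtraction. So if you have a lot of text to count words in and it fits on one machine, you don't need Hadoop: you can build a Counter for each chunk of text (the map step) and then add them together (the reduce step). That gives you a small MapReduce built on Counter, and you may thank me later.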
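As a minimal sketch of that map/reduce idea, assuming a made-up corpus of three short documents and plain whitespace splitting: each document is counted independently, and sum() with a Counter() start value merges the partial counts. Subtraction is shown too; note that it keeps only positive counts.

```python
from collections import Counter

# Hypothetical corpus split into chunks, e.g. one chunk per document
documents = [
    "python is a programming language",
    "python is widely used",
    "a language is a tool",
]

# "map": count each document independently
partial_counts = [Counter(doc.split()) for doc in documents]

# "reduce": merge the partial counts with +, starting from an empty Counter
total = sum(partial_counts, Counter())

# Subtraction drops keys whose resulting count is zero or negative
diff = partial_counts[0] - partial_counts[1]
```

Because each partial count is independent, the map step could be spread across processes (for example with multiprocessing) before the final sum.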
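Flattening nested lists
The chain function from itertools can be used to flatten nested lists by one level (collections only has a private alias of it, _chain, which shouldn't be relied on):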
```python
from itertools import chain

ls = [[kk] + list(range(kk)) for kk in range(5)]
flattened_list = list(chain(*ls))
# [0, 1, 0, 2, 0, 1, 3, 0, 1, 2, 4, 0, 1, 2, 3]
```
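A variant worth knowing is chain.from_iterable, which takes the outer list directly instead of needing * unpacking. Both forms remove only one level of nesting. For deeper nesting, the sketch below uses a small recursive generator; deep_flatten is a hypothetical helper written for this example, not a library function, and it only descends into lists.

```python
from itertools import chain

ls = [[kk] + list(range(kk)) for kk in range(5)]

# chain.from_iterable takes the outer list directly, no * unpacking needed
flat = list(chain.from_iterable(ls))

def deep_flatten(items):
    """Hypothetical helper: recursively yield non-list items from nested lists."""
    for item in items:
        if isinstance(item, list):
            yield from deep_flatten(item)
        else:
            yield item

nested = [1, [2, [3, [4]]], 5]
deep = list(deep_flatten(nested))
```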
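Opening two files at the same time
If you are processing a file (say, line by line) and want to write the processed lines to another file, you might instinctively write it like this: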
```python
with open(input_file_path) as inputfile:
    with open(output_file_path, 'w') as outputfile:
        for line in inputfile:
            outputfile.write(process(line))
```
Alternatively, you can open several files in the same with statement, like this:
```python
with open(input_file_path) as inputfile, open(output_file_path, 'w') as outputfile:
    for line in inputfile:
        outputfile.write(process(line))
```
Much more concise!
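The snippets above rely on input_file_path, output_file_path, and process, which the text leaves undefined. Here is a self-contained sketch that fills them in with assumptions: both paths live in a temporary directory, and process is a hypothetical function that just upper-cases each line.

```python
import os
import tempfile

def process(line):
    # Hypothetical per-line transformation: upper-case the text
    return line.upper()

with tempfile.TemporaryDirectory() as tmp:
    input_file_path = os.path.join(tmp, "input.txt")
    output_file_path = os.path.join(tmp, "output.txt")
    with open(input_file_path, "w") as f:
        f.write("first line\nsecond line\n")

    # Both files are opened in one with statement and closed together,
    # even if process() raises partway through
    with open(input_file_path) as inputfile, open(output_file_path, "w") as outputfile:
        for line in inputfile:
            outputfile.write(process(line))

    with open(output_file_path) as f:
        result = f.read()
```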
Finding the Monday for a date
If you want to normalize a date (for example, to the Monday before or after it), you might do something like this:
```python
import datetime

previous_monday = some_date - datetime.timedelta(days=some_date.weekday())
# Similarly, you could map to next monday as well
next_monday = some_date + datetime.timedelta(days=-some_date.weekday(), weeks=1)
```
That's all it takes.
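Wrapped as two small functions (the names are just for this sketch), the same arithmetic can be checked on concrete dates. The key fact is that date.weekday() returns 0 for Monday through 6 for Sunday. One edge case to be aware of: a date that is already a Monday maps to itself as the "previous" Monday, and to the following week's Monday as the "next" one.

```python
import datetime

def previous_monday(some_date):
    # weekday() is 0 for Monday ... 6 for Sunday, so subtracting it lands on Monday
    return some_date - datetime.timedelta(days=some_date.weekday())

def next_monday(some_date):
    # Go back to this week's Monday, then forward one week
    return some_date + datetime.timedelta(days=-some_date.weekday(), weeks=1)

wednesday = datetime.date(2024, 1, 10)  # 2024-01-10 was a Wednesday
monday = datetime.date(2024, 1, 8)
```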
Handling HTML
If you scrape websites for fun or for profit, you will constantly be dealing with HTML tags. To parse the various kinds of HTML tags, you can use html.parser:
```python
from html.parser import HTMLParser


class HTMLStrip(HTMLParser):
    def __init__(self):
        super().__init__()  # sets up parser state (convert_charrefs) and calls self.reset()
        self.ls = []

    def handle_data(self, d):
        # Called with each run of text between tags
        self.ls.append(d)

    def get_data(self):
        return ''.join(self.ls)

    @staticmethod
    def strip(snippet):
        html_strip = HTMLStrip()
        html_strip.feed(snippet)
        html_strip.close()  # flush any buffered trailing text
        clean_text = html_strip.get_data()
        return clean_text


snippet = HTMLStrip.strip(html_snippet)
```
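handle_data is only one of HTMLParser's hooks; the parser also calls handle_starttag(tag, attrs) for every opening tag, where attrs is a list of (name, value) pairs. As an illustration, here is a sketch of a hypothetical LinkCollector that gathers the href of every <a> tag. HTMLParser lower-cases tag and attribute names, so <A HREF=...> is matched too.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lower-cased; attrs is a list of (name, value)
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


parser = LinkCollector()
parser.feed('<p>See <A HREF="https://www.python.org">Python</A> and <a href="/docs">docs</a>.</p>')
parser.close()
```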
If you just want to escape HTML:
```python
import html

escaped_snippet = html.escape(html_snippet)

# Back to html snippets (this is new in Python 3.4)
html_snippet = html.unescape(escaped_snippet)
# and so forth ...
```
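A concrete round trip on a made-up snippet makes the behavior visible. html.escape always replaces &, < and >. With its default quote=True it also escapes double and single quotes, which matters when the text ends up inside an attribute value; quote=False leaves quotes alone.

```python
import html

raw = '<a href="x">Tom & Jerry\'s</a>'

escaped = html.escape(raw)                         # quotes escaped too (quote=True by default)
escaped_text_only = html.escape(raw, quote=False)  # only &, < and > are escaped
roundtrip = html.unescape(escaped)                 # back to the original string
```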