如果你想统计一个单词在给定的序列中一共出现了多少次,诸如此类的操作就可以用到Counter。来看看如何统计一个list中出现的item次数:

1
2
3
4
5
from collections import Counter
 
li = ["Dog", "Cat", "Mouse", 42, "Dog", 42, "Cat", "Dog"]
a = Counter(li)
print a # Counter({'Dog': 3, 42: 2, 'Cat': 2, 'Mouse': 1})

若要统计不同单词的数目:

1
2
3
4
5
6
7
from collections import Counter
 
li = ["Dog", "Cat", "Mouse", 42, "Dog", 42, "Cat", "Dog"]
a = Counter(li)
print a # Counter({'Dog': 3, 42: 2, 'Cat': 2, 'Mouse': 1})
 
print len(set(li)) # 4

对list结果进行分组:

1
2
3
4
5
6
7
8
9
10
11
12
from collections import Counter
 
li = ["Dog", "Cat", "Mouse","Dog","Cat", "Dog"]
a = Counter(li)
 
print a # Counter({'Dog': 3, 'Cat': 2, 'Mouse': 1})
 
print "{0} : {1}".format(a.values(),a.keys())  # [1, 3, 2] : ['Mouse', 'Dog', 'Cat']
 
print(a.most_common(3)) # [('Dog', 3), ('Cat', 2), ('Mouse', 1)]

#most_common() to produce a sequence of the n most frequently encountered input values and their respective counts.产生前n个频率最高的词以及其出现次数

下面这段代码打印一段文字中所有词汇出现的次数:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
import re
from collections import Counter
 
string = """   Lorem ipsum dolor sit amet, consectetur
    adipiscing elit. Nunc ut elit id mi ultricies
    adipiscing. Nulla facilisi. Praesent pulvinar,
    sapien vel feugiat vestibulum, nulla dui pretium orci,
    non ultricies elit lacus quis ante. Lorem ipsum dolor
    sit amet, consectetur adipiscing elit. Aliquam
    pretium ullamcorper urna quis iaculis. Etiam ac massa
    sed turpis tempor luctus. Curabitur sed nibh eu elit
    mollis congue. Praesent ipsum diam, consectetur vitae
    ornare a, aliquam a nunc. In id magna pellentesque
    tellus posuere adipiscing. Sed non mi metus, at lacinia
    augue. Sed magna nisi, ornare in mollis in, mollis
    sed nunc. Etiam at justo in leo congue mollis.
    Nullam in neque eget metus hendrerit scelerisque
    eu non enim. Ut malesuada lacus eu nulla bibendum
    id euismod urna sodales.  """
 
words = re.findall(r'\w+', string) #This finds words in the document
 
lower_words = [word.lower() for word in words] #lower all the words
 
word_counts = Counter(lower_words) #counts the number each time a word appears
print word_counts
 
# Counter({'elit': 5, 'sed': 5, 'in': 5, 'adipiscing': 4, 'mollis': 4, 'eu': 3, 
# 'id': 3, 'nunc': 3, 'consectetur': 3, 'non': 3, 'ipsum': 3, 'nulla': 3, 'pretium':
# 2, 'lacus': 2, 'ornare': 2, 'at': 2, 'praesent': 2, 'quis': 2, 'sit': 2, 'congue': 2, 'amet': 2, 
# 'etiam': 2, 'urna': 2, 'a': 2, 'magna': 2, 'lorem': 2, 'aliquam': 2, 'ut': 2, 'ultricies': 2, 'mi': 2, 
# 'dolor': 2, 'metus': 2, 'ac': 1, 'bibendum': 1, 'posuere': 1, 'enim': 1, 'ante': 1, 'sodales': 1, 'tellus': 1,
# 'vitae': 1, 'dui': 1, 'diam': 1, 'pellentesque': 1, 'massa': 1, 'vel': 1, 'nullam': 1, 'feugiat': 1, 'luctus': 1, 
# 'pulvinar': 1, 'iaculis': 1, 'hendrerit': 1, 'orci': 1, 'turpis': 1, 'nibh': 1, 'scelerisque': 1, 'ullamcorper': 1,
# 'eget': 1, 'neque': 1, 'euismod': 1, 'curabitur': 1, 'leo': 1, 'sapien': 1, 'facilisi': 1, 'vestibulum': 1, 'nisi': 1,