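# Write the word-frequency data to data.txt, one "word count" pair per line.
# `items` is the list of (word, count) pairs from the preceding step.
with open('data.txt', mode='w', encoding='utf-8') as file:
    for word, count in items[:10963]:
        print("{0:<10}{1:>5}".format(word, count))
        file.write(word + " " + str(count) + '\n')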
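Since data.txt stores one `word count` pair per line, it can be read back for later analysis. The following is a minimal sketch, not part of the original script: the helper name `parse_counts` and the sample lines are made up for illustration.

```python
def parse_counts(lines):
    """Parse 'word count' lines into (word, count) tuples."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the last space so the count is always the final field.
        word, count = line.rsplit(" ", 1)
        pairs.append((word, int(count)))
    return pairs


# Hypothetical sample in the same format that the script writes.
sample = ["的 10\n", "了 7\n", "说 3\n"]
print(parse_counts(sample))
```

With a real file, the same function could be called as `parse_counts(open('data.txt', encoding='utf-8'))`, assuming the file was written as UTF-8.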
Regex match results
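import re
from collections import Counter

# Questions spoken by characters, as captured by the regex
things = []
# [说道] matches either speech verb; ASCII and full-width punctuation are
# both accepted, and [^”]+ keeps each match inside a single quotation.
for m in re.finditer(r"[说道][:：]“([^”]+)[?？]”", txt):
    things.append(m.group(1))

# Count the matches, then print and save the 51 most common
c = Counter(things)
with open('正则.txt', mode='w', encoding='utf-8') as result:
    for k, v in c.most_common(51):
        print(k, v)
        result.write(k + " " + str(v) + '\n')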
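Output: data.txt contains the word-frequency data. The regex matches lines spoken by characters that are questions, and the results are written to 正则.txt.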
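To see what the question-matching step captures, here is a small self-contained sketch. It is an illustration rather than the repository's exact code: the pattern is a tightened variant that assumes the text may use either ASCII or full-width colons and question marks, and the names `QUESTION`, `spoken_questions`, and the sample sentence are hypothetical.

```python
import re
from collections import Counter

# Assumed pattern: a speech verb (说 or 道), a colon, an opening quote,
# the question text, a question mark, and the closing quote.
QUESTION = re.compile(r"[说道][:：]“([^”]+)[?？]”")


def spoken_questions(text):
    """Return the text of every quoted question introduced by 说 or 道."""
    return [m.group(1) for m in QUESTION.finditer(text)]


# Made-up sample: two identical questions and one statement.
sample = "他说：“你去哪里？”她笑道：“你去哪里？”他又说：“走吧。”"
counts = Counter(spoken_questions(sample))
print(counts.most_common(1))  # [('你去哪里', 2)]
```

Using `[^”]+` instead of `.+` stops a match from running past the closing quote when several quotations share one line.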
Verifying Zipf's Law
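Zipf's law says that a word's frequency is roughly inversely proportional to its frequency rank, so a plot of log(frequency) against log(rank) should be close to a straight line with slope near -1. One way to check this, sketched here under the assumption that the counts from data.txt are available as a list of integers, is a least-squares fit in log-log space. The function name `zipf_slope` is hypothetical, and the repository may verify the law differently (for example with a plot).

```python
import math


def zipf_slope(counts):
    """Least-squares slope of log(frequency) versus log(rank)."""
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two positive counts")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic counts that follow Zipf's law exactly: frequency = 1000 / rank.
ideal = [1000 / rank for rank in range(1, 201)]
print(round(zipf_slope(ideal), 3))
```

A slope close to -1 on real word counts would be consistent with Zipf's law. Large deviations usually show up at the very top ranks and in the long tail of rare words.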
About
Uses the jieba library to compute word-frequency statistics for Chinese novels, performs simple regular-expression matching, and verifies Zipf's law.