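Format of TihuanWords.txt
Note: words on the same line are separated by a single space, and the first word on each line is the replacement for every other word on that line.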
年休假 年假 年休
究竟 到底
回家场景 我回来了
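To make the file format concrete, here is a minimal sketch of how such lines could be turned into a lookup table. The function name `load_synonym_table` and the in-memory `sample` list are assumptions made for illustration; the article's own code reads the file directly. This sketch also uses `split()` without arguments, which tolerates repeated spaces, whereas the format above specifies a single space.

```python
def load_synonym_table(lines):
    """Map every synonym to the first word on its line (hypothetical helper)."""
    table = {}
    for line in lines:
        words = line.split()  # splits on any run of whitespace
        if len(words) < 2:
            continue  # skip blank lines and lines with no synonyms
        canonical = words[0]
        for synonym in words[1:]:
            table[synonym] = canonical
    return table


# The three sample lines shown above, held in a list instead of a file.
sample = ["年休假 年假 年休", "究竟 到底", "回家场景 我回来了"]
table = load_synonym_table(sample)
print(table)
```

With a real file, the same function could be fed an open file object, for example `load_synonym_table(open("TihuanWords.txt", encoding="utf-8"))`.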
Code
import jieba


def replaceSynonymWords(string1):
    # 1. Read the synonym table and build a dict that maps each synonym
    #    to its replacement. TihuanWords.txt holds one group per line,
    #    with words separated by single spaces.
    combine_dict = {}
    with open("TihuanWords.txt", "r", encoding="utf-8") as table_file:
        for line in table_file:
            seperate_word = line.strip().split(" ")
            for word in seperate_word[1:]:
                combine_dict[word] = seperate_word[0]
    print(combine_dict)

    # 2. Raise the frequency of certain words so that jieba keeps them whole.
    jieba.suggest_freq("年休假", tune=True)

    # 3. Segment the sentence into words.
    seg_list = list(jieba.cut(string1, cut_all=False))
    print("/".join(seg_list))

    # 4. Return the sentence with every synonym replaced.
    final_sentence = ""
    for word in seg_list:
        final_sentence += combine_dict.get(word, word)
    return final_sentence


string1 = '年休到底放几天?'
print(replaceSynonymWords(string1))
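The dictionary lookup only fires when the segmenter emits a token that exactly matches an entry, so a multi-word entry such as 我回来了 may be missed if it is split into smaller pieces. As an alternative for comparison, the sketch below skips segmentation and scans the raw string, always trying the longest table entry first. This is a different technique from the jieba-based approach in the article, and the name `replace_by_longest_match` is hypothetical.

```python
def replace_by_longest_match(text, table):
    """Replace synonyms by scanning text and preferring the longest match."""
    # Let replacement words match themselves, so that "年休假" is consumed
    # whole instead of having its prefix "年休" rewritten a second time.
    mapping = dict(table)
    for canonical in table.values():
        mapping.setdefault(canonical, canonical)

    keys = sorted(mapping, key=len, reverse=True)  # longest entries first
    pieces = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                pieces.append(mapping[key])
                i += len(key)
                break
        else:
            # No entry starts here: copy one character unchanged.
            pieces.append(text[i])
            i += 1
    return "".join(pieces)


table = {"年假": "年休假", "年休": "年休假", "到底": "究竟", "我回来了": "回家场景"}
print(replace_by_longest_match("年休到底放几天?", table))  # 年休假究竟放几天?
```

The trade-off is that plain string matching ignores word boundaries, so it can rewrite a substring that merely happens to coincide with an entry. Segmenting with jieba first avoids that, at the cost of needing every entry to survive segmentation as a single token.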
Result
Original article: https://blog.csdn.net/weixin_44208569/article/details/104048793