Source code
import re
import jieba
from collections import Counter
from wordcloud import WordCloud
import matplotlib.pyplot as plt

def stopwordslist(filepath):
    # Load the stopword list, one word per line
    with open(filepath, 'r', encoding='utf-8') as f:
        stopwords = [line.strip() for line in f]
    return stopwords

# Strip stopwords from the text (note: this walks the string character by character)
def movestopwords(sentence):
    stopwords = stopwordslist('lib/stop_words.txt')  # path to the stopword file
    outstr = ''
    for word in sentence:
        if word not in stopwords:
            if word not in ('\t', '\n'):
                outstr += word
                # outstr += " "
    return outstr

text = open('report.txt', 'r', encoding='utf-8').read()
text = re.sub("[^\u4e00-\u9fa5]", "", text)  # keep only Chinese characters (CJK Unified Ideographs range)
filtered_content = movestopwords(text)
result = jieba.lcut(filtered_content)        # segment the filtered text into words
count_array = Counter(result)
common_c = count_array.most_common(100)      # 100 most frequent words

wc = WordCloud(
    font_path="/Library/Fonts/Songti.ttc",   # a CJK font is required to render Chinese
    background_color='white',                # background color
    max_words=200,                           # maximum number of words
    max_font_size=100,                       # largest font size
)
wc.generate_from_frequencies(dict(common_c))

plt.figure()
plt.imshow(wc)
plt.axis('off')
plt.show()
wc.to_file('wordcloud.jpg')
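Note that movestopwords above walks the raw string one character at a time, so only single-character stopwords (的, 了, and so on) are removed before segmentation; multi-character entries such as 沒有 never match a single character. A common alternative, sketched below under the same assumptions (lib/stop_words.txt exists with one word per line; the sample sentence is just a placeholder), is to segment first with jieba and then drop whole tokens that appear in the stopword set:

import jieba

# Sketch of the alternative order: segment first, then filter stopword tokens.
with open('lib/stop_words.txt', encoding='utf-8') as f:
    stopwords = {line.strip() for line in f if line.strip()}

sample = "我們的報告沒有問題"   # placeholder sentence
tokens = [w for w in jieba.lcut(sample) if w not in stopwords]
print(tokens)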
Stopwords: stop_words.txt
的
了
和
是
就
都
而
及
與
著
或
一個
沒有
我們
你們
妳們
他們
她們
是否
Results
Pitfalls I ran into
- You must set a font (font_path) in WordCloud; otherwise the Chinese words render as a pile of empty boxes (see the sketch below).
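As a minimal check of the font setting, the snippet below renders a tiny placeholder frequency dict; the font path is only an example for macOS, so point it at any font on your system that contains CJK glyphs:

from wordcloud import WordCloud

# Without font_path, the default font has no Chinese glyphs and every word
# renders as an empty box. The path below is an example for macOS.
wc = WordCloud(font_path="/Library/Fonts/Songti.ttc", background_color='white')
wc.generate_from_frequencies({"词云": 10, "测试": 5})   # tiny placeholder frequencies
wc.to_file('font_check.png')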