Natural Language · March 12, 2022

TextTiling: A Text Segmentation Algorithm


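TextTiling finds subtopic shifts in a document by exploiting patterns of lexical co-occurrence and distribution. The algorithm has three parts: 1. tokenize the text into terms and sentence-sized units; 2. compute a score for each sentence-sized unit; 3. detect the subtopic boundaries, which are assumed to lie at the deepest valleys of the curve obtained by plotting the sentence units against their scores.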
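The three steps above can be sketched in plain Python. This is a minimal illustration, not NLTK's implementation: the function names (`pseudo_sentences`, `gap_scores`, `depth_scores`, `boundaries`, `segment`), the default values of `w` and `k`, cosine similarity as the score, and the cutoff of mean minus half a standard deviation are all assumptions made for this sketch. It also skips stop-word removal, score smoothing, and snapping boundaries to paragraph breaks.

```python
import math
import re
from collections import Counter


def pseudo_sentences(text, w=20):
    """Step 1: lowercase, tokenize, and group tokens into units of w words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [tokens[i:i + w] for i in range(0, len(tokens), w)]


def gap_scores(units, k=2):
    """Step 2: cosine similarity of the k units before and after each gap."""
    scores = []
    for gap in range(1, len(units)):
        left = Counter(t for u in units[max(0, gap - k):gap] for t in u)
        right = Counter(t for u in units[gap:gap + k] for t in u)
        dot = sum(left[t] * right[t] for t in left)
        norm = math.sqrt(sum(v * v for v in left.values())
                         * sum(v * v for v in right.values()))
        scores.append(dot / norm if norm else 0.0)
    return scores


def depth_scores(scores):
    """Depth of each gap: climb to the nearest peak on both sides."""
    depths = []
    for i, s in enumerate(scores):
        left = s
        for x in reversed(scores[:i]):
            if x < left:
                break
            left = x
        right = s
        for x in scores[i + 1:]:
            if x < right:
                break
            right = x
        depths.append((left - s) + (right - s))
    return depths


def boundaries(scores):
    """Step 3: valleys whose depth exceeds mean - std/2 (assumed cutoff)."""
    depths = depth_scores(scores)
    if not depths:
        return []
    mean = sum(depths) / len(depths)
    std = math.sqrt(sum((d - mean) ** 2 for d in depths) / len(depths))
    cutoff = mean - std / 2
    cuts = []
    for i, d in enumerate(depths):
        left_ok = i == 0 or scores[i - 1] >= scores[i]
        right_ok = i == len(scores) - 1 or scores[i + 1] >= scores[i]
        if d > 0 and d > cutoff and left_ok and right_ok:
            cuts.append(i + 1)  # index of the unit that starts a new segment
    return cuts


def segment(text, w=20, k=2):
    """Run all three steps and return the segments as strings of tokens."""
    units = pseudo_sentences(text, w)
    edges = [0] + boundaries(gap_scores(units, k)) + [len(units)]
    return [" ".join(t for u in units[a:b] for t in u)
            for a, b in zip(edges, edges[1:])]
```

Calling `segment(text, w=20, k=2)` on a long document returns one string per detected subtopic. Smaller `w` and `k` make the sketch react to shorter passages, at the cost of noisier scores.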

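import nltk  # unless you pass stopwords=..., run nltk.download("stopwords") once first
# Placeholder text: real input must be a long document with paragraphs separated by blank lines ("\n\n"); a single short sentence like this one is rejected with a ValueError.
text = "I'm messing around with this one myself just now for the same reason you are and had the same"
ttt = nltk.tokenize.TextTilingTokenizer()
tiles = ttt.tokenize(text)  # list of strings, one per detected segment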

References

https://www.nltk.org/_modules/nltk/tokenize/texttiling.html
