How to implement phrase blocking #197

chongqiWang · 2021-06-11T01:56:41Z

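I want to block certain phrases from search results. For example, when searching for "吕布战天下" with the phrase "战天下" blocked, a result such as "天下大乱，云长战吕布" should be allowed, but "吕布大战天下" should not. Is there a good way to do this?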

shi-yuan · 2021-06-12T08:05:13Z

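Should results containing the phrase be excluded entirely, or should they still be returned with the phrase replaced by something like ***?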

chongqiWang · 2021-06-15T00:53:50Z

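> Should results containing the phrase be excluded entirely, or should they still be returned with the phrase replaced by something like ***?

They should not be returned at all.

Also, the phrases should be loaded from a file.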

shi-yuan · 2021-06-15T10:32:33Z

You can add a must_not clause at search time to filter them out.
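
As a rough illustration of the must_not suggestion, the sketch below builds an Elasticsearch-style bool query body from a list of blocked phrases, so the clauses do not have to be written by hand. The field name `content`, the class name `MustNotQuery`, and the choice of `match_phrase` for the exclusion clauses are assumptions, not something confirmed in this thread; whether `match_phrase` excludes exactly the intended documents depends on how the analyzer tokenizes the field.

```java
import java.util.List;
import java.util.stream.Collectors;

public class MustNotQuery {
    // Escapes backslashes and double quotes so a value can sit inside a JSON string.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a bool query: match the user's text, exclude any blocked phrase.
    // The field name is supplied by the caller ("content" below is hypothetical).
    static String build(String field, String queryText, List<String> blocked) {
        String f = escape(field);
        String mustNot = blocked.stream()
                .map(p -> "{\"match_phrase\":{\"" + f + "\":\"" + escape(p) + "\"}}")
                .collect(Collectors.joining(","));
        return "{\"query\":{\"bool\":{"
                + "\"must\":[{\"match\":{\"" + f + "\":\"" + escape(queryText) + "\"}}],"
                + "\"must_not\":[" + mustNot + "]}}}";
    }

    public static void main(String[] args) {
        System.out.println(build("content", "吕布战天下", List.of("战天下")));
    }
}
```

Each blocked phrase adds one clause, so the request body grows with the phrase list; that is the trade-off of doing the exclusion at query time.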

chongqiWang · 2021-06-16T00:18:47Z

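> You can add a must_not clause at search time to filter them out.

Wouldn't that require a very long filter condition? What I mainly want is to filter the results and block phrases.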

shi-yuan · 2021-06-16T01:27:29Z

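If there are many phrases:
If they change infrequently, consider writing them into the index at indexing time, so that searches can filter directly on that field.
If they change frequently, you can process the result set in your program after retrieving it.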
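
For the second option (filtering the result set in the program), a minimal sketch might look like the following. It assumes the blocked phrases live in a UTF-8 text file with one phrase per line and that each hit is available as a plain string; the class name `ResultPhraseFilter` and the file format are illustrative only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class ResultPhraseFilter {
    // Reads one blocked phrase per line and skips blank lines (assumed file format).
    static List<String> loadPhrases(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8).stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    // Keeps only the hits that contain none of the blocked phrases.
    static List<String> filter(List<String> hits, List<String> blocked) {
        return hits.stream()
                .filter(hit -> blocked.stream().noneMatch(hit::contains))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Write a temporary phrase file so the example is self-contained.
        Path file = Files.createTempFile("blocked", ".txt");
        Files.write(file, List.of("战天下"), StandardCharsets.UTF_8);

        List<String> hits = List.of("天下大乱，云长战吕布", "吕布大战天下");
        System.out.println(filter(hits, loadPhrases(file)));
    }
}
```

Because hits are dropped after retrieval, a page of results can come back shorter than requested, so the caller may need to fetch extra hits to compensate.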

shi-yuan · 2021-06-16T02:05:08Z

To extract phrases from the content, see the following.

Contents of the dictionary dic_xxx:

战天下	a	1000
天下大乱	a	2000

Example:

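import org.ansj.library.DicLibrary;
import org.nlpcn.commons.lang.tire.GetWord;
import org.nlpcn.commons.lang.tire.domain.Forest;
import java.util.Arrays;

public class Test {
    public static void main(String[] args) {
        // Load the user dictionary registered under the key "dic_xxx".
        Forest forest = DicLibrary.get("dic_xxx");
        // Scan the text for every dictionary entry it contains.
        GetWord gw = forest.getWord("如何实现短语屏蔽功能：天下大乱，云长战吕布，吕布大战天下");
        String word;
        // getAllWords() returns the next match, or null when there are no more.
        while ((word = gw.getAllWords()) != null) {
            // getParam() holds the remaining columns of the matched dictionary line.
            System.out.println(word + "============" + Arrays.toString(gw.getParam()));
        }
    }
}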

Output:

天下大乱============[a, 2000]
战天下============[a, 1000]
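
One way the extracted phrases could feed the index-time option is sketched below: the phrases found in a document are stored in an extra field, and searches then exclude documents on that field. This is only an assumption about how the pieces fit together. The field names `content` and `blocked_phrases` and the class `BlockedPhraseField` are hypothetical, and plain substring matching stands in for the Forest/GetWord lookup so that the example runs without the dictionary library.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BlockedPhraseField {
    // Stand-in for the dictionary lookup: returns the dictionary phrases that
    // occur in the content, using plain substring matching.
    static List<String> extract(String content, List<String> dictionary) {
        return dictionary.stream()
                .filter(content::contains)
                .collect(Collectors.toList());
    }

    // Builds the document to index; both field names are hypothetical.
    static Map<String, Object> toDocument(String content, List<String> dictionary) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("content", content);
        doc.put("blocked_phrases", extract(content, dictionary));
        return doc;
    }

    public static void main(String[] args) {
        List<String> dictionary = List.of("战天下", "天下大乱");
        System.out.println(toDocument("吕布大战天下", dictionary));
    }
}
```

With a field like this in place, a search would only need to exclude documents whose `blocked_phrases` field holds the phrase in question, at the cost of reindexing whenever the dictionary changes.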
