Thursday, 23 May 2019

Publication date: Available online 17 May 2019
Source: Computer Speech & Language
Author(s): Hongseon Yeom, Youngjoong Ko, Jungyun Seo
Abstract
Keyphrases capture the main topic of a document and offer a simple way to represent it. Research on unsupervised keyphrase extraction has focused mainly on statistical and graph-based models. Statistical models have difficulty extracting keyphrases from a single document because most of them require statistical information from a large corpus. Graph-based models, on the other hand, can extract keyphrases using only the information in a single document, but they have drawbacks of their own: a single document does not contain enough information to score the edges of a graph reliably, so the edge scores can be biased, and this bias in turn influences the node scores. In this paper, we propose an effective method that combines a statistical model, the C-value method, with a graph-based model to overcome the drawbacks of each. The graph-based model provides a new scoring method for keyphrase candidates, and the scores it produces are applied to a modified C-value method to estimate the final importance scores of the candidates. The proposed model is evaluated on two datasets, SemEval 2010 and Inspec, where it outperforms both the state-of-the-art unsupervised model and existing graph-based ranking models.
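
For readers unfamiliar with the statistical component, the following is a minimal sketch of the textbook C-value score, not the modified version the abstract refers to, whose details are not given here. It assumes candidates are already extracted as token tuples with raw frequencies; the function name `c_value` and the toy frequencies are hypothetical. A candidate that appears nested inside longer candidates has its frequency reduced by the average frequency of those longer candidates, and the result is weighted by log2 of the candidate length.

```python
import math


def c_value(freq):
    """Textbook C-value (illustrative sketch, not the paper's modified variant).

    freq maps each candidate phrase (a tuple of tokens) to its frequency.
    """
    def is_nested(shorter, longer):
        # True if `shorter` occurs as a contiguous sub-sequence of a strictly longer phrase.
        n, m = len(longer), len(shorter)
        return n > m and any(longer[i:i + m] == shorter for i in range(n - m + 1))

    scores = {}
    for a, f_a in freq.items():
        # Longer candidates that contain `a`.
        containers = [b for b in freq if is_nested(a, b)]
        if containers:
            # Subtract the average frequency of the containing candidates.
            adjusted = f_a - sum(freq[b] for b in containers) / len(containers)
        else:
            adjusted = f_a
        # Length weight; note that log2(1) = 0, so single-word candidates score 0.
        scores[a] = math.log2(len(a)) * adjusted
    return scores


# Hypothetical toy frequencies, for illustration only.
toy = {
    ("soft", "contact", "lens"): 5,
    ("hard", "contact", "lens"): 2,
    ("contact", "lens"): 8,
}
scores = c_value(toy)
```

Because the length weight is zero for one-word candidates, the plain formula cannot rank unigrams at all, which is one reason implementations often adjust it.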
