H. Ujir, N. Jali, D.T.Y. Wen, S.H. Huspi, S.F.S. Fadzir, and S.C.H. Li (Malaysia)
Summarization, extraction, abstraction, domain-specific corpus
Most existing summarization tools are general-purpose summarizers; domain-specific summarizers, e.g., for medical [14] and legal [15] documents, are rare. This paper describes a framework for automatic summary generation in one specific domain: oil palm literature. To support the whole framework, an oil palm corpus is developed. The work combines two paradigms, extraction and abstraction, and incorporating both methods in one summarization framework is expected to greatly improve the quality of the produced summary. A Nearly-New Information Extraction system (ANNIE) is used as the backbone of the extraction process. The extracted sentences are then ranked for potential inclusion in the summary using a weighted word frequency, Term Frequency-Inverse Document Frequency (TF-IDF). In the abstraction process, the oil palm corpus is used to support the summarization procedure. Using this training corpus, the output is expected to be more precise and to capture all the important facts from the pre-determined information retrieval process.
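The abstract names TF-IDF as the sentence-ranking weight but does not give the exact scoring formula. The following is a minimal Python sketch of TF-IDF sentence ranking under stated assumptions, not the authors' implementation: each sentence is treated as one "document" when computing IDF, a simple regex tokenizer and a hypothetical stopword list stand in for ANNIE's preprocessing, and a sentence's score is the mean TF-IDF weight of its tokens. The names `tokenize`, `rank_sentences`, `STOPWORDS`, and `top_k` are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "for", "on", "are", "by", "with"}


def tokenize(text):
    """Lowercase word tokens with stopwords removed (simple regex, not ANNIE)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def rank_sentences(sentences, top_k=2):
    """Return the top_k sentences by mean TF-IDF weight, in original order.

    Assumption: each sentence counts as one 'document' for IDF, so
    idf(term) = log(N / df(term)), where df is the number of sentences
    containing the term.
    """
    token_lists = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency: number of sentences containing each term.
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))

    scores = []
    for idx, tokens in enumerate(token_lists):
        if not tokens:
            # A sentence with only stopwords carries no weight.
            scores.append((0.0, idx))
            continue
        tf = Counter(tokens)
        # Sum of tf * idf over distinct terms, normalised by sentence length
        # so long sentences are not favoured merely for being long.
        total = sum(count * math.log(n / df[term]) for term, count in tf.items())
        scores.append((total / len(tokens), idx))

    # Highest score first; ties broken by original position.
    ranked = sorted(scores, key=lambda pair: (-pair[0], pair[1]))
    chosen = sorted(idx for _, idx in ranked[:top_k])
    return [sentences[i] for i in chosen]
```

Returning the selected sentences in their original order, rather than in score order, keeps an extractive summary readable. Terms that occur in every sentence (such as "oil palm" in a single-domain corpus) receive an IDF of zero, so under this assumption the ranking is driven by the terms that distinguish one sentence from another.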