A Hierarchical Web Page Segmentation Algorithm using Machine Learning

T. Ito, H. Sano, T. Ozono, and T. Shintani (Japan)

Keywords

Web Page Segmentation, Web Page Layout, Mobile Phone, Machine Learning

Abstract

We implemented a web browsing system that makes navigation and reading easier on mobile phones, which have small screens. For this system, large web pages must be divided into small blocks that fit on a small screen. These blocks should correspond to semantic parts of the web page, and the appropriate granularity differs by user and application. We propose a new web page segmentation algorithm that uses the layout information available after rendering. The algorithm has two key features. First, it segments a web page hierarchically by using eight layout templates. Second, it divides a web page into content blocks by using a support vector machine. Experimental results show that the method achieves higher precision than an existing method.
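
The abstract does not specify the eight layout templates, the SVM features, or the training data, so the following is only a minimal sketch of the general idea of top-down hierarchical splitting with a classifier deciding when to stop. It assumes a rendered page is modeled as a tree of boxes (the `Box` class is hypothetical). In place of a trained model it uses a hand-set linear decision rule, sign(w·x + b), which is the rule a trained linear SVM applies at prediction time. The features, `WEIGHTS`, and `BIAS` are invented for illustration, and the layout templates are not modeled at all.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    """Hypothetical rendered element: position, size, text length, children."""
    x: float
    y: float
    w: float
    h: float
    text_len: int = 0
    children: list["Box"] = field(default_factory=list)


def features(box: Box) -> list[float]:
    """Hypothetical features: area (in 10^6 px^2), child count, text density."""
    area = box.w * box.h
    density = box.text_len / area if area > 0 else 0.0
    return [area / 1e6, float(len(box.children)), density * 1000]


# Hypothetical hand-set weights and bias standing in for a trained linear SVM.
WEIGHTS = [-1.0, -0.5, 1.0]
BIAS = 0.5


def is_content_block(box: Box) -> bool:
    """Linear decision rule: w . x + b >= 0 means 'keep this box whole'."""
    score = sum(w * x for w, x in zip(WEIGHTS, features(box))) + BIAS
    return score >= 0


def segment(box: Box) -> list[Box]:
    """Split top-down until the classifier accepts a box or it has no children."""
    if not box.children or is_content_block(box):
        return [box]
    blocks: list[Box] = []
    for child in box.children:
        blocks.extend(segment(child))
    return blocks


# Example page: a header, a navigation column, and an article with two paragraphs.
header = Box(0, 0, 1000, 100, text_len=50)
nav = Box(0, 100, 200, 1900, text_len=300)
p1 = Box(200, 100, 800, 900, text_len=4000)
p2 = Box(200, 1000, 800, 1000, text_len=4000)
article = Box(200, 100, 800, 1900, text_len=8000, children=[p1, p2])
body = Box(0, 100, 1000, 1900, children=[nav, article])
page = Box(0, 0, 1000, 2000, children=[header, body])

blocks = segment(page)  # [header, nav, article]: the article is kept whole
```

In this sketch, stopping as soon as the classifier accepts a box is what makes the output hierarchical: shifting the bias yields coarser or finer blocks, which loosely reflects the abstract's point that the right granularity depends on the user and application.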
