Web Information Extraction Based on Genetic Algorithm

Abstract: WHISK is a semi-automatic information extraction (IE) system that works well on structured and semi-structured web text. However, there is no guarantee that its rule learning algorithm extends rules in an optimal way, and generating the rule set is time-consuming. To address these problems, a genetic algorithm is introduced to improve WHISK, a supervised machine learning algorithm, through heuristic rule expansion, and a removing method is used to generate the rule set. Experimental results show that the proposed algorithm performs well in terms of efficiency and recall rate.