“Distributed Web2.0 Crawling for Ontology Evolution”

Citation

Juffinger, Andreas, Neidhart, Thomas, Granitzer, Michael, Kern, Roman, Weichselbraun, Albert, Wohlgenannt, Gerhard and Scharl, Arno. (2007). “Distributed Web2.0 Crawling for Ontology Evolution”. Proceedings of the Second International Conference on Digital Information Management (ICDIM'07), Lyon, France.

Abstract

Semantic Web technologies in general and ontology-based approaches in particular are considered the foundation for the next generation of information services. While ontologies enable software agents to exchange knowledge and information in a standardised, intelligent manner, describing today's vast amount of information in terms of ontological knowledge and tracking the evolution of such ontologies remain a challenge. In this paper we describe Web2.0 crawling for ontology evolution. The World Wide Web, or Web for short, is, due to its evolutionary properties and social network characteristics, a perfectly fitting data source to evolve an ontology. The decentralised structure of the Internet, the huge amount of data and upcoming Web2.0 technologies raise several challenges for a crawling system. In this paper we present a distributed crawling system with standard browser integration. The proposed system is a high-performance, sitescript-based, noise-reducing crawler which loads standard-browser-equivalent content from Web2.0 resources. Furthermore, we describe the integration of this spider into our ontology evolution framework.
