Extracting Room Prices from Web Tables - an Ontology-Aware Approach

Birgit Pröll, Christina Feilmayr, Stefan Parzer, Christina Buttinger, Michael Guttenbrunner

Research output: Chapter in Book/Report/Conference proceeding › Conference proceedings › peer-review

Abstract

The growing amount of semi-structured and unstructured data on tourism Web sites with heterogeneous designs requires information extraction (IE) mechanisms to create, for instance, tourism portals. In order to build semantic eTourism environments, the acquisition of room prices is of particular interest. Room prices and related information often appear in tabular structures, which still challenge Web information extraction techniques. In this paper, we begin by identifying various price table patterns which are characterized by the position of a number of features that determine a room price. We then describe an extended ontology model for tourism prices. Finally, we present TAINEX, a plug-in for functional and structural analysis and data interpretation of price tables, which extends the existing prototype TourIE, a rule-/ontology-based information extraction system for Web sites with heterogeneous designs.
Original language: English
Title of host publication: Proc. of 17th International Conference on Information Technology and Travel & Tourism (ENTER10)
Number of pages: 12
Publication status: Published - Feb 2010

Fields of science

  • 102001 Artificial intelligence
  • 102006 Computer supported cooperative work (CSCW)
  • 102010 Database systems
  • 102014 Information design
  • 102015 Information systems
  • 102016 IT security
  • 102028 Knowledge engineering
  • 102019 Machine learning
  • 102022 Software development
  • 102025 Distributed systems
  • 502007 E-commerce
  • 505002 Data protection
  • 506002 E-government
  • 509018 Knowledge management
  • 202007 Computer integrated manufacturing (CIM)
  • 102033 Data mining
  • 102035 Data science
