Abstract
In this paper, we propose an audio-visual approach to video
genre categorization. Audio information is extracted at the block level,
which has the advantage of capturing local temporal information. At the
temporal structural level, we assess action content with respect to human
perception. Further, color perception is quantified with statistics of color
distribution, elementary hues, color properties, and color relationships.
The last category of descriptors captures statistics of contour geometry.
An extensive evaluation of this multi-modal approach, based on
more than 91 hours of video footage, is presented. We obtain average
precision and recall ratios within [87%, 100%] and [77%, 100%], respectively,
while average correct classification reaches up to 97%. Additionally,
movies displayed according to feature-based coordinates in a virtual 3D
browsing environment tend to regroup by genre, which has
potential applications in real content-based browsing systems.
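The reported per-genre precision and recall averages can be illustrated with a minimal sketch. The genre labels and confusion counts below are hypothetical examples, not figures from the paper; only the precision/recall definitions are standard:

```python
# Sketch of per-genre precision/recall computation, the kind of metric
# behind the reported [87%, 100%] precision and [77%, 100%] recall ranges.

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical per-genre counts: (true positives, false positives, false negatives)
counts = {
    "animation":   (45, 3, 5),
    "documentary": (40, 6, 9),
    "music":       (50, 0, 0),
}

for genre, (tp, fp, fn) in counts.items():
    p, r = precision_recall(tp, fp, fn)
    print(f"{genre}: precision={p:.1%}, recall={r:.1%}")
```

Averaging these per-genre scores yields the summary ranges quoted in the abstract.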
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 18th International Conference on MultiMedia Modelling (MMM2012) |
| Number of pages | 12 |
| Publication status | Published - 2012 |
Fields of science
- 102 Computer Sciences
- 102001 Artificial intelligence
- 102003 Image processing
JKU Focus areas
- Computation in Informatics and Mathematics
- Engineering and Natural Sciences (in general)