Figure 1: Two example clips. The left one is considered more interesting.
The number of videos available on the Web is growing explosively. While some videos are very interesting and receive high ratings from viewers, many others are less interesting or even boring.
A measure of video interestingness can be used to improve user satisfaction in many applications. For example, in Web video search, among videos with similar relevance to a query, it would be desirable to rank the more interesting ones higher.
Here we conduct, to our knowledge, the first pilot study of human perception of video interestingness, and design a computational method to identify more interesting videos. To this end, we first construct two datasets of Flickr and YouTube videos, respectively. Human judgements of interestingness are collected and used as the ground truth for training computational models. We evaluate several off-the-shelf visual and audio features that are potentially useful for predicting interestingness on both datasets.
To support the study, we build two benchmark datasets with ground-truth interestingness labels. The first dataset (1,200 videos) was collected from Flickr, which provides an "interestingness" criterion for ranking its search results. The second dataset (420 videos) was collected from YouTube, which has no comparable ranking criterion; we therefore build an online pair-wise annotation system and hire 10 human assessors to provide interestingness ratings of the videos. To stimulate further research on this challenging problem, both datasets have been released.
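As an illustration only of how pair-wise judgements can yield per-video ground truth, the sketch below scores each video by the fraction of its comparisons that it won; the function name win_rate_scores and the clip identifiers are hypothetical, and the actual aggregation procedure used for the dataset may differ.

from collections import defaultdict

def win_rate_scores(pairwise_votes):
    # pairwise_votes: iterable of (winner_id, loser_id) tuples from the annotators.
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in pairwise_votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    # A video's score is the fraction of its comparisons that it won.
    return {video: wins[video] / appearances[video] for video in appearances}

# Toy example with three hypothetical clips.
votes = [("clip_a", "clip_b"), ("clip_a", "clip_c"), ("clip_b", "clip_c")]
print(win_rate_scores(votes))  # {'clip_a': 1.0, 'clip_b': 0.5, 'clip_c': 0.0}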
We designed and implemented a computational system to compare the interestingness levels of videos, using a large variety of features ranging from visual and audio to high-level attributes, such as SIFT (visual), MFCC (audio), and ObjectBank (attributes). Given two videos, the system automatically predicts which one is more interesting. The prediction framework and some representative results are shown in the following figure. Overall, we observed very promising results on both datasets. For more details, please refer to our AAAI 2013 paper.
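As a rough sketch of the pairwise prediction setting (not the exact model in the paper), one can train a linear classifier on feature-difference vectors and use it to decide which of two videos is more interesting. The helper names, the feature dimensionality, the random placeholder features, and the choice of LinearSVC below are all illustrative assumptions; real features such as pooled SIFT, MFCC, or ObjectBank descriptors would be substituted for the placeholders.

import numpy as np
from sklearn.svm import LinearSVC

def make_pairwise_training_set(features, pairs):
    # pairs: (i, j) index pairs where video i was judged more interesting than video j.
    X, y = [], []
    for i, j in pairs:
        X.append(features[i] - features[j])  # correct ordering -> positive example
        y.append(1)
        X.append(features[j] - features[i])  # reversed ordering -> negative example
        y.append(0)
    return np.array(X), np.array(y)

def predict_more_interesting(model, feat_a, feat_b):
    # Returns "a" if video a is predicted to be more interesting than video b, else "b".
    return "a" if model.predict((feat_a - feat_b).reshape(1, -1))[0] == 1 else "b"

# Toy example: 100 videos with random 512-dimensional placeholder features.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 512))
pairs = [(i, j) for i, j in zip(rng.integers(0, 100, 200), rng.integers(0, 100, 200)) if i != j]
X, y = make_pairwise_training_set(features, pairs)
model = LinearSVC().fit(X, y)
print(predict_more_interesting(model, features[0], features[1]))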