Intelligent Multimodal Augmented Video Motion Retrieval System

Video is gaining importance as a medium for capturing and disseminating information. This is the case not only for personal use but also, and most importantly, for professional and educational applications. With the enormous growth of video collections, effective and efficient content-based retrieval of videos, or parts of videos, is becoming more and more essential. Conventionally, video retrieval relies on metadata such as manual annotations, or on inherent features extracted from the video. However, the most decisive information that distinguishes video content from static content, namely the movement of individual objects across successive frames, has so far been largely ignored. This is particularly the case for so-called augmented video, where additional spatio-temporal data on the movement of objects (e.g., captured by dedicated sensor systems) is available in addition to the actual video content.

The iMotion project will develop and evaluate innovative multi-modal user interfaces for interacting with augmented videos. Starting with an extension of existing query paradigms (keyword search in manual annotations, image search by means of query by example in key frames), iMotion will consider novel sketch- and speech-based user interfaces. In particular, novel types of motion queries will be supported in which users can specify motion paths of objects via sketches, gestures, natural language interfaces, or combinations thereof. Several types of user interfaces (voice, tablets, multi-touch tables, interactive paper) will be supported and seamlessly combined, so that a session can migrate smoothly from one type of user interface to another while a query is being specified and refined. This will be based on novel approaches to representation learning and to the extraction of high-level motion descriptors from augmented videos, based on a motion ontology. In addition, iMotion will develop novel index structures that jointly support traditional video features and the additional motion metadata.

A major contribution will be the quantitative and qualitative evaluation, including user studies, of the intelligent multi-modal interfaces and query paradigms developed, in two concrete use cases. Sample applications from which the project will select include, but are not limited to, augmented sports videos in which users search on the basis of trajectories of player or ball movements, educational videos from the natural sciences in which users search for animal movements inside a herd or a swarm, and sketch-based searches for currents in the sea captured by sensors integrated into buoys.
The iMotion consortium will openly publish the augmented video collections and the motion metadata created in the course of the project’s evaluation activities.