DOCTORAL THESIS: Motion Annotation in Complex Video Datasets
By Mahmood Muhammad Habib
Supervised by Dr. Arnau Oliver Malagelada
An in-depth analysis of computer vision methodologies depends greatly on the benchmarks against which they are tested. A dataset is only as valuable as its coverage of the true diversity of the problem it encloses. Motion segmentation is a preprocessing step in computer vision whose publicly available datasets have notable limitations: some are not up to date with modern requirements on sequence length and number of motions, while others lack complete ground truth. In this thesis, we present a collection of diverse, multifaceted motion segmentation benchmarks containing both trajectory- and region-based ground truth. These datasets enclose real-life long and short sequences with larger numbers of motions and frames per sequence, as well as real distortions and missing data. Ground truth is provided on all frames of all sequences. A comprehensive benchmark evaluation of state-of-the-art motion segmentation algorithms is provided to establish the difficulty of the problem and to serve as a starting point for future work.
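For trajectory-based ground truth, a standard benchmark evaluation compares each algorithm's cluster assignment against the annotated motion labels under the best one-to-one matching of cluster identities. The sketch below illustrates this misclassification-rate metric; it is a generic illustration (a brute-force matching over label permutations, not the thesis's evaluation code) and assumes at least as many predicted clusters as ground-truth motions.

```python
from itertools import permutations

def misclassification_rate(gt_labels, pred_labels):
    """Fraction of trajectories mislabeled under the best one-to-one
    mapping between predicted clusters and ground-truth motions.

    Illustrative sketch: tries every assignment of predicted clusters to
    ground-truth motions, which is fine for the small motion counts
    typical of motion segmentation benchmarks. Assumes the number of
    predicted clusters is >= the number of ground-truth motions.
    """
    gt_ids = sorted(set(gt_labels))
    pred_ids = sorted(set(pred_labels))
    best_correct = 0
    for perm in permutations(pred_ids, len(gt_ids)):
        mapping = dict(zip(perm, gt_ids))  # predicted id -> ground-truth id
        correct = sum(1 for g, p in zip(gt_labels, pred_labels)
                      if mapping.get(p) == g)
        best_correct = max(best_correct, correct)
    return 1.0 - best_correct / len(gt_labels)
```

For example, a prediction that merely swaps the two cluster labels scores a 0.0 error, while one that splits each motion in half scores 0.5.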
Ground-truth annotation of motion segmentation in arbitrary real-life videos is a difficult and challenging task. The research community lacks a standard annotation tool for such datasets, which leaves it an open research field. In this PhD thesis, we propose an annotation tool for trajectories in complex videos, providing a publicly available platform to create and extend motion segmentation datasets. Its user-friendly interface allows users to refine an initial automatic segmentation result to produce ground-truth annotation of all motions on all frames of a given sequence. In long videos with multiple rigid and non-rigid motions, complete occlusions, and real distortions, our tool enables rapid, semi-automatic annotation of motion.
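The semi-automatic workflow, where a user correction to an automatic labeling is spread to nearby trajectories rather than fixed one point at a time, can be sketched as follows. This is a toy illustration of the refinement idea, not the tool's actual algorithm; the pixel `radius` and the nearest-neighbour propagation rule are assumptions for the example.

```python
import math

def refine_labels(points, labels, corrections, radius=20.0):
    """Apply user corrections to an automatic trajectory labeling and
    propagate each correction to nearby trajectories.

    points      : list of (x, y) trajectory positions in a reference frame
    labels      : list of automatic cluster labels, one per trajectory
    corrections : dict {trajectory_index: corrected_label} from user clicks
    radius      : propagation radius in pixels (illustrative default)
    """
    refined = list(labels)
    for idx, new_label in corrections.items():
        cx, cy = points[idx]
        old_label = labels[idx]
        for j, (x, y) in enumerate(points):
            # Only re-label trajectories that shared the wrong label
            # and lie close to the corrected one.
            if labels[j] == old_label and math.hypot(x - cx, y - cy) <= radius:
                refined[j] = new_label
    return refined
```

A single click thus fixes a whole cluster of nearby mislabeled trajectories, which is what makes annotating every motion on every frame of a long sequence tractable.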
The motion cue is pivotal in moving-object analysis, which underpins motion segmentation and detection. These preprocessing tasks are building blocks for applications such as recognition, matching, and estimation. To devise a robust algorithm for motion analysis, a comprehensive dataset for evaluating its performance is imperative. The main limitation in creating such datasets is generating ground-truth annotation of motion, as each moving object may span multiple frames while changing in size, illumination, and angle of view. Beyond these optical changes, an object can undergo occlusion by static or moving occluders, and the challenge increases many-fold when the video is captured by a moving camera. In this thesis, we also tackle the task of providing ground-truth annotation of motion regions in videos captured from a moving camera. With minimal manual annotation of an object mask, we propagate the label mask through all frames. Object-label correction for static and moving occluders is also performed by tracking occluder masks under a given depth ordering. A motion annotation dataset is also proposed to evaluate the algorithm's performance. The results show that our cascaded-naive approach succeeds on a variety of video sequences.
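The core of the mask-propagation-with-occluders idea can be sketched in a few lines: warp the object's mask into the next frame by per-pixel motion, then carve out any pixels claimed by an occluder that the depth ordering places closer to the camera. This is a minimal sketch under simplified assumptions (masks as pixel sets, a precomputed per-pixel flow dictionary, a single scalar depth per object), not the thesis's cascaded approach.

```python
def propagate_mask(mask, flow, occluder_mask, object_depth, occluder_depth):
    """Propagate an object's pixel mask to the next frame and remove
    pixels covered by a closer occluder.

    mask           : set of (x, y) pixels belonging to the object
    flow           : dict {(x, y): (dx, dy)} per-pixel motion to next frame
    occluder_mask  : set of (x, y) occluder pixels in the next frame
    object_depth, occluder_depth : smaller value = closer to the camera
    """
    # Warp each pixel of the object mask by its motion vector
    # (pixels with no flow entry are assumed static).
    warped = set()
    for (x, y) in mask:
        dx, dy = flow.get((x, y), (0, 0))
        warped.add((x + dx, y + dy))
    # The depth ordering decides which object wins where masks overlap.
    if occluder_depth < object_depth:
        warped -= occluder_mask
    return warped
```

Pixels removed this way remain associated with the object's identity, so the label can reappear once the occluder moves on, which is how complete occlusions are survived without re-annotation.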