Application of Laplacian Mixture Model to Image and Video Retrieval
Thesis posted on 08.06.2021, 12:40 by Tahir Amin
In this study we present a new approach to feature extraction for image and video retrieval. A Laplacian mixture model is proposed to model the peaky distributions of wavelet coefficients. The proposed method extracts a low-dimensional feature vector, which is important for the retrieval efficiency of the system in terms of response time. Although the importance of an effective feature set cannot be overemphasized, it is very hard to describe image similarity with low-level features alone. Learning from user feedback can enhance system performance significantly. This approach, known as relevance feedback, is adopted to further improve the efficiency of the system. The system learns from user input in the form of positive and negative examples, and its parameters are modified according to the user's behaviour. The parameters of the Laplacian mixture model are used to represent the texture information of the images. The experimental evaluation indicates the high discriminatory power of the proposed features.

Traditional measures of distance between two vectors, such as the city-block or Euclidean distance, are linear in nature. The human visual system does not follow this simple linear model. Therefore, a non-linear approach to measuring the similarity between two images is also explored in this work. It is observed that non-linear modelling of similarity yields more satisfactory results, increasing retrieval performance by 7.5 per cent.

Video is primarily multi-modal, i.e., it contains different media components such as audio, speech, visual information (frames) and captions (text). Traditionally, visual information is used for video indexing and retrieval. The visual contents of a video are very important; however, in some cases visual information offers few clues to the events taking place.
For example, certain action sequences, such as goal events in a soccer game or an explosion in a news video, are easier to identify in the audio domain than in the visual domain. Since the proposed feature extraction scheme is based on the shape of the wavelet coefficient distribution, it can also be applied to analyze the embedded audio content of the video. We use audio information to index video clips. A feedback mechanism is also studied to improve the performance of the system.
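The core idea of the proposed features, fitting a mixture of zero-mean Laplacian densities to the peaky distribution of wavelet coefficients and keeping the fitted parameters as a compact descriptor, can be sketched with a small expectation-maximization loop. This is a minimal illustration, not the thesis's exact procedure: the synthetic data standing in for one subband, the two-component count, and the initialization are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for one wavelet subband: a peaky, heavy-tailed
# mixture of two zero-mean Laplacian sources (scales 0.5 and 3.0).
coeffs = np.concatenate([rng.laplace(0.0, 0.5, 3000),
                         rng.laplace(0.0, 3.0, 1000)])

def laplace_pdf(x, b):
    """Zero-mean Laplacian density with scale parameter b."""
    return np.exp(-np.abs(x) / b) / (2.0 * b)

# Initial guesses for the mixture weights and scales.
w = np.array([0.5, 0.5])
b = np.array([0.1, 1.0])

for _ in range(100):
    # E-step: responsibility of each component for each coefficient.
    p = w * laplace_pdf(coeffs[:, None], b)          # shape (N, 2)
    r = p / p.sum(axis=1, keepdims=True)
    # M-step: closed-form re-estimation of weights and scales.
    w = r.mean(axis=0)
    b = (r * np.abs(coeffs)[:, None]).sum(axis=0) / r.sum(axis=0)

# The fitted weights and scales form a low-dimensional texture
# descriptor for this subband.
feature = np.concatenate([w, b])
```

The zero-mean assumption is natural for detail subbands, which makes the M-step a simple weighted mean of absolute values rather than requiring a weighted-median location update.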
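One common way to realize learning from positive and negative examples is a Rocchio-style query-point movement, which shifts the query feature vector toward images the user marks as relevant and away from those marked non-relevant. The abstract does not state which update rule the thesis uses, so the scheme and the weights below are illustrative assumptions only.

```python
import numpy as np

def rocchio_update(query, positives, negatives,
                   alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query feature vector toward the mean of the positive
    examples and away from the mean of the negative examples.
    alpha, beta, gamma weight the original query, the positive pull,
    and the negative push respectively (conventional default values)."""
    q = alpha * np.asarray(query, dtype=float)
    if len(positives) > 0:
        q = q + beta * np.mean(positives, axis=0)
    if len(negatives) > 0:
        q = q - gamma * np.mean(negatives, axis=0)
    return q

# One feedback round: the user marks one image relevant, one not.
q0 = np.array([0.0, 0.0])
q1 = rocchio_update(q0, positives=[[1.0, 1.0]], negatives=[[-1.0, 0.0]])
# q1 is pulled toward the positive example and pushed away from the
# negative one, so the next retrieval round ranks similar images higher.
```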