Two-stream fusion edge detection network
Thesis posted on 24.05.2021, 12:09 by Hasan W. Almawi
This thesis introduces a method for combining static and dynamic features in a convolutional neural network (CNN) to produce a motion and object boundary prediction map. Supplying the CNN with both static and dynamic cues improves its predictions. The spatial stream of the CNN learns to compute an object boundary prediction map from a single RGB frame, while the temporal stream learns to compute a motion boundary prediction map from the corresponding optical flow map. The streams are then combined through an encoder-decoder architecture, in which the decoder learns to fuse the features from both streams to obtain a task-specific output. The proposed method yields state-of-the-art results on a motion boundary benchmark and systematic improvements on object boundary benchmarks over methods that rely solely on static features extracted from a single RGB frame.
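The two-stream data flow described above can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the single 3x3 convolution per stream, the channel-wise concatenation used for fusion, the 1x1 decoder, the feature width, and all names (`conv2d_same`, `two_stream_forward`, `params`) are assumptions chosen only to make the shapes concrete. The weights are random, so the output is untrained.

```python
import numpy as np


def conv2d_same(x, w):
    """Naive zero-padded convolution (cross-correlation form).

    x: input of shape (C_in, H, W); w: kernels of shape (C_out, C_in, k, k), k odd.
    Returns an array of shape (C_out, H, W).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + h, j:j + wd]  # shifted view, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def two_stream_forward(rgb, flow, params):
    """Encode each input in its own stream, fuse, and decode a boundary map."""
    spatial = relu(conv2d_same(rgb, params["spatial"]))     # static cues from the RGB frame
    temporal = relu(conv2d_same(flow, params["temporal"]))  # dynamic cues from optical flow
    fused = np.concatenate([spatial, temporal], axis=0)     # assumed fusion: channel concat
    logits = conv2d_same(fused, params["decoder"])          # assumed decoder: 1x1 convolution
    return sigmoid(logits)                                  # per-pixel boundary probability


rng = np.random.default_rng(0)
F = 8  # hypothetical number of feature channels per stream
params = {
    "spatial": rng.normal(scale=0.1, size=(F, 3, 3, 3)),   # RGB has 3 channels
    "temporal": rng.normal(scale=0.1, size=(F, 2, 3, 3)),  # flow has 2 channels (u, v)
    "decoder": rng.normal(scale=0.1, size=(1, 2 * F, 1, 1)),
}

rgb = rng.random((3, 32, 48))        # stand-in for a single RGB frame
flow = rng.normal(size=(2, 32, 48))  # stand-in for the corresponding optical flow map
pred = two_stream_forward(rgb, flow, params)
print(pred.shape)
```

In this sketch the decoder is the only part that sees both streams, which mirrors the idea that fusion is learned in the decoder. Training the same decoder against object boundary or motion boundary labels would give the task-specific output mentioned in the abstract.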