Texture is area-filling, while a silhouette is "just a line", so doesn't it sound natural that texture areas would weigh more in an image? If there is both texture and contour "signal" in the labeled inputs, how do you pick which gets weighted more? This is supposed to be learned by training.
I think this is one of the reasons to use many more video-based training sets, with the frames in their correct temporal order. A moving boundary is something quite significant, but such boundaries should possibly not be inspected exclusively through a hole ('convolutinal').
Oh, forgot an 'o'.
Did you know you can edit your comments for two hours? Click "edit" above the comment.