Abstract: We present a fully automatic arm and hand tracker that detects joint positions over continuous sign language video sequences of more than an hour in length. Our framework replicates the state-of-the-art long-term tracker by Buehler et al. (IJCV 2011), but does not require manual annotation and, after automatic initialisation, performs tracking in real time. We cast the problem as a generic frame-by-frame random forest regressor without a strong spatial model. Our contributions are (i) a co-segmentation algorithm that automatically separates the signer from any signed TV broadcast using a generative layered model; (ii) a method for predicting joint positions given only the segmentation and a colour model, using a random forest regressor; and (iii) a demonstration that the random forest can be trained from an existing semi-automatic, but computationally expensive, tracker. The method is applied to signing footage with changing backgrounds, challenging imaging conditions, and different signers. We achieve superior joint localisation results to those obtained with the method of Buehler et al.
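As a rough illustration of the frame-by-frame formulation (not the authors' implementation), the sketch below trains a scikit-learn RandomForestRegressor to map a flattened foreground segmentation mask directly to 2D joint coordinates, with no spatial or temporal model between frames. The mask resolution, joint count, and synthetic training data are all assumptions for illustration; the paper's forest uses richer features (segmentation plus a colour model) and is trained on output of the semi-automatic Buehler et al. tracker.

```python
# Minimal sketch: per-frame random forest regression of joint positions
# from a segmentation mask. All sizes and data below are illustrative
# assumptions, not the paper's actual features or training set.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

RNG = np.random.default_rng(0)
MASK_SHAPE = (32, 32)   # downsampled segmentation mask (assumed size)
N_JOINTS = 7            # e.g. head, shoulders, elbows, wrists (assumed)

def make_synthetic_frame():
    """Generate a toy (mask, joints) pair standing in for real training data."""
    joints = RNG.uniform(4, 28, size=(N_JOINTS, 2))       # (x, y) per joint
    mask = np.zeros(MASK_SHAPE, dtype=np.float32)
    for x, y in joints:                                   # blob around each joint
        xs = slice(max(int(x) - 2, 0), int(x) + 3)
        ys = slice(max(int(y) - 2, 0), int(y) + 3)
        mask[ys, xs] = 1.0
    return mask.ravel(), joints.ravel()

# Build a toy training set of (flattened mask -> joint coordinates) pairs.
X, Y = zip(*(make_synthetic_frame() for _ in range(500)))
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(np.array(X), np.array(Y))

# Per-frame prediction: each frame is regressed independently,
# mirroring the "no strong spatial model" formulation.
test_mask, true_joints = make_synthetic_frame()
pred = forest.predict(test_mask[None])[0].reshape(N_JOINTS, 2)
print("mean joint error (px):",
      np.abs(pred - true_joints.reshape(N_JOINTS, 2)).mean())
```

Because each frame is handled independently, prediction is a single forest evaluation per frame, which is what makes real-time tracking after initialisation plausible.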