A 2-D Wrist Motion Based Sign Language Video Summarization

Title: A 2-D Wrist Motion Based Sign Language Video Summarization
Publication Type: Conference Proceedings
Year of Conference: 2021
Authors: Sartinas, E, Psarakis, E, Antzakas, K, Kosmopoulos, D
Conference Name: British Machine Vision Conference - ORAL PRESENTATION

In this paper we present a keyframe extraction scheme based on wrist motion, using differential geometry. More specifically, the time (t)-parameterized Frenet-Serret frame is used to track the signer's wrist, and the curvature of the resulting trajectory is proposed for identifying the keyframes of a Sign Language (SL) video: a video frame is characterized as a keyframe if the t-parameterized curvature function attains a maximum value at that time instant. A skeleton tracker is used in order to properly define the 2-D wrist motion model. The proposed scheme is adaptable, i.e., the number of extracted keyframes varies according to the complexity of the signs, while the semantic content is preserved. This in turn makes it attractive for applications such as video calling. Its performance, in terms of the achieved compression and intelligibility ratios, was evaluated on a ground-truth sequence, where it outperformed its s-parameterized counterpart (s being the arc length); it also outperformed a moment-based SL summarization technique. Furthermore, the proposed scheme was experimentally evaluated by SL specialists on a dataset containing 5500 signs, with very promising results. Finally, the proposed keyframe extraction was evaluated against the aforementioned techniques on the same dataset, using a GRU neural network on the gloss classification problem; its superior accuracy in identifying the gloss meaning was confirmed.
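
The keyframe criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the wrist positions are already available as two arrays of 2-D coordinates sampled at a uniform frame rate (for example, from some skeleton tracker), it estimates derivatives with plain finite differences and no smoothing, and it treats every local peak of the curvature as a keyframe. The function names `curvature_2d` and `keyframes_from_curvature` are hypothetical. The curvature used is the standard formula for a t-parameterized planar curve, κ(t) = |x'y'' − y'x''| / (x'² + y'²)^(3/2).

```python
import numpy as np


def curvature_2d(x, y, dt=1.0):
    """Curvature of a t-parameterized planar trajectory (x(t), y(t)).

    Assumption: samples are uniformly spaced in time with step dt.
    Derivatives are approximated by finite differences (np.gradient).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = np.gradient(x, dt)
    dy = np.gradient(y, dt)
    ddx = np.gradient(dx, dt)
    ddy = np.gradient(dy, dt)
    speed_sq = dx ** 2 + dy ** 2
    # Guard against division by zero when the wrist is (almost) at rest.
    eps = 1e-12
    return np.abs(dx * ddy - dy * ddx) / np.maximum(speed_sq, eps) ** 1.5


def keyframes_from_curvature(kappa):
    """Indices of interior samples where the curvature has a local peak.

    Assumption: a keyframe is any frame whose curvature is strictly larger
    than the previous frame's and at least as large as the next frame's.
    """
    k = np.asarray(kappa, dtype=float)
    is_peak = (k[1:-1] > k[:-2]) & (k[1:-1] >= k[2:])
    return np.flatnonzero(is_peak) + 1
```

As a quick check of the sketch, a trajectory that follows the parabola y = x² over a symmetric interval has a single curvature peak at its vertex, so `keyframes_from_curvature(curvature_2d(t, t**2))` with `t = np.linspace(-1.0, 1.0, 201)` selects only the middle frame. Real tracker output is noisy, so in practice the trajectory would likely need smoothing before differentiation; that step is omitted here.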