What makes Sign Recognition hard? I touched on this topic briefly in an earlier post. Simply put, there is a lot going on in sign language. In spoken languages, you only have to listen to one stream of sound; in sign language you have to pay attention to the face, the body, and the hands and arms simultaneously.
Another part of the problem is the complexity of sign language itself. Facial expressions and body posture are part of the language, so some form of facial recognition and expression detection is needed (although I ignore this in my research; that's a topic for another post). The signs themselves vary when used in a sentence, much like the sounds of words change slightly when spoken in different sentences and in different contexts.
Variation is another source of problems. Each individual performs the same sign differently, similar to how different people sound different in spoken languages. Even the same individual varies slightly when signing at different times. On top of that, there are regional and local variations of the same sign. This is one reason why I restricted my research to signs used in Metro Manila; if I hadn't, I would never finish.
The other source of difficulty is the general difficulty of computer vision. How do you tell the background from the foreground? How do you distinguish several people in one image or video? Humans have an incredible ability to recognize faces and postures even when viewing from the side; how do we duplicate that ability in computers? To sidestep these issues, I recorded one person signing, wearing a plain black shirt in front of a plain black background.
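The payoff of that recording setup can be sketched in code: with a dark shirt against a dark background, the bright regions (face and hands) can be pulled out with a simple brightness threshold. This is only an illustrative sketch, not the method from my research; the function name and threshold value are my own invention, and real footage would also need smoothing, colour cues, and morphological clean-up.

```python
import numpy as np

def segment_foreground(frame: np.ndarray, threshold: int = 60) -> np.ndarray:
    """Return a boolean mask that is True where a pixel is brighter than
    the (dark) background, i.e. likely to be skin: the face or hands.

    `threshold` is a made-up example value; in practice it would be
    tuned to the lighting of the actual recordings."""
    return frame > threshold

# Toy 4x4 grayscale "frame": a dark background with one bright patch
# standing in for a hand.
frame = np.array([
    [10,  12,  11,  9],
    [10, 200, 210, 12],
    [11, 205, 198, 10],
    [ 9,  10,  12, 11],
], dtype=np.uint8)

mask = segment_foreground(frame)
print(mask.sum())  # 4 pixels classified as foreground
```

With an uncontrolled background this trick falls apart immediately, which is exactly why the recording constraints above matter so much.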