Can Large Language Models (LLMs) translate Sign Language?


Image: four people in different video frames using sign language

Large Language Models (LLMs) are increasingly becoming an integral part of modern society. LLMs are a type of machine learning technology capable of generating or translating language as human-like text. ChatGPT, YouTube auto-captioning, and even search engines such as Google or Bing utilise LLMs trained on large data sets to produce natural, “sensible” responses to queries, summarise text documents, complete sentences, and translate spoken or written text from one language into another.

However, the vast majority of LLMs have only ever been tested with spoken languages, with the signed languages of hearing-impaired communities often neglected in tests of their effectiveness – until now. Dr Hossein Rahmani of Lancaster’s School of Computing and Communications, alongside Professor Jun Liu and PhD students Jia Gong, Lin Geng Foo, and Yixuan He of Singapore University of Technology and Design, took an off-the-shelf LLM and prompted it with sign language videos by transferring each sign video sequence into a language-like hierarchical structure (i.e., a “character-word-sentence” structure), making the sign language compatible with standard LLMs.
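The article does not give implementation details, but the rough shape of the idea can be sketched. The snippet below is a minimal, hypothetical illustration, assuming a video encoder has already reduced each clip to discrete cluster ids: character-like tokens are grouped into word-like units and wrapped in a plain-text prompt that a standard LLM can read. The codebook, function names and prompt wording are all placeholders, not the team’s actual code.

```python
# Illustrative sketch of a "character-word-sentence" hierarchy for sign video.
# All names, the codebook and the prompt format are hypothetical stand-ins,
# not the team's actual implementation.

from typing import List

# Pretend codebook: each cluster of video features maps to a character-like token.
CODEBOOK = {0: "c01", 1: "c02", 2: "c03", 3: "c04"}


def frames_to_characters(frame_cluster_ids: List[int]) -> List[str]:
    """Map per-clip cluster ids (from a video encoder) to character-like tokens."""
    return [CODEBOOK[i] for i in frame_cluster_ids]


def characters_to_words(chars: List[str], word_length: int = 2) -> List[str]:
    """Group consecutive character tokens into word-like units."""
    return ["-".join(chars[i:i + word_length])
            for i in range(0, len(chars), word_length)]


def build_prompt(words: List[str], target_language: str) -> str:
    """Wrap the pseudo-sentence in a text prompt an off-the-shelf LLM can read."""
    pseudo_sentence = " ".join(words)
    return (f"The following tokens describe a sign language video: {pseudo_sentence}\n"
            f"Translate it into {target_language}:")


if __name__ == "__main__":
    # Dummy cluster ids standing in for the output of a video encoder.
    cluster_ids = [0, 1, 2, 3, 1, 0]
    chars = frames_to_characters(cluster_ids)
    words = characters_to_words(chars)
    print(build_prompt(words, "German"))
```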

The team’s approach to translating sign languages differs from those typically used, which have predominantly relied on basic explanations of the meaning of each sign, known as “glosses”, to construct the translation. Whilst effective at translating individual signed words, a gloss-based approach tends to suffer from informational loss, as facial expressions and head movements are often not transcribed into glosses. Additionally, the process relies heavily on a specialist annotating the signs in the first place in order to train the translation model, making it incredibly labour-intensive and potentially unviable, or even impossible, for rarer signed languages. By using an LLM instead, Dr Rahmani and the team bypass the need both to transcribe the language and to adjust a pre-trained model for sign language translation: once the videos have been converted into a structure it can understand, the LLM is capable of performing the translation itself. This project marks the first time an off-the-shelf LLM has been used to translate any form of sign language.
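To make the distinction concrete, the two kinds of training data might look roughly like the records below. This is an illustrative sketch only; the field names and example content are invented and are not drawn from the datasets the team used.

```python
# Hypothetical training records contrasting gloss-based and gloss-free data.
# All field names and example content are invented for illustration.

# Gloss-based sample: a specialist must transcribe every sign, and cues such
# as facial expression or head movement are usually not captured.
gloss_based_sample = {
    "video": "clip_0001.mp4",
    "glosses": ["TOMORROW", "RAIN", "NORTH"],  # expert annotation, sign by sign
    "translation": "It will rain in the north tomorrow.",
}

# Gloss-free sample: only the raw video and its target sentence are needed,
# which also keeps rarer signed languages (with no gloss dictionaries) in reach.
gloss_free_sample = {
    "video": "clip_0001.mp4",
    "translation": "It will rain in the north tomorrow.",
}

print("Extra annotation required:", set(gloss_based_sample) - set(gloss_free_sample))
```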

The team tested this new LLM-based framework on two sign languages, one German and one Chinese, asking the model to translate the signed content into the written form of each respective spoken language. To measure the relative success of their approach, they employed a number of commonly used metrics designed for evaluating machine translation. These metrics assess the accuracy of a translation, calculating how many words the LLM translated correctly as well as how many of those words appear in the correct order.
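The article does not name the exact metrics, but BLEU is a commonly used machine-translation metric of precisely this kind, rewarding correct words and correct word order through n-gram overlap. The snippet below is a generic illustration using the sacrebleu library, not the team’s evaluation code.

```python
# Generic example of a standard machine-translation metric (BLEU) of the kind
# described above; which metrics the team actually used is not stated here.
# Requires: pip install sacrebleu

import sacrebleu

# A model's candidate translations and the matching human references.
hypotheses = ["it will rain in the north tomorrow"]
references = [["tomorrow it will rain in the north"]]  # one reference per hypothesis

# BLEU counts matching n-grams, so it rewards both correct words and
# (via longer n-grams) correct word order.
score = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {score.score:.1f}")
```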

They found that their framework – dubbed SignLLM – produced more accurate, human-like translations of the signed languages than other contemporary approaches. In particular, SignLLM performed significantly better than other gloss-free sign language translators, especially on the translation of longer sentences.

On the success of the SignLLM tests, Dr Hossein Rahmani commented: “By harnessing Large Language Models (LLMs), we can effectively leverage their strong linguistic abilities and semantic understanding that have been acquired from large-scale training over many languages. This can be very helpful for facilitating the translation of sign languages, especially the lesser-known sign languages where there is often limited available data. Thus, our work represents an advancement towards better accessibility and inclusivity in communication with the hearing-impaired community.

“Moreover, our work hints at a potential paradigm shift in the technical aspects. Rather than solely focusing on direct modelling and learning of sign language translation, it suggests a pivot towards enhancing the extraction of language-like representations from sign videos which can be well-understood by LLMs. This also underscores the remarkable versatility of LLMs, pointing towards potential applications across modalities such as video and 3D models”.
