Over the past decade, the ALT group has played an important role in advancing the field of Natural Language Processing (NLP) and has developed language processing capabilities that support a range of content analyses, from named entity recognition to discourse analysis and dialog processing. By tackling the challenges posed by the Arabic language and its dialects, the ALT group has emerged as a world leader in speech recognition, machine translation and question answering.
ALT Research Areas
Natural Language Processing
The ALT group has developed highly effective techniques and toolkits for fully automatic processing of Arabic text, including morphological analysis, part-of-speech tagging, parsing, diacritization, named entity recognition, and spelling correction.
The ALT group has developed technologies to process large volumes of social media content in English and Arabic, and has created models that extract entities and detect specific textual properties.
In response to widespread concern about misinformation and bias in published content, the ALT group has worked with the international community to develop methods for exposing fake news and for helping readers assess the accuracy of the facts presented in a text.
Learning to speak a language requires training and practice. Technologies that give learners real-time feedback on their pronunciation of specific words now make it possible to support that process.
ALT researchers are working with the international community on developing methods that are effective in analyzing speech with limited data and linguistic resources, as is the case with Arabic dialects.
Understanding Linguistic Models
Deep neural network (DNN) models have proven effective at automating content generation, classification and prediction tasks. However, systems and solutions that are subject to quality assurance require an in-depth understanding of how these models work, both to ensure consistent performance and to control the quality of the output.
The ALT team has made significant inroads in understanding the properties of DNN models trained on textual data and has developed methods to analyze the concepts these models learn.