Muhannad Alomari


My PhD: Natural Language Acquisition and Grounding from Language and Vision.

Understanding how children learn the components of their mother tongue and the meanings of its words has long fascinated AI researchers and cognitive scientists. Robots face a similar challenge: pre-programming this knowledge is no easy task, nor does it cope with the way language changes over time. In this thesis I show how a robot can start with no such knowledge and gradually acquire components of natural language together with their groundings in the perceptual world.

I present a novel learning approach capable of acquiring symbolic knowledge in language and vision simultaneously, and of using this knowledge to parse previously unseen natural language commands. Learning follows a show-and-tell procedure, inspired by how children acquire knowledge of their everyday physical world by interacting with their parents. Volunteers controlled a robot to perform a variety of tabletop tasks, which were subsequently annotated with natural language commands, as shown in the figure to the right.
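The core idea of grounding words from paired commands and demonstrations can be illustrated with a minimal sketch of cross-situational learning: across many show-and-tell episodes, each word is associated with the perceptual concept it most reliably co-occurs with. This is only an illustrative toy, not the actual algorithm from the thesis; the function name, data format, and scoring rule are assumptions for the example.

```python
from collections import defaultdict

def ground_words(demonstrations):
    """Toy cross-situational grounding (illustrative, not the thesis algorithm).

    demonstrations: list of (command_words, observed_concepts) pairs, e.g.
    (["move", "the", "red", "block"], {"red", "block", "move"}).
    Returns a dict mapping each word to its best (concept, score) pair,
    where score is the conditional probability P(concept | word).
    """
    word_counts = defaultdict(int)   # how many episodes each word appears in
    pair_counts = defaultdict(int)   # word/concept co-occurrence counts
    for words, concepts in demonstrations:
        for w in set(words):         # count each word once per episode
            word_counts[w] += 1
            for c in concepts:
                pair_counts[(w, c)] += 1
    groundings = {}
    for w in word_counts:
        # Pick the concept with the highest P(concept | word).
        groundings[w] = max(
            ((c, n / word_counts[w]) for (ww, c), n in pair_counts.items() if ww == w),
            key=lambda pair: pair[1],
        )
    return groundings
```

With a few episodes featuring different objects, a word like "red" becomes grounded to the red percept because it co-occurs with it in every episode, while function words like "the" spread their co-occurrence mass across all concepts. Real systems must of course handle noise, ambiguity, and richer relational structure than this toy does.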

Cognitive Plausibility.

This work aims to answer two questions: (1) can a robot bootstrap its knowledge of language and vision, and (2) can it ground language to concepts in vision?
To answer these questions in a cognitively plausible setting, I take into account that human learning is incremental and typically only loosely supervised. Accordingly, my system learns incrementally from human descriptions of the world, and the outcome of the learning process is represented in a form understandable by humans.

For more information, please refer to my personal website.