Researchers have trained an AI model to learn language through the eyes and ears of a single child, using headcam video recordings from when the child was six months old until their second birthday. The study, published in Science, reveals new insights into how children learn language and how AI can mimic human learning processes.
How the AI model learned from a child
The researchers used 61 hours of video and audio from a head-mounted camera worn by the child, capturing about 1% of the child's waking hours. They identified roughly 250,000 word instances spoken to the child and paired each with the video frames showing what the child saw at that moment. The footage covers everyday activities across the recording period, such as mealtimes, reading books, and playtime.
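The article does not say exactly how spoken words were matched to frames. One simple scheme, assumed here purely for illustration, is to pair each transcribed utterance with every frame whose timestamp falls inside the utterance's time window. The `Utterance` and `Frame` types, their fields, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str
    start: float  # hypothetical: seconds from the start of the recording
    end: float


@dataclass
class Frame:
    path: str
    time: float  # hypothetical: capture time in seconds


def pair_utterances_with_frames(utterances, frames):
    """Pair each utterance with every frame captured while it was being spoken."""
    pairs = []
    for utt in utterances:
        for frame in frames:
            if utt.start <= frame.time <= utt.end:
                pairs.append((utt.text, frame.path))
    return pairs


utterances = [Utterance("look at the ball", 1.0, 2.5), Utterance("yummy apple", 10.0, 11.0)]
frames = [Frame("f0.jpg", 0.5), Frame("f1.jpg", 1.5), Frame("f2.jpg", 2.0), Frame("f3.jpg", 10.5)]
print(pair_utterances_with_frames(utterances, frames))
# [('look at the ball', 'f1.jpg'), ('look at the ball', 'f2.jpg'), ('yummy apple', 'f3.jpg')]
```

Frames that fall outside every utterance window (like `f0.jpg` here) simply produce no training pair.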
The researchers trained a multimodal neural network, which combines visual and linguistic data, using a technique called contrastive learning. Contrastive learning builds associations by pulling together the representations of words and video frames that occur together while pushing apart those of pairs that do not. By aligning these visual and linguistic cues, the model learned words and concepts present in the child's everyday experience.
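As a rough sketch of what a contrastive objective computes, the code below implements a symmetric InfoNCE loss in the style of CLIP, in plain Python. This is an assumption for illustration, not the study's code: the encoders that would produce the embeddings, the batch construction, and the `temperature` value of 0.07 are all hypothetical choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def log_softmax(row):
    """Numerically stable log-softmax of a list of scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss (assumed CLIP-style objective).

    image_embs[i] and text_embs[i] form a matched pair; every other
    combination in the batch acts as a negative example.
    """
    n = len(image_embs)
    logits = [[cosine(img, txt) / temperature for txt in text_embs] for img in image_embs]
    # Image -> text: each row should assign the highest score to its own text.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> image: each column should assign the highest score to its own frame.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

Matched pairs give a loss near zero and mismatched pairs a large one; minimizing this loss is what pushes co-occurring words and frames toward similar embeddings.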
What the AI model learned from a child
The researchers tested the AI model by presenting it with a target word and four candidate images, asking it to select the image that matched the word. The model learned a substantial number of words and concepts from the child's experiences, such as animals, food, body parts, colors, shapes, and numbers. It could also generalize learned words to images unlike those it saw during training, such as recognizing a dog in a different pose or of a different breed.
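The four-option test can be expressed as a simple scoring loop. This sketch assumes the model exposes word and image embeddings and picks the candidate with the highest cosine similarity to the word; the function names and the embedding vectors are hypothetical. With four options, random guessing scores 25%, which is the baseline any learned accuracy should be compared against.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def choose_image(word_emb, candidate_embs):
    """Return the index of the candidate image most similar to the word."""
    scores = [cosine(word_emb, c) for c in candidate_embs]
    return max(range(len(scores)), key=scores.__getitem__)


def four_way_accuracy(trials):
    """trials: list of (word_emb, [four image embeddings], index_of_correct_image)."""
    correct = sum(choose_image(w, cands) == target for w, cands, target in trials)
    return correct / len(trials)


# Hypothetical 2-D embeddings; the third trial is answered incorrectly.
trials = [
    ([1.0, 0.0], [[1.0, 0.1], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]], 0),
    ([0.0, 1.0], [[1.0, 0.0], [0.1, 1.0], [-1.0, 0.0], [0.0, -1.0]], 1),
    ([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.9, 1.0]], 2),
]
print(round(four_way_accuracy(trials), 3))  # 0.667
```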
The study also compared the AI model's performance with that of children studied in laboratories. The researchers found that the model learned words at a rate and accuracy similar to those of human children, despite receiving much less input data. The model also showed generalization patterns similar to children's, such as being more likely to generalize words for animals than for artifacts.
Why the AI model learning from a child matters
This study challenges traditional beliefs about language learning, indicating that associative learning from limited input can lead to substantial word learning, much as it does in human children. It also suggests that children may not need language-specific biases or innate knowledge to learn words, relying instead on their ability to associate words with visual contexts.
This research also demonstrates the potential of AI to mimic human language learning processes and to learn from naturalistic data. By studying how AI models learn from real-world experiences, the researchers hope to gain new insights into how children learn language and how to improve AI systems for natural language processing.