Parallel developmental changes in children's drawing and recognition of visual concepts.
December 16, 2020
December 12, 2020
Bria Long, Stanford University
All Captioned Videos SVRHM Workshop 2020
PRESENTER: Our first speaker of the third session is Bria Long. She's been, I believe, two years already at Stanford as a postdoc. Before that, Bria was at the Vision Sciences Lab at Harvard, where she finished her PhD advised by Talia Konkle (one of my former postdoctoral advisors, and still a postdoctoral advisor at Harvard), George Alvarez, and Susan Carey. Today Bria is going to talk to us about parallel developmental changes in children's drawing and recognition of visual concepts. So the stage is yours, Bria. Please, take it away.
BRIA LONG: OK. Thanks, everybody, for having me. I want to start off today with the observation that if I show you an array of colored pixels arrayed in different locations on the screen, you can really easily derive meaning from what you see. Instead of seeing colored blobs, you see bikes, bunnies, couches, and cups. There's incoming visual input that gets transformed and makes contact with a wealth of knowledge that we have about these objects and the categories they belong to. And broadly, my research is concerned with trying to understand how it is that we develop these representations that allow us to derive meaning from what we see.
For example, in just a quick glance, we know that all of these are objects that are big in the real world. They're objects that are big enough to support a human, things we navigate with or around. And in turn, all of these are objects that are small in the real world, small enough to hold with one or two hands, and objects we use to interact with the world. And these are all animals. They're alive and capable of self-generated motion.
And we can make these inferences when these pictures are in full color, or in black and white, or, as some of my other work has shown, when they're reduced to their primitive texture and form features and unrecognizable as bikes or bunnies, per se. But, of course, our knowledge about these objects goes far beyond their animacy and their size in the real world. We know that bicycles have wheels, that cups have handles, and that rabbits have long ears. And given a verbal label for these objects, we can access our mental representations of them and choose the visual features necessary to convey their identity.
At the same time, we know, of course, that we weren't born with these detailed representations. Somehow, we had to learn how to connect these visual features with the appropriate conceptual knowledge about these objects. So, how is it that we develop these visual concepts? How do we develop this flexible generative knowledge about what things look like?
Consider, for example, this six-month-old, my daughter Eloise, who, now a 15-month-old toddler, is still at the very beginning of this journey. How, for example, is she ever going to learn the visual concept of "whale," and understand that it refers to her stuffed animal, her bath toy, and the largest mammal on the planet? And how might her concept of whales change across development as she learns more about her planet and the other animals in it?
Of course, this is not a new question. The prevailing picture of visual conceptual development starts with the picture of an infant as a relatively passive recipient of the regularities in their very rich visual environment, where they see thousands and thousands of exemplars of the different categories they need to learn. In turn, our visual system is thought to do a very good job of extracting these regularities, of carving nature at its joints, with processing mechanisms developing quickly, with an end goal of recognizing things at the basic level as bunnies, cars, and cups.
In turn, we've often thought of children's perceptual representations as maturing relatively early in life. Indeed, even very young infants can distinguish between different basic-level categories in certain paradigms, and by the age of two years, children do a very good job of generalizing visual concepts to their abstracted versions. Today, however, I'll present some evidence that this view is incomplete, joining a long literature now suggesting that visual concept learning continues well throughout childhood. In fact, children become steadily better throughout childhood at discriminating between perceptually similar exemplars, such as faces or houses, at recognizing objects across unusual viewpoints or lighting, at discarding irrelevant features when trying to learn new categories, and at making explicit categorizations based on taxonomy versus irrelevant perceptual features.
And we know that these abilities mature rather gradually, as children experience both real life and depicted versions of various different categories in the world around them. In addition, some work on children's drawings points towards a protracted developmental trajectory for visual concepts. For example, when asked to draw different categories of people, only older children include individual features that tend to distinguish between them. Other work suggests that short-term experience in the lab can change what children draw. Here, children were shown a novel object, such as this one, and then asked, can you copy this drawing of this object? And children who had experience with that object beforehand tended to include information about the hidden side of the object, suggesting that children draw what they know about the objects in the world around them.
And while we sometimes think of drawings as merely symbolizing objects, I want to point out that there's quite a bit of evidence now that drawings do really resemble the physical world. For example, populations without extensive experience with drawings nonetheless succeed in recognizing them, including case studies of adults in an Aboriginal community and a 19-month-old raised with minimal exposure to depictions. Likewise, later layers of deep CNNs that have never been trained on drawings can nonetheless succeed in recognizing them in certain cases. Furthermore, evidence from cognitive neuroscience suggests that high-level visual cortex is similarly recruited when recognizing either drawings or photographs of objects, as well as when producing drawings of object categories, suggesting that recognizing objects and producing drawings of them both rely on general-purpose visual processing mechanisms.
So our hypothesis was that children's visual concepts become more specific throughout childhood. In other words, children might actually get better at knowing what makes a bird look different from a rabbit. We reasoned that if this was the case, we should see two different kinds of effects. First, in children's visual production abilities, we should see an enhanced ability to include the diagnostic features of different categories in their drawings. Second, we should see an effect in visual recognition. That is, if it's really children's internal representations that are changing, we should also see children showing enhanced sensitivity to these diagnostic features when they're asked to recognize drawings of different object categories.
And this will be the outline for the rest of the talk. And I'm going to start with this first section on children's visual production. In order to ask these questions, we needed to develop a way of collecting drawings, and we did so via a touch screen and video prompts where an experimenter asked, what about a bird? Can you draw a bird? And here is one child's answer to that question. She draws a head, beak, eyes, mouth, tail, a couple of legs, and then she's done.
Using this method, we installed a freestanding station in the San Jose Children's Discovery Museum where children came and drew on their own. Doing so, we collected data from 48 different categories, including prompts for both commonly-drawn categories, such as houses and trees, as well as infrequently-drawn categories, such as scissors, couches, and octopuses. After filtering for off-task drawings, we obtained a data set containing more than 37,000 drawings from over 8,000 children from ages 2 to 10 years of age, with, in fact, many, many exemplars for each age and category, showing just a few examples here.
Now, with these data in hand, we needed a principled way of analyzing them. And here, we capitalized on prior work validating the use of deep convolutional neural networks optimized to recognize objects in actual photographs as a good basis for predicting many aspects of human recognition. Indeed, as you all probably know, a wealth of work has shown now that activations in these later layers predict neural responses to object categories in object responsive cortex, perceptual similarity judgments for both novel and familiar categories, category typicality ratings, to some extent, as well as the correspondence between sketches and photographs of object categories, and, indeed, how well humans can recognize sketches of object categories.
So here, we're going to be using deep CNN activations as a tool for analyzing this large data set. We're going to be taking activations from VGG-19, which isn't fine-tuned or trained on drawing recognition in any way. And although it is a relatively old deep CNN now, it does rank second on Brain-Score. That is, it's still among the best at predicting both neural and behavioral metrics of object recognition.
There are some significant advantages to doing so. One of the pros is that it's less biased than humans in some ways: it doesn't have any knowledge of drawing conventions, of how we might typically draw a fish or a bird. It also doesn't have high-level semantic knowledge or knowledge of the semantic relationships between categories-- information that humans are notoriously bad at discarding when they're making perceptual similarity judgments. And so in some ways, these model features are more objective than humans.
Of course, however, VGG-19 is not a perfect model of recognition. In particular, it lacks knowledge of object part structure, doesn't know what beaks or wings are, and it may not always recognize drawings like humans would. Nonetheless, for our purposes, we think that the pros outweigh the cons, especially for analyzing this large data set.
So our analysis strategy is going to be to ask how recognizable drawings are based on their high-level visual features as extracted by VGG-19. We're going to extract visual feature vectors for every drawing using our model, taking activations from the second to last network layer, or FC6, which we know from prior work contains the features relevant to basic-level recognition. And then, we'll predict the category labels of held-out drawings from these visual feature vectors using logistic regression, akin to embedding all of these drawings in a high-dimensional feature space, as illustrated here.
So these logistic regressions are going to give us two different metrics. So we'll assign two outputs for each drawing. First, a binary decision, or a category classification accuracy, indicating whether this drawing was correctly recognized based on these visual features. Did it tell that this drawing was actually of a rabbit? The second one is classifier evidence for each of the categories in the data set-- so reflecting the degree to which a particular drawing contained evidence for each category. For example, to what degree did this rabbit look like a rabbit, or even a bird or a house?
So now, armed with these analyses, we can ask our first basic question. Do older children actually produce more recognizable drawings? I'm going to be plotting the category classification accuracy, that is, using the VGG-19 visual features, on the y-axis, against the age of the child drawing on the x-axis. Each dot is going to be data from one category. And overall, we see strong age-related gains in the recognizability of drawings, supporting our basic prediction, and our original intuition, that children's drawings increasingly contain the visual features needed for recognition across development.
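As a rough illustration of the two per-drawing outputs described here, the following is a minimal sketch, not the analysis code from the talk. It assumes feature vectors are already available (random synthetic vectors stand in for the VGG-19 activations), and it fits a small softmax (multinomial logistic) regression with plain NumPy on held-out folds. The function names, fold count, and learning-rate settings are all hypothetical choices made for the example.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_softmax(X, y, n_classes, lr=0.02, n_steps=500, l2=1e-3):
    """Fit multinomial logistic regression by batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]              # one-hot targets
    for _ in range(n_steps):
        P = softmax(X @ W + b)
        G = (P - Y) / n                   # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return W, b

def held_out_probabilities(X, y, n_classes, n_folds=5, seed=0):
    """Class probabilities for each item, predicted by a model that never saw it."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    probs = np.zeros((len(y), n_classes))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        W, b = fit_softmax(X[train], y[train], n_classes)
        probs[test_idx] = softmax(X[test_idx] @ W + b)
    return probs

# Synthetic stand-in for drawing features: 4 categories, 50 "drawings" each.
rng = np.random.default_rng(0)
n_classes, n_per_class, n_dims = 4, 50, 20
centers = rng.normal(0.0, 1.0, size=(n_classes, n_dims))
y = np.repeat(np.arange(n_classes), n_per_class)
X = centers[y] + rng.normal(0.0, 1.0, size=(len(y), n_dims))

probs = held_out_probabilities(X, y, n_classes)

# Output 1: binary decision, was the held-out drawing classified correctly?
correct = probs.argmax(axis=1) == y
# Output 2: classifier evidence, here read off as the probability given to the target category.
target_evidence = probs[np.arange(len(y)), y]
```

In practice one would more likely use an established library implementation of logistic regression; the hand-rolled version is only there to keep the sketch self-contained.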
Now, the prevailing interpretation of this kind of trend is that this is really about children's ability to draw-- that is, it's about their ability to control and plan their motor movements. So one of the first things we wondered was, are these developmental changes attributable simply to visual motor control? To get a first pass at this question, we first administered two shape tracing trials to all children, asking every child before they entered the drawing task, can you trace this shape, as well as another one? And then, we used a semi-automated method to assess how good these tracings were. We used an image registration algorithm to estimate tracing error and then validated this estimate with adult ratings of how good the tracings were.
So now, what I can do is examine the degree to which the developmental trend is explained by differing abilities in visual motor control, [INAUDIBLE] plotting data from the best tracers in light blue and the worst tracers in dark blue. If visual motor control really explained away this trend, we would see relatively flat lines for each one of these groups. Instead, what we see is that while tracing abilities do explain substantial variance in how recognizable children's drawings were, they don't explain away the broader developmental trend that we see.
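One common way to ask whether an age trend survives after accounting for a covariate is to compare regression slopes with and without that covariate. The sketch below does this on simulated data only; it is not the analysis reported in the talk, and the variable names and effect sizes are invented for the example.

```python
import numpy as np

# Simulated data (illustrative only): tracing skill improves with age, and
# recognizability depends on both age and tracing skill.
rng = np.random.default_rng(1)
n = 2000
age = rng.uniform(2, 10, size=n)
tracing = 0.3 * age + rng.normal(0, 1, size=n)
recognizability = 0.05 * age + 0.04 * tracing + rng.normal(0, 0.1, size=n)

def ols(columns, target):
    """Ordinary least squares; returns [intercept, coef_1, coef_2, ...]."""
    X = np.column_stack([np.ones(len(target))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

age_only = ols([age], recognizability)
age_and_tracing = ols([age, tracing], recognizability)

# If tracing fully explained the trend, the age coefficient in the second
# model would shrink toward zero. Here it shrinks but stays clearly positive.
print("age slope, age only:        ", round(age_only[1], 3))
print("age slope, tracing included:", round(age_and_tracing[1], 3))
```

With real data, recognition outcomes are binary and children contribute multiple drawings, so a mixed-effects logistic model would be a more appropriate choice than this plain linear fit.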
Another way of looking at changes across development is to ask how children's recognizable drawings change across development. That is, some drawings can be more or less recognizable as cars or cats. To analyze this, we're going to quantify the classifier evidence assigned to the target category. And to give you a sense of what this is like, here, I'm going to show drawings organized by the degree to which they had higher classifier evidence assigned to the target category. These drawings have higher classifier evidence, and as we go down bluer and bluer, these have the lowest classifier evidence assigned to them.
So now, our question is, among drawings recognized by the model, do those made by older children have higher classifier evidence? Are they, on average, these redder drawings? So what I'm going to plot now is classifier evidence on the x-axis. The y-axis is going to be all the categories in the data set. And for clarity, I'm going to separate out scores for three different age groups of children.
And what we see is that drawings made by older children-- those in green and blue-- have higher amounts of classifier evidence even among the recognizable drawings. This is restricted to just the drawings that were recognized by the model, indicating that older children systematically include more diagnostic features in their drawings.
When we look at the categories that are among the best versus the worst recognized, and go from the top (house, clock, mushrooms) down through snails, whales, and airplanes to dogs at the very bottom, we can see that there really isn't an ordering by how frequently these items tend to be practiced or drawn by children. For example, clocks and mushrooms are among some of the best recognized categories, whereas dogs are among the worst. In other words, we see these effects for both rarely and frequently-drawn categories.
However, when we look back at these relatively poorly-classified drawings, those that have low classifier evidence, they don't seem meaningless at all. There's still some structure that we can see in these drawings. And in fact, we found this to be true empirically as well.
So here, the category that was drawn is on the y-axis, and I'm showing the classifier probabilities for all the categories in the data set on the x-axis, where lighter values indicate greater confusions. And there seem to be some relatively systematic model confusions here. For example, octopus is often confused with spider, face is often confused with clock, phones are confused with books and bottles, and beds are confused with piano, house, and chair.
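A confusion matrix of this kind can be built by averaging classifier probabilities over all drawings of each drawn category. The following is a small sketch with made-up probabilities for four of the categories mentioned; the function name and the numbers are hypothetical.

```python
import numpy as np

def mean_probability_matrix(probs, drawn, n_classes):
    """Row i = average classifier probabilities over drawings of category i."""
    matrix = np.zeros((n_classes, n_classes))
    for c in range(n_classes):
        matrix[c] = probs[drawn == c].mean(axis=0)
    return matrix

categories = ["octopus", "spider", "face", "clock"]

# Made-up classifier probabilities for five drawings (rows sum to 1).
probs = np.array([
    [0.60, 0.30, 0.05, 0.05],   # an octopus drawing
    [0.40, 0.50, 0.05, 0.05],   # an octopus drawing that looks more like a spider
    [0.20, 0.70, 0.05, 0.05],   # a spider drawing
    [0.05, 0.05, 0.50, 0.40],   # a face drawing
    [0.05, 0.05, 0.30, 0.60],   # a clock drawing
])
drawn = np.array([0, 0, 1, 2, 3])

confusions = mean_probability_matrix(probs, drawn, len(categories))

# For each drawn category, find the other category receiving the most probability.
off_diagonal = confusions.copy()
np.fill_diagonal(off_diagonal, -1.0)
most_confused = [categories[j] for j in off_diagonal.argmax(axis=1)]
for name, other in zip(categories, most_confused):
    print(f"{name} is most often confused with {other}")
```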
But I've also ordered these categories by their animacy and their size in the real world, highlighting here that animals are most often confused with other animals, and inanimates are often confused with other inanimate objects. And in fact, among the unrecognizable drawings, we found that we could still decode the intended animacy that children were trying to draw across all ages. And to give you an intuition for what this means, here are some drawings that were correctly identified as animals or objects, yet aren't really recognizable as belonging to a certain category per se. And I'll just note that this is consistent with my prior work showing that animals and objects tend to have different visual features.
So we also now looked at whether we could find the real-world size of the category children were trying to draw. So some of these categories they were trying to draw are small in the real world, like phones and clocks, and others were large, like beds and airplanes. And we found that to a slightly lesser degree, we could also decode the real-world size of the category children were intending to draw.
And here, to pull out some examples, here are drawings that were correctly identified as big or small objects but weren't correctly recognized as the category they were trying to draw. This is also consistent with my prior work suggesting that objects of different real-world sizes tend to also differ in visual features, including their curvature.
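One simple way to read out a superordinate property such as animacy from a misclassified drawing is to sum the classifier probabilities over all animal categories and compare the total against the inanimate categories. The talk does not specify how the decoding was done, so the sketch below is only an assumed approach, with invented categories and probabilities; the same pattern would apply to real-world size with a big/small grouping.

```python
import numpy as np

categories = ["rabbit", "bird", "cup", "chair"]
is_animal = np.array([True, True, False, False])

# Made-up classifier probabilities for three drawings.
probs = np.array([
    [0.30, 0.45, 0.15, 0.10],   # drawn: rabbit, top guess: bird (wrong category, right animacy)
    [0.10, 0.20, 0.30, 0.40],   # drawn: cup, top guess: chair (wrong category, right animacy)
    [0.70, 0.10, 0.10, 0.10],   # drawn: rabbit, top guess: rabbit (correct)
])
drawn = np.array([0, 2, 0])

# Keep only drawings whose basic-level category was NOT recognized.
misclassified = probs.argmax(axis=1) != drawn

# Summed evidence for "animal"; above 0.5 means animals outweigh inanimates.
animal_evidence = probs[:, is_animal].sum(axis=1)
decoded_animal = animal_evidence > 0.5

animacy_correct = decoded_animal == is_animal[drawn]
accuracy_on_misclassified = animacy_correct[misclassified].mean()
print("animacy decoding accuracy on misclassified drawings:", accuracy_on_misclassified)
```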
So far, what I've shown you is that older children tend to include more diagnostic features in their drawings, even among the recognizable drawings. Yet unrecognizable drawings made by children of all ages still contain rich information, in particular about their animacy and their size in the real world. If these developmental changes truly reflect changes in children's internal visual concepts, however, we should also expect to see changes in children's recognition of visual concepts. That is, are older children differentially sensitive to these same diagnostic features when they're trying to recognize drawings-- here, drawings made by other children?
To test this idea, we designed a guessing game at the same museum kiosk where children were presented with drawings one at a time and had to tap which visual concept the drawing went with from a set of four options. Here: can you tap the animal that goes with the drawing? We used four different versions of this guessing game, in doing so collecting recognition data for 16 different classes from over 700 participants aged 3 to 10 years.
To make the task sufficiently challenging, options in this guessing game were all relatively similar. For instance, in this version of the guessing game, they're all small animals. Drawings were randomly sampled from the larger data set made by children ages 4 to 9, and to ensure that children were actually on task during this guessing game, we also included photo catch trials, where they had to match a very similar photograph of each category to its button. And we set a threshold for inclusion, to only include children that were actually participating in the game.
Now, to remind you, the drawings that were included at the station did vary in the degree to which they contained diagnostic features of the different categories. For example, here are some rabbit drawings, ranging from the least to the most diagnostic, which varied in the amount of classifier evidence that was assigned to them. And so our critical question here is going to be whether children's sensitivity to this dimension, to these diagnostic features, changes across development.
So first, we're just going to look at overall drawing recognition. So here, I'm going to plot the proportion of drawings that were recognized on the y-axis with the age of the child recognizing on the x-axis. And here, each dot is going to be data from one child, scaled by the number of trials they actually completed. When we looked at the data, we found that older children are systematically better at drawing recognition overall when we collapsed across all levels of diagnostic information.
Now, as I mentioned, our critical question is whether older children are differentially sensitive to increasing levels of diagnostic information. So now I'm going to re-plot the same data, separating out our recognition data for each age and as a function of the diagnostic visual features in the drawings, or classifier evidence. Our critical prediction here is a change in the slope of these lines relating these two factors, and that is, indeed, what we found. We found that older children were better able to extract what signal there is in these drawings, such that they're better able to take advantage of increasing diagnostic information to respond more accurately. In other words, it suggests that children become more and more sensitive to the presence of these diagnostic visual features as they grow older, and this sensitivity manifests when they're recognizing drawings made by other children.
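The slope prediction can be illustrated with a small simulation: if older children use diagnostic information more effectively, recognition accuracy should rise more steeply with classifier evidence in older groups. The sketch below uses simulated trials only, with a four-option chance level of 0.25 taken from the task description; the sensitivity values, group labels, and the simple linear fit are assumptions for illustration, not the model used in the actual study.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_group(n_trials, sensitivity):
    """Simulate four-alternative recognition trials for one age group.

    Accuracy starts at chance (0.25) and rises with classifier evidence
    at a rate set by `sensitivity` (a hypothetical parameter).
    """
    evidence = rng.uniform(0, 1, size=n_trials)
    p_correct = 0.25 + 0.7 * sensitivity * evidence
    correct = rng.random(n_trials) < p_correct
    return evidence, correct.astype(float)

def evidence_slope(evidence, correct):
    """Slope of a straight-line fit of accuracy on classifier evidence."""
    slope, _intercept = np.polyfit(evidence, correct, deg=1)
    return slope

# Hypothetical sensitivities: younger children benefit less from added evidence.
groups = {"younger": 0.3, "older": 0.9}
slopes = {}
for name, sensitivity in groups.items():
    evidence, correct = simulate_group(4000, sensitivity)
    slopes[name] = evidence_slope(evidence, correct)
    print(f"{name}: slope of accuracy on classifier evidence = {slopes[name]:.2f}")
```

With real trial-level data, the same question is usually posed as an age-by-evidence interaction term in a logistic mixed-effects model rather than as separate linear fits per group.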
So to recap, what I've argued is that older children include more diagnostic features in their drawings, yet that their unrecognizable drawings still contain rich information, such as animacy and object size. In turn, we found that older children are overall better at recognizing drawings and more sensitive to the diagnostic features during recognition. Taken together, these findings support the notion that across development, children are becoming better at connecting their internal visual concepts with external representations of them, including these ambiguous, lossy drawings produced by their peers.
So how is this happening? One possibility is that children are becoming practiced visual communicators. That is, through producing drawings that they want their peers to engage with, they learn which features are most diagnostic of different categories, and then tend to more systematically include those features in their drawings and capitalize on them during recognition.
A second, non-exclusive possibility is that children are enriching their semantic knowledge about these categories as they learn why categories have the features that they do. For example, as children learn that whales are mammals, they might tend to draw them differently. A final contributor could simply be that children are gaining more and more extensive categorization experience throughout childhood. After all, this is a rather wide age range.
Nonetheless, I think that when we step back, we still have some updates to our view of visual concept learning. First, I think we need to take depictions seriously as a source of visual input to our models and our theories of visual recognition. Our knowledge about drawings and [? sheet parts ?] is deeply ingrained in our representations, and children have experience not only seeing depictions but also producing them throughout childhood. More broadly, these results add to the growing literature suggesting that there are greater changes in children's visual recognition abilities throughout childhood, and in particular, they suggest that there are changes in the ability to express and use diagnostic features of different object categories during both visual production and visual recognition.
In turn, I think these results point to a much more protracted development of [? introductory ?] visual concepts than we might have imagined, one that emerges in tandem with many other aspects of cognitive development. And I think that by viewing visual concept development from this ecological and multidisciplinary perspective, as something that might start early but have a complex and protracted developmental trajectory, we can start to understand a fundamental question in both AI and cognitive science-- how we learn to derive meaning from what we see. With that, I'd like to thank my co-authors, my lovely lab, funding sources, and you for listening.