Wobbling, drooping, bouncing - Visual perception of materials and their properties
Date Posted:
November 9, 2021
Date Recorded:
November 2, 2021
CBMM Speaker(s):
Vivian Paulun
Description:
Visual inference of material properties like mass, compliance, elasticity or fragility is crucial to predicting and interacting with our environment. Yet, it is unclear how the brain achieves this remarkable ability. How materials move, flow, fold or deform depends not only on their internal properties but also on many external factors. For example, the observable behavior of an elastic bouncing object depends on its elasticity but also on its initial position and velocity. Estimating elasticity requires disentangling these different contributions to the observed motion. Predicting the future path of the object requires a forward simulation given the estimated latent parameters. I will present a set of experiments in which we investigated how accurately human observers estimate the elasticity of bouncing objects or predict their future path. Furthermore, I will discuss the nature of the visual information observers use as well as the limitations of their internal model.
PRESENTER: Yes, so today I will share some of my research with you about the visual perception of materials and material properties. So when we look at the world around us, we determine not only the identity and location of objects, but also what they are made of. We see what properties they have. For example, we can see that the cloth is deformable and soft, the wood is solid and has a specific texture, and we can estimate the weight of the books or anticipate that the silverware will be cold.
Visual estimation of material properties is an integral part of what makes our visual experience so rich. It's multimodal. Just by looking at an object, you know how it will feel to the touch. It's part of the deep understanding that we have of our surrounding world. We can, for example, anticipate how objects are going to respond to an applied force. This is why visual estimation of material properties is crucial for interacting with our environment, because we can anticipate and predict what's going to happen and we can plan our actions accordingly.
So for example, we can adjust the force with which we pick up an object so that it's enough to lift the object but not so much as to squish it, we can decide on which of these two piles we can place another book, and we know how to grasp an object even if it's very slippery, how to catch a ball, and how to handle a container with liquid. Vision is often our first and primary source of information about this.
Yet visual perception of mechanical material properties is a very hard task. So if I show you an image like this one here and I ask you what color the objects are, then you could look, for example, at the pixel values to figure this out. And similarly, if I ask you about the translucency, you probably look at the distorted image of whatever is visible through the objects. Or if I ask you about the gloss, you will attend to the highlights that you see on the objects.
But what if I now ask you how soft these objects are? Put simply, there is no pixel in this image that tells you how soft, elastic, or heavy these objects are. Estimating mechanical material properties goes beyond pure pattern recognition, so the brain must rely on different sources of information. These images at the bottom reveal more information about the internal properties of the objects shown, and you get an even stronger impression if I add motion to them.
So the characteristic ways in which objects fold, fly, deform, or move often tell us about their properties. But this is computationally challenging because the specific behavior depends also on the external forces. So for example, how much this object deforms depends not only on its softness, but also on the weight of the object that's falling onto it. And the brain somehow has to disentangle these different contributions to the observed behavior.
So predicting the future behavior of objects then also requires a forward simulation given these estimated parameters. And this is exactly the computational challenge that I'm interested in. How does the brain accomplish this task? And at this point, you might think, this is all nice, but as soon as you showed us this picture I recognized that these objects are made of glass and therefore I know that they're rigid and not soft.
And true, much of our perception of material properties might work this way. We recognize materials, and we have learned to associate these materials with certain properties. Indeed, if we show participants static images like these ones, all showing the same shape but rendered with different optical material properties, and we ask them to rate the apparent softness, then it increases from left to right in this case. And this is likely in accordance with priors that you have about the different material types you recognize these objects as.
If however, we now dynamically deform these objects in the same way, and because we use computer graphics we can do this in exactly the same way for all of the objects, then the perceived softness is the same for all of these three objects. So perception is dominated by these dynamic cues and not by prior expectations you have about materials.
We tested this also with different types of motion, like the one that you see here. If you're still not convinced, then I can also remove the texture completely and show you that you will still get a sense of the softness even just with this dynamic information. It's very powerful.
Yet again, the brain somehow has to disentangle the forces applied to these objects and their material properties. That's exactly what my research is focused on: how the brain solves this problem and how we visually estimate mechanical material properties. And today I want to present a case example to study this phenomenon, and that's elastic bouncing cubes. Here are some examples.
When I say elasticity, what I mean is the coefficient of restitution like you learn about in high school. So how much energy basically is retained at each bounce. If you look at these videos here, you probably find it very easy to see that the clips on top show objects of lower elasticity and the ones on the bottom are highly elastic. This is almost trivial, right?
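For concreteness, here is a minimal sketch of that quantity. The coefficient of restitution is the ratio of rebound speed to impact speed; equivalently, under free fall, it is the square root of the ratio of consecutive bounce heights. The function names are illustrative, not from the talk.

```python
import numpy as np

def restitution_from_speeds(v_impact: float, v_rebound: float) -> float:
    """Coefficient of restitution e = |v_rebound| / |v_impact|.
    e = 0: all energy lost at the bounce; e = 1: perfectly elastic."""
    return abs(v_rebound) / abs(v_impact)

def restitution_from_heights(h_before: float, h_after: float) -> float:
    """Equivalent estimate from consecutive apex heights under free fall:
    impact speed scales with sqrt(2 * g * h), so e = sqrt(h_after / h_before)."""
    return float(np.sqrt(h_after / h_before))

print(restitution_from_speeds(4.0, 2.0))    # 0.5
print(restitution_from_heights(1.0, 0.25))  # 0.5
```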
If I stop the videos now and overlay the individual frames, I can illustrate that the exact trajectory depends not only on the elasticity, which is the same within each row, but also on other factors. So to create these little videos I also manipulated where the object starts and whether it has an initial velocity for example.
So now there is no direct mapping between the visual information that reaches our eyes and the elasticity of the object. So the key challenge for the brain, and therefore also for our model that we want to develop, is to find the features that vary with elasticity but factor out all of the other latent variables such as the initial speed.
In order to study this phenomenon, we have the following approach. We first created a benchmark data set of bouncing cubes that vary in elasticity but also in other factors. You see some examples in the background. We use this data set to test human perception and to identify diagnostic visual features that may guide elasticity perception, and we then test these features in more experiments.
So for the remaining part of the talk, I'm going to first briefly describe this data set, because that's what all of my experiments are based on, and then I will present a study that's focused on the visual estimation of elasticity in these stimuli, several experiments in which we aim to measure the human accuracy and also tease apart the visual parameters that influence the judgments, and then I will present another study in which we asked whether people can also make predictions about the future status of these objects.
OK. So for this data set, I created 100,000 simulations of bouncing cubes like the ones that you have already seen, and I simulated four seconds using the physics simulator of the software Real Flow. I randomly varied the initial x, y, and z position, rotation, and velocity of this cube. So it could start like this or like this or like this or 100,000 other ways. We also systematically varied the elasticity of these cubes.
So we have 10 latent variables that determine each trajectory and we're trying to figure out just one of them, elasticity, and ignoring all of the others basically. So here you can see again some examples just to illustrate the diversity and also show why we use cubes and not spheres, which would be more classic maybe, because they produce more diverse trajectories.
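To make the structure of the data set concrete, here is a schematic of how the 10 latent variables per scene might be sampled. The ranges are illustrative placeholders, since the exact settings aren't given in the talk, and the resulting parameters would be handed to the physics engine (Real Flow here) rather than simulated in this snippet.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latents():
    """One random draw of the 10 latent variables per scene:
    3 position + 3 rotation + 3 velocity + 1 elasticity.
    Ranges are illustrative, not the values used in the talk."""
    return {
        "position": rng.uniform([-0.3, -0.3, 0.5], [0.3, 0.3, 0.9]),  # x, y, z in the 1 m room
        "rotation": rng.uniform(0.0, 360.0, size=3),                  # initial orientation (degrees)
        "velocity": rng.uniform(-1.0, 1.0, size=3),                   # initial velocity (m/s)
        "elasticity": rng.choice(np.linspace(0.1, 1.0, 10)),          # one of 10 restitution levels
    }

scenes = [sample_latents() for _ in range(100_000)]
# Each parameter set then drives one 4-second simulation in the physics engine.
```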
In our first step, we want to test how accurately observers can estimate elasticity. So we selected 150 random simulations, and in an experiment we showed each animation to observers. They can see the animation as often as they like, and it's just interleaved with this noise pattern so they know it's starting all over again, until they make their estimate by simply adjusting the slider to say whether it's elastic or not. All the experiments I'm talking about look basically like this.
OK, so here I'm going to plot the results of this first experiment: perceived elasticity as a function of the physical elasticity. Here's the average of 15 observers. Each dot corresponds to one simulation, and the color indicates the physical elasticity. You can make two different observations here. The first one is that observers are quite good at this task, as you can see from the upward trend and the high correlation between the two. So the estimates increase systematically with physical elasticity.
The second is that there are differences between the ratings of the same elasticity. So there is no perfect constancy. The ratings are also influenced by some of the other factors that we varied. Another observation that is not shown in this plot but we measured separately is that observers are very, very consistent. So the pattern of errors is systematic across different participants.
So observers are good at this task, and to get more insights into how they do it, we conducted another experiment in which we systematically limited the visual information. So in this experiment, observers either saw full renderings of the scene as you have seen before, or they saw a version without the background where the motion of the cube is the same, or, I'm just going to play this again, they saw a version without the cube deforming or rotating.
So here the cube is rigidly following the same path with the same speed. So we had 30 simulations in all three conditions, and we want to compare whether the ratings are the same or not. And here are the results. As you can see, the three different conditions, I'm not sure whether you can actually differentiate the different colors of the lines. That doesn't matter because they are basically the same.
So the curves are very similar. They have a comparable slope and also comparable patterns, even this dip here, which was driven by one specific simulation that all observers, consistently and in all of these conditions, rated as much lower in elasticity than it really is. So it seems that the trajectory alone, without all of the background information, is enough to judge elasticity. Now the question is, what about this trajectory contains the information?
We propose that humans use 3D motion features to estimate elasticity. These are typical characteristics of the behavior of bouncing objects that depend on elasticity, for example, the number of bounces or the rebound velocity. These features capture the resemblance across different scenes of the same elasticity, such as the shorter trajectories up here, and they are preserved even though the exact trajectory varies from scene to scene. I'm going to go through some example motion features to explain better what I mean by this.
So an obvious candidate for a visual feature of elasticity is bounce height. It's very close to the physical concept of the coefficient of restitution, and it has actually been suggested by previous work on elasticity, though that work only used much simpler 1D or 2D motion. To see whether this plays a role in our experiment, we can now take our simulations, for which we have ground truth data, and look at how the height of one example cube changes over time.
We can determine the bounce heights. That's what you see in red here. Now because we want to have just one estimate, not in this case six different ones, we can take the maximum or the mean, we try both of these, and see whether that is a good feature for elasticity. So we did this for all 100,000 stimuli and then we can see how this feature is distributed across different levels of elasticity.
That's what I'm going to plot here in this histogram. It's actually 10 different histograms, one for each level of elasticity. The fact that they all overlap tells you that this feature doesn't really differentiate between different levels of elasticity. So in a somewhat more complex and more naturalistic scene, bounce height is not actually very diagnostic for elasticity. The shared variance is only about 6%.
So let's look at another feature. A straightforward strategy for observers would be to just sit there and count the number of bounces off the floor. The more elastic the object is, the more often it's going to bounce on average. We can count the number of bounces for each trajectory. This is represented in the red dots here.
Then again, we can do this for all 100,000 and see how this feature is distributed over the larger range of elasticities. That's what you see here. This is again 10 different histograms with the physical elasticity color coded. The fact that we see this color gradient from blue to red shows us that the number of bounces really systematically varies with elasticity. In fact, both variables share almost 80% of variance.
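As a sketch of how two such features could be computed, here is a version that works on a height-over-time trace; the talk's features are computed on the full 3D trajectories, so treating local maxima of the height signal as bounce apexes is a simplification.

```python
import numpy as np
from scipy.signal import find_peaks

def bounce_features(height: np.ndarray) -> dict:
    """Candidate motion features from a cube's height signal over time."""
    apex_idx, _ = find_peaks(height)   # local maxima of the height trace = bounce apexes
    apexes = height[apex_idx]
    return {
        "n_bounces": len(apexes),
        "max_bounce_height": float(apexes.max()) if len(apexes) else 0.0,
        "mean_bounce_height": float(apexes.mean()) if len(apexes) else 0.0,
    }

def shared_variance(feature: np.ndarray, elasticity: np.ndarray) -> float:
    """Shared variance across the data set = squared Pearson correlation."""
    return float(np.corrcoef(feature, elasticity)[0, 1] ** 2)
```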
We didn't just come up with two features. We came up with a whole list of different features. Some of them capture the characteristics of specific bounce events, others integrate the statistics of the motion over time. Here's the whole list. I don't expect you to start reading all of these. I'm going to go through the important ones during the talk. This is more for completeness.
So we now have 28 different motion features and all of them are stimulus computable. They're not image computable because we don't intend to also solve depth perception here, but we take this for granted and calculate everything on the underlying 3D data. So they are based on observable quantities, the positions, and the changes of position over time.
We make no assumptions in this model of how these features are actually estimated by the visual system or how noisy these estimates are. Here they are based on perfect knowledge of the positions. In that sense, they are an ideal model.
Because it's a list of 28 different features, before we test whether observers actually use any of these, we wanted to see whether we can narrow down our hypothesis space a little bit. We started by asking which of these features are actually good at estimating the physical elasticity. So here I'm going to show how much variance in terms of physical elasticity they can explain.
Again, you don't have to read all the labels. I'm going to point out to you the most important ones. So as you can see, this varies quite a lot between features. Some of them, like the movement duration or the number of bounces, the features that sort of summarize motion statistics, seem to be very good. Whereas others, for example the average height of the cube, seem to be useless for estimating elasticity.
We decided to eliminate all features that explain less than 5% of the variance. They just don't seem to be good hypotheses to start with. So we continue with 23 features. These are complementary in that they measure different aspects of the motion trajectory, but over the whole data set they are also actually quite strongly correlated.
So what we did next is a principal component analysis with these features. So we calculated all features for all 100,000 simulations and then ran a PCA. Here I'm going to plot all of the 100,000 simulations in the space of the first two principal components. Each is represented by one dot.
You can see that the first two principal components explain quite a large proportion of the variance already. In fact, we maybe even only need about six principal components to explain the variance. But that's not really what we are interested in, the variance in the motion features. What we want to know is the variance in elasticity.
So here I'm color coding each simulation by its elasticity. As you can see, we get this color gradient along the first principal component, from low elasticity simulations on this side to highly elastic ones on the other side. This line here shows that the explained variance in terms of physical elasticity doesn't increase if we add more principal components. So this single component is enough to explain most of the variance.
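A compact sketch of this analysis pipeline, using random stand-in data; in the actual analysis, the input would be the z-scored 23-feature matrix over all 100,000 simulations.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Stand-in data for illustration only.
features = np.random.randn(100_000, 23)   # z-scored motion features
elasticity = np.random.rand(100_000)      # ground-truth elasticities

pcs = PCA(n_components=6).fit_transform(features)   # project stimuli into PC space

# Variance in elasticity explained by PC1 alone vs. by the first six PCs.
r2_pc1 = LinearRegression().fit(pcs[:, :1], elasticity).score(pcs[:, :1], elasticity)
r2_six = LinearRegression().fit(pcs, elasticity).score(pcs, elasticity)
print(r2_pc1, r2_six)   # in the talk's data, PC1 already captures most of it
```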
So a weighted combination, as in this principal component, seems to be a good proxy for elasticity. So I'm going to treat this now as our predictor for elasticity. We can also show that this predictor is very specific. So here I'm plotting-- this is basically the number you have already seen, how much variance this principal component can explain in terms of elasticity. Here I'm plotting it for all of the other variables.
It's not that I'm going to animate this plot and the bars will show up. They're already here. They're just so tiny that you can't see them. So we have identified a combination of motion features that is able to successfully estimate elasticity while ignoring all of the other latent variables.
So if the visual system wants to estimate elasticity from 3D motion features, then this combination of features would be very good to use. And this combination just emerges from observing the variance in lots and lots of examples. So that's really cool.
Now the key question is, can we actually explain perceived elasticity? That's what I will show you here. So perception as a function of this multi feature model. And again, this is just principal component 1. Indeed, we can predict data from experiment 1 very well. And this is impressive. Again, we did not fit the model to perception. We basically just observe the covariance in this large data set.
However, to our surprise, we also found that some of the individual features predict perception as well as the multi-feature model. Here I'm showing the best one: movement duration. This was surprising to us because we were convinced that a combination should be much more robust, and if you want to measure elasticity, you should really use the combination. On the other hand, if one feature does the job, then in terms of simplicity this is the favorable model.
To investigate this further, we looked at the predictivity of each feature. So here I'm plotting how well it can explain the physics, so is this a good predictor to use? And here I'm showing how well it can explain perception, so whether observers actually used this, or whether the data is in line with this feature.
So the red dots show the model. That's basically the data I've already shown you. This is the best feature, movement duration. Here are all of the features. Again, you don't have to read all the labels. What I want to show here is that there are actually quite a few different features that are good candidates that can explain the perceptual data. The noise ceiling here shows how well observers can explain each other.
So what does this mean now? Do observers use a combination of features or do they just rely on a single feature? Since the features, the model, and the physics are all highly correlated, based on our data we cannot really differentiate between those. So what we need to do is decouple them systematically.
This is what we did next and I'm going to walk you through it. So for this next experiment, we simulated another 100,000 stimuli of just one elasticity. So all of them now have the same medium elasticity and we have decoupled everything from ground truth. So if we now find perceptual differences, then they are all caused by the visual properties of the stimulus and any model of perception has to be able to explain perceived differences in this data set.
I'm going to walk you through one example for one feature: how we now decouple the feature also from the model predictions. So here I'm plotting the feature prediction against the model prediction. This area shows the space of all possible stimuli in this data set of just one elasticity, and there is a linear relation between the two; they are highly correlated.
However, we now select 10 different stimuli, which you see here as the colored dots, for which the feature and the model are essentially uncorrelated. For these same stimuli, if the model is true, then the perceived elasticity should increase when I order them according to the model prediction, whereas if the feature prediction is better, then it should increase when I order them according to the feature prediction.
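The talk doesn't spell out how that subset was selected, but a simple random search like the following sketch would do the job: draw many candidate subsets and keep the one where the feature and model predictions are closest to uncorrelated.

```python
import numpy as np

def pick_decoupled_subset(feature_pred, model_pred, k=10, n_tries=20_000, seed=0):
    """Find k stimuli whose feature and model predictions are nearly uncorrelated
    (a stand-in for the actual selection procedure, which isn't described)."""
    rng = np.random.default_rng(seed)
    best_idx, best_abs_r = None, np.inf
    for _ in range(n_tries):
        idx = rng.choice(len(feature_pred), size=k, replace=False)
        r = np.corrcoef(feature_pred[idx], model_pred[idx])[0, 1]
        if abs(r) < best_abs_r:
            best_idx, best_abs_r = idx, abs(r)
    return best_idx, best_abs_r
```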
So we showed these 10 stimuli to a new group of subjects and asked them to rate the elasticity. I'm going to show the results here, first perceived elasticity as a function of the model prediction. You can see, and this is now not average data but individual data, that's why it's a bit more noisy, a clear linear relationship between the two. And the correlation is high.
Now I'm taking the exact same data and I just shuffle around the order on the x-axis so that it is in accordance with the feature prediction. We can see that there is basically no correlation between perceived elasticity and the feature prediction. So for this specific example feature, if it is brought into conflict with the model, then the model does the better job of explaining the data.
We followed the same logic for all 23 features. I'm not going to show you 23 of these plots. Instead I'm going to take these correlation values and put them in a new plot. That's what I'm showing here: the correlation between model and perception and between each feature and perception.
Everything below the diagonal means that the feature can explain the data better; everything above the diagonal means that the model can explain the data better. The noise ceiling is again how well observers can explain each other. The feature I just showed you falls here, coincidentally directly on the noise ceiling. Here are all of the 23 features.
If the symbol is white, that means there was no significant difference between these two correlations. So for most features, the model is the better predictor, but there is one feature, and it's movement duration, that can explain the data better than the multi-feature model.
In a post hoc analysis, we also found that movement duration can explain all of these high correlations as well. So if we factor it out, then they will all drop. So it seems like movement duration wins and is the best predictor for the perceptual data.
I'm not going to show you more data plots. Instead I want to show you an example. So here we see the same physical elasticity and what's varying from left to right is the movement duration. This probably makes you perceive them as different in elasticity. I find this effect quite striking. All of this is basically an illusion, because physically they are the same.
Now that works pretty well, but we were still puzzled and surprised by this finding that humans should just rely on this one very stupid cue, if I may say so, when a combination seems much more robust. So we were wondering, what if we removed movement duration? Are people unable to judge elasticity or are they switching their strategy? This is what we tested in the next experiment.
This is what our stimuli looked like. So we took the same stimuli from the first experiment, but we only showed the first second, like you can see here, and then asked participants again to rate the elasticity. You can judge for yourself whether you are still able to see the differences in elasticity.
So here I'm going to plot the results of this experiment, the one-second ratings as a function of the ratings for the four-second movies. As you can see, they correlate almost perfectly. So people are as good as before, even though their most important feature is not available anymore. So they have to use some sort of different strategy.
Again, there are several candidates. So this is the same plot as before, now calculated only for the one-second movies, and we see that there are, again, several different features that could explain the perceptual data. So now, in order to decouple the different hypotheses from one another, we followed the same logic as before and ran a separate experiment to decouple the physics from the features, and the features also from the model.
I'm not going to walk you through all of the steps again; I'm just directly showing you the end result. So here we found that there were two features, but mostly it's the maximum bounce height, that can explain the data better than the model if we bring them directly into conflict.
Instead of showing you more data plots, I'm going to convince you with an example. So here again the physical elasticity is the same, but what's varying is the maximum bounce height. You probably perceive this one to be much more elastic than the other ones. So yes, when movement duration is not available as a cue anymore, people seem to flexibly switch to a new strategy.
OK, so now I want to sum up this first part on the visual estimation of elasticity. We found that human observers can accurately estimate elasticity based on visual information from the motion trajectory. We found that many 3D motion features are diagnostic for an object's elasticity. Our results suggest that people use a resource-rational strategy: they rely on the single best feature, and they can flexibly switch to another one if this information is not available anymore.
This implies that observers represent multiple features simultaneously. Potentially they are derived from a generative physics model. In order to probe this idea that observers still have sort of an underlying richer physics model, we need to use a different task that cannot be solved with heuristics like this one.
This is what we did in the last experiment that I want to talk to you about today, in which we asked participants to make predictions about the future state of the object. One caveat about the demos I just showed: the problem with these features is that they are all correlated with each other, and in order to show you these examples I had to cheat a little bit.
So the decorrelation basically only works if I take all of the 10 stimuli; if I just randomly pick three stimuli out of that set, then some features can still be correlated with each other. But it's very hard to show you 10 at the same time. That's why I picked three here. And it might be that in these three there are also other features that could explain your perception. But in the experiment we control this better, I would say.
Let's jump back to the visual prediction task. So we know from other studies that observers can predict the future behavior or status of objects, for example, whether this block tower is going to fall left or right. In the first part of the talk, I hopefully convinced you that observers are able to visually estimate elasticity, also in these short movie clips of just one second.
Yet the internal physics model is often reported to be noisy and imperfect. Given bouncing cubes this is very challenging because even small deviations of individual parameters can drastically alter the trajectory. So here I am showing the trajectory of a cube that's falling down under gravity from this position and with this orientation.
If I rotate the cube by just 5 degrees around one axis then I get a different trajectory. If I keep doing this and rotate it by 5 degrees I get this whole family of trajectories. This shows that predicting bouncing cubes is very difficult. So we want to know how accurately observers predict the future path of bouncing cubes, how consistent are they, and how do they solve this task?
So this experiment doesn't have a slider; it looks a little bit different. So we again show these computer simulations, then the movie stops midway, and participants were asked to predict where the cube will bounce and finally land. In order to do this, they can move the cursor, which moves along the 3D walls, and they simply click on all the bounce locations one after the other, like you see here.
In the experiment we again begin with a short practice for them to get familiar with the setup, and then we show 30 test stimuli and observers have to predict the future bounce positions. We repeat these same 30 trials three more times during the experiment in order to see how variable the predictions are within individuals.
Then in between the test blocks, we had three training blocks. The task is the same, but in each block we show a new set of stimuli and half of the participants get feedback after they make their predictions. So they will see their response together with the end of the movie.
For the other half of the participants, they actually don't notice the difference between test and training trials because it's just the same. So with this we want to see whether the accuracy can improve with training compared to this control group, so again, whether they learn something or use some strategies they bring with them into the lab.
Then crucially, we also have this last block in which we again show the test stimuli, but instead of asking participants to make predictions we show them the bounces and ask them to draw what they saw basically to get an upper limit of their performance. Because this all sounds maybe a little bit complicated, I made another slide to show the difference between prediction and reconstruction trials, because that's crucial.
So on each prediction trial, observers see the first second of the movie and then they have to predict the rest of the motion trajectory, which can be between half a second and up to three seconds. They don't know this, of course. At the end of the experiment, in the reconstruction trials, we use a subset of these stimuli, but we never show the first second. We only show the bit that in the previous trials they had to predict. But here we show it, and they simply have to draw all the bounces that they just observed.
So with this, we can measure what a perfect prediction would look like, if they were perfectly able to do this and the only noise is because they are not able to click exactly where they intended, or something like this. OK, so how accurately can observers predict the positions? In this plot I'm showing the normalized frequency of errors, and the errors are always given in meters. I probably forgot to say that the room is 1 cubic meter in size.
This is the data from the prediction trials. As you can see, the errors are smaller than can be expected by chance, which you see in gray. However, they are larger than in the reconstruction trials, in which observers could watch the bounces. In the second plot here I'm showing again the error, but now the size of the error depending on the number of consecutive bounces that they have to predict. This increases in the prediction trials. So the errors get larger the further you have to predict into the future, if that makes sense.
This is also shown here. So for the first five bounces I'm showing the full width at half maximum of the distribution of bounce responses, and this is all centered so that the true bounce position is set at zero. You can see that for prediction trials, the further you have to predict into the future, the larger the error gets. And for reconstruction trials, as a control, this is not the case.
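As an aside, a crude way to estimate the full width at half maximum of such an error distribution from raw samples is via a histogram, as in this sketch; this is not necessarily the method used in the study.

```python
import numpy as np

def fwhm(samples: np.ndarray, bins: int = 50) -> float:
    """Full width at half maximum of a roughly unimodal error distribution,
    estimated from a histogram of the samples."""
    counts, edges = np.histogram(samples, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    half = counts.max() / 2.0
    above = centers[counts >= half]   # bin centers at or above half maximum
    return float(above.max() - above.min())
```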
We can also see that the errors are directional: they are larger in depth, which makes sense because we have a 2D projection of the 3D room. OK, so does accuracy improve with training? To answer this question, we compared the error in the first block, in light green, to the error in the last block, for the training group and for the control group.
What we found is that overall, participants get slightly but significantly better over time, but there is no difference between the two groups. So it seems like they use the knowledge or strategies that they have acquired outside the lab and they probably only get slightly better because they know how to move the mouse in the 3D world.
OK, so they make some errors. I've already shown you that. Now the question is are these errors systematic or are they just random noise? So how consistent are observers? This is one example stimulus. It's a difficult one because the cube is now bouncing towards the observer.
In black I'm showing the true bounce positions connected with a line. And in purple, you see the reconstruction trial of one example participant. Here you see four predictions from that same observer collected at different times during the experiment. As you can see, although this observer predicted the wrong direction, they did so consistently across all repetitions. So this is an example for the high consistency we found within individuals.
Now let's look at between individuals. Here's another stimulus example. The black line is again the ground truth and each line here represents the prediction of one of 40 observers. Participants are not only highly consistent when measured repeatedly, also different observers are very consistent with each other. So the predictions are imperfect, but they're very systematic.
In fact, here I'm going to plot, for each stimulus, how far the predictions were on average from the ground truth versus how far they were from the average observer. So points below the diagonal mean that the other participants can explain the data better, and points above mean that the ground truth can explain the data better.
Here is the data. So the other observers can predict the data as well as or better than the ground truth does. This shows again that the pattern of predictions is very consistent between different participants. OK, so it seems that they use some sort of simplified model of the physics, because they make these systematic errors, and we were wondering whether maybe they mentally simulate a sphere instead of a cube. We tested this.
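Schematically, the ground-truth-versus-other-observers comparison above boils down to something like the following, with each observer's error to ground truth set against a leave-one-out distance to the mean of the other observers; the array shapes are illustrative assumptions.

```python
import numpy as np

def prediction_consistency(predictions: np.ndarray, ground_truth: np.ndarray):
    """predictions: (n_observers, n_bounces, 2) clicked bounce positions;
    ground_truth: (n_bounces, 2) true bounce positions."""
    # Mean distance of all predictions to the true bounce positions.
    err_truth = float(np.linalg.norm(predictions - ground_truth, axis=-1).mean())
    # Mean distance of each observer to the average of the remaining observers.
    loo = [
        float(np.linalg.norm(p - np.delete(predictions, i, axis=0).mean(axis=0), axis=-1).mean())
        for i, p in enumerate(predictions)
    ]
    return err_truth, float(np.mean(loo))
```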
To test this sphere idea, imagine this is the room: in black, you see the observed part of the trajectory, and in blue you see where the cube moved. So this is the bit of the motion that had to be predicted in the experiment. Now, starting from the position where the movie ended, and with the same speed and everything, we simulate the path of a sphere. This is what you see in red. And now the predictions from our participants fall right in between these two paths.
I did not connect the individual predictions with lines because it was very chaotic, but it's not the case that some of them follow the cube path and the others the sphere path; they are just all over the place. OK, so we did this for every stimulus. And here I'm going to plot the average distance of the predictions to the prediction of the cube or the sphere. So if the sphere is the better predictor, then everything will fall in this red area.
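The comparison itself reduces to measuring each prediction's distance to the two simulated paths, roughly as in this sketch; it assumes matched numbers of bounce positions, which is a simplification.

```python
import numpy as np

def mean_path_distance(pred: np.ndarray, sim: np.ndarray) -> float:
    """Mean Euclidean distance between predicted and simulated bounce positions
    (arrays of shape (n_bounces, dims); assumes matched bounce counts)."""
    return float(np.linalg.norm(pred - sim, axis=-1).mean())

def sphere_advantage(pred, cube_path, sphere_path) -> float:
    """Positive if the sphere simulation fits a prediction better than the cube."""
    return mean_path_distance(pred, cube_path) - mean_path_distance(pred, sphere_path)
```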
This is where this example falls and here are all of the stimuli. As you can see, this sphere's path does not seem to explain the data any better or worse than the cube does, so we probably have to include some further constraints, such as noisy estimates of velocity or position in order to better model the responses. But that is still ongoing, so for today I'm going to stop and summarize this study.
So we found that observers are surprisingly accurate in visually predicting the future bounce locations, the predictions are imperfect yet they're very systematic, and they probably use an internal simplified physics model.
To bring both of the studies that I talked to you about today together: we found that observers are very accurate and very systematic in visually estimating elasticity. Their estimates seem to be based on single features, and they can flexibly switch to new features if the information is not available anymore.
So this is more an approach of using fast heuristics. Whereas if they have to predict the future path of objects, participants are still very systematic and surprisingly accurate, but here they need to base their answers on a forward simulation and they probably have a simplified or noisy physics model. Taken together, I think this shows the high flexibility with which the brain solves these different tasks.
It also shows maybe a resource-rational strategy in which, if you can get away with fast heuristics, then people choose this cheaper strategy, as in the case of visual estimation, whereas, if the task requires a more accurate representation and forward simulation, then observers will use this more costly strategy.
There are, of course, a lot of open questions. For example, how the brain switches between these two strategies. We did not explicitly model this here, so I think that's a missing bit. Also how the brain learns these representations in the first place and whether our results generalize to other scenes or properties or objects. Then also what is the neural basis of dynamic material perception?
I could continue with more questions that I have, but for today I want to thank you for your attention. I thank Roland and Florian at Giessen, and Josh here, for contributing to this work. And I'm happy to take questions. I'm going to leave you with the summary slide.