Tutorial: Psychophysics and Data Analysis (44:52)
Date Posted:
August 15, 2018
Date Recorded:
August 15, 2018
Speaker(s):
Kohitij Kar
Brains, Minds and Machines Summer Course 2018
Description:
Kohitij Kar, MIT
Introduction to common psychophysical methods, including magnitude estimation, matching, detection and discrimination, the two-alternative forced choice paradigm, psychometric curves and signal detection theory, and using Amazon Mechanical Turk for large-scale experiments.
Download the tutorial slides (PDF)
KOHITIJ KAR: I'm going to talk about psychophysics and data analysis. I didn't put that here because I'm not going to show much about data analysis today. So it's kind of ironic, maybe, that the last tutorial is about psychophysics. Because typically, when we think of problems in neuroscience, behavior almost motivates every other problem.
Because if the brain wasn't able to produce the behaviors that it does, we wouldn't have studied the brain. We don't study our toenail as often as we study the brain. So obviously, behavior is important. And it motivates a lot of problems.
If you kind of remember, this was one of my slides last time, which kind of gives a picture of how people think in systems neuroscience: there's a sensory stimulus. And then this stimulus goes into the brain and evokes some perception, sensation, whatever you may call it. The example I gave was a glass of water.
You can see a glass of water. And I can ask you, was there water in the glass? And you needed to have a certain sort of perception evoked through this image to answer me correctly.
So if I show you this glass, the answer will be no. If I show you this glass, the answer is yes. And those are based on some sort of perception that is generated by this. I spoke about studying encoding models, studying decoding models. And I also showed you this other link that goes directly from the sensory stimulus to the perception.
So today, I'm going to talk about this link. I think last time [INAUDIBLE] spoke about encoding models. I spoke about decoding models. So today, I'm going to just let the brain [INAUDIBLE].
So psychophysics is the quantitative study of the relationship between a physical stimulus-- it could be visual, auditory, touch-- and the perception that it evokes. In this case, I will be mostly talking about humans.
So the whole tutorial is kind of divided into three parts. In the first part, I will talk mostly about three methods of measuring perception. In the second part, I'll talk about a specific task, two-alternative forced choice experiments, and give a brief introduction to signal detection theory.
In the third part, I will give you a brief introduction to Amazon Mechanical Turk. None of them are comprehensive. So if you're interested, I can post more stuff on Slack. Or you can pick up a book and read about any of these items.
So first, you should treat this tutorial as a global overview, with maybe some take-home messages from each one of them-- what is good, what is bad. This is not a comprehensive review of any of these particular techniques.
So three methods for measuring perception-- so the three primary methods that I'm going to talk about are magnitude estimation, matching, and then detection and discrimination. So let's see an example of magnitude estimation. OK. All right.
So I'm going to start doing an experiment here. So you can maybe tell me what you think of this. So here is a line that-- I'm calling this 50. So you can think of this whole screen as 100, and this is the center.
So the task would go something like this. You would fixate at the center. I would tell you that the top line is 50. What do you think this line is? So what do you think this line might be?
AUDIENCE: 35.
KOHITIJ KAR: 30, 30, 35? OK. So I'm not going to do the whole experiment here, though. But you will get a lot of lines like this. I can put arbitrary numbers here and then keep going. You will see, like, there are many lines that will be shown.
And at the end of the day, I can plot, given the true length of this line, what was the perceived magnitude of it. So this is one way of basically probing our idea of how long things are. So if we want to know how long we think things are, we can do it like this, basically by reference to some ground truth.
Similarly, I can ask you what is the brightness of this dot. So I don't think you even saw that. So what's going to happen is, if you fixate, there will be a dot that comes up around here that will be of intensity 50.
And then you will see another dot come up. And the task is what is the intensity of that dot compared to 50. So I just put some random number here. I just put a 0. So you can see it right now.
Does anybody see that dot? It's really, really-- I think the light is not helping at all. I don't think you will see the dot. It was very light.
Anyways, what happens, generally, is that I think it's intuitive enough that at the end of the day, you will get a graph like this. You have basically changed stimulus intensity. And then you can plot what was the magnitude estimate that you get.
So you'll get a nonlinear curve for brightness. And you will get a straight line for apparent length, which was the first task. So this is an interesting point here, though.
So this was a behavior that we observed, and then some people went to model this behavior. So Stevens came up with this power law that encapsulated a full range of possible data sets-- a lot of variations in dot intensity, brightness, length, color. So he came up with this formula, which is basically a model with only two free parameters: one is a constant that controls the size of the response, and there's an exponent which controls whether it's linear or nonlinear.
So if this exponent is one, it's a straight line. And if it's not one, it's curved. So here is an example where many, many behaviors were explained by a very simple model. And depending on who you are listening to in this course, you'll often hear things like we need understanding. CNNs are not good.
We need understanding. We need some sort of, like, really kind of very simple principles. And I think what they mean by simple principles are models that have low parameters.
Apparently, a CNN has many parameters. This model has few parameters. So it's an elegant solution, according to them.
It is not necessary that the brain has such an elegant or low parameter solution. So it's just a side note.
These are the many sensory continua that were tested. And you can see people have measured these-- basically estimated the b parameters for them. So for brightness, it's 0.33, which is what we were checking.
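Here is a minimal sketch of what fitting Stevens' power law to magnitude-estimation data could look like in Python; the intensities and reported magnitudes below are invented for illustration, not data from the slides.

```python
import numpy as np
from scipy.optimize import curve_fit

def stevens_power_law(intensity, k, b):
    """Perceived magnitude = k * intensity**b (Stevens' power law).
    b == 1 gives a straight line (e.g., apparent length);
    b < 1 gives a compressive curve (e.g., brightness, around 0.33)."""
    return k * intensity ** b

# Hypothetical magnitude-estimation data: stimulus intensity vs. mean reported magnitude.
intensity = np.array([5, 10, 20, 40, 60, 80, 100], dtype=float)
reported  = np.array([21, 27, 34, 42, 48, 53, 57], dtype=float)

(k_hat, b_hat), _ = curve_fit(stevens_power_law, intensity, reported, p0=[1.0, 0.5])
print(f"k = {k_hat:.2f}, b = {b_hat:.2f}")  # a b near 0.33 would match brightness
```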
So the next technique that I'm going to talk about is matching. So the first one was magnitude estimation. You show something, and you then ask, like, how long is it, or how bright is it.
The next one is matching. So you have two things side by side, and you have to basically match them up. So I have ground truth on one side. And then I don't have ground truth-- I have a test of basically [INAUDIBLE] on the other side.
So let's see an example of such a case. So I can put any color as the target color here. So let's say I make use of the color spectrum and I put some color here, a color that is sort of a mixture of red, green, and blue. So this is a mixture of this much amount of red, this much amount of green, this much amount of blue.
So the experimenter just gives you this sort of palette. And you have to change the knobs on these three tubes to match the color of this patch to that one. Right? So what happens at the end of the day is that the subjects are going to keep turning some of these knobs, and they will try to match these two things up.
So for example, the way to match this would be you go here. I mean, to save time, I can just directly click on the answer. So this is the distribution of points that will produce the exact same color patch.
So you can imagine one experiment here. So let's say I just assume that color is a three dimensional space. And I just put anything that I see here, any color that I find from any natural image, and I keep putting these things here.
And you start pulling these things around. So for every image, you would put a dot-- three dots-- somewhere in this space. So ideally, what should happen is, if I give you enough images, you should be able to trace out these curves. And these curves, if they have physical correlates within the brain, you might be able to use these graphs as sort of a hypothesis space.
So these three, in other terms, could become basis sets of color perception. So you can look at any region that you think, oh, this region encodes color. So let me see if the neurons or something in that region have responses based on the color spectrum-- have these kinds of profiles.
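As a rough sketch of how such matching data could be tabulated, here is one way to store each subject's final settings of the three primaries per target patch; the target names and knob settings below are invented purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical matches: for each target patch, the subject's final settings
# of the three primaries (red, green, blue knobs), averaged over repeats.
targets = ["patch_1", "patch_2", "patch_3", "patch_4"]   # placeholder targets
matches = np.array([
    [0.35, 0.55, 0.90],   # r, g, b settings for patch_1
    [0.20, 0.70, 0.25],
    [0.85, 0.65, 0.55],
    [0.70, 0.35, 0.30],
])

# Each match is one point in a three-dimensional "knob" space; with enough
# targets these points trace out the matching curves discussed above.
for i, primary in enumerate(["red", "green", "blue"]):
    plt.plot(range(len(targets)), matches[:, i], "o-", label=primary)
plt.xticks(range(len(targets)), targets)
plt.ylabel("knob setting")
plt.legend()
plt.show()
```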
So this is kind of what happened. You read this paper. It's an old paper. They found these things in the [INAUDIBLE], as well.
So it's kind of a very similar technique. It's not as clean. The data isn't as clean. But they do kind of show that you can use psychophysics data to then go into experiments to figure out stuff.
So this worked. And this was great. Because these are only three dimensions. And then, you can basically project any color into these three dimensions.
It might not be the same for objects. So that's another question that we had, is, like, how many dimensions do the objects span? That's a separate question. We don't know the psychophysics experiment, maybe, to do for that. But this is a good example of where psychophysics really works.
OK. So the third one, and this is probably the most used one nowadays, is detection or discrimination. So the [INAUDIBLE] subject's task is to detect very small differences in the stimulus. So typically, this is done by using one of these three methods. Some of them are not so good. Some of them are kind of OK-ish.
So method of adjustment-- you can think of the method of adjustment as this example: you get fitted for a new eyeglasses prescription. Typically, the doctor drops in a different lens and asks you if this lens is better than the other one. So the method of adjustment is essentially what is happening in that experiment.
Again, maybe it's better to turn off the lights. I don't know. But do you see a dot? Who sees a dot in here?
Oh, good. So basically, the task is you'll see a dot that is barely visible. And then you have to-- oh. You still see the dot? OK.
[INTERPOSING VOICES]
KOHITIJ KAR: Yeah. I think that's probably OK. I think the first one was fine. Yeah. The first light levels were OK.
So here's the dot. And the task would be, like, can you reduce the intensity? You have to keep reducing it until the dot goes away.
Now it kind of looks very stupid. But this was probably the first idea that came to people's heads. So don't get excited by easy ideas. That's one lesson that could be had.
So again, you keep doing it multiple times. A dot, again, shows up. And then you have to adjust it until you don't see it anymore. And then, every time, I record what level you went to.
And then, I can basically give a number for your brightness perception-- at what point, at what threshold, you stopped seeing this particular dot. So this is one way of, again, quantifying this behavior of luminance perception. I can change the size of the dot and I can ask this question again.
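A minimal sketch of how the adjustment endpoints could be summarized, assuming you have logged the final intensity at which the dot disappeared on each repeat; all values here are invented.

```python
import numpy as np

# Final intensity setting on each adjustment trial (hypothetical values,
# on a 0-100 scale): the level at which the subject said the dot vanished.
endpoints = np.array([6.2, 5.1, 7.0, 5.8, 6.5, 5.4, 6.9, 6.1])

threshold = endpoints.mean()           # the detection threshold estimate
variability = endpoints.std(ddof=1)    # how consistent the subject was
print(f"threshold = {threshold:.1f}, sd = {variability:.1f}")
```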
I can change it from a dot to a square. I can ask the same question again. So I can ask, how does that particular value depend on the shape of the image or the object that I'm showing you. So the problem with this is that it is a terrible method.
And the reason is that it depends on the subject a lot. It's a very subjective measure. So if I was a person who thought, oh, I'm going to just fool the doctor, because I know the answers, I can really play around with it. And there is no way for the doctor or for the experimenter to test how many times I'm just reporting something because I think that is the right answer and I'm kind of motivated somehow to just give the right answer and not give a good estimate of my own psychophysical state.
So that's why I don't think people use this method that much unless they have to because the subjects cannot do a certain kind of task or something. So typically, this is not used at all, the method of adjustment.
The yes/no method, on the other hand, is still used in some studies. With the yes/no method, again, you change the intensity of the dots-- in this case, it will be dots because I was showing dots. You keep changing the intensity of the dots from a very low value to a high value, and you keep asking, do you see the dot? Do you see the dot?
So you can do staircase methods of making the dot disappear or stuff like that. So again, there was this example-- you can go -- I will give you the link to this website. And you can go and do a lot of tasks here. And I'll tell you in a second why that might be useful.
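A minimal sketch of a simple 1-up/1-down staircase on dot intensity; the `subject_sees` function is a stand-in for real key presses, and every number here is an assumption for illustration.

```python
import random

def subject_sees(intensity, true_threshold=6.0, noise=1.0):
    # Stand-in for the subject: says "yes" more often as intensity exceeds threshold.
    return intensity + random.gauss(0, noise) > true_threshold

intensity, step, history = 20.0, 2.0, []
for trial in range(40):
    seen = subject_sees(intensity)
    history.append((intensity, seen))
    # 1-up/1-down rule: step down after "yes", up after "no";
    # this staircase hovers around the 50% detection point.
    intensity = max(0.0, intensity - step if seen else intensity + step)

# Average the intensities at response reversals as a rough threshold estimate.
reversal_levels = [history[i][0] for i in range(1, len(history))
                   if history[i][1] != history[i - 1][1]]
print("threshold estimate:", sum(reversal_levels) / len(reversal_levels))
```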
This is a task where the dot already came. I didn't cue you guys, so you have to fixate on the center and tell me if you see the dot or not. I don't think-- oh, yeah. You see it? No? Do you see it? There you go. No? That was there. It's just a small dot that's showing up here. You see it now? You see it now? Anyways, these experiments-- if you have done a psychophysics experiment, you'd be in a confined room in a lot of darkness.
So whoever saw the dots would be part of this curve that detected the dot at a lower dot intensity. And if you didn't see it, you probably would be part of this graph. But now, I could have known this answer-- that there was a dot-- even at a very low intensity. And I could have just said, yeah, of course there is a dot. And I would have a graph that looks like it goes up from maybe here.
So this is the problem with this kind of study: there is no way to measure a false positive, because all the trials are signal trials. There is signal in every trial, and no noise-only trial at all. So that's why I think it took a while for psychophysicists to realize that it's also important to include catch trials-- trials where you know the ground truth is that there is no dot-- because the subject has some prior bias and brings that in.
So as I said, all the trials here were signal trials. And there are no catch trials. So you only get hits and misses. We don't get any estimate of the false alarms. So that was fixed in the forced choice experiment.
In this experiment, you have to say yes, or you have to say no, it wasn't present. So your correct answer matters. So this was fixed in the forced choice experiment. I'm going to start showing you this specific forced choice experiment now, which is a two-alternative forced choice. So there will be two alternatives, and you have to say, is that alternative A or alternative B. For example, I can show the dot on-- sorry, go ahead.
AUDIENCE: [INAUDIBLE]
KOHITIJ KAR: There is no way to get a catch trial in those previous experiments without having a condition that has no signal. So if every condition that is tested has a signal, then there is no way to test whether a subject just knows-- oh, it's going to be this answer-- and keeps saying yes.
AUDIENCE: [INAUDIBLE]
KOHITIJ KAR: I think there are examples in the history of psychophysics where-- yeah.
AUDIENCE: [INAUDIBLE]
KOHITIJ KAR: That generally doesn't happen. I think we don't hear about them because we have gone past those stages. But I think sometimes there are still studies you'll find that where they haven't corrected for the false alarm, or haven't gotten [INAUDIBLE] measures.
AUDIENCE: [INAUDIBLE]
KOHITIJ KAR: Yeah, exactly. So it's useful to remember how we got into this position, and what the pitfalls of one versus the other are. OK, so this is a 2AFC task, and I was explaining this task where-- instead of the dot showing up always on top of the fixation point, it can show up either on the left or on the right, or not show up at all. And by doing it this way, you basically get rid of the previous problem of not having any false positives.
I showed you all these different techniques to give you, maybe, a brief history of some of the techniques that were used before. But I want to get to some real problems-- what people really use in different kinds of studies, and what kind of psychophysics is used. So I think I told you in the lecture before that I was using the ventral stream for decoding. Now I'm going to use some studies that are related to the dorsal stream.
I also mentioned this paper last time. So I'm going to motivate the rest of the tutorial based on a task that is typically thought to be linked with the dorsal stream. And I might have criticized this task, because the motion that is very relevant for us is probably objects moving around. But there has been more than three decades of work done on dot motion. So I cannot ignore it.
So I'm going to give you an explanation of how we think about this kind of a task where-- let's say, you have 10 dots here, and all of them are moving upward. It's called random dot motion stimulus. So that's why I'm calling the coherence 100%. If I show subjects this moving dot and I ask them, do you see the dots moving up or down? And I'm basically plotting their responses as a function of the coherence.
So in this case, the coherence is 100, and the stimulus looks like this. If I show the stimulus for 100 milliseconds or 300 milliseconds, what do you think the proportion of upward choices will be? 1, right? Everybody will see up.
If I have no signal in this dot motion stimulus-- if it's at 0% coherence, which means every dot is just moving randomly-- you're going to be in the center, if you're an ideal observer. If you have some biases, you might not be. Then again, the same thing: if the dots are moving down, you will have no trials where you actually say up. So you are perfectly correct again.
So you can test everything in between as well and then draw out the entire graph. And typically-- in case you don't know-- this is known as the psychometric function. People quantify motion perception based on some properties of these psychometric functions.
So the two properties that are mostly used are, one, this PSE, which is the point of subjective equality. You can also think of this as some kind of threshold. It's called the point of subjective equality because at 0% coherence, an ideal observer will be at chance in saying up or down, since there is no signal.
The other one is the slope, or sensitivity. So you can think of this as: how much motion energy do I need to insert before the performance of the subject goes up by a certain amount? So if, instead of a psychometric function like this, the subject had this kind of psychometric function, it would mean that the subject is more sensitive in the task.
So for the same increase in performance, that subject needed less motion to be introduced into the stimulus, right? So here is a way to think about a good subject, a bad subject, or something I'm doing to the subject that is making them bad or good.
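A minimal sketch of fitting a psychometric function to 2AFC dot-motion data, using a cumulative Gaussian as one common choice of curve; the coherence levels and choice proportions are invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric(coherence, pse, sigma):
    """Proportion of 'up' choices as a cumulative Gaussian of signed coherence.
    pse   = point of subjective equality (coherence at 50% 'up')
    sigma = inverse of sensitivity (smaller sigma -> steeper slope)"""
    return norm.cdf(coherence, loc=pse, scale=sigma)

# Signed coherence (% upward motion; negative = downward) and hypothetical data.
coherence = np.array([-50, -25, -10, 0, 10, 25, 50], dtype=float)
p_up      = np.array([0.02, 0.10, 0.30, 0.52, 0.73, 0.92, 0.99])

(pse, sigma), _ = curve_fit(psychometric, coherence, p_up, p0=[0.0, 15.0])
print(f"PSE = {pse:.1f}% coherence, slope parameter sigma = {sigma:.1f}")
```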
Now, these sorts of graphs have been used to study certain perceptual phenomena that we observe a lot. So I'm going to show you one such phenomenon-- you might have already seen this many times. How many of you-- you know about the waterfall illusion? You guys know about this? Everyone knows about this.
OK, so the waterfall illusion-- although it looks very different when you look at an actual waterfall, I'm just going to describe it anyway. If you look at a waterfall for a long time and then you move your gaze and look at a stationary car, the car seems to be moving upward. It's because of [INAUDIBLE] caused in the neurons and adaptation and all that.
So I'll just show a demonstration of it here. So you look at the center for a long, long time. So here, you have to focus on the center for 40 milliseconds-- or sorry, seconds. The second was already over. So yeah, look at the center. Don't move your eyes if you want to see what's going to happen next.
When this counter goes to 0, either keep looking directly at the screen or look at the back of your hands, OK? So keep looking at it. Feel at one with the stimulus. Things are going in, things are coming out, things are happening, and it's going to change. So the moment the stimulus changes, either look at the back of your hand or look at the screen. OK. All right. Oh, OK.
So this is-- I guess, whoever saw the back of their hand-- if you try it later, the back-of-the-hand trick is not going to work, because you have probably already passed the adaptation stage. But you might have seen the clouds move. And if you looked at the back of your hand, you saw the back of your hand move.
And that's because of what's called the motion aftereffect. And we have these rich phenomena in the world and we see them all the time. We are probably influenced by them. But as psychophysicists, we have decided to ignore that. We will bring this into dots, OK?
So the motion aftereffect, the way it's studied in the lab-- we have this nice little psychometric curve from the experiment that I just told you about before. So now, I'm going to repeat this experiment. But the only difference will be that before I do the test, I will show you a stimulus that is always moving upward. So I'm going to adapt you with a stimulus that moves upward, and then I will test you with all the different stimuli.
So the way this motion aftereffect manifests in this type of task is that if you show this for a long time and then show a randomly moving dot stimulus, you're going to expect the report to be somewhere down here. So you're going to see more down than up, because you have been adapted to one direction. So that was the motion aftereffect that I showed before, captured like this.
So now, you can do the entire motion strength dimension, and you'll get a graph like this. And typically, you can quantify how much motion aftereffect you have had by looking at the difference between these two points. So that quantifies this perception.
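Under that framing, the aftereffect can be read out as the shift in PSE between the baseline and adapted psychometric curves; here is a self-contained sketch with invented data for both conditions.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric(coherence, pse, sigma):
    # Proportion of "up" choices as a cumulative Gaussian of signed coherence.
    return norm.cdf(coherence, loc=pse, scale=sigma)

coherence     = np.array([-50, -25, -10, 0, 10, 25, 50], dtype=float)
p_up_baseline = np.array([0.02, 0.10, 0.30, 0.52, 0.73, 0.92, 0.99])  # no adaptation
p_up_adapted  = np.array([0.01, 0.04, 0.12, 0.25, 0.48, 0.80, 0.97])  # after upward adaptor

(pse_base, _), _  = curve_fit(psychometric, coherence, p_up_baseline, p0=[0, 15])
(pse_adapt, _), _ = curve_fit(psychometric, coherence, p_up_adapted, p0=[0, 15])

# The rightward shift of the curve after upward adaptation quantifies the
# motion aftereffect: more upward coherence is needed to report "up" half the time.
print(f"motion aftereffect = {pse_adapt - pse_base:.1f}% coherence")
```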
AUDIENCE: [INAUDIBLE]
KOHITIJ KAR: In this one, there is no reward for correct or incorrect answer. The subject doesn't know what is the correct or what is incorrect.
AUDIENCE: [INAUDIBLE]
KOHITIJ KAR: They're getting money to sit there and do the task. But it's mostly-- I think that kind of motivation, if you compare it to the monkeys, is not there in this kind of task. They most likely have signed up-- they have volunteered to work for science, and they're giving their honest answer. And it's up to the experimenter to figure out how to keep things in check so that they don't get biased responses.
OK, so that is the example of the two-alternative forced choice experiment that I was talking about. And I will try to motivate the signal detection theory problem with this previous task of motion direction discrimination.
So signal detection theory means exactly what it says. It's a theory for detecting signals from various sensory inputs or sensory stimuli. So there are three main messages that you can take home from this theory. One is that your ability to perform a detection or discrimination task is limited by internal noise. If you were noiseless, you would have been really good. But you have some noise, and that is what is basically limiting your performance.
The other thing is that signal strength and criterion-- and I'm going to describe both of these soon-- are the two components that affect your decision: where you are putting your criterion, and so on and so forth. And they each have different kinds of effects on the decision.
The third one is that you have to measure both hits and false alarms. So by measuring hits and false alarms, you can get an estimate of d prime, and I'll explain what d prime is. That is a measure of task difficulty that is independent of the criterion.
So let's dig a little bit deeper into the psychometric function, OK? So as I showed you, you have a stimulus that is always going all the way up-- 100% coherence-- and you get a proportion of upward choices of one. Same thing here: at 0 coherence, you're at chance. Going down, you're here.
So now, let's try to explain this in terms of a model from signal detection theory. The other thing is that if you were an ideal observer with zero noise, your graph should not look like this, slanted. Your graph would basically look like a step function. Because for anything that is greater than 0, the ground truth is that it's moving upward. For anything that is less than 0, the ground truth is that it is moving downward.
So if you had absolutely no noise inside your brain-- in the detectors inside your brain-- because I'm always thinking of one detector that is basically encoding the stimulus, and then you are applying some criterion and producing a decision-- so if the detector had no noise, this is how it would look.
Now, let's look at how you might want to think about it in terms of signal detection theory. So in signal detection theory, you can think of it like this: you have a neuron that fires whenever some stimulus comes up. There's a detector that responds, but there is a distribution that it basically obeys. So the detector is going to fire with some amount of noise. This is the distribution of firing rates of the detector, let's say.
And on top of that, you have some criterion. And based on both of these, you're making the decision. So let's say I'm giving you a stimulus at-- this one is at minus-- a very low value, so the dots are all moving down. Are you guys with me? I'm still here. So this is a demonstration of how you can think about this problem in terms of signal detection theory.
So there are some noisy responses, but they're all less than the criterion. So if the decision-making system has decided to put the criterion right here, all of these responses will be classified as down. And that's what you are seeing in the subject's responses. So you move it a little bit-- still everything is below the criterion. You move it a little bit more-- again, below the criterion. So if you keep doing this, you will basically carve out the entire [INAUDIBLE].
So at some point, it crosses the criterion-- it gets a little bit of signal-- and then at some point it goes all the way up, because everything will be higher than the criterion. So that's one way of looking at how you can model internal noise.
So what happens is that, if you think of another human being, another subject, whose detectors have much bigger noise-- so this one had a tighter noise distribution, this one has a much wider noise distribution-- for a subject who has an internal model like this, you'll see that the slope of this graph is much shallower. So that speaks to the first point of signal detection theory: your performance is limited by the internal noise model that you have.
Now, there are a lot of other theories that talk about not only internal noise models but also external noise models and stuff like that. But I won't go into them today. This is just one way of thinking about these kinds of signal detection theory based explanations of what is going on. There is no guarantee that this is actually happening. But this is a simple model to explain this change in sensitivity of the subject.
All right, so I haven't said this before, but just to tell you in more concrete terms, these are the four main things you're looking at during these kinds of tasks. You want to know whether the subject was correct, and also what kind of trial it was. So basically, if you're doing a dot detection task: if the dot is present and the subject says yes, it's a hit. If the dot is present and the subject says no, it's a miss. If the dot is not present and the subject says yes, that's called a false alarm. And if the dot is not present and the subject also says no-- he's still correct, she is still correct-- that's a correct rejection.
And you can think of each of these in terms of models of the internal responses of detectors. So here is a graphical representation of that scenario, where you have your noise distribution and you have your signal distribution. And you can put a criterion on these two distributions and decide what's going on. So once you get the data, then, depending on some assumptions about how wide these distributions are and where the criterion is placed, you can basically model that behavior. So that's pretty much the idea: based on these factors and where you put the criterion, you want to quantify the behavior.
I spoke about d prime before. So you have your noise distribution somewhere here, and you have your signal-plus-noise distribution a little bit further along-- you can shift it by increasing the sensory intensity of the signal. And the separation of these two distributions divided by their spread, which is quantified by the standard deviation of the distributions, is typically the way you quantify d prime.
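A minimal sketch of estimating d prime and the criterion from hit and false-alarm counts, under the standard equal-variance Gaussian assumption; the counts below are made up for illustration.

```python
from scipy.stats import norm

# Hypothetical yes/no detection counts.
hits, misses = 78, 22                 # signal-present trials
false_alarms, correct_rej = 14, 86    # signal-absent (catch) trials

hit_rate = hits / (hits + misses)
fa_rate  = false_alarms / (false_alarms + correct_rej)

# Equal-variance Gaussian SDT: d' is the separation of the two distributions
# in standard-deviation units; c is the criterion relative to their midpoint.
d_prime   = norm.ppf(hit_rate) - norm.ppf(fa_rate)
criterion = -0.5 * (norm.ppf(hit_rate) + norm.ppf(fa_rate))
print(f"d' = {d_prime:.2f}, criterion c = {criterion:.2f}")
```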
Here is a little demonstration of how we make these curves-- receiver operating characteristic curves-- from these kinds of data. So you can think of these as, again, one is noise, one is signal. And you're plotting hit rates versus false alarm rates. If you increase the signal mean, this curve is going to bow upward. The area under this curve is typically what is of interest. As the signal separates from the noise, the area under the curve becomes larger and larger. When the signal sits right on top of the noise, the curve is the unity line.
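A minimal sketch of tracing out an ROC curve by sweeping the criterion over two Gaussians and integrating the area under it; the means and spread are arbitrary illustration values.

```python
import numpy as np
from scipy.stats import norm

mu_noise, mu_signal, sigma = 0.0, 1.5, 1.0   # arbitrary example distributions

# Sweep the criterion from very liberal to very conservative.
criteria  = np.linspace(-5, 7, 500)
hit_rates = 1 - norm.cdf(criteria, loc=mu_signal, scale=sigma)
fa_rates  = 1 - norm.cdf(criteria, loc=mu_noise,  scale=sigma)

# Area under the ROC curve: 0.5 when signal sits on top of noise (unity line),
# approaching 1 as the distributions separate.
auc = np.trapz(hit_rates[::-1], fa_rates[::-1])
print(f"AUC = {auc:.3f}")  # analytically, norm.cdf((mu_signal - mu_noise) / (sigma * 2**0.5))
```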
I think the way we should think about them is that these are different ways of quantifying the behavior. And once you quantify the behavior in some way, and you want to test some other model of that particular behavior, you're going to recreate this kind of analysis and check whether the other model, basically, has the same parameters for this kind of characteristic curves.
So I think that-- at least that's the way I think about these things. Because these don't tell you about the internals of the system at all. This is more like a very abstract modeling of behavior, so to say.
OK, so the last thing, and I wanted to start this with the introduction to not Amazon Mechanical Turk, but the original Mechanical Turk. So how many of you know the history of Mechanical Turk? Oh, OK. This one, not everyone knows. That's good.
So before Amazon Mechanical Turk, you had the real Mechanical Turk, which was built to impress the Empress of Austria. And I'll play these videos. Maybe you can follow, yes.
[VIDEO PLAYBACK]
- Built in 1770 for an Austrian empress--
[INTERPOSING VOICES]
- --traveled through Europe playing chess and defeating
[INTERPOSING VOICES]
[END PLAYBACK]
KOHITIJ KAR: So this was a fake chess player, basically. So it looked like this-- OK.
[VIDEO PLAYBACK]
- Built in 1770 for an Austrian empress, this life-size automaton traveled through Europe playing chess and defeating commoners and kings alike. After a century-long career, he's recently been restored to full working order.
With lifelike movements, an error in judgment, and his expressionless face--
[END PLAYBACK]
KOHITIJ KAR: So he would play chess. And the idea was that people claimed this was some kind of automaton that had figured out how to play chess. But this was a fake thing. There was a guy who sneaked in, who was a chess master and used to play the chess. And then it went around Europe fooling [INAUDIBLE]. I think a lot of famous people got tricked by this.
At some point, they--
KOHITIJ KAR: Called him out. Anyways, so that was the history of the Mechanical Turk. But Amazon, probably following the same mode of operation, decided that they would have Amazon Mechanical Turk. And I'm going to show you a little bit about how to run a simple task. So you have to create an account on Amazon Mechanical Turk.
And so there are two ways you can operate. You can operate as a worker, or you can operate as a requester (a developer). So if you're a worker, then you sign up with all of your details and then you do the tasks. It doesn't pay a lot, but there are a lot of interesting tasks that you can do in here.
As a requester or a developer, you can upload your own tasks and run them here. So the way it works is that you have to design an HTML file or something that runs online. And you don't have to follow their project guidelines or anything. You just have to have a way where you're pulling up HTML files, and the HTML files have links to some sort of server that's online.
So in our lab, we typically use Amazon S3 to store our images or videos or whatever we're going to show to the subjects. And online, we basically make a call to those particular URLs and put them on the screen. You can do it in different ways. We have used JavaScript a lot to program these HTML files.
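For the stimulus-hosting step, here is a hedged sketch of what pushing images to S3 and building their public URLs might look like with boto3; the bucket name, file names, and ACL policy are placeholders, and any real lab setup may differ.

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-psychophysics-stimuli"   # placeholder bucket name

image_files = ["dot_trial_001.png", "dot_trial_002.png"]   # placeholder stimuli
urls = []
for filename in image_files:
    # Upload each stimulus and make it publicly readable so the HIT's
    # JavaScript can load it by URL at run time.
    s3.upload_file(filename, bucket, filename, ExtraArgs={"ACL": "public-read"})
    urls.append(f"https://{bucket}.s3.amazonaws.com/{filename}")

print(urls)   # these URLs go into the HTML/JavaScript that the HIT displays
```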
For example, you can go here in the create folder, and you can use one of their default projects. You can see they have many projects here. One thing would be choose image a versus b. So pick the image that you like more. This or that. So you can have your own task, basically. Or if your task is aligned with what they already have, you can use their template to launch them.
Once you launch them, they're called HITs-- Human Intelligence Tasks-- and then you'll have workers working on those HITs. Typically, I came from a lab where we used to work in psychophysics rigs. And it was a big deal to get a lot of subjects. So we would get maybe 30 subjects a month or so, if we were lucky.
Here, I got-- the first day I ran this, I got around 500 subjects, and it was amazing. The problem that people would bring forward is that, well, these are subjects who are sitting in their house and they're taking breaks and they have different distances from the screen. You don't know where they are looking.
So if your experiment is really dependent on eye position and eye tracking, I agree that this is a difficult setup to justify. But if your task is independent of those, and you are just generally worried about the attention and arousal levels of the subjects and such, you can think of it this way: you will get so much data that the reliability of the mean of the effect that you're computing is going to be very, very accurate. And it will be difficult to get the same level of reliability for a mean acquired in the lab.
So we did a lot of analysis where, initially, we basically tried to see at what number of repetitions the split-half reliability of the data goes to 1. So we use that as a metric. You collect a lot of data, and then you split all of your trials into two halves. And then, you correlate whatever you're measuring across those two halves of the data.
You'll see that as you keep increasing the number of repetitions, that correlation value approaches 1. And we would like to operate in a regime where those correlation values are close to 1. So that's one thing that I learned while using Amazon Mechanical Turk. You can control for a lot of aspects of these subjects, like where they are from, what computer they are using, whether they're using a keyboard or a touch pad.
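A minimal sketch of that split-half reliability check, with a made-up trial matrix: split the repetitions into two halves, correlate the per-condition means, and watch that correlation approach 1 as repetitions grow.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_half_reliability(trials):
    """trials: (n_conditions, n_repetitions) array of single-trial measurements."""
    n_reps = trials.shape[1]
    perm = rng.permutation(n_reps)
    half_a = trials[:, perm[: n_reps // 2]].mean(axis=1)
    half_b = trials[:, perm[n_reps // 2 :]].mean(axis=1)
    return np.corrcoef(half_a, half_b)[0, 1]

# Hypothetical data: 20 conditions, a true per-condition effect plus trial-to-trial noise.
true_effect = rng.normal(size=(20, 1))
for n_reps in (4, 16, 64, 256):
    trials = true_effect + rng.normal(scale=2.0, size=(20, n_reps))
    print(n_reps, round(split_half_reliability(trials), 2))
# Reliability climbs toward 1 as repetitions increase, which is the regime to aim for.
```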
I mean, the age is voluntarily reported, but you can ask for reports of age and stuff like that. So you can do a lot of post hoc pruning of the data, or even pre-emptive discarding of subjects based on that. You can ask the same subject to come over and over again. And I've done that in some of the papers that we have published in the lab where I would ask the same subject to do the task more often.
So once the subjects do the task, you can go to Manage. Here, we are not running too many tasks right now, so you don't see it. But there will be a lot of assignments here that will be filled up. So then, you can download the data. It comes in the form of a .csv file or a JSON file. So you can unpack that and then use it for your analysis.
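A minimal sketch of unpacking such a batch results file with pandas; the filename and the `WorkerId`, `Input.*`, and `Answer.*` column names are the kind Mechanical Turk batch downloads commonly carry, but treat them as placeholders to check against your own file.

```python
import pandas as pd

# Load the downloaded batch results (filename and columns are placeholders).
results = pd.read_csv("Batch_results.csv")

# One row per assignment; group by worker to see how much each contributed.
per_worker = results.groupby("WorkerId").size()
print(per_worker.describe())

# Example analysis: proportion choosing "up" per stimulus condition,
# assuming the HIT wrote its condition and response into Input.*/Answer.* columns.
summary = (results.groupby("Input.coherence")["Answer.choice"]
                  .apply(lambda x: (x == "up").mean()))
print(summary)
```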
I really find this very exciting, and you can do a lot of stuff here. So the one thing I was thinking while I was making the tutorial is that we have had a lot of talk about like CNNs and how they can be models of the brain. And where is the brain reading stuff from, where is the decoding happening, where is the encoding happening.
One simple exercise that we can probably do at some point is that-- so I have shown you a lot of tasks before. So all of you can go do those tasks. I'm not asking you to do it. But somebody can go and do those tasks in that website. So I'll forward that website. And so once you do the task in those websites, the results are downloadable, right?
So you can also think of designing some task and running similar tasks on Amazon. So you can get a lot of human data on tasks that are very low-level tasks, like dot intensity thresholds and stuff like that. You can do these tasks. And people have theories about them. And neuroscientists have recorded neurons, claiming that these neurons are responsible for this task, that this is how the decoder works.
But proper causal perturbations have not been done, and so on and so forth. So what you can do is take any of the CNNs-- especially if it's a task that people have claimed is done in the ventral stream-- and do this task on Amazon Mechanical Turk. And then you can ask, do I need to go all the way to the last layer to decode this, or am I better off if I use an intermediate layer to decode?
Stuff like that you can do. And then, those will become hypothesis-based, basically. So for example, I was told this once by a professor [INAUDIBLE], that he doesn't think that IT is doing all of object recognition. If I have to do object discrimination between two objects that are just a tiny bit different from each other, then we might need V4, or we might need V1. The decoder isn't necessarily reading from the end of the transformation, transformation, transformation, [INAUDIBLE]-- the decoder might actually need to read from V1 at the same time.
So that's a testable hypothesis. And you need some sort of data to know what the humans are doing. And then, you can go back in the CNN and test the CNNs in the same way. So another message is that psychophysics is not limited to human subjects. Because these models are available where you can simulate the same conditions, you can do almost all of the psychophysics stuff that I've mentioned on these models. So I think that generates a huge space of hypotheses for future experiments.
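A hedged sketch of the layer-comparison idea, using a torchvision AlexNet as a stand-in model: extract features from an intermediate layer versus the last convolutional layer, fit a simple linear decoder on each, and compare which one better reproduces the human psychophysics. The layer indices, stimuli, and labels below are placeholders, not the speaker's actual pipeline.

```python
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression

model = models.alexnet(weights="IMAGENET1K_V1").eval()

def features(x, upto):
    """Run images through the convolutional stack up to layer index `upto`
    and flatten, as a stand-in for reading out from an intermediate layer."""
    with torch.no_grad():
        for i, layer in enumerate(model.features):
            x = layer(x)
            if i == upto:
                break
    return torch.flatten(x, 1).numpy()

# Placeholder stimuli and labels for a fine discrimination task
# (in practice: the same images and labels the MTurk subjects saw).
images = torch.randn(64, 3, 224, 224)
labels = (torch.rand(64) > 0.5).long().numpy()

for layer_idx in (3, 12):   # an early layer vs. the last layer of AlexNet's `features`
    X = features(images, layer_idx)
    decoder = LogisticRegression(max_iter=1000).fit(X[:48], labels[:48])
    print(layer_idx, decoder.score(X[48:], labels[48:]))
# The layer whose decoder best matches the human pattern becomes the hypothesis to test.
```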
Because, I mean, without these models, there are no hypotheses that people have been following. So you might often find that there is a lot of confusion in explaining the very same kind of data-- what's going on. There is also, I think, a website from Michael Bach, B-A-C-H. He has this repertoire of illusions. And you can make tasks out of them.
So you can basically take an illusion and ask, based on these methods that I mentioned, what is the best way to quantify the percept. So once you quantify the percept, you can ask, can a standard deep net or a recurrent deep net or something like that solve these kinds of problems? There are also very interesting models of motion perception-- labeled-line models and stuff like that, tractor models. Also try those out for motion perception.
I think also-- the last slide that I had was a quality check of Mechanical Turk versus the lab. This is some CSD measure from an old study from a lab. This is just showing that, on average, these metrics seem to be correlated across lab versus MTurk subjects.
And as I said, at high numbers of repetitions, MTurk data is consistent with in-lab data. We also saw that in some of our studies. Thanks. If you have questions-- this is more like a tutorial where I think you might want to think a little bit more about psychophysics.
Because it often seems like people think it's a solved thing. We have all the metrics-- all the behavioral measures that we can think of. But oftentimes, reformatting those behaviors to match the new models is the way to generate new hypotheses for experiments. So I would encourage you to think like that. Thanks.
[APPLAUSE]