We develop a sampling extension of M-theory focused on invariance to scale and translation. Quite surprisingly, the theory predicts an architecture of early vision with increasing receptive field sizes and a high-resolution fovea, in agreement with data on the cortical magnification factor, V1 and the retina. From the slope of the inverse of the magnification factor, M-theory predicts a cortical {\textquotedblleft}fovea{\textquotedblright} in V1 on the order of 40 by 40 basic units at each receptive field size. This corresponds to a foveola of about 26 minutes of arc at the highest resolution and ≈6 degrees at the lowest resolution. It also predicts uniform scale invariance over a fixed range of scales, independent of eccentricity, while translation invariance should depend linearly on spatial frequency. Bouma{\textquoteright}s law of crowding follows in the theory as an effect of pooling performed cortical area by cortical area; the Bouma constant has the value expected if the signature responsible for recognition in the crowding experiments originates in V2. From a broader perspective, the emerging picture suggests that visual recognition under natural conditions takes place by composing information from a set of fixations. Each fixation provides recognition from a space-scale image fragment, that is, an image patch represented at a set of increasing sizes and decreasing resolutions.

}, keywords = {Invariance, Theories for Intelligence}, author = {Tomaso Poggio and Jim Mutch and Leyla Isik} }