Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset