Materials make distinctive sounds when they are hit or scratched —
dirt makes a thud; ceramic makes a clink. These sounds reveal aspects
of an object's material properties, as well as the force and motion of
the physical interaction. In this paper, we introduce an algorithm
that learns to synthesize sound from videos of people hitting objects
with a drumstick. The algorithm uses a recurrent neural network to
predict sound features from videos and then produces a waveform from
these features with an example-based synthesis procedure. We
demonstrate that the sounds generated by our model are realistic
enough to fool participants in a "real or fake" psychophysical
experiment, and that they convey significant information about the
material properties in a scene.
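
To make the described pipeline concrete, below is a minimal sketch in PyTorch. It is not the paper's implementation: the frame-feature dimension, the single-layer LSTM, the sound-feature dimension, and the nearest-neighbor retrieval used as a stand-in for the example-based synthesis step are all assumptions, and the names SoundFeaturePredictor and example_based_synthesis are hypothetical.

    # Sketch only, not the authors' code. Assumes per-frame video features
    # are precomputed and "sound features" are frame-aligned vectors.
    import torch
    import torch.nn as nn

    class SoundFeaturePredictor(nn.Module):
        # RNN mapping per-frame video features to per-frame sound features.
        def __init__(self, frame_dim=4096, hidden_dim=256, sound_dim=42):
            super().__init__()
            self.rnn = nn.LSTM(frame_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, sound_dim)

        def forward(self, frames):            # frames: (batch, T, frame_dim)
            hidden, _ = self.rnn(frames)
            return self.head(hidden)          # (batch, T, sound_dim)

    def example_based_synthesis(pred_feats, train_feats, train_waveforms):
        # Nearest-neighbor stand-in for example-based synthesis: return the
        # training waveform whose sound features best match the prediction.
        query = pred_feats.reshape(-1)                        # (T*sound_dim,)
        dists = ((train_feats.reshape(len(train_feats), -1)
                  - query) ** 2).sum(dim=1)                   # (N,)
        return train_waveforms[int(dists.argmin())]

The dimensions here (4096, 256, 42) are arbitrary placeholders; any frame-aligned spectral representation of sound would fit the same interface.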