frames are shown in figure 4. Our goal is to obtain a
regression network on color images, that is, a mapping from images to a real number. We will train this
network as a structured prediction problem operating on a sequence of N images to produce a sequence
of N heights, and each piece of data xi will be a vector of images, x. Rather than supervising our network with direct labels, y, we instead supervise the
network to find an object obeying the elementary
physics of free-falling objects. Because gravity acts
equally on all objects, we need not encode the
object’s mass or volume.
An object acting under gravity will have a fixed acceleration
of a = −9.8 m/s², and the plot of the object’s
height over time will form a parabola:
where Δt = 0.1 s is the duration between frames and
y0 and v0 denote the initial location and velocity
respectively. This equation provides a necessary constraint, which the correct mapping f* must satisfy.
We thus train f by making incremental improvements in the direction of better satisfying this equation.
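To make "better satisfying this equation" concrete, here is a minimal sketch of how a trajectory of height predictions could be scored against a fixed-curvature parabola. It is an illustration under stated assumptions, not the training code used for the experiments: the function names are hypothetical, frames are indexed from i = 0, the coefficient a multiplies (iΔt)² exactly as the equation is written in the text, y0 and v0 are chosen by ordinary least squares, and the residual is summed as an absolute (L1) error.

```python
def fit_fixed_curvature_parabola(heights, dt=0.1, a=-9.8):
    """Fit y0 and v0 in y_i = y0 + v0*(i*dt) + a*(i*dt)**2 with a held fixed.

    Hypothetical helper: frame indexing from 0 and the least-squares
    criterion are assumptions made for this sketch.
    """
    n = len(heights)
    if n < 2:
        raise ValueError("need at least two height predictions")
    times = [i * dt for i in range(n)]
    # Subtract the fixed quadratic term; what remains should be a line in t.
    targets = [h - a * t * t for h, t in zip(heights, times)]
    t_mean = sum(times) / n
    r_mean = sum(targets) / n
    # Closed-form simple linear regression for slope (v0) and intercept (y0).
    s_tt = sum((t - t_mean) ** 2 for t in times)
    s_tr = sum((t - t_mean) * (r - r_mean) for t, r in zip(times, targets))
    v0 = s_tr / s_tt
    y0 = r_mean - v0 * t_mean
    fitted = [y0 + v0 * t + a * t * t for t in times]
    return y0, v0, fitted


def constraint_loss(heights, dt=0.1, a=-9.8):
    """Sum of absolute residuals between predictions and the fitted parabola."""
    _, _, fitted = fit_fixed_curvature_parabola(heights, dt, a)
    return sum(abs(h - f) for h, f in zip(heights, fitted))


# A trajectory that obeys the equation incurs (numerically) zero loss,
# while a straight-line trajectory does not.
free_fall = [1.0 + 3.0 * (i * 0.1) - 9.8 * (i * 0.1) ** 2 for i in range(5)]
straight_line = [0.0, 1.0, 2.0, 3.0, 4.0]
print(constraint_loss(free_fall), constraint_loss(straight_line))
```

This sketch only evaluates the loss for a given list of predictions. In training, the same quantity would be computed inside an automatic-differentiation framework so that its gradient can flow back into the parameters of f.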
Given any trajectory of N height predictions, f(x),
we fit a parabola with fixed curvature to those predictions, and minimize the constraint loss, which is
the residual between the predictions and the parabola. Because the constraint loss is differentiable almost
everywhere, we can optimize it with SGD. Surprisingly, we find that when combined with existing reg-
$$y_i = y_0 + v_0 (i\Delta t) + a (i\Delta t)^2$$
Figure 4. Qualitative Results from Our Network Applied to Fresh Images.
As the pillow is tossed, the height forms a parabola over time. We exploit this structure to independently predict the pillow’s height in each
frame without providing labels.
[Plot: height of pillow versus time]