Allocate source buffers to be multiples of 16

Currently, when the video frame width is not multiples of 16, the
source buffer has a stride of non-multiples of 16, which forces
an unaligned load in SAD function and hurts the performance. To
avoid that, this change allocates source buffers to be multiples
of 16.

Change-Id: Ib7506e3eb2cea06657d56be5a899f38dfe3eeb39
1 file changed