Fix a segmentation fault in multithreaded code

The segmentation fault occurred when building with arch=x86-linux-gcc
and -DSANITIZE=integer. It crashed at
  memcpy(&lf_data->planes[i].dst, &pd[i].dst,
         sizeof(lf_data->planes[i].dst));
in the loop_filter_data_reset() function. It appeared that memcpy()
used MOVAPS, which requires its memory operand to be aligned on a
16-byte boundary.
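
A quick way to check this diagnosis is to test whether the destination
address actually sits on a 16-byte boundary before the copy. The helper
below, is_aligned_16(), is hypothetical and not part of libaom; it is a
minimal sketch of the check that MOVAPS effectively imposes.

```c
#include <stdint.h>

/* Hypothetical debugging helper (not in libaom): returns 1 if p lies on a
 * 16-byte boundary, the condition MOVAPS requires of its memory operand. */
static int is_aligned_16(const void *p) {
  return ((uintptr_t)p & 15u) == 0;
}
```

For example, calling is_aligned_16(&lf_data->planes[i].dst) in a debug
build just before the copy would show whether the crashing destination
was misaligned.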

First fix (patch set 1): aligned lf_sync->lfdata to 16 bytes during
memory allocation.
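
The approach in patch set 1 can be sketched with C11 aligned_alloc(). The
actual allocation code is not part of this diff, so the struct
LFWorkerDataSketch and the function alloc_lfdata_aligned16() below are
hypothetical stand-ins, not the real lf_sync->lfdata allocation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-worker loop-filter data; the real
 * layout is not shown in this patch. */
typedef struct {
  void *cm;
  void *xd;
  int planes[3];
} LFWorkerDataSketch;

/* Allocate n entries starting on a 16-byte boundary. C11 aligned_alloc()
 * expects a size that is a multiple of the alignment, so round the byte
 * count up; reject n == 0 and sizes that would overflow. */
static LFWorkerDataSketch *alloc_lfdata_aligned16(size_t n) {
  if (n == 0 || n > (SIZE_MAX - 15) / sizeof(LFWorkerDataSketch)) return NULL;
  size_t bytes = n * sizeof(LFWorkerDataSketch);
  bytes = (bytes + 15) & ~(size_t)15;
  return aligned_alloc(16, bytes);
}
```

Note that this only guarantees alignment of the array base: an individual
member such as planes[i].dst is 16-byte aligned only if the struct layout
happens to place it there, which is one reason this fix was not robust.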

Second fix (patch set 2): the above fix was not robust enough. Changed
to copy each member individually instead of calling memcpy(). (Note: we
also tried "lf_data->planes[i].dst = pd[i].dst;", but that did not work
either.)
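
The member-by-member copy used in patch set 2 looks roughly like the
sketch below. The field names are taken from the diff, but the struct
name Buf2DSketch and the field types are assumptions; the real dst type
is not shown in this patch. Copying scalar members one at a time avoided
the crash in this configuration, though the C standard does not prevent
a compiler from merging such stores into vector moves.

```c
#include <stdint.h>

/* Assumed layout: field names come from the diff, types are a guess. */
typedef struct {
  uint8_t *buf;
  uint8_t *buf0;
  int width;
  int height;
  int stride;
} Buf2DSketch;

/* Copy each member individually instead of using a whole-struct
 * assignment or memcpy() of the entire struct. */
static void copy_buf2d_fields(Buf2DSketch *dst, const Buf2DSketch *src) {
  dst->buf = src->buf;
  dst->buf0 = src->buf0;
  dst->width = src->width;
  dst->height = src->height;
  dst->stride = src->stride;
}
```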

BUG=aomedia:1856
BUG=aomedia:1859

Change-Id: I2d2d4bae1c1546ed7b7e61bafea2334124079719
diff --git a/av1/common/thread_common.c b/av1/common/thread_common.c
index d3ce1d4..3884954 100644
--- a/av1/common/thread_common.c
+++ b/av1/common/thread_common.c
@@ -124,7 +124,18 @@
   lf_data->cm = cm;
   lf_data->xd = xd;
   for (int i = 0; i < MAX_MB_PLANE; i++) {
-    memcpy(&lf_data->planes[i].dst, &pd[i].dst, sizeof(lf_data->planes[i].dst));
+    // TODO(yunqing): Copying the dst struct with "lf_data->planes[i].dst =
+    // pd[i].dst;" or "memcpy(&lf_data->planes[i].dst, &pd[i].dst,
+    // sizeof(lf_data->planes[i].dst));" causes a segmentation fault when
+    // building with arch=x86-linux-gcc and -DSANITIZE=integer. It seems that
+    // MOVAPS is used in memcpy(), which requires the memory operand to be
+    // aligned on a 16-byte boundary. Restore a more efficient copy once
+    // Clang fixes this issue.
+    lf_data->planes[i].dst.buf = pd[i].dst.buf;
+    lf_data->planes[i].dst.buf0 = pd[i].dst.buf0;
+    lf_data->planes[i].dst.width = pd[i].dst.width;
+    lf_data->planes[i].dst.height = pd[i].dst.height;
+    lf_data->planes[i].dst.stride = pd[i].dst.stride;
     lf_data->planes[i].subsampling_x = pd[i].subsampling_x;
     lf_data->planes[i].subsampling_y = pd[i].subsampling_y;
   }