Two questions from running neural-style on an international Azure NC6 VM, mainly about Total memory #TensorFlow# #Azure VM NC6#

I have recently been playing with https://github.com/anishathalye/neural-style
I followed this write-up: http://blog.csdn.net/v_july_v/article/details/52658965

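To skip the pitfalls of compiling and installing everything myself (mostly because my local machine is weak), I used an NC6 virtual machine on international Azure. I uploaded the code and the model in advance through an Azure Storage file share, then mounted the SMB share locally. A quick note on the NC series:
NC series: NVIDIA K80 GPU. Dual GPU, 4992 CUDA cores, 24 GB of video memory, 2.91 TFLOPS double precision, 8.73 TFLOPS single precision.
NC6: 6 cores + 56 GiB RAM + 340 GiB disk + 1x K80. $0.9/hour.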

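Finally, running it. Microsoft's tooling really does follow the point-and-shoot camera philosophy: after cd-ing into /mnt/mosp, I could run it directly: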
python neural_style.py --content ./source/WP_20170128_09_12_22_Rich.jpg --styles ./starry-sky.jpg --output ./result/WP_20170128_09_12_22_Rich.jpg

Before describing my problems, here is the log output:

I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 9909:00:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y

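Question 1: The six warnings above, covering SSE3, SSE4.1, SSE4.2, AVX, AVX2 and FMA, are about CPU computation. When the GPU does most of the work, do I need to enable them? If so, do I need to recompile a TensorFlow build that supports them?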
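If the goal is only to quiet these CPU-feature warnings rather than rebuild, one option is TensorFlow's TF_CPP_MIN_LOG_LEVEL environment variable. This is a minimal sketch, assuming the variable is set before TensorFlow is imported. It only hides the messages and does not enable the instructions, and at level 2 the informational device lines are hidden as well. The try/except is there only so the snippet also runs on a machine without TensorFlow.

```python
import os

# Set this before "import tensorflow" so the C++ logger picks it up.
# "0" shows everything, "1" hides INFO (I) lines, "2" also hides WARNING (W)
# lines, "3" also hides ERROR (E) lines.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

try:
    import tensorflow as tf  # the cpu_feature_guard warnings should no longer print
except ImportError:
    tf = None  # TensorFlow is not installed here; the variable is still set
```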

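Question 2 (the main one): Total memory is only a little over 11 GiB, while the NC6 has 56 GiB of RAM and 24 GiB of video memory; neither matches the Total memory figure. I checked available memory with top at the time and it was around 55 GiB (that was two days ago, so I don't remember exactly, but it was definitely 50+). Do I need to change some configuration? Or should I change how GPU memory is allocated directly in the neural-style Python code, through config = tf.ConfigProto()?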
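One way to read these numbers, offered as a hedged sketch rather than a confirmed diagnosis: host RAM (the figure shown by top) and GPU memory are separate pools, and the K80's 24 GB belongs to a board with two GPUs. The log lists only device 0, so the arithmetic below assumes TensorFlow is reporting one of those two GPUs. The variable names are illustrative, and the remaining gap down to 11.17 GiB is not explained here (memory reserved for ECC is one possible cause, which is an assumption).

```python
# Figures taken from the post: the K80 board spec and the TensorFlow log.
board_memory_gb = 24.0        # whole K80 board
gpus_per_board = 2            # "dual GPU"
reported_total_gib = 11.17    # "Total memory" for device 0
reported_free_gib = 11.11     # "Free memory" for device 0

per_gpu_gb = board_memory_gb / gpus_per_board
in_use_gib = reported_total_gib - reported_free_gib

print("Memory per GPU on the board: %.1f GB" % per_gpu_gb)
print("Reported by TensorFlow:      %.2f GiB" % reported_total_gib)
print("Already in use at startup:   %.2f GiB" % in_use_gib)
```

If that reading is right, tf.ConfigProto options would change how the device's memory is allocated, not how much memory the device has.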

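Finally, my test scenarios:
Image 1: 300x369. With the default 1000 iterations, each iteration takes just under a second.
Image 2:
A photo from my phone, resolution 2960x5258: OutOfMemory on the first iteration.
Halved to 1480x2629: OutOfMemory on the first iteration.
Shrunk again to 740x1315: it iterates, at just under three seconds per iteration -_-
Without changing anything, the practical limit seems to be around 1024x768 or 1280x800. But using only 11 GiB is clearly a waste, and it cannot handle high-resolution images.
Thanks!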
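The runs above suggest that memory use grows with pixel count: 740x1315 (about 0.97 million pixels) fits, while 1480x2629 (about 3.9 million) does not. Below is a minimal sketch of shrinking a photo to a pixel budget while roughly keeping its aspect ratio. The helper fit_to_pixel_budget and the 1,000,000-pixel budget are hypothetical, chosen only because the budget sits just above the size that worked; neither is part of neural-style.

```python
import math


def fit_to_pixel_budget(width, height, max_pixels):
    """Return (w, h) with roughly the same aspect ratio and w * h <= max_pixels."""
    if width <= 0 or height <= 0 or max_pixels <= 0:
        raise ValueError("width, height and max_pixels must be positive")
    if width * height <= max_pixels:
        return width, height  # already within the budget
    scale = math.sqrt(float(max_pixels) / (width * height))
    # Round down, and clamp so that extreme aspect ratios still fit the budget.
    w = max(1, min(int(width * scale), max_pixels))
    h = max(1, min(int(height * scale), max_pixels // w))
    return w, h


# Sizes tried in the post, checked against the assumed 1,000,000-pixel budget.
for w, h in [(300, 369), (740, 1315), (1480, 2629), (2960, 5258)]:
    print((w, h), "->", fit_to_pixel_budget(w, h, 1000000))
```

The real budget depends on the model and the GPU, so it would need to be found by trial, as in the runs above.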

***
PS. I also asked this on Zhihu; answers there are welcome too: https://www.zhihu.com/question/61931733

PS. Here is my price comparison of Alibaba Cloud and Azure, covering only the compute-oriented offerings:
https://my.worktile.com/share/tasks/9f5b1ca2560c45dc9bd46e2cb7b4b379
Password: 1234

O(∩_∩)O Thanks