最近把玩了一下 
https://github.com/anishathalye/neural-style
参考了: 
http://blog.csdn.net/v_july_v/article/details/52658965
为了跳过编译安装的坑(主要还是本地机器差。。。。。),我采用的是国际版 Azure 的虚拟机 NC6。代码和模型通过 Azure 存储的文件分享实现预先上传,然后将 SMB 共享装载到本地。简单提一下 NC 系列:
NC 系列:NVIDIA k80 GPU。双 GPU,4992 个 CUDA 核心,24GB 显存,双精度 2.91TFLOPS,单精度 8.73TFLOPS。 
NC6:6 核+56GiB 内存+340GiB 硬盘+1X K80。$0.9/小时。
最后是运行,微软的套件果然很符合傻瓜相机的思路,cd 到 /mnt/mosp 目录后,就可以直接运行:
python neural_style.py --content  ./source/WP_20170128_09_12_22_Rich.jpg  --styles ./starry-sky.jpg --output ./result/WP_20170128_09_12_22_Rich.jpg
在说我遇到的问题之前,罗列日志提示如下:
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 9909:00:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y
问题 1:上面有六个 warning,涉及 SSE3、SSE4.1、SSE4.2、AVX、AVX2、FMA 是关于 CPU 计算的,在 GPU 为主的情况下,需要启用它们吗?如果需要,那么我需要重新编译一个支持它们的 tensorflow 版本?
问题 2 (重点):Total memory 只有 11GiB 多一点,而 NC6 的内存 56GiB,显存 24GiB,都不符合 Total memory 的大小啊。我当时用 top 命令查看了可用内存,有 55GiB 左右(隔了两天了,记不太清,但是 50+是肯定的)。我需要怎么改动配置信息吗?还是直接在 neural-style 的 Python 代码中通过 config = tf.ConfigProto()改变 GPU 内存分配方式?
最后说下我测试时的场景:
图片 1:300x369,迭代的时候(默认 1000 次迭代),几乎不到一秒一次。
图片 2:
手机上的照片,分辨率 2960x5258,第一次迭代,OutOfMemory
缩小一半。1480x2629,第一次迭代,OutOfMemory
再缩小,740x1315,可以迭代了,不到三秒一次迭代 -_-
不做改变的情况下,貌似也就 1024x768 或者 1280x800 这个范围了。不过只用了 11GiB,明显太浪费了,而且无法处理高分辨率图片。
谢谢!
***
PS。知乎上也提问了,也可以在上面回答: 
https://www.zhihu.com/question/61931733
PS。附上我对阿里云和 Azure 的价格对比,只针对适合计算的部分:
https://my.worktile.com/share/tasks/9f5b1ca2560c45dc9bd46e2cb7b4b379
密码:1234
O(∩_∩)O 谢谢