The reason I say that is this: when I run the PCIe Speed Test on my Radeon 4850, I get the following.
(http://developer.amd.com/gpu/ATISTREAMPOWERTOY/Pages/default.aspx)

===> Testing device 0 <===
Device type: RV770
Max resource 2D width/height: 8192/8192
Total GPU memory size: 512 MB
Total CPU cached space size: 508 MB
Total CPU uncached space size: 1279 MB
GPU engine clock: 665 MHz
GPU memory clock: 993 MHz
Number of timing loops: 100
[        16 bytes] CPU->GPU= 320.000 KB/sec, GPU->CPU 400.000 KB/sec
[        32 bytes] CPU->GPU= 640.000 KB/sec, GPU->CPU 800.000 KB/sec
[        64 bytes] CPU->GPU=   1.280 MB/sec, GPU->CPU   1.600 MB/sec
[       128 bytes] CPU->GPU=   2.560 MB/sec, GPU->CPU   3.200 MB/sec
[       256 bytes] CPU->GPU=   8.533 MB/sec, GPU->CPU   8.533 MB/sec
[       512 bytes] CPU->GPU=  17.067 MB/sec, GPU->CPU  17.067 MB/sec
[      1024 bytes] CPU->GPU=  34.133 MB/sec, GPU->CPU  34.133 MB/sec
[      2048 bytes] CPU->GPU=  68.267 MB/sec, GPU->CPU  68.267 MB/sec
[      4096 bytes] CPU->GPU= 136.533 MB/sec, GPU->CPU 204.800 MB/sec
[      8192 bytes] CPU->GPU= 273.067 MB/sec, GPU->CPU 409.600 MB/sec
[     16384 bytes] CPU->GPU= 546.133 MB/sec, GPU->CPU 819.200 MB/sec
[     32768 bytes] CPU->GPU=   1.638 GB/sec, GPU->CPU   1.092 GB/sec
[     65536 bytes] CPU->GPU=   2.185 GB/sec, GPU->CPU   2.185 GB/sec
[    131072 bytes] CPU->GPU=   2.621 GB/sec, GPU->CPU   2.185 GB/sec
[    262144 bytes] CPU->GPU=   2.621 GB/sec, GPU->CPU   2.016 GB/sec
[    524288 bytes] CPU->GPU=   2.621 GB/sec, GPU->CPU   2.185 GB/sec
[   1048576 bytes] CPU->GPU=   2.621 GB/sec, GPU->CPU   2.185 GB/sec
[   2097152 bytes] CPU->GPU=   2.655 GB/sec, GPU->CPU   2.208 GB/sec
[   4194304 bytes] CPU->GPU=   2.672 GB/sec, GPU->CPU   2.208 GB/sec
[   8388608 bytes] CPU->GPU=   2.663 GB/sec, GPU->CPU   2.213 GB/sec
[  16777216 bytes] CPU->GPU=   2.693 GB/sec, GPU->CPU   2.213 GB/sec
[  33554432 bytes] CPU->GPU=   2.695 GB/sec, GPU->CPU   2.213 GB/sec
[  67108864 bytes] CPU->GPU=   2.696 GB/sec, GPU->CPU   2.213 GB/sec
[ 134217728 bytes] CPU->GPU=   2.694 GB/sec, GPU->CPU   2.213 GB/sec
calResAllocLocal2D() returned an error when trying to allocate 268435456 bytes!
Peak CPU->GPU Bandwidth =   2.696 GB/sec [data size = 67108864 bytes]
Peak GPU->CPU Bandwidth =   2.213 GB/sec [data size = 8388608 bytes]

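With numbers like these, my point was that I feel it ought to do a bit better.
If it really delivered the 8 GB/s that PCIe is supposed to offer, I could say it's plenty fast even compared with the CPU, but...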
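
To put a number on "a bit better", here is a rough sketch that compares the peak values from the log with the 8 GB/s figure. The 8 GB/s is taken as the theoretical one-direction rate of a PCIe 2.0 x16 link, which is my assumption about this setup. The function names are made up for illustration, and the per-transfer time assumes the tool reports size divided by elapsed time in decimal units (the log's 1280 KB/sec = 1.280 MB/sec step suggests it does).

```python
# Assumption: "PCIe's 8 GB/s" means the theoretical one-direction rate
# of a PCIe 2.0 x16 link.
THEORETICAL_GB_PER_S = 8.0

# Peak values copied from the benchmark log.
PEAK_CPU_TO_GPU_GB_PER_S = 2.696
PEAK_GPU_TO_CPU_GB_PER_S = 2.213


def utilization(measured_gb_per_s, theoretical_gb_per_s=THEORETICAL_GB_PER_S):
    """Return measured bandwidth as a fraction of the theoretical figure."""
    return measured_gb_per_s / theoretical_gb_per_s


def implied_transfer_time_us(size_bytes, bytes_per_s):
    """Return the time per transfer in microseconds.

    Assumes the reported bandwidth is simply size / elapsed time.
    """
    return size_bytes / bytes_per_s * 1e6


if __name__ == "__main__":
    up = utilization(PEAK_CPU_TO_GPU_GB_PER_S)
    down = utilization(PEAK_GPU_TO_CPU_GB_PER_S)
    print(f"CPU->GPU peak: {up:.1%} of 8 GB/s")
    print(f"GPU->CPU peak: {down:.1%} of 8 GB/s")

    # Smallest transfer in the log: 16 bytes at 320.000 KB/sec (CPU->GPU).
    t = implied_transfer_time_us(16, 320_000)
    print(f"16-byte CPU->GPU transfer: about {t:.0f} us each")
```

So the peaks sit at roughly a third of the theoretical figure in one direction and a bit over a quarter in the other. At the small end, the reported bandwidth just scales with the transfer size, which suggests a fixed per-transfer cost dominates there rather than the link itself.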


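Also, this might come in handy for something:
http://software.intel.com/en-us/articles/increasing-memory-throughput-with-intel-streaming-simd-extensions-4-intel-sse4-streaming-load/
My first thought was, will I ever actually need an uncacheable memory region? But then it occurred to me that it might be usable for that one thing. I'll look into it later.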


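My machine is a 4850 + Q9550, which feels like it was the peak (or one step short of it) just about a year ago, yet in the year since then I don't get the sense that PCs have advanced at all.
Nothing much has changed since the 4870 or the GTX280, and Core2Quad and Nehalem aren't all that different either.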