您的位置：首页 > 其它

Array of Structures vs Structure of Arrays

2013-08-14 10:54 309 查看

1. 概念

Array of structures 结果是一个array，类型是一个structure！ Structureof Arrays结果是一个struture，类型是arrays！

2. 原文

Array of Structures vs Structure of Arrays

I was trying out a test today on arranging vertex data as an Array of Structures or as a Structure of Arrays:

The array of structure format is the way most applications lay out data.

e.g. A triangle with vertices A, B, C, color and normals can be represented as:

Array of Structures (AOS)

struct Triangle {
float A[3];
float B[3];
float C[3];
float RGBA[4];
float Normal[3];
} // one float is four bytes, so this is 64 byes per triangle

but in the Structure of Arrays (SOA) format it would look like:

struct Triangles {
float *Ax;
float *Ay;
float *Az;
float *RGB;
float* Normal;
}

So if you look at the memory layout in RAM, the AOS layout would have all the triangles together, whereas in the SOA layout you would have all the vertex data together in
RAM. So in theory, the SOA layout would be better performing because when you access the vertex data, you'd get more cache hits.

Let us consider the simple operation of computing the normal of the triangle. This involves two vector subtracts (AB = B-A and AC = C-A) and one cross product of AB cross AC. The number of adds, subtracts and multiplies are the same on AOS and SOA, its just the layout that changes.

So lets do some theoretical analysis before we write some code and test (always test the final thing, theory is nice but you need ground truth).

I have an Apple Macbook Pro 15" with an Intel Core Duo processor running at 2 GHZ. According to the Intel product brief it comes with a 32 KB instruction cache, a 32 KB data cache, a 2 MB L2 cache and then 1 GB of RAM.

Assuming a cache line size of 64 bytes (it changes from processor to processor), the AOS algorithm would access B[0] first, which is a cache miss, causing the processor to fetch in the entire Triangle of 64 bytes into the L1 cache. The rest of the operations would access data in the L1 cache, which is fast. Then, it goes on to the next triangle which is 64 bytes beyond the current one and hence in a different cache line, which is again a cache miss. So in theory in the AOS format we'd be having cache misses on every triangle the first time you read vertex B. This is assuming the compiler doesn't put in speculative pre-fetching for the next triangle.

On the other hand the SOA layout would incur a cache miss for the first triangle when it reads the vertex data Ax, Ay, Az, Bx, By etc. However, since a float is four bytes and the cache line model we use is 64 bytes, each cache line would have 16 floats and hence 16 triangles worth of data. This means that for the first triangle you get a cache miss and for the other 15 you'd get cache hits. So in theory the SOA layout should be a lot faster than the AOS layout.

So writing some code to perform the normal computation for some triangles I discovered the following:

Number of Triangles:	1000	10000	100000
Array of Structures Time (secs):	0.00001	0.00016	0.00476
Structure of Arrays Time (secs):	0.00003	0.00020	0.00257

So we see that AOS is faster for when all the data fits in the data cache but becomes slower when it does not. At 64 bytes per triangle 32,768 triangles fit inside the L2 cache. I ran my tests again and indeed for 32768 triangles and below the AOS is faster but when the data does not fit into the L2 cache, the SOA code is faster.

原文连接：http://hectorgon.blogspot.com/2006/08/array-of-structures-vs-structure-of.html

这篇文章的需求是：要计算三角形法向量。AOS结构中从A、B、C数组取数据计算结果存与normal数组中；SOA中从Ax、Ay、Az数组中读取数据存入Normal数组中。

结论：当数据大小适合数据缓存时，AOS要更快；反之则AOS要慢！结合上面表格数据，AOS中每个三角形64bytes，L2缓存正好能存放32768个三角形，当三角形个数小于等于32768阈值时，AOS要明显快于SOA。当数据超过32768时，SOA要快！

参考：

1. http://coolshell.cn/articles/10249.html

内容来自用户分享和网络整理，不保证内容的准确性，如有侵权内容，可联系管理员处理

标签： AOS SOA L2缓存

相关文章推荐

新的分享

章节导航