問題描述
Array of Structs selalu lebih cepat daripada Structs of arrays? (Array of Structs are always faster than Structs of arrays?)
I was wondering if the data layout Structs of Arrays
(SoA) is always faster than an Array of Structs
(AoS) or Array of Pointers
(AoP) for problems with inputs that only fits in RAM
programmed in C/JAVA
.
Some days ago I was improving the performance of a Molecular Dynamic algorithm (in C), summarizing in this algorithm it is calculated the force interaction among particles based on their force and position.
Original the particles were represented by a struct containing 9 different doubles, 3 for particles forces (Fx,Fy,Fz) , 3 for positions and 3 for velocity. The algorithm had an array containing pointers to all the particles (AoP). I decided to change the layout from AoP to SoA to improve the cache use.
So, now I have a Struct with 9 array where each array stores forces, velocity and positions (x,y,z) of each particle. Each particle is accessed by it own array index.
I had a gain in performance (for an input that only fits in RAM) of about 1.9x, so I was wondering if typically changing from AoP or AoS to SoA it will always performance better, and if not in which types of algorithms this do not occurs.
‑‑‑‑‑
參考解法
方法 1:
Much depends of how useful all fields are. If you have a data structure where using one fields means you are likely to use all of them, then an array of struct is more efficient as it keeps together all the things you are likely to need.
Say you have time series data where you only need a small selection of the possible fields you have. You might have all sorts of data about an event or point in time, but you only need say 3‑5 of them. In this case a structure of arrays is more efficient because a) you don't need to cache the fields you don't use b) you often access values in order i.e. caching a field, its next value and its next is useful.
For this reason, time‑series information is often stored as a collection of columns.
方法 2:
This will depend on how exactly you access the data. Try to imagine, what exactly happens in the hardware when you access your data, in either SoA or AoS.
To reason about your question, you must consider following things ‑
- If the cache is absent, the performance should be the same, assuming that memory access latency is equal for all the elements of the data.
- Now with the cache, if you access consecutive address locations, definitely you will get performance improvement. This is exactly valid in your case. When you have AoS, The locations are not consecutive in the memory, so you must lose some performance there.
- You must be accessing in for loops your data like
for(int i=0;i<1000000;i++) Fx[i] = 0
. So if the data is huge in quantity, you will easily see the small performance benefits. If your data was small, this would not matter much. - Finally, you also don't know about the DRAM that you are using. It will have some benefits when you access consecutive data. For example to understand why it is like that you can refer to wiki.
(by dreamcrash、Peter Lawrey、Raj)