Problem description
If condition is true, add an expression to a loop
I am looking for the fastest and most elegant way to do the following:
const bool condition1;
(...)
if (condition1)
{
    // add an expression to a loop below
}
while (condition2)
{
    (...)
    // expression to be executed if condition1 = true
}
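I could of course check the condition on every iteration of the loop, but condition1 is constant, so that would not be the most efficient way.
Thanks in advance!
Reference solutions
Method 1:
Just put the branch in the loop, like this: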
while (condition2)
{
    if (condition1)
    {
        // run expression in the loop
    }
}
CPUs have a branch predictor, and if your "branch" always goes the same way because the condition is constant, the predictor will figure that out and the check will cost you next to nothing during your code's execution.
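To make that concrete, here is a minimal sketch of the "just branch in the loop" version. The function name `sum_values`, the `add_bonus` flag (standing in for condition1) and the "+1 per element" expression are all made up for illustration; they are not from the question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: sums the values and, when `add_bonus` is true,
// also adds 1 per element inside the same loop.
long long sum_values(const std::vector<int>& values, const bool add_bonus)
{
    long long total = 0;
    std::size_t i = 0;
    while (i < values.size())   // condition2
    {
        total += values[i];
        if (add_bonus)          // loop-invariant: same outcome on every iteration
        {
            total += 1;         // expression executed only if condition1 is true
        }
        ++i;
    }
    return total;
}
```

With optimizations enabled, a compiler may also hoist such an invariant check out of the loop by itself (loop unswitching), but that is not guaranteed, so measure before relying on it.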
Method 2:
Often you can rely on your compiler's optimizer to solve this problem for you. But not always.
So the first step is, write simple code, profile it, and see if your code is already fast enough; if the bottleneck is elsewhere, stop optimizing. Optimization effort is fungible; it can be spent on making the important code faster, and until you profile you don't know what is important.
Then use high-yield optimization techniques, like enabling vectorization, parallelizing, and using someone else's library. That can earn you 2x, 30x or even 3000x speedups for really low effort. (We just replaced a manual bit of pixel fiddling with a library call, with ~50 lines changed, and identical input/output went from 2 seconds to 50 ms, a 40x speedup. The pixel fiddling wasn't bad, it was just not good.)
Eventually work down to what I am going to show you next; I would only go this far for stuff like per‑frame per‑pixel operations, code you want to run on the order of 50 million times per second or more.
// std::integral_constant lives in <type_traits>
auto loop = [&](auto condition1) {
    while (condition2) {
        // code
        if (condition1) {
            // code
        }
    }
};
if (condition1) {
    loop(std::integral_constant<bool, true>{});
} else {
    loop(std::integral_constant<bool, false>{});
}
What I just did was force the branch to be pre-calculated.
The trick here is that the auto condition1 variable is an integral constant, a stateless variable with a constexpr operator bool. This gets every compiler to dead code eliminate the unreachable branch, sometimes even in debug mode.
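For reference, a self-contained sketch of the same dispatch that compiles as written (C++14 or later). The function `count_items`, its `count_evens_twice` flag and the loop body are hypothetical stand-ins; only the `std::true_type` / `std::false_type` pattern (the standard aliases for `std::integral_constant<bool, true/false>`) mirrors the snippet above.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical example: counts every element, and counts even elements a
// second time when `count_evens_twice` (playing the role of condition1) is true.
int count_items(const std::vector<int>& values, const bool count_evens_twice)
{
    // `flag` is a std::integral_constant<bool, B>; its value is part of the type.
    auto loop = [&](auto flag) {
        int count = 0;
        for (const int v : values) {
            ++count;
            if (flag) {   // converts through a constexpr operator bool
                if (v % 2 == 0) ++count;
            }
        }
        return count;
    };

    // One runtime check, then a loop body instantiated for a fixed value.
    if (count_evens_twice) {
        return loop(std::true_type{});
    }
    return loop(std::false_type{});
}
```

Calling `count_items(data, flag)` behaves the same as the plain in-loop check; the only difference is that each instantiation of the lambda sees a compile-time constant.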
We can then easily profile this against the raw version:
loop(condition1);
(If you cannot profile the speed difference, you shouldn't be doing this.)
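One rough way to set up such a comparison is sketched below. Everything here is an assumption made for illustration: `run_loop` is a dummy workload, `time_ms` is a hypothetical helper built on `std::chrono::steady_clock`, and a single timed run like this is far less reliable than a proper benchmarking tool.

```cpp
#include <chrono>
#include <cstdint>
#include <type_traits>

// Hypothetical workload standing in for the real loop. `Flag` may be a plain
// bool (the raw version) or std::true_type / std::false_type.
template <typename Flag>
std::uint64_t run_loop(const std::uint64_t iterations, const Flag flag)
{
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        acc += i;
        if (flag) {
            acc ^= i << 1;
        }
    }
    return acc;
}

// Runs `fn` once and returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double time_ms(Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To compare, time `run_loop(n, condition1)` against `run_loop(n, std::true_type{})` with the same `n`, store each result in a `volatile` variable so the work is not optimized away, build with optimizations on, and repeat the runs several times before trusting the numbers.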
You can also use non-type template parameters; a classic way to make per-pixel operation code clean and DRY and efficient is to hoist things like "is premultiplied" or "entirely opaque" to template parameter bools.
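A small sketch of what hoisting "entirely opaque" into a template parameter could look like. The names `blend_row` and `blend_row_dispatch`, the single-channel 8-bit layout and the rounding formula are assumptions chosen for the example, not code from any particular library.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical single-channel blend. `Opaque` is a template parameter, so the
// opaque instantiation contains no alpha arithmetic at all.
template <bool Opaque>
void blend_row(const std::uint8_t* src, std::uint8_t* dst,
               const std::size_t count, const std::uint8_t alpha)
{
    for (std::size_t i = 0; i < count; ++i) {
        if (Opaque) {   // compile-time constant in each instantiation
            dst[i] = src[i];
        } else {
            // Rounded 8-bit "source over" blend: (s*a + d*(255-a)) / 255.
            const unsigned blended = src[i] * alpha + dst[i] * (255u - alpha) + 127u;
            dst[i] = static_cast<std::uint8_t>(blended / 255u);
        }
    }
}

// Single runtime decision, made once per row instead of once per pixel.
void blend_row_dispatch(const std::uint8_t* src, std::uint8_t* dst,
                        const std::size_t count, const std::uint8_t alpha)
{
    if (alpha == 255) {
        blend_row<true>(src, dst, count, alpha);
    } else {
        blend_row<false>(src, dst, count, alpha);
    }
}
```

The loop body is written once, and each flag combination gets its own specialized copy, which is what keeps this kind of code DRY.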
Method 3:
If you want to be sure that there is no branch in the code, you can use a template:
template<bool condition1>
void foo()
{
    while (condition2)
    {
        (...)
        if constexpr (condition1)
        {
            // run expression in the loop
        }
    }
}
// in the original function
if (condition1)
{
    foo<true>();
} else
{
    foo<false>();
}
Or, you can do this:
void loopBody()
{
    (...)
}
// in the original function
if (condition1)
{
    while (condition2)
    {
        loopBody();
        // expression to be executed if condition1 = true
    }
} else
{
    while (condition2)
    {
        loopBody();
    }
}
Also consider whether the order of the expressions matters. If the order doesn't matter, then it may be simpler (and possibly more efficient) to have separate loops:
while (condition2)
{
    (...)
}
if (condition1)
{
    while (similar_condition)
    {
        // expression to be executed if condition1 = true
    }
}
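As an illustration of the separate-loops variant, here is a sketch with two independent results, so the order of the two expressions cannot matter. The `Stats` struct, the `collect` function and the `count_negatives` flag (standing in for condition1) are hypothetical.

```cpp
#include <vector>

struct Stats {
    long long sum;
    int negatives;
};

// Hypothetical example: the main loop always computes the sum; a second pass
// counts negative values only when `count_negatives` is true.
Stats collect(const std::vector<int>& values, const bool count_negatives)
{
    Stats stats{0, 0};
    for (const int v : values) {       // main loop, no condition1 check inside
        stats.sum += v;
    }
    if (count_negatives) {             // condition1 checked exactly once
        for (const int v : values) {   // second pass, only when needed
            if (v < 0) ++stats.negatives;
        }
    }
    return stats;
}
```

Keep in mind that the second pass walks the data again, so for large inputs the extra memory traffic can outweigh the saved branch; this is another case where profiling decides.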
(by user15039632, NathanOliver, Yakk ‑ Adam Nevraumont, eerorika)