Problem description
boost::regex - \bb?
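I have some old code here that uses boost::regex::perl and is very poorly commented. I had wondered about one particular construct before, but since the code (more or less) worked, I was reluctant to touch it.
Now I have to touch it for technical reasons (more precisely, current versions of Boost no longer accept the construct), so I have to figure out what it does, or rather, what it was intended to do.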
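The relevant part of the regex:
(?<!(\bb\s|\bb|^[a-z]\s|^[a-z]))
The part that gives me a headache is \bb. I know \b, but I can't find any mention of \bb, and looking for a literal 'b' here wouldn't make sense. Is \bb some special, under-documented feature, or do I have to consider this a typo?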
Reference solutions
Method 1:
Boost.Regex is a regex library for C++, and one of its compatibility modes is Perl compatibility. If that is a "Perl-compatible" expression, then the second 'b' can only be a literal.
It's a valid expression, pretty much a special case for words beginning with 'b'.
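To make that reading concrete, here is a minimal sketch of \bb as "word boundary, then a literal 'b'". It uses std::regex with its default ECMAScript grammar as a stand-in for Boost.Regex, on the assumption that \b means the same thing in both; the helper name has_word_initial_b is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` contains a 'b' that starts a word.
// std::regex (ECMAScript grammar) is used here instead of Boost.Regex;
// the pattern is \b (word boundary) followed by a literal 'b'.
bool has_word_initial_b(const std::string& text) {
    static const std::regex word_initial_b(R"(\bb)");
    return std::regex_search(text, word_initial_b);
}
```

Under this reading, "bat" and "a bat" contain a match, while "cab" and "abba" do not, because every 'b' in them is preceded by another word character.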
What seems decisive is that this is a C++ library whose purpose is to give non-Perl environments Perl-compatible regexes. So my original thought, that Perl itself might reinterpret the expression (say, with overload::constant), does not apply. It is still worth mentioning for clarification, however inadvisable it would be to tweak an expression meaning "word beginning with 'b'".
The only caveat is this: if Boost outperformed Perl at its own expressions and somebody were using the Boost engine in a Perl environment, then all bets are off as to whether it could have been meant as a special expression. This is just one stab: given a grammar where '!!!' meant something special at the beginning of words, you could piggyback on the established meaning like this (NOT RECOMMENDED!):
s/\\bb\b/(?:!!!(\\p{Alpha})|\\bb)/
This would be something dumb to do, but as we are dealing with code that seems unfit for its task, there are thousands of ways to fail at a task.
Method 2:
(\bb\s|\bb|^[a-z]\s|^[a-z]) matches a b if it's not preceded by another word character, or any lowercase letter if it's at the beginning of the string. In either case, the letter may be followed by a whitespace character. (It could match uppercase letters too if case-insensitive mode is set, and the ^ could also match the beginning of a line if multiline mode is set.)
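The group can consume either one or two characters, which matters once it is placed inside a lookbehind. A small sketch of that, again assuming std::regex (ECMAScript grammar) as a stand-in for Boost.Regex and using a made-up helper name, checks which whole strings the alternation accepts:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the alternation from the question matches
// the whole of `text`. std::regex is a stand-in for Boost.Regex here.
bool group_matches_exactly(const std::string& text) {
    static const std::regex group(R"(\bb\s|\bb|^[a-z]\s|^[a-z])");
    return std::regex_match(text, group);
}
```

Both the one-character string "b" and the two-character string "b " are accepted in full, so the group as a whole does not have a single fixed length.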
But inside a lookbehind, that shouldn't even have compiled. In some flavors, a lookbehind can contain multiple alternatives with different, fixed lengths, but the alternation has to be at the top level in the lookbehind. That is, (?<=abc|xy|12345) will work, but (?<=(abc|xy|12345)) won't. So your regex wouldn't work even in those flavors, but Boost's docs just say the lookbehind expression has to be fixed-length.
If you really need to account for all four of the possibilities matched by that regex, I suggest you split the lookbehind into two:
(?<!\bb|^[a-z])(?<!(?:\bb|^[a-z])\s)
(by DevSolar, Axeman, Alan Moore)