boost::regex - \bb?


Problem description

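I have some old code here that uses boost::regex::perl and is poorly commented. I had wondered about one particular construct before, but since the code (more or less) worked, I was reluctant to touch it.

Now I have to touch it for technical reasons (more precisely, the current version of Boost no longer accepts the construct), so I have to figure out what it does, or rather, what it was intended to do.

The relevant part of the regex: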

(?<!(\bb\s|\bb|^[a-z]\s|^[a-z]))

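The part that gives me a headache is \bb. I know \b, but I cannot find any mention of \bb, and looking for a literal 'b' here makes no sense. Is \bb some special, under-documented feature, or do I have to assume it is a typo?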


Reference solutions

Method 1:

Boost.Regex is a regex engine for C++, and one of its syntax modes is Perl compatibility. If this is a Perl-compatible expression, then the second 'b' can only be a literal.

It's a valid expression: \b is a word boundary and the b after it is a literal, so \bb is pretty much a special case for words beginning with 'b'.
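A minimal sketch of that reading, using Python's re module as a stand-in (it is not Boost.Regex, and the two engines differ in details, but both treat \b as a word boundary). The helper name word_initial_b and the sample sentence are made up for illustration.

```python
import re

# \bb = word boundary (\b) followed by a literal 'b'.
# Python's re is a stand-in here; this is not Boost.Regex.
WORD_INITIAL_B = re.compile(r"\bb")

def word_initial_b(text):
    """Return the offsets of every 'b' that begins a word in text."""
    return [m.start() for m in WORD_INITIAL_B.finditer(text)]

# Only the 'b' in "bob" (offset 0) and in "big" (offset 12) start a word;
# the 'b' inside "bob", "grabs" and "crab" is preceded by a word character.
print(word_initial_b("bob grabs a big crab"))  # [0, 12]
```

The 'b' characters in the middle or at the end of a word are skipped because no word boundary precedes them.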

The deciding factor seems to be that this is a C++ library whose purpose is to give non-Perl environments Perl-compatible regexes. So my original thought, that Perl itself might reinterpret the expression (say, with overload::constant), does not apply. It is still worth mentioning for clarification, however inadvisable it would be to tweak an expression meaning "word beginning with 'b'".

The only caveat is this: if Boost out-performed Perl at its own expressions and somebody were using the Boost engine in a Perl environment, then all bets would be off as to whether \bb was meant as a special expression. As just one stab at it: given a grammar where '!!!' meant something special at the beginning of words, you could piggyback on the established meaning like this (NOT RECOMMENDED!):

s/\\bb\b/(?:!!!(\\p{Alpha})|\\bb)/

This would be a dumb thing to do, but since we are dealing with code that seems unfit for its task, it is worth remembering that there are thousands of ways to fail at a task.

Method 2:

(\bb\s|\bb|^[a-z]\s|^[a-z]) matches a b if it's not preceded by another word character, or any lowercase letter if it's at the beginning of the string. In either case, the letter may be followed by a whitespace character. (It could match uppercase letters too if case-insensitive mode is set, and the ^ could also match the beginning of a line if multiline mode is set.)

But inside a lookbehind, that shouldn't even have compiled. In some flavors, a lookbehind can contain multiple alternatives with different, fixed lengths, but the alternation has to be at the top level in the lookbehind. That is, (?<=abc|xy|12345) will work, but (?<=(abc|xy|12345)) won't. So your regex wouldn't work even in those flavors, and Boost's docs just say the lookbehind expression has to be fixed-length.

If you really need to account for all four of the possibilities matched by that regex, I suggest you split the lookbehind into two:

(?<!\bb|^[a-z])(?<!(?:\bb|^[a-z])\s)
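The effect of the split can be checked with a small sketch. It again uses Python's re module as a stand-in, which happens to enforce a similar fixed-width rule for lookbehind; it says nothing definitive about which Boost versions accept which form. The helper names compiles and allowed, and the literal X used as the text following the lookbehind, are illustrative assumptions.

```python
import re

# Patterns from the question and from the answer above (ASCII hyphens).
ORIGINAL = r"(?<!(\bb\s|\bb|^[a-z]\s|^[a-z]))"
SPLIT = r"(?<!\bb|^[a-z])(?<!(?:\bb|^[a-z])\s)"

def compiles(pattern):
    """Report whether Python's re (a stand-in, not Boost) accepts pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

def allowed(text):
    """Offsets of 'X' that the split lookbehind does not reject."""
    return [m.start() for m in re.finditer(SPLIT + "X", text)]

# The original mixes 1- and 2-character alternatives inside one lookbehind,
# so Python rejects it; each half of the split version has a single width.
print(compiles(ORIGINAL), compiles(SPLIT))  # False True

# 'X' is rejected after a word-initial 'b' or a leading lowercase letter,
# with or without one whitespace character in between.
for sample in ["bX", "b X", "aX", "a X", "abX", "cb X"]:
    print(repr(sample), allowed(sample))
```

In this sketch only "abX" and "cb X" yield a match, because there the b is preceded by another word character and the lowercase letter is not a lone prefix at the start of the string.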

(by DevSolar, Axeman, Alan Moore)

References

  1. boost::regex - \bb? (CC BY-SA 3.0/4.0)

#perl #RegEx #boost #C++





