C++ Xerces Parser: Load HTML and Search for HTML Elements


Problem description

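I am trying to use the Xerces C++ DOMDocument parser to load HTML and search for specific HTML elements. I am having trouble finding good examples of how to do this; all I seem to find is XML parsing. Can anyone help? Thanks.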


Reference solution

Method 1:

Take a look at this: http://xerces.apache.org/xerces-c/program-dom-3.html

There is an example with DOMDocument as well:

// Create a small document tree

{
    XMLCh tempStr[100];

    // Look up a DOM implementation that supports the "Range" feature
    XMLString::transcode("Range", tempStr, 99);
    DOMImplementation* impl = DOMImplementationRegistry::getDOMImplementation(tempStr);

    XMLString::transcode("root", tempStr, 99);
    DOMDocument*   doc = impl->createDocument(0, tempStr, 0);
    DOMElement*   root = doc->getDocumentElement();

    XMLString::transcode("FirstElement", tempStr, 99);
    DOMElement*   e1 = doc->createElement(tempStr);
    root->appendChild(e1);

    XMLString::transcode("SecondElement", tempStr, 99);
    DOMElement*   e2 = doc->createElement(tempStr);
    root->appendChild(e2);

    XMLString::transcode("aTextNode", tempStr, 99);
    DOMText*       textNode = doc->createTextNode(tempStr);
    e1->appendChild(textNode);

    // optionally, call release() to release the resource associated with the range after done
    DOMRange* range = doc->createRange();
    range->release();

    // removedElement is an orphaned node, optionally call release() to release associated resource
    // (removeChild() returns a DOMNode*, not a DOMElement*)
    DOMNode* removedElement = root->removeChild(e2);
    removedElement->release();

    // no need to release this returned object which is owned by implementation
    XMLString::transcode("*", tempStr, 99);
    DOMNodeList*    nodeList = doc->getElementsByTagName(tempStr);

    // done with the document, must call release() to release the entire document resources
    doc->release();
}

... and so on.

EDIT:

But how do I load HTML into the DOMDocument and search against the HTML elements? That's what I'm trying to figure out.

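XercesDOMParser parser;

// Preload the DTD grammar and validate every document against it
parser.loadGrammar("grammar.dtd", Grammar::DTDGrammarType);
parser.setValidationScheme(XercesDOMParser::Val_Always);

// Handler stands for your own ErrorHandler implementation
Handler handler;
parser.setErrorHandler(&handler);

// After parsing, the DOM tree is available through parser.getDocument()
parser.parse("xmlfile.xml");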

(by somejkuser, Leo Chapiro)

References

  1. C++ Xerces Parser Load HTML and Search for HTML Elements (CC BY-SA 2.5/3.0/4.0)

#domdocument #xerces #html #C++





