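Problem description
C++ Xerces Parser Load HTML and Search for HTML Elements
I am trying to use the Xerces C++ DOMDocument parser to load HTML and search for specific HTML elements. I am having trouble finding good examples of how to do this; all I seem to find covers parsing XML. Can anyone help? Thanks.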
Reference solution
Method 1:
Take a look at this: http://xerces.apache.org/xerces-c/program-dom-3.html
There is an example with DOMDocument as well:
    //
    // Create a small document tree
    //
    {
        XMLCh tempStr[100];

        XMLString::transcode("Range", tempStr, 99);
        DOMImplementation* impl = DOMImplementationRegistry::getDOMImplementation(tempStr);

        XMLString::transcode("root", tempStr, 99);
        DOMDocument* doc = impl->createDocument(0, tempStr, 0);
        DOMElement* root = doc->getDocumentElement();

        XMLString::transcode("FirstElement", tempStr, 99);
        DOMElement* e1 = doc->createElement(tempStr);
        root->appendChild(e1);

        XMLString::transcode("SecondElement", tempStr, 99);
        DOMElement* e2 = doc->createElement(tempStr);
        root->appendChild(e2);

        XMLString::transcode("aTextNode", tempStr, 99);
        DOMText* textNode = doc->createTextNode(tempStr);
        e1->appendChild(textNode);

        // optionally, call release() to release the resource associated with the range after done
        DOMRange* range = doc->createRange();
        range->release();

        // removedElement is an orphaned node, optionally call release() to release associated resource
        // (removeChild() returns a DOMNode*, so it is stored as one)
        DOMNode* removedElement = root->removeChild(e2);
        removedElement->release();

        // no need to release this returned object which is owned by implementation
        XMLString::transcode("*", tempStr, 99);
        DOMNodeList* nodeList = doc->getElementsByTagName(tempStr);

        // done with the document, must call release() to release the entire document resources
        doc->release();
    }
... and so on.
EDIT:
But how do I load HTML into the DOMDocument and search the HTML elements? That's what I'm trying to figure out.
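    XercesDOMParser parser;
    // Pre-load the DTD and validate every document against it
    parser.loadGrammar("grammar.dtd", Grammar::DTDGrammarType);
    parser.setValidationScheme(XercesDOMParser::Val_Always);
    // Handler is a user-defined class derived from ErrorHandler
    Handler handler;
    parser.setErrorHandler(&handler);
    // The parsed tree is then available through parser.getDocument()
    parser.parse("xmlfile.xml");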
(by somejkuser, Leo Chapiro)