PHP DOMdocument echoing problem


Problem description

$content = '<!--<sup><span style="font-weight:bold;color:black;">0</span></sup><br/>-->
    <div class="popular-video-image">
        <a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>">
            <img src="/images/topvideo/1.jpg" alt=""/>
        </a>
        <span class="popular-video-artist ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Far East Movement</a></span>
        <span class="popular-video-title ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Like a G6</a></span>
    </div>';

    $dom = new DOMDocument;
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($content);
    foreach ($dom->getElementsByTagName('a') as $node)
    {
        $node->setAttribute('href', 'http://mysite.ru/' . $node->getAttribute('href'));
    }
    $dom->formatOutput = true;

    echo $dom->saveXml($dom->documentElement);

Output:

<html>
  <body>
    <div class="popular-video-image">&#13;
        <a href="http://mysite.ru/video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="&lt;lang video_go_to=Far East Movement - Like a G6&gt;">&#13;
            <img src="/images/topvideo/1.jpg" alt=""/></a>&#13;
        <span class="popular-video-artist ellipsis"><a href="http://mysite.ru/video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="&lt;lang video_go_to=Far East Movement - Like a G6&gt;" class="ellipsis">Far East Movement</a></span>&#13;
        <span class="popular-video-title ellipsis"><a href="http://mysite.ru/video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="&lt;lang video_go_to=Far East Movement - Like a G6&gt;" class="ellipsis">Like a G6</a></span>&#13;
    </div>

  </body>
</html>

I do not want the html and body tags to be added. I also do not want <lang> to be replaced with &lt;lang&gt;, and the &#13; is not needed either.

I want to get back the same content that went in, with only the links modified.

Sorry for my bad English!


## Reference solutions

#### Method 1:

You are seeing &#13; at the end of each line because your HTML has Windows-style line endings (CR+LF). To get rid of them, convert them to Unix-style line endings (LF) before you feed the content into DOMDocument:

$content = preg_replace('/\r\n/', "\n", $content);
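As a minimal sketch of that normalization step in context: the sample markup below is hypothetical and much shorter than the question's content, and `str_replace` is used here as a plain-string alternative to the regular expression above.

```php
<?php
// Hypothetical sample input that uses Windows-style (CR+LF) line endings.
$content = "<div>\r\n<a href=\"video/1/\">One</a>\r\n</div>";

// Convert CR+LF to LF before parsing, so no carriage returns reach the DOM.
$normalized = str_replace("\r\n", "\n", $content);

$dom = new DOMDocument;
$dom->loadHTML($normalized);

// Serialize the parsed document; no carriage returns remain to be escaped.
$out = $dom->saveXML($dom->documentElement);
echo $out;
```

Because the carriage returns are removed before parsing, the serializer has nothing left to escape as &#13;.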

#### Method 2:

saveXml takes an optional parameter to allow you to specify the node to output.

$dom->saveXml($dom->documentElement->firstChild->firstChild);

This will remove the html and body tags from the output.
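The call above serializes only the first child of body. If body may hold several top-level nodes, one possible variation is to loop over all of its children. This is a sketch on hypothetical sample markup, not the answerer's code:

```php
<?php
// Hypothetical sample fragment with two top-level elements.
$content = '<div class="a"><a href="video/1/">One</a></div><p>Two</p>';

$dom = new DOMDocument;
$dom->loadHTML($content);

// loadHTML wraps the fragment in <html><body>; locate the body element.
$body = $dom->getElementsByTagName('body')->item(0);

// Serialize each child of body separately so the wrappers are never emitted.
$out = '';
foreach ($body->childNodes as $child) {
    $out .= $dom->saveXML($child);
}
echo $out;
```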

#### Method 3:

I guess that the <html> and <body> tags get placed in because you are using loadHTML. Try using loadXML instead.

As for &lt;lang&gt;, it has to be escaped because a literal < inside an attribute value would make the resulting XML ill-formed. If it is causing you problems, you should change your approach a little and work with it, not against it.
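A minimal sketch of the loadXML suggestion, assuming the input is well-formed XML with a single root element. The sample markup is hypothetical: the question's own content has a literal < inside its title attributes, which an XML parser would reject, so it would need cleaning up before this approach could apply.

```php
<?php
// Hypothetical well-formed fragment: one root element, no raw "<" in attributes.
$content = '<div class="popular-video-image"><a href="video/1/" title="Like a G6">Like a G6</a></div>';

$dom = new DOMDocument;
$dom->loadXML($content);

// Prefix every link with the site's base URL, as in the question.
foreach ($dom->getElementsByTagName('a') as $node) {
    $node->setAttribute('href', 'http://mysite.ru/' . $node->getAttribute('href'));
}

// Passing the root node to saveXML omits the XML declaration.
$out = $dom->saveXML($dom->documentElement);
echo $out;
```

Since loadXML does not build an HTML document, no html or body wrappers are added in the first place.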

#### Method 4:

<?php
    $content = '<!--<sup><span style="font-weight:bold;color:black;">0</span></sup><br/>-->
    <div class="popular-video-image">
        <a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>">
            <img src="/images/topvideo/1.jpg" alt=""/>
        </a>
        <span class="popular-video-artist ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Far East Movement</a></span>
        <span class="popular-video-title ellipsis"><a href="video/Far+East+Movement - Like+a+G6/w4s6H4ku6ZY/" title="<lang video_go_to=Far East Movement - Like a G6>" class="ellipsis">Like a G6</a></span>
    </div>';

    $dom = new DOMDocument;
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($content);
    // Prefix every link with the site's base URL.
    foreach ($dom->getElementsByTagName('a') as $node)
    {
        $node->setAttribute('href', 'http://mysite.ru/' . $node->getAttribute('href'));
    }
    $dom->formatOutput = true;

    // Strip the wrapper tags, remove doubled newlines, and undo the &lt;/&gt; escaping.
    // Note: the last two replacements apply to every &lt; and &gt; in the output.
    $html = str_replace(
        array('<html>', '</html>', '<body>', '</body>', "\n\n", '&lt;', '&gt;'),
        array('', '', '', '', '', '<', '>'),
        $dom->saveHTML()
    );

    // Drop the DOCTYPE line that saveHTML() emits at the start.
    echo preg_replace('#^<!DOCTYPE.+?>#', '', $html);
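
The string replacements above act on the whole serialized document. A narrower alternative, offered only as a sketch, is to serialize just the children of body with `saveHTML($node)`, so the DOCTYPE and wrapper tags are never produced. The helper name `rewriteLinks` and the sample markup are hypothetical, and this sketch does not attempt to undo the escaping of < inside attribute values.

```php
<?php
// Hypothetical helper: prefixes each link's href with $base and returns
// the fragment without DOCTYPE, <html> or <body> wrappers.
function rewriteLinks(string $html, string $base): string
{
    $dom = new DOMDocument;
    // Normalize CR+LF to LF first so no carriage returns reach the DOM.
    $dom->loadHTML(str_replace("\r\n", "\n", $html));

    foreach ($dom->getElementsByTagName('a') as $node) {
        $node->setAttribute('href', $base . $node->getAttribute('href'));
    }

    // Serialize only the children of body, one node at a time.
    $out = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Hypothetical sample input.
$result = rewriteLinks("<div>\r\n<a href=\"video/1/\">One</a>\r\n</div>", 'http://mysite.ru/');
echo $result;
```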

(by Isis, mattalxndr, Stephen Curran, Jon, Isis)

References

  1. PHP DOMdocument echoing problem (CC BY-SA 3.0/4.0)
