<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.trinitydesktop.net/index.php?action=history&amp;feed=atom&amp;title=HTML_parsing_methods</id>
	<title>HTML parsing methods - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.trinitydesktop.net/index.php?action=history&amp;feed=atom&amp;title=HTML_parsing_methods"/>
	<link rel="alternate" type="text/html" href="https://wiki.trinitydesktop.net/index.php?title=HTML_parsing_methods&amp;action=history"/>
	<updated>2026-05-24T23:46:34Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.35.13</generator>
	<entry>
		<id>https://wiki.trinitydesktop.net/index.php?title=HTML_parsing_methods&amp;diff=1690&amp;oldid=prev</id>
		<title>Blu256: +KDE3</title>
		<link rel="alternate" type="text/html" href="https://wiki.trinitydesktop.net/index.php?title=HTML_parsing_methods&amp;diff=1690&amp;oldid=prev"/>
		<updated>2021-08-22T21:01:01Z</updated>

		<summary type="html">&lt;p&gt;+KDE3&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left diff-editfont-monospace&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 21:01, 22 August 2021&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 2:&lt;/td&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 2:&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;[[Category:KDE3]]&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;[[Category:KDE3]]&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;[[Category:Tutorials]]&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;[[Category:Tutorials]]&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td colspan=&quot;2&quot; class=&quot;diff-empty diff-side-deleted&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-addedline diff-side-added&quot;&gt;&lt;div&gt;{{KDE3}}&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;br /&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;br /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-deleted&quot;&gt;&lt;div&gt;For HTML parsing, you have the following possibilities:&lt;/div&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;
  &lt;td class=&quot;diff-context diff-side-added&quot;&gt;&lt;div&gt;For HTML parsing, you have the following possibilities:&lt;/div&gt;&lt;/td&gt;
&lt;/tr&gt;

&lt;!-- diff cache key mwdb:diff:wikidiff2:1.12:old-298:rev-1690:1.13.0 --&gt;
&lt;/table&gt;</summary>
		<author><name>Blu256</name></author>
	</entry>
	<entry>
		<id>https://wiki.trinitydesktop.net/index.php?title=HTML_parsing_methods&amp;diff=298&amp;oldid=prev</id>
		<title>imported&gt;Eliddell: Created page with &quot;Category:Developers Category:KDE3 Category:Tutorials  For HTML parsing, you have the following possibilities: * QXML * QDOM * Perl * XHTML Obviously, QXML and QDOM...&quot;</title>
		<link rel="alternate" type="text/html" href="https://wiki.trinitydesktop.net/index.php?title=HTML_parsing_methods&amp;diff=298&amp;oldid=prev"/>
		<updated>2014-05-26T21:33:29Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;&lt;a href=&quot;/Category:Developers&quot; title=&quot;Category:Developers&quot;&gt;Category:Developers&lt;/a&gt; &lt;a href=&quot;/Category:KDE3&quot; title=&quot;Category:KDE3&quot;&gt;Category:KDE3&lt;/a&gt; &lt;a href=&quot;/Category:Tutorials&quot; title=&quot;Category:Tutorials&quot;&gt;Category:Tutorials&lt;/a&gt;  For HTML parsing, you have the following possibilities: * QXML * QDOM * Perl * XHTML Obviously, QXML and QDOM...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;[[Category:Developers]]&lt;br /&gt;
[[Category:KDE3]]&lt;br /&gt;
[[Category:Tutorials]]&lt;br /&gt;
&lt;br /&gt;
For HTML parsing, you have the following possibilities:&lt;br /&gt;
* QXML&lt;br /&gt;
* QDOM&lt;br /&gt;
* Perl&lt;br /&gt;
* XHTML&lt;br /&gt;
QXML and QDOM both require XML-compliant input, and very few HTML pages are XML-compliant. Perl is outside the scope of this site. This tutorial takes the XHTML approach.&lt;br /&gt;
&lt;br /&gt;
=First step=&lt;br /&gt;
As noted in http://developernew.kde.org/Development/Tutorials/Programming_Tutorial_KDE_4/How_to_write_an_HTML_parser, the main challenge is parsing syntax that is not XML-conformant. The following program handles such input.&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;tags.cpp&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;cpp-qt&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;kapplication.h&amp;gt;&lt;br /&gt;
#include &amp;lt;kaboutdata.h&amp;gt;&lt;br /&gt;
#include &amp;lt;kcmdlineargs.h&amp;gt;&lt;br /&gt;
#include &amp;lt;dom/html_document.h&amp;gt;&lt;br /&gt;
#include &amp;lt;kdebug.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main (int argc, char *argv[])&lt;br /&gt;
{&lt;br /&gt;
        KAboutData aboutData( &amp;quot;test&amp;quot;, &amp;quot;test&amp;quot;,&lt;br /&gt;
        &amp;quot;1.0&amp;quot;, &amp;quot;test&amp;quot;, KAboutData::License_GPL,&lt;br /&gt;
        &amp;quot;(c) 2006&amp;quot; );&lt;br /&gt;
        KCmdLineArgs::init( argc, argv, &amp;amp;aboutData );&lt;br /&gt;
        KApplication khello;&lt;br /&gt;
&lt;br /&gt;
        DOM::HTMLDocument doc;&lt;br /&gt;
        DOM::DOMString tag(&amp;quot;*&amp;quot;);&lt;br /&gt;
        DOM::DOMString uri(&amp;quot;&amp;lt;html&amp;gt;&amp;lt;body&amp;gt;&amp;lt;a href=\&amp;quot;http://www.kde.org/\&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;a href=\&amp;quot;/index.php\&amp;quot; nowrap&amp;gt;Log in&amp;lt;/a&amp;gt;&amp;lt;a href=\&amp;quot;http://www.gmx.de\&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;&amp;quot;);&lt;br /&gt;
&lt;br /&gt;
        doc.loadXML(uri);&lt;br /&gt;
        kdDebug() &amp;lt;&amp;lt; &amp;quot;Does this doc have child elements ? &amp;quot; &amp;lt;&amp;lt; doc.hasChildNodes() &amp;lt;&amp;lt; endl;&lt;br /&gt;
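        // Query the element list once rather than on every loop iteration.&lt;br /&gt;
        DOM::NodeList elements = doc.getElementsByTagName(tag);&lt;br /&gt;
        for (unsigned long i=0; i&amp;lt;elements.length(); i++) kdDebug() &amp;lt;&amp;lt; elements.item(i).nodeName().string() &amp;lt;&amp;lt; endl;&lt;br /&gt;
        // Note: sizeof yields the size of the DOM::Node handle object, not the size of the document.&lt;br /&gt;
        kdDebug() &amp;lt;&amp;lt; &amp;quot;Size of a node handle &amp;quot; &amp;lt;&amp;lt; sizeof(doc.firstChild()) &amp;lt;&amp;lt; endl;&lt;br /&gt;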
        kdDebug() &amp;lt;&amp;lt; doc.isHTMLDocument() &amp;lt;&amp;lt; endl;&lt;br /&gt;
        kdDebug() &amp;lt;&amp;lt; doc.toString().string() &amp;lt;&amp;lt; endl;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Compile it like this:&lt;br /&gt;
 g++ -I/usr/lib/qt3/include -I/opt/kde3/include \&lt;br /&gt;
 -L/opt/kde3/lib -lkdeui -lkhtml -o tags tags.cpp&lt;br /&gt;
&lt;br /&gt;
=Second step=&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;cpp-qt&amp;quot;&amp;gt;&lt;br /&gt;
#include &amp;lt;kapplication.h&amp;gt;&lt;br /&gt;
#include &amp;lt;kaboutdata.h&amp;gt;&lt;br /&gt;
#include &amp;lt;kcmdlineargs.h&amp;gt;&lt;br /&gt;
#include &amp;lt;dom/html_document.h&amp;gt;&lt;br /&gt;
#include &amp;lt;dom/html_element.h&amp;gt;&lt;br /&gt;
#include &amp;lt;dom/dom_node.h&amp;gt;&lt;br /&gt;
#include &amp;lt;kdebug.h&amp;gt;&lt;br /&gt;
&lt;br /&gt;
int main (int argc, char *argv[])&lt;br /&gt;
{&lt;br /&gt;
        KAboutData aboutData( &amp;quot;test&amp;quot;, &amp;quot;test&amp;quot;,&lt;br /&gt;
        &amp;quot;1.0&amp;quot;, &amp;quot;test&amp;quot;, KAboutData::License_GPL,&lt;br /&gt;
        &amp;quot;(c) 2006&amp;quot; );&lt;br /&gt;
        KCmdLineArgs::init( argc, argv, &amp;amp;aboutData );&lt;br /&gt;
        KApplication khello;&lt;br /&gt;
&lt;br /&gt;
        DOM::HTMLDocument doc;&lt;br /&gt;
        DOM::DOMString tag(&amp;quot;*&amp;quot;);&lt;br /&gt;
        DOM::DOMString uri(&amp;quot;&amp;lt;html&amp;gt;&amp;lt;body&amp;gt;&amp;lt;a href=\&amp;quot;http://www.kde.org/\&amp;quot;&amp;gt;&amp;lt;b&amp;gt;fat&amp;lt;/b&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;a href=\&amp;quot;/index.php\&amp;quot; nowrap&amp;gt;Log in&amp;lt;/a&amp;gt;&amp;lt;a href=\&amp;quot;http://www.gmx.de\&amp;quot;&amp;gt;&amp;lt;/a&amp;gt;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;&amp;quot;);&lt;br /&gt;
&lt;br /&gt;
        doc.loadXML(uri);&lt;br /&gt;
        kdDebug() &amp;lt;&amp;lt; &amp;quot;Here&amp;#039;s a list of the document elements&amp;quot; &amp;lt;&amp;lt; endl;&lt;br /&gt;
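        // Query the element list once rather than on every loop iteration.&lt;br /&gt;
        DOM::NodeList elements = doc.getElementsByTagName(tag);&lt;br /&gt;
        for (unsigned long i=0; i&amp;lt;elements.length(); i++) kdDebug() &amp;lt;&amp;lt; elements.item(i).nodeName().string() &amp;lt;&amp;lt; endl;&lt;br /&gt;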
       &lt;br /&gt;
        DOM::HTMLDocument doc2;&lt;br /&gt;
        DOM::DOMString uri2(&amp;quot;&amp;lt;html&amp;gt;&amp;lt;body&amp;gt;this is html&amp;lt;b&amp;gt;fat&amp;lt;/b&amp;gt;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;&amp;quot;);&lt;br /&gt;
        doc2.loadXML(uri2);&lt;br /&gt;
        kdDebug() &amp;lt;&amp;lt; &amp;quot;This is the in-memory html:&amp;quot; &amp;lt;&amp;lt; endl;&lt;br /&gt;
        kdDebug() &amp;lt;&amp;lt; doc.toString().string() &amp;lt;&amp;lt; endl;&lt;br /&gt;
        // Move the first child of the body&amp;#039;s first element in front of that element.&lt;br /&gt;
        doc.body().insertBefore(doc.body().firstChild().firstChild(),doc.body().firstChild());&lt;br /&gt;
        kdDebug() &amp;lt;&amp;lt; &amp;quot;Moving around nodes&amp;quot; &amp;lt;&amp;lt; endl;&lt;br /&gt;
        kdDebug() &amp;lt;&amp;lt; doc.toString().string() &amp;lt;&amp;lt; endl;&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;/div&gt;</summary>
		<author><name>imported&gt;Eliddell</name></author>
	</entry>
</feed>