Delphi HTML Parser

title	description	author	tags
Delphi HTML Parser	It is very small and fast module for parsing HTML pages.	sandbil	delphi, Lasarus

Delphi HTML Parser

This module lets you working with HTML documents as DOM tree and use XPath for searching tags.
It is very simple way to parse HTML.

This tested with version Delphi XE5,6

Requirements

OpenSSL (Demo project contains two required dll: libeay32.dll, ssleay32.dll)

Usage

Add in Uses parser.pas; Next:

.. var DomTree: TDomTree; DomTreeNode: TDomTreeNode; HtmlTxt: string; NodeList:TNodeList; ValueList:TStringList; begin HtmlTxt:= NodeList:= TNodeList.Create; ValueList:= TStringList.Create;

DomTree:= TDomTree.Create;
DomTreeNode:= DomTree.RootNode;
If DomTreeNode.RunParse(HtmlTxt) then
  begin
    {your code
	 example:
	 DomTreeNode.FindXPath('//*[@id="TopBox"]/div[1]/div[@class="draw default"]'),NodeList,ValueList)}
	 ..
  end;	
..

end;

And joy!!!

Available properties and methods: TDomTree

property Count - count of nodes
property RootNode - root node (TDomTreeNode)
property ParseErr - Tstringlist contains all parsing errors and warnings

TDomTreeNode

property Tag - name of tag
property AttributesTxt - string with all attributtes
property Attributes - parsed attributes (TDictionary<string, string>)
property Text - text
property TypeTag -
property Child - contains child's nodes (TChildList of TDomTreeNode)
property Parent - contains parent's node
property Owner - contains pointer to owner TDomTree
function FindNode - boolean function, if true then TNodeList contains found nodes
function FindTagOfIndex - boolean function, if true then TNodeList contains founded nodes
function GetAttrValue - returns value of attribute of current node
function GetComment - returns the set index comment in the current container node
function GetTagName - return name of tag + AttributesTxt
function GetText - returns the set index text in the current container node
function GetXPath - returns Xpath for current node
function RunParse - if parse is successfully then CHild property contains HTML DOM tree
function FindXPath - boolean function, if true then TNodeList contains found nodes
and TStringList contains found values of attribute, comment, text

Xpath support:

attributes - //*[@id="TopBox"]/div/@class
comment - //*[@id="TopBox"]/div/comment()[3]
text - //*[@id="TopBox"]/div/text()[2]
previous level - /../div[@class="draw default"]/img[2]/@alt
partial coincidence by search in value of attribute: /div[@class="draw] returned nodes [[div class="draw default"],[div class="draw"], [div class="draw any"]..] like Xpath's function contains. Note: Xpath always starts search from current node. If you want to do global search, you must start from root's node.

See demo project.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
demo		demo
test		test
.gitignore		.gitignore
HTML.Parser.pas		HTML.Parser.pas
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Delphi HTML Parser

Requirements

Usage

About

Releases

Packages

Languages

Nashev/HTML-Parser

Folders and files

Latest commit

History

Repository files navigation

Delphi HTML Parser

Requirements

Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages