Skip to content

Swift wrapper around libxml2 HTML Parser to provide SAX style HTML Parsing

License

Notifications You must be signed in to change notification settings

raymccrae/swift-htmlsaxparser

Repository files navigation

HTML SAX Parser for Swift 4

HTMLSAXParser is a swift module that wraps the libxml2 HTMLParser for the purposes of providing a simple lightweight SAX parser for HTML content. libxml2 is part of the Mac, iOS and Apple TV SDK, if you are developing for those platforms then you will not require any additional dependencies. SAX parsers provide an event based parsing process, where a closure you provide will be called with a series of events as the parser moves through the document.

HTMLSAXParser take inspiration from NSXMLParser however it uses enums with associated types for the parsing events, rather than a delegate class. A simple of example of usage is: -

let parser = HTMLSAXParser()
do {
	try parser.parse(string: "<html><body>Some HTML Content</body></html>") { context, event in
		switch event {
			case let .startElement(name, attributes):
				print("Found character : \(name)")
			case let .character(text):
				print("Found character : \(text)")
			default:
				break
		}
	}
}
catch {
	// Handle error
}

This approach lends itself to short inlined processing of HTML without the need for a parser delegate class.

/**
 Example function to extract all the image sources from HTML data. Specifically
 fetching the "src" attribute from all "img" tags.
*/
func imageSources(from htmlData: Data) throws -> [String] {
	var sources: [String] = []
	let parser = HTMLSAXParser()
	try parser.parse(data: htmlData) { context, event in
		switch event {
			case let .startElement(name, attributes) where name == "img":
				if let source = attributes["src"] {
					sources.append(source)
				}
			default:
				break
		}
	}
	return sources
}

Installation

Swift Package Manager

Add HTMLSAXParser as a dependency to your projects Package.swift. For example: -

// swift-tools-version:4.0
import PackageDescription

let package = Package(
    name: "YourProject",
    dependencies: [
        // Dependencies declare other packages that this package depends on.
        .package(url: "https://github.com/raymccrae/swift-htmlsaxparser.git", .branch("master"))
    ],
    targets: [
        // Targets are the basic building blocks of a package. A target can define a module or a test suite.
        // Targets can depend on other targets in this package, and on products in packages which this package depends on.
        .target(
            name: "YourProject",
            dependencies: ["HTMLSAXParser"]),
    ]
)

Since this module makes use of libxml2 you will need to inform the C compiler where the header files for libxml2 are located. If you have Xcode installed (Mac Only) then you can include the following additional arguments to the swift build command to the current SDK path:

$ swift build -Xcc -I"$(xcrun --show-sdk-path)/usr/include/libxml2"

Contributors

Contributors on GitHub

License

Version

  • Version 0.4