new refactored parser
Tamer Gur committed Jun 11, 2019
1 parent 42fbc02 commit 4d215ab
Showing 15 changed files with 567 additions and 25,410 deletions.
88 changes: 34 additions & 54 deletions README.md
@@ -1,85 +1,65 @@
## XML Stream Parser for Go
xml-stream-parser is a Go library for parsing XML files. It was written to address the performance [issue](https://github.com/golang/go/issues/21823) in the default xml package.

### Install

```
go get -u github.com/tamerh/xml-stream-parser
```

## xml stream parser
xml-stream-parser is an XML parser for Go. It parses large XML data efficiently in a streaming fashion.

### Usage

Let's say you have the following XML and you want to loop over each book as a stream
and parse its elements and attributes

```xml
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book ISBN="10-000000-001">
<book>
<title>The Iliad and The Odyssey</title>
<price>12.95</price>
<comments>
<userComment rating="4">Best translation I've read.</userComment>
<userComment rating="2">I like other versions better.</userComment>
</comments>
<description>Homer's two epics of the ancient world, The Iliad &amp; The Odyssey, tell stories as riveting today as when they were written between the eighth and ninth century B.C.</description>
</book>
<book ISBN="10-000000-999">
<book>
<title>Anthology of World Literature</title>
<price>24.95</price>
<comments>
<userComment rating="3">Needs more modern literature.</userComment>
<userComment rating="4">Excellent overview of world literature.</userComment>
</comments>
<description>The anthology includes epic and lyric poetry, drama, and prose narrative, with many complete works and a focus on the most influential pieces and authors from each region and time period.</description>
</book>
</bookstore>
```

You can use the library like so:

<b>Stream</b> over books
```go
// First open your file and create a reader. You can also use a gzip file; check the tests.
file, _ := os.Open("books2.xml")
defer file.Close()
br := bufio.NewReader(file)
f, _ := os.Open("input.xml")
br := bufio.NewReaderSize(f, 8192)
parser := pr.NewXmlParser(br, "books")

// then create the following channel to read your parsed data from
var resultChannel = make(chan XMLEntry)
for xml := range *parser.Stream() {
fmt.Println(xml.Childs["title"][0].InnerText)
fmt.Println(xml.Childs["comments"][0].Childs["userComment"][0].Attrs["rating"])
fmt.Println(xml.Childs["comments"][0].Childs["userComment"][0].InnerText)
}

```

// init parser
var parser = XMLParser{
R: br,
// define tag to loop over
LoopTag: "book",
OutChannel: &resultChannel,
// you can skip tags you are not interested in; this speeds up parsing
SkipTags: []string{"description"},
<b>Skip</b> tags for speed
```go
parser := pr.NewXmlParser(br, "books").SkipElements([]string{"price", "comments"})
```

<b>Error</b> handling
```go
for xml := range *parser.Stream() {
if xml.Err != nil {
// handle error
}
}
```

// start parsing in a goroutine
go parser.Parse()
<b>Progress</b> of parsing
```go
// total bytes read so far, used to calculate parsing progress
parser.TotalReadSize
```

// and finally read parsed data
for book := range resultChannel {
// print ISBN value
isbn := book.Attrs["ISBN"]
fmt.Println(isbn)

// print title
title := book.Elements["title"][0].InnerText
fmt.Println(title)

// print the user comments that have rating 4
// you can walk all the sub nodes in the same way
for _, userComments := range book.Elements["comments"][0].Childs {
for _, comment := range userComments {
if comment.Attrs["rating"] == "4" {
// print the user comment
fmt.Println(comment.InnerText)
}
}
}
}
```

If you are interested, also check out the [json parser](https://github.com/tamerh/jsparser), which works similarly.
3,115 changes: 0 additions & 3,115 deletions article.xml

This file was deleted.

122 changes: 0 additions & 122 deletions books.xml

This file was deleted.

21 changes: 0 additions & 21 deletions books2.xml

This file was deleted.

16 changes: 0 additions & 16 deletions books_invalid.xml

This file was deleted.

5 changes: 5 additions & 0 deletions error.xml
@@ -0,0 +1,5 @@
<examples>
<tag1 att1="<att0>" att2="att0">
<tag11 att1="att0">InnerText110</tag11>
<tag11 att1="att0">InnerText111</tag11>
<tag12 att1="att0"/>