Mapping from DOM to POJO with CSS query syntax and annotations.
Estivate uses the JSoup API internally to run CSS queries.
Given this simple HTML, we want the POJO's name field to be set to the text of the #nameId element.
<html>
<head></head>
<body>
<div id="nameId">This is my name</div>
</body>
</html>
public class Result {
@Text(select="#nameId")
public String name;
}
Mapping a DOM document to a POJO is very easy.
InputStream document = ...
EstivateMapper mapper = new EstivateMapper();
Result result = mapper.map(document, Result.class);
The Result class definition tells Estivate to:
- Select a JSoup Element with the cssQuery "#nameId" on the document.
- Apply JSoup's element.text() to the selected Element.
- Set the result to the name field.
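The three steps above can be sketched with plain reflection. This is a minimal illustration under stated assumptions, not Estivate's implementation: `SketchText`, `SketchResult` and `AnnotationMappingSketch` are hypothetical names, and the document is simplified to a map from CSS query to element text so that no HTML parser is needed.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

// Hypothetical stand-in for an annotation such as @Text; not Estivate's own class.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SketchText {
    String select();
}

// POJO annotated in the same spirit as the Result class above.
class SketchResult {
    @SketchText(select = "#nameId")
    public String name;
}

class AnnotationMappingSketch {

    // Assumption: the "document" is a map from CSS query to the text of the matching element.
    static <T> T map(Map<String, String> document, Class<T> type) {
        try {
            T target = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                SketchText rule = field.getAnnotation(SketchText.class);
                if (rule == null) {
                    continue; // field carries no mapping rule
                }
                // Steps 1 and 2: select by query, then read the text.
                String text = document.get(rule.select());
                // Step 3: set the result on the annotated field.
                field.setAccessible(true);
                field.set(target, text);
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Mapping failed for " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        SketchResult result = map(Map.of("#nameId", "This is my name"), SketchResult.class);
        System.out.println(result.name);
    }
}
```

In the real library, the query runs against a parsed JSoup document rather than a map.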
<dependency>
<groupId>com.github.btheu.estivate</groupId>
<artifactId>estivate</artifactId>
<version>0.4.2</version>
</dependency>
InputStream document = ...
EstivateMapper mapper = new EstivateMapper();
List<Result> result = mapper.mapToList(document, Result.class);
@Select("div.someClass")
public class Result {
@Text(select=".name")
public String name;
}
Estivate's annotations can be used directly on methods. This provides a way to implement custom operations just after mapping.
public class Result {
public String name;
@Text(select="#nameId")
public void setName(String pName){
this.name = pName.substring(0,3).toUpperCase();
}
}
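To see what the setter above does to the mapped text, it can be exercised by hand. This sketch assumes Estivate calls the setter with the selected element's text, as described above; `SetterMappingSketch` and `setNameSafely` are hypothetical names added for illustration.

```java
import java.util.Locale;

// Stand-in for the Result class above, so the setter can be exercised without Estivate.
class SetterMappingSketch {
    public String name;

    // Same body as the setName(...) method shown above.
    public void setName(String pName) {
        this.name = pName.substring(0, 3).toUpperCase();
    }

    // Hypothetical defensive variant: tolerates text shorter than three characters
    // and upper-cases independently of the default locale.
    public void setNameSafely(String pName) {
        int end = Math.min(3, pName.length());
        this.name = pName.substring(0, end).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        SetterMappingSketch result = new SetterMappingSketch();
        result.setName("This is my name"); // text of #nameId in the first example
        System.out.println(result.name);
        result.setNameSafely("ab");
        System.out.println(result.name);
    }
}
```

Note that the original `substring(0,3)` throws a `StringIndexOutOfBoundsException` when the selected text has fewer than three characters, which is why a guarded variant may be worth considering.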
Performs JSoup's element.select(...) operation on the DOM Document.
Runs the cssQuery on the DOM Document, then returns the corresponding DOM Element.
When combined with @Text (or @Attr), the final result is the application of text() (or attr(...)) to this DOM Element.
public class Result {
@Text(select="div#content > span p")
public String description;
}
The JSoup Element object itself can also be mapped to a field or method.
public class Result {
@Select(select="div#content > span p")
public Element paragraphElement;
}
Method mapping is a way to perform further JSoup operations.
public class Result {
public String name;
@Select(select="div#content > span p")
public void setName(Element pElement){
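// siblingElements() returns JSoup Elements, which offers first(); siblingNodes() returns a plain List<Node>
name = pElement.siblingElements().first().text();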
}
}
Performs JSoup's element.text() operation on the DOM Element when the own attribute is set to false.
Maps the combined text of this element and all its children. Whitespace is normalized and trimmed.
public class Result {
@Text(select="#description")
public String description;
}
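As a rough illustration of what "normalized and trimmed" means for the mapped text, the following sketch approximates the effect with a regular expression. It is an approximation only: JSoup's own normalization may differ in details, and `WhitespaceSketch` is a hypothetical name.

```java
class WhitespaceSketch {

    // Collapses every run of whitespace into a single space, then trims both ends.
    static String normalize(String raw) {
        return raw.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String raw = "  This is\n   my   name ";
        System.out.println("[" + normalize(raw) + "]");
    }
}
```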
Performs JSoup's element.ownText() operation on the DOM Element when the own attribute is set to true.
Maps the text owned by this element only; does not get the combined text of all children.
public class Result {
@Text(select="#description", own=true)
public String description;
}
Performs JSoup's element.attr(...) operation on the DOM Element.
Maps an attribute's value by its key. To get an absolute URL from an attribute that may be a relative URL, prefix the key with abs:, which is a shortcut to the absUrl method. E.g.:
public class Result {
@Attr(select="#picture", value="abs:href")
public String absoluteUrl;
}
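What "absolute URL" means here can be shown with the standard library alone. The sketch below resolves a possibly relative attribute value against a base URI using `java.net.URI`. It does not call JSoup, the `example.com` addresses are made up, and `AbsUrlSketch` is a hypothetical name. In JSoup itself, absolute resolution depends on the document having a base URI.

```java
import java.net.URI;

class AbsUrlSketch {

    // Resolves an attribute value against the document's base URI.
    // An already absolute value is returned unchanged.
    static String resolve(String baseUri, String attributeValue) {
        return URI.create(baseUri).resolve(attributeValue).toString();
    }

    public static void main(String[] args) {
        String base = "https://example.com/gallery/index.html"; // hypothetical page address
        System.out.println(resolve(base, "images/pic.png"));
        System.out.println(resolve(base, "/img/pic.png"));
        System.out.println(resolve(base, "https://cdn.example.org/p.png"));
    }
}
```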
Parses an HTML table and matches data by column name.
Each column is mapped to a Java class field or method.
InputStream document = ...
EstivateMapper mapper = new EstivateMapper();
List<Result> result = mapper.mapToList(document, Result.class);
@Table(select="#table1")
public class Result {
@Column("Number Column")
public int number;
@Column("Street Column")
public String street;
@Column(name="Name.*Column", regex=true)
@Attr(select="span", value="title")
public String name;
}
<html>
<head>
<title>table-u1</title>
</head>
<body>
<div id="content">
<table id="table1">
<thead>
<tr>
<th><span>Number Column</span></th>
<th><span>Street Column</span></th>
<th><span>Name Column</span></th>
</tr>
</thead>
<tbody>
<tr>
<td><span>1</span></td>
<td><span>streetA</span></td>
<td><span title="nameA"/></td>
</tr>
<tr>
<td><span>2</span></td>
<td><span>streetB</span></td>
<td><span title="nameB"/></td>
</tr>
<tr>
<td><span>3</span></td>
<td><span>streetC</span></td>
<td><span title="nameC"/></td>
</tr>
</tbody>
</table>
</div>
</body>
</html>
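The column lookup used in the table example can be sketched as follows. This is an assumption about the matching rule, not Estivate's code: the sketch treats a regex column name as a full match against the header text and a plain name as an exact comparison. `ColumnMatchSketch` and `findColumn` are hypothetical names.

```java
import java.util.List;
import java.util.regex.Pattern;

class ColumnMatchSketch {

    // Returns the index of the first header matching the column name, or -1 if none matches.
    static int findColumn(List<String> headers, String name, boolean regex) {
        for (int i = 0; i < headers.size(); i++) {
            String header = headers.get(i);
            boolean matches = regex
                    ? Pattern.matches(name, header) // assumption: the whole header must match
                    : name.equals(header);
            if (matches) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> headers = List.of("Number Column", "Street Column", "Name Column");
        System.out.println(findColumn(headers, "Street Column", false));
        System.out.println(findColumn(headers, "Name.*Column", true));
    }
}
```

Once the column index is known, each body row's cell at that index would be the element to which the field's rule (for example @Attr on the span) is applied.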
Performs JSoup's element.is(...) operation on the DOM Element.
Checks whether this element matches the given Selector CSS query.
public class Result {
@Is(select="#setting", value=".specific")
public boolean isSpecific;
}
Indicates that Estivate won't throw an exception if the mapping of this field or method cannot be satisfied.
public class Result {
@Text(select="#description", optional=true)
public String description;
}
Performs JSoup's element.tagName() operation on the DOM Element.
Maps the name of the tag for this element, e.g. div.
public class Result {
@TagName(select=".picture", first=true)
public String pictureTagName;
}
Performs JSoup's document.title() operation on the DOM Document.
Maps the string contents of the document's title element.
public class Result {
@Title
public String pageTitle;
}
Performs JSoup's element.val() operation on the DOM Element.
Maps the value of a form element (input, textarea, etc).
public class Result {
@Val("#form_field_1")
public String name;
}
A POJO can have a complex mapping in which nested POJOs are themselves mapped to a sub DOM Element.
public class Page {
@Select(select="div#content2")
public Content content;
}
/**
* All fields will be mapped with the sub DOM
* selected by <code>Page</code> content rule
*/
public class Content {
@Text(select=".name")
public String name;
@Text(select=".description")
public String description;
}
The name field will be set to "Actual name2" with the following HTML.
<html>
<head></head>
<body>
<div id="content1">
<div class="name">
Actual name1
</div>
...
<div class="description">
This is the description of content 1.
</div>
</div>
<div id="content2">
<div class="name">
Actual name2
</div>
...
<div class="description">
This is the description of content 2.
</div>
</div>
</body>
</html>
public class Page {
@Select(select="div.article p")
public List<Article> articles;
}
/**
* All fields will be mapped with the sub DOM
* selected by <code>Page</code> articles rule for one <code>P</code>
*/
// JSoupSelectList is not necessary as long as Page already specifies the select rule.
public class Article {
@Text(select=".author")
public String author;
@Text(select=".date")
public String date;
}
This will match all articles given this HTML DOM.
<html>
<head></head>
<body>
<div class="article">
<p>
<div class="author">
Author first article.
</div>
...
<div class="date">
Nov. 1st 2015
</div>
</p>
</div>
...
<div class="article">
<p>
<div class="author">
Author last article.
</div>
...
<div class="date">
Nov. 30th 2015
</div>
</p>
</div>
</body>
</html>
Estivate handles primitive types when mapping fields or method arguments.
public class Rapport {
@Text(select="#nbTeachers")
public Integer numberOfTeachers;
@Text(select="#nbStudents")
public int numberOfStudents;
}
<html>
<head></head>
<body>
<div id="nbTeachers">
123
</div>
<div id="nbStudents">
456
</div>
</body>
</html>
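The conversion from element text to a primitive or wrapper type can be pictured with the sketch below. It is a minimal illustration under the assumption that the text is trimmed and then parsed according to the target type; the set of types Estivate supports and how it reports failures are not stated here. `PrimitiveConversionSketch` and `convert` are hypothetical names.

```java
class PrimitiveConversionSketch {

    // Converts element text to the requested target type.
    // Only a few types are covered; anything else is rejected.
    static Object convert(String text, Class<?> targetType) {
        String trimmed = text.trim();
        if (targetType == String.class) {
            return trimmed;
        }
        if (targetType == int.class || targetType == Integer.class) {
            return Integer.parseInt(trimmed);
        }
        if (targetType == long.class || targetType == Long.class) {
            return Long.parseLong(trimmed);
        }
        if (targetType == boolean.class || targetType == Boolean.class) {
            return Boolean.parseBoolean(trimmed);
        }
        throw new IllegalArgumentException("Unsupported target type: " + targetType.getName());
    }

    public static void main(String[] args) {
        // Text as it appears in the #nbTeachers and #nbStudents elements, with surrounding whitespace.
        System.out.println(convert("\n123\n", Integer.class));
        System.out.println(convert("\n456\n", int.class));
    }
}
```

A non-numeric text such as "n/a" makes `Integer.parseInt` throw a `NumberFormatException`, so such fields may need a String type or a custom setter.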
The MIT License
© 2016-2023, Benoit Theunissen benoit.theunissen@gmail.com
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.