Element Selection Guide
To get data from a web page, the first step is to "select" an element on the page. Once an element is selected, we can define subsequent operations based on it.
There are several concepts related to element selection.
The Waiting Element is the element that is ready to be selected. Move the mouse pointer over the web page and one element will be marked with a light-purple background, which means it is currently waiting to be selected:
To select an element, right-click it or press F7 on the keyboard. The selected element will be marked with a bright-blue background:
You can then select a 2nd, 3rd, ..., n-th element by continuing to right-click or press F7:
After an element is selected, the Operation Toolbox shows many options. The toolbox can be dragged freely:
If you have selected several elements but do not want the most recently selected one, click the Revoke selection option on the toolbox to revoke it.
For example, suppose you selected 3 elements on the web page, in this order:
(1) The "Daily Deals" link on the top bar.
(2) The "Shop by category" option.
(3) The "ebay" image (logo).
If you no longer want the "ebay" image (3) selected, click the "Revoke selection" option to deselect it, leaving the other 2 elements (1 and 2) selected:
To cancel the current selection entirely, click the Deselect button on the toolbox to deselect all selected elements:
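The difference between "Revoke selection" and "Deselect" can be pictured as two operations on an ordered list. The sketch below is only an illustrative model: the `Selection` class and its method names are hypothetical and are not taken from EasySpider's code.

```python
class Selection:
    """Hypothetical model of the selected-element list (not EasySpider's code)."""

    def __init__(self):
        self.elements = []  # elements in the order they were selected

    def select(self, element):
        # Right-click / F7: add one more element to the selection.
        self.elements.append(element)

    def revoke(self):
        # "Revoke selection": drop only the most recently selected element.
        return self.elements.pop() if self.elements else None

    def deselect(self):
        # "Deselect": clear the whole selection.
        self.elements.clear()


selection = Selection()
for name in ["Daily Deals link", "Shop by category option", "ebay logo"]:
    selection.select(name)

selection.revoke()                     # removes (3), keeps (1) and (2)
remaining = list(selection.elements)

selection.deselect()                   # removes everything that is left
```

After `revoke()`, `remaining` holds elements (1) and (2) only; after `deselect()`, the selection is empty.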
Sometimes you want to select an element, but what actually gets selected is its child element. For example, the XPath of the element you want is /html/body/div/a, but the XPath of the element actually selected is /html/body/div/a/span, and you cannot select the "a" tag directly because it has no width or height of its own. In this case, click the Expand Path button on the toolbox to expand the XPath of the currently selected element to its parent element:
The selection area will be marked accordingly:
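In XPath terms, expanding to the parent amounts to dropping the last step of the path. A minimal sketch of that idea follows; `expand_to_parent` is a hypothetical helper written for illustration, not EasySpider's implementation, and it assumes a simple absolute path whose steps contain no "/" inside predicates.

```python
def expand_to_parent(xpath: str) -> str:
    """Return the XPath of the parent by removing the last step.

    Assumes a simple absolute path such as /html/body/div/a/span.
    """
    steps = xpath.rstrip("/").split("/")  # leading "" comes from the first "/"
    if len(steps) <= 2:
        # Already at the root element (e.g. "/html"); nothing to expand to.
        return xpath
    return "/".join(steps[:-1])


parent = expand_to_parent("/html/body/div/a/span")
print(parent)
```

With the example path from the text, the result is `/html/body/div/a`, the "a" tag that could not be selected directly.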
EasySpider supports automatic detection of similar elements, which is very useful when we want to get data from a list, e.g., all product titles or prices on eBay.
Take the eBay collection task above as an example. First, we select the title of the first product in the list; all other product titles are then marked with a blue border, which means they are "waiting to be selected":
Then, to select all matched elements, we can click the "Select All" option on the toolbox, or select the 2nd, 3rd, ..., n-th matched elements by right-clicking them:
Sometimes EasySpider may find more than one possible set of "Similar Elements". For example, the following two types of elements can both be considered similar elements for a link on Google:
This is because the elements under (1) are all links, while the elements under (2) share the same parent.
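The two candidate patterns can be illustrated on a toy document. The sketch below uses Python's standard `xml.etree.ElementTree` on made-up markup; it only shows why "same tag" and "same parent" can give different sets, and it is not how EasySpider itself detects similar elements.

```python
import xml.etree.ElementTree as ET

# Made-up page fragment for illustration only.
page = ET.fromstring("""
<body>
  <div id="nav">
    <a>Images</a>
    <a>Maps</a>
    <span>Settings</span>
  </div>
  <div id="footer">
    <a>Privacy</a>
  </div>
</body>
""")

# ElementTree elements do not store their parent, so build a lookup table.
parent_of = {child: parent for parent in page.iter() for child in parent}

selected = page.find(".//a")  # the first link, "Images"

# Pattern (1): every element with the same tag as the selected one (all links).
same_tag = [el.text for el in page.iter(selected.tag)]

# Pattern (2): every element that shares the selected element's parent.
same_parent = [el.text for el in parent_of[selected]]
```

Here `same_tag` covers the links in both blocks, while `same_parent` covers everything inside the first block, including the non-link "Settings" item.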
In this situation, EasySpider suggests one type of similar elements. If the suggested elements (such as (2)) are not the ones we want, we can simply select a second intended element on the web page, and EasySpider will change the similar-element pattern accordingly:
In other cases, EasySpider may detect fewer similar elements than we expect:
Here only the products under the "Score these trending kicks" section are detected, but we also want the products under the "Feel-good fashion at the Brand Outlet" section. To include them, we similarly just need to manually select one product in the "Feel-good fashion at the Brand Outlet" section, and then all similar products will be detected:
Then click the "Select All" option in the toolbox to select all of the elements.
An element on a web page may contain many useful child elements. For example, the following block contains the title, price, discount, number of watchers, warranty, etc. of a product:
Selecting them manually one by one is cumbersome, especially when we want to select many elements in a list. EasySpider therefore provides the "Select child elements" option to select all child elements with one click. The steps are:
- Select one element by right-clicking it or pressing F7.
- Click the "Select child elements" option on the toolbox.
All child elements are then selected by EasySpider:
To remove unneeded child elements, configure them in the Workflow Manager.
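As a rough picture of what "Select child elements" collects, the sketch below walks a made-up product block with Python's standard `xml.etree.ElementTree` and gathers every nested element that carries text. The markup and the choice to include all descendants (not only direct children) are assumptions for illustration; EasySpider's own rules may differ.

```python
import xml.etree.ElementTree as ET

# Made-up product block for illustration only.
product = ET.fromstring("""
<li>
  <h3>Wireless headphones</h3>
  <div>
    <span class="price">$59.99</span>
    <span class="discount">20% off</span>
  </div>
  <span class="watchers">12 watchers</span>
</li>
""")

# Collect each nested element that has visible text, in document order.
children = [
    (el.tag, el.text.strip())
    for el in product.iter()
    if el is not product and el.text and el.text.strip()
]

for tag, text in children:
    print(tag, text)
```

The wrapper `div` has no text of its own, so it is skipped, which loosely mirrors removing unneeded child elements afterwards.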