**In this lesson we'll use the browser tools for developers to manually extract product data from an e-commerce website.**

---

In our pursuit to scrape products from the [Sales page](https://warehouse-theme-metal.myshopify.com/collections/sales), we've been able to locate parent elements containing relevant data. Now how do we extract the data?
## Finding product details
Previously, we figured out how to save the subwoofer product card to a variable, `subwoofer`, in the **Console**.

The product details are within the element as text, so maybe if we extract the text, we could work out the individual values?
```js
// All text inside the product card, joined into a single string
subwoofer.textContent;
```
That indeed outputs all the text, but in a form that would be hard to break down into relevant pieces.

We'll first need to locate the relevant child elements and extract the data from each of them individually.
## Extracting title
We'll use the **Elements** tab of DevTools to inspect all child elements of the product card for the Sony subwoofer. We can see that the title of the product is inside an `a` element with several classes. Of those, `product-item__title` seems like a great choice for locating the element.

JavaScript represents HTML elements as [Element](https://developer.mozilla.org/en-US/docs/Web/API/Element) objects. Besides the properties we've already played with, such as `textContent` or `outerHTML`, an element also has the [`querySelector()`](https://developer.mozilla.org/en-US/docs/Web/API/Element/querySelector) method. When called on an element, the method looks for matches only among that element's descendants:
```js
// Search only inside the product card, not the whole document
title = subwoofer.querySelector('.product-item__title');
title.textContent;
```
Notice we're calling `querySelector()` on the `subwoofer` variable, not `document`. And just like this, we've scraped our first piece of data! We've extracted the product title.
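
One caveat worth keeping in mind: `querySelector()` returns `null` when nothing matches, so reading `.textContent` from the result would then throw a `TypeError`. Below is a minimal sketch of a guard. The `textOf()` helper name is hypothetical and not part of the lesson, and it assumes it is given an element such as the `subwoofer` variable.

```js
// Hypothetical helper (not part of the lesson): returns the trimmed text
// of the first descendant matching `selector`, or null if nothing matches.
function textOf(parent, selector) {
  const element = parent.querySelector(selector);
  // querySelector() returns null when there is no match
  return element ? element.textContent.trim() : null;
}
```

In the **Console**, calling `textOf(subwoofer, '.product-item__title')` should give the title text, while a selector that matches nothing gives `null` instead of an error.
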
## Extracting price

To figure out how to get the price, we'll use the **Elements** tab of DevTools again. We notice there are two prices, a regular price and a sale price. For the purposes of watching prices, we'll need the sale price. Both are `span` elements with the `price` class.

We could rely either on the fact that the sale price is likely to always be the highlighted one, or on it always being the first price. For now we'll rely on the latter and let `querySelector()` simply return the first result:
```js
// querySelector() returns the first match, here the first .price element
price = subwoofer.querySelector('.price');
price.textContent;
```
It works, but the price isn't alone in the result. Before we could use such data, we'd need to do some **data cleaning**.

But for now that's okay. We're just testing the waters, so that we have an idea of what our scraper will need to do. Once we get to extracting prices in Python, we'll figure out how to get numbers out of them.
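
To get a feel for what such cleaning might involve, here is a minimal JavaScript sketch that can be pasted into the **Console**. The `parsePrice()` helper name and the sample strings are hypothetical. The approach assumes prices that use a dot as the decimal separator and commas as thousands separators; the course itself tackles this later in Python.

```js
// Hypothetical helper: pulls the first number out of a price-like string.
// Assumes "1,398.00"-style formatting (comma = thousands, dot = decimals).
function parsePrice(text) {
  const match = text.match(/\d[\d,]*(?:\.\d+)?/);
  if (!match) return null; // no digits found
  return Number(match[0].replace(/,/g, ''));
}

// Made-up sample strings, not actual output from the page
parsePrice('  Sale price $74.95 '); // 74.95
parsePrice('From $1,398.00');       // 1398
parsePrice('Sold out');             // null
```

Returning `null` for text without digits keeps missing prices distinguishable from a price of zero.
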
## Extracting URL

:::danger Work in Progress

Under development.

:::

## Extracting all URLs

:::danger Work in Progress

Under development.

:::

---

<Exercises />

:::danger Work in Progress

This lesson is under development. Please read [Extracting data with DevTools](../scraping_basics_javascript/data_extraction/devtools_continued.md) in the meantime so you can follow the upcoming lessons.