One of the most common quotes about web scraping, paraphrasing the “Fight Club” book, is:
The first rule of web scraping is… do not talk about web scraping
It even appears in the description of the r/webscraping subreddit, mainly because web scraping is still seen as a dirty little secret that should stay that way.
But have you ever heard the phrase:
The first rule of the ETL is… do not talk about ETLs?
No, because it makes no sense at all. Of course, people talk publicly about tools to Extract, Transform, and Load data (that’s what ETL stands for), and there are courses and degrees about them. But web scraping can also be seen as a tool to extract data from websites and, like every tool, it can be used for good or bad.
You can use your ETL to extract data from your company’s payroll database and publish all the salaries online just to damage your employer. In the same way, web scraping can be used to extract copyrighted data or harm the target website’s business.
Web scraping is not in a grey area: the way it’s implemented determines whether it’s legal or not.
For this reason, at Databoutique.com (a marketplace for web data) we want to lift the curtain around web scraping, and we are creating the Web Data Landscape Map, a catalog of all the companies involved in the extraction, enrichment, and usage of web data.
It’s a collaborative project, where anyone can contribute by sharing their knowledge of the industry and by adding companies or solutions where web data is used.
By creating the most complete map possible, we can gain clarity about the actors involved in web data, their roles, and the solutions they offer, in order to answer many of the questions that keep popping up, like: “Where can I find residential proxies?” or “Who can I call to scrape a YouTube channel?”.
For the sake of clarity, we divided the map into two parts: a list of actors involved in web data and a list of solutions where web data is involved.
The actors involved are companies, organizations, and freelancers (within the boundaries of privacy laws) who explicitly offer solutions related to web data.
Any company or freelancer, regardless of size, geography, or sector, that is actually scraping and selling web data, on Databoutique or not, can be listed on the map. This is particularly useful for showcasing your own brand and capabilities and for building trust before people buy data from you.
Every company that provides a solution for making web scraping easier (proxy providers, anti-detect browsers, API providers, and so on) can be listed on Databoutique. This gives them visibility among the hundreds (and growing) of technical users interested in web scraping who are already registered on the platform and who could be interested in trying their solutions. On top of that, companies can keep these professionals updated by adding news and discounts on the platform, which will be broadcast on our channels.
The raw data available on Databoutique is only the starting point: the end user wants insights, and that’s why there are plenty of companies doing pricing optimization in different industries, dynamic pricing, trend forecasting, and so on.
Sometimes people on Databoutique look for data and then ask for a turn-key solution, but they have difficulty finding the right one. Being listed on the Map gives your company the chance to be immediately visible beside the data that customers are looking for, so you can show them what can be built on top of the data they’re interested in.
This is also true for System Integrators: sometimes customers prefer to buy raw data but don’t have time to create an automated solution to load it into their systems. Being visible on the map is a good way to catch their attention and offer them your services.
A solution is a service commonly offered on the market that is used for, with, or in relation to web data.
Every company offers services to its customers, from data provisioning to competitive intelligence. A solution is an extended description of one of these services.
Solutions are specific (e.g. Hotel Dynamic Pricing, Residential Proxy) and don’t refer to specific providers or product names.
It is the solution that allows an actor to be listed on the map. Even if a company covers various services and only one of them relates to web data (e.g. consulting on how to integrate web data), it can be listed, as long as there is an explicit reference in its service portfolio.
Actors can be associated with more than one solution, and this can evolve over time. Any user can suggest a new solution, as well as new or discontinued relationships between actors and solutions.
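For readers who think in data models, the relationship between actors and solutions described above can be sketched as a simple many-to-many structure. This is only an illustrative sketch, not how the Map is actually implemented: the class names, the `is_listable` and `actors_for` helpers, and the example company are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Solution:
    """A generic service category, never a provider or product name."""
    name: str


@dataclass
class Actor:
    """A company, organization, or freelancer offering web data services."""
    name: str
    solutions: set[Solution] = field(default_factory=set)

    def is_listable(self) -> bool:
        # Assumption: an actor qualifies for the map through at least one solution.
        return len(self.solutions) > 0


def actors_for(solution: Solution, actors: list[Actor]) -> list[str]:
    """Return the sorted names of all actors associated with a solution."""
    return sorted(a.name for a in actors if solution in a.solutions)


# Hypothetical example data; solution names are taken from the text above.
proxy = Solution("Residential Proxy")
pricing = Solution("Hotel Dynamic Pricing")
acme = Actor("Acme Data (hypothetical)", {proxy, pricing})
newcomer = Actor("Newcomer Ltd (hypothetical)")

# Relationships can evolve over time: add or remove solutions as needed.
newcomer.solutions.add(proxy)
acme.solutions.discard(pricing)
```

Keeping solutions as separate, provider-neutral objects is what lets one actor be linked to several solutions and one solution to several actors.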
Adding a company to the map takes two minutes and it’s totally free. On the actors’ page, for example, if the actor is not yet listed, you can suggest it by clicking the blue button.
This will open a guided form to submit all the needed data and, after a manual approval, the company will be visible in the catalog.
If you work at a company that is already listed, you can instead claim its ownership by clicking on the black “Work Here?” button and registering on Databoutique with a company email.
By claiming your company profile, you gain the right to post news and manage the solutions attached to it.
Are you willing to participate in populating the Web Data Landscape Map?