CAASES is a web crawler that extracts accessibility information from websites through the Avaliador e Simulador de Acessibilidade em Sítios (ASES).
The collected information includes:
| Variable | Description | Dtype |
|---|---|---|
| `url` | URL of the web page | string |
| `name` | The title of the web page | string |
| `size_bytes` | The size of the page's HTML file in bytes | float |
| `with_html` | Indicates whether information was collected from the HTML file or the URL | boolean |
| `num_lines_of_code` | The number of lines of code in the web page | integer |
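
As an illustration of how these columns might be consumed, the sketch below parses CSV text with Python's standard `csv` module and converts each field to the dtype listed in the table. The `PageInfo` class, the `parse_page_info` helper, and the sample rows are hypothetical. The sketch also assumes a header row that matches the variable names and booleans serialized as `True`/`False`, neither of which is confirmed by the project.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Hypothetical record mirroring the columns in the table above."""
    url: str
    name: str
    size_bytes: float
    with_html: bool
    num_lines_of_code: int


def parse_page_info(csv_text: str) -> list[PageInfo]:
    """Convert CSV text with a header row into typed PageInfo records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        PageInfo(
            url=row["url"],
            name=row["name"],
            size_bytes=float(row["size_bytes"]),
            # Assumption: booleans are written as "True"/"False".
            with_html=row["with_html"].strip().lower() == "true",
            num_lines_of_code=int(row["num_lines_of_code"]),
        )
        for row in reader
    ]


# Made-up sample data, for illustration only.
SAMPLE = """url,name,size_bytes,with_html,num_lines_of_code
https://example.org,Example,2048.0,True,120
https://example.com,Other,512.0,False,30
"""

pages = parse_page_info(SAMPLE)
print(pages[0].name, pages[0].size_bytes, pages[1].with_html)
```

To read a real file instead, the same function could be given the text of the file, for example `parse_page_info(open("data/page_info.csv", encoding="utf-8").read())`, provided the file follows the assumed layout.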
The accessibility summary based on eMAG recommendations presents errors and warnings for six evaluation sections: markup, behavior, content/information, presentation/design, multimedia and forms.
| Variable | Description | Dtype |
|---|---|---|
| `url` | URL of the web page | string |
| `ases_pct` | Web page accessibility percentage | float |
| `n_markup_errors` | Number of markup errors on the page | integer |
| `n_behavior_errors` | Number of behavior errors on the page | integer |
| `n_information_errors` | Number of information errors on the page | integer |
| `n_presentation_errors` | Number of presentation errors on the page | integer |
| `n_multimedia_errors` | Number of multimedia errors on the page | integer |
| `n_form_errors` | Number of form errors on the page | integer |
| `n_markup_warnings` | Number of markup warnings on the page | integer |
| `n_behavior_warnings` | Number of behavior warnings on the page | integer |
| `n_information_warnings` | Number of information warnings on the page | integer |
| `n_presentation_warnings` | Number of presentation warnings on the page | integer |
| `n_multimedia_warnings` | Number of multimedia warnings on the page | integer |
| `n_form_warnings` | Number of form warnings on the page | integer |
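
Because the summary stores one count per evaluation section, a page-level total has to be derived. The sketch below adds up the six error columns and the six warning columns of a single row. The `totals` helper and the row values are made up for illustration; only the column names come from the table above.

```python
# The six evaluation sections, as they appear in the column names.
CATEGORIES = ("markup", "behavior", "information",
              "presentation", "multimedia", "form")


def totals(row: dict) -> dict:
    """Sum the per-section error and warning counts of one summary row."""
    return {
        "errors": sum(int(row[f"n_{c}_errors"]) for c in CATEGORIES),
        "warnings": sum(int(row[f"n_{c}_warnings"]) for c in CATEGORIES),
    }


# Made-up row, for illustration only.
row = {
    "url": "https://example.org",
    "ases_pct": 87.5,
    "n_markup_errors": 3, "n_behavior_errors": 0,
    "n_information_errors": 2, "n_presentation_errors": 1,
    "n_multimedia_errors": 0, "n_form_errors": 4,
    "n_markup_warnings": 5, "n_behavior_warnings": 1,
    "n_information_warnings": 0, "n_presentation_warnings": 2,
    "n_multimedia_warnings": 0, "n_form_warnings": 1,
}

summary = totals(row)
print(summary)
```

The `int(...)` conversion lets the same helper accept rows read from a CSV file, where every value arrives as a string.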
| Variable | Description | Dtype |
|---|---|---|
| `url` | URL of the web page | string |
| `category` | Category of the content (`mark`, `behavior`, `information`, `presentation`, `multimedia`, `form`) | string |
| `info_type` | Type of information (`error` or `warning`) | string |
| `recommendation` | eMAG recommendation | string |
| `count` | Quantity of recommendations | integer |
| `source_code_lines` | Lines of code to which the recommendation applies | list(string) |
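
Each row of this table describes one recommendation on one page, so typical questions ("how many errors per category?") require grouping. The sketch below shows one way to do that with the standard library. The helper names, the sample rows, and the `rec-A`-style recommendation labels are hypothetical. It also assumes that `source_code_lines` is serialized as a Python-style list literal, which the project does not confirm.

```python
import ast
from collections import Counter


def parse_lines(cell: str) -> list[str]:
    """Turn a serialized list such as "['10', '42']" into a list of strings.

    Assumption: the cell holds a Python-style list literal.
    """
    value = ast.literal_eval(cell)
    return [str(item) for item in value]


def errors_per_category(rows: list[dict]) -> Counter:
    """Add up `count` for rows whose info_type is "error", by category."""
    result = Counter()
    for row in rows:
        if row["info_type"] == "error":
            result[row["category"]] += int(row["count"])
    return result


# Made-up rows, for illustration only.
rows = [
    {"url": "https://example.org", "category": "form", "info_type": "error",
     "recommendation": "rec-A", "count": "2",
     "source_code_lines": "['10', '42']"},
    {"url": "https://example.org", "category": "form", "info_type": "warning",
     "recommendation": "rec-B", "count": "5", "source_code_lines": "[]"},
    {"url": "https://example.org", "category": "multimedia",
     "info_type": "error", "recommendation": "rec-C", "count": "1",
     "source_code_lines": "['7']"},
    {"url": "https://example.org", "category": "form", "info_type": "error",
     "recommendation": "rec-D", "count": "3", "source_code_lines": "['88']"},
]

by_category = errors_per_category(rows)
first_lines = parse_lines(rows[0]["source_code_lines"])
print(dict(by_category), first_lines)
```

`ast.literal_eval` is used rather than `eval` so that only literals are accepted from the file.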
CAASES can be installed directly from the source using the following commands:

    git clone https://github.com/lincprog/CAASES.git
    cd CAASES
    pip install -r requirements.txt

The project structure is defined as follows:
📦CAASES
┣ 📂data (data collected by the crawler. It includes the data explained above)
┃ ┣ 📜emag_summary.csv
┃ ┣ 📜err_warn_summary.csv
┃ ┗ 📜page_info.csv
┣ 📂html_files (stores HTML files downloaded by the crawler)
┣ 📂logs (stores log files generated by the crawler)
┣ 📜broken_urls.txt (contains a list of broken URLs encountered during execution)
┣ 📜docker-compose.yml (Docker Compose configuration file)
┣ 📜main.py (entry point that executes the crawling process)
┣ 📜models.py (definitions for structures used for data collection)
┣ 📜README.md
┣ 📜requirements.txt (dependencies required for the project)
┣ 📜urls.txt (a list of URLs to be processed by the crawler)
┗ 📜utils.py (utility functions and helper methods used throughout the project)
Ensure Docker is installed on your system, then follow these steps:
- Open a terminal window and navigate to the directory containing the `docker-compose.yml` file.
- Run the command below to start the Docker containers defined in the `docker-compose.yml` file:

      docker-compose up -d

- Once the containers are running, navigate to the directory containing the `main.py` file.
- Execute the `main.py` file using the appropriate command for your Python environment:

      python main.py

Optionally, to monitor the execution in the Selenium Grid, open http://localhost:4444 in your browser.
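
Before starting `main.py`, it can be useful to confirm that the Grid is accepting sessions. The sketch below is one way to do so and is not part of CAASES: it assumes the containers run a Selenium Grid 4 hub, which exposes a JSON status document at `/status` with a `value.ready` flag. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: a Selenium Grid 4 hub is listening on the default port.
GRID_STATUS_URL = "http://localhost:4444/status"


def grid_is_ready(payload: dict) -> bool:
    """Return True when a Grid status payload reports it is ready."""
    return bool(payload.get("value", {}).get("ready", False))


def fetch_grid_status(url: str = GRID_STATUS_URL, timeout: float = 5.0) -> dict:
    """Download and decode the Grid status document."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Usage (requires the containers to be running):
#     print("Grid ready:", grid_is_ready(fetch_grid_status()))
```

Keeping the payload check separate from the network call makes it possible to exercise the readiness logic without a running Grid.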
The following works use CAASES:
- Marcos, C. O., Gustavo, S. S., & Antonio, F. L. J. J. (2024). Dados da avaliação de acessibilidade Web nos portais das Instituições de Ensino Superior no Brasil [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10612128