This Jupyter Notebook (dnldtool4rcsb.ipynb) downloads files with atomic coordinates (PDB and SDF) from the Protein Data Bank. It scrapes the HTML RCSB page of each structure to identify the active ligand and focuses on structures for which binding affinity data are available. The active ligand is a small molecule bound to a protein target for which binding affinity data are available (de Azevedo et al., 2024). The notebook employs the Requests HTTP library to download the atomic coordinates from the RCSB (Veit-Acosta & de Azevedo, 2021).
Schematic flowchart for DnldTool4RCSB. The program reads pdb_codes.in to define the PDB file(s) to be downloaded from the Protein Data Bank, restricted to structures for which binding affinity data (e.g., Ki) are available. The input files lig.in and par.in define the folders used for downloading structures and the binding affinity. DnldTool4RCSB also downloads the SDF for the active ligand in each structure.
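The first step of the flowchart, reading the PDB access codes, can be sketched as follows. This is a minimal illustration and not the notebook's own read_pdb_codes function: the name parse_pdb_codes is hypothetical, and the sketch assumes that pdb_codes.in holds comma-separated, four-character access codes, possibly spread over several lines.

```python
import csv
import io


def parse_pdb_codes(text):
    """Return upper-case PDB codes found in CSV text, without duplicates.

    Hypothetical helper: assumes comma-separated, four-character codes.
    """
    codes = []
    for row in csv.reader(io.StringIO(text)):
        for field in row:
            code = field.strip().upper()
            # Keep only four-character alphanumeric entries and skip repeats
            if len(code) == 4 and code.isalnum() and code not in codes:
                codes.append(code)
    return codes


# Example content in the assumed layout of pdb_codes.in
example = "2A4L, 1e9h,\n2a4l,1FVV\n"
print(parse_pdb_codes(example))
```

Normalizing the codes to upper case and dropping repeated entries avoids downloading the same structure twice.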
This code has the following functions:
ISO80000: This function expresses the file size in kibibytes (KiB), mebibytes (MiB), gibibytes (GiB), etc. It follows the ISO/IEC 80000 standard (https://www.iso.org/standard/87648.html).
read_dictionary: This function reads a file with parameters stored as Python dictionaries.
read_pdb_codes: This function reads a CSV file with PDB access codes and returns them as a list. It also shows a summary.
read_pdb_codes_no_summary: This function reads a CSV file with PDB access codes and returns them as a list, without showing a summary.
show_line: This function shows a formatted line.
show_references: This function shows references given as a list.
show_title: This function shows a formatted line as a title.
rcsb_download_sdf: This function downloads a specific ligand from the PDB using the RCSB Model Server API and saves it as an SD File.
scrape_data_rcsb: This function scrapes data from the RCSB page. It saves the identification of the active ligand found in a given structure to a file.
extract_pdb_coordinates: This function extracts coordinates from a downloaded PDB file. It is intended to select target coordinates for docking simulations.
rcsb_download_pdb: This function downloads a PDB file from the RCSB and saves it to a target directory.
zip_content_folders: This function zips the datasets folder in the content directory.
download_file_from_google_drive: This function downloads a file from Google Drive.
get_confirm_token: This function gets the confirmation token.
save_response_content: This function saves the response content.
unzip_a_folder: This function unzips a previously zipped folder.
make_a_dir: This function makes a directory in the content folder.
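As an illustration of the file-size reporting described for ISO80000, the sketch below converts a byte count into binary-prefixed units (1 KiB = 1024 B). It is not the notebook's implementation: the name format_size, the unit list, and the two-decimal formatting are assumptions made for this example.

```python
def format_size(num_bytes):
    """Format a byte count with ISO/IEC 80000 binary prefixes.

    Hypothetical helper: the two-decimal output format is an assumption.
    """
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(num_bytes)
    for unit in units:
        # Stop when the value fits the current unit or no larger unit is left
        if size < 1024.0 or unit == units[-1]:
            break
        size /= 1024.0
    if unit == "B":
        return f"{int(size)} B"
    return f"{size:.2f} {unit}"


print(format_size(500))      # 500 B
print(format_size(1536))     # 1.50 KiB
print(format_size(1048576))  # 1.00 MiB
```

Binary prefixes keep the reported size unambiguous, since "KB" may mean either 1000 or 1024 bytes depending on the tool.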
The Requests HTTP library documentation is available at https://requests.readthedocs.io/en/latest/.
To install the requests library, type the following command.
python -m pip install requests
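Once Requests is installed, downloading a PDB file can be sketched as shown below. This is a minimal example under stated assumptions, not the notebook's rcsb_download_pdb function: the names build_pdb_url and download_pdb, the timeout value, and the output file naming are hypothetical. The URL pattern is the public RCSB file download address.

```python
from pathlib import Path

# Public RCSB address for PDB-format coordinate files
RCSB_DOWNLOAD_URL = "https://files.rcsb.org/download/{code}.pdb"


def build_pdb_url(pdb_code):
    """Return the download URL for a PDB access code (hypothetical helper)."""
    return RCSB_DOWNLOAD_URL.format(code=pdb_code.strip().upper())


def download_pdb(pdb_code, target_dir=".", timeout=30):
    """Download one PDB file and return the path of the saved file."""
    # Imported here so build_pdb_url works even if requests is not installed
    import requests

    response = requests.get(build_pdb_url(pdb_code), timeout=timeout)
    response.raise_for_status()  # Raise an error for 4xx/5xx responses

    target = Path(target_dir) / f"{pdb_code.strip().upper()}.pdb"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(response.text)
    return target


print(build_pdb_url("2a4l"))
```

Calling download_pdb("2A4L", "datasets") would save the file as datasets/2A4L.pdb. The call to raise_for_status stops the run when an access code does not exist instead of silently saving an error page.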
References
de Azevedo WF Jr, Quiroga R, Villarreal MA, da Silveira NJF, Bitencourt-Ferreira G, da Silva AD, Veit-Acosta M, Oliveira PR, Tutone M, Biziukova N, Poroikov V, Tarasova O, Baud S. SAnDReS 2.0: Development of machine-learning models to explore the scoring function space. J Comput Chem. 2024;45(27):2333-2346.
doi:
Veit-Acosta M, de Azevedo Júnior WF. The Impact of Crystallographic Data for the Development of Machine Learning Models to Predict Protein-Ligand Binding Affinity. Curr Med Chem. 2021;28(34):7006-7022.
doi:
de Azevedo WF Jr, editor. Docking screens for drug discovery. 2nd ed. New York, NY: Springer; 2026.
doi: 10.1007/978-1-0716-4949-7
My scientific interests are interdisciplinary, with three main emphases: computational structural biology, artificial intelligence, and complex systems. In my studies, I developed several free software programs to explore the concept of Scoring Function Space.
As a result of my research, I published over 200 scientific works about protein structures, computer models of complex systems, and simulations of protein systems. These publications have generated over 12,000 citations on Google Scholar (h-index of 63) and more than 10,000 citations and an h-index of 58 in Scopus.
Due to the impact of my work, I have been ranked among the most influential researchers in the world (Fields: Biophysics, Biochemistry & Molecular Biology, and Biomedical Research) according to a database created by the journal PLOS Biology. The application of the same set of metrics recognized the influence of my work from 2021 to 2025 (Baas et al., 2021; Ioannidis, 2022, 2023, 2024, and 2025). Not bad for a poor guy who was a shoe seller at a store in São Paulo and had the opportunity to study at the University of São Paulo with a scholarship for food and housing. I was 23 when I started my undergraduate studies and was the first in my family to have access to higher education.
Document and Citation Trends (Scopus ID: 7006435557) (Data captured on January 30, 2026)
Regarding scientific impact (Peterson, 2005), Hirsch said that for a physicist, an h-index of 45 or higher could mean membership in the National Academy of Sciences of the USA. So far, there have been no invitations. No hard feelings because I am in good company. Carl Sagan was never allowed into the National Academy of Sciences. According to an analysis of citations performed on Nov. 9, 2024 (The Conversation), his work accumulates more than 1,000 citations per year on Google Scholar. Indeed, his current citation rate exceeds that of many members of the National Academy of Sciences.
I will continue working in science with low-budget, interdisciplinary projects and combating denialism with science. The fight against denialism is ongoing work, and scientists should not forget their role in a complex society where social media has given the right to speak to legions of imbeciles.
“Social media gives the right to speak to legions of imbeciles who previously only spoke at the bar after a glass of wine, without damaging the community. They were immediately silenced, but now they have the same right to speak as a Nobel Prize winner. It’s the invasion of imbeciles.”
Umberto Eco. Source: Quote Investigator
"Let the light of science end the darkness of denialism." My quote (DOI:10.2174/092986732838211207154549).