Thank you for your interest in contributing to the project! The goal of this guide is to help you with your endeavour. There are many ways to contribute and we have outlined some opportunities which might be interesting to you. If you have any questions or suggestions, feel free to contact us at webchem@ropensci.org.
Write us an e-mail and show us a full example of how you use, or how you would like to use, webchem in your data analysis! This would give us ideas about new features and also help us create better vignettes that help others get started. Please send your e-mails to webchem@ropensci.org.
If you have found a bug in the code or the documentation, know of a data source you would like us to integrate into webchem, or have dreamed up a new feature that would be nice to implement, raise an issue and let's discuss it! Even if you don't have the time or the coding background to resolve the issue yourself, others might, so just by describing a good problem you might help people who are looking for interesting problems to solve. You can raise an issue on the webchem GitHub issue tracker. Feel free to join discussions on existing issues as well!
If you know some coding, you can also contribute code.
- Fork this repo to your GitHub account.
- Clone your fork down to your machine, e.g., `git clone https://github.com/<yourgithubusername>/webchem.git`.
- Make sure to track upstream progress (i.e., our version of webchem at ropensci/webchem) by doing `git remote add upstream https://github.com/ropensci/webchem.git`. Before making changes, pull in changes from upstream, either with `git fetch upstream` and merging later, or with `git pull upstream <branch>` (where `<branch>` is the upstream branch you want to merge) to fetch and merge in one step.
- Make your changes. Bonus points for making changes on a new branch.
  Creating new branches is good practice: if you finish a topic, open a pull request, and then start working on another topic without starting a new branch, any further commits you push to your account will automatically be added to your pull request as well, making it much harder for us to evaluate your request. To avoid this, open a new branch for each new topic.
- Push up to your account.
- Submit a pull request to home base at ropensci/webchem.
You can find the rOpenSci developer guide at https://devguide.ropensci.org/
We are happy to help at any point in your work.
- We follow the tidyverse style. You can find the style guide at https://style.tidyverse.org/. Before committing your code, we encourage you to use `lintr::lint_file()` to check for nonconformances.
- We use roxygen2 for documentation. Please make sure you update roxygen2 to the latest version before you update the documentation with `devtools::document()`. Use `@noRd` for non-exported functions.
- Please use the xml2 package instead of the XML package; xml2 is much better maintained.
- Please use the lightweight jsonlite package for handling JSON.
- Use the utilities in `webchem::utils.R` when possible to keep function style consistent across the package.
- Be nice to the resources! Minimise interaction with the servers and use appropriate timeouts.
- Within test files, always include a check whether the webservice is running, and skip all tests when it is not. See `R/ping.R` for more details.
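A minimal sketch of a test file that follows the last guideline is shown below. `service_is_up()` is a hypothetical helper standing in for the functions in `R/ping.R`, whose actual names and arguments may differ. It sends one known-good request with a short `httr::timeout()` and treats any error or non-200 status as "down". The PubChem PUG REST URLs are only illustrative.

```r
library(testthat)

# Hypothetical helper standing in for the checks in R/ping.R:
# returns TRUE only if a known-good request answers with HTTP 200
# before the timeout, and FALSE on any error or other status.
service_is_up <- function(url, timeout_s = 5) {
  res <- tryCatch(
    httr::GET(url, httr::timeout(timeout_s)),
    error = function(e) NULL
  )
  !is.null(res) && httr::status_code(res) == 200
}

# Check the service once, at the top of the test file.
up <- service_is_up(
  "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/property/MolecularFormula/JSON"
)

test_that("aspirin resolves to CID 2244", {
  # Every test in the file starts with the same skip.
  skip_if_not(up, "PubChem is not responding")
  res <- httr::GET(
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/aspirin/cids/JSON",
    httr::timeout(10)
  )
  json <- jsonlite::fromJSON(httr::content(res, as = "text", encoding = "UTF-8"))
  expect_true(2244 %in% json$IdentifierList$CID)
})
```

Checking once at the top of the file and calling `skip_if_not()` in every `test_that()` block keeps the number of requests small and avoids a cascade of failures when the server is down.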
Some consistency guidelines:
- Functions that query a database for one or more database-specific identifiers should follow the naming convention `get_*`, e.g. the function that queries ChEBI IDs is called `get_chebiid()`. These functions should take a vector of queries and return a single tibble. Whenever possible, these functions should have the arguments `query`, `from`, `match`, `verbose` and `...`. The first column of the tibble should contain the IDs and the last should contain the queries. Invalid queries should return a row of `NA`s (apart from the last element of the row, which should be the query itself).
- The names of functions that query a database for chemical information should start with the name of the database, followed by the functionality, e.g. `pc_synonyms()` searches for synonyms in PubChem. These functions should take a vector of queries and return a list of responses. Invalid queries should return `NA`.
- Functions should always validate their input when appropriate. Use `match.arg()` for input validation.
- Make sure `NA` (a missing value) is not confused with "Na" (the chemical symbol for sodium).
- Functions that retrieve images should follow the naming convention `*_img`, e.g. the function that retrieves images from ChemSpider is called `cs_img()`. These functions should take a vector of queries and download images into a user-defined directory. They should not keep images in memory, should not implement image processing functionality, and should not return anything to the console. These functions should include the arguments `dir`, `overwrite = TRUE` and `verbose = TRUE`. `dir` should not have a default value.
- SMILES strings may contain special characters like "#". `URLencode()` does not encode this as "%23" by default, so use `URLencode(query, reserved = TRUE)` instead. Note that only the query has to be encoded like this, not the full URL; with `reserved = TRUE` the ":" and "/" of the URL itself would be escaped as well.
- Print verbose messages. Use `httr::message_for_status()` and `webchem_message()` to generate standard messages when possible.
- Wrap function examples that interact with an API in `\dontrun{}`. Avoid using `\donttest{}`.
- If an API is no longer available, make all exported functions that interact with it defunct.
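As a rough illustration of several of the conventions above, here is a sketch of a hypothetical `get_example_cid()` (not webchem's own `get_cid()`) that looks up PubChem CIDs through the PUG REST API. It validates `from` and `match` with `match.arg()`, encodes only the query with `URLencode(q, reserved = TRUE)`, uses a timeout, prints `httr::message_for_status()` messages, and returns one tibble with IDs first and queries last, with an `NA` row for invalid queries or failed requests. The `match` choices, the 10-second timeout and the function name are assumptions made for this example.

```r
# Hypothetical get_* function, NOT webchem's get_cid(): looks up PubChem
# CIDs through the PUG REST API to illustrate the conventions above.
get_example_cid <- function(query,
                            from = c("name", "smiles", "inchikey"),
                            match = c("all", "first", "na"),
                            verbose = TRUE,
                            ...) {
  # Validate input with match.arg(); unknown values raise an error.
  from <- match.arg(from)
  match <- match.arg(match)
  # `...` is unused here; it is kept only for a consistent signature.

  ids <- lapply(query, function(q) {
    # Invalid queries give a single NA id.
    if (is.na(q) || !nzchar(q)) return(NA_integer_)
    # Encode only the query (reserved = TRUE turns "#" into "%23"),
    # never the full URL.
    url <- paste0(
      "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/", from, "/",
      utils::URLencode(q, reserved = TRUE), "/cids/JSON"
    )
    if (verbose) message("Querying ", q, ". ", appendLF = FALSE)
    res <- tryCatch(
      httr::GET(url, httr::timeout(10)),
      error = function(e) NULL
    )
    if (is.null(res)) {
      if (verbose) message("Request failed.")
      return(NA_integer_)
    }
    if (verbose) httr::message_for_status(res)
    if (httr::status_code(res) != 200) return(NA_integer_)
    cids <- jsonlite::fromJSON(
      httr::content(res, as = "text", encoding = "UTF-8")
    )$IdentifierList$CID
    if (length(cids) == 0) return(NA_integer_)
    switch(match,
      all = as.integer(cids),
      first = as.integer(cids[1]),
      na = if (length(cids) == 1) as.integer(cids) else NA_integer_
    )
  })

  # A single tibble: IDs in the first column, queries in the last,
  # with each query repeated once per ID it returned.
  tibble::tibble(
    cid = unlist(ids),
    query = rep(query, lengths(ids))
  )
}
```

For example, `get_example_cid("C#C", from = "smiles")` sends the query as `C%23C`. A plain `URLencode("C#C")` would leave the `#` in place, and everything after it would be treated as a URL fragment and never reach the server.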
You might think all web scraping is perfectly legal, but unfortunately it is not that simple.
Some services allow you to browse their website but do not allow programmatic access, for various reasons. Therefore, we always have to check the Terms & Conditions and any other legal documents that might restrict programmatic access. webchem only provides access to databases where programmatic access is clearly approved by the database provider. A provider might create a publicly accessible API; if they do not have restrictive T&C, this indicates their implicit approval for programmatically accessing their data. In all other cases explicit approval is required, i.e. either the T&C have to state that scraping is allowed, or we have to acquire written consent from the database provider before developing functions that scrape their website.
And there is a big difference between scraping and crawling: scraping extracts data from pages that the user specifically requests, while crawling systematically follows links to traverse a site. webchem does provide some scraping functionality, but it does not provide crawling functionality.