Official data sources for the Quality Attributes project, used to train, test, and validate whether Non-Functional Requirements related to Quality Attributes can be found in GitHub issue reports.
On December 14th, 2019, the PROMISE dataset was retrieved from http://ctp.di.fct.unl.pt/RE2017/pages/submission/data_papers/, where it was included as part of a data challenge.
Sayyad Shirabad, J. and Menzies, T.J. (2005) The PROMISE Repository of Software Engineering Databases. School of Information Technology and Engineering, University of Ottawa, Canada. Available: http://promise.site.uottawa.ca/SERepository
The requirement labels in this dataset (which covers 15 different projects) are distributed as follows:
| Class | Quantity | Percentage |
|---|---|---|
| Functional (F) | 255 | 40.80% |
| Availability (A) | 21 | 3.36% |
| Fault Tolerance (FT) | 10 | 1.60% |
| Legal (L) | 13 | 2.08% |
| Look & Feel (LF) | 38 | 6.08% |
| Maintainability (MN) | 17 | 2.72% |
| Operational (O) | 62 | 9.92% |
| Performance (PE) | 54 | 8.64% |
| Portability (PO) | 1 | 0.16% |
| Scalability (SC) | 21 | 3.36% |
| Security (SE) | 66 | 10.56% |
| Usability (US) | 67 | 10.72% |
| Total | 625 | 100% |
For the purposes of this study, only a subset of this dataset was considered, restricted to the classes that belong to the quality attribute categories and chosen to reduce class imbalance:
| Class | Quantity | Percentage |
|---|---|---|
| Availability (A) | 21 | 8.20% |
| Fault Tolerance (FT) | 10 | 3.91% |
| Maintainability (MN) | 17 | 6.64% |
| Performance (PE) | 54 | 21.09% |
| Scalability (SC) | 21 | 8.20% |
| Security (SE) | 66 | 25.78% |
| Usability (US) | 67 | 26.17% |
| Total | 256 | 100% |
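The percentages in both tables follow directly from the class counts. As a minimal sketch (the names `PROMISE_COUNTS`, `SELECTED`, and `distribution` are illustrative and not part of the project's code), the subset and its distribution can be recomputed like this:

```python
# Class counts copied from the full PROMISE table above.
PROMISE_COUNTS = {
    "F": 255, "A": 21, "FT": 10, "L": 13, "LF": 38, "MN": 17,
    "O": 62, "PE": 54, "PO": 1, "SC": 21, "SE": 66, "US": 67,
}

# Quality-attribute classes kept for this study (labels from the subset table).
SELECTED = ["A", "FT", "MN", "PE", "SC", "SE", "US"]


def distribution(counts):
    """Return each class's share of the total, in percent, rounded to 2 decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Keep only the selected classes, then recompute shares over the smaller total.
subset = {label: PROMISE_COUNTS[label] for label in SELECTED}
subset_pct = distribution(subset)

for label in SELECTED:
    print(f"{label}: {subset[label]} ({subset_pct[label]:.2f}%)")
print(f"Total: {sum(subset.values())}")
```

Because each percentage is rounded independently, the rounded values may not add up to exactly 100%.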
Based on the book:
Miller, Roxanne E., 2009, The Quest for Software Requirements, MavenMark Books, Milwaukee, WI
40 different non-functional requirements associated with quality attributes were collected from the following categories (matching the ones included in the training):
- Access Security
- Availability
- Usability
- Maintainability
- Scalability
According to the State of the Octoverse 2019, the open source projects on GitHub with the most contributors were as follows:
| Place | Repository | Contributors |
|---|---|---|
| 01 | microsoft/vscode | 19.1k |
| 02 | MicrosoftDocs/azure-docs | 14k |
| 03 | flutter/flutter | 13k |
| 04 | firstcontributions/first-contributions | 11.6k |
| 05 | tensorflow/tensorflow | 9.9k |
| 06 | facebook/react-native | 9.1k |
| 07 | kubernetes/kubernetes | 6.9k |
| 08 | DefinitelyTyped/DefinitelyTyped | 6.9k |
| 09 | ansible/ansible | 6.8k |
| 10 | home-assistant/home-assistant | 6.3k |
The selected repositories describe different software systems, excluding documentation repositories and projects with the same scope (i.e. flutter and react-native). Data was collected using quality-attributes/issue-collector for the following repositories:
Note: Only the latest 100 issues (as of February 20, 2020) for each repository were collected, due to GitHub's API v4 limitations.
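
For context, GitHub's API v4 is a GraphQL API, and a single connection page returns at most 100 nodes. The sketch below shows what a request for the latest 100 issues of one repository could look like. It is not the actual quality-attributes/issue-collector implementation: the helper name `build_payload`, the selected fields, and the reading of "latest" as "most recently created" are all assumptions made for illustration.

```python
import json

# GraphQL query for GitHub's API v4. `first` accepts values from 1 to 100.
# Assumption: "latest" means most recently created issues.
ISSUES_QUERY = """
query($owner: String!, $name: String!, $count: Int!) {
  repository(owner: $owner, name: $name) {
    issues(first: $count, orderBy: {field: CREATED_AT, direction: DESC}) {
      nodes { number title body createdAt }
    }
  }
}
"""


def build_payload(repository, count=100):
    """Build the JSON request body for one 'owner/name' repository (hypothetical helper)."""
    if not 1 <= count <= 100:
        raise ValueError("GitHub's GraphQL API returns at most 100 nodes per page")
    owner, name = repository.split("/", 1)
    variables = {"owner": owner, "name": name, "count": count}
    return json.dumps({"query": ISSUES_QUERY, "variables": variables})


payload = build_payload("microsoft/vscode")
print(json.loads(payload)["variables"])
```

The resulting body would be sent as an authenticated POST request to `https://api.github.com/graphql`. Collecting more than 100 issues per repository would require following the connection's pagination cursors.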