-
Notifications
You must be signed in to change notification settings - Fork 0
/
footer.html
59 lines (47 loc) · 10.4 KB
/
footer.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
<hr/>
<h2 id="help">Help</h2>
<div>
<p>This application allows to inspect the distributions (probability densities) of a large corpus of metrics. It was conceived to demonstrate the significant differencs in these densities when they become conditional on the context. For example, the distribution of the TLOC metric differs vastly among the contexts "Programming language" and "Middleware".</p>
<p>In order to <i>operationalize</i> a metric in a given context, it is transformed into a <i>score</i> using some ideal value. Some metrics have an implicit ideal value. For example, the McCabe Cyclomatic Complexity has a lowest-possible (and desirable) ideal value of <code>1</code>. This application allows to transform any metric into a distance using a user-chosen explicit ideal value. After transforming a metric into a distance, its distribution reflects the observed distances from the chosen ideal value. When we then proceed to obtaining the <i>complementary cumulative</i> distribution function, the CCDF, those distances can be translated into scores. The application also allows checking metrics values from own applications in specific contexts.</p>
<h3>Elements</h3>
<ul>
<li><b>Controls:</b>
<ul>
<li><b>Metric Selector:</b> Used to select a single metric for operationalization across the entire corpus and all the various domains. Metrics are grouped into discrete (if discrete, the metric is of integral type) and continuous.</li>
<li><b>Distribution Type Selector:</b> This application supports the four principal types <b>PDF</b> (probability density function), <b>CDF</b> (cumulative density function), <b>CCDF [<code>1-CDF</code>]</b> (<i>complementary</i> cumulative distribution function; sometimes also called survival function), and PPF (percent point or quantile function). Furthermore, the application offers different sources for each fitted random variable. These are:
<ul>
<li><b>Parametric:</b> A total of approximately 100 continuous different distributions were attempted to fit to each metric, in each domain. The best fitting distribution was selected using a one-sample Kolmogorov—Smirnow test [4]. We even attempt to fit continuous distributions to discrete data, but not vice versa.</li>
<li><b>Parametric (discrete):</b> For those metrics that are of discrete nature, that is, <i>integral</i>, we also attempt to fit more than approximately 20 discrete distributions and select the best fitting using the Epps—Singleton two-sample goodness-of-fit test [5] (the second sample is generated using the PPF of the fitted distribution).</li>
<li><b>Kernel Density Estimation:</b> Using a Kernel Density Estimation allows us to estimate a probability density for continuous and discrete metrics. In cases of the (C)CDF derived from KDE fit, we sample a large number inversely from the estimated density, fit a Gaussian KDE, and then obtain an ECDF in order for it to become smooth. This ECDF can be reversed (to become an ECCDF) and inversed (to become an EPPF). The smoothed versions are useful for cases when the actual data is scarce and the ECDFs have large jumps.</li>
<li><b>Empirical:</b> While the (C)CDF and EPPF are derived from the data's ECDF, the EPMF is a frequency table over the data. For obtaining exact scores, ECDF and EPMF should be used.</li>
</ul>
</li>
<li><b>Status indicator:</b> Shows "Ready." when the application is ready to be used. Some actions will require a significant amount of processing during which the status will show a throbber and a status text. It may also show errors, if any. For example, selecting a discrete distribution for a continuous metric is an error.</li>
<li><b>Transformation Type Selector:</b> Used to select an explicit value to transform the selected metric into a distance from that value, which is context-specific. While the infimum and supremum were taken directly from the data, the expectation (E[X]), median, and mode for continuous metrics were taken from PDFs that were obtained by fitting a Gaussian Kernel on a large individual inverse sample from the original data. This was done to obtain more real-valued data as transformation values are otherwise very similar and repetitive. For discrete metrics, these transformation values were computed ordinarily.</li>
<li><b>Own Metric Value / Probability Input:</b> A numeric input field that enables the user to check their own metric value or probability against the currently selected distribution- and transformation type. If no transformation is chosen or transformation using the context's ideal value is disabled, the own metric's value is shown as a vertical line on the plot. Note that when checking a probability against a PPF, the checkbox for applying the selected transform is disabled.</li>
<li><b>Apply transform using ideal value:</b> If selected, the own metric's value will be transformed into a distance (has no effect if no transformation was chosen). The transformation is calculated as <code>abs(ideal - [own metric value])</code>. Note that when a transformation is chosen and this box is checked, the vertical line in the plot indicating the own metric's value will disappear, as the distance is dependent on the context and this would otherwise require drawing a line for each transformed value / distance.</li>
<li><b>Contain Plot Button:</b> When this button is clicked, the plot will be panned and zoomed to exactly contain the shown distributions. Note that the plot itself also has a reset button.</li>
<li><b>Cutting off smoothed distributions:</b> This checkbox is only enabled if the chosen distribution is not of type E(C)CDF. Since smoothed distributions were obtained using Kernel density estimation, the bandwidth of Gaussian Kernels will indicate non-zero probability beyond actual observed values. Checking this box will allow to obtain a value from a smoothed distribution while simultaneously ascertaining that it will not be out of range.</li>
</ul>
</li>
<li><b>Data Table:</b> A table that shows context-specific values (according to each domain).
<ul>
<li>Domain: The domain/context.</li>
<li>Used Transformation Value: If a transformation other than <i><none></i> was chosen, this column shows the context-dependent value for the conditional distribution. This value is used to transform the own metric's value into a distance from the transformation value.</li>
<li>Metric Value / Metric Distance / Probability: If an own metric value or probability is checked against the corpus, the value is shown here. If a transformation other than <i><none></i> is chosen and the checkbox for applying transformations is selected, then the own metric's value is transformed into a distance using each domain's ideal value as <code>abs(ideal - value)</code>. The title of this column will change accordingly.</li>
<li>Relative Likelihood / Cumulative Probability / Corresponding Score / Value of Random Variable: Shows the (transformed) own metric's value for each domain or the predicted metric's value for a given probability. For probability densities (PDF), the column name will be "Relative Likelihood". For ECDFs the title will be "Cumulative Probability", for ECCDFs / Scores the title will be "Corresponding Score", and for PPFs the title will be "Value of Random Variable".</li>
<li>Parametric Distribution: Shows the name of the best-fitting parametric distribution if the selected type of distribution is a parametric PDF, CDF, or CCDF. Shows <i><not parametric></i>, otherwise. Note that in most cases, even with a significance level of <code>alpha=0.001</code>, it was not possible to fit a parametric distribution. In those cases, the value of this column shows <i><not possible></i>.</li>
<li>Statistic: This is the statistical test's value. For continuous random variables, we compute the one-sample Kolmogorov—Smirnov goodness-of-fit test [4], and for discrete random variables we use the Epps—Singleton two-sample goodness-of-fit test [5].</li>
</ul>
</li>
<li><b>Plotting Area:</b> The plot is the main interactive user element of the "Metrics As Scores" web application. It is best utilized using a mouse. Panning the plot can be achieved by using the left mouse button, zooming by using the mouse wheel. By default, x- and y-axis are zoomed simultaneously. The plot features additional controls located on top of the right edge for zooming along the x- or y-axis separately. Each domain in the legend on the right can be (de-)selected. By hovering the plot, a crosshair is shown. It allows to obtain values for relative likelihood, cumulative probability, and scores. The labels for both axes will change accordingly, based on the distribution type and whether a transformation is selected.</li>
</ul>
</div>
<hr/>
<h3>References</h3>
<ol>
<li>Hönel, Sebastian, Morgan Ericsson, Welf Löwe, and Anna Wingkvist. "Contextual Operationalization of Metrics As Scores: Is My Metric Value Good?". <i>2022 IEEE 22nd International Conference on Software Quality, Reliability and Security (QRS)</i> 2022. <a target="_blank" href="https://doi.org/10.1109/QRS57517.2022.00042">https://doi.org/10.1109/QRS57517.2022.00042</a>.</li>
<li>Terra, Ricardo, Luis Miranda, Marco Valente, and Roberto Bigonha. "Qualitas.class Corpus." <i>ACM SIGSOFT Software Engineering Notes</i> 38, no. 5 (2013): 1–4. <a target="_blank" href="https://doi.org/10.1145/2507288.2507314">https://doi.org/10.1145/2507288.2507314</a>.</li>
<li>Tempero, Ewan, Craig Anslow, Jens Dietrich, Ted Han, Jing Li, Markus Lumpe, Hayden Melton, and James Noble. "The Qualitas Corpus: A Curated Collection of Java Code for Empirical Studies." <i>2010 Asia Pacific Software Engineering Conference</i>, 2010. <a target="_blank" href="https://doi.org/10.1109/APSEC.2010.46">https://doi.org/10.1109/APSEC.2010.46</a>.</li>
<li>Stephens, M. A. "EDF Statistics for Goodness of Fit and Some Comparisons." <i>Journal of the American Statistical Association</i> 69, no. 347 (1974): 730–37. <a target="_blank" href="https://doi.org/10.1080/01621459.1974.10480196">https://doi.org/10.1080/01621459.1974.10480196</a>.</li>
<li>Epps, T.W., and Kenneth J. Singleton. "An Omnibus Test for the Two-Sample Problem Using the Empirical Characteristic Function." <i>Journal of Statistical Computation and Simulation</i> 26, no. 3-4 (1986): 177—203. <a target="_blank" href="https://doi.org/10.1080/00949658608810963">https://doi.org/10.1080/00949658608810963</a>.</li>