Commit fe80bea

docs: refine presentation of NNDR Ratio (#207)
1 parent 50aef0f commit fe80bea

1 file changed

+18
-13
lines changed

mostlyai/qa/assets/html/report_template.html

Lines changed: 18 additions & 13 deletions
@@ -162,7 +162,7 @@ <h1 id="summary"><span>{{ meta.report_title }}</span>{{ meta.report_subtitle }}<
 <td style="width: 70px;">
 <div class="result-box-title">
 Distances
-<div data-bs-toggle="tooltip" data-bs-title='This metric represents the average distance between synthetic samples and their nearest training samples. For comparison, the average distances between synthetic samples and samples from a holdout dataset is shown in light gray to assess if the trained model learned the general patterns that are common in training as well as in holdout sets.'>
+<div data-bs-toggle="tooltip" data-bs-title='Identical matches is the share of synthetic samples that have at least one exact match within the training dataset. As reference the share of synthetic samples, with an identical match within the holdout is being reported. The average distances is the mean distance between synthetic samples and their nearest training samples. As reference the mean distance between synthetic samples and their nearest holdout samples is provided. The DCR share is the share of synthetic samples that are closer to a training sample than to a holdout sample. With equally-sized holdout and training datasets, the DCR share is ideally close to 50%. The NNDR is the nearest neighbor distance ratio, which is the distance towards the nearest neighbor divided by the distance to the second nearest neighbor. We compute the NNDR for all synthetic samples with respect to the training dataset, as well as with respect to the holdout dataset. The NNDR ratio is then the ratio of the 10-th smallest NNDR for synthetic vs. training, divided by 10-th smallest NNDR for synthetic vs. holdout.'>
 {{html_assets['info.svg']}}
 </div>
 </div>
@@ -435,10 +435,10 @@ <h2 id="distances" class="anchor">Distances</h2>
 <thead>
 <tr>
 <td style="width: 25%"> </td>
-<td style="width: 25%">Synthetic vs. Training Data</td>
+<td style="width: 25%">Synthetic vs. Training</td>
 {% if metrics.distances.ims_holdout is not none %}
-<td style="width: 25%"><small style="color: #666666;">Synthetic vs. Holdout Data</small></td>
-<td style="width: 25%"><small style="color: #999999;">Training vs. Holdout Data</small></td>
+<td style="width: 25%"><small style="color: #666666;">Synthetic vs. Holdout</small></td>
+<td style="width: 25%"><small style="color: #999999;">Training vs. Holdout</small></td>
 {% endif %}
 </tr>
 </thead>
@@ -459,22 +459,26 @@ <h2 id="distances" class="anchor">Distances</h2>
 <td><small style="color: #999999;">{{ "{:.3f}".format(metrics.distances.dcr_trn_hol) if metrics.distances.dcr_trn_hol is not none else "N/A" }}</small></td>
 {% endif %}
 </tr>
-{% if metrics.distances.dcr_share is not none %}
-<tr>
-<td>DCR Share</td>
-<td>{{ "{:.1%}".format(metrics.distances.dcr_share) }}</td>
-<td><small style="color: #666666;">{{ "{:.1%}".format(1 - metrics.distances.dcr_share) }}</small></td>
-<td></td>
-</tr>
-{% endif %}
 <tr>
 <td>NNDR Min10</td>
 <td>{{ "{:.2e}".format(metrics.distances.nndr_training) if metrics.distances.nndr_training < 0.01 else "{:.3f}".format(metrics.distances.nndr_training) }}</td>
 {% if metrics.distances.nndr_holdout is not none %}
 <td><small style="color: #666666;">{{ "{:.2e}".format(metrics.distances.nndr_holdout) if metrics.distances.nndr_holdout < 0.01 else "{:.3f}".format(metrics.distances.nndr_holdout) }}</small></td>
-<td></td>
+<td><small style="color: #999999;">{{ "{:.2e}".format(metrics.distances.nndr_trn_hol) if metrics.distances.nndr_trn_hol < 0.01 else "{:.3f}".format(metrics.distances.nndr_trn_hol) }}</small></td>
 {% endif %}
 </tr>
+{% if metrics.distances.dcr_share is not none %}
+<tr>
+<td>DCR Share</td>
+<td colspan="3" style="padding-left: 20px;"><b>{{ "{:.1%}".format(metrics.distances.dcr_share) }}</b> <small style="color: #999999;">of synthetic samples are closer to a training than to a holdout sample</small></td>
+</tr>
+{% endif %}
+{% if metrics.distances.nndr_holdout is not none %}
+<tr>
+<td>NNDR Ratio</td>
+<td colspan="3" style="padding-left: 20px;"><b>{{ "{:.3f}".format(metrics.distances.nndr_training / metrics.distances.nndr_holdout) }}</b> <small style="color: #999999;"> = (NNDR Min10 of Synthetic vs. Training) / (NNDR Min10 of Synthetic vs. Holdout)</small></td>
+</tr>
+{% endif %}
 </tbody>
 </table>
 </div>
@@ -496,6 +500,7 @@ <h2 id="distances" class="anchor">Distances</h2>
 A green line that is significantly left of the dark gray line implies that synthetic samples are closer to the training samples than to the holdout samples, indicating that the data has overfitted to the training data.
 A green line that overlays with the dark gray line validates that the trained model indeed represents the general rules, that can be found in training just as well as in holdout samples.
 The DCR share indicates the proportion of synthetic samples that are closer to a training sample than to a holdout sample, and ideally, this value should not significantly exceed 50%, as a higher value could indicate overfitting.
+The NNDR ratio is the ratio of the 10-th smallest NNDR for synthetic vs. training, divided by 10-th smallest NNDR for synthetic vs. holdout. Ideally, this value should be close to 1, indicating that the synthetic samples are in sparse as well as in dense regions just as close to the training samples as to the holdout samples.
 </div>
 </div>
 </div>
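
Note on the NNDR ratio added by this commit: the tooltip defines it in words only, so here is a minimal Python sketch of that definition. It is not the library's implementation. The function names (`nndr_values`, `nndr_min_k`, `nndr_ratio`), the brute-force Euclidean distance, the handling of a 0/0 NNDR, and the `None` return for a zero denominator are all assumptions made for illustration. The template itself divides `nndr_training` by `nndr_holdout` directly, so a zero holdout value would need to be handled somewhere upstream.

```python
import math


def nndr_values(queries, references):
    """Per query point: distance to the nearest reference point divided by
    the distance to the second-nearest one (requires >= 2 references)."""
    values = []
    for q in queries:
        dists = sorted(math.dist(q, r) for r in references)
        d1, d2 = dists[0], dists[1]
        # Assumption: when both nearest neighbours coincide with the query
        # (0 / 0), treat the ratio as 1.0 instead of raising an error.
        values.append(d1 / d2 if d2 > 0 else 1.0)
    return values


def nndr_min_k(queries, references, k=10):
    """k-th smallest NNDR ("NNDR Min10" for k=10); k is clamped to the
    number of query points so that small inputs still work."""
    values = sorted(nndr_values(queries, references))
    return values[min(k, len(values)) - 1]


def nndr_ratio(synthetic, training, holdout, k=10):
    """(NNDR Min-k of synthetic vs. training) / (same vs. holdout).
    Returns None when the denominator is zero (an illustrative choice)."""
    trn = nndr_min_k(synthetic, training, k)
    hol = nndr_min_k(synthetic, holdout, k)
    return None if hol == 0 else trn / hol


if __name__ == "__main__":
    syn = [(0.0, 0.0)]
    trn = [(1.0, 0.0), (2.0, 0.0)]   # NNDR = 1 / 2 = 0.5
    hol = [(1.0, 0.0), (4.0, 0.0)]   # NNDR = 1 / 4 = 0.25
    print(nndr_ratio(syn, trn, hol))  # 2.0
```

A value near 1 matches the "ideal" case described in the added help text. The toy input above gives 2.0 because its single synthetic point has a smaller NNDR against the holdout set than against the training set.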
