Skip to content

Conversation

@mzueva
Copy link

@mzueva mzueva commented Dec 4, 2025

This PR adds cluster columns to block UI and exported tsv files

@gemini-code-assist
Copy link

Summary of Changes

Hello @mzueva, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the data browsing and export capabilities by integrating 'cluster columns' throughout the application. It introduces robust mechanisms in the data model to identify and manage these linked columns, updates the user interface to display them, and refines the data export logic to ensure they are correctly processed and uniquely labeled in TSV files. These changes provide users with more comprehensive data views and export options, particularly for complex datasets involving clustered information.

Highlights

  • Cluster Column Integration: Introduced functionality to display and export cluster-related columns in both the UI and TSV files, enhancing data visualization and export capabilities for complex datasets.
  • Model Enhancements: Added new types (LinkedColumnEntry) and helper functions (getLinkedColumns, getLinkedColumnsForArgs) to the data model to identify, retrieve, and manage linked columns, particularly those connected via 'linker columns' (e.g., clonotypeKey to clusterKey).
  • UI Updates: Modified the UI component (PerSamplePage.vue) to dynamically update and synchronize the linkedColumns argument based on the model's output, ensuring cluster columns are correctly reflected in the user interface.
  • Export Logic Refinement: Enhanced the export workflow (export-table.tpl.tengo) with new helper functions (isValidColumn, deriveLabelsForColumns, ensureUniqueLabel, processLinkedColumns) to correctly identify, process, and uniquely label linked columns for TSV export, including robust handling of potential label conflicts.
  • Label Derivation Utility: Introduced a new utility library (workflow/src/libs/utils.lib.tengo) containing deriveLabelsFromTrace and calculateLabels functions to generate unique and meaningful labels for columns based on their trace annotations, crucial for managing duplicate column names in exports.
  • Prerun Configuration: Updated the prerun workflow (prerun.tpl.tengo) to include linker and linked columns in the bundle builder and pass them to the export process, ensuring they are available for downstream operations.
  • Dependency Updates: Updated various package dependencies in pnpm-lock.yaml and pnpm-workspace.yaml to their newer versions, including core libraries like @babel/*, @milaboratories/*, @platforma-sdk/*, and vue.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for cluster columns by finding them through linker columns and displaying them in the UI and including them in exported TSV files. The changes are spread across the model, UI, and workflow. The implementation is quite complex, especially in the Tengo templates for exporting data, where logic for deriving unique column labels has been added. While the feature implementation seems correct, I've identified two areas with significant code duplication and complexity that could be refactored to improve maintainability and reduce the risk of future bugs. My feedback focuses on these refactoring opportunities in model/src/index.ts and workflow/src/export-table.tpl.tengo.

Comment on lines 117 to 281
function getLinkedColumns(
ctx: RenderCtx<BlockArgs, UiState>,
anchor: PlRef,
anchorSpec: PColumnSpec,
): { linkerColumns: PColumn<PColumnDataUniversal>[]; linkedColumns: PColumn<PColumnDataUniversal>[] } | undefined {
const linkerColumns: PColumn<PColumnDataUniversal>[] = [];
const linkedColumns: PColumn<PColumnDataUniversal>[] = [];

// Try both axis positions (clonotypeKey might be at idx 0 or idx 1 in the linker)
let linkerIndex = 0;
for (const idx of [0, 1]) {
let axesToMatch;
if (idx === 0) {
// clonotypeKey in second axis of the linker
axesToMatch = [{}, anchorSpec.axesSpec[1]];
} else {
// clonotypeKey in first axis of the linker
axesToMatch = [anchorSpec.axesSpec[1], {}];
}

// Get linker columns that connect to our anchor's clonotypeKey axis
const linkers = ctx.resultPool.getAnchoredPColumns(
{ main: anchor },
[{
axes: axesToMatch,
annotations: { 'pl7.app/isLinkerColumn': 'true' },
}],
);

if (linkers) {
linkerColumns.push(...linkers);
}

// Get linker refs to use as anchors for finding linked columns
const linkerOptions = ctx.resultPool.getOptions([{
axes: axesToMatch,
annotations: { 'pl7.app/isLinkerColumn': 'true' },
}]);

// For each linker, use it as an anchor to get columns on the other side
for (const linkerOption of linkerOptions) {
const anchorName = `linker-${linkerIndex}`;
const linkerAnchorSpec: Record<string, PlRef> = {
[anchorName]: linkerOption.ref,
};

// Get columns that have the linker's "other" axis (the one that's not clonotypeKey)
const linkedProps = ctx.resultPool.getAnchoredPColumns(
linkerAnchorSpec,
[{
axes: [{ anchor: anchorName, idx: idx }],
}],
);

if (linkedProps) {
linkedColumns.push(
...linkedProps.filter((p) => !isLabelColumn(p.spec)),
);
}

linkerIndex++;
}
}

return { linkerColumns, linkedColumns };
}

/**
* Gets linked columns data structure for workflow arguments.
* Similar to getLinkedColumns but returns the format needed for linkedColumns argument.
*
* @param ctx - The render context
* @param anchor - The anchor PlRef (e.g., input anchor with clonotypeKey axis)
* @param anchorSpec - The specification of the anchor column
* @returns Record of linked column entries keyed by anchor name, or undefined if not available
*/
function getLinkedColumnsForArgs(
ctx: RenderCtx<BlockArgs, UiState>,
anchor: PlRef,
anchorSpec: PColumnSpec,
): Record<string, LinkedColumnEntry> | undefined {
const result: Record<string, LinkedColumnEntry> = {};

// Try both axis positions (clonotypeKey might be at idx 0 or idx 1 in the linker)
let i = 0;
for (const idx of [0, 1]) {
let axesToMatch;
if (idx === 0) {
// clonotypeKey in second axis of the linker
axesToMatch = [{}, anchorSpec.axesSpec[1]];
} else {
// clonotypeKey in first axis of the linker
axesToMatch = [anchorSpec.axesSpec[1], {}];
}

// Get linker refs to use as anchors for finding linked columns
const linkerOptions = ctx.resultPool.getOptions([{
axes: axesToMatch,
annotations: { 'pl7.app/isLinkerColumn': 'true' },
}]);

// For each linker, use it as an anchor to get columns on the other side
for (const linkerOption of linkerOptions) {
const anchorName = `linker-${i}`;
const linkerAnchorSpec: Record<string, PlRef> = {
[anchorName]: linkerOption.ref,
};

// Get columns that have the linker's "other" axis (the one that's not clonotypeKey)
const linkedProps = ctx.resultPool.getAnchoredPColumns(
linkerAnchorSpec,
[{
axes: [{ anchor: anchorName, idx }],
}],
);

if (linkedProps && linkedProps.length > 0) {
// Filter out label columns and serialize queries
const columns = linkedProps
.filter((p) => !isLabelColumn(p.spec))
.map((p) => {
// Create AnchoredPColumnSelector query for this column
// We need to match by the column's spec properties
const query: AnchoredPColumnSelector = {
axes: [{ anchor: anchorName, idx }],
};

// Add domain if present
if (p.spec.domain && Object.keys(p.spec.domain).length > 0) {
// Convert domain to string values only (no anchor refs) for serialization
const domain: Record<string, string> = {};
for (const [key, value] of Object.entries(p.spec.domain)) {
if (typeof value === 'string') {
domain[key] = value;
}
}
if (Object.keys(domain).length > 0) {
query.domain = domain;
}
}

// Add name if it's not a generic column
if (p.spec.name) {
query.name = p.spec.name;
}

// Serialize to JSON string
return JSON.stringify(query);
});

if (columns.length > 0) {
result[anchorName] = {
anchorName,
anchorRef: linkerOption.ref,
columns,
};
}
}

i++;
}
}

return Object.keys(result).length > 0 ? result : undefined;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The functions getLinkedColumns and getLinkedColumnsForArgs share a significant amount of duplicated logic for finding linker columns. Specifically, the loops iterating over idx, calculating axesToMatch, and getting linkerOptions are nearly identical.

This duplication makes the code harder to maintain. If the logic for discovering linkers needs to change, it must be updated in two places, increasing the risk of inconsistencies.

Consider refactoring the common logic into a helper function. This function could iterate through linkers and invoke a callback for each, allowing getLinkedColumns and getLinkedColumnsForArgs to focus on their specific processing tasks.

For example, you could introduce a helper function like this:

function forEachLinker(
  ctx: RenderCtx<BlockArgs, UiState>,
  anchorSpec: PColumnSpec,
  callback: (linkerOption: PColumnOption, idx: 0 | 1, linkerIndex: number) => void
) {
  let linkerIndex = 0;
  for (const idx of [0, 1] as const) {
    const axesToMatch = idx === 0
      ? [{}, anchorSpec.axesSpec[1]]
      : [anchorSpec.axesSpec[1], {}];

    const linkerOptions = ctx.resultPool.getOptions([{
      axes: axesToMatch,
      annotations: { 'pl7.app/isLinkerColumn': 'true' },
    }]);

    for (const linkerOption of linkerOptions) {
      callback(linkerOption, idx, linkerIndex);
      linkerIndex++;
    }
  }
}

This would simplify both getLinkedColumns and getLinkedColumnsForArgs, making the code more modular and easier to understand.

Comment on lines +33 to +334
// Helper function to derive unique labels for columns
// Returns a map from column identifier/index to derived label
deriveLabelsForColumns := func(columnIdToColumnMap) {
// Group columns by base label to identify duplicates
columnsByLabel := {}
for columnId, col in columnIdToColumnMap {
baseLabel := col.spec.annotations["pl7.app/label"]
if is_undefined(columnsByLabel[baseLabel]) {
columnsByLabel[baseLabel] = []
}
columnsByLabel[baseLabel] = append(columnsByLabel[baseLabel], {columnId: columnId, column: col})
}

// Derive unique labels for duplicate columns using trace-based algorithm
derivedLabels := {}
columnsNeedingDerivation := []
columnIdToDerivedIndexMap := {} // Map columnId to index in columnsNeedingDerivation array
derivedIndexOffset := 0 // Track the offset for indices in columnsNeedingDerivation

for baseLabel, colEntries in columnsByLabel {
if len(colEntries) > 1 {
// Multiple columns with same label - use trace-based derivation
colsForDerivation := []
for entry in colEntries {
index := len(colsForDerivation)
colsForDerivation = append(colsForDerivation, entry.column)
columnIdToDerivedIndexMap[entry.columnId] = derivedIndexOffset + index
}
columnsNeedingDerivation += colsForDerivation
derivedIndexOffset += len(colsForDerivation)
} else {
// Single column - use base label
derivedLabels[colEntries[0].columnId] = baseLabel
}
}

// Derive labels for duplicate columns
if len(columnsNeedingDerivation) > 0 {
traceDerivedLabels := utils.deriveLabelsFromTrace(columnsNeedingDerivation)
// Map the derived labels back to column identifiers
for columnId, derivedIndex in columnIdToDerivedIndexMap {
derivedLabel := traceDerivedLabels[derivedIndex]
if !is_undefined(derivedLabel) {
derivedLabels[columnId] = derivedLabel
}
}
}

// Ensure all columns have labels (fallback to base label if missing)
for columnId, col in columnIdToColumnMap {
if is_undefined(derivedLabels[columnId]) {
derivedLabels[columnId] = col.spec.annotations["pl7.app/label"]
}
}

// Final check: ensure all derived labels are globally unique
// This handles cases where columns with different base labels get the same trace-derived label
labelCounts := {}
labelOccurrences := {}
finalDerivedLabels := {}

// Count occurrences of each derived label
for columnId, label in derivedLabels {
if is_undefined(labelCounts[label]) {
labelCounts[label] = 0
}
labelCounts[label] += 1
}

// Add suffixes for duplicates
for columnId, label in derivedLabels {
if labelCounts[label] > 1 {
// This label appears multiple times, add suffix
if is_undefined(labelOccurrences[label]) {
labelOccurrences[label] = 0
}
labelOccurrences[label] += 1
if labelOccurrences[label] > 1 {
// Add numeric suffix starting from (1)
finalDerivedLabels[columnId] = label + " (" + string(labelOccurrences[label] - 1) + ")"
} else {
// First occurrence keeps original label
finalDerivedLabels[columnId] = label
}
} else {
// Unique label, use as-is
finalDerivedLabels[columnId] = label
}
}

return finalDerivedLabels
}

// Helper function to ensure label uniqueness against used labels
// Returns the unique label (with suffix if needed) and updates usedLabelsSet
ensureUniqueLabel := func(label, usedLabelsSet) {
if is_undefined(label) {
return label
}
originalLabel := label
suffixCount := 0
for {
if is_undefined(usedLabelsSet[label]) {
break
}
suffixCount += 1
label = originalLabel + " (" + string(suffixCount) + ")"
}
usedLabelsSet[label] = true // Mark this label as used
return label
}

// Process linked columns and return frames and updated byCloneSpecs
processLinkedColumns := func(columnBundle, datasetSpec, linkedColumns, cloneKeyAxisHeader, bigTableWf, byCloneSpecs, usedLabels) {
linkedFrames := []
updatedByCloneSpecs := byCloneSpecs

// Collect all linked columns from all linkers
// Use column identifier from linkedColumns (the key in data.columns) as unique key
columnIdMap := {} // Map column identifier to column object

for linker, data in linkedColumns {
for columnId, _ in data.columns {
col := columnBundle.getColumn(columnId)
if is_undefined(col) {
continue
}
if !isValidColumn(col) {
continue
}
columnIdMap[columnId] = col
}
}

// Track labels already used by columnsPerClonotype to avoid conflicts
usedLabelsSet := {}
if !is_undefined(usedLabels) {
for label in usedLabels {
usedLabelsSet[label] = true
}
}

// Derive unique labels for all linked columns
derivedLabels := deriveLabelsForColumns(columnIdMap)

// Ensure all derived labels are unique by adding suffixes if needed
// This handles cases where trace-based algorithm produces same label for columns with different base labels
// Also check against labels already used by columnsPerClonotype
labelCounts := {}
labelOccurrences := {}
finalDerivedLabels := {}

// Count occurrences of each derived label (skip undefined labels)
for colName, label in derivedLabels {
if is_undefined(label) {
continue
}
if is_undefined(labelCounts[label]) {
labelCounts[label] = 0
}
labelCounts[label] += 1
}

// Add suffixes for duplicates, also checking against usedLabelsSet
for colName, label in derivedLabels {
if is_undefined(label) {
// Skip undefined labels - they will use base label as fallback
continue
}
needsSuffix := false
if labelCounts[label] > 1 {
needsSuffix = true
} else if !is_undefined(usedLabelsSet[label]) {
// Label conflicts with already used label from columnsPerClonotype
needsSuffix = true
}

if needsSuffix {
// This label appears multiple times or conflicts, add suffix
if is_undefined(labelOccurrences[label]) {
labelOccurrences[label] = 0
}
labelOccurrences[label] += 1
if labelOccurrences[label] > 1 {
// Add numeric suffix starting from (1)
finalDerivedLabels[colName] = label + " (" + string(labelOccurrences[label] - 1) + ")"
} else {
// First occurrence keeps original label if it doesn't conflict
if is_undefined(usedLabelsSet[label]) {
finalDerivedLabels[colName] = label
} else {
// Conflict with used label, add suffix
finalDerivedLabels[colName] = label + " (" + string(labelOccurrences[label]) + ")"
labelOccurrences[label] += 1
}
}
} else {
// Unique label, use as-is
finalDerivedLabels[colName] = label
}
}

derivedLabels = finalDerivedLabels

for linker, data in linkedColumns {
// Collect and filter linked columns for this linker
linkedColsForLinker := []
for name, _ in data.columns {
col := columnBundle.getColumn(name)
if is_undefined(col) {
continue
}
if !isValidColumn(col) {
continue
}
linkedColsForLinker = append(linkedColsForLinker, col)
}

// Sort linked columns by orderPriority (similar to columnsPerSample)
slices.quickSortInPlaceFn(linkedColsForLinker, func(a, b) {
if is_undefined(a.spec.annotations["pl7.app/table/orderPriority"]) {
return false
}
if is_undefined(b.spec.annotations["pl7.app/table/orderPriority"]) {
return true
}
return int(a.spec.annotations["pl7.app/table/orderPriority"]) > int(b.spec.annotations["pl7.app/table/orderPriority"])
})

// Skip if no valid linked columns after filtering
if len(linkedColsForLinker) == 0 {
continue
}

linkedTsvBuilder := pframes.tsvFileBuilder().cpu(1).mem("4GiB")
linkedTsvBuilder.setAxisHeader(datasetSpec.axesSpec[1], cloneKeyAxisHeader)

// Define cluster axis name for this linker
clusterAxisName := "cluster_" + linker

// Get the linker column using its anchorRef
if !is_undefined(data.anchorRef) {
linkerCol := columnBundle.getColumn(data.anchorRef)
if !is_undefined(linkerCol) {
// Add linker column (like in antibody-tcr-lead-selection lines 213-226)
if datasetSpec.axesSpec[1].name == linkerCol.spec.axesSpec[1].name {
linkedTsvBuilder.add(linkerCol, {header: "linker." + linker})
linkedTsvBuilder.setAxisHeader(linkerCol.spec.axesSpec[0], clusterAxisName)
} else if datasetSpec.axesSpec[1].name == linkerCol.spec.axesSpec[0].name {
linkedTsvBuilder.add(linkerCol, {header: "linker." + linker})
linkedTsvBuilder.setAxisHeader(linkerCol.spec.axesSpec[1], clusterAxisName)
}
}
}

// Add filtered and sorted linked columns with unique headers
for name, _ in data.columns {
col := columnBundle.getColumn(name)
if is_undefined(col) {
continue
}
if !isValidColumn(col) {
continue
}

// Get derived label (from trace if duplicate, otherwise base label)
// Use column identifier (name from data.columns) as key
uniqueHeader := derivedLabels[name]

// If no derived label (shouldn't happen), fallback to base label
if is_undefined(uniqueHeader) {
uniqueHeader = col.spec.annotations["pl7.app/label"]
}

// Final check for uniqueness against all used labels (including columnsPerClonotype)
// Only add suffix if there's an actual conflict
uniqueHeader = ensureUniqueLabel(uniqueHeader, usedLabelsSet)

// Set axis headers for any axes that are not the main clonotypeKey axis
for axisIdx, axis in col.spec.axesSpec {
if axis.name != datasetSpec.axesSpec[1].name {
axisHeaderName := clusterAxisName + "_" + string(axisIdx)
linkedTsvBuilder.setAxisHeader(axis, axisHeaderName)
}
}

linkedTsvBuilder.add(col, {header: uniqueHeader})
// Update the spec's label to match the header we used
updatedSpec := maps.clone(col.spec)
updatedSpec.annotations["pl7.app/label"] = uniqueHeader
updatedByCloneSpecs += [updatedSpec]
}

linkedTsv := linkedTsvBuilder.build()
linkedFrames = append(linkedFrames, bigTableWf.frame(linkedTsv, { xsvType: "tsv", inferSchema: false }))
}

return {
frames: linkedFrames,
byCloneSpecs: updatedByCloneSpecs
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is significant code duplication and complexity in the logic for ensuring unique column labels, spread across deriveLabelsForColumns and processLinkedColumns.

  • The deriveLabelsForColumns function includes a "Final check" for global uniqueness, which is then partially repeated inside processLinkedColumns.
  • processLinkedColumns performs another uniqueness check that also considers labels from columnsPerClonotype (usedLabelsSet).

This makes the logic hard to follow and maintain. It's also prone to errors, as changes might not be applied consistently in all places.

I suggest refactoring this into a more modular approach:

  1. Simplify deriveLabelsForColumns to only derive labels based on trace data, without handling global uniqueness.
  2. Create a single, reusable helper function, say ensureGlobalUniqueness(derivedLabels, usedLabelsSet), that takes the derived labels and a set of already used labels, and returns a map of columnId -> uniqueLabel. This function would contain all the logic for counting occurrences and adding suffixes.

This would centralize the uniqueness logic, reduce redundancy, and make the overall workflow easier to understand and maintain.

Comment on lines +26 to +41
// @ts-expect-error - linkedColumns will be available after model types are regenerated
app.model.args.linkedColumns = (app.model.outputs as Record<string, unknown>).linkedColumns ?? {};
} else {
app.model.args.datasetTitle = undefined;
app.model.args.linkedColumns = {};
}
}
// Keep linkedColumns in sync with output when it changes
watch(
() => (app.model.outputs as Record<string, unknown>).linkedColumns,
(linkedColumns) => {
if (app.model.args.inputAnchor) {
// @ts-expect-error - linkedColumns will be available after model types are regenerated
app.model.args.linkedColumns = linkedColumns ?? {};
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of @ts-expect-error and casting to Record<string, unknown> indicates that the model types are out of sync with the implementation. While the comments explain the situation, this creates technical debt and reduces type safety.

It would be best to regenerate the model types so that linkedColumns is properly typed on app.model.outputs and app.model.args. This would remove the need for type casting and error suppression, making the code safer and easier to maintain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants