Skip to content

Conversation

ncarbon
Copy link
Collaborator

@ncarbon ncarbon commented Sep 8, 2025

Description

This pull request introduces the main UI and logic for the "Faker Schema Editor" step in the Mock Data Generator modal. It implements a new editor interface for mapping collection schema fields to faker.js functions, including validation of the generated mappings, and updates the modal's flow to handle loading and confirmation states.

schema_editor.mov
Screenshot 2025-09-08 at 11 54 21 PM Screenshot 2025-09-09 at 12 07 55 AM

Checklist

  • New tests and/or benchmarks are included
  • Documentation is changed or added
  • If this change updates the UI, screenshots/videos are added and a design review is requested
  • I have signed the MongoDB Contributor License Agreement (https://www.mongodb.com/legal/contributor-agreement)

Motivation and Context

  • Bugfix
  • New feature
  • Dependency update
  • Misc

Open Questions

Dependents

Types of changes

  • Backport Needed
  • Patch (non-breaking change which fixes an issue)
  • Minor (non-breaking change which adds functionality)
  • Major (fix or feature that would cause existing functionality to change)

@ncarbon ncarbon changed the title CLOUDP-333854: WIP feat(compass-collection): add Schema Editor UI - Mock Data Generator CLOUDP-333854 Sep 8, 2025
@github-actions github-actions bot added the feat label Sep 8, 2025
@ncarbon ncarbon added the no release notes Fix or feature not for release notes label Sep 9, 2025
@ncarbon ncarbon marked this pull request as ready for review September 9, 2025 05:10
@Copilot Copilot AI review requested due to automatic review settings September 9, 2025 05:10
@ncarbon ncarbon requested a review from a team as a code owner September 9, 2025 05:10
Copilot

This comment was marked as outdated.

@gribnoysup
Copy link
Collaborator

gribnoysup commented Sep 9, 2025

@ncarbon I just checked out the latest version of your branch, running it locally, and nothing shows up after schema analysis

Kapture 2025-09-09 at 18 13 54

Does the latest changes in this branch work for you fine? Is there any extra setup I need to do to make it work?

Comment on lines 625 to 628
const fakerMethodWithArgs = eval(
`(faker, ...fakerArgs) => faker["${first}"]["${second}"](...fakerArgs)`
);
fakerMethodWithArgs(faker, ...fakerArgs);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is passing unsanitized generated output from LLM to the eval, this is a pretty big security risk, you shouldn't do it and you don't need to eval if you're just trying to check if the keys exist on the object, you can do this via dynamic property accessors: faker[first][second]

Copy link
Collaborator

@jcobis jcobis Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. We were considering using eval to ensure that the faker args are correctly formed. I think we can take the Copilot suggestion below and first verify that fakerMethod is a valid faker function, then try to execute that method with the args (without using eval)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling the function also doesn't sound safe as this is also generated output / unvalidated user input we're passing to the call methods here, we shouldn't do that. There's no guarantees that this will not run something like faker.string.alpha(1000000000) and lock the main thread completely / crash the browser with OOM

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would adding a timeout alleviate this issue?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or do you think it's best to just rely on the function name validation and trust the args are valid

Copy link
Collaborator Author

@ncarbon ncarbon Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, I think it's best to validate the faker functions by checking that the function name exists in the faker object, and do without the args (let faker use default values).

There might be ways to "sandbox" the function execution, but doesn't look straightforward or worth it if we can just check the function IS a faker function.

Copy link
Collaborator

@gribnoysup gribnoysup Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcobis timeout doesn't change anything, it would just delay the execution in the same thread, all faker functions are sync as far as I can see and can easily lead to issues when misused (like most of these that accept length values for generated output can be easily made to generate too much and exhaust the memory).

@ncarbon exactly, there are ways you can try to do this in a way that doesn't affect the thread you are on, "sandbox" it, but all of them will be adding a significant chunk of complexity that I don't think this validation is worth. If y'all think that arguments can be omitted, that's probably the easiest way to move forward here. You can probably just not validate them and lease as-is, but this has a risk of suggesting destructive scripts to the users. You can also consider adding a full proper validation to the options that actually inspects the values, but I'm not very familiar with faker so hard for me to estimate the effort here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear also, these arguments will not be subject to user input, it will only be output from the LLM

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a thread here. To summarize: We can use .apply, and testing shows faker.js has built-in argument validation that prevents breaking operations

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rough demo draft PR I mocked up with AI, branching off Nataly’s, to demonstrate

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces the main UI and logic for the Faker Schema Editor step in the Mock Data Generator modal. It implements a new editor interface that allows users to review and modify mappings between collection schema fields and faker.js functions, including validation of faker methods and updates to handle loading and confirmation states.

  • Adds faker schema validation that verifies faker.js methods are valid and accessible
  • Creates new UI components for field selection and faker function mapping
  • Updates the modal flow to handle loading states and requires user confirmation before proceeding

Reviewed Changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
packages/compass-web/webpack.config.js Adds @faker-js/faker as a bundled dependency
packages/compass-collection/src/modules/collection-tab.ts Adds faker schema validation logic and updates type definitions
packages/compass-collection/src/components/mock-data-generator-modal/types.ts Defines FakerSchemaMapping type and updates state types
packages/compass-collection/src/components/mock-data-generator-modal/schema-field-selector.tsx New component for selecting schema fields in the editor
packages/compass-collection/src/components/mock-data-generator-modal/mock-data-generator-modal.tsx Updates modal to handle loading states and schema confirmation
packages/compass-collection/src/components/mock-data-generator-modal/mock-data-generator-modal.spec.tsx Adds tests for new functionality
packages/compass-collection/src/components/mock-data-generator-modal/faker-schema-editor.tsx Main editor component for reviewing faker mappings
packages/compass-collection/src/components/mock-data-generator-modal/faker-mapping-selector.tsx Component for displaying and editing faker function mappings
packages/compass-collection/package.json Adds @faker-js/faker dependency

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines 625 to 643
try {
// Try with arguments first
const method = (faker as any)?.[first]?.[second];
if (typeof method !== 'function') {
throw new Error(`Faker method ${fakerMethod} is not a function`);
}
method(...fakerArgs);
return field;
} catch (error) {
// If that fails and there are arguments, try without arguments
if (fakerArgs.length > 0) {
try {
const method = (faker as any)?.[first]?.[second];
if (typeof method !== 'function') {
throw new Error(`Faker method ${fakerMethod} is not a function`);
}
method();
return field;
} catch (error) {
Copy link
Preview

Copilot AI Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method lookup logic (faker as any)?.[first]?.[second] is duplicated. Extract this into a helper function to reduce code duplication and improve maintainability.

Copilot uses AI. Check for mistakes.

case MockDataGeneratorStep.DOCUMENT_COUNT:
return <></>; // TODO: CLOUDP-333856
case MockDataGeneratorStep.PREVIEW_DATA:
return <></>; // TODO: CLOUDP-333857
case MockDataGeneratorStep.GENERATE_DATA:
return <ScriptScreen />;
}
}, [currentStep]);
}, [currentStep, fakerSchemaGenerationState]);
Copy link
Preview

Copilot AI Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The useMemo dependency array includes the entire fakerSchemaGenerationState object. Consider extracting only the specific properties needed (like fakerSchemaGenerationState.status and fakerSchemaGenerationState.fakerSchema) to prevent unnecessary re-renders when other properties change.

Suggested change
}, [currentStep, fakerSchemaGenerationState]);
}, [
currentStep,
fakerSchemaGenerationState.status,
fakerSchemaGenerationState.fakerSchema,
]);

Copilot uses AI. Check for mistakes.

Comment on lines 43 to 44
/* eslint-disable @typescript-eslint/no-require-imports */
const faker = require('@faker-js/faker/locale/en');
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The root cause of the error you are seeing is that tsc for this package is targeting commonjs, on practice this doesn't really matter, so you can continue using import instead of switching it to require and add a expect-error notation instead of disabling an eslint rule

Suggested change
/* eslint-disable @typescript-eslint/no-require-imports */
const faker = require('@faker-js/faker/locale/en');
// @ts-expect-error TypeScript warns us about importing esm when output is cjs,
// but in reality this code will be just consumed by webpack, so it doesn't
// really matter
import { faker } from '@faker-js/faker/locale/en';

The benefits of that are that you still have proper types for the package (using require just completely bails out of the module resolution and so you just get any instead) and when compass repo fully supports esm and we are not targeting commonjs anymore, compiler will highlight this code (as it won't be an error any more) and we can clean it up.

Also generally speaking, I would strongly suggest to document why rules are being broken when they are broken 🙂 It is somewhat unavoidable sometimes to do so, indeed, but leaving the context is a good practice for a maintainable codebase

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was helpful. Thank you!

return (
<div className={fieldMappingSelectorsStyles}>
<Body className={labelStyles}>Mapping</Body>
<Select
Copy link
Collaborator

@gribnoysup gribnoysup Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These select boxes allow to deselect the value, and as soon as I deselect it, the whole input control disappears and I can't select anything anymore, that's probably not an expected behavior?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

value={activeJsonType}
onChange={onJsonTypeSelect}
>
{[activeJsonType].map((type) => (
Copy link
Collaborator

@gribnoysup gribnoysup Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit hard to understand the scope of work from the ticket. Is that correct that there's actually no choice here? Is this work planned for later? If there's a follow-up work planned, we usually use // TODO(ticket number): description comments in the code to annotate that, it's low effort and should make reviewing for us easier

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a //TODO comment with a description. We're planning to show dropdown options in a follow up ticket - CLOUDP-344400. They will be derived from a hard-coded map of JSON types to faker methods.

case MockDataGeneratorStep.DOCUMENT_COUNT:
return <></>; // TODO: CLOUDP-333856
case MockDataGeneratorStep.PREVIEW_DATA:
return <></>; // TODO: CLOUDP-333857
case MockDataGeneratorStep.GENERATE_DATA:
return <ScriptScreen />;
}
}, [currentStep]);
}, [currentStep, fakerSchemaGenerationState]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a dependency on isSchemaConfirmed

Suggested change
}, [currentStep, fakerSchemaGenerationState]);
}, [currentStep, fakerSchemaGenerationState, isSchemaConfirmed]);

I'm seeing these warnings in VSCode coming from ESLint:

image

just wondering are they not showing up for you?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I didn't have ESLint set up properly. Addressed.

Comment on lines 126 to 129
<Button
size={ButtonSize.Small}
className={confirmMappingsButtonStyles}
variant={ButtonVariant.Primary}
onClick={onConfirmMappings}
>
Confirm mappings
</Button>
Copy link
Collaborator

@gribnoysup gribnoysup Sep 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the faker mappings or navigating to the previous step and then going back to the form doesn't change the confirmation status, it's a bit unexpected to me: if this is confirming currently selected values, then surely changing them should reset this 🙂 So just wanted to check: is this an expected behavior for this confirm button?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I addressed and changed the confirmation state logic to the following:

  • Reset the state when navigating to the previous state.
  • Reset the state when updating any of the inputs.
  • Preserve the state when navigating to the next state.

Comment on lines 699 to 700
const [moduleName, methodName] = fakerMethod.split('.');
if (typeof (faker as any)[moduleName]?.[methodName] !== 'function') {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: probably better to make sure that llm output didn't include more parts than needed just in case

Suggested change
const [moduleName, methodName] = fakerMethod.split('.');
if (typeof (faker as any)[moduleName]?.[methodName] !== 'function') {
const [moduleName, methodName, ...rest] = fakerMethod.split('.');
if (rest.length === 0 && typeof (faker as any)[moduleName]?.[methodName] !== 'function') {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, we can include in Nataly's followup validation ticket

@@ -19,7 +19,7 @@ type MockDataGeneratorInProgressState = {

type MockDataGeneratorCompletedState = {
status: 'completed';
fakerSchema: MockDataSchemaResponse;
fakerSchema: Array<FakerSchemaMapping>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very small nit: we prefer compact array syntax in compass codebase

Suggested change
fakerSchema: Array<FakerSchemaMapping>;
fakerSchema: FakerSchemaMapping[];

Copy link
Collaborator

@jcobis jcobis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work Nataly!

@ncarbon ncarbon merged commit 8f62a7e into main Sep 15, 2025
57 of 58 checks passed
@ncarbon ncarbon deleted the CLOUDP-333854 branch September 15, 2025 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat no release notes Fix or feature not for release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants