Skip to content

Improve root README and 2d classification documentation #2005

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

mingxin-zheng
Copy link
Contributor

@mingxin-zheng mingxin-zheng commented Jun 26, 2025

Fixes #2001 .

Description

I applied more "aggressive" prompts to request Cursor to improve the markdown porition of some Jupyter notebooks and the README.

Please review these jupyter notebook markdown blocks, code comments and markdown tutorials carefully. Improve the notebook documentation on these four aspects: 
- Typo Fixes 
- Language Refinement 
- Consistency Review 
- Beginner-Friendliness

Checks

  • Avoid including large-size files in the PR.
  • Clean up long text outputs from code cells in the notebook.
  • For security purposes, please check the contents and remove any sensitive info such as user names and private key.
  • Ensure (1) hyperlinks and markdown anchors are working (2) use relative paths for tutorial repo files (3) put figure and graphs in the ./figure folder
  • Notebook runs automatically ./runner.sh -t <path to .ipynb file>

Signed-off-by: Mingxin Zheng <mingxinz@nvidia.com>
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@mingxin-zheng mingxin-zheng marked this pull request as draft June 26, 2025 07:43
@mingxin-zheng mingxin-zheng marked this pull request as ready for review June 27, 2025 08:05
@mingxin-zheng mingxin-zheng requested review from KumoLiu and Copilot June 27, 2025 08:07
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances the documentation and markdown content in the MONAI Tutorials repository to improve clarity, structure, and beginner-friendliness. It updates the README with a more detailed Quick Start Guide and reorganizes the tutorial sections, while also refining explanations and headings in the 2D classification notebooks.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
README.md Improved documentation structure, refined instructions, and added sections for beginner and advanced users.
2d_classification/monai_201.ipynb Revised headings and explanatory text to better describe advanced training techniques.
2d_classification/monai_101.ipynb Updated introduction and detailed breakdown of tutorial steps and requirements.
2d_classification/mednist_tutorial.ipynb Enhanced overview and dataset explanation with additional context and clarifications.

@mingxin-zheng mingxin-zheng requested a review from ericspod June 27, 2025 08:07
@mingxin-zheng
Copy link
Contributor Author

This PR introduces some more experimental use of a coding agent tool to improve the documentation quality. The changes suggested by the LLM model look good to me so far. I’d like to open this up for discussion — are we comfortable merging these kinds of changes and exploring this direction further, or would we prefer to hold off for now?

Looking forward to your thoughts! @ericspod @KumoLiu

Copy link
Member

@ericspod ericspod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's overall good additions to understanding in the notebook. I have questions about the verbosity and voice but otherwise it does improve things with more explanation of what's going on in notebooks.

@@ -17,14 +17,27 @@
"\n",
"# Medical Image Classification Tutorial with the MedNIST Dataset\n",
"\n",
"In this tutorial, we introduce an end-to-end training and evaluation example based on the MedNIST dataset.\n",
"This comprehensive tutorial demonstrates how to build a complete medical image classification system using MONAI and the MedNIST dataset. You'll learn to integrate MONAI's powerful features into PyTorch workflows for medical AI applications.\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"This comprehensive tutorial demonstrates how to build a complete medical image classification system using MONAI and the MedNIST dataset. You'll learn to integrate MONAI's powerful features into PyTorch workflows for medical AI applications.\n",
"This tutorial demonstrates how to build a complete medical image classification system using MONAI and the MedNIST dataset.\n",

I feel that generated text like this often gets too wordy or overclaims on occasion. I think if we have a workflow of seeing what's generated and then paring it down a bit would work though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point — I agree that a bit of guardrailing would help. We could add additional prompts to steer the generation style more tightly. For example, we could use a system prompt for OpenAI API endpoint usage, or define stricter rules for coding agents (like these cursor rules).

"## Read image filenames from the dataset folders\n",
"## Explore the Dataset Structure\n",
"\n",
"Let's examine our MedNIST dataset to understand its organization and characteristics. This exploration step is crucial for understanding the data before training.\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a stylistic choice of voice when describing what's being done. One way is to be neutral and not referring to personal perspectives, eg. no "us" or "you" when describing actions or observations. This could read instead "Here the dataset is explored...." to not have any 2nd or 3rd person voices used. It's a question of what we want to do and prompting the network to adhere to that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this particular point, we could add some guidance to CONTRIBUTING.md and use an agent to review PRs for stylistic consistency. We can also instruct the coding agent to follow these contributing guidelines directly. Either approach — or both — should help keep the voice aligned.

mingxin-zheng and others added 5 commits July 4, 2025 13:30
Co-authored-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Signed-off-by: Mingxin Zheng <mingxinz@nvidia.com>
Co-authored-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Signed-off-by: Mingxin Zheng <mingxinz@nvidia.com>
Co-authored-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Signed-off-by: Mingxin Zheng <mingxinz@nvidia.com>
Co-authored-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Signed-off-by: Mingxin Zheng <mingxinz@nvidia.com>
Co-authored-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Signed-off-by: Mingxin Zheng <mingxinz@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use of LLM-Driven Tools to Fix Documentation Errors
2 participants