Skip to content

Conversation

@NiedielnitsevIvan
Copy link
Contributor

@NiedielnitsevIvan NiedielnitsevIvan commented Apr 17, 2025

Description

This PR is addressed at adding the functionality to import modulestore-based content to the learning-core based learning package.

Testing instructions

So far, these are only python interfaces for creating content imports, so it can only be tested in the cms shell.

  1. Create a course in the CMS and fill it with content.
  2. Create a Content Library.
  3. Go to the CMS shell.
  4. Import the main APIs (create_import and import_course_staged_content_to_library)
  5. Pass your course ID and user ID to create_import.
  6. Check that a new entry has been created in the Import table in the admin panel (admin/import_from_modulestore/import/)
  7. Check that a new entry has been created in the Staged Content table in the admin panel (/admin/content_staging/stagedcontent/), the number of created entries should be equal to the number of sections in the course.
  8. Pass all the necessary data to import_course_staged_content_to_library.
  • The list of Usage Keys to be imported can be obtained from the Staged Content records. IMPORTANT: currently, import_course_staged_content_to_library accepts ONLY Usage Keys of the chapter level.
  1. After completing the code, check that all the blocks from the specified sections have been created in the Content Library.

Deadline

ASAP

@openedx-webhooks
Copy link

openedx-webhooks commented Apr 17, 2025

Thanks for the pull request, @NiedielnitsevIvan!

This repository is currently maintained by @openedx/wg-maintenance-edx-platform.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.


Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Apr 17, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in Contributions Apr 17, 2025
@NiedielnitsevIvan NiedielnitsevIvan changed the title feat: Import from modulestore APIs Draft: feat: Import from modulestore APIs Apr 17, 2025
@NiedielnitsevIvan NiedielnitsevIvan marked this pull request as draft April 17, 2025 11:59
@NiedielnitsevIvan NiedielnitsevIvan force-pushed the NiedielnitsevIvan/feat/import-from-modulestore-apis branch 3 times, most recently from d06b3bc to fbf93c8 Compare April 18, 2025 16:10
@NiedielnitsevIvan NiedielnitsevIvan marked this pull request as ready for review April 18, 2025 16:11
@NiedielnitsevIvan NiedielnitsevIvan changed the title Draft: feat: Import from modulestore APIs feat: Import from modulestore APIs Apr 18, 2025
@NiedielnitsevIvan NiedielnitsevIvan force-pushed the NiedielnitsevIvan/feat/import-from-modulestore-apis branch from fbf93c8 to 2f0b019 Compare April 21, 2025 08:19
@NiedielnitsevIvan NiedielnitsevIvan force-pushed the NiedielnitsevIvan/feat/import-from-modulestore-apis branch from 2f0b019 to 4113467 Compare April 21, 2025 19:09
Copy link
Member

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reviewing, but here are some comments in the meantime

VERTICAL = 'vertical'
XBLOCK = 'xblock'
COMPLICATED_LEVELS = [CHAPTER, SEQUENTIAL, VERTICAL]
FLAT_LEVELS = [XBLOCK]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
FLAT_LEVELS = [XBLOCK]
COMPONENT = 'component'
UNIT = ContainerType.Unit.value
# Subsection- and Section-level composition to be added in the future as part of
# https://github.com/openedx/frontend-app-authoring/issues/1602
# SUBSECTION = ContainerType.Subsection.value
# SECTION = ContainerType.Section.value
COMPLEX_LEVELS = [
UNIT,
# SUBSECTION,
# SECTION,
]

A few suggestions here:

  • The container_types in learning core use the product names (component, unit, subsection, section) rather than the xblock tag names (..., vertical, sequential, chapter).
  • These levels are already defined in the ContainerType enum (just Unit, for now), so let's reference those names in order to keep them consistent.
  • Let's reverse the order so that, when compared using >, SECTION > SUBSECTION > UNIT > COMPONENT
  • Complex is often used to describe structures, whereas complicated usually implies "confusing".
  • Rather than include the levels that aren't supported yet, we can put them behind comments until they're ready to be used.
  • It is safe to assume that COMPONENT will always be the one and only flat level, so no need to define a FLAT_LEVELS list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I fixed the name COMPLICATED_LEVELS and removed the FLAT_LEVELS list.

But I can't agree with you about the other part, because these legacy names are tied to the functional part. They are necessary to check that the block we are importing is really vertical/sequential, etc. because this is the information stored in Staged Content OLX.
Otherwise, we need to do vertical -> unit, sequential -> subsection mapping to preserve functionality.

Also, I think that at the CompositionLevel level, we should not comment out CHAPTER and SEQUENTIAL, because it will require a lot of refactoring as a temporary thing. It seems to me more logical to comment out the corresponding mapping in ImportClient until Sections and Subsections are merged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I can't agree with you about the other part, because these legacy names are tied to the functional part. They are necessary to check that the block we are importing is really vertical/sequential, etc. because this is the information stored in Staged Content OLX.

OK, that's a good point. However, I think the API should use the new names rather than the legacy names, so we need some way of mapping between the old and new names. I have an idea that I've proposed to Dave and Braden in #content-libraries-relaunch-dev.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, we have added logic to openedx.core.content_libraries.api.containers.ContainerType: #36580

Please have the API use container_types (unit, subsection, section) and use the ContainerType interface to translate to/from the old OLX block types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for your help with this!
Refactored to ContainerType usage.

Copy link
Member

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(still reviewing)

try:
import_event = Import.objects.get(uuid=import_uuid, status=ImportStatus.STAGED, user_id=user_id)
except Import.DoesNotExist:
log.info('Ready import from modulestore not found')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling here is just hiding information from the developer/operator. If the import is missing, then the developer/operator will want to know that, and they will want to know what the uuid of the missing import is. The exception carries all of that information, but the log statement does not.

In the future we could do some more thoughtful error handling here, but we don't really have time, so please just remove the try-excepts. If the Import does not exist, then an Import.DoesNotExist exception will be raised, and that is a perfectly reasonable outcome for now.

Practically speaking, the REST API and Django Admin will each want to handle the not-found scenarios differently. The REST API probably should raise a 404/400, whereas the Django Admin should display an error banner. By allowing DoesNotExist to be raised, we make it very easy for the REST API and Django Admin to handle these situations using except Import.DoesNotExist--or, if we want to handle all not-found scenarios together, using except ObjectDoesNotExist. On the other hand, by hiding this exception and logging it, we make it impossible for the REST API or Django Admin to display a useful error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, in the future I'll avoid the error handling where it's not needed. Thank you!

Comment on lines 80 to 84
try:
target_learning_package = LearningPackage.objects.get(id=learning_package_id)
except Import.LearningPackage:
log.info('Target learning package not found')
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here-- remove the try-except wrapper, and let the exception escape.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Comment on lines 100 to 102
except Exception as exc: # pylint: disable=broad-except
import_event.set_status(ImportStatus.IMPORTING_FAILED)
raise exc from exc # TODO: retest raise and describe change_log commitment on Exception
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
except Exception as exc: # pylint: disable=broad-except
import_event.set_status(ImportStatus.IMPORTING_FAILED)
raise exc from exc # TODO: retest raise and describe change_log commitment on Exception
except: # pylint: disable=broad-except
import_event.set_status(ImportStatus.IMPORTING_FAILED)
raise
  • You can use bare raise to re-raise the same exception that you caught
  • raise exc from exc does not make sense-- that would be saying that "exc caused itself"
  • Can you explain TODO: retest raise and describe change_log commitment on Exception? The bulk_draft_changes_for block will ensure that all DB changes are rolled back when an exception escapes. Is that what you wanted to make sure of?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored.

Is that what you wanted to make sure of?

Yes, sorry, I didn't realize I hadn't removed this comment.
It was a comment for myself to explore the behavior of the new bulk_draft_changes_for context manager when Exception raises.

@shared_task
@set_code_owner_attribute
def import_staged_content_to_library_task(
usage_keys_string: list[str],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
usage_keys_string: list[str],
usage_key_strings: list[str],

this is a list of strings, each of which is a usage key

usage_keys_string implies a single string made of many usage keys

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

parser = etree.XMLParser(strip_cdata=False)


class ImportClient:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're close to the deadline I will not ask you to make this change, but as feedback for future work: When a class's sole purpose is to have a single method executed, for example:

                    import_client = ImportClient(
                        import_event,
                        usage_key_string,
                        target_learning_package,
                        staged_content_item,
                        composition_level,
                        override,
                    )
                    import_client.import_from_staged_content()

then the class does not need to exist; it could instead be a function:

                import_from_staged_content(
                    import_event,
                    usage_key_string,
                    target_learning_package,
                    staged_content_item,
                    composition_level,
                    override,
                )

and the class's private methods (_process_import, etc) could instead be regular private helper functions.

The function is easier for other developers to understand because it has simpler control flow and it does not contain internal state. I know that in some languages like Java, the object-oriented style would be preferred, but in Python (especially in Open edX) the procederal style is preferred.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, I moved it to a separate function.
Thank you for the detailed explanation.

with authoring_api.bulk_draft_changes_for(learning_package_id=learning_package_id) as change_log:
try:
for usage_key_string in usage_keys_string:
if staged_content_item := import_event.get_staged_content_by_source_usage_key(usage_key_string):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this is swallowing an exception, which is not a good thing. If the staged content is missing, then it is helpful for us to let StagedContentForImport.DoesNotExist be raised.

Suggested change
if staged_content_item := import_event.get_staged_content_by_source_usage_key(usage_key_string):
staged_content_item = import_event.staged_content_for_import.get(source_usage_key=usage_key_string):

This also means that we can delete def get_staged_content_by_source_usage_key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_staged_content_by_source_usage_key removed

@@ -0,0 +1,43 @@
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding so many type annotations! Please add cms/djangoapps/import_from_modulestore to mypy.ini so that they will be checked when mypy is executed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, and type annotations have been improved a bit more.

Copy link
Member

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments. Still looking through helpers.py, but looking good otherwise.

Comment on lines 393 to 417
def get_block_to_import(node, usage_key):
"""
Get the block to import from a node.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make it more obvious what this returns. Before I read the definition, I thought it returned an XBlock.

Suggested change
def get_block_to_import(node, usage_key):
"""
Get the block to import from a node.
"""
def get_node_for_usage_key(node: etree._Element, usage_key: UsageKey) -> etree._Element:
"""
Get the node in an XML tree which matches to the usage key.
"""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Get the block to import from a node.
"""

if node.get('url_name') == usage_key.block_id:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if node.get('url_name') == usage_key.block_id:
if node.tag == usage_key.block_type and node.get('url_name') == usage_key.block_id:

A block_id (a.k.a. url_name) exists within the namespace of a block_type. For example, the same course could have type@html+block@123 and type@problem+block@123. So, to properly identify a block in a course, check both the type and the ID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks.

VERTICAL = 'vertical'
XBLOCK = 'xblock'
COMPLICATED_LEVELS = [CHAPTER, SEQUENTIAL, VERTICAL]
FLAT_LEVELS = [XBLOCK]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I can't agree with you about the other part, because these legacy names are tied to the functional part. They are necessary to check that the block we are importing is really vertical/sequential, etc. because this is the information stored in Staged Content OLX.

OK, that's a good point. However, I think the API should use the new names rather than the legacy names, so we need some way of mapping between the old and new names. I have an idea that I've proposed to Dave and Braden in #content-libraries-relaunch-dev.

@NiedielnitsevIvan NiedielnitsevIvan force-pushed the NiedielnitsevIvan/feat/import-from-modulestore-apis branch from 7bd1ec5 to 852a243 Compare April 23, 2025 08:08
@mphilbrick211 mphilbrick211 moved this from Needs Triage to In Eng Review in Contributions Apr 23, 2025
Copy link
Member

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the hard work on this.

@kdmccormick kdmccormick merged commit 3f67f3c into openedx:master Apr 23, 2025
49 checks passed
@github-project-automation github-project-automation bot moved this from In Eng Review to Done in Contributions Apr 23, 2025
@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

1 similar comment
@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

UsamaSadiq pushed a commit that referenced this pull request May 14, 2025
This PR is addressed at adding the functionality to import
modulestore-based content to the learning-core based learning
package.

Partof: openedx/frontend-app-authoring#1681
marlonkeating pushed a commit that referenced this pull request Jul 15, 2025
This PR is addressed at adding the functionality to import
modulestore-based content to the learning-core based learning
package.

Partof: openedx/frontend-app-authoring#1681
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

open-source-contribution PR author is not from Axim or 2U

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

5 participants