Skip to content

Add Publish Side Effects #307

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ormsbee
Copy link
Contributor

@ormsbee ormsbee commented Apr 18, 2025

Create the PublishSideEffect model as the publishing analog for
the DraftSideEffect model, and create these side effects for
parent containers when one of the children is published. This
allows us to efficiently query when any subtree of content was
affected by a publish.

@bradenmacdonald
Copy link
Contributor

bradenmacdonald commented Apr 18, 2025

What is the use case for this?

We did have one related bug:

  • Edit a Component in a Unit:

    • Component + Unit show "unpublished changes" ✔️
  • Publish the Component only.

    • Component no longer shows "unpublished changes" ✔️
    • Unit no longer shows "unpublished changes" ❌ (in the search index)

Which was fixed with:

https://github.com/openedx/edx-platform/blob/3a9b4367e60ebbefdc2813b5587947abd02f01c0/openedx/core/djangoapps/content_libraries/api/blocks.py#L292-L301

So I guess having access to the list of all parent containers affected when publishing a component would let us rewrite this, and better handle multiple layers of containers, but it's still all just a wrapper around the same get_containers_with_entity code. Though with the changes in this PR, is it "more correct" because it's searching for published containers affected by the child's publish?

Let me see:

  1. You have a draft container and you add component A to it. The container has never been published.
  2. You publish component A. This should not include the container in the PublishLog, because the published version of the container doesn't even exist and definitely isn't affected. On the platform side, we don't need to update the search index entry for the container - it still has unpublished changes (to itself) and one child.

yeah OK - that makes sense to me.

@bradenmacdonald
Copy link
Contributor

I'm wondering if we could we wire up our platform events like LIBRARY_CONTAINER_UPDATED and LIBRARY_BLOCK_UPDATED directly to become side effects of the draft/publish actions, so we can guarantee those are always called, no matter at which level the APIs are used? Though we try to be smart about when those events are synchronous vs. asynchronous (e.g. publishing a unit will synchronously update its children because they're usually all visible in the UI at the time and we need to refresh the UI data as soon as the publish is completed, but when publishing a component we asynchronously update all the parents because you're usually not viewing them at the same time and a bit of a delay in updating their publish status is fine).

@ormsbee
Copy link
Contributor Author

ormsbee commented Apr 19, 2025

I suspect that we'll want to emit signals at the Learning Core layer that more or less correspond to the entire DraftPublishLog and PublishLog creation, along with side effects, containers, etc.--so one synchronous signal with everything changed or published, and let the listeners extract what they find interesting.

@ormsbee ormsbee marked this pull request as ready for review April 29, 2025 15:49
@ormsbee ormsbee force-pushed the publish-side-effects-2 branch 2 times, most recently from 63970c2 to a5076e5 Compare April 30, 2025 17:33
@ormsbee
Copy link
Contributor Author

ormsbee commented Apr 30, 2025

Note to self: Need to rebase this also to take into account the existing 0008 migration.

@ormsbee ormsbee force-pushed the publish-side-effects-2 branch 3 times, most recently from 8cfe370 to a493077 Compare May 12, 2025 16:13
Create the PublishSideEffect model as the publishing analog for
the DraftSideEffect model, and create these side effects for
parent containers when one of the children is published. This
allows us to efficiently query when any subtree of content was
affected by a publish.
@ormsbee ormsbee force-pushed the publish-side-effects-2 branch from a493077 to f504a6a Compare May 12, 2025 16:19
@ormsbee
Copy link
Contributor Author

ormsbee commented May 12, 2025

Ready for review.

@kdmccormick kdmccormick self-requested a review May 13, 2025 18:35
publishable_entity__draft__version__containerversion__entity_list__entitylistrow__entity_id=publishable_entity_pk, # pylint: disable=line-too-long # noqa: E501
publishable_entity__draft__version__containerversion__entity_list__entitylistrow__entity_version_id=None, # pylint: disable=line-too-long # noqa: E501
)
# TODO: Do we need to run distinct() on this? Will fix in https://github.com/openedx/openedx-learning/issues/303
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are some rebase issues here - the comment above was removed in https://github.com/openedx/openedx-learning/pull/304/files (and I do see .distinct() below), and same with the large comment section "Could alternately do this query...." which I also removed.

Comment on lines +624 to +631
A container is considered to have been "published" if any of its descendents
were published. So publishing a Component implicitly publishes any Units it
was in, which publishes the Subsections those Units were in, etc. If the
Container is not already in the PublishLog (i.e. its own metadata did not
change), then we create a PublishLogRecord where the old_version is equal
to the new_version. We also create a PublishSideEffect for every child
change that affects a container, regardless of whether that container also
changed.
Copy link
Contributor

@bradenmacdonald bradenmacdonald May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment needs to be clarified, because I am confused by it.

A container is considered to have been "published" if any of its descendents were published.

Publishing a component does not automatically nor necessarily "publish" any containers that use it, and we even have test cases that verify that - which still seem to be passing after this change. So I think either this is wrong, or I misunderstood something here.

publishing a Component implicitly publishes any Units itwas in, which publishes the Subsections those Units were in

Again, I don't think that's the case? But I would say "When a component is published, we may need to perform post-publish actions on containers where that component is a descendant. For example, publishing the last unpublished component in a Unit may make both the unit and its parent section no longer "contain unpublished changes", and that may require an update to the metadata about those containers in the search index used by Studio."

If the Container is not already in the PublishLog (i.e. its own metadata did not change), then we create a PublishLogRecord where the old_version is equal to the new_version. We also create a PublishSideEffect for every child change that affects a container, regardless of whether that container also changed.

Hmm, I would expect the opposite: when a container is not in the PublishLog (neither it nor any ancestor was directly published) but it has a descendant that was published, it will be added as a PublishSideEffect, because the publish could affect it but it is not itself changed, but it will not appear in the PublishLog. If the container or an ancestor was changed as part of this publish, it will be included directly in the PublishLog. I would never expect any container to appear in both the direct PublishLog and the PublishSideEffect list. (misunderstanding - see below)


I'm also wondering if you could open a related PR to show how https://github.com/openedx/edx-platform/blob/ba47f1fae5094d9608d7d2299f778afe023533e1/openedx/core/djangoapps/content_libraries/tasks.py#L99-L104 can be updated once this PR is merged?

Copy link
Member

@kdmccormick kdmccormick May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I don't think that's the case? But I would say "When a component is published, we may need to perform post-publish actions on containers where that component is a descendant. For example, publishing the last unpublished component in a Unit may make both the unit and its parent section no longer "contain unpublished changes", and that may require an update to the metadata about those containers in the search index used by Studio."

+1, I have the same question as @bradenmacdonald .

Hmm, I would expect the opposite: when a container is not in the PublishLog (neither it nor any ancestor was directly published) but it has a descendant that was published, it will be added as a PublishSideEffect, because the publish could affect it but it is not itself changed, but it will not appear in the PublishLog. ... I would never expect any container to appear in both the direct PublishLog and the PublishSideEffect list.

This confuses me @bradenmacdonald , because PublishSideEffect joins together two PublishLogRecord rows (cause and effect). So, wouldn't the container necessarily have to have a PublishLogRecord so it could appear as the effect of the PublishSideEffect?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(@ormsbee For the never-published container K and its child C, to capture the PublishSideEffect of C -> K, would we need effect to be a new PublishLogRecord with entity=K, old_version==null and new_version==null?)

Copy link
Contributor

@bradenmacdonald bradenmacdonald May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK, I definitely misunderstood that part. Please ignore that part of my comments. I was mostly reading the description and got confused by assuming that PublishSideEffect just listed any container implicitly affected by a publish, in a way completely analogous to PublishLogEntry.

Now that I (hopefully) understand it, I'm wondering if we [will] actually have a use case for tracking which children caused each particular parent to be included in the PublishLog that otherwise wouldn't be? i.e. when would we read PublishSideEffect? But either way, the comment about how it works at least makes sense to me.

Copy link
Member

@kdmccormick kdmccormick May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A container is considered to have been "published" if any of its descendents were published. So publishing a Component implicitly publishes any Units it was in, which publishes the Subsections those Units were in, etc.

I'm still not sure about this. I feel like we're overloading the term "published" to mean two different things:

  1. I am at draft version N. My published version is N-M (i.e., M edits have occurred since last publish). Either me or my ancestor is published by the user, so I get a PublishLogRecord for (N-M)->N, and my published version becomes N.
  2. I am at draft version N. My published version is N-M. My descendent is published, as a side effect of that, I get a PublishLogRecord for (N-M)->(N-M). My published version remains at N-M.

IMO, scenario (1) makes sense to call "publishing". Scenario (2) does not. I don't know what to call it exactly, but does that distinction make sense @bradenmacdonald @ormsbee ?

Copy link
Contributor Author

@ormsbee ormsbee May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kdmccormick:

(@ormsbee For the never-published container K and its child C, to capture the PublishSideEffect of C -> K, would we need effect to be a new PublishLogRecord with entity=K, old_version==null and new_version==null?)

No, because if a container K has never been published, then there is no side-effect to the published version of the container from the child being published. The container doesn't exist in a published state at all.

It also works the other way around. Say the published version of K has C as a child. But the draft version of K is updated to no longer have C as a child. Then someone edits and publishes just C. That still has a publish side-effect on K because the published version of K still references C, and so publishing C still changes the published state of K.

Edit: We touched on this briefly before, but this is one of the reasons why I had a hard time modeling side-effects as minor point versions of containers--because the relationships we have make it so that it's possible to get a published state of a container + child versions that has never existed as a draft.

Copy link
Contributor Author

@ormsbee ormsbee May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A container is considered to have been "published" if any of its descendents were published. So publishing a Component implicitly publishes any Units it was in, which publishes the Subsections those Units were in, etc.

I'm still not sure about this. I feel like we're overloading the term "published" to mean two different things:

  1. I am at draft version N. My published version is N-M (i.e., M edits have occurred since last publish). Either me or my ancestor is published by the user, so I get a PublishLogRecord for (N-M)->N, and my published version becomes N.
  2. I am at draft version N. My published version is N-M. My descendent is published, as a side effect of that, I get a PublishLogRecord for (N-M)->(N-M). My published version remains at N-M.

IMO, scenario (1) makes sense to call "publishing". Scenario (2) does not. I don't know what to call it exactly, but does that distinction make sense @bradenmacdonald @ormsbee ?

Right, I get the distinction from the code point of view, but I'm not sure it matters much from the end-user point of view. When I see a "Unit last published at" timestamp as an end user, I would expect it to mean, "the published state of the Unit last changed at this time". Especially if a common workflow is going to be "search for that component that you know is in the Unit and edit/publish that". I think it would be strange for the "last published" date to never change on a Unit because we're only publishing the Components that live inside of it. Likewise, if I've recently published the Components in a Unit, I'd expect that Unit to show up as a recently published thing--just like I'd expect it to show up as a recently modified thing if one of its Components was edited.

Edit: Also, it's going to be very common that for a Unit, there will be no change at the Unit metadata level itself, and that even "publish this entire Unit" is just going to publish the one Component in it that changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bradenmacdonald:

Now that I (hopefully) understand it, I'm wondering if we [will] actually have a use case for tracking which children caused each particular parent to be included in the PublishLog that otherwise wouldn't be? i.e. when would we read PublishSideEffect? But either way, the comment about how it works at least makes sense to me.

I think it makes it useful for a straightforward historical view of any particular thing (e.g. a Section), to see when its draft or published states changed because of its children, and which children. It's possible to reconstruct this in other ways from various tables, but more cumbersome and expensive to do so, particularly if you're listing a bunch of changes over time.

I also think we're going to want side-effect notations for things that are not parent-child relations, like how a course's published state is affected by things that are not strictly in the CourseBlock's inheritance hierarchy.

Copy link
Contributor Author

@ormsbee ormsbee May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update to this, per this Slack thread, @jmakowski1123 confirms that a container is considered to have been "last published" at whatever time that container or any of its children/descendants are published. Copying the exact text of what was agreed to for context in case anyone is looking at this a few years down the line:

Product clarification request...

Setup: I have a Unit U1 with Components C1 and C2 in it. All changes were published at 9:00. Then I make a change to C2. That marks both C2 and U1 as having unpublished changes. Then I go and publish C2 at 10:00. Now neither U1 nor C2 show the "unpublished changes" status.

Question: Is the "last published" time for U1 9:00 or 10:00?

My current understanding is that it should be 10:00. Just like when we say, "the course was most recently published at X time", we mean that, "the published state of the course changed at X time because some part of it was published", not "everything in the course was published at X time". Similarly, my understanding is that a Unit would be considered "last modified" whenever one of its child Components was modified.

I'll try to rewrite the docstring to be clearer on how this works and why.

@bradenmacdonald
Copy link
Contributor

BTW, I had previously speculated that this information could be used to optimize get_components_in_published_unit_as_of:

TODO: optimize, perhaps by having the publishlog store a record of all
ancestors of every modified PublishableEntity in the publish.

It's definitely not super important (TBD if we'll even use that function), but I would be interested to know if you see a way to do that.

@ormsbee
Copy link
Contributor Author

ormsbee commented May 14, 2025

BTW, I had previously speculated that this information could be used to optimize get_components_in_published_unit_as_of:

TODO: optimize, perhaps by having the publishlog store a record of all
ancestors of every modified PublishableEntity in the publish.

It's definitely not super important (TBD if we'll even use that function), but I would be interested to know if you see a way to do that.

I can't think of a straightforward way to apply this work to make that function better. I'll chew on it for a bit.

@ormsbee
Copy link
Contributor Author

ormsbee commented May 15, 2025

BTW, one of the things I wish we had in the PublishLogRecord and DraftLogRecord is a cheap way to summarize the full state of a PublishableEntity, inclusive of everything that could cause a side-effect, so that we can have that as a hash that we can easily compare for questions like "does this have any unpublished changes in it". Ideally something that can be composed entirely as a product of the version_nums/uuids + hashes of the things making the side-effects.

I can think of ways to do it, but I don't know if I like it... For instance, because we only care about the state changes within a version, we don't have to worry about ordering of children as part of the hash, since any changes to that will be taken care of by the version of the container itself changing. So I think the inputs would just have to be a list of (UUID + optional hash) for all things that affect the container, in this case the children, sorted by UUID. We'd hash that whole thing together.

So when computing the state for the Unit, we'd only have to look at the UUIDs of all the child Components. When computing the state for the containing Subsection, we only look at the UUID+state for all the Units and hash that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants