Skip to content

Add Publish Side Effects #307

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion openedx_learning/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,4 +2,4 @@
Open edX Learning ("Learning Core").
"""

__version__ = "0.26.0"
__version__ = "0.27.0"
127 changes: 114 additions & 13 deletions openedx_learning/apps/authoring/publishing/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@
PublishableEntityVersionMixin,
PublishLog,
PublishLogRecord,
PublishSideEffect,
)
from .models.publish_log import Published

Expand Down Expand Up @@ -357,10 +358,9 @@ def publish_from_drafts(
# TODO: this only handles one level deep and would need to be updated to support sections > subsections > units

# Get the IDs of the ContainerVersion for any Containers whose drafts are slated to be published.
container_version_ids = (
Container.objects.filter(publishable_entity__draft__in=draft_qset)
.values_list("publishable_entity__draft__version__containerversion__pk", flat=True)
)
container_version_ids = draft_qset.filter(entity__container__isnull=False) \
.values_list("version_id", flat=True)

if container_version_ids:
# We are publishing at least one container. Check if it has any child components that aren't already slated
# to be published.
Expand Down Expand Up @@ -414,6 +414,9 @@ def publish_from_drafts(
},
)

# We calculate side-effects as the last step in the process.
_create_container_side_effects_for_publish_log(publish_log)

return publish_log


Expand Down Expand Up @@ -614,6 +617,89 @@ def _add_to_existing_draft_change_log(
return change


def _create_container_side_effects_for_publish_log(publish_log: PublishLog):
"""
Iterate through PublishLog and add the appropriate PublishSideEffects.

A container is considered to have been "published" if any of its descendents
were published. So publishing a Component implicitly publishes any Units it
was in, which publishes the Subsections those Units were in, etc. If the
Container is not already in the PublishLog (i.e. its own metadata did not
change), then we create a PublishLogRecord where the old_version is equal
to the new_version. We also create a PublishSideEffect for every child
change that affects a container, regardless of whether that container also
changed.
Comment on lines +624 to +631
Copy link
Contributor

@bradenmacdonald bradenmacdonald May 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment needs to be clarified, because I am confused by it.

A container is considered to have been "published" if any of its descendents were published.

Publishing a component does not automatically nor necessarily "publish" any containers that use it, and we even have test cases that verify that - which still seem to be passing after this change. So I think either this is wrong, or I misunderstood something here.

publishing a Component implicitly publishes any Units itwas in, which publishes the Subsections those Units were in

Again, I don't think that's the case? But I would say "When a component is published, we may need to perform post-publish actions on containers where that component is a descendant. For example, publishing the last unpublished component in a Unit may make both the unit and its parent section no longer "contain unpublished changes", and that may require an update to the metadata about those containers in the search index used by Studio."

If the Container is not already in the PublishLog (i.e. its own metadata did not change), then we create a PublishLogRecord where the old_version is equal to the new_version. We also create a PublishSideEffect for every child change that affects a container, regardless of whether that container also changed.

Hmm, I would expect the opposite: when a container is not in the PublishLog (neither it nor any ancestor was directly published) but it has a descendant that was published, it will be added as a PublishSideEffect, because the publish could affect it but it is not itself changed, but it will not appear in the PublishLog. If the container or an ancestor was changed as part of this publish, it will be included directly in the PublishLog. I would never expect any container to appear in both the direct PublishLog and the PublishSideEffect list. (misunderstanding - see below)


I'm also wondering if you could open a related PR to show how https://github.com/openedx/edx-platform/blob/ba47f1fae5094d9608d7d2299f778afe023533e1/openedx/core/djangoapps/content_libraries/tasks.py#L99-L104 can be updated once this PR is merged?

Copy link
Member

@kdmccormick kdmccormick May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I don't think that's the case? But I would say "When a component is published, we may need to perform post-publish actions on containers where that component is a descendant. For example, publishing the last unpublished component in a Unit may make both the unit and its parent section no longer "contain unpublished changes", and that may require an update to the metadata about those containers in the search index used by Studio."

+1, I have the same question as @bradenmacdonald .

Hmm, I would expect the opposite: when a container is not in the PublishLog (neither it nor any ancestor was directly published) but it has a descendant that was published, it will be added as a PublishSideEffect, because the publish could affect it but it is not itself changed, but it will not appear in the PublishLog. ... I would never expect any container to appear in both the direct PublishLog and the PublishSideEffect list.

This confuses me @bradenmacdonald , because PublishSideEffect joins together two PublishLogRecord rows (cause and effect). So, wouldn't the container necessarily have to have a PublishLogRecord so it could appear as the effect of the PublishSideEffect?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(@ormsbee For the never-published container K and its child C, to capture the PublishSideEffect of C -> K, would we need effect to be a new PublishLogRecord with entity=K, old_version==null and new_version==null?)

Copy link
Contributor

@bradenmacdonald bradenmacdonald May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK, I definitely misunderstood that part. Please ignore that part of my comments. I was mostly reading the description and got confused by assuming that PublishSideEffect just listed any container implicitly affected by a publish, in a way completely analogous to PublishLogEntry.

Now that I (hopefully) understand it, I'm wondering if we [will] actually have a use case for tracking which children caused each particular parent to be included in the PublishLog that otherwise wouldn't be? i.e. when would we read PublishSideEffect? But either way, the comment about how it works at least makes sense to me.

Copy link
Member

@kdmccormick kdmccormick May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A container is considered to have been "published" if any of its descendents were published. So publishing a Component implicitly publishes any Units it was in, which publishes the Subsections those Units were in, etc.

I'm still not sure about this. I feel like we're overloading the term "published" to mean two different things:

  1. I am at draft version N. My published version is N-M (i.e., M edits have occurred since last publish). Either me or my ancestor is published by the user, so I get a PublishLogRecord for (N-M)->N, and my published version becomes N.
  2. I am at draft version N. My published version is N-M. My descendent is published, as a side effect of that, I get a PublishLogRecord for (N-M)->(N-M). My published version remains at N-M.

IMO, scenario (1) makes sense to call "publishing". Scenario (2) does not. I don't know what to call it exactly, but does that distinction make sense @bradenmacdonald @ormsbee ?

Copy link
Contributor Author

@ormsbee ormsbee May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kdmccormick:

(@ormsbee For the never-published container K and its child C, to capture the PublishSideEffect of C -> K, would we need effect to be a new PublishLogRecord with entity=K, old_version==null and new_version==null?)

No, because if a container K has never been published, then there is no side-effect to the published version of the container from the child being published. The container doesn't exist in a published state at all.

It also works the other way around. Say the published version of K has C as a child. But the draft version of K is updated to no longer have C as a child. Then someone edits and publishes just C. That still has a publish side-effect on K because the published version of K still references C, and so publishing C still changes the published state of K.

Edit: We touched on this briefly before, but this is one of the reasons why I had a hard time modeling side-effects as minor point versions of containers--because the relationships we have make it so that it's possible to get a published state of a container + child versions that has never existed as a draft.

Copy link
Contributor Author

@ormsbee ormsbee May 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A container is considered to have been "published" if any of its descendents were published. So publishing a Component implicitly publishes any Units it was in, which publishes the Subsections those Units were in, etc.

I'm still not sure about this. I feel like we're overloading the term "published" to mean two different things:

  1. I am at draft version N. My published version is N-M (i.e., M edits have occurred since last publish). Either me or my ancestor is published by the user, so I get a PublishLogRecord for (N-M)->N, and my published version becomes N.
  2. I am at draft version N. My published version is N-M. My descendent is published, as a side effect of that, I get a PublishLogRecord for (N-M)->(N-M). My published version remains at N-M.

IMO, scenario (1) makes sense to call "publishing". Scenario (2) does not. I don't know what to call it exactly, but does that distinction make sense @bradenmacdonald @ormsbee ?

Right, I get the distinction from the code point of view, but I'm not sure it matters much from the end-user point of view. When I see a "Unit last published at" timestamp as an end user, I would expect it to mean, "the published state of the Unit last changed at this time". Especially if a common workflow is going to be "search for that component that you know is in the Unit and edit/publish that". I think it would be strange for the "last published" date to never change on a Unit because we're only publishing the Components that live inside of it. Likewise, if I've recently published the Components in a Unit, I'd expect that Unit to show up as a recently published thing--just like I'd expect it to show up as a recently modified thing if one of its Components was edited.

Edit: Also, it's going to be very common that for a Unit, there will be no change at the Unit metadata level itself, and that even "publish this entire Unit" is just going to publish the one Component in it that changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bradenmacdonald:

Now that I (hopefully) understand it, I'm wondering if we [will] actually have a use case for tracking which children caused each particular parent to be included in the PublishLog that otherwise wouldn't be? i.e. when would we read PublishSideEffect? But either way, the comment about how it works at least makes sense to me.

I think it makes it useful for a straightforward historical view of any particular thing (e.g. a Section), to see when its draft or published states changed because of its children, and which children. It's possible to reconstruct this in other ways from various tables, but more cumbersome and expensive to do so, particularly if you're listing a bunch of changes over time.

I also think we're going to want side-effect notations for things that are not parent-child relations, like how a course's published state is affected by things that are not strictly in the CourseBlock's inheritance hierarchy.

Copy link
Contributor Author

@ormsbee ormsbee May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update to this, per this Slack thread, @jmakowski1123 confirms that a container is considered to have been "last published" at whatever time that container or any of its children/descendants are published. Copying the exact text of what was agreed to for context in case anyone is looking at this a few years down the line:

Product clarification request...

Setup: I have a Unit U1 with Components C1 and C2 in it. All changes were published at 9:00. Then I make a change to C2. That marks both C2 and U1 as having unpublished changes. Then I go and publish C2 at 10:00. Now neither U1 nor C2 show the "unpublished changes" status.

Question: Is the "last published" time for U1 9:00 or 10:00?

My current understanding is that it should be 10:00. Just like when we say, "the course was most recently published at X time", we mean that, "the published state of the course changed at X time because some part of it was published", not "everything in the course was published at X time". Similarly, my understanding is that a Unit would be considered "last modified" whenever one of its child Components was modified.

I'll try to rewrite the docstring to be clearer on how this works and why.


This is a slightly simplified version of the work we have to do with Drafts,
because we don't have to worry about being called from a bulk context or
whether we're calculating for an entire log or just one log record.

Note: I first tried to combine this logic with its draft-related counterpart
because of their obvious similaries, but I felt that the resulting code was
overly confusing, and that it was better to just repeat ourselves slightly,
rather than to try to parameterize things like the model classes, and have
ugly if-else blocks to work out the small differences.
"""
# processed_entity_ids holds the entity IDs that we've already calculated
# side-effects for. This is to save us from recalculating side-effects for
# the same ancestor relationships over and over again. So if we're calling
# this function in a loop for all the Components in a Unit, we won't be
# recalculating the Unit's side-effect on its Subsection, and its
# Subsection's side-effect on its Section each time through the loop.
# It also guards against infinite parent-child relationship loops, though
# those aren't *supposed* to be allowed anyhow.
processed_entity_ids: set[int] = set()
for original_change in publish_log.records.all():
changes_and_containers = [
(original_change, container)
for container
in get_containers_with_entity(
original_change.entity_id,
ignore_pinned=True,
published=True,
)
]
while changes_and_containers:
change, container = changes_and_containers.pop()

# If the container is not already in the PublishLog, we need to
# add it. Since it's being caused as a PublishSideEffect, we're
# going add it with the old_version == new_version convention, i.e.
# the *only* change is because of one its children.
container_published_version_pk = container.versioning.published.pk
container_change, _created = PublishLogRecord.objects.get_or_create(
publish_log=change.publish_log,
entity_id=container.pk,
defaults={
'old_version_id': container_published_version_pk,
'new_version_id': container_published_version_pk
}
)

# Mark that change in the current loop has the side effect of
# changing the parent container. We'll do this regardless of whether
# the container version itself also changed. If a Unit has a
# Component and both the Unit and Component have their versions
# incremented, then the Unit has changed in both ways (the Unit's
# internal metadata as well as the new version of the child
# component).
PublishSideEffect.objects.get_or_create(cause=change, effect=container_change)
processed_entity_ids.add(change.entity_id)

# Now we find the next layer up of containers. So if the originally
# passed in publishable_entity_id was for a Component, then the
# ``container`` we've been creating the side effect for in this loop
# is the Unit, and ``parents_of_container`` would be any Subsections
# that contain the Unit.
parents_of_container = get_containers_with_entity(container.pk, ignore_pinned=True)

changes_and_containers.extend(
(container_change, container_parent)
for container_parent in parents_of_container
if container_parent.pk not in processed_entity_ids
)


def _create_container_side_effects_for_draft_change_log(change_log: DraftChangeLog):
"""
Iterate through the whole DraftChangeLog and process side-effects.
Expand Down Expand Up @@ -685,7 +771,7 @@ def _create_container_side_effects_for_draft_change(
# Now we find the next layer up of containers. So if the originally
# passed in publishable_entity_id was for a Component, then the
# ``container`` we've been creating the side effect for in this loop
# is the Unit, and ``parents_of_container`` would be any Sequences
# is the Unit, and ``parents_of_container`` would be any Subsections
# that contain the Unit.
parents_of_container = get_containers_with_entity(container.pk, ignore_pinned=True)

Expand Down Expand Up @@ -1363,6 +1449,7 @@ def get_containers_with_entity(
publishable_entity_pk: int,
*,
ignore_pinned=False,
published=False,
) -> QuerySet[Container]:
"""
[ 🛑 UNSTABLE ]
Expand All @@ -1375,16 +1462,30 @@ def get_containers_with_entity(
publishable_entity_pk: The ID of the PublishableEntity to search for.
ignore_pinned: if true, ignore any pinned references to the entity.
"""
relation_model = "published" if published else "draft"
if ignore_pinned:
qs = Container.objects.filter(
# Note: these two conditions must be in the same filter() call, or the query won't be correct.
publishable_entity__draft__version__containerversion__entity_list__entitylistrow__entity_id=publishable_entity_pk, # pylint: disable=line-too-long # noqa: E501
publishable_entity__draft__version__containerversion__entity_list__entitylistrow__entity_version_id=None, # pylint: disable=line-too-long # noqa: E501
)
filter_dict = {
# Note: these two conditions must be in the same filter() call,
# or the query won't be correct.
(
f"publishable_entity__{relation_model}__version__"
"containerversion__entity_list__entitylistrow__entity_id"
): publishable_entity_pk,
(
f"publishable_entity__{relation_model}__version__"
"containerversion__entity_list__entitylistrow__entity_version_id"
): None,
}
qs = Container.objects.filter(**filter_dict)
else:
qs = Container.objects.filter(
publishable_entity__draft__version__containerversion__entity_list__entitylistrow__entity_id=publishable_entity_pk, # pylint: disable=line-too-long # noqa: E501
)
filter_dict = {
(
f"publishable_entity__{relation_model}__version__"
"containerversion__entity_list__entitylistrow__entity_id"
): publishable_entity_pk
}
qs = Container.objects.filter(**filter_dict)

return qs.order_by("pk").distinct() # Ordering is mostly for consistent test cases.


Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
# Generated by Django 4.2.16 on 2025-04-17 08:47

import django.db.models.deletion
from django.db import migrations, models


class Migration(migrations.Migration):

dependencies = [
("oel_publishing", "0008_alter_draftchangelogrecord_options_and_more"),
]

operations = [
migrations.AlterModelOptions(
name="draftchangelogrecord",
options={
"verbose_name": "Draft Change Log Record",
"verbose_name_plural": "Draft Change Log Records",
},
),
migrations.AlterModelOptions(
name="draftsideeffect",
options={
"verbose_name": "Draft Side Effect",
"verbose_name_plural": "Draft Side Effects",
},
),
migrations.AlterField(
model_name="draftchangelogrecord",
name="draft_change_log",
field=models.ForeignKey(
on_delete=django.db.models.deletion.CASCADE,
related_name="records",
to="oel_publishing.draftchangelog",
),
),
migrations.AlterField(
model_name="draftsideeffect",
name="effect",
field=models.ForeignKey(
on_delete=django.db.models.deletion.RESTRICT,
related_name="affected_by",
to="oel_publishing.draftchangelogrecord",
),
),
migrations.CreateModel(
name="PublishSideEffect",
fields=[
(
"id",
models.BigAutoField(
auto_created=True,
primary_key=True,
serialize=False,
verbose_name="ID",
),
),
(
"cause",
models.ForeignKey(
on_delete=django.db.models.deletion.RESTRICT,
related_name="causes",
to="oel_publishing.publishlogrecord",
),
),
(
"effect",
models.ForeignKey(
on_delete=django.db.models.deletion.RESTRICT,
related_name="affected_by",
to="oel_publishing.publishlogrecord",
),
),
],
options={
"verbose_name": "Publish Side Effect",
"verbose_name_plural": "Publish Side Effects",
},
),
migrations.AddConstraint(
model_name="publishsideeffect",
constraint=models.UniqueConstraint(
fields=("cause", "effect"), name="oel_pub_pse_uniq_c_e"
),
),
]
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
from .draft_log import Draft, DraftChangeLog, DraftChangeLogRecord, DraftSideEffect
from .entity_list import EntityList, EntityListRow
from .learning_package import LearningPackage
from .publish_log import Published, PublishLog, PublishLogRecord
from .publish_log import Published, PublishLog, PublishLogRecord, PublishSideEffect
from .publishable_entity import (
PublishableContentModelRegistry,
PublishableEntity,
Expand Down
55 changes: 55 additions & 0 deletions openedx_learning/apps/authoring/publishing/models/publish_log.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
"""
from django.conf import settings
from django.db import models
from django.utils.translation import gettext_lazy as _

from openedx_learning.lib.fields import case_insensitive_char_field, immutable_uuid_field, manual_date_time_field

Expand Down Expand Up @@ -61,6 +62,15 @@ class PublishLogRecord(models.Model):

To revert a publish, we would make a new publish that swaps ``old_version``
and ``new_version`` field values.

If the old_version and new_version of a PublishLogRecord match, it means
that the definition of the entity itself did not change (i.e. no new
PublishableEntityVersion was created), but something else was published that
had the side-effect of changing the published state of this entity. For
instance, if a Unit has unpinned references to its child Components (which
it almost always will), then publishing one of those Components will alter
the published state of the Unit, even if the UnitVersion does not change. In
that case, we still consider the Unit to have been "published".
"""

publish_log = models.ForeignKey(
Expand Down Expand Up @@ -148,3 +158,48 @@ class Published(models.Model):
class Meta:
verbose_name = "Published Entity"
verbose_name_plural = "Published Entities"


class PublishSideEffect(models.Model):
"""
Model to track when a change in one Published entity affects others.

Our first use case for this is that changes involving child components are
thought to affect parent Units, even if the parent's version doesn't change.

Side-effects are recorded in a collapsed form that only captures one level.
So if Components C1 and C2 are both published and they are part of Unit U1,
which is in turn a part of Subsection SS1, then the PublishSideEffect
entries are::

(C1, U1)
(C2, U1)
(U1, SS1)

We do not keep entries for (C1, SS1) or (C2, SS1). This is to make the model
simpler, so we don't have to differentiate between direct side-effects and
transitive side-effects in the model.
.. no_pii:
"""
cause = models.ForeignKey(
PublishLogRecord,
on_delete=models.RESTRICT,
related_name='causes',
)
effect = models.ForeignKey(
PublishLogRecord,
on_delete=models.RESTRICT,
related_name='affected_by',
)

class Meta:
constraints = [
# Duplicate entries for cause & effect are just redundant. This is
# here to guard against weird bugs that might introduce this state.
models.UniqueConstraint(
fields=["cause", "effect"],
name="oel_pub_pse_uniq_c_e",
)
]
verbose_name = _("Publish Side Effect")
verbose_name_plural = _("Publish Side Effects")
Loading