Skip to content

Conversation

@samuelallan72
Copy link
Contributor

@samuelallan72 samuelallan72 commented Aug 26, 2025

🛑 Note from @bradenmacdonald:

we are not planning to merge this PR for now. We are just using this branch to evaluate Typesense, and would only merge this if a decision is made to support Typesense in future releases of Open edX.

Description

Add a Typesense backend for searching.

NOTE: make compile-requirements is currently broken on master, so I hacked typesense into the compiled requirements .txt files by: make upgrade && make compile-requirements, then removing all the changes not related to the new typesense dependency. UPDATE: it's fixed on master since #226 ; I can rebase to update requirements here later if we continue work on this (if there are plans to get it merged).

Supporting information

Private-ref: https://tasks.opencraft.com/browse/BB-9975

Test instructions

Test searching

  • Set up a Tutor devstack with:
  • Create two test courses in devstack (the demo course is fine, but ideally also a second one).
  • Navigate to the courses' discussion tabs.
  • Create some posts and comments in both courses.
  • Verify the data is indexed in Typesense (perhaps by using https://bfritscher.github.io/typesense-dashboard to browse the typesense server data; use the TYPESENSE_API_KEY saved in the Tutor config.yml).
  • Do some searches.
  • Verify search returns expected results.
    • There should only be threads for the current course returned.
    • Verify it finds matches in the thread title, thread post body, and comments.

Test reindexing

  • Mess up the data in typesense (perhaps by using the typesense dashboard from above).
  • Run tutor dev exec lms -- python manage.py lms rebuild_forum_indices to reindex the data.
  • View the data in typesense again, and verify it's been fixed.
  • Do some searches again, and verify results return as expected again.

Test initializing

  • Run tutor dev exec lms -- python manage.py lms initialize_forum_indices --force and verify the indices are recreated.
  • Verify that viewing the data in typesense shows no results.
  • Run tutor dev exec lms -- python manage.py lms rebuild_forum_indices to reindex the data.

Test validating the indices

  • Run tutor dev exec lms -- python manage.py lms validate_forum_indices.
  • Verify no errors are raised.
  • Mess up the collections in Typesense - eg. remove a field from the schema or delete a collection.
  • Run tutor dev exec lms -- python manage.py lms validate_forum_indices again.
  • Verify errors are raised.
  • Run tutor dev exec lms -- python manage.py lms initialize_forum_indices --force.
  • Run tutor dev exec lms -- python manage.py lms validate_forum_indices again.
  • Verify no errors are raised.

Rough instructions for setting up a devstack for this:

cd devstack
git clone github.com/open-craft/forum
cd forum
git checkout samuel/typesense-backend
cd ..
tutor mounts add ./forum
tutor plugins enable forum

git clone github.com/open-craft/tutor-contrib-typesense
pip install -e ./tutor-contrib-typesense
tutor plugins enable typesense

tutor images build openedx-dev
tutor dev launch
tutor dev do importdemocourse

Deadline

None

Merge checklist:
Check off if complete or not applicable:

  • Version bumped
  • Changelog record added
  • Documentation updated (not only docstrings)
  • Fixup commits are squashed away
  • Unit tests added/updated
  • Manual testing instructions provided
  • Noted any: Concerns, dependencies, migration issues, deadlines, tickets

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Aug 26, 2025
@openedx-webhooks
Copy link

openedx-webhooks commented Aug 26, 2025

Thanks for the pull request, @samuelallan72!

This repository is currently maintained by @openedx/wg-maintainers-forums.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.


Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@github-project-automation github-project-automation bot moved this to Needs Triage in Contributions Aug 26, 2025
@samuelallan72 samuelallan72 force-pushed the samuel/typesense-backend branch from 2323f98 to afbb4a7 Compare August 26, 2025 07:28
@mphilbrick211 mphilbrick211 added the FC Relates to an Axim Funded Contribution project label Aug 26, 2025
@samuelallan72 samuelallan72 force-pushed the samuel/typesense-backend branch 2 times, most recently from 6ee6966 to b1c75f2 Compare August 27, 2025 08:33
@mphilbrick211 mphilbrick211 moved this from Needs Triage to Waiting on Author in Contributions Aug 27, 2025
@samuelallan72 samuelallan72 force-pushed the samuel/typesense-backend branch from b1c75f2 to 1815843 Compare August 28, 2025 02:42
@samuelallan72 samuelallan72 changed the title wip: feat: add Typesense backend for search feat: add Typesense backend for search Aug 28, 2025
@samuelallan72 samuelallan72 force-pushed the samuel/typesense-backend branch from 1815843 to 6bd530b Compare August 28, 2025 03:16
@samuelallan72 samuelallan72 changed the title feat: add Typesense backend for search feat: add Typesense backend for search (FC-0091) Aug 28, 2025
@samuelallan72 samuelallan72 marked this pull request as ready for review August 28, 2025 07:22
@bradenmacdonald
Copy link

I ran make upgrade as part of this to get the pip-compile process working - this may need to be reverted or refactored out into a separate PR.

Yeah, please don't include unrelated package bumps if they're not needed. You should be able to just run make compile-requirements to update the .txt files without also upgrading unrelated packages.

Copy link

@bradenmacdonald bradenmacdonald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far. Did you look into the possibility of combining the threads and comments into a single index? Both object types have comment_thread_id, course_id, and body, and the other thread-only fields (title, commentable_id, context could either be omitted from the comments or de-normalized and included. I don't know if that's necessarily any better, just wondering your thoughts.

@samuelallan72
Copy link
Contributor Author

samuelallan72 commented Aug 28, 2025

@bradenmacdonald

Yeah, please don't include unrelated package bumps if they're not needed. You should be able to just run make compile-requirements to update the .txt files without also upgrading unrelated packages.

Unfortunately here I couldn't; make compile-requirements is currently broken on master:

pip._vendor.resolvelib.resolvers.ResolutionImpossible: [RequirementInformation(requirement=SpecifierRequirement('docutils==0.22'), parent=None), RequirementInformation(requirement=SpecifierRequirement('docutils>=0.21.2'), parent=LinkCandidate('https://files.pythonhosted.org/packages/e1/67/921ec3024056483db83953ae8e48079ad62b92db7880013ca77632921dd0/readme_renderer-44.0-py3-none-any.whl (from https://pypi.org/simple/readme-renderer/) (requires-python:>=3.9)')), RequirementInformation(requirement=SpecifierRequirement('docutils<=0.21.2,>=0.19'), parent=LinkCandidate('https://files.pythonhosted.org/packages/0c/f1/6ffd5d76578e98a8f21ae7216b88a7212c778f665f1a8f4f8ce6f9605da4/doc8-1.1.2-py3-none-any.whl (from https://pypi.org/simple/doc8/) (requires-python:>=3.8)'))]

Running make upgrade fixed it. Maybe I can open a new PR with the version bumps? Or do we want to avoid a make upgrade altogether, and try to manually upgrade/pin just the conflicting dependencies?

@bradenmacdonald
Copy link

bradenmacdonald commented Aug 28, 2025

In such cases, I would personally run make upgrade but then go through the diff and discard all the changes that aren't related to the new dependency before committing.

@samuelallan72
Copy link
Contributor Author

@bradenmacdonald

Did you look into the possibility of combining the threads and comments into a single index? Both object types have comment_thread_id, course_id, and body, and the other thread-only fields (title, commentable_id, context could either be omitted from the comments or de-normalized and included. I don't know if that's necessarily any better, just wondering your thoughts.

I did think about that. I'm not sure why it has been kept as two indexes, or why in Meilisearch, the data has been normalised but the indexes kept separate. I kept them separate to be consistent with the other approaches, and I'm not sure if there are any other hidden use cases.

I would imagine the most efficient method would be to combine to a single index, and normalise the data too:

  • id: thread-<id> for comment threads, or comment-<id> for comments
  • thread_id: comment_thread_id from comments, or id from comment threads
  • commentable_id: from comment threads, or none for comments
  • course_id: available on both
  • context: available on both
  • text: title + body from comment threads, body from comments

Then any search would be a single search operation, rather than two.

I might be missing some use cases though, and I'm not sure exactly how commentable_id or context are used.

@samuelallan72 samuelallan72 mentioned this pull request Aug 29, 2025
7 tasks
@samuelallan72
Copy link
Contributor Author

@bradenmacdonald

In such cases, I would personally run make upgrade but then go through the diff and discard all the changes that aren't related to the new dependency before committing.

Sounds good, I've done that for this PR now. :) I also noted in the description that this has been hacked together (ie. not a reproducible make compile-requirements), and I opened a PR to update the dependencies at #226

@bradenmacdonald
Copy link

@samuelallan72 My impression is that the only reason they are stored in two separate search indexes is because they're represented by two different Django models, and the original implementation of search was based around some 1:1 mapping of models to indexes. But there's no reason to stick to that necessarily as far as I can see.

For the Studio search features, we pull in all kinds of different models (course components, library components, library containers, ...) and normalize them into one index, and it works really well.

@samuelallan72
Copy link
Contributor Author

@bradenmacdonald that sounds good, I'll update the code to use a single index and we'll see how it goes. My only concern with this approach is that it may be more difficult to objectively compare with the Meilisearch backend, since it will be two indices vs one.

@samuelallan72
Copy link
Contributor Author

@bradenmacdonald I converted it to use a single index, ready for review. :)

Copy link

@bradenmacdonald bradenmacdonald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this mostly as described, and it's working very well! Didn't find any issues with it, though I had some suggestions for the code.

@samuelallan72
Copy link
Contributor Author

Thanks for your review @bradenmacdonald ! I've addressed your comments, and it's ready for another pass. :)

Copy link

@bradenmacdonald bradenmacdonald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks great, and is working very well as far as I can tell.

  • I tested this: similar to the described procedure
  • I read through the code
  • I checked for accessibility issues: n/a
  • Includes documentation: we will add docs if we plan to merge it.

@bradenmacdonald
Copy link

🛑 Note: we are not planning to merge this PR for now. We are just using this branch to evaluate Typesense, and would only merge this if a decision is made to support Typesense in future releases of Open edX.

samuelallan72 and others added 5 commits October 21, 2025 10:37
Also run `make upgrade` to be able to successfully compile dependencies.
This upgrade may need to be refactored out of this commit.

Private-ref: https://tasks.opencraft.com/browse/BB-9975
Co-authored-by: Braden MacDonald <mail@bradenm.com>
Also finish implementing the validation method,
and address lint errors.
@samuelallan72 samuelallan72 force-pushed the samuel/typesense-backend branch from 323c6c3 to 98a92fb Compare October 21, 2025 00:10
@samuelallan72
Copy link
Contributor Author

rebased on master and fixed merge conflicts

@mphilbrick211 mphilbrick211 moved this from Waiting on Author to In Eng Review in Contributions Oct 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

FC Relates to an Axim Funded Contribution project open-source-contribution PR author is not from Axim or 2U

Projects

Status: In Eng Review

Development

Successfully merging this pull request may close these issues.

4 participants