
Conversation

@MoralCode
Contributor

@MoralCode MoralCode commented Nov 11, 2025

Description
@cdolfi has reported an issue (#3129) where repos that have moved and are redirecting when visited in a browser are not having their URLs updated to reflect the move.

An AI tool was used to look over the relevant task and identify issues for review; it found that the hit_api function being used for the API calls was internally passing follow_redirects=True to the underlying HTTP library.

This explains why the repo URLs weren't being updated: all GitHub calls were automatically following the redirect, so the check for response_code == 301 later in the code was practically never reached.
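To illustrate the fix, here is a minimal sketch (not the actual hit_api implementation, and assuming an httpx-style client) of a request made without following redirects, so that a 301 and its Location header can be inspected by the caller:

# Minimal sketch, not the real hit_api: the key change is follow_redirects=False,
# so a 301 from the GitHub API is surfaced to the caller instead of being
# silently followed by the HTTP library.
import httpx

def check_for_repo_move(api_url: str, token: str) -> str | None:
    """Return the new API URL if the endpoint reports a move, else None."""
    response = httpx.get(
        api_url,
        headers={"Authorization": f"token {token}"},
        follow_redirects=False,
    )
    if response.status_code == 301:
        # httpx header lookup is case-insensitive, so "Location" vs "location"
        # does not matter here; still guard against it being missing entirely.
        new_location = response.headers.get("Location")
        if new_location is None:
            raise ValueError("Received a 301 without a Location header")
        return new_location
    return None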

This is related to #3129 (but also slightly exacerbates it, because it doesn't yet store the old URL).

Notes for Reviewers
I have yet to test this locally. Trying to see if I can replicate the issue, although I have a lot of repos in my local instance now that I should probably clear out...

Signed commits

  • Yes, I signed my commits.

Generative AI disclosure

  • This contribution was assisted or created by Generative AI tools.
    • GPT-5 was used through the chat feature of Cursor to provide an initial summary of problems that were then reviewed by me, leading to the set of fixes present in this PR. The only generated code in this PR is in commit 12d7be2, where Cursor helped suggest some code to log and raise an error if, for some crazy reason, a 301 response comes back without a Location header. This code was reviewed and built upon by me (to make it casing-agnostic) before submitting.

@MoralCode MoralCode added bug Documents unexpected/wrong/buggy behavior disclosed-ai Label for contributions that contain disclosed, reviewed, or responsibly-submitted AI content. labels Nov 11, 2025
@sgoggins
Member

@MoralCode : Your root cause analysis sounds DEAD ON, and explains why this appears not to be working in the case of automatic moving. This will fix that issue.

@MoralCode MoralCode force-pushed the move_detection branch 2 times, most recently from c362bb9 to 4eaae5f on November 11, 2025 at 20:17
@MoralCode MoralCode added this to the v0.92.0 Release milestone Nov 11, 2025
@MoralCode MoralCode self-assigned this Nov 11, 2025
sgoggins previously approved these changes Nov 13, 2025
Member

@sgoggins sgoggins left a comment

LGTM!

@MoralCode
Contributor Author

Still waiting on testing this to 100% confirm that this will update the repo_git url

@sgoggins
Member

Still waiting on testing this to 100% confirm that this will update the repo_git url

Will hold off on marking it ready and merging until you green light it then.

@sgoggins sgoggins added the ready Items tested and seeking additional approvals or a merge. Usually for items under active development label Nov 13, 2025
@MoralCode MoralCode removed the ready Items tested and seeking additional approvals or a merge. Usually for items under active development label Nov 13, 2025
@MoralCode
Contributor Author

Current issue with this: there seems to be a dependence on using the repo URL for querying:

[augur]        | 2025-11-18 23:04:30 cd3ac88591d1 augur_collection_monitor[276] INFO Setting github repo core status to collecting for repo: https://github.com/moralcode/classclockapi
[augur]        | [2025-11-18 23:04:30,464: INFO/MainProcess] Task augur.tasks.github.detect_move.tasks.detect_github_repo_move_core[38307c62-7947-495d-ac2f-35d8bd2bb241] received
[augur]        | 2025-11-18 23:04:30 cd3ac88591d1 detect_github_repo_move_core[280] INFO Starting repo_move operation with https://github.com/moralcode/classclockapi
[augur]        | 2025-11-18 23:04:30 cd3ac88591d1 augur_collection_monitor[276] INFO Starting collection on 0 secondary repos
[augur]        | 2025-11-18 23:04:30 cd3ac88591d1 augur_collection_monitor[276] INFO Starting collection on 0 facade repos
[augur]        | 2025-11-18 23:04:30 cd3ac88591d1 detect_github_repo_move_core[280] INFO Pinging repo: https://github.com/moralcode/classclockapi
[augur]        | [2025-11-18 23:04:30,474: INFO/ForkPoolWorker-2] Task augur.tasks.start_tasks.augur_collection_monitor[bb2d2553-8919-4da8-ac60-e628d826b9c3] succeeded in 0.09832821798045188s: None
[augur]        | 2025-11-18 23:04:30 cd3ac88591d1 detect_github_repo_move_core[280] INFO Retrieved 1 github api keys for use
[augur]        | 2025-11-18 23:04:30 cd3ac88591d1 detect_github_repo_move_core[280] DEBUG Key used for request (masked): ghp_EA******kQm
[augur]        | 2025-11-18 23:04:30 cd3ac88591d1 analyze_commits_in_parallel[281] DEBUG Analyzing commit 1f9454ebee957364e9073378ad2562b37f9c0394 for repo_id=1
[augur]        | 2025-11-18 23:04:30 cd3ac88591d1 detect_github_repo_move_core[280] DEBUG Key used for request (masked): ghp_EA******kQm
[augur]        | 2025-11-18 23:04:30 cd3ac88591d1 detect_github_repo_move_core[280] INFO Updated repo for https://github.com/classclock/API
[augur]        | 
[augur]        | [2025-11-18 23:04:30,942: WARNING/ForkPoolWorker-2] 2025-11-18 23:04:30 cd3ac88591d1 core_task_failure[280] ERROR Task 38307c62-7947-495d-ac2f-35d8bd2bb241 raised exception: ERROR: Repo has moved! Resetting Collection!
[augur]        |  Traceback: Traceback (most recent call last):
[augur]        |   File "/augur/.venv/lib/python3.11/site-packages/celery/app/trace.py", line 453, in trace_task
[augur]        |     R = retval = fun(*args, **kwargs)
[augur]        |                  ^^^^^^^^^^^^^^^^^^^^
[augur]        |   File "/augur/.venv/lib/python3.11/site-packages/celery/app/trace.py", line 736, in __protected_call__
[augur]        |     return self.run(*args, **kwargs)
[augur]        |            ^^^^^^^^^^^^^^^^^^^^^^^^^
[augur]        |   File "/augur/augur/tasks/github/detect_move/tasks.py", line 27, in detect_github_repo_move_core
[augur]        |     ping_github_for_repo_move(session, key_auth, repo, logger)
[augur]        |   File "/augur/augur/tasks/github/detect_move/core.py", line 89, in ping_github_for_repo_move
[augur]        |     raise Exception("ERROR: Repo has moved! Resetting Collection!")
[augur]        | Exception: ERROR: Repo has moved! Resetting Collection!
[augur]        | [2025-11-18 23:04:30,943: WARNING/ForkPoolWorker-2] 2025-11-18 23:04:30 cd3ac88591d1 core_task_failure[280] INFO Repo git: https://github.com/moralcode/classclockapi
[augur]        | [2025-11-18 23:04:30,955: WARNING/ForkPoolWorker-2] /augur/.venv/lib/python3.11/site-packages/celery/app/trace.py:662: RuntimeWarning: Exception raised outside body: NoResultFound('No row was found when one was required'):
[augur]        | Traceback (most recent call last):
[augur]        |   File "/augur/.venv/lib/python3.11/site-packages/celery/app/trace.py", line 453, in trace_task
[augur]        |     R = retval = fun(*args, **kwargs)
[augur]        |                  ^^^^^^^^^^^^^^^^^^^^
[augur]        |   File "/augur/.venv/lib/python3.11/site-packages/celery/app/trace.py", line 736, in __protected_call__
[augur]        |     return self.run(*args, **kwargs)
[augur]        |            ^^^^^^^^^^^^^^^^^^^^^^^^^
[augur]        |   File "/augur/augur/tasks/github/detect_move/tasks.py", line 27, in detect_github_repo_move_core
[augur]        |     ping_github_for_repo_move(session, key_auth, repo, logger)
[augur]        |   File "/augur/augur/tasks/github/detect_move/core.py", line 89, in ping_github_for_repo_move
[augur]        |     raise Exception("ERROR: Repo has moved! Resetting Collection!")
[augur]        | Exception: ERROR: Repo has moved! Resetting Collection!
[augur]        | 
[augur]        | During handling of the above exception, another exception occurred:
[augur]        | 
[augur]        | Traceback (most recent call last):
[augur]        |   File "/augur/.venv/lib/python3.11/site-packages/celery/app/trace.py", line 470, in trace_task
[augur]        |     I, R, state, retval = on_error(task_request, exc)
[augur]        |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[augur]        |   File "/augur/.venv/lib/python3.11/site-packages/celery/app/trace.py", line 381, in on_error
[augur]        |     R = I.handle_error_state(
[augur]        |         ^^^^^^^^^^^^^^^^^^^^^
[augur]        |   File "/augur/.venv/lib/python3.11/site-packages/celery/app/trace.py", line 175, in handle_error_state
[augur]        |     return {
[augur]        |            ^
[augur]        |   File "/augur/.venv/lib/python3.11/site-packages/celery/app/trace.py", line 233, in handle_failure
[augur]        |     task.on_failure(exc, req.id, req.args, req.kwargs, einfo)
[augur]        |   File "/augur/augur/tasks/init/celery_app.py", line 107, in on_failure
[augur]        |     self.augur_handle_task_failure(exc, task_id, repo_git, "core_task_failure")
[augur]        |   File "/augur/augur/tasks/init/celery_app.py", line 90, in augur_handle_task_failure
[augur]        |     repo = session.query(Repo).filter(Repo.repo_git == repo_git).one()
[augur]        |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[augur]        |   File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/orm/query.py", line 2798, in one
[augur]        |     return self._iter().one()  # type: ignore
[augur]        |            ^^^^^^^^^^^^^^^^^^
[augur]        |   File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/engine/result.py", line 1827, in one
[augur]        |     return self._only_one_row(
[augur]        |            ^^^^^^^^^^^^^^^^^^^
[augur]        |   File "/augur/.venv/lib/python3.11/site-packages/sqlalchemy/engine/result.py", line 760, in _only_one_row
[augur]        |     raise exc.NoResultFound(
[augur]        | sqlalchemy.exc.NoResultFound: No row was found when one was required
[augur]        | 
[augur]        |   warn(RuntimeWarning(
[augur]        | 
[augur]        | [... the same "ERROR: Repo has moved! Resetting Collection!" / NoResultFound traceback chain is then repeated by the core_task_failure handler and by the main process: "Task handler raised error: NoResultFound('No row was found when one was required')" ...]

@MoralCode MoralCode added the deployed version Live problems with deployed versions label Nov 19, 2025
@cdolfi
Contributor

cdolfi commented Nov 19, 2025

@MoralCode Would changing that query to be based on the repo_src_id fix the issue? Or does GitHub require the URL to get to the repo_src_id?

@MoralCode
Contributor Author

That's what I was thinking; I just have to do it. And I suspect the code for it is buried somewhere in Augur's various functions.

@MoralCode
Contributor Author

The above stack trace seems to be happening in the error handler for Augur's Celery tasks. The fact that it is still repo_url-based is a separate tech-debt issue. But the fact that we are getting it at all comes from us throwing an exception to stop collection on repo move or delete, which is the subject of #3166. I'll likely try to solve both in this PR.

@sgoggins
Member

sgoggins commented Dec 9, 2025

@MoralCode : This one appears ready.

@sgoggins sgoggins added the discussion Seeking active feedback, usually for items under active development label Dec 9, 2025
@MoralCode
Contributor Author

This one appears ready.

It largely is; however, I would like to also include a new database table alongside this fix so that, when the repo URL gets updated, the old one gets saved in a repo_aliases table and lookups can be performed using either the old URL or the new one (making it easier to check whether a repo is already in the DB when it is added; see the sketch below).

We can merge this, but data will be lost until that secondary change is in as well, and because that secondary change requires a database migration, it is largely blocked on some of the database sync/organizing PRs that are currently being reviewed.
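For illustration, such a repo_aliases table could look roughly like the following SQLAlchemy sketch (table and column names are assumptions for discussion, not the actual migration):

# Hypothetical shape of a repo_aliases table: one row per URL a repo was
# previously known by, keyed back to the internal repo_id.
from sqlalchemy import BigInteger, Column, ForeignKey, String, TIMESTAMP, UniqueConstraint, func
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class RepoAlias(Base):
    __tablename__ = "repo_aliases"
    repo_alias_id = Column(BigInteger, primary_key=True)
    repo_id = Column(BigInteger, ForeignKey("repo.repo_id"), nullable=False)
    alias_url = Column(String, nullable=False)  # the old repo_git value
    data_collection_date = Column(TIMESTAMP, server_default=func.now())
    # record each old URL at most once per repo
    __table_args__ = (UniqueConstraint("repo_id", "alias_url"),)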

…nge it

Discovered by gpt5 via claude

Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Discovered by gpt5 via claude

Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
…tch them to re-emit celery exceptions.

Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
…iases table

Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
Signed-off-by: Adrian Edwards <adredwar@redhat.com>
@MoralCode
Contributor Author

seems to be a dependence on using the repo URL for querying:

OK, so this was basically due to how the retry behavior in Celery works. I think it was retrying the same task with the same URL, but now that the repo table has the new URL it wasn't finding the row.
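For reference, a more defensive variant of the lookup in augur_handle_task_failure (from the traceback above) might look like this hypothetical sketch, assuming the same SQLAlchemy session and Repo model:

# Hypothetical sketch: tolerate a repo_git that no longer matches any row,
# e.g. because move detection already rewrote the URL before the retry ran.
def find_repo_for_failure_handler(session, logger, repo_git):
    repo = session.query(Repo).filter(Repo.repo_git == repo_git).one_or_none()
    if repo is None:
        logger.warning(f"No repo found for {repo_git}; its URL may have just been updated by move detection")
    return repo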

@MoralCode
Contributor Author

OK, this change now contains the new table and the code to populate it on move. Therefore it officially fixes #3129 🎉

I tested this locally with mild effort. I am seeing values populate the new tables when the task runs, and I was able to fix a few issues with the code, but please have someone else also test this.

Here is a set of repos I have used that still have active redirects (test one at a time so you can iterate and not struggle to find new ones because you tested them all at once):

@MoralCode MoralCode added the database Related to Augur's unifed data model label Dec 15, 2025
@sgoggins sgoggins moved this to In Progress in Augur TSC Dec 15, 2025
@MoralCode
Contributor Author

The maintainers call brought up the concern that, when GitHub is redirecting the old URL for a moved repo, another new (and different) repo can be created at the old URL, and we would need a way to disambiguate.

I suspect the way I presented it (i.e. that we would use this table for primary Augur operations) was probably wrong. After talking to Cali, it sounds like the best plan is to always use the repo source ID for operations, especially repo-uniqueness checking.

Essentially this would mean that newly added repos can use the GitHub API for most cases (valid repo, moved repo, getting the src id), and, if that fails (i.e. the repo URL is a 404), we can fall back to a "best effort" strategy where we check repo_aliases to see if there is anything, grab the most recent URL if there is one, and fail if there isn't (a sketch of that fallback follows below).

@Ulincsys does that work as far as conflict resolution? The goal would be to essentially treat this aliases table as more of a historical log for analysis/not losing data.
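Rough sketch of the fallback described above (resolve_repo, fetch_repo_from_github and find_repo_by_src_id are hypothetical helper names, and RepoAlias is the illustrative model sketched earlier, not code from this PR):

# Best-effort resolution of a user-supplied URL to a repo.
def resolve_repo(session, url):
    info = fetch_repo_from_github(url)  # assumed helper; handles valid and moved (301) repos
    if info is not None:
        # prefer deduplication by the platform's source ID
        return find_repo_by_src_id(session, info["id"]) or info
    # the URL is a 404 on the platform: fall back to the aliases table
    alias = (
        session.query(RepoAlias)
        .filter(RepoAlias.alias_url == url)
        .order_by(RepoAlias.data_collection_date.desc())
        .first()
    )
    if alias is not None:
        return session.get(Repo, alias.repo_id)
    raise ValueError(f"Could not resolve {url} to a known repository")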

@cdolfi
Contributor

cdolfi commented Dec 16, 2025

I'm sorry if that had not been clear earlier! Absolutely still using the repo src id for operations (my personal agenda is to push for everything that can be based on the src id to be), but the prior URLs are stored for historical reference. In the case of 8Knot, that info would be integrated into the search bar at some point. Happy to discuss more; I've thought about this issue a lot.

@MoralCode
Contributor Author

Part of me is a little worried about "use the src id for everything" since it is fundamentally a git-dependent value. I think it makes a ton of sense to use it when interfacing with the outside world (i.e. someone gives us a git URL and we need to check if we have it, we should ping the API and check the GitHub ID for it), but I think internal stuff (i.e. JOIN queries for data analysis, querying the list of all previously known URLs for a repo) should maybe still JOIN on the repo_id (an illustrative query is sketched below).

At this point I'm not sure whether it makes more sense to also include the src_id in this new aliases table or not.

I'm leaning no, because I think it makes sense to treat this table as essentially an internal log of historical names for analysis/informational purposes (and a last-ditch effort to resolve a user's URL to a repo that makes some sense before showing an error), but not as a primary form of deduplicating repos.
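For example, an internal analysis query along those lines (using the illustrative RepoAlias model from earlier, not code from this PR) could be:

# List every URL each repository has been known by, joined on the internal repo_id.
rows = (
    session.query(Repo.repo_id, Repo.repo_git, RepoAlias.alias_url)
    .join(RepoAlias, RepoAlias.repo_id == Repo.repo_id)
    .order_by(Repo.repo_id)
    .all()
)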

@cdolfi
Contributor

cdolfi commented Dec 16, 2025

On repo_id: completely agree, I should have been more specific. I meant for checking for uniqueness.
src_id in this new aliases table: I don't think so; it just needs the repo_id.

@sgoggins
Member

…"use the src id for everything" since it is fundamentally a git-dependent value. I think it makes a ton of sense to use it when interfacing with the outside world (i.e. someone gives us a git URL and we need to…

The src_id is not Git-dependent... each platform has its own integer identifier that follows a repository even if you change its URL.

@sgoggins
Member

I'm sorry if that had not been clear earlier! Absolutely still using the repo src id for operations (my personal agenda is to push for everything that can be based on the src id to be), but the prior URLs are stored for historical reference. In the case of 8Knot, that info would be integrated into the search bar at some point. Happy to discuss more; I've thought about this issue a lot.

Agreed: we need to keep the URLs... they just won't be our primary identifier.

@MoralCode
Contributor Author

The src_id is not Git-dependent... each platform has its own integer identifier that follows a repository even if you change its URL.

Do we know whether this applies to all the forges we plan to support (i.e. Forgejo, cgit/generic Git)?

@MoralCode
Contributor Author

Also, where does this conversation leave us as far as this PR goes?

If we essentially write to the aliases table as a log of each time a repo URL changes, does that reframing of its purpose prevent the issues that would arise (a new, different repo reusing an old URL that was previously a redirect, or something similar) if we used it for operational deduplication?

CC @Ulincsys

@cdolfi
Contributor

cdolfi commented Dec 18, 2025

@MoralCode that's how I understand it

@MoralCode
Contributor Author

Thinking about this again, I think either framing has the same issue, but the difference is essentially who deals with it.

It sounds like the possibility of the aliases table having two entries (two repo_ids) for the same URL is rare enough that it can probably be dealt with at the time of data analysis, using the collection date to differentiate.

@cdolfi does the collection date seem sufficient for distinguishing possible duplicates for analysis?

@cdolfi
Contributor

cdolfi commented Dec 18, 2025

@MoralCode yes, I'm not personally concerned about how to handle the situation where two repos had the same URL at different points in time. It is already much less difficult than navigating the current situation.

@Ulincsys
Contributor

As far as aliases for moved repos are concerned, I think there is no reason to suspect that every platform would allow such a feature to exist.

Support for such a feature would need to be implemented on a per-platform basis in Augur, possibly with a Factory or Builder design pattern approach. Here is my line of thinking:

  • For platforms that provide a unique global identifier that exists separately from the repo URL in addition to repo URL redirection for those which have moved, we can implement in the collection process the functionality of aliasing as described.
  • For platforms which do not provide both of the above, we do not implement aliasing.

This is because there is no reason to suspect that a platform which implements URL redirects must also provide a unique global identifier separate from the URL.

  • It is IMO the simplest and most robust way of doing so, but simplicity and robustness are not universally appreciated.
  • Additionally, we cannot assume that all platforms would be willing to expose such global identifiers to external API clients.

In the event that a repo alias for a supported platform returns a conflicting source ID, that entry can simply be deleted. Though I do consider myself to be a tremendous data hoarder, I see no use in maintaining an infinite changelog for repo location histories.

Please let me know if there are any questions I can answer; @cdolfi, @MoralCode

@chaoss chaoss deleted a comment from github-actions bot Dec 19, 2025

url = to_insert['repo_git']
logger.info(f"Updated repo for {url}\n")
logger.info(f"Updated repo {old_url} to {url} and set alias\n")
Contributor

In the event that an IntegrityError occurs in the above try/catch, this log statement becomes untrue.

Either the conflict must be resolved above before continuing, or a separate log statement must be issued when setting an alias fails.

Contributor Author

@MoralCode MoralCode Dec 19, 2025

Ah yep, given how small this table is, I suspect the most likely integrity error is that the URL + repo_id combination already exists in the table, so IMO logging a message and continuing is probably ideal, especially given the best-effort nature of this table (a rough sketch of that handling is below).
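Something like the following sketch (assumed shape of the insert and surrounding names; not necessarily the exact code in this PR):

# Log-and-continue handling of a duplicate alias insert; session, RepoAlias,
# repo_id, old_url, url and logger are assumed to be in scope as above.
from sqlalchemy.exc import IntegrityError

try:
    session.add(RepoAlias(repo_id=repo_id, alias_url=old_url))
    session.commit()
    logger.info(f"Updated repo {old_url} to {url} and set alias\n")
except IntegrityError:
    session.rollback()
    # most likely the (repo_id, alias_url) pair already exists; for a
    # best-effort history table, note it and continue rather than failing
    logger.info(f"Alias {old_url} already recorded for repo {repo_id}; repo updated to {url}\n")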

@MoralCode
Contributor Author

MoralCode commented Dec 19, 2025

I think the existence of this aliases table is more of a best-effort/nice-to-have/convenience-tier solution anyway.

If we wanted to be thorough about URL history, we would need a way to query that history from somewhere like GitHub, since the aliases are only sourced from the URLs people have attempted to load into Augur.

I think the best-effort nature of this generally lines up with John's point that not every platform is likely to even support it. It helps us not actively lose data when we detect a repo move, but I don't think the goal is to be perfectly comprehensive about every URL move, just to provide a basic list of other URLs we have previously seen a particular repo at.

@Ulincsys I guess my core question is: is there anything fundamentally problematic that would prevent us from merging this? As Cali mentioned, this will improve the experience of managing duplicate URLs by a lot, even if it's a stepping stone to a better solution later.

@cdolfi
Contributor

cdolfi commented Dec 19, 2025

@Ulincsys So personally I do see the value in keeping the historical log of the repo URL. Repos can change name/org location but still be known by their prior identifier. It is also helpful when doing data analysis around repo donations to foundations and things like that, as well as for foundations like Apache that change the repo name as a project progresses through graduation. Having the up-to-date repo URL is definitely the biggest priority, and 8Knot has had user issues with it for months now, but the historical record is incredibly useful from an analysis standpoint.

@cdolfi
Contributor

cdolfi commented Dec 19, 2025

I'd also say that I'd think about it like the contributor alias table. To my knowledge it does not/will not be compatible with every source, but it is still useful in the cases where we can get that information.
