Fixes for repo url update on move detection #3391
Conversation
@MoralCode : Your root cause analysis sounds DEAD ON, and it explains why this appears not to be working in the case of automatic moving. This will fix that issue.
sgoggins
left a comment
LGTM!
Still waiting on testing this to 100% confirm that it will update the repo_git URL.

Will hold off on marking it ready and merging until you green-light it, then.
Current issues with this: there seems to be a dependence on using the repo URL for querying:

@MoralCode Would changing that query to be based on the repo_src_id fix the issue? Or does GitHub require the URL to get to the repo_src_id?

That's what I was thinking; I just have to do it. And I suspect the code for it is buried somewhere in Augur's various functions.

The above stack trace seems to be happening in the error handler for Augur's celery tasks. The fact that it is still repo_url-based is a separate tech-debt issue, but the trace itself comes from us throwing an exception to stop collection on repo move or delete, which is the subject of #3166. I'll likely try to solve both in this PR.
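The control flow described here can be sketched minimally. `RepoMovedError` and `handle_task_failure` are illustrative names under my assumptions, not Augur's actual classes or handlers:

```python
# Hedged sketch of the flow described above: a dedicated exception type lets
# the error handler tell "collection halted on purpose (repo moved/deleted)"
# apart from a genuine failure. All names here are illustrative.

class RepoMovedError(Exception):
    """Raised to deliberately stop collection when a repo move or delete is detected."""

def handle_task_failure(exc):
    # Inspect the exception type rather than treating every raised
    # exception as an unexpected crash.
    if isinstance(exc, RepoMovedError):
        return "halted"    # intentional stop, not an error to report
    return "errored"       # real failure, surface it as usual
```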
@MoralCode : This one appears ready.

It largely is. However, I would also like to include a new database table alongside this fix so that, when the repo URL gets updated, the old one gets saved in a new table. We can merge this, but data will be lost until that secondary change is in as well, and because that secondary change requires a database migration, it's largely blocked on some of the database sync/organizing PRs currently under review.
OK, so this was basically due to how the retry behavior in celery works. I think it was retrying the same task with the same URL, but now that the repo table has the new URL it wasn't finding it.
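The pitfall can be shown with a toy lookup; `repos`, `find_by_url`, and `find_by_id` are hypothetical stand-ins, not Augur's real query code:

```python
# Toy illustration of the retry pitfall described above (all names are
# hypothetical): once the repo row's URL is rewritten, a retried task that
# re-queries by the old URL finds nothing, while a lookup keyed on the
# stable repo_id still succeeds.

repos = {42: {"repo_id": 42, "repo_git": "https://github.com/new-org/proj"}}

def find_by_url(url):
    return next((r for r in repos.values() if r["repo_git"] == url), None)

def find_by_id(repo_id):
    return repos.get(repo_id)

stale = find_by_url("https://github.com/old-org/proj")  # None: URL was updated
fresh = find_by_id(42)                                   # still found
```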
OK, this change now contains the new table and the code to populate it on move, so it officially fixes #3129 🎉 I tested this with mild effort locally: I am seeing values populate the new table when the task runs, and I was able to fix a few issues with the code, but please someone else also test this. Here is a set of repos I have used that still have active redirects (test one at a time so you can iterate and not struggle to find new ones because you tested them all at once):
The maintainers' call brought up the concern that, while GitHub is redirecting the old URL for a moved repo, another new (and different) repo can be created at the old URL, and we would need a way to disambiguate. I suspect the way I presented it (i.e. that we would use this table for primary Augur operations) was probably wrong. After talking to Cali, it sounds like the best plan is to always use the repo source ID for operations, especially repo-uniqueness checking. Essentially, newly added repos can use the GitHub API for most cases (valid repo, moved repo, getting the src id), and if that fails (i.e. the repo URL is a 404) we fall back to a "best effort" strategy: check repo_aliases to see if there is anything, grab the most recent URL if there is, and fail if there isn't. @Ulincsys does that work as far as conflict resolution? The goal would be to treat this aliases table as more of a historic log for analysis and for not losing data.
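The fallback strategy described in this comment could be sketched roughly as follows; `lookup_src_id_via_api` stands in for a platform API call and `aliases` for rows of the repo_aliases table, and none of this is Augur's real code:

```python
# Hedged sketch of the proposed lookup order: API first (covers valid and
# moved repos), then the alias log as a best effort, then fail.

def resolve_repo(url, lookup_src_id_via_api, aliases):
    """Resolve a URL to a source ID, falling back to the alias log.

    aliases: iterable of (url, src_id, collected_at) tuples.
    """
    src_id = lookup_src_id_via_api(url)   # valid repo or followed move
    if src_id is not None:
        return src_id
    # URL 404ed: best effort -- take the most recently collected alias match.
    matches = [a for a in aliases if a[0] == url]
    if matches:
        return max(matches, key=lambda a: a[2])[1]
    raise LookupError(f"no record of repo url: {url}")
```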
I'm sorry if that had not been clear earlier! Absolutely still using the repo src id for operations (my personal agenda is to push for everything that can be based on the src id to be), but the prior URLs are stored for historical reference. In the case of 8knot, that info would be integrated into the search bar at some point. Happy to discuss more; I've thought about this issue a lot.
Part of me is a little worried about "use the src id for everything", since it is fundamentally a Git-dependent value. I think it makes a ton of sense to use it when interfacing with the outside world (i.e. someone gives us a git URL and we need to check whether we have it, so we should ping the API and check the GitHub ID for it), but internal things (i.e. JOIN queries for data analysis, or querying the list of all previously known URLs for a repo) should maybe still JOIN on the repo_id. At this point I'm not sure whether it makes more sense to also include the src_id in this new aliases table or not. I'm leaning no, because I think it makes sense to treat this table as essentially an internal log of historical names for analysis/informational purposes (and a last-ditch effort to resolve a user's URL to a repo before showing an error), but not as a primary form of deduplicating repos.
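The internal-JOIN idea here can be illustrated with an in-memory SQLite database; the table and column names echo the discussion (repo, repo_aliases, repo_id) but are assumptions, not Augur's actual schema:

```python
# Illustrative sketch: internal queries join on the stable repo_id, never on
# a platform-specific src_id or on a URL that may have changed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE repo (repo_id INTEGER PRIMARY KEY, repo_git TEXT);
    CREATE TABLE repo_aliases (repo_id INTEGER, alias_url TEXT);
    INSERT INTO repo VALUES (1, 'https://github.com/new-org/proj');
    INSERT INTO repo_aliases VALUES (1, 'https://github.com/old-org/proj');
""")

# All previously known URLs for a repo, keyed on the internal repo_id.
rows = conn.execute("""
    SELECT r.repo_git, a.alias_url
    FROM repo r JOIN repo_aliases a ON r.repo_id = a.repo_id
""").fetchall()
```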
On repo_id: completely agree, I should have been more specific. I meant checking for uniqueness.

The src_id is not Git-dependent: each platform has its own integer identifier that follows a repository even if you change its URL.

Agreed: we need to keep the URLs; they just won't be our primary identifier.

Do we know whether this applies to all the forges we plan to support (i.e. Forgejo, cgit/generic Git)?
Also, where does this conversation leave us as far as this PR? If we essentially write to the aliases table as if it is a log of each time a repo URL changes, does that reframing of its purpose prevent the issues that would arise (a new, different repo reusing an old URL that was previously a redirect) if we used it for operational deduplication? CC @Ulincsys
@MoralCode that's how I understand it.
Thinking about this again, I think either framing has the same issue; the difference is essentially who deals with it. It sounds like the possibility of the aliases table having two entries (two repo_ids) for the same URL is rare enough that it can probably be dealt with at data-analysis time, using the collection date to differentiate. @cdolfi does the collection date seem sufficient for distinguishing possible duplicates for analysis?
@MoralCode yes, I'm personally not concerned about how to handle the situation where two repos had the same URL at different points in time. That's already much less difficult than navigating the current situation.
As far as aliases for moved repos are concerned, I think there is no reason to suspect that every platform would allow such a feature to exist. Support for it would need to be implemented on a per-platform basis in Augur, possibly with a Factory or Builder design-pattern approach. Here is my line of thinking: there is no reason to suspect that a platform which implements URL redirects must also provide a unique global identifier separate from the URL.

In the event that a repo alias for a supported platform returns a conflicting source ID, that entry can simply be deleted. Though I do consider myself a tremendous data hoarder, I see no use in maintaining an infinite changelog of repo location histories. Please let me know if there are any questions I can answer; @cdolfi, @MoralCode
```python
url = to_insert['repo_git']
logger.info(f"Updated repo for {url}\n")
logger.info(f"Updated repo {old_url} to {url} and set alias\n")
```
In the event that an IntegrityError occurs in the above try/catch, this log statement becomes untrue.
Either the conflict must be resolved above before continuing, or a separate log statement must be issued when setting an alias fails.
Ah yep. Given how small this table is, I suspect the most likely integrity error is that the url+repo_id combination already exists in the table, so IMO logging a message and continuing is probably ideal, especially given the best-effort nature of this table.
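The log-and-continue approach suggested here might look roughly like this. `IntegrityError` is a local stand-in for the database layer's exception (e.g. SQLAlchemy's), and `insert_alias` is a hypothetical insert helper, not Augur's real code:

```python
# Sketch of a best-effort alias insert that tolerates the url+repo_id pair
# already existing, logging honestly instead of claiming the alias was set.
import logging

logger = logging.getLogger(__name__)

class IntegrityError(Exception):
    """Stand-in for the DB layer's integrity-violation exception."""

def record_alias(insert_alias, repo_id, old_url):
    """Try to save the old URL; on a duplicate-row conflict, log and continue."""
    try:
        insert_alias(repo_id, old_url)
        return True
    except IntegrityError:
        # Most likely the alias row already exists; given the best-effort
        # nature of the table, this is not worth failing the task over.
        logger.info("alias %s for repo %s already recorded; continuing",
                    old_url, repo_id)
        return False
```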
I think the existence of this aliases table is more of a best-effort/nice-to-have/convenience-tier solution anyway. If we wanted to be thorough about URL history, we would need a way to query that history from somewhere like GitHub, since the aliases are only sourced from the URLs people have attempted to load into Augur. I think the best-effort nature of this generally lines up with John's point that not every platform is likely to even support it. It helps us not actively lose data when we detect a repo move, but I don't think the goal is to be perfectly comprehensive about every URL move, just to provide a basic list of other URLs we have previously seen a particular repo at. @Ulincsys I guess my core question is: is there anything fundamentally problematic that would prevent us merging this? As Cali mentioned, this will improve the experience of managing duplicate URLs a lot, even if it's a stepping stone to a better solution later.
@Ulincsys So personally I do see the value in keeping the historical log of the repo URL. Repos can change name or org location but still be known by their prior identifier. It is also helpful when doing data analysis around repo donations to foundations and things like that; foundations like Apache change the repo name as projects progress through graduation. Having the up-to-date repo URL is definitely the biggest priority, and 8knot has had user issues with it for months now, but the historical record is incredibly useful from an analysis standpoint.
I'd also say I'd think about it like the contributor alias table: to my knowledge it does not/will not work for every source, but it is still useful in the cases where we can get that information.
Description
@cdolfi has reported an issue (#3129) where repos that have moved and are redirecting when visited in a browser are not having their URLs updated to reflect the move.
Using an AI tool to look over the relevant task and identify issues for review, it identified that the `hit_api` function being used for the API calls was internally passing `follow_redirects=True` to the underlying HTTP library. This explains why the repo URLs weren't being updated: because all GitHub calls were automatically following the redirect, the check for `response_code == 301` later in the code would practically never fire. This is related to #3129 (but also slightly exacerbates it, because it doesn't yet store the old URL).
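A minimal sketch of the logic this fix enables, under the description above: once redirects are no longer followed automatically, a moved repo yields a 301 whose Location header (in any casing) carries the new URL. `detect_moved_repo` is an illustrative helper, not Augur's actual function:

```python
# Hypothetical helper: given a raw (unfollowed) response's status and
# headers, return the redirect target for a moved repo, else None.

def detect_moved_repo(status_code, headers):
    """Return the new URL if the response indicates a repo move, else None."""
    if status_code == 301:
        # Casing-agnostic Location lookup, since header casing can vary.
        for key, value in headers.items():
            if key.lower() == "location":
                return value
    return None
```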
Notes for Reviewers
I have yet to test this locally. Trying to see if I can replicate the issue, although I have a lot of repos in my local instance now that I should probably clear out....
Signed commits
Generative AI disclosure
`301` response comes back without a `Location` header. This code was reviewed and built upon by me (to make it casing-agnostic) before submitting.