
Kubler download_portage_snapshot() dl_name $_TODAY timezone difference can have different name to origin  #216

@berney

The distfiles.gentoo.org mirror that hosts the portage snapshots has a portage-latest.tar.xz and dated portage-YYYYMMDD.tar.xz files (plus .bz2 variants). portage-latest.tar.xz is identical to the most recent portage-YYYYMMDD.tar.xz.

The function download_portage_snapshot() downloads the portage snapshot, with $PORTAGE_DATE defaulting to latest. It finds portage-latest.tar.xz and saves it under a $dl_name derived from $_TODAY. Because of timezone differences, the local file can end up named portage-20220914.tar.bz2 when the equivalent file on the server is portage-20220913.tar.bz2.

Later, if upstream releases a portage-20220914.tar.bz2 and $PORTAGE_DATE is set locally to 20220914, Kubler will not download the new snapshot, because it already has a file ("wrongly") named that, even though the two files differ.
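The mismatch can be reproduced without touching the mirror. This is a minimal sketch, assuming $_TODAY is derived from the local date in YYYYMMDD form (I have not quoted the actual assignment here) and that GNU date is available; date_in_tz is a hypothetical helper for illustration only.

```shell
#!/usr/bin/env bash

# Hypothetical helper: format a fixed instant (seconds since the epoch) as
# YYYYMMDD in the given timezone. Needs GNU date for the `-d @EPOCH` syntax.
function date_in_tz() {
    local tz="$1" epoch="$2"
    TZ="${tz}" date -d "@${epoch}" +%Y%m%d
}

# 1663110000 is 2022-09-13 23:00:00 UTC
date_in_tz "UTC" 1663110000       # prints 20220913
# POSIX TZ string: "UTC-12" means local time is 12 hours ahead of UTC
date_in_tz "UTC-12" 1663110000    # prints 20220914
```

The same instant yields two different dates, so a name built from the local date can run a day ahead of (or behind) the dated file on the server.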

I'm working on running Kubler in CI/CD, and I'm caching downloads and other files to speed things up. I want consistent behaviour between runs. If I run one build before midnight and another after midnight, and there have been no changes on the distfiles mirror, the second run of kubler will download the same portage-latest.tar.xz file but name it differently.

In CI/CD I want consistency, and I generally want things up to date. I like that the default is latest, but I want the local name to match the remote name.

I wrote a kubler cmd to get the latest portage filename.

#!/usr/bin/env bash

# Based off lib/core.sh `fetch_stage3_archive_name()`
# Fetch latest portage snapshot archive name/type, returns exit status 3 if no archive could be found
function fetch_portage_archive_name() {
    __fetch_portage_archive_name=
    local portage_url portage_regex remote_files remote_line remote_date remote_file_type max_cap
    portage_url="http://distfiles.gentoo.org/snapshots/"
    readarray -t remote_files <<< "$(wget -qO- "${portage_url}")"
    remote_date=0
    get_stage3_archive_regex "portage"
    # shellcheck disable=SC2154
    portage_regex="$__get_stage3_archive_regex"
    for remote_line in "${remote_files[@]}"; do
        if [[ "${remote_line}" =~ href=\"${portage_regex}\" ]]; then
            max_cap="${#BASH_REMATCH[@]}"
            is_newer_stage3_date "${remote_date}" "${BASH_REMATCH[$((max_cap-3))]}${BASH_REMATCH[$((max_cap-2))]}" \
                && { remote_date="${BASH_REMATCH[$((max_cap-3))]}${BASH_REMATCH[$((max_cap-2))]}";
                     remote_file_type="${BASH_REMATCH[$((max_cap-1))]}"; }
            # We keep going to find the latest rather than the first
        fi
    done
    [[ "${remote_date//[!0-9]/}" -eq 0 ]] && return 3
    __fetch_portage_archive_name="portage-${remote_date}.tar.${remote_file_type}"
}

function main() {
    #echo "kubler dir: ${_KUBLER_DIR}"
    #echo "current namespace: ${_NAMESPACE_DIR}"

    #echo "Finding latest portage"
    # We are abusing `fetch_stage3_archive_name()`
    ## shellcheck disable=SC2034
    #STAGE3_BASE="portage"
    ## shellcheck disable=SC2034
    #ARCH_URL="http://distfiles.gentoo.org/snapshots/"
    ## This will find the first
    #fetch_stage3_archive_name
    ## shellcheck disable=SC2154
    #echo "$__fetch_stage3_archive_name"

    # This will find the latest
    fetch_portage_archive_name
    echo "$__fetch_portage_archive_name"
}

main "$@"

This works, and I could use it to set $PORTAGE_DATE and get consistent behaviour.

$ kubler portage
portage-20220907.tar.bz2 <-- fetch_stage3_archive_name abuse
portage-20220914.tar.bz2 <-- fetch_portage_archive_name variant
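Turning that output into a $PORTAGE_DATE value could look roughly like this. It is a sketch only: portage_date_from_name is a hypothetical helper, and the commented-out line assumes the custom command prints a single file name, as the final version of main() above does.

```shell
#!/usr/bin/env bash

# Hypothetical helper: extract the YYYYMMDD part of portage-YYYYMMDD.tar.<ext>.
# Returns 1 if the name does not carry a date (e.g. portage-latest.tar.xz).
function portage_date_from_name() {
    local name="$1"
    local re='^portage-([0-9]{8})\.tar\.[a-z0-9]+$'
    if [[ "${name}" =~ ${re} ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

# In CI this would wrap the custom command, for example:
# PORTAGE_DATE="$(portage_date_from_name "$(kubler portage)")"
portage_date_from_name "portage-20220914.tar.bz2"    # prints 20220914
```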

I think it would be good to change Kubler's behaviour to download the latest YYYYMMDD portage snapshot rather than downloading and renaming portage-latest.

It might be worth refactoring fetch_stage3_archive_name() into a generic version that can either exit on the first match (current behaviour) or continue to the latest match (needed for portage snapshots), and generalising the name of get_stage3_archive_regex().
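To make the idea concrete, here is a rough sketch of what a generic matcher could look like. It is not Kubler code: find_archive_name, its arguments, and the stdin-based input are all hypothetical, and it only borrows the "return 3 when nothing is found" convention from the function above.

```shell
#!/usr/bin/env bash

# Hypothetical generic helper. Reads candidate file names on stdin, one per line.
#   $1 = extended regex whose first capture group is a YYYYMMDD date
#   $2 = "first" to stop at the first match, "latest" (default) to keep the newest
# Returns 3 if no candidate matches.
function find_archive_name() {
    local regex="$1" mode="${2:-latest}" line best_name="" best_date=0
    while IFS= read -r line; do
        [[ "${line}" =~ ${regex} ]] || continue
        if [[ "${mode}" == "first" ]]; then
            echo "${line}"
            return 0
        fi
        # Force base 10 so the comparison is purely numeric
        if (( 10#${BASH_REMATCH[1]} > best_date )); then
            best_date=$(( 10#${BASH_REMATCH[1]} ))
            best_name="${line}"
        fi
    done
    [[ -n "${best_name}" ]] || return 3
    echo "${best_name}"
}

names=$'portage-20220907.tar.bz2\nportage-20220914.tar.bz2\nportage-latest.tar.bz2'
find_archive_name '^portage-([0-9]{8})\.tar\.bz2$' latest <<< "${names}"   # prints portage-20220914.tar.bz2
find_archive_name '^portage-([0-9]{8})\.tar\.bz2$' first <<< "${names}"    # prints portage-20220907.tar.bz2
```

The two calls correspond to the two outputs shown earlier: "first" is the fetch_stage3_archive_name() behaviour, "latest" is what portage snapshots need.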

I would also like the option to prefer the archive type bz2 vs xz.
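One possible shape for that option, purely as a sketch: pick_archive_type is a hypothetical helper that takes the preferred extension followed by the extensions actually offered for a given date, and falls back to the first offered one if the preference is unavailable.

```shell
#!/usr/bin/env bash

# Hypothetical helper: $1 = preferred extension, remaining args = available extensions.
# Prints the preferred extension if it is available, otherwise the first available one.
# Returns 1 if no extensions were given.
function pick_archive_type() {
    local preferred="$1" ext
    shift
    for ext in "$@"; do
        if [[ "${ext}" == "${preferred}" ]]; then
            echo "${ext}"
            return 0
        fi
    done
    # Preferred type not offered: fall back to the first one available
    [[ $# -gt 0 ]] || return 1
    echo "$1"
}

pick_archive_type xz bz2 xz    # prints xz
pick_archive_type gz bz2 xz    # prints bz2 (fallback)
```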
