Tools for migrating full Subversion repositories to GitHub, filtering files from the big XML repository, and creating individual Git repos that group XML parent files with their hybrid files.
- Install git / git bash
- Install git svn
- Install python > 3.9
- Install svn command line tools
- Install git filter-repo
- Use a Python venv and install the requests package: pip install requests
- Uses git svn clone to migrate history from the SVN repository to a Git repository; SVN revisions become Git commits.
- The authors file needs to map every SVN user to the new GitHub user in the right format. Refer to the example file authors.txt for the format requested by GitHub.
- After migration, run the script that validates the number of files migrated, specifically comparing the XML file count in SVN against the new Git repo. Use script: compare-files.ps1
- Reconcile the migration: check that the number of files matches and the number of revisions is correct (use the script svn_list.ps1).
Count XML files in SVN:
svn list -R <repo_url> | grep '\.xml$' | wc -l
Count XML files tracked in Git repository:
git ls-files '*.xml' | wc -l
- Install git filter-repo tool. Instructions:
# 1- Install python from windows store / ADD PYTHON TO PATH
# 2- Install git filter-repo (download source code zip) https://github.com/newren/git-filter-repo/releases/tag/v2.45.0
# 3- Add git filter-repo to the Windows system environment variables:
# & "C:\Users\Virginia\AppData\Local\Programs\Python\Python313\Scripts\git-filter-repo.exe"
- The following is a simple script containing commands from the git filter-repo tool.
- It keeps only the XML files (maintaining their history) from the whole new Git/GitHub repo.
- It is recommended to have the SVN command line tools installed for interacting with the SVN server.
Create a branch of the current repo (filtering) containing only XML files, and check that the number of files matches the SVN file count (SVN count for this Friday: 3718). Command used:
svn list -R <repo_url> | grep '\.xml$' | wc -l
## Filter to keep only XML files
git filter-repo --path-glob '*.xml'
## Clean up
git reflog expire --expire=now --all
git gc --prune=now --aggressive
## Add remote and push changes
## git remote remove origin
git remote add origin https://github.com/Genesis-Empresarial-Demo/Interface.XML.git
git push --set-upstream origin <new_branch_name>
## Verify the push
git log origin/<new_branch_name>
## create another branch and test
## git checkout main
## git checkout -b filtering1
3. XML files to be used in the final script for migration with the requested structure (grouping XML parent files with their hybrid files)
- Run the script migration-final_4.py. If it runs into memory problems (the script gets stuck, freezes, or stalls), you can rerun the same script (parallel jobs) and it will resume from the group of files that was stuck.
- If the repository already exists, the script does not fail; it overwrites the repository content with the requested structure.
- The script saves a process log to maintain traceability of jobs, making it resilient and fault tolerant.
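The resume-from-log behavior described above can be illustrated with a minimal sketch. This is not the code of migration-final_4.py; the log format (one JSON object per line), the function names, and the `handle` callback are all assumptions made for illustration.

```python
import json
from pathlib import Path


def load_done(log_path: Path) -> set:
    """Return the names of groups already recorded as finished in the log."""
    if not log_path.exists():
        return set()
    done = set()
    for line in log_path.read_text(encoding="utf-8").splitlines():
        if line.strip():
            entry = json.loads(line)
            if entry.get("status") == "done":
                done.add(entry["group"])
    return done


def process_groups(groups, handle, log_path: Path) -> list:
    """Process each group once; groups already logged as done are skipped."""
    done = load_done(log_path)
    processed = []
    with log_path.open("a", encoding="utf-8") as log:
        for group in groups:
            if group in done:
                continue
            handle(group)  # hypothetical per-group work (create repo, push files)
            log.write(json.dumps({"group": group, "status": "done"}) + "\n")
            log.flush()  # write progress out so an interrupted run can resume
            processed.append(group)
    return processed
```

Calling `process_groups` a second time with the same log file only handles the groups that were not logged as done, which is the property that makes a rerun safe in this sketch.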
- Use this script carefully, and only to destroy a lab or a specific set of repos in GitHub. It deletes repos matching the pattern:
Interface.<baseName>
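Because deletion is destructive, it can help to preview which repository names fit the pattern before anything is removed. The sketch below is not the deletion script itself; it only filters a list of names. It assumes that <baseName> is a non-empty run of letters, digits, dots, underscores, or hyphens, and the helper name `repos_to_delete` is hypothetical.

```python
import re

# Assumption: <baseName> is a non-empty run of letters, digits, '.', '_' or '-'.
REPO_PATTERN = re.compile(r"Interface\.[A-Za-z0-9._-]+")


def repos_to_delete(repo_names):
    """Return only the names that fully match Interface.<baseName>."""
    return [name for name in repo_names if REPO_PATTERN.fullmatch(name)]


# Dry run on sample names: only the exact, case-sensitive match is selected.
candidates = repos_to_delete(["Interface.XML", "Interface.", "MyInterface.XML", "Docs"])
```

Under this assumption a name such as Interface.XML, the repository used as the push remote earlier in this document, also fits the pattern, so reviewing the preview list before running the real deletion is worthwhile.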
- It helps validate and format the mapped_authors.txt list before running git svn clone. It requires the list of GitHub users in the expected format:
vhamra = <vhamra@cleveritgroup>
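A format check of this kind could look like the following sketch. It is not the validation script from this repository; it assumes one entry per line in the form `svn_user = Optional Name <email>`, which covers the example above, and the `jdoe` entry in the sample is hypothetical.

```python
import re

# Assumption: "svn_user = Optional Name <email>", one entry per line.
AUTHOR_LINE = re.compile(
    r"^(?P<svn_user>[^=\s]+)\s*=\s*(?P<name>[^<>]*?)\s*<(?P<email>[^<>\s]+)>$"
)


def invalid_author_lines(text):
    """Return (line_number, line) pairs for non-blank lines that do not match."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped and not AUTHOR_LINE.match(stripped):
            problems.append((number, stripped))
    return problems


sample = (
    "vhamra = <vhamra@cleveritgroup>\n"
    "jdoe = Jane Doe <jdoe@example.com>\n"  # hypothetical entry with a display name
    "broken line\n"
)
problems = invalid_author_lines(sample)
```

Reporting line numbers makes it easier to fix the file by hand before starting a long git svn clone run.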
- It compares the files in an SVN directory against a Git directory, to check that the number of files is the same in both after migration.
- Use it to check the number of XML files inside the Subversion repository. You can compare this count against the result of the following command:
git ls-files '*.xml' | wc -l
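The two counts can also be compared from Python. The sketch below is only an illustration and is not svn_list.ps1; the function names are hypothetical, and it assumes that `svn` and `git` are on the PATH. The counting helpers mirror the `grep '\.xml$' | wc -l` and `wc -l` steps of the commands shown in this document.

```python
import subprocess


def count_xml_lines(listing):
    """Count lines ending in '.xml' (case-sensitive), like grep '\\.xml$' | wc -l."""
    return sum(1 for line in listing.splitlines() if line.endswith(".xml"))


def count_lines(listing):
    """Count non-empty lines, like wc -l on a listing."""
    return sum(1 for line in listing.splitlines() if line)


def xml_counts(repo_url, git_dir):
    """Run both listings and return (svn_count, git_count)."""
    svn_out = subprocess.run(
        ["svn", "list", "-R", repo_url],
        capture_output=True, text=True, check=True,
    ).stdout
    git_out = subprocess.run(
        ["git", "ls-files", "*.xml"], cwd=git_dir,
        capture_output=True, text=True, check=True,
    ).stdout
    return count_xml_lines(svn_out), count_lines(git_out)
```

A call such as `xml_counts("<repo_url>", "path/to/git/repo")` returns both numbers, and the migration can be treated as reconciled for XML files when they are equal.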