Initialize the repo with zipsi-util
Changing the topic is a good opportunity to start from scratch. Previously I had
synced the .docx file as-is, and that was wrong.

The original topic was "Projektipäällikön keskeisimmät kompetenssit
ohjelmistoprojekteissa" ("The key competencies of a project manager in software projects").
anttiharju committed Jan 7, 2022
0 parents commit 6b270b0
Showing 7 changed files with 190 additions and 0 deletions.
12 changes: 12 additions & 0 deletions .gitignore
@@ -0,0 +1,12 @@
# Avoid committing file locks to remote
~$*
# Ignore .docx and .xlsx
#
# It's better to sync these with OneDrive or another cloud service as GitHub isn't meant for file
# storage and these are binary files. You can recreate the original files by running the zi.ps1
# script.
#
# The unzipped form of these files will benefit from Git's compression abilities and as the
# script also adds line breaks, the diffs will be somewhat readable.
*.docx
*.xlsx
52 changes: 52 additions & 0 deletions README.md
@@ -0,0 +1,52 @@
# Zipsi: Unzip Word and Excel documents to have readable Git diffs

## Disclaimer

I don't take responsibility if these scripts accidentally delete your work. I think the MIT license reflects that. Always have multiple backups of important work.

That said, the scripts are relatively simple. If you know a programming language and the basics of Git, I believe you will be able to figure out what the scripts do. PowerShell is a somewhat verbose scripting language, so even if you are not familiar with it, you're very likely to understand what the individual commands do.

## Description

A collection of small scripts I use to track my bachelor's thesis on GitHub. In short, .docx and .xlsx files are just zip archives, and
- [/scripts/unzi.ps1](/scripts/unzi.ps1) unzips them into folders. It also adds line breaks between the XML tags to make the diffs more readable in the GitHub commit history.
- [/scripts/zi.ps1](/scripts/zi.ps1) reconstructs the original files by zipping the unzipped folders. All operations are done on copies, meaning that **the original files shouldn't be modified by these scripts**. The reconstructed files (the ones with a "zipped" prefix) won't be unzipped again. If you want them to be unzipped again, follow the steps below:
    1. Move the original file somewhere else (to a backup folder, for example).
    2. Remove the prefix from the reconstructed file.
    3. Run the unzip script again.
- [_save.bat](_save.bat) is just a quick & easy way to run the unzip script and to push all changes.
- [_open_terminal.bat](_open_terminal.bat) is just an easy way to open Windows Terminal to manually fix things if (when) the scripts fail. You can also run [/scripts/zi.ps1](/scripts/zi.ps1) through it.
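
The unzip step described above can be sketched in a few lines. This is a minimal illustration in Python (chosen so it can be tried on any OS), not part of the scripts in this repo. The function names and the in-memory sample archive are assumptions made for the example; only the `><` replacement and the `.xml`/`.rels` filter come from [/scripts/unzi.ps1](/scripts/unzi.ps1).

```python
import io
import zipfile


def add_line_breaks(xml_text):
    """Insert a newline between adjacent tags, like the `><` replacement in unzi.ps1."""
    return xml_text.replace("><", ">\n<")


def build_sample_docx():
    """Build a tiny zip archive in memory that stands in for a .docx file (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(
            "word/document.xml",
            "<w:document><w:body><w:p>Hello</w:p></w:body></w:document>",
        )
    return buffer.getvalue()


def unzip_with_line_breaks(data):
    """Return {entry name: text} for the .xml and .rels entries, with line breaks added."""
    result = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if name.endswith((".xml", ".rels")):
                text = archive.read(name).decode("utf-8")
                result[name] = add_line_breaks(text)
    return result


if __name__ == "__main__":
    for name, text in unzip_with_line_breaks(build_sample_docx()).items():
        print(name)
        print(text)
```

The sketch keeps everything in memory, so it cannot touch any real document. The actual scripts work on copies of the files on disk instead.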

The scripts scan all subfolders for relevant files and assume they're being called from the root of the repository. They can be extended to work with other MS Office file extensions, although by default only .docx and .xlsx are supported. I don't have a need for other file formats, and PowerPoint files, for example, are more likely to contain images (large binary files), which I don't want to encourage uploading to GitHub.
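
The file scan described above could look roughly like this. It is a hedged Python sketch, not the repo's implementation: `find_documents`, `SUPPORTED_EXTENSIONS` and `EXCLUDED_PREFIXES` are hypothetical names, while the extensions and the `zipped`/`tmp_copy_` exclusions mirror the `Get-ChildItem` call in [/scripts/unzi.ps1](/scripts/unzi.ps1). Extending the set of extensions is the only change needed to cover other formats in this sketch.

```python
from pathlib import Path

# Assumption: these constants mirror the -include and -exclude arguments in unzi.ps1.
SUPPORTED_EXTENSIONS = {".docx", ".xlsx"}
EXCLUDED_PREFIXES = ("zipped", "tmp_copy_")


def find_documents(root):
    """Recursively list supported documents under root, skipping reconstructed and temporary copies."""
    root = Path(root)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file()
        and path.suffix.lower() in SUPPORTED_EXTENSIONS
        and not path.name.startswith(EXCLUDED_PREFIXES)
    )


if __name__ == "__main__":
    # Like the scripts, this assumes it is run from the root of the repository.
    for document in find_documents("."):
        print(document)
```

One difference to keep in mind: PowerShell's `-exclude` matching is case-insensitive, whereas `str.startswith` here is case-sensitive.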

## Usage
1. Create a new (private) repo on GitHub and clone that to your computer.
2. Copy the files in this repo (Code -> Download zip) to your new repo's local folder.
   - If you know what you're doing, you can also just clone this repo, but don't do it merely to get an easy update channel: I don't guarantee backwards compatibility with future versions.
3. Work on your .docx or .xlsx files.
4. Run [_save.bat](_save.bat) whenever you want to save your progress.

These scripts won't sync the original files. Therefore, I recommend using another sync system such as OneDrive to have multiple backups. I know it seems like a terrible practice, but for writing theses I think it's acceptable. Whatever works, right?

### But why?
- I find it easy to keep a journal of my progress in Git commit messages.
- With these scripts one can have their thesis available on GitHub.
- I don't think it's uncommon for people to have a repo called `bachelors-thesis` (GitHub search finds more than 4,000 repositories).
- Syncing the original binary file isn't great as you won't benefit from Git's compression abilities.
- By running the zip script anyone can have access to the original file.
- The diffs are readable* which might help you catch mistakes.
- *All text is still wrapped in a bunch of xml tags, but it's better than syncing the original file.
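
To illustrate the readability point, here is a small Python sketch using the standard `difflib` module. It is only an approximation of what a Git diff shows, and the sample XML strings and the `changed_lines` helper are made up for the example. Without line breaks the whole document is one line, so a one-word edit marks everything as changed. With line breaks only the edited tag shows up.

```python
import difflib


def add_line_breaks(xml_text):
    """Insert a newline between adjacent tags, like the `><` replacement in unzi.ps1."""
    return xml_text.replace("><", ">\n<")


def changed_lines(old, new):
    """Return only the removed ("- ") and added ("+ ") lines of a line-based diff."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


# Hypothetical document content, heavily simplified compared to real Word XML.
old_xml = (
    "<w:body><w:p><w:t>First draft</w:t></w:p>"
    "<w:p><w:t>Unchanged paragraph</w:t></w:p></w:body>"
)
new_xml = old_xml.replace("First draft", "Second draft")

# Single-line XML: the diff contains the entire document twice.
raw = changed_lines(old_xml, new_xml)

# With line breaks: the diff contains only the tag that was edited.
split = changed_lines(add_line_breaks(old_xml), add_line_breaks(new_xml))

if __name__ == "__main__":
    print("without line breaks:", raw)
    print("with line breaks:", split)
```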

## Todo
- Improve Git-only workflow.
    - Currently it's assumed that there's another sync system such as OneDrive being used alongside these scripts.
    - This has to be carefully thought out to not destroy any unsaved work.
- Remove repetition from the scripts (DRY).
- Mac version?
    - I don't have a Mac.
    - I'm also planning to transition to some Linux distro when I no longer need Windows for university (most courses assume the students are using Windows PCs).

## Requirements
- Windows Terminal
- Git
- Probably something else too; I haven't tested these scripts with a clean install. You'll figure it out if something doesn't work.
5 changes: 5 additions & 0 deletions _open_terminal.bat
@@ -0,0 +1,5 @@
:: This is in the root folder for easy access to manual debugging if (when) the scripts fail.
:: The goal of this workflow is to pretty much automate everything.

:: Opens PowerShell in Windows Terminal and bypasses execution policy to allow running PowerShell (.ps1) scripts
wt -p "Windows PowerShell" -d "%cd%" cmd /k powershell.exe -noprofile -executionpolicy bypass
2 changes: 2 additions & 0 deletions _save.bat
@@ -0,0 +1,2 @@
powershell.exe -noprofile -executionpolicy bypass -command "scripts/unzi.ps1"
powershell.exe -noprofile -executionpolicy bypass -command "scripts/git.ps1"
11 changes: 11 additions & 0 deletions scripts/git.ps1
@@ -0,0 +1,11 @@
git status
Write-Output "`nDo you want to add all and commit?`n"
Pause
git add --all
git commit
Write-Output "`nPush to remote?`n"
Pause
git push
# Write-Output "`nOpen the repository in Microsoft Edge?`n"
# Pause
# Start-Process microsoft-edge:<insert-url-here>
48 changes: 48 additions & 0 deletions scripts/unzi.ps1
@@ -0,0 +1,48 @@
$files = Get-ChildItem .\ -recurse -include *.docx,*.xlsx -exclude zipped*,tmp_copy_*

# Create folders
ForEach ($file in $files)
{
    $path = Split-Path -Path $file.FullName
    # Take the extension from the file itself so that names with extra dots (e.g. "my.thesis.docx") still work
    $extension = $file.Extension.TrimStart('.')
    $folderName = ($file.BaseName + "_unzipped_" + $extension)
    $folderPath = ($path + "\" + $folderName)
    $destinationPath = ($path + "\tmp_copy_" + $file.BaseName + ".")

    # Start from a clean folder so that files removed from the document don't linger
    Remove-Item -LiteralPath $folderPath -Force -Recurse -ErrorAction SilentlyContinue

    $folder = New-Item -Path $path -Type Directory -Name $folderName

    # Work on a temporary copy: Expand-Archive only accepts the .zip extension
    Copy-Item -Path $file.FullName -Destination ($destinationPath + $extension)
    Rename-Item -Path ($destinationPath + $extension) -NewName ($destinationPath + "zip")
    Expand-Archive -Path ($destinationPath + "zip") -DestinationPath $folderPath
    Remove-Item ($destinationPath + "zip")
}

# Add line breaks to .docx xmls
$directories = Get-ChildItem . -recurse -filter "*_docx" -Directory

ForEach ($directory in $directories)
{
    $files = Get-ChildItem $directory.FullName -recurse | where {$_.extension -in ".xml",".rels"}

    ForEach ($file in $files)
    {
        $content = Get-Content -LiteralPath $file.FullName
        $content -replace "><", ">`n<" | Set-Content -LiteralPath $file.FullName
    }
}

## Add line breaks to .xlsx xmls - I know, I know, this should be DRY'd
$directories = Get-ChildItem . -recurse -filter "*_xlsx" -Directory

ForEach ($directory in $directories)
{
    $files = Get-ChildItem $directory.FullName -recurse | where {$_.extension -in ".xml",".rels"}

    ForEach ($file in $files)
    {
        $content = Get-Content -LiteralPath $file.FullName
        $content -replace "><", ">`n<" | Set-Content -LiteralPath $file.FullName
    }
}
60 changes: 60 additions & 0 deletions scripts/zi.ps1
@@ -0,0 +1,60 @@
# Mr. PowerShell I don't feel so good (this feels kind of hacky)
$root = Split-Path (Split-Path $MyInvocation.MyCommand.Path -Parent) -Parent

# Zip .docx
$directories = Get-ChildItem . -recurse -filter "*_unzipped_docx" -Directory
ForEach ($directory in $directories)
{
    # Setup
    Set-Location $directory.FullName
    $path = Split-Path $directory.FullName -Parent
    $split = ($directory.BaseName -split "_unzipped_")
    $name = $split[0]
    $extension = $split[1]
    $tmpName = ($name + "_" + $extension + ".zip")
    $tmpFullPath = ($path + "\" + $tmpName)
    $zippedName = ("zipped " + $name + "." + $extension)
    $zippedFullPath = ($path + "\" + $zippedName)

    # Remove old files
    if (Test-Path -Path $tmpFullPath -PathType Leaf) {
        Remove-Item $tmpFullPath
    }
    if (Test-Path -Path $zippedFullPath -PathType Leaf) {
        Remove-Item $zippedFullPath
    }

    # Create new files
    Compress-Archive * $tmpFullPath
    Rename-Item -Path $tmpFullPath -NewName $zippedName
}
Set-Location $root

# Zip .xlsx - yes, this should also be DRY'd
$directories = Get-ChildItem . -recurse -filter "*_unzipped_xlsx" -Directory
ForEach ($directory in $directories)
{
    # Setup
    Set-Location $directory.FullName
    $path = Split-Path $directory.FullName -Parent
    $split = ($directory.BaseName -split "_unzipped_")
    $name = $split[0]
    $extension = $split[1]
    $tmpName = ($name + "_" + $extension + ".zip")
    $tmpFullPath = ($path + "\" + $tmpName)
    $zippedName = ("zipped " + $name + "." + $extension)
    $zippedFullPath = ($path + "\" + $zippedName)

    # Remove old files
    if (Test-Path -Path $tmpFullPath -PathType Leaf) {
        Remove-Item $tmpFullPath
    }
    if (Test-Path -Path $zippedFullPath -PathType Leaf) {
        Remove-Item $zippedFullPath
    }

    # Create new files
    Compress-Archive * $tmpFullPath
    Rename-Item -Path $tmpFullPath -NewName $zippedName
}
Set-Location $root
