Backup and Restore functionality #2405

Open
partoneoftwo opened this issue Dec 11, 2020 · 13 comments

Comments

@partoneoftwo

partoneoftwo commented Dec 11, 2020

User story
I am the owner of a BookStack wiki that contains a lot of information I have spent huge amounts of time curating, and this information is very important to me. BookStack is great, and I use it for both business and private purposes.
Because of this heavy reliance, I want to be sure that I have a way to back up the entire BookStack instance. I also need to be able to restore this backup regardless of which BookStack version is running.

Security description
This is important for strengthening the NIST RECOVER function.
From an operational security perspective, this feature strengthens the CIA triad: whenever confidentiality, integrity, or availability has been impaired or breached, this recovery functionality is critical to have.

Describe the feature you'd like
A backup feature that lets me back up all the information, images, and structure that I have entered as an end user of BookStack.

A restore feature that lets me restore everything my backup contains: all the information, images, and structure that I have entered as an end user of BookStack. It is of course critical that a restore succeeds even if the backup package was created on an older version than the one currently installed, although I realize this is hard to achieve.

Describe the benefits this feature would bring to BookStack users
Benefits for a user:
The backup and restore feature will make it very easy to secure information that I as a user really care about.

As a user I am able to quickly restore BookStack instances, and I can do it without touching the infrastructure / container layer.

Benefits for the product/project: BookStack will be perceived as a more reliable, secure, and viable solution for scenarios where information criticality is high.

Additional context
This is needed to make the product more mature.
It is a highly usable feature which will make the product more attractive.

I am aware of feature request #43, now closed, regarding backup of BookStack data. However, it focused on backing up single pages. This feature request concerns the entire BookStack instance, covering the following data objects:

Data

  • Books
  • Books structure
  • Pages
  • Page structure
  • Page Metadata

Configuration

  • Bookstack instance user configuration
  • Users in the solution
  • Groups in the solution

Did I mention I'm a massive fan of this software?

@ssddanbrown
Member

> Did I mention I'm a massive fan of this software?

Hi @partoneoftwo, Thanks!

> This is needed to make the product more mature.
> It is a highly usable feature which will make the product more attractive.

You could make the maturity statement about pretty much any addition, to be honest, and I'm not looking to chase maturity itself. The same goes for "attractiveness". These are not primary goals of the project; I'd rather focus on improving the experience for existing users, which, as you've explained, this would also benefit.

To be honest, I'm aware this is an area that we're lacking in, at least in a manner that's intuitive.

My main concern has always been something you requested in this line:

> As a user I am able to quickly restore BookStack instances, and I can do it without touching the infrastructure / container layer.

Bringing backup mechanisms into the application layer brings a lot of risk and instability, since the application layer relies on the web server and infrastructure layers it sits upon. We could quickly get into trouble with things like timeouts, file permissions, and request/response size limits. We'd then likely end up needing a lot of configuration options to suit the different requirements and environments that BookStack may run in. I'm not saying it's not possible at all, just that it would require ongoing effort and would increase the accessibility of backups at the cost of reliability. This is why I've so far guided people towards doing backups at the infrastructure layer.

How about we instead spend some time increasing the accessibility of backups at the infrastructure layer? We could start adding some example scripts to the devops BookStack repo, then link to these in the docs with some guidance, to the point where someone with a common setup could just download the script, add it to cron for scheduling, and be done with it. These scripts could then be used in the install scripts and built into container publishes.
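As an illustration of the kind of script described above, here is a minimal sketch of an infrastructure-level backup: a database dump plus an archive of the config file and upload folders. It is not a script from the BookStack project. The function name `backup_bookstack`, the example paths, the default database name, and the assumption that MySQL/MariaDB credentials come from `~/.my.cnf` are all hypothetical.

```shell
#!/bin/bash
# Sketch of an infrastructure-level BookStack backup (database dump + files).
# Assumptions: MySQL/MariaDB credentials are read from ~/.my.cnf, and the
# function name, paths, and default database name are examples only.

backup_bookstack() {
  local app_dir="$1"              # BookStack install dir, e.g. /var/www/bookstack
  local out_dir="$2"              # Folder that receives the backup files
  local db_name="${3:-bookstack}" # Database name (assumed default)
  local stamp
  stamp="$(date +"%Y-%m-%d")"

  mkdir -p "$out_dir" || return 1

  # 1. Dump the database inside a single transaction.
  mysqldump --single-transaction "$db_name" > "$out_dir/$stamp-db.sql" || return 1

  # 2. Archive the config file and whichever upload folders exist.
  local items=() item
  for item in .env public/uploads storage/uploads themes; do
    [ -e "$app_dir/$item" ] && items+=("$item")
  done
  tar -czf "$out_dir/$stamp-files.tar.gz" -C "$app_dir" "${items[@]}" || return 1
}
```

A script that ends with a call such as `backup_bookstack /var/www/bookstack /var/backups/bookstack` could then be scheduled from cron, for example weekly with `0 3 * * 0 /usr/local/bin/bookstack-backup.sh` (a hypothetical path).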

@ssddanbrown
Member

Related to #723

@modem7

modem7 commented Jan 10, 2021

Just to add another point of view: for those on Docker, the infrastructure layer is somewhat different, and far more controllable, so this functionality, especially if it can be run as a cron job, would definitely be useful.

@numen31337

numen31337 commented Sep 1, 2021

Hey guys, while this feature is still being developed, I can share a script I use for continuous backup performed by a server once a week/month. It uses the API and a separate read-only user to perform a full backup. The parsing part is so weird because I need it to work on macOS without non-greedy Perl-style grep.

#!/bin/bash
# Exports every book as HTML through the BookStack API, using a read-only API token.

DATE_MONTH=$(date +"%Y.%m") # Naming for monthly backups
TOKEN='QcUVf4yXMKh9hh81vOzGxMRxINnBsheM:dEUhdx4o6w4359yoBXrKIPsN8yrAWgW1'
BASE_URL='http://192.168.10.19:6885'

# List the books, then pull the "id" values out of the JSON without jq:
# split the response on commas, keep the pieces containing "id", strip up to the colon.
BOOKS=$(curl --request GET --url "$BASE_URL/api/books" --header "Authorization: Token $TOKEN" --silent --stderr -)
BOOK_IDS=$(echo "$BOOKS" | awk -v RS="," '/"id"/ {print}' | sed 's/.*://')

for i in $BOOK_IDS; do
  curl --request GET --url "$BASE_URL/api/books/$i/export/html" --header "Authorization: Token $TOKEN" -o "$DATE_MONTH-$i.html"
done

Perhaps it will come in handy for someone looking for an automated solution.

Here's the full script with a few more bells and whistles, which I personally use for basic backups and which is enough for my needs. Tested on macOS and Synology NAS Linux.

#!/bin/bash
# Does a basic BookStack backup by fetching the contained HTML export of every book.
# Usage: backup.sh ~/Desktop/ "QcUVf4yXMKh9hh81vOzGxMRxINnBsheM:dEUhdx4o6w4359yoBXrKIPsN8yrAWgW1" "http://192.168.10.19:6885"

if [ "$#" -ne 3 ]; then
  echo "Illegal number of parameters."
  exit 1
fi
if [[ "$1" != */ ]]; then
  echo "Enter the path to the output folder with a trailing slash."
  exit 1
fi
if ! command -v 7z &>/dev/null; then
  echo "7z is not installed. Try brew install p7zip or sudo apt install p7zip-full."
  exit 1
fi

DATE_MONTH=$(date +"%Y.%m") # Naming for monthly backups
TOKEN="$2"
BASE_URL="$3"

# List the books, then pull the "id" values out of the JSON without jq.
BOOKS=$(curl --request GET --url "$BASE_URL/api/books" --header "Authorization: Token $TOKEN" --silent --stderr -)
BOOK_IDS=$(echo "$BOOKS" | awk -v RS="," '/"id"/ {print}' | sed 's/.*://')

if [ -z "$BOOK_IDS" ]; then
  echo "No book IDs received. Check access rights."
  exit 1
fi

OUTPUT_TMP_DIR="$1$DATE_MONTH-backup/" # Temporary folder inside the output folder
mkdir -p "$OUTPUT_TMP_DIR"
OUTPUT_FILE="$1$DATE_MONTH.zip"

for i in $BOOK_IDS; do
  FILENAME="$OUTPUT_TMP_DIR$DATE_MONTH-$i.html"
  curl --request GET --url "$BASE_URL/api/books/$i/export/html" --header "Authorization: Token $TOKEN" --silent -o "$FILENAME"
done

rm -f "$OUTPUT_FILE" # Delete the archive if it already exists
7z a "$OUTPUT_FILE" "$OUTPUT_TMP_DIR*" -bsp0 -bso0 # Archive silently; 7z expands the wildcard itself
rm -fr "$OUTPUT_TMP_DIR" # Delete temp dir

@modem7

modem7 commented Sep 1, 2021

Good shout!

I'd recommend changing your token (assuming it's your real one) though just in case.

@aslmx

aslmx commented Dec 7, 2021

@numen31337 thanks a lot for your comments here. I'm not sure whether this already existed somewhere else, but I added a little jq trickery around the JSON and now have the book name in the filename.

https://gist.github.com/aslmx/a0fded5c4b180b45a6bb54963a3643bf

@mhjor70

mhjor70 commented Dec 12, 2021

So once you have the backup, how do you restore it? Here is why I ask: I am setting up BookStack at several sites to store configuration docs for clients. Some of the "framework" of shelves->books->pages will be the same, so rather than repeat the creation on multiple sites, I would like to import the base "framework" from a master site when I start a new instance.

@patbcc

patbcc commented Feb 10, 2023

I'd like to add my few cents to this issue.

We have a production system running v22.07.3. This runs in a VM. We generally rely on VM snapshots as backups. However, as with the OP, we have a lot of content that would be lost should the snapshots not work when a restore is needed. So I was testing the backup and restore method referenced in the docs (https://www.bookstackapp.com/docs/admin/backup-restore/). In doing so I ran into an issue caused by version differences. The test environment has the latest release, 23.01.1. Although the import was successful, the site fails to load due to two missing columns in the entity_permissions table (entity_type and view). I'm guessing these were added at some point after 22.07.3.

So it would appear that this type of issue is another hurdle for adding backup and restore functionality. Either the functionality would need to determine the differences between versions and correct them, or older releases would need to be made available. In the latter case, the BookStack server could be rebuilt to the same version as the backup before restoring, then updated to the latest version (if so desired).

As a side note, the documentation should be updated to point out the issues caused by differences in the version where the backup was made and the version where the restore is occurring.
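For the version-mismatch case described above, one possible sketch is a restore that finishes by running the database migrations, since missing columns are the typical symptom of a schema that is older than the code. This assumes, without having been checked against this exact setup, that `php artisan migrate` brings a restored 22.07.3 database up to the schema expected by 23.01.1. The function name `restore_bookstack`, the file names, and the reliance on `~/.my.cnf` for credentials are hypothetical.

```shell
#!/bin/bash
# Sketch: restore an older backup into a newer BookStack install, then migrate.
# Assumptions: MySQL/MariaDB credentials are read from ~/.my.cnf, and the
# function name, file names, and default database name are examples only.

restore_bookstack() {
  local app_dir="$1"       # BookStack install dir of the newer version
  local sql_file="$2"      # Database dump taken on the older version
  local files_archive="$3" # tar.gz holding .env and the upload folders
  local db_name="${4:-bookstack}"

  # 1. Load the database dump.
  mysql "$db_name" < "$sql_file" || return 1

  # 2. Unpack the config file and uploads over the install.
  tar -xzf "$files_archive" -C "$app_dir" || return 1

  # 3. Apply the migrations added since the backup was made.
  (cd "$app_dir" && php artisan migrate --force) || return 1
}
```

Trying this on a copy of the data first seems prudent, since migrations change the database in place.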

@AuthorShin

@ssddanbrown Each page in BookStack has HTML code that you can copy and paste somewhere else to get exactly the same page/document (except the images), so why not use this?

A backup function could turn shelves and books into folders and sub-folders (since there is only one level of each, this is easy and straightforward), with a .txt file containing the HTML code of each page. That file could later be used for a restore, either via the GUI or manually, and automating this would be fairly easy, I guess.

So let's say we have a book called "Black and White" with 12 chapters, on the "Dark" shelf. The folder structure would be:

Dark (S) > Black and White (B) > chapter1 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter2 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter3 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter4 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter5 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter6 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter7 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter8 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter9 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter10 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter11 (C) > pageswiththeirtitle.txt
Dark (S) > Black and White (B) > chapter12 (C) > pageswiththeirtitle.txt

All of these folders and files would then go into a zip file, which we could also encrypt with a password.
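The folder layout proposed above could be sketched as follows. Both helper names (`page_export_path`, `write_page`) are hypothetical, and the sketch assumes that shelf, book, chapter, and page names contain no `/` characters.

```shell
#!/bin/bash
# Sketch of the proposed "(S)/(B)/(C)" export layout.
# Assumption: names contain no "/" characters; both helpers are hypothetical.

# Build the relative path of one page's .txt file.
page_export_path() {
  local shelf="$1" book="$2" chapter="$3" page="$4"
  echo "$shelf (S)/$book (B)/$chapter (C)/$page.txt"
}

# Write one page's HTML into that layout under an output folder.
write_page() {
  local out_dir="$1" html="$2" path
  shift 2
  path="$out_dir/$(page_export_path "$@")"
  mkdir -p "$(dirname "$path")" && printf '%s\n' "$html" > "$path"
}
```

For example, `write_page ./export "<p>Hello</p>" "Dark" "Black and White" "chapter1" "Intro"` would create `./export/Dark (S)/Black and White (B)/chapter1 (C)/Intro.txt`.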

@AuthorShin

Any thoughts on this one? #2405 (comment) @ssddanbrown

@ssddanbrown
Member

@AuthorShin That would be more of an export/import format than a backup/restore format, since it's quite minimal in terms of the overall related content unless you go to a lot of extra effort. If I were to do an import/export format of some kind, it'd more likely follow our API content structure, but that's out of scope for this issue.
If you want something like that, a couple of our API scripts come close; you may be able to get what you want with a little extra tweaking.

@ssddanbrown
Member

ssddanbrown commented Sep 16, 2023

As a general related update to this, earlier this year BookStack started including a System CLI, which can help automate tasks like backup and restore. It is in an alpha state.

@eleo56 eleo56 mentioned this issue Feb 27, 2024
1 task
@dw5

dw5 commented Sep 23, 2024

IMO a good example is what Snipe-IT does: it backs up content and the database into a zip file, from which everything can be imported back 1:1.

9 participants