diff --git a/docs/assets/adding_a_jwt_rsa_key.png b/docs/assets/adding_a_jwt_rsa_key.png
new file mode 100644
index 000000000..1c485b876
Binary files /dev/null and b/docs/assets/adding_a_jwt_rsa_key.png differ
diff --git a/docs/assets/adding_a_search_index.png b/docs/assets/adding_a_search_index.png
new file mode 100644
index 000000000..cbbe74d7b
Binary files /dev/null and b/docs/assets/adding_a_search_index.png differ
diff --git a/docs/assets/adding_a_solr_search_server.png b/docs/assets/adding_a_solr_search_server.png
new file mode 100644
index 000000000..6beef092b
Binary files /dev/null and b/docs/assets/adding_a_solr_search_server.png differ
diff --git a/docs/assets/configuring_flysystem_to_use_fedora.png b/docs/assets/configuring_flysystem_to_use_fedora.png
new file mode 100644
index 000000000..25a24299f
Binary files /dev/null and b/docs/assets/configuring_flysystem_to_use_fedora.png differ
diff --git a/docs/assets/configuring_iiif.png b/docs/assets/configuring_iiif.png
new file mode 100644
index 000000000..fdb1e1718
Binary files /dev/null and b/docs/assets/configuring_iiif.png differ
diff --git a/docs/assets/configuring_islandora.png b/docs/assets/configuring_islandora.png
new file mode 100644
index 000000000..22fa9822e
Binary files /dev/null and b/docs/assets/configuring_islandora.png differ
diff --git a/docs/assets/configuring_openseadragon.png b/docs/assets/configuring_openseadragon.png
new file mode 100644
index 000000000..cf0c98879
Binary files /dev/null and b/docs/assets/configuring_openseadragon.png differ
diff --git a/docs/assets/configuring_standard_solr_connector.png b/docs/assets/configuring_standard_solr_connector.png
new file mode 100644
index 000000000..22cb5ce9e
Binary files /dev/null and b/docs/assets/configuring_standard_solr_connector.png differ
diff --git a/docs/assets/configuring_the_jwt_rsa_key_for_use.png b/docs/assets/configuring_the_jwt_rsa_key_for_use.png
new file mode 100644
index 000000000..11da41319
Binary files /dev/null and b/docs/assets/configuring_the_jwt_rsa_key_for_use.png differ
diff --git a/docs/assets/setting_the_solr_install_directory.png b/docs/assets/setting_the_solr_install_directory.png
new file mode 100644
index 000000000..573153e0d
Binary files /dev/null and b/docs/assets/setting_the_solr_install_directory.png differ
diff --git a/docs/assets/specifying_the_solr_server.png b/docs/assets/specifying_the_solr_server.png
new file mode 100644
index 000000000..99ca13c3d
Binary files /dev/null and b/docs/assets/specifying_the_solr_server.png differ
diff --git a/docs/installation/component_overview.md b/docs/installation/component_overview.md
new file mode 100644
index 000000000..9b8274b04
--- /dev/null
+++ b/docs/installation/component_overview.md
@@ -0,0 +1,68 @@
+# Component Overview
+
+A functioning Islandora 8 stack is made up of dozens of components working together to store information in your repository, manage that information, and disseminate it intelligently to users. Whether you are running an installation with the provided Ansible playbook or installing the stack manually, it may be helpful to have a brief overview of all the components we're going to need, in the order we're going to install them, along with a brief introduction to each component's installation and configuration process.
+
+This list includes four different kinds of components:
+
+- Components which are hard-required (such as Drupal and the Islandora module)
+- Components for which defaults are provided but which can be swapped out (such as the software managing databases, or the repository's storage system)
+- Components that can't easily be swapped out but are not necessarily required (such as using Solr as the site's internal search engine)
+- Components which do not have official alternatives and are not necessarily required, but will likely exist on the vast majority of Islandora 8 installations (such as Alpaca and Crayfish)
+
+## The Webserver Stack - Apache, PHP, and MySQL/PostgreSQL
+
+Together, Apache, PHP, and MySQL/PostgreSQL make up a LAMP or LAPP server used to provide the end-user-facing components - namely, the website.
+
+**Apache** is the webserver that will serve up webpages to the public. It will also manage some internal functionality provided by Crayfish, and will expose Cantaloupe to the public. We’ll be enabling some modules, modifying the ports configuration, and editing the VirtualHost entry, which will be modified again later when we need to expose other services such as Cantaloupe.
+
+**PHP** is the runtime interpreter for the PHP code that Drupal and Crayfish run. By default, installing PHP 7.2 gives us both a command-line interpreter and an interpreter for Apache. We’re also going to install several PHP modules that are required by, or useful for, the components that use PHP.
+
+**MySQL** and **PostgreSQL** are database management systems that we will use to store information for many different components like Drupal and Fedora. By default, the Ansible playbook installs MySQL, though this can be switched to PostgreSQL. The manual installation guide recommends and walks through installing and using PostgreSQL.
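Because the components are installed in a particular order, it can help to confirm which command-line tools are already available before moving on to the next step. The following is a minimal bash sketch of such a check; the helper names `have_cmd` and `preflight` are hypothetical, and the command names you pass to it (for example `apache2`, `php`, or `psql` on Ubuntu) are assumptions about a typical install rather than an official checklist.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper: report which command-line tools are already
# available on $PATH. The command names passed in are up to you; nothing
# here is specific to Islandora.

# Succeed if the named command can be found, fail otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Print "ok" or "missing" (tab-separated from the name) for each command
# given as an argument. Returns 0 if everything was found, 1 otherwise.
preflight() {
  local cmd missing=0
  for cmd in "$@"; do
    if have_cmd "$cmd"; then
      printf '%s\t%s\n' ok "$cmd"
    else
      printf '%s\t%s\n' missing "$cmd"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example, once the webserver stack is in place, `preflight apache2 php psql` should report `ok` for each entry; tools such as `composer` and `drush` will only show up after their own sections have been completed.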
+
+## The Front-Facing CMS - Composer, Drush, and Drupal
+
+Composer will be used to install both Drupal and Drush simultaneously using Islandora's fork of the [drupal-project](https://github.com/Islandora/drupal-project) repository.
+
+**Composer** is an installer and dependency manager for PHP projects. We're going to need it to install the dependencies of any PHP code we use, including Drupal and Crayfish.
+
+**Drush** and **Drupal** are installed simultaneously using [drupal-project](https://github.com/Islandora/drupal-project). Drupal will serve up webpages and manage Islandora content, and Drush will help us get some things done from the command line.
+
+## The Web Application Server - Tomcat and Cantaloupe
+
+Several Java web applications, including Fedora and Cantaloupe, will be deployed into Tomcat via their `.war` files.
+
+**Tomcat** serves up webpages and other kinds of content much like Apache, but is specifically designed to deploy Java applications as opposed to running PHP code.
+
+**Cantaloupe** is an image tile server that Islandora will connect to and use to serve up extremely large images without adversely affecting the overall system.
+
+## The Back-End File Management Repository - Fedora, Syn, and Blazegraph
+
+Fedora will be installed in its own section, rather than as part of the Tomcat installation, as the installation process is rather involved and requires some authorization pieces to be set up in order to connect it back to Drupal and other components.
+
+**Fedora** is the default backend repository that Islandora content will be synchronized with and stored in. A great deal of configuration will be required to get it up and running, including ensuring a database is created and accessible.
+
+**Syn** is the authorization piece that allows Fedora to connect to other components.
+
+**Blazegraph** will store representative graph data about the repository that can be queried using SPARQL. Some configuration will also be required to link it back to Fedora, as well as to ensure it is being properly indexed.
+
+## The Search Engine - Solr and search_api_solr
+
+The installation of Solr itself is rather straightforward, but a configuration will have to be generated and applied from the Drupal side.
+
+**Solr** will be installed as a standalone application. Nothing of particular importance needs to happen here; the configuration will be applied when `search_api_solr` is installed.
+
+**search_api_solr** is the Drupal module that implements the Solr API for Drupal-side searches. After installing and configuring the module, the `drush solr-gsc` command will be used to generate Solr configs, and these configs will be moved to the Solr configuration location.
+
+## The Asynchronous Background Services - Crayfish
+
+**Crayfish** is a series of microservices that perform different asynchronous tasks kicked off by Islandora. It contains a series of submodules that will be installed via Composer. Later, these configured components will be connected to Alpaca.
+
+## The Broker Connecting Everything - Karaf and Alpaca
+
+**Karaf**’s job is similar to Tomcat’s, except that where Tomcat is a web-accessible endpoint for Java applications, Karaf is a container in which system-level Java applications communicate via its OSGi framework. Alpaca is one such application; it will broker messages between Fedora and Drupal, and between Drupal and various derivative generation applications.
+
+**Alpaca** contains Karaf services to manage moving information between Islandora, Fedora, and Blazegraph as well as kicking off derivative services in Crayfish. These will be configured to broker between Drupal and Fedora using an ActiveMQ queue.
+
+## Finalized Drupal Configurations
+
+**Drupal configuration** exists as a series of `.yaml` files that can either be created in a feature or exported from Drupal using the `content_sync` module. It can also be entered manually via the UI. We're going to place configuration in a few different ways: some content will be synchronized onto the site, and some core configurations from the main Islandora module will need to be run in order to facilitate ingest.
diff --git a/docs/installation/manual/configuring_drupal.md b/docs/installation/manual/configuring_drupal.md
new file mode 100644
index 000000000..70fa5f514
--- /dev/null
+++ b/docs/installation/manual/configuring_drupal.md
@@ -0,0 +1,170 @@
+# Configuring Drupal
+
+After all of the above pieces are in place, installed, configured, started, and otherwise prepared, the last thing we need to do is configure the front-end Drupal instance to wire all the installed components together.
+
+## Drupal Pre-Configuration
+
+### `settings.php`
+
+!!! notice
+    By default, `settings.php` is read-only for all users. It should be made writable while this pre-configuration is being done, then set back to `444` afterwards.
+
+Some additional settings will need to be established in your default `settings.php` before Drupal-side configuration can occur.
+
+The configuration below establishes `localhost` as a trusted host pattern, but on production sites this will need to be expanded to include the actual host patterns used by the site. These patterns are regular expressions, so anchoring them with `^` and `$` prevents them from matching unintended hosts.
+
+`/opt/drupal/web/sites/default/settings.php`
+
+**Before**:
+> 789 | 'driver' => 'pgsql',
+
+> 790 | );
+
+**After**:
+> 789 | 'driver' => 'pgsql',
+
+> 790 | );
+
+> 791 | $settings['trusted_host_patterns'] = [
+
+> 792 | '^localhost$',
+
+> 793 | ];
+
+> 794 | $settings['flysystem'] = [
+
+> 795 | 'fedora' => [
+
+> 796 | 'driver' => 'fedora',
+
+> 797 | 'config' => [
+
+> 798 | 'root' => 'http://localhost:8080/fcrepo/rest/',
+
+> 799 | ],
+
+> 800 | ],
+
+> 801 | ];
+
+Once this is done, refresh the cache so the new settings take effect.
+
+```bash
+cd /opt/drupal
+drush -y cr
+```
+
+## Islandora
+
+### Downloading Islandora
+
+The Islandora Drupal module contains the core code to create a repository ecosystem in a Drupal environment. It also includes several submodules; of importance to us is `islandora_core_feature`, which contains the key configurations that turn a Drupal site into an Islandora site.
+
+Islandora also provides an [`islandora_defaults`](https://github.com/Islandora/islandora_defaults) module that contains additional configurations considered an appropriate launching point for configuring a site. We're going to enable the `islandora_defaults` module before doing any wiring on the front end.
+
+Take note of the comments in the bash script below for an idea of which other components are expected, and which may be considered optional.
+
+```bash
+cd /opt/drupal
+# This is a convenience piece that will help speed up most of the rest of our
+# process working with Composer and Drupal.
+sudo -u www-data composer require zaporylie/composer-drupal-optimizations:^1.0
+# Since islandora_defaults is near the bottom of the dependency chain, requiring
+# it will get most of the modules and libraries we need to deploy a standard
+# Islandora site.
+sudo -u www-data composer require islandora/islandora_defaults:dev-8.x-1.x
+# These can be considered important or required depending on your site's
+# requirements; some of them represent dependencies of Islandora submodules.
+sudo -u www-data composer require drupal/pdf:1.x-dev
+sudo -u www-data composer require drupal/rest_oai_pmh:^1.0
+sudo -u www-data composer require drupal/facets:^1.3
+sudo -u www-data composer require drupal/restui:^1.16
+sudo -u www-data composer require drupal/rdfui:^1.0-beta1
+sudo -u www-data composer require drupal/content_browser:^1.0@alpha
+# These tend to be good to enable for a development environment, or just for a
+# higher quality of life when managing Islandora. That being said, devel should
+# NEVER be enabled on a production environment, as it intentionally gives the
+# user tools that compromise the security of a site.
+sudo -u www-data composer require drupal/console:~1.0
+sudo -u www-data composer require drupal/devel:^2.0
+sudo -u www-data composer require drupal/admin_toolbar:^2.0
+# Islandora also provides a theme called Carapace designed to work well out of
+# the box with an Islandora site.
+sudo -u www-data composer require islandora/carapace:dev-8.x-3.x
+```
+
+### Enabling Downloaded Components
+
+Components we've now downloaded using `composer require` can be enabled simultaneously via `drush`, which will ensure they are installed in the correct dependency order. Enabling `islandora_defaults` will also ensure all content types and configurations are set up in Islandora. The installation process for all of these modules will likely take some time.
+
+!!! notice
+    This list of modules assumes that all of the above components were downloaded using `composer require`; if this is not the case, you may need to pare down this list manually. It also includes `devel`, which, again, should not be enabled on production sites.
+
+```bash
+cd /opt/drupal
+drush -y en rdf responsive_image devel syslog serialization basic_auth rest restui search_api_solr search_api_solr_defaults facets content_browser pdf admin_toolbar islandora_defaults controlled_access_terms_defaults islandora_breadcrumbs islandora_iiif islandora_oaipmh
+# If Carapace was downloaded, now is the time to enable and set it as well.
+drush -y theme:enable carapace
+drush -y config-set system.theme default carapace
+# After all of this, rebuild the cache.
+drush -y cr
+```
+
+### Adding a JWT Configuration to Drupal
+
+To allow our installation to talk to other services via Syn, we need to establish a Drupal-side JWT configuration using the keys we generated when setting up Syn.
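Before wiring the key into Drupal, it may be worth confirming that the private key file is where the Key module will look for it and that it looks like a PEM private key. The sketch below is a hypothetical bash helper, not part of Islandora or the Key module. It only checks that the file is readable and inspects its header line, accepting both the `BEGIN RSA PRIVATE KEY` (PKCS#1) and `BEGIN PRIVATE KEY` (PKCS#8) forms, since which one you get depends on how the key was generated.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check: does a file look like a PEM-encoded private key?
# Only the header line is inspected; the key material itself is not validated.
looks_like_private_key() {
  local key_file="$1"
  # Readability is checked for the user running this function.
  if [ ! -r "$key_file" ]; then
    printf 'not readable: %s\n' "$key_file" >&2
    return 1
  fi
  # Accept PKCS#1 ("BEGIN RSA PRIVATE KEY") and PKCS#8 ("BEGIN PRIVATE KEY").
  grep -Eq -- '^-----BEGIN (RSA )?PRIVATE KEY-----$' "$key_file"
}
```

For example, `looks_like_private_key /opt/keys/syn_private.key`. Because it is the web server that ultimately reads the key, `sudo -u www-data test -r /opt/keys/syn_private.key` is a quick way to confirm that `www-data` can read it too.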
+
+Log in to your site as an administrator at `/user`, then navigate to `/admin/config/system/keys/add`. Some of the settings here are unimportant, but pay close attention to the **Key type**, which should match the key we created earlier (an RSA key), and the **File location**, which should be the location of the private key we created for Syn on the filesystem, `/opt/keys/syn_private.key`.
+
+![Adding a JWT RSA Key](../../assets/adding_a_jwt_rsa_key.png)
+
+Click **Save** to create the key.
+
+Once this key is created, navigate to `/admin/config/system/jwt` to select the key you just created from the list. Note that before the key will show up in the **Private Key** list, you need to select that key's type in the **Algorithm** section, namely `RSASSA-PKCS1-v1_5 using SHA-256 (RS256)`.
+
+![Configuring the JWT RSA Key for Use](../../assets/configuring_the_jwt_rsa_key_for_use.png)
+
+Click **Save configuration** to establish this as the JWT key configuration.
+
+### Configuring Islandora
+
+Navigate to the Islandora core configuration page at `/admin/config/islandora/core` to set up the core configuration to connect to Gemini. Of note here, the **Gemini URL** will need to be set to facilitate the connection to Fedora, and the appropriate **Bundles with Gemini URI pseudo field** types will need to be checked off.
+
+!!! notice
+    Any other Drupal content types you wish to synchronize with Fedora should also be checked off here.
+
+![Configuring Islandora](../../assets/configuring_islandora.png)
+
+### Configuring Islandora IIIF
+
+Navigate to `/admin/config/islandora/iiif` to ensure that Islandora IIIF is pointing to our Cantaloupe server.
+
+![Configuring Islandora IIIF](../../assets/configuring_iiif.png)
+
+Next, configure OpenSeadragon by navigating to `/admin/config/media/openseadragon` and ensuring everything is set up properly.
+
+![Configuring Openseadragon](../../assets/configuring_openseadragon.png)
+
+### Establishing Flysystem as the Default Download Method
+
+Navigate to `/admin/config/media/file-system` to set the **Default download method** to the `fedora` Flysystem scheme we defined in our `settings.php`.
+
+![Configuring Flysystem to Use Fedora](../../assets/configuring_flysystem_to_use_fedora.png)
+
+### Giving the Administrative User the `fedoraAdmin` Role
+
+In order for data to be pushed back to Fedora, the site administrative user needs the `fedoraAdmin` role.
+
+```bash
+cd /opt/drupal
+sudo -u www-data drush -y urol "fedoraadmin" islandora
+```
+
+### Running Feature Migrations
+
+Finally, to get everything up and running, run the Islandora Core Features and Islandora Defaults migrations.
+
+```bash
+cd /opt/drupal
+sudo -u www-data drush -y -l localhost --userid=1 mim --group=islandora
+```
diff --git a/docs/installation/manual/installing_composer_drush_and_drupal.md b/docs/installation/manual/installing_composer_drush_and_drupal.md
new file mode 100644
index 000000000..ee457165c
--- /dev/null
+++ b/docs/installation/manual/installing_composer_drush_and_drupal.md
@@ -0,0 +1,128 @@
+# Installing Composer, Drush, and Drupal
+
+## In this section, we will install:
+
+- [Composer](https://getcomposer.org/) at its current latest version, the package manager that will allow us to install PHP applications
+- Islandora's fork of the Composer project template [drupal-composer/drupal-project](https://github.com/Islandora/drupal-project), which will install, among other things:
+  - [Drush 9](https://www.drush.org/) at its latest version, the command-line PHP application for running tasks in Drupal
+  - [Drupal 8](https://www.drupal.org/) at its latest version, the content management system Islandora uses for content modelling and front-end display
+
+## Composer 1.x
+
+### Download and install Composer
+
+Composer provides PHP code that we can use to install it. After downloading and running the installer, we’re going to move the generated executable to a place in `$PATH`, removing its extension:
+
+```bash
+curl "https://getcomposer.org/installer" > composer-install.php
+chmod +x composer-install.php
+php composer-install.php
+sudo mv composer.phar /usr/local/bin/composer
+```
+
+## Drush 9 and Drupal 8
+
+### Clone `drupal-project` and run `composer install`
+
+Before we can fully install Drupal, we’re going to need to clone `drupal-project` and provision it using Composer. We’re going to install it into the `/opt/drupal` directory:
+
+```bash
+# Start by giving Drupal somewhere to live. The Drupal project is installed to
+# an existing, empty folder.
+sudo mkdir /opt/drupal
+sudo chown www-data:www-data /opt/drupal
+sudo chmod 775 /opt/drupal
+# Clone drupal-project and build it in our newly-created folder.
+git clone https://github.com/Islandora/drupal-project.git
+cd drupal-project
+# Expect this to take a little while, as this is grabbing the entire
+# requirements set for Drupal.
+sudo -u www-data composer create-project drupal-composer/drupal-project:8.x-dev /opt/drupal --no-interaction
+```
+
+### Make Drush accessible in `$PATH`
+
+Drush doesn’t need to be in `$PATH`, but not having to type out its full path every time we use it is very convenient. The rest of this guide assumes that we can simply run `drush` from the command line without referencing the full path.
+
+```bash
+sudo ln -s /opt/drupal/vendor/drush/drush/drush /usr/local/bin/drush
+```
+
+### Make the new webroot accessible in Apache
+
+Before we can proceed with the actual site installation, we’re going to need to make our new Drupal installation the default web-accessible location Apache serves up. This involves placing an appropriate `ports.conf` file and replacing the default enabled site.
+
+!!! notice
+    Out of the box, these files contain support for SSL, which we will not be setting up in this guide (and are therefore removing with these overwritten configurations), but which is **absolutely indispensable** to a production site. This guide does not recommend any particular SSL certificate authority or installation method, but you may find [DigitalOcean's tutorial](https://www.digitalocean.com/community/tutorials/how-to-install-an-ssl-certificate-from-a-commercial-certificate-authority) helpful.
+
+`/etc/apache2/ports.conf | root:root/644`
+```
+Listen 80
+```
+
+`/etc/apache2/sites-enabled/000-default.conf | root:root/644`
+```xml
+<VirtualHost *:80>
+  ServerName SERVER_NAME
+  DocumentRoot "/opt/drupal/web"
+  <Directory "/opt/drupal/web">
+    Options Indexes FollowSymLinks MultiViews
+    AllowOverride all
+    Require all granted
+  </Directory>
+  # Ensure some logging is in place.
+  ErrorLog "/var/log/apache2/localhost_error.log"
+  CustomLog "/var/log/apache2/localhost_access.log" combined
+</VirtualHost>
+```
+- `SERVER_NAME`: `localhost`
+  - For a development environment hosted on your own machine or a VM, `localhost` should suffice. Realistically, this should be the domain the server will be accessed at.
+
+Restart the Apache 2 service to apply these changes:
+
+```bash
+sudo systemctl restart apache2
+```
+
+### Prepare the PostgreSQL database
+
+In PostgreSQL, a user is simply a role that is allowed to log in. We’re going to create a database, create a user (role) with a password, and grant that user privileges on the database so that we can install Drupal into it.
+
+```bash
+# Run psql as the postgres user, the only user currently with any PostgreSQL
+# access.
+sudo -u postgres psql
+# Then, run these commands within psql itself:
+create database DRUPAL_DB;
+create user DRUPAL_DB_USER with encrypted password 'DRUPAL_DB_PASSWORD';
+grant all privileges on database DRUPAL_DB to DRUPAL_DB_USER;
+# Then, quit psql.
+\q
+```
+- `DRUPAL_DB`: `drupal8`
+  - This will be used as the core database that Drupal is installed into
+- `DRUPAL_DB_USER`: `drupal`
+  - Specifically, this is the user that will connect to the PostgreSQL database being created, not the user that will be logging into Drupal
+- `DRUPAL_DB_PASSWORD`: `drupal`
+  - This should be a secure password; it’s recommended to create it with a password generator, such as the one provided by [random.org](https://www.random.org/passwords/)
+
+### Run the Drupal installer with Drush
+
+The standard Drupal installation method involves navigating to your site’s front page and stepping through a series of forms, but we can fast-track this using Drush’s `site-install` command.
+
+```bash
+# Rather than defining the root directory in our Drush command, we're going to
+# do this from the site root context.
+cd /opt/drupal/web
+drush -y site-install standard --db-url="pgsql://DRUPAL_DB_USER:DRUPAL_DB_PASSWORD@127.0.0.1:5432/DRUPAL_DB" --site-name="SITE_NAME" --account-name=DRUPAL_LOGIN --account-pass=DRUPAL_PASS
+```
+This uses the same parameters from the above step, as well as:
+
+- `SITE_NAME`: Islandora 8
+  - This is arbitrary, and is simply used to title the site on the home page
+- `DRUPAL_LOGIN`: `islandora`
+  - The Drupal administrative username to use
+- `DRUPAL_PASS`: `islandora`
+  - The password to use for the Drupal administrative user
+
+Congratulations, you have a Drupal site! It currently isn’t really configured to do anything, but we’ll get those portions set up in the coming sections.
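If you would rather not rely on a website to generate `DRUPAL_DB_PASSWORD` (or the database passwords created in later sections), a password can also be generated locally from `/dev/urandom`. The sketch below is one way to do this in bash, assuming GNU or BSD `head`, `tr`, and `cut`; the helper names `gen_password` and `build_db_url` are hypothetical. Limiting the password to letters and digits is a deliberate choice: it means the value can be placed in the `--db-url` passed to `drush site-install` without any URL escaping.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for generating a database password and assembling the
# --db-url string that `drush site-install` takes.

# Print a random alphanumeric string of the given length (default 32).
# A fixed 4 KiB of random bytes is read and filtered down to [A-Za-z0-9]
# (roughly 990 characters on average), so this suits lengths up to a few
# hundred characters and avoids an endless pipeline that ends in SIGPIPE.
gen_password() {
  local length="${1:-32}"
  head -c 4096 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | cut -c "1-${length}"
}

# Build a PostgreSQL database URL in the form used by the drush command above.
# Arguments: user, password, database name, [host], [port].
build_db_url() {
  local user="$1" pass="$2" db="$3" host="${4:-127.0.0.1}" port="${5:-5432}"
  printf 'pgsql://%s:%s@%s:%s/%s\n' "$user" "$pass" "$host" "$port" "$db"
}
```

For example, `DRUPAL_DB_PASSWORD=$(gen_password)` could be run before creating the database user, and `--db-url="$(build_db_url drupal "$DRUPAL_DB_PASSWORD" drupal8)"` used in place of the hand-written URL; the same password must be used in the `create user` statement.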
diff --git a/docs/installation/manual/installing_crayfish.md b/docs/installation/manual/installing_crayfish.md
new file mode 100644
index 000000000..3befdab26
--- /dev/null
+++ b/docs/installation/manual/installing_crayfish.md
@@ -0,0 +1,315 @@
+# Installing Crayfish
+
+## In this section, we will install:
+- [Islandora/Crayfish](https://github.com/islandora/crayfish), the suite of microservices that power the backend of Islandora 8
+- Individual microservices underneath Crayfish
+
+## Crayfish 1.0
+
+### Installing Prerequisites
+
+Some packages need to be installed before we can proceed with installing Crayfish; these packages are used by the microservices within Crayfish. These include:
+
+- ImageMagick, which will be used for image processing. We'll be using the LYRASIS build of ImageMagick here, which supports JP2 files.
+- Tesseract, which will be used for optical character recognition. Note that by default Tesseract only understands English; other Tesseract language packs can be installed individually using `apt-get`, and a list of available packs can be found with `sudo apt-cache search tesseract-ocr`
+- FFmpeg, which will be used for audio and video processing
+- Poppler, whose `pdftotext` utility will be used for extracting text from PDFs
+
+```bash
+sudo add-apt-repository -y ppa:lyrasis/imagemagick-jp2
+sudo apt-get update
+sudo apt-get -y install imagemagick tesseract-ocr ffmpeg poppler-utils
+```
+
+### Preparing a Gemini Database
+
+This database will be set up (and will function) in much the same way as the other databases we’ve previously created.
+
+```bash
+sudo -u postgres psql
+create database CRAYFISH_DB;
+create user CRAYFISH_DB_USER with encrypted password 'CRAYFISH_DB_PASSWORD';
+grant all privileges on database CRAYFISH_DB to CRAYFISH_DB_USER;
+\q
+```
+- `CRAYFISH_DB`: `gemini`
+- `CRAYFISH_DB_USER`: `gemini`
+- `CRAYFISH_DB_PASSWORD`: `gemini`
+  - As always, this should be a secure password of some kind, and not this default.
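Since the Drupal, Gemini, and Fedora databases are all created with the same three statements, they could also be generated by a small helper and piped into `psql` instead of being typed interactively. This is a hypothetical bash sketch, not part of Crayfish. It assumes simple, unquoted identifiers such as `gemini`, and doubles any single quotes in the password so that it remains a valid SQL string literal.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the create database / create user / grant
# statements used throughout this guide. Arguments: database, user, password.
# Identifiers are interpolated unquoted, so keep them to simple names.
db_setup_sql() {
  local db="$1" user="$2" pass="$3"
  local q="'"
  # Double any single quotes so the password is a valid SQL string literal.
  local escaped_pass="${pass//$q/$q$q}"
  printf 'create database %s;\n' "$db"
  printf "create user %s with encrypted password '%s';\n" "$user" "$escaped_pass"
  printf 'grant all privileges on database %s to %s;\n' "$db" "$user"
}
```

For example, `db_setup_sql gemini gemini "$CRAYFISH_DB_PASSWORD" | sudo -u postgres psql` runs the statements above non-interactively; running it once without the pipe is a simple way to review the SQL first.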
+
+### Cloning and Installing Crayfish
+
+We’re going to clone Crayfish to `/opt`, and individually run `composer install` against each of the microservice subdirectories.
+
+```bash
+cd /opt
+sudo git clone https://github.com/Islandora/Crayfish.git crayfish
+sudo chown -R www-data:www-data crayfish
+sudo -u www-data composer install -d crayfish/Gemini
+sudo -u www-data composer install -d crayfish/Homarus
+sudo -u www-data composer install -d crayfish/Houdini
+sudo -u www-data composer install -d crayfish/Hypercube
+sudo -u www-data composer install -d crayfish/Milliner
+sudo -u www-data composer install -d crayfish/Recast
+```
+
+### Preparing Logging
+
+Not much needs to happen here; Crayfish opts for a simple logging approach, with one `.log` file for each component. We’ll create a folder where each logfile can live.
+
+```bash
+sudo mkdir /var/log/islandora
+sudo chown www-data:www-data /var/log/islandora
+```
+
+### Configuring Crayfish Components
+
+Each Crayfish component requires a `.yaml` file to ensure everything is wired up correctly.
+
+!!! notice
+    The following configuration files represent somewhat sensible defaults; you should consider the logging levels in use, as the appropriate level varies from installation to installation. Also note that in all cases, `http` URLs are being used, as this guide does not deal with setting up HTTPS support. In a production installation, this should not be the case. These files also assume a connection to a PostgreSQL database; use the `pdo_mysql` driver and port `3306` if using MySQL.
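If you are using MySQL, the driver and port changes described in the notice above only affect the two configuration files below that contain a `db.options` section (Gemini and Milliner). A minimal bash sketch for making that change once the files are in place follows; the function name is hypothetical, and it assumes the `driver: pdo_pgsql` and `port: 5432` lines appear exactly as shown in this guide. It writes the result back with `cat` rather than `mv` so that the original file keeps its `www-data` ownership and `644` permissions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: switch a Crayfish config.yaml from PostgreSQL to
# MySQL by rewriting the "driver" and "port" lines under db.options.
# Only lines reading exactly "driver: pdo_pgsql" or "port: 5432" (with any
# indentation) are changed; everything else is left as-is.
use_mysql_in_config() {
  local cfg="$1" tmp status
  tmp=$(mktemp) || return 1
  if sed -e 's/^\([[:space:]]*driver:[[:space:]]*\)pdo_pgsql[[:space:]]*$/\1pdo_mysql/' \
         -e 's/^\([[:space:]]*port:[[:space:]]*\)5432[[:space:]]*$/\13306/' \
         "$cfg" > "$tmp"; then
    # Copy the contents back instead of using mv, so the original file
    # keeps its owner and permissions.
    cat "$tmp" > "$cfg"
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

For example, `use_mysql_in_config /opt/crayfish/Gemini/cfg/config.yaml` followed by the same call for `/opt/crayfish/Milliner/cfg/config.yaml`, run as a user that can write to those files. The database name, user, and password values still need to match whatever was created in MySQL.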
+
+`/opt/crayfish/Gemini/cfg/config.yaml | www-data:www-data/644`
+```yaml
+---
+debug: false
+fedora_base_url: http://localhost:8080/fcrepo/rest
+db.options:
+  driver: pdo_pgsql
+  host: 127.0.0.1
+  port: 5432
+  dbname: CRAYFISH_DB
+  user: CRAYFISH_DB_USER
+  password: CRAYFISH_DB_PASSWORD
+log:
+  level: NOTICE
+  file: /var/log/islandora/gemini.log
+syn:
+  enable: true
+  config: /opt/fcrepo/config/syn-settings.xml
+```
+
+`/opt/crayfish/Homarus/cfg/config.yaml | www-data:www-data/644`
+```yaml
+---
+homarus:
+  executable: ffmpeg
+  mime_types:
+    valid:
+      - video/mp4
+      - video/x-msvideo
+      - video/ogg
+      - audio/x-wav
+      - audio/mpeg
+      - audio/aac
+      - image/jpeg
+      - image/png
+    default: video/mp4
+  mime_to_format:
+    valid:
+      - video/mp4_mp4
+      - video/x-msvideo_avi
+      - video/ogg_ogg
+      - audio/x-wav_wav
+      - audio/mpeg_mp3
+      - audio/aac_m4a
+      - image/jpeg_image2pipe
+      - image/png_image2pipe
+    default: mp4
+fedora_resource:
+  base_url: http://localhost:8080/fcrepo/rest
+log:
+  level: NOTICE
+  file: /var/log/islandora/homarus.log
+syn:
+  enable: true
+  config: /opt/fcrepo/config/syn-settings.xml
+```
+
+`/opt/crayfish/Houdini/cfg/config.yaml | www-data:www-data/644`
+```yaml
+---
+houdini:
+  executable: convert
+  formats:
+    valid:
+      - image/jpeg
+      - image/png
+      - image/tiff
+      - image/jp2
+    default: image/jpeg
+fedora_resource:
+  base_url: http://localhost:8080/fcrepo/rest
+log:
+  level: NOTICE
+  file: /var/log/islandora/houdini.log
+syn:
+  enable: true
+  config: /opt/fcrepo/config/syn-settings.xml
+```
+
+`/opt/crayfish/Hypercube/cfg/config.yaml | www-data:www-data/644`
+```yaml
+---
+hypercube:
+  tesseract_executable: tesseract
+  pdftotext_executable: pdftotext
+fedora_resource:
+  base_url: http://localhost:8080/fcrepo/rest
+log:
+  level: NOTICE
+  file: /var/log/islandora/hypercube.log
+syn:
+  enable: true
+  config: /opt/fcrepo/config/syn-settings.xml
+```
+
+`/opt/crayfish/Milliner/cfg/config.yaml | www-data:www-data/644`
+```yaml
+---
+fedora_base_url: http://localhost:8080/fcrepo/rest
+drupal_base_url: http://localhost
+gemini_base_uri: http://localhost/gemini
+modified_date_predicate: http://schema.org/dateModified
+strip_format_jsonld: true
+debug: false
+db.options:
+  driver: pdo_pgsql
+  host: 127.0.0.1
+  port: 5432
+  dbname: CRAYFISH_DB
+  user: CRAYFISH_DB_USER
+  password: CRAYFISH_DB_PASSWORD
+log:
+  level: NOTICE
+  file: /var/log/islandora/milliner.log
+syn:
+  enable: true
+  config: /opt/fcrepo/config/syn-settings.xml
+```
+
+`/opt/crayfish/Recast/cfg/config.yaml | www-data:www-data/644`
+```yaml
+---
+fedora_resource:
+  base_url: http://localhost:8080/fcrepo/rest
+gemini_base_url: http://localhost/gemini
+drupal_base_url: http://localhost
+debug: false
+log:
+  level: NOTICE
+  file: /var/log/islandora/recast.log
+syn:
+  enable: true
+  config: /opt/fcrepo/config/syn-settings.xml
+namespaces:
+-
+  acl: "http://www.w3.org/ns/auth/acl#"
+  fedora: "http://fedora.info/definitions/v4/repository#"
+  ldp: "http://www.w3.org/ns/ldp#"
+  memento: "http://mementoweb.org/ns#"
+  pcdm: "http://pcdm.org/models#"
+  pcdmuse: "http://pcdm.org/use#"
+  webac: "http://fedora.info/definitions/v4/webac#"
+  vcard: "http://www.w3.org/2006/vcard/ns#"
+```
+
+### Installing the Gemini Database
+
+Our Gemini database can't be used until its schema has been created, which is done by running Gemini's migrations.
+
+```bash
+cd /opt/crayfish/Gemini
+php bin/console --no-interaction migrations:migrate
+```
+
+### Creating Apache Configurations for Crayfish Components
+
+Finally, we need appropriate Apache configurations for Crayfish; these will allow other services to connect to Crayfish components via their HTTP endpoints.
+
+Each endpoint we need to be able to connect to will get its own `.conf` file, which we will then enable.
+
+!!! notice
+    These configurations could collide with Drupal routes if any routes with the same names are created in Drupal. If this is a concern, it would likely be better to reserve a subdomain or another port specifically for Crayfish. For the purposes of this installation guide, these endpoints will suffice.
+
+`/etc/apache2/conf-available/Gemini.conf | root:root/644`
+```
+Alias "/gemini" "/opt/crayfish/Gemini/src"
+<Directory "/opt/crayfish/Gemini/src">
+  FallbackResource /gemini/index.php
+  Require all granted
+  DirectoryIndex index.php
+  SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1
+</Directory>
+```
+
+`/etc/apache2/conf-available/Homarus.conf | root:root/644`
+```
+Alias "/homarus" "/opt/crayfish/Homarus/src"
+<Directory "/opt/crayfish/Homarus/src">
+  FallbackResource /homarus/index.php
+  Require all granted
+  DirectoryIndex index.php
+  SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1
+</Directory>
+```
+
+`/etc/apache2/conf-available/Houdini.conf | root:root/644`
+```
+Alias "/houdini" "/opt/crayfish/Houdini/src"
+<Directory "/opt/crayfish/Houdini/src">
+  FallbackResource /houdini/index.php
+  Require all granted
+  DirectoryIndex index.php
+  SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1
+</Directory>
+```
+
+`/etc/apache2/conf-available/Hypercube.conf | root:root/644`
+```
+Alias "/hypercube" "/opt/crayfish/Hypercube/src"
+<Directory "/opt/crayfish/Hypercube/src">
+  FallbackResource /hypercube/index.php
+  Require all granted
+  DirectoryIndex index.php
+  SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1
+</Directory>
+```
+
+`/etc/apache2/conf-available/Milliner.conf | root:root/644`
+```
+Alias "/milliner" "/opt/crayfish/Milliner/src"
+<Directory "/opt/crayfish/Milliner/src">
+  FallbackResource /milliner/index.php
+  Require all granted
+  DirectoryIndex index.php
+  SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1
+</Directory>
+```
+
+`/etc/apache2/conf-available/Recast.conf | root:root/644`
+```
+Alias "/recast" "/opt/crayfish/Recast/src"
+<Directory "/opt/crayfish/Recast/src">
+  FallbackResource /recast/index.php
+  Require all granted
+  DirectoryIndex index.php
+  SetEnvIf Authorization "(.*)" HTTP_AUTHORIZATION=$1
+</Directory>
+```
+
+### Enabling Each Crayfish Component Apache Configuration
+
+Enabling each of these configurations involves creating a symlink to it in the `conf-enabled` directory; the standard way to do this with Apache on Debian and Ubuntu is `a2enconf`.
+ +```bash +sudo a2enconf Gemini Homarus Houdini Hypercube Milliner Recast +``` + +### Restarting the Apache Service + +Finally, to get these new endpoints up and running, we need to restart the Apache service. + +``` +sudo systemctl restart apache2 +``` diff --git a/docs/installation/manual/installing_fedora_syn_and_blazegraph.md b/docs/installation/manual/installing_fedora_syn_and_blazegraph.md new file mode 100644 index 000000000..4085b3897 --- /dev/null +++ b/docs/installation/manual/installing_fedora_syn_and_blazegraph.md @@ -0,0 +1,489 @@ +# Installing Fedora, Syn, and Blazegraph + +## In this section, we will install: + +- [Fedora 5](https://duraspace.org/fedora/), the back-end repository that Islandora will use +- [Syn](https://github.com/Islandora/Syn), the authentication broker that will manage communication with Fedora +- [Blazegraph](https://blazegraph.com/), the resource index layer on top of Fedora for managing discoverability via RDF + +## Fedora 5 + +### Creating a Working Space for Fedora + +Fedora’s configuration and data won’t live with Tomcat itself; rather, we’re going to prepare a space for them to make them easier to manage. + +```bash +sudo mkdir -p /opt/fcrepo/data/objects +sudo mkdir /opt/fcrepo/config +sudo chown -R tomcat:tomcat /opt/fcrepo +``` + +### Creating a Database for Fedora + +The method for creating the database here will closely mimic the method we used to create our database for Drupal. + +```bash +sudo -u postgres psql +create database FEDORA_DB; +create user FEDORA_DB_USER with encrypted password 'FEDORA_DB_PASSWORD'; +grant all privileges on database FEDORA_DB to FEDORA_DB_USER; +\q +``` +- `FEDORA_DB`: `fcrepo` + - This will be used as the database Fedora will store the repository in. +- `FEDORA_DB_USER`: `fedora` +- `FEDORA_DB_PASSWORD`: `fedora` + - Again, this should be a secure password of some kind; leaving it as `fedora` is not recommended. 
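You can also generate the three statements above and pipe them into `psql` instead of typing them interactively. This lets you replace the default `fedora` password with a random one. The block below is a minimal sketch, not part of Islandora. It assumes a POSIX shell with `head`, `base64`, `tr`, and `/dev/urandom` available. The function names `random_password` and `fedora_db_sql` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: 24 random bytes, base64-encoded, give a
# 32-character password with no padding and no embedded newline.
random_password() {
  head -c 24 /dev/urandom | base64 | tr -d '\n'
}

# Hypothetical helper: print the Fedora database setup statements with
# real values substituted for FEDORA_DB, FEDORA_DB_USER and FEDORA_DB_PASSWORD.
fedora_db_sql() {
  db="$1"
  user="$2"
  password="$3"
  printf 'create database %s;\n' "$db"
  printf "create user %s with encrypted password '%s';\n" "$user" "$password"
  printf 'grant all privileges on database %s to %s;\n' "$db" "$user"
}

# Example usage (record the password; repository.json needs it later):
#   FEDORA_DB_PASSWORD="$(random_password)"
#   fedora_db_sql fcrepo fedora "$FEDORA_DB_PASSWORD" | sudo -u postgres psql
```

Whatever password you generate this way is the value to use wherever this guide refers to `FEDORA_DB_PASSWORD`.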
+ +### Adding a Fedora Configuration + +The Fedora configuration is going to come in a few different chunks that need to be in place before Fedora will be functional. We’re going to place several files outright, with mildly modified parameters according to our configuration. + +The basics of these configuration files have been pulled largely from the templates in [Islandora-Devops/ansible-role-fcrepo](https://github.com/islandora-devops/ansible-role-fcrepo); you may consider referencing the playbook’s templates directory for more details. + +`i8_namespaces.cnd` is a list of namespaces used by Islandora 8 that may not necessarily be present in Fedora; we add them here to ensure we can use them in queries. + +`/opt/fcrepo/config/i8_namespaces.cnd | tomcat:tomcat/644` +``` + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +``` + +We intend to have Crayfish installed later. Since Fedora needs to be able to read data from Crayfish, we need to tell Fedora that the Crayfish endpoint is a valid data source. + +`/opt/fcrepo/config/allowed_hosts.txt | tomcat:tomcat/644` +``` +http://localhost:CRAYFISH_PORT/ +``` +- `CRAYFISH_PORT`: 80 + - This guide will install Crayfish on the same port that Drupal is installed on. This may not be desirable, and if Crayfish is installed on a different port later, that change should be reflected here. + +The next part of the configuration defines where the pieces of the actual repository will live. Note that this file contains some of the defined `FEDORA_DB` variables from earlier. 
+ +`/opt/fcrepo/config/repository.json | tomcat:tomcat/644` +```json +{ + "name" : "repo", + "jndiName" : "", + "workspaces" : { + "predefined" : ["default"], + "default" : "default", + "allowCreation" : true, + "cacheSize" : 10000 + }, + "storage" : { + "persistence": { + "type" : "db", + "connectionUrl": "jdbc:postgresql://localhost:5432/FEDORA_DB", + "driver" : "org.postgresql.Driver", + "username" : "FEDORA_DB_USER", + "password" : "FEDORA_DB_PASSWORD" + }, + "binaryStorage" : { + "type" : "file", + "directory" : "/opt/fcrepo/data/binaries", + "minimumBinarySizeInBytes" : 4096 + } + }, + "security" : { + "anonymous" : { + "roles" : ["readonly","readwrite","admin"], + "useOnFailedLogin" : false + }, + "providers" : [ + { "classname" : "org.fcrepo.auth.common.BypassSecurityServletAuthenticationProvider" } + ] + }, + "garbageCollection" : { + "threadPool" : "modeshape-gc", + "initialTime" : "00:00", + "intervalInHours" : 24 + }, + "node-types" : ["fedora-node-types.cnd", "file:/opt/fcrepo/config/i8_namespaces.cnd"] +} +``` + +Finally, we need an actual `fcrepo-config.xml` to pull this configuration into place. There's nothing to edit in here by default, but pay attention to the `p:repositoryConfiguration` property of the `modeshapeRepofactory` bean, which contains the path to the `repository.json` file we made earlier. If you've placed this somewhere else, you'll need to change it here. + +`/opt/fcrepo/config/fcrepo-config.xml | tomcat:tomcat/644` +```xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + /** = servletContainerAuthFilter,headerProvider,delegatedPrincipalProvider,webACFilter + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +``` + +### Adding the Fedora Variables to `JAVA_OPTS` + +We need our Tomcat `JAVA_OPTS` to include references to our repository configuration. 
+ +`/opt/tomcat/bin/setenv.sh` + +**Before**: +> 3 | export JAVA_OPTS="-Djava.awt.headless=true -Dcantaloupe.config=/opt/cantaloupe_config/cantaloupe.properties -server -Xmx1500m -Xms1000m" + +**After**: +> 3 | export JAVA_OPTS="-Djava.awt.headless=true -Dcantaloupe.config=/opt/cantaloupe_config/cantaloupe.properties -Dfcrepo.modeshape.configuration=file:///opt/fcrepo/config/repository.json -Dfcrepo.home=/opt/fcrepo/data -Dfcrepo.spring.configuration=file:///opt/fcrepo/config/fcrepo-config.xml -server -Xmx1500m -Xms1000m" + +### Ensuring Tomcat Users Are In Place + +While not strictly necessary, we can use the `tomcat-users.xml` file to give us direct access to the Fedora endpoint. Fedora defines, out of the box, a `fedoraAdmin` and `fedoraUser` role that can be reflected in the users list for access. The following file will also include the base `tomcat` user. As always, these default passwords should likely not stay as the defaults. + +`/opt/tomcat/conf/tomcat-users.xml | tomcat:tomcat/600` +```xml + + + + + + + + + +``` +- `TOMCAT_PASSWORD`: `tomcat` +- `FEDORA_ADMIN_PASSWORD`: `islandora` +- `FEDORA_USER_PASSWORD`: `islandora` + +### Downloading and Placing the Latest Release + +Fedora `.war` files are packaged up as releases on the official GitHub repository; you can find the latest version at the releases page; the official GitHub repository is labelled as fcrepo4 but does actually contain more recent versions than 4. You should download the most recent stable release. + +```bash +sudo wget -O fcrepo.war FCREPO_WAR_URL +sudo mv fcrepo.war /opt/tomcat/webapps +sudo chown tomcat:tomcat /opt/tomcat/webapps/fcrepo.war +``` +- `FCREPO_WAR_URL`: This can be found at the [fcrepo downloads page](https://github.com/fcrepo4/fcrepo4/releases); the file you're looking for is: + - Tagged in green as the 'Latest release' + - The `.war` version of the file + +### Restarting the Tomcat Service + +As before, restart the Tomcat service to get Fedora up and running. 
+ +```bash +sudo systemctl restart tomcat +``` + +## Syn + +### Downloading the Syn JAR File + +A compiled JAR of Syn can be found on the [Syn releases page](https://github.com/Islandora/Syn/releases). We’re going to add this to the list of libraries accessible to Tomcat. + +```bash +sudo wget -P /opt/tomcat/lib SYN_JAR_URL +# Ensure the library has the correct permissions (chmod files only, so directories stay traversable). +sudo chown -R tomcat:tomcat /opt/tomcat/lib +sudo find /opt/tomcat/lib -type f -exec chmod 640 {} \; +``` +- `SYN_JAR_URL`: The latest stable release of the Syn JAR from the [releases page](https://github.com/Islandora/Syn/releases). Specifically, the JAR whose filename ends in `-all.jar` is required. + +### Generating an SSL Key for Syn + +For Islandora and Fedora to talk to each other, an RSA key pair needs to be generated for use with Syn. We’re going to make a spot where such keys can live, and generate the pair. + +```bash +sudo mkdir /opt/keys +sudo openssl genrsa -out "/opt/keys/syn_private.key" 2048 +sudo openssl rsa -pubout -in "/opt/keys/syn_private.key" -out "/opt/keys/syn_public.key" +sudo chown www-data:www-data /opt/keys/syn* +``` + +### Placing the Syn Settings + +Syn sites and tokens belong in a settings file that we’re going to reference in Tomcat. + +`/opt/fcrepo/config/syn-settings.xml | tomcat:tomcat/600` +```xml + + + ISLANDORA_SYN_TOKEN + +``` +- `ISLANDORA_SYN_TOKEN`: `islandora` + - This should be a secure generated token rather than this default; it will be configured on the Drupal side later. + +### Adding the Syn Valve to Tomcat + +To have Tomcat use our `syn-settings.xml`, we add a Syn `<Valve>` entry to Tomcat’s `context.xml`: + +`/opt/tomcat/conf/context.xml` + +**Before**: +> 29 | `-->` + +> 30 | `` + +**After**: +> 29 | `-->` + +> 30 | `` + +> 31 | `` + +### Restarting Tomcat + +Finally, restart Tomcat to apply the new configurations.
+ +```bash +sudo systemctl restart tomcat +``` + +## Blazegraph 2 + +### Creating a Working Space for Blazegraph + +Blazegraph needs a space for configurations and data; we’re going to create this space in `/opt`. + +```bash +sudo mkdir -p /opt/blazegraph/data +sudo mkdir /opt/blazegraph/conf +sudo chown -R tomcat:tomcat /opt/blazegraph +``` + +### Downloading and Placing the Blazegraph WAR + +The Blazegraph `.war` file can be found in a few different places, but to ensure we’re able to easily `wget` it, we’re going to use the [maven.org](https://maven.org/) repository link to grab it. + +```bash +cd /opt +sudo wget -O blazegraph.war BLAZEGRAPH_WAR_URL +sudo mv blazegraph.war /opt/tomcat/webapps +sudo chown tomcat:tomcat /opt/tomcat/webapps/blazegraph.war +``` +- `BLAZEGRAPH_WAR_URL`: You can find a link to this at the [Maven repository for Blazegraph](https://repo1.maven.org/maven2/com/blazegraph/bigdata-war/); you’ll want to click the link for the latest version of Blazegraph 2.1.x, then get the link to the `.war` file within that version folder. + +Once this is downloaded, give Tomcat a moment to expand it before moving on to the next step. + +### Configuring Logging + +We would like to have an appropriate logging configuration for Blazegraph, which can be useful for looking at incoming traffic and determining if anything has gone wrong with Blazegraph. Our logger isn’t going to be much different from the default logger; it can be made more or less verbose by changing the default `WARN` levels. There are several other loggers that can be enabled, like a SPARQL query trace or summary query evaluation log; if these are desired they should be added in. Consult the Blazegraph documentation for more details. + +`/opt/blazegraph/conf/log4j.properties | tomcat:tomcat/644` +``` +log4j.rootCategory=WARN, dest1 + +# Loggers. +log4j.logger.com.bigdata=WARN +log4j.logger.com.bigdata.btree=WARN + +# Normal data loader (single threaded).
+#log4j.logger.com.bigdata.rdf.store.DataLoader=INFO + +# dest1 +log4j.appender.dest1=org.apache.log4j.ConsoleAppender +log4j.appender.dest1.layout=org.apache.log4j.PatternLayout +log4j.appender.dest1.layout.ConversionPattern=%-5p: %F:%L: %m%n +#log4j.appender.dest1.layout.ConversionPattern=%-5p: %r %l: %m%n +#log4j.appender.dest1.layout.ConversionPattern=%-5p: %m%n +#log4j.appender.dest1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n +#log4j.appender.dest1.layout.ConversionPattern=%-4r(%d) [%t] %-5p %c(%l:%M) %x - %m%n + +# Rule execution log. This is a formatted log file (comma delimited). +log4j.logger.com.bigdata.relation.rule.eval.RuleLog=INFO,ruleLog +log4j.additivity.com.bigdata.relation.rule.eval.RuleLog=false +log4j.appender.ruleLog=org.apache.log4j.FileAppender +log4j.appender.ruleLog.Threshold=ALL +log4j.appender.ruleLog.File=/var/log/blazegraph/rules.log +log4j.appender.ruleLog.Append=true +log4j.appender.ruleLog.BufferedIO=false +log4j.appender.ruleLog.layout=org.apache.log4j.PatternLayout +log4j.appender.ruleLog.layout.ConversionPattern=%m +``` + +### Adding a Blazegraph Configuration + +Our configuration will be built from a few different files that we will eventually reference in `JAVA_OPTS` and directly apply to Blazegraph; these include most of the functional pieces Blazegraph requires, as well as a generalized configuration for the `islandora` namespace it will use. As with most large configurations like this, these should likely be tuned to your preferences, and the following files only represent sensible defaults. 
+ +`/opt/blazegraph/conf/RWStore.properties | tomcat:tomcat/644` +``` +com.bigdata.journal.AbstractJournal.file=/opt/blazegraph/data/blazegraph.jnl +com.bigdata.journal.AbstractJournal.bufferMode=DiskRW +com.bigdata.service.AbstractTransactionService.minReleaseAge=1 +com.bigdata.journal.Journal.groupCommit=false +com.bigdata.btree.writeRetentionQueue.capacity=4000 +com.bigdata.btree.BTree.branchingFactor=128 +com.bigdata.journal.AbstractJournal.initialExtent=209715200 +com.bigdata.journal.AbstractJournal.maximumExtent=209715200 +com.bigdata.rdf.sail.truthMaintenance=false +com.bigdata.rdf.store.AbstractTripleStore.quads=true +com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false +com.bigdata.rdf.store.AbstractTripleStore.textIndex=false +com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.NoAxioms +com.bigdata.namespace.kb.lex.com.bigdata.btree.BTree.branchingFactor=400 +com.bigdata.namespace.kb.spo.com.bigdata.btree.BTree.branchingFactor=1024 +com.bigdata.journal.Journal.collectPlatformStatistics=false +``` + +`/opt/blazegraph/conf/blazegraph.properties | tomcat:tomcat/644` +``` +com.bigdata.rdf.store.AbstractTripleStore.textIndex=false +com.bigdata.rdf.store.AbstractTripleStore.axiomsClass=com.bigdata.rdf.axioms.OwlAxioms +com.bigdata.rdf.sail.isolatableIndices=false +com.bigdata.rdf.store.AbstractTripleStore.justify=true +com.bigdata.rdf.sail.truthMaintenance=true +com.bigdata.rdf.sail.namespace=islandora +com.bigdata.rdf.store.AbstractTripleStore.quads=false +com.bigdata.namespace.islandora.lex.com.bigdata.btree.BTree.branchingFactor=400 +com.bigdata.journal.Journal.groupCommit=false +com.bigdata.namespace.islandora.spo.com.bigdata.btree.BTree.branchingFactor=1024 +com.bigdata.rdf.store.AbstractTripleStore.geoSpatial=false +com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers=false +``` + +`/opt/blazegraph/conf/inference.nt | tomcat:tomcat/644` +``` + . + . 
+``` + +### Specifying the `RWStore.properties` in `JAVA_OPTS` + +In order to enable our configuration when Tomcat starts, we need to reference the location of `RWStore.properties` in the `JAVA_OPTS` environment variable that Tomcat uses. + +`/opt/tomcat/bin/setenv.sh` + +**Before**: +> 3 | export JAVA_OPTS="-Djava.awt.headless=true -Dcantaloupe.config=/opt/cantaloupe_config/cantaloupe.properties -Dfcrepo.modeshape.configuration=file:///opt/fcrepo/config/repository.json -Dfcrepo.home=/opt/fcrepo/data -Dfcrepo.spring.configuration=file:///opt/fcrepo/config/fcrepo-config.xml -server -Xmx1500m -Xms1000m" + +**After**: +> 3 | export JAVA_OPTS="-Djava.awt.headless=true -Dcantaloupe.config=/opt/cantaloupe_config/cantaloupe.properties -Dfcrepo.modeshape.configuration=file:///opt/fcrepo/config/repository.json -Dfcrepo.home=/opt/fcrepo/data -Dfcrepo.spring.configuration=file:///opt/fcrepo/config/fcrepo-config.xml -Dcom.bigdata.rdf.sail.webapp.ConfigParams.propertyFile=/opt/blazegraph/conf/RWStore.properties -Dlog4j.configuration=file:/opt/blazegraph/conf/log4j.properties -server -Xmx1500m -Xms1000m" + +### Restarting Tomcat + +Finally, restart Tomcat to pick up the changes we’ve made. + +```bash +sudo systemctl restart tomcat +``` + +### Installing Blazegraph Namespaces and Inference + +The two other files we created, `blazegraph.properties` and `inference.nt`, contain information that Blazegraph requires in order to establish and correctly use the datasets Islandora will send to it. First, we need to create a dataset - contained in `blazegraph.properties` - and then we need to inform that dataset of the inference set we have contained in `inference.nt`. + +```bash +curl -X POST -H "Content-Type: text/plain" --data-binary @/opt/blazegraph/conf/blazegraph.properties http://localhost:8080/blazegraph/namespace +# If this worked correctly, Blazegraph should respond with "CREATED: islandora" +# to let us know it created the islandora namespace. 
+curl -X POST -H "Content-Type: text/plain" --data-binary @/opt/blazegraph/conf/inference.nt http://localhost:8080/blazegraph/namespace/islandora/sparql +# If this worked correctly, Blazegraph should respond with some XML letting us +# know it added the 2 entries from inference.nt to the namespace. +``` diff --git a/docs/installation/manual/installing_karaf_and_alpaca.md b/docs/installation/manual/installing_karaf_and_alpaca.md new file mode 100644 index 000000000..7639de40d --- /dev/null +++ b/docs/installation/manual/installing_karaf_and_alpaca.md @@ -0,0 +1,362 @@ +# Installing Karaf and Alpaca + +## In this section, we will install: + +- [Apache ActiveMQ](https://activemq.apache.org/), a messaging server that will be used to handle communication between Alpaca and other components +- [Apache Karaf](https://karaf.apache.org/), the Java application runtime that Alpaca will be deployed in +- [Islandora/Alpaca](https://github.com/Islandora/Alpaca), a suite of Java middleware applications that will handle communication between various components of Islandora 8 + +## ActiveMQ 5 + +### Installing ActiveMQ + +In our case, the default installation method for ActiveMQ via `apt-get` will suffice. + +```bash +sudo apt-get -y install activemq +``` + +This will give us: + +- A base configuration at `/var/lib/activemq/conf` +- A data storage directory at `/var/lib/activemq/data` +- The base ActiveMQ installation at `/usr/share/activemq` +- An `activemq` service that will be run on boot +- A user, `activemq`, who will be in charge of the ActiveMQ service + +## Karaf 4 + +### Creating a Karaf User + +Karaf, along with its processes and service, will be owned by a dedicated user, which keeps this portion of the stack segregated and is responsible for running the service. + +```bash +sudo addgroup karaf +sudo adduser karaf --ingroup karaf --home /opt/karaf --shell /bin/bash +``` + +As always, you will be prompted for a password, which you should create at this time.
All other options can be left blank. + +### Downloading and Placing Karaf + +Since there’s no `apt-get` installer for Karaf, we’re going to manually download and install it directly from its binary installer. + +```bash +cd /opt +sudo wget -O karaf.tar.gz KARAF_TARBALL_LINK +sudo tar -xzvf karaf.tar.gz +sudo chown -R karaf:karaf KARAF_DIRECTORY +sudo mv KARAF_DIRECTORY/* /opt/karaf +``` +- `KARAF_TARBALL_LINK`: It’s recommended to get the most recent version of Karaf 4.x. This will depend on the current version of Karaf, which can be found on the [Karaf downloads page](https://karaf.apache.org/download.html) under “Karaf Runtime”. Like Solr, you can’t directly `wget` these links, but clicking on the `.tar.gz` link for the binary distribution will bring you to a list of mirrors, as well as provide you with a recommended mirror you can use here. +- `KARAF_DIRECTORY`: This will depend on the exact version being used, but will likely be `/opt/apache-karaf-VERSION`, where `VERSION` is the current Karaf version number. + +### Configuring Karaf Logging + +We’re going to apply some basic logging to our Karaf installation that should suffice for an example. In a production installation, you may want to play around with some of these values for more personally useful logging. 
+ +```bash +sudo mkdir /var/log/karaf +sudo chown karaf:karaf /var/log/karaf +``` + +`/opt/karaf/etc/org.ops4j.pax.logging.cfg | karaf:karaf/644` +``` +# Root logger +log4j.rootLogger=INFO, out, osgi:* +log4j.throwableRenderer=org.apache.log4j.OsgiThrowableRenderer + +# File appender +log4j.appender.out=org.apache.log4j.RollingFileAppender +log4j.appender.out.layout=org.apache.log4j.PatternLayout +log4j.appender.out.layout.ConversionPattern=%d{ISO8601} | %-5.5p | %-16.16t | %-32.32c{1} | %X{bundle.id} - %X{bundle.name} - %X{bundle.version} | %m%n +log4j.appender.out.file=/var/log/karaf/karaf.log +log4j.appender.out.append=true +log4j.appender.out.maxFileSize=1MB +log4j.appender.out.maxBackupIndex=10 + +# Camel Logger +log4j.appender.camel=org.apache.log4j.RollingFileAppender +log4j.appender.camel.layout=org.apache.log4j.PatternLayout +log4j.appender.camel.layout.ConversionPattern=%d{ISO8601} | %-5.5p | %-16.16t | %-32.32c{1} | %X{bundle.id} - %X{bundle.name} - %X{bundle.version} | %m%n +log4j.appender.camel.file=/var/log/karaf/camel.log +log4j.appender.camel.append=false +log4j.appender.camel.maxFileSize=1MB +log4j.appender.camel.maxBackupIndex=10 + +log4j.logger.org.apache.camel=INFO, camel + +# Islandora Logger +log4j.appender.islandora=org.apache.log4j.RollingFileAppender +log4j.appender.islandora.layout=org.apache.log4j.PatternLayout +log4j.appender.islandora.layout.ConversionPattern=%d{ISO8601} | %-5.5p | %-16.16t | %-32.32c{1} | %X{bundle.id} - %X{bundle.name} - %X{bundle.version} | %m%n +log4j.appender.islandora.file=/var/log/karaf/islandora.log +log4j.appender.islandora.append=false +log4j.appender.islandora.maxFileSize=1MB +log4j.appender.islandora.maxBackupIndex=10 + +log4j.logger.ca.islandora.camel=INFO, islandora +``` + +### Creating a `setenv` Script for Karaf + +Similar to Tomcat, our Karaf service is going to rely on a `setenv` shell script to set the environment variables Karaf needs at runtime.
For now, this will simply be the path to `JAVA_HOME`, but the script can also set many other parameters, which you can find in the default `setenv` script. + +`/opt/karaf/bin/setenv | karaf:karaf/755` +``` +#!/bin/sh +export JAVA_HOME="PATH_TO_JAVA_HOME" +``` +- `PATH_TO_JAVA_HOME`: This will be the same `JAVA_HOME` we used when installing Tomcat, and can be found using the same method (e.g., still `/usr/lib/jvm/java-8-openjdk-amd64` if that's what it was before). + +### Initializing Karaf + +We’re going to start Karaf, then run the installer to put our configurations in place and generate a Karaf service. Once these are installed, we’re going to stop Karaf, since from then on its start/stop management should be handled via that service. + +```bash +sudo -u karaf /opt/karaf/bin/start +# You may want to wait a bit for Karaf to start. +# If you're not sure whether or not it's running, you can always run: +# ps aux | grep karaf +# to see if the server is up and running. +/opt/karaf/bin/client feature:install wrapper +/opt/karaf/bin/client wrapper:install +/opt/karaf/bin/stop +``` + +### Creating and Starting the Karaf Service + +Installing the Karaf wrapper generates several service files that can be used on different types of systems. For this example installation on an Ubuntu 18.04 machine, we want to enable the `karaf.service` service so that Karaf is properly started on boot. + +```bash +sudo systemctl enable /opt/karaf/bin/karaf.service +sudo systemctl start karaf +``` + +## Alpaca 1.0.x + +### Adding the Required Karaf Repositories + +Karaf features can be installed from several different types of sources, but the fastest and easiest way to do so is from existing repository URLs that we can just plug into Karaf to provide us feature lists prepared and ready for installation. Like most interactions with Karaf, we can add these repositories using its built-in `client`. + +!!!
notice + These repositories are updated consistently, and their updates include revised dependency lists. Commonly, when repositories are out of date or otherwise mismatched, feature installation can result in an `Unable to resolve root: missing requirement` error; for this reason, this guide recommends using recently-updated versions of these repositories. That being said, if such errors occur despite installing the latest versions of these features, the maintainer of the features repository should be informed. + +For the Karaf features we’re going to install, we need a few different repositories to be added to the list: + +```bash +/opt/karaf/bin/client repo-add mvn:org.apache.activemq/activemq-karaf/ACTIVEMQ_KARAF_VERSION/xml/features +/opt/karaf/bin/client repo-add mvn:org.apache.camel.karaf/apache-camel/APACHE_CAMEL_VERSION/xml/features +/opt/karaf/bin/client repo-add mvn:ca.islandora.alpaca/islandora-karaf/LATEST/xml/features +# XXX: This shouldn't be strictly necessary, but appears to be a missing +# upstream dependency for some fcrepo features. 
+/opt/karaf/bin/client repo-add mvn:org.apache.jena/jena-osgi-features/JENA_OSGI_VERSION/xml/features +``` +- `ACTIVEMQ_KARAF_VERSION`: The latest version of ActiveMQ Karaf 5.x.x; you can find this listed at the [activemq-karaf repository page](https://mvnrepository.com/artifact/org.apache.activemq/activemq-karaf) (e.g., 5.15.11 at the time of writing) +- `APACHE_CAMEL_VERSION`: The latest version of Apache Camel 2.x.x; you can find this listed at the [apache-camel repository page](https://mvnrepository.com/artifact/org.apache.camel.karaf/apache-camel) (e.g., 2.25.0 at the time of writing) +- `JENA_OSGI_VERSION`: The latest version of the Apache Jena OSGi features; you can find this listed at the [jena-osgi-features repository page](https://mvnrepository.com/artifact/org.apache.jena/jena-osgi-features) (e.g., 3.14.0 at the time of writing) + +### Configuring Karaf Features + +Our installed Karaf features require configuration files to know exactly where to route things coming and going from them. 
+ +`/opt/karaf/etc/ca.islandora.alpaca.http.client.cfg | karaf:karaf/644` +``` +token.value=ISLANDORA_SYN_TOKEN +``` +- `ISLANDORA_SYN_TOKEN`: This should be the same token that was established during the installation of Syn in your `syn-settings.xml` file + +`/opt/karaf/etc/org.fcrepo.camel.indexing.triplestore.cfg | karaf:karaf/644` +``` +input.stream=activemq:topic:fedora +triplestore.reindex.stream=activemq:queue:triplestore.reindex +triplestore.baseUrl=http://localhost:8080/blazegraph/namespace/islandora/sparql +``` + +`/opt/karaf/etc/ca.islandora.alpaca.indexing.triplestore.cfg | karaf:karaf/644` +``` +error.maxRedeliveries=10 +index.stream=activemq:queue:islandora-indexing-triplestore-index +delete.stream=activemq:queue:islandora-indexing-triplestore-delete +triplestore.baseUrl=http://localhost:8080/blazegraph/namespace/islandora/sparql +``` + +`/opt/karaf/etc/ca.islandora.alpaca.indexing.fcrepo.cfg | karaf:karaf/644` +``` +error.maxRedeliveries=5 +node.stream=activemq:queue:islandora-indexing-fcrepo-content +node.delete.stream=activemq:queue:islandora-indexing-fcrepo-delete +media.stream=activemq:queue:islandora-indexing-fcrepo-media +file.stream=activemq:queue:islandora-indexing-fcrepo-file +file.delete.stream=activemq:queue:islandora-indexing-fcrepo-file-delete +milliner.baseUrl=http://localhost/milliner +gemini.baseUrl=http://localhost/gemini +``` + +### Blueprinting Karaf Derivative Connectors + +For those services in Crayfish we have set up to provide derivatives to Islandora resources, we need connector blueprints to tell the derivative connector how to route incoming requests, run conversions, and return outgoing derivatives. + +Our blueprints are going to look largely similar between services, with only a few properties changing between them. Largely, these mainly just need to match the ActiveMQ queues we established in the previous configuration, and route to the correct Crayfish service. 
+ +`/opt/karaf/deploy/ca.islandora.alpaca.connector.ocr.blueprint.xml | karaf:karaf/644` +```xml + + + + + + + + + + + + + + + + + + ca.islandora.alpaca.connector.derivative + + + +``` + +`/opt/karaf/deploy/ca.islandora.alpaca.connector.houdini.blueprint.xml | karaf:karaf/644` +```xml + + + + + + + + + + + + + + + + + + ca.islandora.alpaca.connector.derivative + + + +``` + +`/opt/karaf/deploy/ca.islandora.alpaca.connector.homarus.blueprint.xml | karaf:karaf/644` +```xml + + + + + + + + + + + + + + + + + + ca.islandora.alpaca.connector.derivative + + + +``` + +`/opt/karaf/deploy/ca.islandora.alpaca.connector.fits.blueprint.xml | karaf:karaf/644` +```xml + + + + + + + + + + + + + + + + + + ca.islandora.alpaca.connector.derivative + + + +``` + +### Installing the Required Karaf Features + +Before we can configure the features we’re going to use, they need to be installed. Some of these installations may take some time. + +```bash +/opt/karaf/bin/client feature:install camel-blueprint +/opt/karaf/bin/client feature:install activemq-blueprint +/opt/karaf/bin/client feature:install fcrepo-service-activemq +# This again should not be strictly necessary, since this isn't the triplestore +# we're using, but is being included here to resolve the aforementioned +# missing link in the dependency chain. +/opt/karaf/bin/client feature:install jena +/opt/karaf/bin/client feature:install fcrepo-camel +/opt/karaf/bin/client feature:install fcrepo-indexing-triplestore +/opt/karaf/bin/client feature:install islandora-http-client +/opt/karaf/bin/client feature:install islandora-indexing-triplestore +/opt/karaf/bin/client feature:install islandora-indexing-fcrepo +/opt/karaf/bin/client feature:install islandora-connector-derivative +``` + +### Verifying Karaf Components are Running (Optional But Recommended) + +At this point, Karaf components should be up and running, but it's a good idea to double-check that this is the case. 
We can do this from within the Karaf client by taking a look at its component list. + +```bash +# Until this point, we've been running Karaf commands from outside; we can hop +# into the client, however, and run commands from directly within. +/opt/karaf/bin/client +# This takes us into the Karaf client so we can run commands. +la | grep islandora +la | grep fcrepo +# It may be a good idea to use this to look up the other components we +# installed. +logout +``` + +For the above `la | grep` commands, components that are running should be listed as `Active`. diff --git a/docs/installation/manual/installing_solr.md b/docs/installation/manual/installing_solr.md new file mode 100644 index 000000000..310db03c3 --- /dev/null +++ b/docs/installation/manual/installing_solr.md @@ -0,0 +1,138 @@ +# Installing Solr + +## In this section, we will install: +- [Apache Solr 8](https://lucene.apache.org/solr/), the search engine used to index and find Drupal content +- [search_api_solr](https://www.drupal.org/project/search_api_solr), the Solr implementation of Drupal's search API + +## Solr 8 + +### Downloading and Placing Solr + +The Solr binaries can be found at the [Solr downloads page](https://lucene.apache.org/solr/downloads.html); the most recent stable release of Solr 8 should be used. + +```bash +# While generally we download tarballs as .tar.gz files without version +# information, the Solr installer is a bit particular in that it expects a .tgz +# file with the same name as the extracted folder it contains. It's odd, and we +# can't really get around it. +wget SOLR_DOWNLOAD_LINK +tar -xzvf SOLR_TARBALL +``` +- `SOLR_DOWNLOAD_LINK`: This will depend on a few different things, not least of all the current version of Solr. The link to the `.tgz` for the binary on the downloads page will take you to a list of mirrors that Solr can be downloaded from, and provide you with a preferred mirror at the top. This preferred mirror should be used as the `SOLR_DOWNLOAD_LINK`.
+- `SOLR_TARBALL`: The filename that was downloaded, e.g., `solr-8.3.0.tgz` + +### Running the Solr Installer + +Solr includes an installer that does most of the heavy lifting of ensuring we have a Solr user, a location where Solr lives, and configurations in place to ensure it’s running on boot. + +```bash +sudo UNTARRED_SOLR_FOLDER/bin/install_solr_service.sh SOLR_TARBALL +``` +- `UNTARRED_SOLR_FOLDER`: This will likely simply be `solr-VERSION`, where `VERSION` is the version number that was downloaded. + +The port that Solr runs on can be configured at this point, but this guide expects it to run on the default port, `8983`. + +### Increasing the Open File Limit (Optional) + +Solr's installation guide recommends that you increase the open file limit so that operations aren't disrupted while Solr is trying to access things in its index. This limit can be increased while the system is running, but doing so won't persist after a reboot. You can make the increase persistent using your system's `sysctl` file: + +`/etc/sysctl.conf` + +**Before**: +> 77 | #fs.protected_symlinks=0 + +**After**: +> 77 | #fs.protected_symlinks=0 + +> 78 | fs.file-max = 65535 + +Then apply your new configuration. + +```bash +sudo sysctl -p +``` + +### Creating a New Solr Core + +Initially, our new Solr core will contain a configuration copied from the example included with the installation, so that we have something to work with when we configure this on the Drupal side. We’ll later update this with generated configurations we create in Drupal. + +```bash +cd /opt/solr +sudo mkdir -p /var/solr/data/SOLR_CORE/conf +sudo cp -r example/files/conf/* /var/solr/data/SOLR_CORE/conf +sudo chown -R solr:solr /var/solr +sudo -u solr bin/solr create -c SOLR_CORE -p 8983 +``` +- `SOLR_CORE`: `islandora8` + +### Installing `search_api_solr` + +Rather than use an out-of-the-box configuration that won’t be suitable for our purposes, we’re going to use the Drupal `search_api_solr` module to generate one for us.
This will also require us to install the module so we can create these configurations using Drush. + +```bash +cd /opt/drupal +sudo -u www-data composer require drupal/search_api_solr:^3.0 +drush -y en search_api_solr +``` + +### Configuring search_api_solr + +Before we can create configurations to use with Solr, the core we created earlier needs to be referenced in Drupal. + +Log in to the Drupal site at `/user` using the sitewide administrator username and password, then navigate to `/admin/config/search/search-api/add-server`. + +Fill out the server addition form using the following options: + +![Adding a Solr Search Server](../../assets/adding_a_solr_search_server.png) + +![Configuring the Standard Solr Connector](../../assets/configuring_standard_solr_connector.png) + +![Setting the Solr Install Directory](../../assets/setting_the_solr_install_directory.png) + +- `SERVER_NAME`: `islandora8` + - This is completely arbitrary, and is simply used to differentiate this search server configuration from all others. **Write down** or otherwise pay attention to the `machine_name` generated next to the server name you type in; this will be used in the next step. + +As a recap for this configuration: + +- **Server name** should be an arbitrary identifier for this server +- **Enabled** should be checked +- **Backend** should be set to **Solr** +- Under **CONFIGURE SOLR BACKEND**, **Solr Connector** should be set to **Standard** +- Under **CONFIGURE STANDARD SOLR CONNECTOR**: + - **HTTP protocol** is simply set to **http** since we've set this up on the same machine Drupal lives on. On a production installation, Solr should likely be installed behind an HTTPS connection. + - **Solr host** can be set to **localhost** since, again, this is set up on the same machine Drupal lives on. 
On a production installation, this may vary, especially if parts of the installation live on different servers + - **Solr port** should be set to the port Solr was installed on, which is **8983** by default + - **Solr path** should be set to the configured path to the instance of Solr; in a default installation, there is only one Solr instance, and it lives at **/** + - **Solr core** should be the name of the Solr core you created earlier, which is why it's listed as **SOLR_CORE** here +- Under **ADVANCED SERVER CONFIGURATION**, **solr.install.dir** should be set to the path where we installed Solr, which this guide has established at **/opt/solr** + +Click **Save** to create the server configuration. + +!!! notice + You can ignore the error about an incompatible Solr schema; we're going to set this up in the next step. In fact, if you refresh the page after restarting Solr in the next step, you should see the error disappear. + +### Generating and Applying Solr Configurations + +Now that our core is in place and our Drupal-side configurations exist, we’re ready to generate Solr configuration files to connect this site to our search engine. + +```bash +cd /opt/drupal +drush solr-gsc SERVER_MACHINE_NAME /opt/drupal/solrconfig.zip +unzip -d ~/solrconfig solrconfig.zip +sudo cp ~/solrconfig/* /var/solr/data/SOLR_CORE/conf +sudo systemctl restart solr +``` +- `SERVER_MACHINE_NAME`: This should be the `machine_name` that was automatically generated when creating the configuration in the above step. + +### Adding an Index + +In order for content to be indexed into Solr, a search index needs to be added to our server. Navigate to `/admin/config/search/search-api/add-index` and check off the things you'd like to be indexed. + +!!! notice + You should come back here later and reconfigure this after completing the last step in this guide.
The default indexing configuration is pretty permissive, and you may want to restrict, for example, indexed content to just Islandora-centric bundles. This guide doesn't set up the index's fields either, which are going to be almost wholly dependent on the needs of your installation. Once you complete that configuration later on, re-index Solr from the configuration page of the index we're creating here. + +![Adding a Search Index](../../assets/adding_a_search_index.png) + +![Specifying the Solr Server](../../assets/specifying_the_solr_server.png) + +Click **Save** to add your index and kick off indexing of existing items. diff --git a/docs/installation/manual/installing_tomcat_and_cantaloupe.md b/docs/installation/manual/installing_tomcat_and_cantaloupe.md new file mode 100644 index 000000000..fd034fec2 --- /dev/null +++ b/docs/installation/manual/installing_tomcat_and_cantaloupe.md @@ -0,0 +1,152 @@ +# Installing Tomcat and Cantaloupe + +## In this section, we will install: +- [Tomcat 8](https://tomcat.apache.org/download-80.cgi), the Java servlet container that will serve up some Java applications on various endpoints, including, importantly, Fedora +- [Cantaloupe 4](https://cantaloupe-project.github.io/), the image tileserver - running in Tomcat - that will be used to serve up large images in a web-accessible fashion + +## Tomcat 8 + +### Installing OpenJDK 8 + +Tomcat runs in a Java runtime environment, so we'll need one to continue. In our case, OpenJDK 8 is open-source, free to use, and can fairly simply be installed using `apt-get`: + +```bash +sudo apt-get -y install openjdk-8-jdk openjdk-8-jre +``` + +The installation of OpenJDK via `apt-get` establishes it as the de-facto Java runtime environment to be used on the system, so no further configuration is required. 
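
Tomcat will later need a `JAVA_HOME` value. As a convenience, a candidate can be derived from the `java` binary on the `PATH` by resolving its symlinks and trimming the trailing `bin/java` (and `jre`, since JDK 8 keeps its runtime in a `jre/` subdirectory). The sketch below assumes GNU `readlink -f`, as shipped with Ubuntu 18.04; the helper name `java_home_from_binary` is hypothetical, and the result should still be checked against the output of `update-alternatives`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a path to a java binary into a JAVA_HOME candidate.
# /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -> /usr/lib/jvm/java-8-openjdk-amd64
java_home_from_binary() {
  local path=${1%/bin/java}   # drop the trailing bin/java
  echo "${path%/jre}"         # JDK 8 keeps its runtime under jre/; drop that too
}

# Resolve /usr/bin/java -> /etc/alternatives/java -> the real binary, if java is installed.
if command -v java >/dev/null 2>&1; then
  java_home_from_binary "$(readlink -f "$(command -v java)")"
fi
```

Keeping the string trimming in its own function means it can be reused on any path `update-alternatives --list java` reports, not just the one currently selected.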
+ +The resultant location of the Java runtime binary (and therefore the correct value of `JAVA_HOME` when it’s referenced) will vary based on the specifics of the machine it’s being installed on; that being said, you can find its exact location using `update-alternatives`: + +```bash +update-alternatives --list java +``` + +### Creating a `tomcat` User + +Apache Tomcat, and all its processes, will be owned and managed by a specific user for the purposes of keeping parts of the stack segregated and accountable. + +```bash +sudo addgroup tomcat +sudo adduser tomcat --ingroup tomcat --home /opt/tomcat --shell /usr/sbin/nologin +``` + +You will be prompted to create a password for the `tomcat` user; all the other information requested by the `adduser` command can be skipped. + +### Downloading and Placing Tomcat 8 + +Tomcat 8 itself can be installed in several different ways. While it’s possible to install it via `apt-get`, this doesn’t give us a great deal of control over exactly how we’re going to run and manage it; as Tomcat is a critical part of the stack, it is beneficial for our purposes to have a good frame of reference for its inner workings. + +We’re going to download the latest version of Tomcat to `/opt` and set it up so that it runs automatically. Bear in mind that the following commands depend on the exact version of Tomcat 8 being downloaded; replacement variables are used to account for this. + +```bash +cd /opt +sudo wget -O tomcat.tar.gz TOMCAT_TARBALL_LINK +sudo tar -zxvf tomcat.tar.gz +sudo mv /opt/TOMCAT_DIRECTORY/* /opt/tomcat +sudo chown -R tomcat:tomcat /opt/tomcat +``` +- `TOMCAT_TARBALL_LINK`: No default can be provided here; you should navigate to the [Tomcat 8 downloads page](https://tomcat.apache.org/download-80.cgi) and grab the link to the latest `.tar.gz` file under the “Core” section of “Binary Distributions”. It is highly recommended to grab the latest version of Tomcat 8, as it will come with associated security patches and fixes.
+- `TOMCAT_DIRECTORY`: This will also depend entirely on the exact version of Tomcat downloaded - for example, `apache-tomcat-8.5.47`. After extracting the tarball, `ls /opt` can be used to find this. + +### Creating a setenv.sh Script + +When Tomcat runs, some configuration needs to be pre-established as a series of environment variables that will be used by the script that runs it. + +`/opt/tomcat/bin/setenv.sh | tomcat:tomcat/755` +``` +export CATALINA_HOME="/opt/tomcat" +export JAVA_HOME="PATH_TO_JAVA_HOME" +export JAVA_OPTS="-Djava.awt.headless=true -server -Xmx1500m -Xms1000m" +``` +- `PATH_TO_JAVA_HOME`: This will vary a bit depending on the environment, but will likely live somewhere in `/usr/lib/jvm` (e.g., `/usr/lib/jvm/java-8-openjdk-amd64` on a 64-bit x86 machine); again, in an Ubuntu environment you can find most of this path using `update-alternatives --list java`, which will give you the path to the JRE binary within the Java home. + +### Creating the Tomcat Service + +Tomcat includes two shell scripts we’re going to make use of - `startup.sh` and `shutdown.sh` - which are light wrappers on top of a third script, `catalina.sh`, which manages spinning up and shutting down the Tomcat server. + +Ubuntu 18.04 uses systemd, managed with `systemctl`, to run services; we’re going to create a .service file that can run these shell scripts. + +`/etc/systemd/system/tomcat.service | root:root/644` +``` +[Unit] +Description=Tomcat + +[Service] +Type=forking +ExecStart=/opt/tomcat/bin/startup.sh +ExecStop=/opt/tomcat/bin/shutdown.sh +SyslogIdentifier=tomcat + +[Install] +WantedBy=multi-user.target +``` + +### Enabling and Starting Tomcat + +We’re going to both `enable` and `start` Tomcat. Enabling Tomcat will ensure that it starts on boot, as defined by the `[Install]` section’s `WantedBy` statement, which names the target (here, `multi-user.target`, the standard multi-user boot target) that pulls Tomcat in when the system boots.
This is separate from starting it, which we need to do now in order to get Tomcat up and running without requiring a reboot. + +```bash +sudo systemctl enable tomcat +sudo systemctl start tomcat +``` + +We can check that Tomcat has started by running `systemctl status tomcat | grep Active`; we should see that Tomcat is `active (running)`, which is the correct result of startup.sh finishing its run successfully. + +## Installing Cantaloupe 4 + +### Stopping the Tomcat service + +Before we start working with Cantaloupe, we should `stop` Tomcat; otherwise, Cantaloupe will automatically be deployed from its .war file, and we’d like everything to be in place before the deployment. + +```bash +sudo systemctl stop tomcat +``` + +### Downloading and Placing the Cantaloupe WAR + +Releases of Cantaloupe live on the [Cantaloupe release page](https://github.com/cantaloupe-project/cantaloupe/releases); the latest version can be found here as a `.zip` file. + +```bash +sudo wget -O /opt/cantaloupe.zip CANTALOUPE_RELEASE_URL +sudo unzip /opt/cantaloupe.zip +sudo cp CANTALOUPE_DIR/CANTALOUPE_WAR /opt/tomcat/webapps/cantaloupe.war +sudo chown tomcat:tomcat /opt/tomcat/webapps/cantaloupe.war +``` +- `CANTALOUPE_RELEASE_URL`: It’s recommended we grab the latest version of Cantaloupe 4. 
This can be found on the above-linked release page, as the `.zip` version; for example, https://github.com/cantaloupe-project/cantaloupe/releases/download/v4.1.4/cantaloupe-4.1.4.zip +- `CANTALOUPE_DIR`: This will depend on the exact version of Cantaloupe downloaded; in the above example release, this would be `cantaloupe-4.1.4` +- `CANTALOUPE_WAR`: This will also depend on the exact version of Cantaloupe downloaded; in the above example release, this would be `cantaloupe-4.1.4.war` + +### Creating a Cantaloupe Configuration + +Cantaloupe pulls its configuration from a file called `cantaloupe.properties`. Other files can also contain instructions for Cantaloupe while it’s running; specifically, we’re going to copy over the `delegates.rb` file, which can contain custom configuration. We won’t make use of this file; we’re just copying it over for demonstration purposes. + +Creating these files from scratch is *not* recommended; rather, we’re going to copy the default Cantaloupe configurations into their own folder so we can work with them. + +```bash +sudo mkdir /opt/cantaloupe_config +sudo cp CANTALOUPE_DIR/cantaloupe.properties.sample /opt/cantaloupe_config/cantaloupe.properties +sudo cp CANTALOUPE_DIR/delegates.rb.sample /opt/cantaloupe_config/delegates.rb +``` + +The out-of-the-box configuration will work fine for our purposes, but it’s highly recommended that you take a look through `cantaloupe.properties` and see what changes can be made; specifically, logging to actual logfiles isn’t set up by default, so you may want to take a peek at the `log.application.SyslogAppender` or `log.application.RollingFileAppender` settings, as well as changing the logging level. + +### Defining the Cantaloupe Configuration Location + +Now that we have a Cantaloupe configuration, we need to make a change to Tomcat’s `JAVA_OPTS` so that its location can be referenced when Tomcat spins it up.
This will involve changing the `setenv.sh` created when setting up Tomcat. + +`/opt/tomcat/bin/setenv.sh` + +**Before**: +> 3 | export JAVA_OPTS="-Djava.awt.headless=true -server -Xmx1500m -Xms1000m" + +**After**: +> 3 | export JAVA_OPTS="-Djava.awt.headless=true -Dcantaloupe.config=/opt/cantaloupe_config/cantaloupe.properties -server -Xmx1500m -Xms1000m" + +### Starting the Tomcat Service + +After Cantaloupe has been completely provisioned, we’re ready to switch Tomcat back on so that Cantaloupe automatically deploys with the established configuration. + +```bash +sudo systemctl start tomcat +``` diff --git a/docs/installation/manual/introduction.md b/docs/installation/manual/introduction.md new file mode 100644 index 000000000..6baa4f250 --- /dev/null +++ b/docs/installation/manual/introduction.md @@ -0,0 +1,88 @@ +# Introduction + +!!! notice + The manual installation guide is not intended to describe *the* Islandora 8 installation but rather *an* Islandora 8 installation. The server created using this guide is not hardened, will not be easily scalable, and the components may not be configured in a way you consider easy to work with. A production instance of Islandora 8 should be installed and maintained by a professional with an understanding of Linux and server administration. + +This guide will contain generalized steps on installation and configuration of the various components, but will contain specific example commands for executing these steps on an Ubuntu 18.04 Server. + +## Some Prerequisite Knowledge + +This guide assumes the user has some knowledge: + +- A general idea of how to work on the command-line of a Linux server using Bash. Commands are described in detail, but servers are volatile, and knowledge is still assumed in case anything happens outside of your expectations. +- An understanding of how to modify files from the command line. Configurations will often need to be created or modified in order to get things up and running. 
This might involve using an application like `nano` or `vi`/`vim`, or creating these files locally and uploading them to the server. It should also generally be assumed that most of these configuration files will have to be created or edited using `sudo`, and that permissions and ownership may need to be specified on these files. + +## Conventions Used in This Guide + +### Chronological Organization + +The steps in this guide are listed in chronological order of installation and configuration. Some sections will reference variables and rely on components installed in previous sections. The guide does not account for skipping over or otherwise changing the order of installation steps; bear this in mind if you decide to do things out of the provided order. + +### Replacement Variables + +It is expected that the person setting up the site may want to use different usernames, passwords, and other such variables than the ones presented by default in this guide. Additionally, some defaults can't be provided, such as up-to-date version information for externally-provided components. In such cases, the replacement variables will be placed in all capital letters, and a description of each variable, any possible defaults, and how to get up-to-date information will be listed directly below the command or file that uses it. + +### Bash Commands + +!!! notice + Command blocks are *always* assumed to start at the home folder of the user originally created during the server installation. **They are never run as `root`**; if root access is required, `sudo` will be specified, and if files are created belonging to `root` that should not belong to `root`, `chmod` and `chown` will be run against them immediately afterwards to ensure correct permissions. If commands need to be run from a different working directory, an absolute path will be specified to use with `cd`. 
If you're concerned about whether or not a code block can be run from your current working directory, run `cd ~` before executing any commands in it.
+ +Commands to be run on the command line will be placed in code blocks, with one command per line, and any replacement variables below, e.g., + +```shell +sudo run --this-command +python3 run.py /this/other/command --with-param PARAMETER +``` +- `PARAMETER`: `some_sensible_default`, perhaps with an explanation of why, or how to determine alternatives + +### Editing Files In Place + +When an individual file needs to be modified in place (as opposed to replacing it outright), a Before and After quote will be provided that identifies one or more lines in the file, what the default installed version of that file looks like on that line, and what the line should look like after it has been modified, like so: + +`/path/to/file_being_modified` + +**Before**: +> 174 | Here is what line 174 in the file looked like before + +> 175 | And here is what the following line looked like in the file before + +**After**: +> 174 | Here is what line 174 should look like after modification + +> 175 | And here is what the following line should look like after modification: VARIABLE + +- `VARIABLE`: `some_value`, perhaps with an explanation of why + +It should be noted that configuration files and the like are subject to change by the organizations that maintain their respective applications; this guide generally recommends installing the latest version of these applications, as these generally include security updates. It is expected that the implementer will be able to search through a file and find specific lines in the case where the maintaining organization has moved it in a subsequent patch. In most cases, configuration files will be provided outright to avoid these scenarios. 
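
For example, rather than jumping to a line number, you can search for the setting itself; `grep -n` prints each match prefixed with its current line number. The sketch below wraps this in a hypothetical `find_setting` helper and runs it against a throwaway stand-in file, so it can be tried anywhere; on a real server you would point it at the actual configuration file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "LINE:TEXT" for each line in file $2 that
# contains the literal text $1 (-F disables regex interpretation).
find_setting() {
  grep -nF -- "$1" "$2"
}

# Demo against a throwaway stand-in; on a real server, pass the actual file,
# e.g. find_setting bytea_output /etc/postgresql/10/main/postgresql.conf
demo=$(mktemp)
printf '%s\n' "#xmlbinary = 'base64'" "#bytea_output = 'hex'" > "$demo"
find_setting bytea_output "$demo"   # prints: 2:#bytea_output = 'hex'
rm -f "$demo"
```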
+ +### Adding or Replacing Files + +When a file needs to be added or replaced, it will be described in three sections: + +- A line that describes the path to the file, as well as the owner, group, and permission mode (e.g., `755`) for the file; it is assumed that the person following the guide will use `chmod` and `chown` appropriately to apply the owner, group, and mode +- The entire contents of the file in a code block, including any portions that need to be replaced with specific values +- Those replacement values + +`/the/path/to/some/file.php | owner:group/mode` +```php + +``` +- `THE_NUMBER_TO_ADD_TO_THIS`: 12, perhaps with an explanation of why, or other numbers that may be appropriate + +### Troubleshooting + +The most common issues you will likely run into when manually provisioning a server are: + +- Files or directories are not owned by the user who needs access to them, and therefore cannot be written to. Check the ownership of files using `ls -la`, and ensure their ownership using `chown USER` for files, and `chown -R USER` for directories +- Replacement variables were left in place in files specified by the guide. Ensure any replacement variables such as server addresses and passwords are swapped out when writing files to the server + +For any other issues, don't hesitate to email the [mailing list](mailto:islandora@googlegroups.com) to ask for help. If you think that a part of the installation documentation is incorrect or could be improved, please create an issue in the [documentation issues queue](http://github.com/Islandora/documentation/issues) and give it a `documentation` tag. Bear in mind that this guide is built for Ubuntu 18.04 and attempts to give generalized instructions; you will likely naturally encounter situations where your own environment needs to differ from the guide.
diff --git a/docs/installation/manual/preparing_a_webserver.md b/docs/installation/manual/preparing_a_webserver.md new file mode 100644 index 000000000..89a6dc298 --- /dev/null +++ b/docs/installation/manual/preparing_a_webserver.md @@ -0,0 +1,97 @@ +# Preparing a LAPP Server + +## In this section, we will install: + +- [Apache 2](https://httpd.apache.org/), the webserver that will deliver webpages to end users +- [PHP 7](https://www.php.net/), the runtime code interpreter that Drupal will use (via Apache) to generate webpages and other services, and that Drush and Composer will use to run tasks from the command line +- Several modules for PHP 7 which are required to run the PHP code that Drupal and other applications will be executing +- [PostgreSQL 10](https://www.postgresql.org/), the database that Drupal will use for storage (as well as other applications down the line) + +## Apache 2 + +### Install Apache 2 + +Apache can typically be installed and configured outright by your operating system’s package manager: + +```bash +sudo apt-get -y install apache2 apache2-utils +``` + +This will install: + +- A `systemd` service that will ensure Apache can be stopped and started, and will run when the machine is powered on +- A set of Apache configurations in `/etc/apache2`, including the basic configuration, ports configuration, enabled mods, and enabled sites +- An Apache webroot in `/var/www/html`, served on port `80` by the default site defined in `/etc/apache2/sites-enabled/000-default.conf`; we’ll make changes and additions to this file later +- A user and group, `www-data`, which we will use to read/write web documents.
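
On Ubuntu, modules are enabled by symlinking their files from `/etc/apache2/mods-available` into `/etc/apache2/mods-enabled` (which is what `a2enmod` does), so checking for a module's `.load` file in `mods-enabled` is a quick way to see whether it is on. The helper below is a hypothetical sketch of that check; on a running server, `apache2ctl -M` lists the modules Apache has actually loaded.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if module $1 has a .load file in the
# mods-enabled directory (default: Ubuntu's /etc/apache2/mods-enabled;
# pass a different directory as $2, e.g. for testing).
apache_mod_enabled() {
  local mod=$1 dir=${2:-/etc/apache2/mods-enabled}
  [ -e "$dir/$mod.load" ]
}

# Report on the two modules this guide enables.
for mod in ssl rewrite; do
  if apache_mod_enabled "$mod"; then
    echo "$mod: enabled"
  else
    echo "$mod: not enabled"
  fi
done
```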
+ +### Enable Apache Mods + +We’re going to enable a couple of Apache mods that Drupal highly recommends installing, and which are de-facto considered required by Islandora: + +```bash +sudo a2enmod ssl +sudo a2enmod rewrite +sudo systemctl restart apache2 +``` + +### Add the Current User to the `www-data` Group + +Since the user we are currently logged in as is going to work quite a bit inside the Drupal directory, we want to give it group permissions to anything the `www-data` group has access to. When we run `composer`, `www-data` will also be caching data in our own home directory, so we want this group modification to go in both directions. + +**N.B.** This code block uses **backticks**, not single quotes; this is an important distinction as backticks have special meaning in `bash`. + +```bash +sudo usermod -a -G www-data `whoami` +sudo usermod -a -G `whoami` www-data +# Start a new shell as the current user so the new group memberships take effect. +sudo su `whoami` +``` + +## PHP 7.2 + +### Install PHP 7.2 + +PHP can generally be installed using your operating system’s package manager, though how current the provided version is depends on how up to date that package manager's repositories are.
We’re going to install PHP 7.2 and the modules we require in a single command: + +```bash +sudo apt-get -y install php7.2 php7.2-cli php7.2-common php7.2-curl php7.2-dev php7.2-gd php7.2-imap php7.2-json php7.2-mbstring php7.2-opcache php7.2-xml php7.2-yaml php7.2-zip libapache2-mod-php7.2 php-pgsql php-redis php-xdebug unzip +``` + +Along with the PHP packages themselves, this will install: + +- A `mods-available` folder in `/etc/php/7.2` (from which everything is typically enabled by default) +- A configuration for PHP when run from Apache in the `/etc/php/7.2/apache2` folder +- A configuration for PHP when run from the command line - including when run via Drush - in the `/etc/php/7.2/cli` folder +- `unzip`, which is important for PHP’s zip module to function correctly despite it not being a direct dependency of the module. We will also need to unzip some things later, so this is convenient to have in place early in the installation process. + +## PostgreSQL 10 + +### Install PostgreSQL 10 + +PostgreSQL can generally be installed using your operating system’s package manager. It is typically sensible to install the version the system recognizes as up-to-date; Ubuntu 18.04 sees this as version 10. We’re simply going to install the database software: + +```bash +sudo apt-get -y install postgresql +``` + +This will install: + +- A user at the system level named `postgres`; by default, this will be the only user that can connect to PostgreSQL using `psql` and have access to Postgres configurations +- The `psql` client at `/usr/bin/psql`; anyone else who runs it - even `root` - will be refused a connection, since by default the only database role is `postgres` and local connections are authenticated by matching the system username to a role of the same name +- A series of configurations that live in `/etc/postgresql/10/main` which can be used to modify how PostgreSQL works.
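
Most settings in `postgresql.conf` ship commented out, and when a parameter is set more than once, PostgreSQL uses the last occurrence. A sketch like the one below can show which value the file actually sets before and after you edit it; the `active_setting` helper is hypothetical and only understands simple `name = value` lines. To ask the running server instead, use `sudo -u postgres psql -c 'SHOW bytea_output;'`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of the last uncommented "name = value"
# line for setting $1 in file $2 (PostgreSQL applies the last occurrence).
# Prints nothing if the setting only appears commented out.
active_setting() {
  local name=$1 file=$2
  sed -n -E "s/^[[:space:]]*${name}[[:space:]]*=[[:space:]]*'?([^' #]*)'?.*/\1/p" "$file" | tail -n 1
}

conf=/etc/postgresql/10/main/postgresql.conf
if [ -r "$conf" ]; then
  echo "bytea_output: $(active_setting bytea_output "$conf")"
fi
```

An empty result for `bytea_output` means the compiled-in default (`hex`) is in effect.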
+ +### Configure PostgreSQL 10 For Use With Drupal + +A modification needs to be made to the PostgreSQL configuration in order for Drupal to properly install and function. This change can be made to the main configuration file at `/etc/postgresql/10/main/postgresql.conf`: + +**Before**: +> 558 | #bytea_output = 'hex' # hex, escape + +**After**: +> 558 | bytea_output = 'escape' + +The `postgresql` service should be restarted to accept the new configuration: + +```bash +sudo systemctl restart postgresql +``` diff --git a/docs/installation.md b/docs/installation/playbook.md similarity index 57% rename from docs/installation.md rename to docs/installation/playbook.md index e80f10d8d..83747f43a 100644 --- a/docs/installation.md +++ b/docs/installation/playbook.md @@ -1,81 +1,93 @@ -Islandora 8 is installed through an Ansible Playbook called [islandora-playbook](https://github.com/Islandora-Devops/islandora-playbook). +The fastest way to get up and running with Islandora 8 is through an Ansible Playbook called [islandora-playbook](https://github.com/Islandora-Devops/islandora-playbook). It can be used to spin up a local environment using [Vagrant](https://www.vagrantup.com/), or to provision an existing machine. ## Requirements Download and install the following: 1. [Virtual Box](https://www.virtualbox.org/) -1. [Vagrant](https://www.vagrantup.com/) (version 2.0 or required) +2. [Vagrant](https://www.vagrantup.com/) (version 2.0 or higher required) +3. [Git](https://git-scm.com/) +4. [OpenSSL](https://www.openssl.org/) +5. [Ansible](https://www.ansible.com/community) (up to, and not past, 2.8.7) -Then use your package manager of choice to get [Git](https://git-scm.com/). +#### Ubuntu/Debian -``` -# Ubuntu -$ sudo apt-get install git - -# Centos -$ sudo yum install git - -# OSX -$ brew install git -``` - -As well as [OpenSSL](https://www.openssl.org/) +Git and OpenSSL are available via `apt`. Install [Ansible](https://www.ansible.com/community) up to version 2.8.7.
This is done best with `pip`, the Python package manager: ``` -# Ubuntu +# Install git and openssl +$ sudo apt-get install git $ sudo apt-get install openssl - -# Centos -$ sudo yum install openssl - -# OSX -$ brew install openssl +# If pip isn’t already available, run the following commands to install it +$ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py +$ python get-pip.py --user +# Install ansible +$ pip install --user -Iv ansible==2.8.7 ``` -Finally, install [Ansible](https://www.ansible.com/community) up to version 2.8.7. This is done best with `pip`, the python package manager. If pip isn’t already available on your system of Python, run the following commands to install it: +#### CentOS + +Git and OpenSSL are available via `yum`; pip and Ansible can be installed the same way as on Ubuntu/Debian. ``` +$ sudo yum install git +$ sudo yum install openssl +# If pip isn’t already available, run the following commands to install it $ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py $ python get-pip.py --user +# Install ansible +$ pip install --user -Iv ansible==2.8.7 ``` -Then pin down Ansible to 2.8.7 +#### macOS + +OpenSSL is already pre-installed on macOS. Python and pip should be installed using the official installer downloaded from python.org. For the installation of Ansible, consider using [Homebrew](https://brew.sh/): ``` -$ pip install --user -Iv ansible==2.8.7 +# Use xcode-select to install command line components, including git +$ xcode-select --install +$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" +$ brew install ansible@2.8.7 ``` -If you want to provision a CENTOS 7 environment, you'll also need to install the [vbguest](https://github.com/dotless-de/vagrant-vbguest) -plugin for Vagrant +## Installing a local development environment + +Before provisioning a local environment, double-check that none of the [required ports](#port-clashes-for-local-environments) are currently in use.
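
A rough way to do that from a Bash shell is to try opening a TCP connection to each port on localhost. The sketch below uses Bash's built-in `/dev/tcp` redirection, so it must run under `bash` rather than `sh`; the port list is copied from the "Port clashes for local environments" section (including 2200 for Ansible's SSH) and is an assumption to keep in sync with your `Vagrantfile`. It only detects services reachable on 127.0.0.1; `lsof -i :PORT` (macOS and Linux) or `ss -ltn` (Linux) give a fuller picture.

```bash
#!/usr/bin/env bash
# Ports from "Port clashes for local environments", plus 2200 for Ansible's SSH.
ports=(8000 8080 3306 5432 8983 8161 8081 2200)

# Succeeds if something accepts a TCP connection on 127.0.0.1:$1.
# /dev/tcp is a Bash feature, not a real file, so this needs bash, not sh.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

clashes=0
for port in "${ports[@]}"; do
  if port_in_use "$port"; then
    echo "Port $port is already in use"
    clashes=$((clashes + 1))
  fi
done
echo "$clashes potential clash(es) found"
```

A connection attempt to a closed port on localhost is refused immediately, so the loop finishes quickly even when nothing is listening.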
+ +Clone the `islandora-playbook` repository and use `vagrant up` to automatically provision an environment. + +#### Ubuntu 18.04 or macOS ```bash -$ vagrant plugin install vagrant-vbguest +$ git clone https://github.com/Islandora-Devops/islandora-playbook +$ cd islandora-playbook +$ vagrant up ``` -## Installing a local development environment +#### CentOS 7 -Once you've installed all the requirements, you can spin up a local development environment with ```bash $ git clone https://github.com/Islandora-Devops/islandora-playbook $ cd islandora-playbook -$ vagrant up +$ vagrant plugin install vagrant-vbguest +$ ISLANDORA_DISTRO="centos/7" vagrant up ``` -By default, this provisions an Ubuntu 18.04 environment. If you would prefer to use CENTOS 7 instead, set the `ISLANDORA_DISTRO` -environment variable to `centos/7`. To prevent having to do this every time you open a new shell, add the following command to -your `.bashrc` file +Alternatively, to avoid setting the variable on every run, add the following to your user profile (e.g., `.bashrc` on Ubuntu/Debian environments, or `.bash_profile` on macOS): ```bash -$ export ISLANDORA_DISTRO="centos/7" +export ISLANDORA_DISTRO="centos/7" ``` ## Installing a remote environment -If you want to provision a remote server using the playbook, there's a handful of configuration entries you need to update to include your -usernames/passwords and IP addresses. You'll also want Apache to serve at port 80 as opposed to 8000, which we use for development -purposes. To start, take the inventory for the vagrant development environment and copy it. Be sure to -give it an appropriate name. Here we're using `example`. +A remote environment can be provisioned by providing SSH credentials to `islandora-playbook` and running it with `ansible-playbook` instead of Vagrant.
Some configuration entries in the `inventory` also need to be changed to match the particulars of your remote environment; this includes: + +- Changing usernames and passwords to something more secure than the defaults +- Changing IP addresses to use the remote machine's actual IP +- Changing Apache to serve at port 80 (as opposed to 8000, which we use for development purposes) + +We're going to build up this new remote environment configuration from the default Vagrant configuration provided. To start, take the inventory for the Vagrant development environment and make a copy of it. Be sure to give it an appropriate name. Here we're using `example`. ```bash $ git clone https://github.com/Islandora-Devops/islandora-playbook @@ -116,8 +128,8 @@ crayfish_recast_gemini_base_url: http://example.org/gemini ``` #### group_vars/karaf.yml -Unfortunately, you have to copy/paste this whole chunk into the yml, even though you're only updating the URLs and -the `token.value` entry. + +For Alpaca, only the `token.value` entry and the various URLs need to be changed, but the entire configuration chunk is provided here for convenience. ```yml alpaca_settings: @@ -160,6 +172,7 @@ alpaca_blueprint_settings: ``` #### group_vars/tomcat.yml + ```yml fcrepo_allowed_external_content: - http://example.org/ @@ -167,12 +180,14 @@ cantaloupe_HttpResolver_BasicLookupStrategy_url_prefix: http://example.org/ ``` #### group_vars/webserver/apache.yml -Here's where you set the port to 80 instead of 8000. + +This is where we specify that the webserver is listening on the default port 80, instead of the development machine port 8000.
 ```yml
 apache_listen_port: 80
 ```
 
 #### group_vars/webserver/drupal.yml
+
 ```yml
 drupal_trusted_hosts:
   - ^localhost$
@@ -181,14 +196,16 @@ fedora_base_url: "http://example.org:8080/fcrepo/rest/"
 ```
 
 #### group_vars/webserver/general.yml
+
 ```yml
 openseadragon_iiiv_server: http://example.org:8080/cantaloupe/iiif/2
 matomo_site_url: http://example.org
 ```
 
 #### hosts
-You'll need the ssh particulars for logging into your server in the hosts file. This example is set up to login as `root` using
-an ssh key. You'll need to get the details for logging into your remote server from your hosting provider (AWS, Digital Ocean, etc...)
+
+You'll need to put the SSH particulars for logging into your server in the hosts file. This example is set up to log in as `root` using
+an SSH key. You'll need to get the details for logging into your remote server from your hosting provider (AWS, DigitalOcean, etc.)
 or your systems administrator if you're running the server in-house. See
 [this page](https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html#connecting-to-hosts-behavioral-inventory-parameters)
 for more details about what you can put into a
@@ -213,7 +230,7 @@ Then, depending on the operating system installed on the remote environment, you
 ```bash
 $ ansible-playbook -i inventory/production playbook.yml -e "islandora_distro=ubuntu/xenial64"
 ```
-or for CENTOS 7
+or for CentOS 7
 
 ```bash
 $ ansible-playbook -i inventory/production playbook.yml -e "islandora_distro=centos/7"
@@ -221,14 +238,38 @@ $ ansible-playbook -i inventory/production playbook.yml -e "islandora_distro=cen
 
 ## Troubleshooting
 
+### Out of date playbooks
+
 Ansible caches the code used to provision the environment, so if you've already installed once you may not be getting the
 latest version of things even if you've `git pull`'d the latest playbook. The code is stored in `roles/external`, so if you want to clear it out you can
-run
+remove that directory before provisioning the environment again:
 
 ```bash
 $ rm -rf roles/external
 ```
 
+### Port clashes for local environments
+
+When provisioning a local environment, be aware of any ports already in use on your computer that Vagrant will also need, as these
+may clash and cause problems during and after provisioning. These include:
+
+- 8000 (Apache)
+- 8080 (Tomcat)
+- 3306 (MySQL)
+- 5432 (PostgreSQL)
+- 8983 (Solr)
+- 8161 (ActiveMQ)
+- 8081 (API-X)
+
+If any of these ports clash, you will need to either find and replace them in the configuration .yml files under
+`inventory/vagrant/group_vars`, or provide new values to the roles that support changing ports (for example, `postgresql_databases`
+supports a `port` property, which the default configuration currently leaves unset). You will also need to update the port forwarding values in `Vagrantfile`.
+
+Additionally, Ansible attempts to use port 2200 for SSH. If this port is already in use, your local environment cannot be provisioned. To
+change this, set a new value for `ansible_port` in `inventory/vagrant/hosts`.
+
+### Help
+
 If you run into any issues installing the environment, do not hesitate to email the [mailing list](mailto:islandora@googlegroups.com)
 to ask for help. If you think you've stumbled across a bug in the installer, please create an issue in the [Islandora 8 issue queue](http://github.com/Islandora-CLAW/CLAW/issues)
 and give it an `ansible` tag.
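The port list in the troubleshooting section can be checked before running `vagrant up`. What follows is a minimal sketch, not part of `islandora-playbook`: it assumes bash (it relies on bash's `/dev/tcp` redirection) and only detects services accepting TCP connections on 127.0.0.1, so a service bound only to some other address will be missed. It tries to connect to each listed port, plus 2200 for SSH.

```bash
#!/usr/bin/env bash
# Sketch: report which host ports used by the Vagrant environment are already
# accepting TCP connections on 127.0.0.1. Relies on bash's /dev/tcp
# redirection, so run it with bash rather than sh.

# Ports listed in the troubleshooting section; 2200 is the SSH port Ansible tries to use.
ports="8000 8080 3306 5432 8983 8161 8081 2200"

# Succeeds (exit status 0) if something accepts a connection on 127.0.0.1:$1.
port_in_use() {
  # The subshell opens file descriptor 3 to the port; it is closed when the
  # subshell exits. Errors such as "Connection refused" are discarded.
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

clashes=0
for port in $ports; do
  if port_in_use "$port"; then
    echo "Port $port is already in use"
    clashes=$((clashes + 1))
  fi
done

if [ "$clashes" -eq 0 ]; then
  echo "None of the listed ports appear to be in use"
fi
```

Save it under any name (for example, a hypothetical `check_ports.sh`) and run it with `bash check_ports.sh`; any port it reports needs to be changed as described in the troubleshooting section.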
diff --git a/mkdocs.yml b/mkdocs.yml
index a910dc8f6..8ad559ddf 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -34,7 +34,19 @@ extra:
 
 nav:
   - Summary: 'index.md'
-  - Installation: 'installation.md'
+  - Installation:
+    - 'Component Overview': 'installation/component_overview.md'
+    - 'Automatic Provisioning': 'installation/playbook.md'
+    - Manual Installation:
+      - 'Introduction': 'installation/manual/introduction.md'
+      - 'Preparing a LAPP Webserver': 'installation/manual/preparing_a_webserver.md'
+      - 'Installing Composer, Drush, and Drupal': 'installation/manual/installing_composer_drush_and_drupal.md'
+      - 'Installing Tomcat and Cantaloupe': 'installation/manual/installing_tomcat_and_cantaloupe.md'
+      - 'Installing Fedora, Syn, and Blazegraph': 'installation/manual/installing_fedora_syn_and_blazegraph.md'
+      - 'Installing Solr': 'installation/manual/installing_solr.md'
+      - 'Installing Crayfish': 'installation/manual/installing_crayfish.md'
+      - 'Installing Karaf and Alpaca': 'installation/manual/installing_karaf_and_alpaca.md'
+      - 'Configuring Drupal': 'installation/manual/configuring_drupal.md'
   - User Documentation:
     - 'Introduction': 'user-documentation/user-intro.md'
     - 'Video Documentation': 'user-documentation/video-docs.md'