ibm-watson-data-lab/detacher

detacher

If you are using Cloudant or CouchDB and occasionally storing binary attachments inside documents, then detacher may be for you. It is a serverless function that runs in IBM Cloud Functions (based on Apache OpenWhisk) and is invoked whenever a Cloudant document changes. If the document contains attachments, those attachments are copied into Cloud Object Storage or AWS S3 and removed from the document.

This allows the Cloudant database to remain free of binary attachments with no loss of data.

Here is a typical document before processing:

{
  "_id": "7",
  "_rev": "2-920d8da7eb1a1175fcbc10cf6f989d99",
  "first_name": "Glynn",
  "last_name": "Bird",
  "job": "Developer Advocate @ IBM",
  "twitter": "@glynn_bird",
  "_attachments": {
    "headshot.jpg": {
      "content_type": "image/jpeg",
      "revpos": 2,
      "digest": "md5-N0JXExRZxZaOD3sszjMXzA==",
      "length": 46998,
      "stub": true
    }
  }
}

CouchDB/Cloudant stores attached files in an object called _attachments. After processing by detacher, the document is modified to look like this:

{
  "_id": "7",
  "_rev": "3-c3272191e6e94d3bd2a3d72145c7d4fd",
  "first_name": "Glynn",
  "last_name": "Bird",
  "job": "Developer Advocate @ IBM",
  "twitter": "@glynn_bird",
  "attachments": {
    "headshot.jpg": {
      "content_type": "image/jpeg",
      "revpos": 2,
      "digest": "md5-N0JXExRZxZaOD3sszjMXzA==",
      "length": 46998,
      "stub": true,
      "Location": "https://detacher.s3.eu-west-2.amazonaws.com/7-headshot.jpg",
      "Key": "7-headshot.jpg"
    }
  }
}

Notice that the _attachments key is no longer there: Cloudant is not storing the attachment anymore. In its place is attachments (without the underscore), which contains the same metadata plus two extra fields, Location and Key, which record where in your Object Storage the file is stored.
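The before-and-after change can be sketched in Python. This is an illustration only, not detacher's own code: the function name detach, its parameters, and the Key and Location patterns are assumptions inferred from the example document above (7-headshot.jpg in a bucket called detacher in eu-west-2). The actual upload to object storage is omitted, and _rev is left untouched because Cloudant assigns the new revision when the document is written back.

```python
import copy


def detach(doc, bucket, region):
    """Return a copy of doc with _attachments moved to attachments.

    Hypothetical helper: the Key pattern "<_id>-<filename>" and the
    S3-style Location URL are assumptions based on the example document.
    """
    out = copy.deepcopy(doc)
    stubs = out.pop("_attachments", None)
    if not stubs:
        return out  # nothing to detach

    # Keep any references recorded by an earlier run.
    references = out.setdefault("attachments", {})
    for name, meta in stubs.items():
        key = f"{doc['_id']}-{name}"
        references[name] = {
            **meta,  # content_type, digest, length, ... are carried over
            "Location": f"https://{bucket}.s3.{region}.amazonaws.com/{key}",
            "Key": key,
        }
    return out
```

Calling detach(doc, "detacher", "eu-west-2") on the first example document yields the attachments object shown in the second one, apart from the _rev value.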

Pre-requisites

You need:

Installation

Ensure you have a new "bucket" in your Object Storage service and a new database in your Cloudant service.

Set up environment variables containing the credentials of your Cloudant service and Object Storage service:

export CLOUDANT_HOST="myhost.cloudant.com"
export CLOUDANT_USERNAME="myusername"
export CLOUDANT_PASSWORD="mypassword"
export CLOUDANT_DATABASE="mydatabase"
export AWS_ACCESS_KEY_ID="ABC123"
export AWS_SECRET_ACCESS_KEY="XYZ987"
export AWS_BUCKET="mybucket"
export AWS_REGION="eu-west-2"
export AWS_ENDPOINT="https://s3.eu-west-2.amazonaws.com"

If you are using Amazon S3, you can omit the AWS_ENDPOINT environment variable. For the IBM Cloud Object Storage service, the endpoints are listed in the IBM Cloud Object Storage documentation.
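Before deploying, it can help to confirm that the variables above are set. The following Python sketch is a hypothetical pre-flight check, not part of detacher or deploy.sh. It assumes every variable listed above is required except AWS_ENDPOINT, since that is the only one the text describes as optional.

```python
import os

# Assumption: all variables from the list above are required,
# except AWS_ENDPOINT, which may be omitted for Amazon S3.
REQUIRED = [
    "CLOUDANT_HOST",
    "CLOUDANT_USERNAME",
    "CLOUDANT_PASSWORD",
    "CLOUDANT_DATABASE",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_BUCKET",
    "AWS_REGION",
]


def missing_settings(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]
```

An empty list from missing_settings() means all the required variables are present in the current environment.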

Then run the deploy.sh script:

./deploy.sh

You can now add a document to your database and add an attachment to it. In a few moments the document will be updated: it will no longer contain attachments, only references to those files in your object storage.
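One way to try this out is through the CouchDB/Cloudant HTTP API, where a PUT to /{db}/{docid}/{attachment-name}?rev={rev} uploads an attachment to an existing document. The Python sketch below only builds the request; the function name attachment_request and its parameters are hypothetical, and it assumes basic authentication with the Cloudant username and password from the environment variables above.

```python
import base64
import urllib.parse
import urllib.request


def attachment_request(host, database, doc_id, rev, name, data,
                       content_type, username, password):
    """Build (but do not send) a PUT request that attaches data to a document."""
    path = "/".join(
        urllib.parse.quote(part, safe="") for part in (database, doc_id, name)
    )
    query = urllib.parse.urlencode({"rev": rev})
    url = f"https://{host}/{path}?{query}"

    # Assumption: basic auth with the Cloudant account credentials.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    headers = {
        "Content-Type": content_type,
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")
```

Passing the returned object to urllib.request.urlopen sends the upload. The rev argument must be the current _rev of the document, which Cloudant returns when the document is created.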
