feat: write objects to blob storage (#8557)
* feat: basic blobstore infrastructure for dev

* refactor: (broken) attempt to put minio console behind nginx

* feat: initialize blobstore with boto3

* fix: abandon attempt to proxy minio. Use docker compose instead.

* feat: beginning of blob writes

* feat: storage utilities

* feat: test buckets

* chore: black

* chore: remove unused import

* chore: avoid f string when not needed

* fix: inform all settings files about blobstores

* fix: declare types for some settings

* ci: point to new target base

* ci: adjust test workflow

* fix: give the tests debug environment a blobstore

* fix: "better" name declarations

* ci: use devblobstore container

* chore: identify places to write to blobstorage

* chore: remove unreachable code

* feat: store materials

* feat: store statements

* feat: store status changes

* feat: store liaison attachments

* feat: store agendas provided with Interim session requests

* chore: capture TODOs

* feat: store polls and chatlogs

* chore: remove unneeded TODO

* feat: store drafts on submit and post

* fix: handle storage during doc expiration and resurrection

* fix: mirror an unlink

* chore: add/refine TODOs

* feat: store slide submissions

* fix: structure slide test correctly

* fix: correct sense of existence check

* feat: store some indexes

* feat: BlobShadowFileSystemStorage

* feat: shadow floorplans / host logos to the blob

* chore: remove unused import

* feat: strip path from blob shadow names

* feat: shadow photos / thumbs

* refactor: combine photo and photothumb blob kinds

The photos / thumbs were already dropped in the same
directory, so let's not add a distinction at this point.

* style: whitespace

* refactor: use kwargs consistently

* chore: migrations

* refactor: better deconstruct(); rebuild migrations

* fix: use new class in mock patch

* chore: add TODO

* feat: store group index documents

* chore: identify more TODO

* feat: store reviews

* fix: repair merge

* chore: remove unnecessary TODO

* feat: StoredObject metadata

* fix: deburr some debugging code

* fix: only set the deleted timestamp once

* chore: correct typo

* fix: get_or_create vs get and test

* fix: avoid the questionable is_seekable helper

* chore: capture future design consideration

* chore: blob store cfg for k8s

* chore: black

* chore: copyright

* ci: bucket name prefix option + run Black

Adds/uses DATATRACKER_BLOB_STORE_BUCKET_PREFIX option. Other changes
are just Black styling.

* ci: fix typo in bucket name expression

* chore: parameters in app-configure-blobstore

Allows use with other blob stores.

* ci: remove verify=False option

* fix: don't return value from __init__

* feat: option to log timing of S3Storage calls

* chore: units

* fix: deleted->null when storing a file

* style: Black

* feat: log as JSON; refactor to share code; handle exceptions

* ci: add ietf_log_blob_timing option for k8s

* test: --no-manage-blobstore option for running tests

* test: use blob store settings from env, if set

* test: actually set a couple more storage opts

* feat: offswitch (#8541)

* feat: offswitch

* fix: apply ENABLE_BLOBSTORAGE to BlobShadowFileSystemStorage behavior

* chore: log timing of blob reads

* chore: import Config from botocore.config

* chore(deps): import boto3-stubs / botocore

botocore is implicitly imported, but make it explicit
since we refer to it directly

* chore: drop type annotation that mypy loudly ignores

* refactor: add storage methods via mixin

Shares code between Document and DocHistory without
putting it in the base DocumentInfo class, which
lacks the name field. Also makes mypy happy.

* feat: add timeout / retry limit to boto client

* ci: let k8s config the timeouts via env

* chore: repair merge resolution typo

* chore: tweak settings imports

* chore: simplify k8s/settings_local.py imports

---------

Co-authored-by: Jennifer Richards <jennifer@staff.ietf.org>
rjsparks and jennifer-richards authored Feb 19, 2025
1 parent e71272f commit 997239a
Showing 64 changed files with 1,484 additions and 116 deletions.
4 changes: 4 additions & 0 deletions .devcontainer/docker-compose.extend.yml
@@ -14,6 +14,10 @@ services:
            # - datatracker-vscode-ext:/root/.vscode-server/extensions
        # Runs app on the same network as the database container, allows "forwardPorts" in devcontainer.json function.
        network_mode: service:db
    blobstore:
        ports:
            - '9000'
            - '9001'

volumes:
    datatracker-vscode-ext:
2 changes: 2 additions & 0 deletions .github/workflows/tests.yml
@@ -28,6 +28,8 @@ jobs:
    services:
      db:
        image: ghcr.io/ietf-tools/datatracker-db:latest
      blobstore:
        image: ghcr.io/ietf-tools/datatracker-devblobstore:latest

    steps:
    - uses: actions/checkout@v4
17 changes: 17 additions & 0 deletions README.md
@@ -106,6 +106,23 @@ Nightly database dumps of the datatracker are available as Docker images: `ghcr.

> Note that to update the database in your dev environment to the latest version, you should run the `docker/cleandb` script.

### Blob storage for dev/test

The dev and test environments use [minio](https://github.com/minio/minio) to provide local blob storage. See the settings files for how the app container communicates with the blobstore container. If you need to work with minio directly from outside the containers (to interact with its API or console), run `docker compose port` from the top-level directory of your clone to find the ephemeral host port it is published on.

```
$ docker compose port blobstore 9001
0.0.0.0:<some ephemeral port>

$ curl -I http://localhost:<some ephemeral port>
HTTP/1.1 200 OK
...
```
The minio container exposes the minio API at port 9000 and the minio console at port 9001.

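The lookup above can also be scripted. The following is a minimal sketch, not part of this commit: the helper names `endpoint_from_port_output` and `find_blobstore_endpoint` are hypothetical, and the port `49153` mentioned below is only an illustrative value. It assumes `docker compose port` prints the published address as `<host>:<port>`, as in the example output above, and it asks for container port 9000 (the API) rather than 9001 (the console).

```python
import subprocess


def endpoint_from_port_output(output: str, host: str = "localhost") -> str:
    """Turn `docker compose port` output such as '0.0.0.0:49153' into a URL."""
    # Use the last non-empty line; the host port follows the final colon.
    address = output.strip().splitlines()[-1]
    port = int(address.rsplit(":", 1)[1])
    return f"http://{host}:{port}"


def find_blobstore_endpoint(container_port: int = 9000) -> str:
    """Ask docker compose which host port maps to the given container port."""
    # Must be run from the top-level directory of the clone.
    result = subprocess.run(
        ["docker", "compose", "port", "blobstore", str(container_port)],
        capture_output=True,
        text=True,
        check=True,
    )
    return endpoint_from_port_output(result.stdout)
```

The resulting URL could then be passed as `endpoint_url` to an S3 client, together with the development credentials from the settings files.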
### Frontend Development
#### Intro
23 changes: 22 additions & 1 deletion dev/deploy-to-container/settings_local.py
@@ -1,7 +1,9 @@
# Copyright The IETF Trust 2007-2019, All Rights Reserved
# -*- coding: utf-8 -*-

from ietf.settings import * # pyflakes:ignore
from ietf.settings import STORAGES, MORE_STORAGE_NAMES, BLOBSTORAGE_CONNECT_TIMEOUT, BLOBSTORAGE_READ_TIMEOUT, BLOBSTORAGE_MAX_ATTEMPTS
import botocore.config

ALLOWED_HOSTS = ['*']

@@ -79,3 +81,22 @@

# OIDC configuration
SITE_URL = 'https://__HOSTNAME__'

for storagename in MORE_STORAGE_NAMES:
    STORAGES[storagename] = {
        "BACKEND": "ietf.doc.storage_backends.CustomS3Storage",
        "OPTIONS": dict(
            endpoint_url="http://blobstore:9000",
            access_key="minio_root",
            secret_key="minio_pass",
            security_token=None,
            client_config=botocore.config.Config(
                signature_version="s3v4",
                connect_timeout=BLOBSTORAGE_CONNECT_TIMEOUT,
                read_timeout=BLOBSTORAGE_READ_TIMEOUT,
                retries={"total_max_attempts": BLOBSTORAGE_MAX_ATTEMPTS},
            ),
            verify=False,
            bucket_name=f"test-{storagename}",
        ),
    }
23 changes: 22 additions & 1 deletion dev/diff/settings_local.py
@@ -1,7 +1,9 @@
# Copyright The IETF Trust 2007-2019, All Rights Reserved
# -*- coding: utf-8 -*-

from ietf.settings import * # pyflakes:ignore
from ietf.settings import STORAGES, MORE_STORAGE_NAMES, BLOBSTORAGE_CONNECT_TIMEOUT, BLOBSTORAGE_READ_TIMEOUT, BLOBSTORAGE_MAX_ATTEMPTS
import botocore.config

ALLOWED_HOSTS = ['*']

@@ -66,3 +68,22 @@
SLIDE_STAGING_PATH = 'test/staging/'

DE_GFM_BINARY = '/usr/local/bin/de-gfm'

for storagename in MORE_STORAGE_NAMES:
    STORAGES[storagename] = {
        "BACKEND": "ietf.doc.storage_backends.CustomS3Storage",
        "OPTIONS": dict(
            endpoint_url="http://blobstore:9000",
            access_key="minio_root",
            secret_key="minio_pass",
            security_token=None,
            client_config=botocore.config.Config(
                signature_version="s3v4",
                connect_timeout=BLOBSTORAGE_CONNECT_TIMEOUT,
                read_timeout=BLOBSTORAGE_READ_TIMEOUT,
                retries={"total_max_attempts": BLOBSTORAGE_MAX_ATTEMPTS},
            ),
            verify=False,
            bucket_name=f"test-{storagename}",
        ),
    }
3 changes: 3 additions & 0 deletions dev/tests/docker-compose.debug.yml
@@ -28,5 +28,8 @@ services:
        volumes:
            - postgresdb-data:/var/lib/postgresql/data

    blobstore:
        image: ghcr.io/ietf-tools/datatracker-devblobstore:latest

volumes:
    postgresdb-data:
23 changes: 22 additions & 1 deletion dev/tests/settings_local.py
@@ -1,7 +1,9 @@
# Copyright The IETF Trust 2007-2019, All Rights Reserved
# -*- coding: utf-8 -*-

from ietf.settings import * # pyflakes:ignore
from ietf.settings import STORAGES, MORE_STORAGE_NAMES, BLOBSTORAGE_CONNECT_TIMEOUT, BLOBSTORAGE_READ_TIMEOUT, BLOBSTORAGE_MAX_ATTEMPTS
import botocore.config

ALLOWED_HOSTS = ['*']

@@ -65,3 +67,22 @@
SLIDE_STAGING_PATH = 'test/staging/'

DE_GFM_BINARY = '/usr/local/bin/de-gfm'

for storagename in MORE_STORAGE_NAMES:
    STORAGES[storagename] = {
        "BACKEND": "ietf.doc.storage_backends.CustomS3Storage",
        "OPTIONS": dict(
            endpoint_url="http://blobstore:9000",
            access_key="minio_root",
            secret_key="minio_pass",
            security_token=None,
            client_config=botocore.config.Config(
                signature_version="s3v4",
                connect_timeout=BLOBSTORAGE_CONNECT_TIMEOUT,
                read_timeout=BLOBSTORAGE_READ_TIMEOUT,
                retries={"total_max_attempts": BLOBSTORAGE_MAX_ATTEMPTS},
            ),
            verify=False,
            bucket_name=f"test-{storagename}",
        ),
    }
10 changes: 10 additions & 0 deletions docker-compose.yml
@@ -15,6 +15,7 @@ services:
        depends_on:
            - db
            - mq
            - blobstore

        ipc: host

@@ -83,6 +84,14 @@ services:
            - .:/workspace
            - app-assets:/assets

    blobstore:
        image: ghcr.io/ietf-tools/datatracker-devblobstore:latest
        restart: unless-stopped
        volumes:
            - "minio-data:/data"



    # Celery Beat is a periodic task runner. It is not normally needed for development,
    # but can be enabled by uncommenting the following.
    #
@@ -106,3 +115,4 @@ services:
volumes:
    postgresdb-data:
    app-assets:
    minio-data:
4 changes: 2 additions & 2 deletions docker/app.Dockerfile
@@ -43,8 +43,8 @@ RUN rm -rf /tmp/library-scripts
# Copy the startup file
COPY docker/scripts/app-init.sh /docker-init.sh
COPY docker/scripts/app-start.sh /docker-start.sh
-RUN sed -i 's/\r$//' /docker-init.sh && chmod +x /docker-init.sh
-RUN sed -i 's/\r$//' /docker-start.sh && chmod +x /docker-start.sh
+RUN sed -i 's/\r$//' /docker-init.sh && chmod +rx /docker-init.sh
+RUN sed -i 's/\r$//' /docker-start.sh && chmod +rx /docker-start.sh

# Fix user UID / GID to match host
RUN groupmod --gid $USER_GID $USERNAME \
27 changes: 24 additions & 3 deletions docker/configs/settings_local.py
@@ -1,11 +1,13 @@
-# Copyright The IETF Trust 2007-2019, All Rights Reserved
+# Copyright The IETF Trust 2007-2025, All Rights Reserved
# -*- coding: utf-8 -*-

from ietf.settings import * # pyflakes:ignore
from ietf.settings import STORAGES, MORE_STORAGE_NAMES, BLOBSTORAGE_CONNECT_TIMEOUT, BLOBSTORAGE_READ_TIMEOUT, BLOBSTORAGE_MAX_ATTEMPTS
import botocore.config

ALLOWED_HOSTS = ['*']

from ietf.settings_postgresqldb import DATABASES # pyflakes:ignore

IDSUBMIT_IDNITS_BINARY = "/usr/local/bin/idnits"
IDSUBMIT_STAGING_PATH = "/assets/www6s/staging/"
@@ -37,6 +39,25 @@
# DEV_TEMPLATE_CONTEXT_PROCESSORS = [
# 'ietf.context_processors.sql_debug',
# ]
for storagename in MORE_STORAGE_NAMES:
    STORAGES[storagename] = {
        "BACKEND": "ietf.doc.storage_backends.CustomS3Storage",
        "OPTIONS": dict(
            endpoint_url="http://blobstore:9000",
            access_key="minio_root",
            secret_key="minio_pass",
            security_token=None,
            client_config=botocore.config.Config(
                signature_version="s3v4",
                connect_timeout=BLOBSTORAGE_CONNECT_TIMEOUT,
                read_timeout=BLOBSTORAGE_READ_TIMEOUT,
                retries={"total_max_attempts": BLOBSTORAGE_MAX_ATTEMPTS},
            ),
            verify=False,
            bucket_name=storagename,
        ),
    }


DOCUMENT_PATH_PATTERN = '/assets/ietfdata/doc/{doc.type_id}/'
INTERNET_DRAFT_PATH = '/assets/ietf-ftp/internet-drafts/'
4 changes: 4 additions & 0 deletions docker/docker-compose.extend.yml
@@ -16,6 +16,10 @@ services:
    pgadmin:
        ports:
            - '5433'
    blobstore:
        ports:
            - '9000'
            - '9001'
    celery:
        volumes:
            - .:/workspace
28 changes: 28 additions & 0 deletions docker/scripts/app-configure-blobstore.py
@@ -0,0 +1,28 @@
#!/usr/bin/env python
# Copyright The IETF Trust 2024, All Rights Reserved

import boto3
import botocore.config  # needed for botocore.config.Config below
import os
import sys

from ietf.settings import MORE_STORAGE_NAMES


def init_blobstore():
    blobstore = boto3.resource(
        "s3",
        endpoint_url=os.environ.get("BLOB_STORE_ENDPOINT_URL", "http://blobstore:9000"),
        aws_access_key_id=os.environ.get("BLOB_STORE_ACCESS_KEY", "minio_root"),
        aws_secret_access_key=os.environ.get("BLOB_STORE_SECRET_KEY", "minio_pass"),
        aws_session_token=None,
        config=botocore.config.Config(signature_version="s3v4"),
        verify=False,
    )
    for bucketname in MORE_STORAGE_NAMES:
        blobstore.create_bucket(
            Bucket=f"{os.environ.get('BLOB_STORE_BUCKET_PREFIX', '')}{bucketname}".strip()
        )


if __name__ == "__main__":
    sys.exit(init_blobstore())
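The bucket-name expression in the script above can be read in isolation. The following is a minimal sketch, not code from this commit: the helper `bucket_name` is hypothetical, and the storage names used in the usage note are illustrative (the contents of `MORE_STORAGE_NAMES` are not shown in this diff). It mirrors the script's behaviour of prepending the optional `BLOB_STORE_BUCKET_PREFIX` environment variable and stripping surrounding whitespace.

```python
import os
from typing import Optional


def bucket_name(storage_name: str, prefix: Optional[str] = None) -> str:
    """Return the bucket name for a storage, mirroring the script's expression."""
    # With no explicit prefix, fall back to the environment, defaulting to "".
    if prefix is None:
        prefix = os.environ.get("BLOB_STORE_BUCKET_PREFIX", "")
    # strip() guards against stray whitespace in the configured prefix.
    return f"{prefix}{storage_name}".strip()
```

For example, `bucket_name("active-draft", prefix="test-")` yields `test-active-draft`, which matches the `f"test-{storagename}"` naming used by the test settings files, while an empty prefix yields the plain storage name used in `docker/configs/settings_local.py`.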
5 changes: 5 additions & 0 deletions docker/scripts/app-init.sh
@@ -73,6 +73,11 @@ echo "Creating data directories..."
chmod +x ./docker/scripts/app-create-dirs.sh
./docker/scripts/app-create-dirs.sh

# Configure the development blobstore

echo "Configuring blobstore..."
PYTHONPATH=/workspace python ./docker/scripts/app-configure-blobstore.py

# Download latest coverage results file

echo "Downloading latest coverage results file..."
5 changes: 5 additions & 0 deletions ietf/api/tests.py
@@ -25,6 +25,7 @@
import debug # pyflakes:ignore

import ietf
from ietf.doc.storage_utils import retrieve_str
from ietf.doc.utils import get_unicode_document_content
from ietf.doc.models import RelatedDocument, State
from ietf.doc.factories import IndividualDraftFactory, WgDraftFactory, WgRfcFactory
@@ -553,6 +554,10 @@ def test_api_upload_polls_and_chatlog(self):
            newdoc = session.presentations.get(document__type_id=type_id).document
            newdoccontent = get_unicode_document_content(newdoc.name, Path(session.meeting.get_materials_path()) / type_id / newdoc.uploaded_filename)
            self.assertEqual(json.loads(content), json.loads(newdoccontent))
            self.assertEqual(
                json.loads(retrieve_str(type_id, newdoc.uploaded_filename)),
                json.loads(content)
            )

    def test_api_upload_bluesheet(self):
        url = urlreverse("ietf.meeting.views.api_upload_bluesheet")
8 changes: 7 additions & 1 deletion ietf/doc/admin.py
@@ -12,7 +12,7 @@
TelechatDocEvent, BallotPositionDocEvent, ReviewRequestDocEvent, InitialReviewDocEvent,
AddedMessageEvent, SubmissionDocEvent, DeletedEvent, EditedAuthorsDocEvent, DocumentURL,
ReviewAssignmentDocEvent, IanaExpertDocEvent, IRSGBallotDocEvent, DocExtResource, DocumentActionHolder,
-BofreqEditorDocEvent, BofreqResponsibleDocEvent )
+BofreqEditorDocEvent, BofreqResponsibleDocEvent, StoredObject )

from ietf.utils.validators import validate_external_resource_value

@@ -218,3 +218,9 @@ class DocExtResourceAdmin(admin.ModelAdmin):
    search_fields = ['doc__name', 'value', 'display_name', 'name__slug',]
    raw_id_fields = ['doc', ]
admin.site.register(DocExtResource, DocExtResourceAdmin)

class StoredObjectAdmin(admin.ModelAdmin):
    list_display = ['store', 'name', 'modified', 'deleted']
    list_filter = ['deleted']
    search_fields = ['store', 'name', 'doc_name', 'doc_rev', 'deleted']
admin.site.register(StoredObject, StoredObjectAdmin)
14 changes: 14 additions & 0 deletions ietf/doc/expire.py
@@ -13,6 +13,7 @@

from typing import List, Optional # pyflakes:ignore

from ietf.doc.storage_utils import exists_in_storage, remove_from_storage
from ietf.doc.utils import update_action_holders
from ietf.utils import log
from ietf.utils.mail import send_mail
@@ -156,11 +157,17 @@ def remove_ftp_copy(f):
        if mark.exists():
            mark.unlink()

    def remove_from_active_draft_storage(file):
        # Assumes the glob will never find a file with no suffix
        ext = file.suffix[1:]
        remove_from_storage("active-draft", f"{ext}/{file.name}", warn_if_missing=False)

    # Note that the object is already in the "draft" storage.
    src_dir = Path(settings.INTERNET_DRAFT_PATH)
    for file in src_dir.glob("%s-%s.*" % (doc.name, rev)):
        move_file(str(file.name))
        remove_ftp_copy(str(file.name))
        remove_from_active_draft_storage(file)

def expire_draft(doc):
    # clean up files
@@ -218,6 +225,13 @@ def move_file_to(subdir):
            mark = Path(settings.FTP_DIR) / "internet-drafts" / basename
            if mark.exists():
                mark.unlink()
            if ext:
                # Note that we're not moving these strays anywhere - the assumption
                # is that the active-draft blobstore will not get strays.
                # See, however, the note about "major system failures" at "unknown_ids"
                blobname = f"{ext[1:]}/{basename}"
                if exists_in_storage("active-draft", blobname):
                    remove_from_storage("active-draft", blobname)

        try:
            doc = Document.objects.get(name=filename, rev=revision)
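Both hunks in `ietf/doc/expire.py` build the blob name for the "active-draft" store as `<extension>/<filename>`. The following is a minimal sketch of that naming rule on its own, not code from this commit: the helper `active_draft_blob_name` and the draft name in the usage note are hypothetical. Unlike the first hunk, which assumes the glob never returns a file without a suffix, this sketch raises an error in that case.

```python
from pathlib import Path


def active_draft_blob_name(file: Path) -> str:
    """Build the '<ext>/<filename>' blob name used for the "active-draft" store."""
    # Path.suffix includes the leading dot, e.g. ".txt"; drop it.
    ext = file.suffix[1:]
    if not ext:
        # The commit's code assumes this never happens; here it is made explicit.
        raise ValueError(f"{file.name} has no suffix, so no blob name can be derived")
    return f"{ext}/{file.name}"
```

For instance, `active_draft_blob_name(Path("draft-example-00.txt"))` gives `txt/draft-example-00.txt`; only the final suffix is used, so a name ending in `.tar.gz` would be filed under `gz/`.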