Releases: octue/octue-sdk-python
Use correct formatter for analysis logger
Summary
Simplify and correct the choice of log formatter on different platforms.
Contents
Enhancements
- Remove COMPUTE_PROVIDER and USE_OCTUE_LOG_HANDLER from build arguments for Google Cloud Run Dockerfile (just keep as environment variables)
Fixes
- Use correct formatter for analysis logger
- Stop allowing COMPUTE_PROVIDER environment variable to override USE_OCTUE_LOG_HANDLER
Refactoring
- Move decision on which log formatter to use to new octue.log_handlers.get_formatter function
Testing
- Expand and update log handler tests
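The separation described in this release can be sketched as follows. This is only an illustration, not the SDK's get_formatter: the function names, the format strings, the "GOOGLE_CLOUD_RUN" provider value, and the "1" flag value are all assumptions made for the example.

```python
import logging

# Hypothetical format strings; the SDK's real formats are not shown in these notes.
LOCAL_FORMAT = "[%(asctime)s | %(levelname)s | %(name)s] %(message)s"
CLOUD_FORMAT = "[%(levelname)s | %(name)s] %(message)s"


def choose_formatter(environ):
    """Pick a log formatter based on the compute provider only."""
    # Assumption: the provider value "GOOGLE_CLOUD_RUN" marks a platform that
    # adds its own timestamps, so they are left out of the format there.
    if environ.get("COMPUTE_PROVIDER", "UNKNOWN") == "GOOGLE_CLOUD_RUN":
        return logging.Formatter(CLOUD_FORMAT)
    return logging.Formatter(LOCAL_FORMAT)


def should_use_octue_handler(environ):
    """Decide whether to attach the handler from USE_OCTUE_LOG_HANDLER alone."""
    # COMPUTE_PROVIDER is deliberately not consulted here, so it cannot
    # override the flag. The value "1" for "enabled" is an assumption.
    return environ.get("USE_OCTUE_LOG_HANDLER", "0") == "1"
```

Keeping the two decisions in separate functions means the provider can change the log format without ever switching the handler on or off.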
Revert "Acknowledge Pub/Sub messages received on Google Cloud Run straight away"
Reverts #221
We've had to revert due to limitations in Google Cloud Run:
- Only instances that haven't returned an HTTP status code yet are counted as active
- Acknowledgement of trigger Pub/Sub messages can only be done by returning a status code
- Threads can be launched so the trigger message can be acknowledged to avoid it being sent again, but the instance will then be treated as idle and killed after around 15 minutes
- Together, this means that acknowledgement of trigger Pub/Sub messages can only be done after all processing has completed
- There is also a 600s maximum acknowledgement deadline, after which the message is sent again, resulting in extra containers being spawned and the same computation being carried out multiple times until the first instance of it has finished and acknowledged the trigger message
- This means that long-running processes on Google Cloud Run cause the same processing to happen potentially many times, wasting a lot of compute resource.
- The only solution we can see to this currently (while staying on Google Cloud Run) is to set the acknowledgement deadline to its maximum of 600s and set the message retention deadline to its minimum of 600s so messages for longer-running processes aren't resent
- The long-term solution is to stop using Cloud Run and use something like Knative instead
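To give a feel for the cost, here is a rough estimate of how many times one trigger message would be delivered. It is a deliberately simplified model, assuming the message is resent exactly once each time the deadline expires and that nothing acknowledges it until the first instance finishes; real redelivery timing may differ. The function name is made up for this sketch.

```python
import math

MAX_ACK_DEADLINE_SECONDS = 600  # Maximum acknowledgement deadline quoted above


def estimate_deliveries(processing_seconds, ack_deadline_seconds=MAX_ACK_DEADLINE_SECONDS):
    """Estimate deliveries of one message under the simplified model."""
    if processing_seconds <= 0:
        return 1
    # One delivery at the start, plus one for every deadline that expires
    # before the first instance finishes and acknowledges the message.
    return math.ceil(processing_seconds / ack_deadline_seconds)
```

Under this model a one-hour analysis would be started six times, while anything that finishes within 600s runs only once.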
Acknowledge Pub/Sub messages received on Google Cloud Run straight away
Summary
Acknowledge Pub/Sub messages received on Google Cloud Run straight away rather than waiting for the analysis to complete first. This avoids triggering the same analysis multiple times (due to Pub/Sub sending the same message multiple times), wasting compute resource, and adding a large amount of noise to the logs.
Contents
Fixes
- Answer questions from Google Cloud Run in a thread
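The general pattern behind this fix looks roughly like the sketch below: start the work in a thread and return a success status code immediately. The handler name and return shape are hypothetical and no web framework is involved. As the revert notes above explain, Google Cloud Run treats an instance as idle once the status code has been returned, which is why this approach was later withdrawn.

```python
import threading


def handle_trigger(question, answer_question):
    """Start answering in a background thread and return a status code at once."""
    worker = threading.Thread(target=answer_question, args=(question,))
    worker.start()
    # Returning a 2xx status code is what acknowledges the trigger message.
    # The worker is returned too so a caller can wait for it if needed.
    return 204, worker
```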
Make input values optional when asking a question
Summary
Allow questions with only an input manifest to be asked (questions with only input values are already allowed).
Contents
Fixes
- Make input_values optional in Service.ask
Allow input manifests referencing local files to be used when asking questions
Summary
Allow the input manifest sent to a child service to reference local files if the user confirms that the child will have access to them. This is useful if there are several children running on a single machine (or several machines with a shared filesystem) that produce files so large that it would cost too much or take too long to upload and download these from cloud storage repeatedly. A good example of this might be running heavy-computation/big data children on a high-performance computing cluster.
Contents
New features
- Allow input manifests referencing local files to be used if the files can be accessed by the child and the allow_local_files parameter is True
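A check of this kind might look like the sketch below. It is not the SDK's implementation: the helper names are invented, and treating any path that does not start with "gs://" as local is an assumption made for the example.

```python
def find_local_paths(paths):
    """Return the paths that do not point at cloud storage."""
    # Assumption: cloud paths start with "gs://"; anything else counts as local.
    return [path for path in paths if not path.startswith("gs://")]


def check_manifest_paths(paths, allow_local_files=False):
    """Reject local paths unless the caller has explicitly allowed them."""
    local_paths = find_local_paths(paths)
    if local_paths and not allow_local_files:
        raise ValueError(f"Local files are not allowed in this manifest: {local_paths}")
    return paths
```

Defaulting the flag to False keeps the safe behaviour unless the user confirms that the child can reach the files.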
Testing
- Loosen deprecation warning test
Quality Checklist
- New features are fully tested (No matter how much Coverage Karma you have)
Make OrderedMessageHandler work with no timeout
Contents
Fixes
- Make OrderedMessageHandler.handle_messages work with no timeout
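One way a message loop can support "no timeout" is to treat None as "wait indefinitely" and only do deadline arithmetic when a number is given. The sketch below uses a standard library queue as a stand-in for the message source; it is not the OrderedMessageHandler code, and the message shape is an assumption.

```python
import queue
import time


def handle_messages(message_queue, timeout=60):
    """Return the first "result" message, waiting forever if timeout is None."""
    start = time.monotonic()
    while True:
        if timeout is None:
            remaining = None  # queue.Queue.get blocks indefinitely with None
        else:
            remaining = timeout - (time.monotonic() - start)
            if remaining <= 0:
                raise TimeoutError(f"No result received within {timeout}s.")
        try:
            message = message_queue.get(timeout=remaining)
        except queue.Empty:
            raise TimeoutError(f"No result received within {timeout}s.")
        # Assumption: messages are dictionaries with a "type" key.
        if message["type"] == "result":
            return message
```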
Only retry transient errors in Google Pub/Sub code
Summary
Simplify Google Pub/Sub retries and restrict them to transient errors, specifically removing retries for NotFound errors (these were triggering many unneeded retries on Google Cloud Run). This also stops the retry schedule being proportional to the timeout for waiting for an answer to a question, which could lead to very long retry schedules for large timeouts (e.g. for questions that involve long analyses).
Contents
Enhancements
- Add timeout parameter to Service.ask
Fixes
- Only retry transient errors in Google Pub/Sub Service and GooglePubSubHandler
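The idea of retrying only transient errors, on a schedule that does not depend on the question timeout, can be sketched in plain Python. The exception classes and the helper below are hypothetical stand-ins; the SDK's actual retry configuration is not shown in these notes.

```python
import time


class TransientError(Exception):
    """Stand-in for an error that is worth retrying."""


class NotFoundError(Exception):
    """Stand-in for an error that retrying will not fix."""


def call_with_retries(function, max_attempts=3, delay_seconds=0.0, transient=(TransientError,)):
    """Call function, retrying only when it raises a transient error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return function()
        except transient:
            if attempt == max_attempts:
                raise
            # Fixed delay: the schedule is independent of any answer timeout.
            time.sleep(delay_seconds)
```

Any exception not listed in transient, such as NotFoundError here, propagates on the first attempt.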
Ensure crc32c hashing on cloud upload works for binary files
Contents
Fixes
- Ensure crc32c hash calculation for cloud upload fidelity check works for binary files
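For illustration, a hash that works for binary files has to be computed over raw bytes, which means opening the file in binary mode rather than text mode. The pure-Python CRC32C below is a self-contained sketch, not the SDK's code (which presumably relies on a dedicated library); the function names are invented. The base64 encoding of the big-endian checksum is, as far as we know, the form Google Cloud Storage uses.

```python
import base64


def _make_table():
    """Build the lookup table for the reflected Castagnoli polynomial."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data, value=0):
    """Return the CRC32C of data, optionally continuing from a previous value."""
    crc = value ^ 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_of_file(path, chunk_size=65536):
    """Hash a file in chunks, reading bytes so binary content is handled."""
    value = 0
    with open(path, "rb") as f:  # "rb" avoids decoding errors on binary files
        for chunk in iter(lambda: f.read(chunk_size), b""):
            value = crc32c(chunk, value)
    return value


def to_base64(checksum):
    """Encode a checksum as base64 of its 4 big-endian bytes."""
    return base64.b64encode(checksum.to_bytes(4, "big")).decode("ascii")
```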
Testing
- Make deprecation warning test less stringent
Fix error messages about uppercase characters in tags and labels
Contents
Fixes
- Fix error messages about uppercase characters in tags and labels
Format answerer exceptions properly when sending to asker
Summary
Ensure exceptions with multiple arguments of any type are formatted and sent correctly to the asking service by the answering service.
Contents
Fixes
- Format answerer exceptions properly when sending to asker
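As a rough sketch of the problem being fixed: an exception can carry several arguments of arbitrary types, so building a message for the asker has to convert each one to a string instead of assuming a single string argument. The helper names and the dictionary keys below are hypothetical, not the SDK's message format.

```python
def format_exception_message(error):
    """Join every exception argument into one message, whatever its type."""
    return "\n".join(str(argument) for argument in error.args)


def serialise_exception(error):
    """Build a plain dictionary describing the exception for sending on."""
    return {
        "exception_type": type(error).__name__,
        "exception_message": format_exception_message(error),
    }
```

For example, ValueError("bad value", 42) would be sent with both arguments present in the message rather than only the first, or a failure while formatting.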