\documentclass[11pt,a4paper]{ivoa}
\input tthdefs
\usepackage{xspace}
% Standard terms used throughout the document,
% defined as macro commands to maintain consistency
% and avoid repeated spelling mistakes.
% Using non-breaking space character.
% https://stackoverflow.com/a/1012891
\usepackage[super]{nth}
\newcommand{\xml} {XML}
\newcommand{\json} {JSON}
\newcommand{\yaml} {YAML}
\newcommand{\http} {HTTP}
\newcommand{\rest} {REST}
\newcommand{\uuid} {UUID}
\newcommand{\openapi} {OpenAPI}
\newcommand{\datamodel} {data~model}
\newcommand{\webservice} {web service}
\newcommand{\webbrowser} {web browser}
\newcommand{\vo} {VO}
\newcommand{\vofull} {Virtual Observatory}
\newcommand{\ivoa} {IVOA}
\newcommand{\ivoafull} {International Virtual Observatory Alliance}
\newcommand{\uws} {UWS}
\newcommand{\vospace} {VOSpace}
\newcommand{\execworkerclass} {\textit{ExecutionWorker}}
\newcommand{\execbrokerclass} {\textit{ExecutionBroker}}
\newcommand{\execbrokerservice}[1] {\textit{ExecutionBroker~service#1}}
\newcommand{\execbrokersession}[1] {\textit{ExecutionBroker~session#1}}
\newcommand{\execbrokerstate}[1] {\codeword{state#1}}
\newcommand{\execoffer}[1] {\textit{ExecutionBroker~offer#1}}
\newcommand{\execofferset}[1] {\textit{ExecutionBroker~offerset#1}}
\newcommand{\execsession}[1] {\textit{ExecutionBroker~session#1}}
\newcommand{\executionbroker} {\textit{Execution~Broker}}
\newcommand{\executionplanning} {\textit{Execution~Planning}}
\newcommand{\execplatform} {\textit{execution~platform}}
\newcommand{\executable} {\textit{executable}}
\newcommand{\executablething}[1] {\textit{executable~thing#1}}
\newcommand{\excutabletask} {\textit{executable} task}
\newcommand{\metadoc} [1]{\textit{metadata document#1}}
%\newcommand{\execoffer}[1] {\textit{offer#1}}
\newcommand{\workerjob}[1] {\textit{session#1}}
\newcommand{\teardown} {tear-down}
\newcommand{\jupyter} {Jupyter}
\newcommand{\jupyterhub} {JupyterHub}
\newcommand{\binderhub} {BinderHub}
\newcommand{\jupyternotebook} {Jupyter notebook}
\newcommand{\esap} {ESAP}
\newcommand{\escape} {ESCAPE}
\newcommand{\datalake} {DataLake}
\newcommand{\rucio} {Rucio}
\newcommand{\python} {Python}
\newcommand{\pythonprogram} {Python program}
\newcommand{\pythonruntime} {Python runtime}
\newcommand{\apache} {Apache}
\newcommand{\spark} {Spark}
\newcommand{\pyspark} {PySpark}
\newcommand{\zeppelin} {Zeppelin}
\newcommand{\zeppelinnotebook} {Zeppelin notebook}
\newcommand{\oci} {OCI}
\newcommand{\ociruntime} {OCI runtime}
\newcommand{\ocicontainer} {OCI container}
\newcommand{\docker} {Docker}
\newcommand{\dockercompose} {Docker compose}
\newcommand{\dockerimage} {Docker image}
\newcommand{\dockerruntime} {Docker runtime}
\newcommand{\dockercontainer} {Docker container}
\newcommand{\singularity} {Singularity}
\newcommand{\singularitycontainer} {Singularity container}
\newcommand{\openstack} {Openstack}
\newcommand{\kubernetes} {Kubernetes}
\newcommand{\codeword}[1] {\texttt{#1}}
\newcommand{\footurl}[1] {\footnote{\url{#1}}}
\newcommand{\dataset}[1] {dataset#1}
\newcommand{\datascience} {data~science}
\newcommand{\scienceplatform}[1] {science~platform#1}
\newcommand{\science}[1] {science#1}
\newcommand{\scientist}[1] {scientist#1}
\newcommand{\cpu}[1] {CPU#1}
\newcommand{\gpu}[1] {GPU#1}
\newcommand{\nvidiagpu} {NVIDIA~AD104~GPU}
\newcommand{\scalable} {scalable}
% TODO add a citation for the YAML specification.
% https://yaml.org/spec/
\usepackage{listings}
\usepackage{xcolor}
%\colorlet{punct}{red!60!black}
\colorlet{numb}{magenta!60!black}
\definecolor{html-gray}{HTML}{EEEEEE}
\definecolor{light-gray}{gray}{0.95}
\definecolor{delim}{RGB}{20,105,176}
\lstset{
basicstyle=\small\ttfamily,
columns=fullflexible,
frame=none,
backgroundcolor=\color{light-gray},
stepnumber=1,
%numbers=left,
numbers=none,
numberstyle=\small,
numbersep=8pt,
%xleftmargin=\parindent,
xrightmargin=1cm,
showstringspaces=false,
keepspaces=true,
breaklines=true,
linewidth=14cm,
frame=none
}
% https://tex.stackexchange.com/questions/83085/how-to-improve-listings-display-of-json-files
% https://tex.stackexchange.com/a/83100
% https://tex.stackexchange.com/questions/10828/indent-a-code-listing-in-latex
% https://tex.stackexchange.com/a/10831
\lstdefinelanguage{json}{
literate=
*{0}{{{\color{numb}0}}}{1}
{1}{{{\color{numb}1}}}{1}
{2}{{{\color{numb}2}}}{1}
{3}{{{\color{numb}3}}}{1}
{4}{{{\color{numb}4}}}{1}
{5}{{{\color{numb}5}}}{1}
{6}{{{\color{numb}6}}}{1}
{7}{{{\color{numb}7}}}{1}
{8}{{{\color{numb}8}}}{1}
{9}{{{\color{numb}9}}}{1}
}
\lstdefinelanguage{yaml}{
literate=
*{0}{{{\color{numb}0}}}{1}
{1}{{{\color{numb}1}}}{1}
{2}{{{\color{numb}2}}}{1}
{3}{{{\color{numb}3}}}{1}
{4}{{{\color{numb}4}}}{1}
{5}{{{\color{numb}5}}}{1}
{6}{{{\color{numb}6}}}{1}
{7}{{{\color{numb}7}}}{1}
{8}{{{\color{numb}8}}}{1}
{9}{{{\color{numb}9}}}{1}
}
\hyphenation{Exe-cut-able-Thing}
% Enable [*bold*] inside listings
% https://stackoverflow.com/a/24838471
% https://www.mrunix.de/forums/archive/index.php/t-42976.html
% https://en.wikibooks.org/wiki/LaTeX/Source_Code_Listings
% https://mirror.ox.ac.uk/sites/ctan.org/macros/latex/contrib/listings/listings.pdf#subsection.3.3
\renewcommand{\ttdefault}{pcr}
\lstset{moredelim=[is][\bfseries]{[*}{*]}}
\lstset{moredelim=[is][\itshape]{[+}{+]}}
\title{IVOA Execution Broker}
% see ivoatexDoc for what group names to use here; use \ivoagroup[IG] for
% interest groups.
\ivoagroup{GWS}
\author[http://www.ivoa.net/twiki/bin/view/IVOA/DaveMorris]
{Dave Morris}
\author[http://www.ivoa.net/twiki/bin/view/IVOA/SaraBertocco]
{Sara Bertocco}
\editor[http://www.ivoa.net/twiki/bin/view/IVOA/DaveMorris]
{Dave Morris}
% \previousversion[????URL????]{????Concise Document Label????}
\previousversion{This is the first public release}
\begin{document}
\begin{abstract}
\label{abstract}
One of the long term goals of the \ivoa{} has been to enable users to
move the code to the data.
This is becoming more and more important as the size and complexity
of the \dataset{s} available in the virtual observatory increases.
%\citep{gaia-at-esac}
%\footurl{https://www.skao.int/en/explore/big-data}
%\footurl{https://www.lsst.org/scientists/keynumbers}
The \ivoa{} \executionbroker{} provides a step towards making this possible.
The \ivoa{} \executionbroker{} is designed to address a specific question:
given an executable thing, e.g. a \pythonprogram{} or \jupyternotebook{},
what facilities are available to run it?
To do this, the \ivoa{} \executionbroker{} specification defines
a \datamodel{} and \webservice{} API for describing executable things
and the resources needed to execute them.
Together these components enable a user to ask a simple question:
\textit{``Where (and when) can I execute my program?''}
This in turn enables users to move code between \scienceplatform{s},
allowing them to develop their code on one platform and then apply it to a different
\dataset{} by sending it to execute on another platform.
\end{abstract}
\section*{Acknowledgments}
\label{sect-acknowledgments}
The authors would like to thank all the participants in the IVOA and ESCAPE projects
who have contributed their ideas, critical reviews, and suggestions to this document.
\section*{Conformance-related definitions}
\label{sect-conformance}
The words ``MUST'', ``SHALL'', ``SHOULD'', ``MAY'', ``RECOMMENDED'', and
``OPTIONAL'' (in upper or lower case) used in this document are to be
interpreted as described in IETF standard RFC2119 \citep{std:RFC2119}.
The \emph{Virtual Observatory (VO)} is a general term for a collection of
federated resources that can be used to conduct astronomical research,
education, and outreach.
The \href{https://www.ivoa.net}{International Virtual Observatory Alliance (IVOA)}
is a global collaboration of separately funded projects to develop standards and
infrastructure that enable VO applications.
\section{Introduction}
\label{sect-introduction}
The \ivoa{} \executionbroker{} specification defines a \datamodel{} for describing executable tasks
and a \webservice{} interface for managing them.
Together these provide a common interface for service discovery, resource allocation
and execution scheduling across a heterogeneous federation of different types of
execution platform.
\begin{itemize}
\item \execbrokerclass{} \datamodel{} – a data model for describing execution sessions and their resource requirements.
\item \execbrokerclass{} \webservice{} – a \rest{} based web service to find execution platforms, allocate resources and schedule execution sessions.
\end{itemize}
\subsection{Role within the VO Architecture}
\label{sub-ivoa-role}
% As of ivoatex 1.2, the architecture diagram is generated by ivoatex in
% SVG; copy ivoatex/archdiag-full.xml to role_diagram.xml and throw out
% all lines not relevant to your standard.
% Notes don't generally need this. If you don't copy role_diagram.xml,
% you must remove role_diagram.pdf from SOURCES in the Makefile.
\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{role_diagram.pdf}
\caption{Architecture diagram showing the \ivoa{} \executionbroker{}'s role in the \ivoa}
\label{fig:archdiag}
\end{figure}
The \ivoa{} Architecture \citep{2010ivoa.rept.1123A} provides a high-level view of how \ivoa{}
standards work together to connect users and applications with providers of data
and services.
Fig.~\ref{fig:archdiag} shows the role the \ivoa{} \executionbroker{} plays within this architecture.
In response to the increasing size and complexity of the next generation of science \dataset{s}
a number of \ivoa{} members are developing integrated \scienceplatform{s} which bring
together the \dataset{s} co-located with the compute resources needed to analyse
them.\footurl{https://data.lsst.cloud/}\footurl{https://rsp.lsst.io/}
These \scienceplatform{s} make extensive use of the \ivoa{} data models and
vocabularies to describe their \dataset{s}, and use the \ivoa{} data access
services to find and access data from other data providers.
In addition, some of the \scienceplatform{s} use \ivoa{} \vospace{} services to manage
data transfers to and from local storage co-located with the compute resources.
However, to date the \ivoa{} does not provide any APIs or services that
enable \scienceplatform{s} to exchange the software used to analyse the data.
The \ivoa{} \executionbroker{} provides a step towards making this possible.
This places the \ivoa{} \executionbroker{} in the same region of the \ivoa{} architecture
as the \ivoa{} \vospace{} specification \citep{2009ivoa.specQ1007G},
providing an infrastructure level service that enables service discovery,
resource allocation and execution scheduling across a heterogeneous federation
of execution platforms.
\subsection{Supplementary documents}
\label{sub-supplementary-documents}
\subsubsection{\openapi{} specification}
\label{subsub-openapi-specification}
This document is designed to be read in combination with the \openapi{}
\footurl{https://www.openapis.org/} \footurl{https://swagger.io/specification/}
specification published alongside this document.
The \openapi{} specification defines the technical details of the
\executionbroker{} \webservice{} interface and the schema for the
\executionbroker{} \datamodel{}.
This document complements the technical specification by using use cases
and examples to describe the intended service behaviour and
explain the reasoning behind some of the design choices.
The two documents are intended to be read together,
with the machine-readable \openapi{} specification defining the \textit{what},
and the human-readable text document describing the \textit{why}.
The \openapi{} specification associated with this document is
published in the following files:
\begin{itemize}
\item \codeword{openapi.yaml} - the main service specification, including
the \executionbroker{} service API and the core data model.
\item \codeword{messages.yaml} - a supplementary data model for INFO, WARN,
and DEBUG messages embedded in the service responses.
\item \codeword{utils.yaml} - a supplementary data model for re-usable
components such as ISO date formats, min/max pairs, and name-value maps.
\end{itemize}
This text document may include small examples of \openapi{} schema to
explain specific points, but these are for information only. Unless otherwise
specified, the \openapi{} specification itself should be assumed to be the
definitive source and this text document should be considered as secondary.
\subsubsection{IVOA profiles}
\label{subsub-ivoa-profiles}
This specification refers to the following documents
to cover the details of how an \executionbroker{} \webservice{}
should use the following protocols and data formats:
\begin{itemize}
\item The \ivoa{} \rest{} profile, describing a common pattern
for \ivoa{} \webservice{s} that implement
\rest{}\footurl{https://en.wikipedia.org/wiki/REST}
interfaces.
\item The \ivoa{} \http{} profile, describing how \ivoa{} \webservice{s}
should use aspects of the \http{} protocol, including \http{} parameters,
message content, content negotiation, and \http{} return codes.
\item The \ivoa{} error messages profile, describing a common format
for \ivoa{} \webservice{s} to format error messages.
\item The \ivoa{} \json{} profile, describing how \ivoa{} \webservice{s}
should serialize message content using the
\json{}\footurl{https://www.json.org/json-en.html} data format.
\item The \ivoa{} \yaml{} profile, describing how \ivoa{} \webservice{s}
should serialize message content using the
\yaml{}\footurl{https://yaml.org/} data format.
\item The \ivoa{} \xml{} profile, describing how \ivoa{} \webservice{s}
should serialize message content using the
\xml{}\footurl{https://www.w3.org/XML/} data format.
\end{itemize}
Unless otherwise specified, an \executionbroker{} \webservice{} implementation
should follow the guidelines outlined in these documents.
\subsubsection{IVOA Single-Sign-On}
\label{subsub-ivoa-sso}
An \ivoa{} \executionbroker{} service MAY use parts of the
\ivoa{} Single-Sign-On standard \citep{2017ivoa.spec.0524T}
for authentication, and the
\ivoa{} Credential Delegation Protocol \citep{2010ivoa.spec.0218P}
for delegating credentials to other services.
\subsection{Executable things}
\label{sub-executablething}
To understand the problem that the \ivoa{} \executionbroker{} is trying to solve
it is useful to describe what an \executablething{} is in this context.
In general terms, this document refers to something that can be executed, or run,
as an \executable{}.
To explain what this means we can start with a science domain function that we want to perform.
For example, the mathematical concept of the square root of a number.
We can calculate the square root of a positive number using the Newton–Raphson
algorithm\footurl{https://en.wikipedia.org/wiki/Newton\%27s_method}
which produces successively closer approximations to the result.
However, in the general case, this mathematical description of the algorithm would not be
considered to be an \executablething{}.
We can write a \pythonprogram{} to use this algorithm to calculate the square root of a number.
This is the first identifiable \executablething{} in our example.
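As an illustration only, a minimal sketch of such a \pythonprogram{} is shown below.
The function name \codeword{newton\_sqrt}, the relative tolerance, and the iteration limit
are assumptions made for this example, they are not defined by this specification.
\begin{lstlisting}[]
def newton_sqrt(value, tolerance=1e-12, max_iterations=100):
    """Approximate the square root of a non-negative number.

    Uses the Newton-Raphson update x -> (x + value / x) / 2, which
    produces successively closer approximations to the result.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return 0.0
    # Any positive starting point converges, this one avoids tiny guesses.
    estimate = value if value >= 1 else 1.0
    for _ in range(max_iterations):
        improved = 0.5 * (estimate + value / estimate)
        # Stop when the relative change is below the tolerance.
        if abs(improved - estimate) <= tolerance * improved:
            return improved
        estimate = improved
    return estimate


print(newton_sqrt(2.0))
\end{lstlisting}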
To be able to use this \executablething{}, you would need a computing resource with the appropriate
hardware and software environment. In this case, a computing resource with the \python{} interpreter
installed along with the additional \python{} modules required by the program.
This environment is often referred to as the \python{} runtime.
In the context of \scienceplatform{s} and \datascience{}, a common pattern is to provide this environment
using a Docker\footurl{https://docs.docker.com/get-started/what-is-a-container/}
or OCI\footurl{https://opencontainers.org/} container
to package the \pythonprogram{} and \python{} runtime together as a single binary object.
This package, or container, is itself an \executablething{}, the second in our example,
and one which requires a different execution environment than the original \pythonprogram{}.
The aim of containerization is to package software components together with all the libraries and dependencies
they need as a single binary object that interfaces with a standard execution environment,
referred to as the \textit{container runtime}.
To be able to use this \executablething{}, you would need a computing resource with the appropriate
hardware and software environment. In this case, a computing resource with the \docker{} or \ocicontainer{}
runtime installed.
We could also create a \jupyternotebook{} that demonstrates how to use our \pythonprogram{}.
This is the third \executablething{} in our example,
one which provides an interactive environment for the user to experiment with.
As before, to be able to use this \executablething{}, we would need a computing resource with
the appropriate hardware and software environment.
In this case, a computer with the \jupyternotebook{} platform installed along with all the \python{} modules
needed by our \pythonprogram{}.
In the context of \scienceplatform{s} and \datascience{}, a common pattern is to provide this environment as a \webservice{}
that allows the user to interact with the \jupyternotebook{} via a \webbrowser.
From one algorithm that implements a science domain function, we have created three different \executablething{s}:
a \pythonprogram{}, a \dockercontainer{} packaging the \pythonprogram{}, and an interactive \jupyternotebook{}
that demonstrates how to use the \pythonprogram{}.
Each of these requires a different computing environment to execute:
a basic \python{} runtime, the \dockerruntime{}, and a \jupyternotebook{} service.
We may also want to consider the data that we are applying the algorithm to and the compute resources that
will be needed to process it.
If we are running some small experiments to learn how to use the algorithm, then a basic computing
resource will probably be sufficient.
However, if we have a \dataset{} of ten million numbers that we want to process, then we may
need to consider adding extra storage to handle the input data and the results.
For a large \dataset{} it may also be worth using a \gpu{} to accelerate the calculation.
The \ivoa{} \executionbroker{} \datamodel{} provides a way to describe what each of these \executablething{s}
are and what resources are needed to execute them.
This can include things like the number of \cpu{} cores and the amount of memory it needs,
whether it needs a \gpu{}, the location of the input data, the storage space needed to perform
the calculation, and the storage space needed to save the results.
\section{Service interaction}
\label{sect-service-interaction}
The interaction between a user, the client application they are using, and the services available in the \vofull{}
can be described as a conversation to discover where, how, and when, an \executablething{} that the user
has chosen can be executed.
\subsection{Discovery services}
\label{sub-discovery-services}
The conversation starts at the discovery stage, where the user uses discovery services to
select the software and \dataset{s} that they want to work with.
\includegraphics[width=0.9\textwidth]{diagrams/data-discovery.pdf}
The detailed specifications for the software and data discovery services are beyond the
scope of this document. However, we can outline some general requirements for them.
In both cases, the discovery process should not depend on the technical details
of the software or the \dataset{s}, but on their science domain functionality and properties.
From a science user's perspective they want to be able to find software that implements
a particular clustering algorithm, or a \dataset{} that is indexed according to a particular
coordinate system.
The programming language the software is written in and the file format of the \dataset{}
are at best secondary criteria.
In our square root example, we would expect our user to use search terms like \textit{'square root'}
or \textit{'newton raphson'} to find the software they need.
We wouldn't expect them to start out looking for \textit{'python'} or \textit{'docker'} as their key search terms.
Ideally, if the \executionbroker{} service functions as intended, a science user should not
need to know about programming languages, software packaging or file formats.
The \executionbroker{} service should hide as much as possible of the technical details,
enabling the science user to get on with science.
Another important consideration is that these discovery services should be designed to be domain agnostic.
This means that it should be possible to swap out an astronomy based discovery service
for an equivalent biochemistry discovery service and, although the domain specific
terms and vocabulary will be different, the technical details of the service interfaces
should be the same.
\subsubsection{Software discovery}
\label{subsub-software-discovery}
There are three main components involved in software discovery: the metadata schema for
describing the software, one or more search services, and the
repositories where the \executablething{s} are stored.
The vocabularies and schema need to be based on use cases that start by describing what the
\scientist{} wants to do, and from that derive what software tools they would need, and what terms
they would naturally use to describe them.
The \ivoa{} semantics and data modelling working groups have a lot of experience developing
vocabularies and data models to describe \science{} data products, and are well placed
to develop the vocabularies needed to describe astronomy software.
It is important to keep in mind that the requirement is not to model the technical properties
of the software itself, e.g. what programming language it is written in or who funded the development.
The important things to model are the search terms that a \scientist{} is most likely to use to try to
find the software they need.
The second component is a searchable database that accepts a list of search terms and responds with a
list of \metadoc{s} that describe \executablething{s} that match the criteria.
Before we look in detail at the content of the \metadoc{s} it is worth looking at where the \metadoc{s} are
stored in relation to the search service and the repositories where the \executablething{s} are stored.
In one scenario, all of the components can be co-located in the same service.
The database of search terms, the \metadoc{s}, and the binary files containing the \executablething{s}
can all be hosted by the same service implementation.
TODO diagram
An alternative implementation could store them at different locations, using
existing off-the-shelf software and services to host them.
There are a number of widely available content management systems, both commercial
and open source, that would be capable of implementing the database of search terms.
If the \metadoc{s} are stored in the same database, then the response from a
database search could contain the \metadoc{s} themselves.
TODO diagram
database, results, contain \metadoc{s} from database
Alternatively, the \metadoc{s} could be stored at a separate location,
in an online git repository for example,
and the database search response simply contains a list of URLs that
point to the individual \metadoc{s}.
TODO diagram
database, results, links to \metadoc{s} in external repositories
The third part of the set is the binary image of the \executablething{}.
In most cases it would probably make sense for the \metadoc{} to reference
the \executablething{} as a binary file stored in an external repository
rather than trying to include the \executablething{} as a binary blob in
the database.
TODO diagram
database, results, links to \metadoc{s} with links to images
The system can use standard cryptographic signatures and checksums to ensure the validity
of the \metadoc{s} and the binary images they refer to even when they are stored and accessed
via external \nth{3} party services.
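The checksum part of this validation could be sketched as follows, using the SHA-256
implementation from the \python{} standard library.
The \codeword{checksum} section of the \metadoc{} and its \codeword{algorithm} and
\codeword{value} fields are hypothetical names chosen for this example, they are not part
of the \executionbroker{} \datamodel{}.
Verifying cryptographic signatures is not covered by this sketch.
\begin{lstlisting}[]
import hashlib


def sha256_hex(content):
    """Return the SHA-256 digest of some bytes as a hexadecimal string."""
    return hashlib.sha256(content).hexdigest()


def image_matches_metadata(image, metadata):
    """Check a binary image against the checksum in its metadata document.

    The 'checksum' layout is a hypothetical example, not a defined schema.
    """
    checksum = metadata.get("checksum", {})
    if checksum.get("algorithm") != "sha256":
        return False  # unknown algorithm, treat the image as unverified
    return sha256_hex(image) == checksum.get("value")


image = b"example binary image"
metadata = {"checksum": {"algorithm": "sha256", "value": sha256_hex(image)}}

print(image_matches_metadata(image, metadata))                # True
print(image_matches_metadata(image + b"tampered", metadata))  # False
\end{lstlisting}
Because the digest is recomputed from the bytes that were actually downloaded, the check
does not depend on trusting the repository that served the binary image.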
In summary, there are two things that need to be standardised for a software discovery service:
\begin{itemize}
\item The inputs to the discovery service, including the metadata vocabularies
used to describe the software in terms that make sense to the \scientist{}
looking for them. For example what algorithm it implements, the type of input data it
operates on, and the type of results it generates.
\item The outputs of the discovery service, including the \metadoc{s} defined by this
specification, that describe the binary images that package the software
as \executablething{s}.
\end{itemize}
The other components in the software discovery stack, the database of search terms, and
the storage and access services for the \metadoc{s} and binary images, do not need to be
standardised at this stage.
\subsubsection{Data discovery}
\label{subsub-data-discovery}
TODO - update this with reference to \ivoa{} data product type and how this links to the
executable parameters.
\url{https://www.ivoa.net/rdf/product-type/2024-05-19/product-type.html}
\subsection{Execution Broker}
\label{sub-execution-broker}
Once the user has identified the software and data resources that they want to use,
the client application brings together details of the \executablething{} the user
has selected, the compute, storage, and data resources they want to use,
along with a schedule describing when the user wants it to execute,
to create a \metadoc{} description of the \execsession{} the user wants
to execute.
\begin{lstlisting}[]
[+# ExecutionBroker OfferSet request.+]
executable:
....
resources:
....
schedule:
....
\end{lstlisting}
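As a rough sketch, a client written in \python{} might assemble the same three-part request
and serialize it as \json{}.
Only the three top level names, \codeword{executable}, \codeword{resources}, and
\codeword{schedule}, come from the outline above.
The function name and everything nested inside the three sections are hypothetical
placeholders, the \openapi{} specification remains the definitive source for the schema.
\begin{lstlisting}[]
import json


def build_offerset_request(executable, resources, schedule):
    """Combine the three top level sections into one request document."""
    return {
        "executable": executable,
        "resources": resources,
        "schedule": schedule,
    }


# Every field inside the three sections is a hypothetical placeholder.
request = build_offerset_request(
    executable={"type": "example-python-program", "name": "newton-sqrt"},
    resources={"compute": {"cores": {"min": 2, "max": 4}}},
    schedule={"duration": {"min": "PT1H", "max": "PT2H"}},
)

print(json.dumps(request, indent=2))
\end{lstlisting}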
\subsubsection{OfferSet request}
\label{subsub-offerset-request}
The client sends the \metadoc{} description to one or more \execbrokerclass{}
services to ask if they can meet the requirements and execute the \execsession{}.
Each \execbrokerservice{} evaluates the request and responds with a top level
\codeword{YES|NO} answer, and if
the answer is \codeword{YES}, a list of one or more \execoffer{s} describing how
the requested \execsession{} could be executed on the platform(s) represented by
that \execbrokerservice{}.
%\begin{lstlisting}[]
%[+# ExecutionBroker OfferSet request/response.+]
%Request - Can this platform execute <session> ?
%Response - YES, list of <offer>[]
%\end{lstlisting}
\includegraphics[width=0.9\textwidth]{diagrams/request-offers.pdf}
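The client side of this exchange might be sketched as follows.
The \codeword{result} and \codeword{offers} field names, the broker URLs, and the offer
names are assumptions made for illustration, and the responses are represented as already
parsed \python{} dictionaries rather than live \http{} calls.
\begin{lstlisting}[]
def collect_offers(responses):
    """Gather the offers from every broker that answered YES.

    'responses' maps a broker URL to its parsed response. The 'result'
    and 'offers' field names are assumptions made for this sketch.
    """
    collected = []
    for broker, response in responses.items():
        if response.get("result") != "YES":
            continue  # this broker declined the request
        for offer in response.get("offers", []):
            collected.append({"broker": broker, "offer": offer})
    return collected


responses = {
    "https://broker-one.example.org": {
        "result": "YES",
        "offers": [{"name": "offer-a"}, {"name": "offer-b"}],
    },
    "https://broker-two.example.org": {"result": "NO"},
}

for item in collect_offers(responses):
    print(item["broker"], item["offer"]["name"])
\end{lstlisting}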
\subsubsection{Resource allocation}
\label{subsub-resource-allocation}
The \execbrokerservice{} is responsible for managing the resources
within the underlying \execplatform{} that it represents.
The \execbrokerservice{} is responsible for keeping track of
reservations for the resources that it grants to clients in
response to execution requests.
The details of how an \execbrokerservice{} implements this are
beyond the scope of this specification.
This specification defines a set of black-box behaviours
that an \execbrokerservice{} MUST implement.
\begin{itemize}
\item Each \execoffer{} MUST meet the minimum criteria specified in the request.
\item Each \execoffer{} MUST be within the capabilities of the underlying \execplatform{}.
\item An \execbrokerservice{} MAY offer more resources than the minimum value specified in the request.
\item An \execbrokerservice{} MAY offer more execution time than the minimum value specified in the request.
\item An \execbrokerservice{} SHOULD NOT offer more resources than the maximum value specified in the request.
\item An \execbrokerservice{} SHOULD NOT offer more execution time than the maximum value specified in the request.
\end{itemize}
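The minimum and maximum rules listed above could be checked by a client along the following lines.
This is a sketch under stated assumptions: the \codeword{check\_offer} function, the resource
names, and the flat min/max layout are hypothetical, and the sketch cannot check whether an
offer is within the capabilities of the underlying \execplatform{}.
\begin{lstlisting}[]
def check_offer(requested, offered):
    """Compare offered values with the requested min/max ranges.

    'requested' maps a resource name to a {"min": ..., "max": ...} pair,
    'offered' maps the same names to the value on offer. Either bound
    may be omitted. The layout is an assumption made for this sketch.
    """
    errors = []    # MUST rules that are broken
    warnings = []  # SHOULD NOT rules that are broken
    for name, limits in requested.items():
        value = offered.get(name)
        minimum = limits.get("min")
        maximum = limits.get("max")
        if minimum is not None and (value is None or value < minimum):
            errors.append(f"{name}: offered {value}, minimum is {minimum}")
        elif maximum is not None and value is not None and value > maximum:
            warnings.append(f"{name}: offered {value}, maximum is {maximum}")
    return {"errors": errors, "warnings": warnings}


requested = {
    "cores": {"min": 2, "max": 8},
    "minutes": {"min": 60, "max": 120},
}
print(check_offer(requested, {"cores": 4, "minutes": 60}))
print(check_offer(requested, {"cores": 1, "minutes": 180}))
\end{lstlisting}
Offering less than the minimum is reported as an error because the corresponding rule uses MUST,
while exceeding the maximum is only reported as a warning because the rule uses SHOULD NOT.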
This specification recognises a set of black-box behaviours that an \execbrokerservice{} SHOULD implement.
\begin{itemize}
\item An \execoffer{} made by an \execbrokerservice{} is assumed to be made in good faith, with a reasonable probability of success.
\item An \execbrokerservice{} is free to just say \codeword{NO} to a request, for whatever reason.
\end{itemize}
In addition, this specification recognises a number of situational and dispositional factors
that may influence how an \execbrokerservice{} allocates resources.
\begin{itemize}
\item Different implementations may represent a range of different types of \execplatform{s}.
\begin{itemize}
\item Science platforms.
\begin{itemize}
\item CANFAR.
\item Azimuth.
\item SciServer.
\end{itemize}
\item Cloud platforms.
\begin{itemize}
\item Kubernetes platforms.
\item OpenStack platforms.
\item Commercial cloud platforms.
\end{itemize}
\item Service platforms.
\begin{itemize}
\item Slurm services.
\end{itemize}
\end{itemize}
\item Different implementations may be connected to a range of different sizes of \execplatform{s}.
\begin{itemize}
\item Large scale HPC systems.
\item Large scale commercial platforms.
\item Medium scale on-premises platforms.
\item Small scale specialist platforms.
\end{itemize}
\item Different deployments may apply different algorithms to manage the allocation of resources.
\begin{itemize}
\item Maximise the amount of resources offered to users.
\item Maximise the number of jobs a \execplatform{} can execute.
\item Prioritise interactive sessions by offering precise start times.
\item Prioritise batch processing by offering wide start ranges.
\end{itemize}
\item Different deployments may apply different policies to manage the allocation of resources.
\begin{itemize}
\item Required security criteria.
\item Recognised identity providers.
\item Required group membership policies.
\item Local resource allocation policies.
\item Local white lists of allowed executables.
\end{itemize}
\end{itemize}
\subsubsection{Execution scheduling}
\label{subsub-execution-scheduling}
This is already covered in a different section.
\subsubsection{Offerset response}
\label{subsub-offerset-response}
Each \execbrokerservice{} will respond to a request for offers with an \execofferset{}
containing some metadata about the \execofferset{} itself, followed by a list of \execoffer{s}
describing how the requested \execsession{} could be executed.
Each \execoffer{} in the list contains some metadata about the \execoffer{} itself,
including its UUID identifier and expiry time, followed by details of how and when the
\execsession{} would be executed.
\begin{lstlisting}[]
[+# ExecutionBroker OfferSet response.+]
result: YES
....
offers:
  - uuid: "2e164a1b-7ff6-11ef-8412-4bc36fe2face"
    state: 'OFFERED'
    expires: "2023-09-18T07:05:21"
    ....
    executable:
      ....
    resources:
      ....
    schedule:
      ....
  - uuid: "2e16bf4c-7ff6-11ef-8412-4bc36fe2face"
    state: 'OFFERED'
    expires: "2023-09-18T07:05:21"
    ....
    executable:
      ....
    resources:
      ....
    schedule:
      ....
\end{lstlisting}
\subsubsection{Session lifecycle}
\label{subsub-session-lifecycle}
An \execbrokersession{} goes through the following lifecycle stages.
\begin{itemize}
\item \codeword{OFFERED} The \workerjob{} is being offered.
\item \codeword{ACCEPTED} The \workerjob{} has been accepted.
\item \codeword{REJECTED} The \workerjob{} offer has been rejected.
\item \codeword{EXPIRED} The \workerjob{} offer has expired.
\item \codeword{WAITING} The \workerjob{} is waiting to activate.
\item \codeword{PREPARING} The resources are being prepared.
\item \codeword{READY} The \workerjob{} is ready to execute.
\item \codeword{RUNNING} The \workerjob{} is executing.
\item \codeword{RELEASING} The resources are being released.
\item \codeword{COMPLETED} The \workerjob{} has completed.
\item \codeword{CANCELLED} The \workerjob{} has been cancelled.
\item \codeword{FAILED} The \workerjob{} has failed.
\end{itemize}
The first four \execbrokerstate{s} are interactive, where the \execbrokerstate{} changes
are primarily driven by user interactions.
\begin{itemize}
\item \codeword{OFFERED} The \workerjob{} is being offered.
\item \codeword{ACCEPTED} The \workerjob{} has been accepted.
\item \codeword{REJECTED} The \workerjob{} offer has been rejected.
\item \codeword{EXPIRED} The \workerjob{} offer has expired.
\end{itemize}
When the \execbrokerservice{} replies with an \execofferset{} containing
a list of \execoffer{s}, the user can do nothing, accept the
\execoffer{} from the list that best fits their requirements,
or reject the \execoffer{s}.
Each of the \execoffer{s} in the \execofferset{} represents a temporary reservation
for the resources listed in the \execoffer{}.
This means these resources will not be available to other users while the \execoffer{s}
are still valid.
If the user does nothing, then when the expiry time is reached the \execbrokerstate{} of each of the \execoffer{s} will
automatically be updated to \codeword{EXPIRED}, and their associated resources will
be released.
If the user accepts one of the \execoffer{s} in the \execofferset{} by updating
the \execbrokerstate{} to \codeword{ACCEPTED}, the \execbrokerservice{} SHOULD
update the \execbrokerstate{} of the other \execoffer{s} in the \execofferset{} to
\codeword{REJECTED} and release the associated resources.
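The accept behaviour described above can be sketched as a small Python function.
This is a minimal illustration, not a normative algorithm.
The function name and the offer identifiers are hypothetical, and the sketch treats the SHOULD behaviour of rejecting the remaining offers as if it were always applied.
\begin{lstlisting}[]
from typing import Dict


def accept_offer(states: Dict[str, str], accepted: str) -> Dict[str, str]:
    """Return the new states after the user accepts one offer in an offerset.

    states maps a hypothetical offer identifier to its current state.
    Only an offer that is still OFFERED can be accepted.
    """
    if states.get(accepted) != "OFFERED":
        raise ValueError("offer is not available: " + accepted)
    updated = {}
    for ident, state in states.items():
        if ident == accepted:
            updated[ident] = "ACCEPTED"
        elif state == "OFFERED":
            updated[ident] = "REJECTED"  # sibling offers release their resources
        else:
            updated[ident] = state       # EXPIRED or REJECTED offers are unchanged
    return updated


offerset = {"offer-1": "OFFERED", "offer-2": "OFFERED", "offer-3": "EXPIRED"}
print(accept_offer(offerset, "offer-1"))
\end{lstlisting}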
TODO [accept/reject/expire state-transition diagram ]
\subsubsection{Session execution}
\label{subsub-session-execution}
Once the user accepts one of the \execoffer{s}, the \execbrokerservice{} takes over and
drives the subsequent \execbrokerstate{} changes based on the timing schedule and
the results of the internal steps within the \execbrokersession{}.
\begin{itemize}
\item \codeword{WAITING} The \workerjob{} is waiting to activate.
\item \codeword{PREPARING} The resources are being prepared.
\item \codeword{READY} The \workerjob{} is ready to execute.
\item \codeword{RUNNING} The \workerjob{} is executing.
\item \codeword{RELEASING} The resources are being released.
\item \codeword{COMPLETED} The \workerjob{} has completed.
\item \codeword{CANCELLED} The \workerjob{} has been cancelled.
\item \codeword{FAILED} The \workerjob{} has failed.
\end{itemize}
It is up to the \execbrokerservice{} to select the right time to change the \workerjob{}
\codeword{phase} from \codeword{WAITING} to \codeword{PREPARING} and begin preparing the resources so that
the \workerjob{} is \codeword{READY} in time for the \codeword{starttime} declared
in the \execoffer{}.
For example, if it takes 2 hours to transfer the data resources
from archive storage to live storage co-located with the compute resources,
then the \execworkerclass{} needs to start the \codeword{PREPARING} phase at least 2 hours
before the \codeword{starttime} declared in the \execoffer{}.
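The timing calculation in this example can be written as a short Python sketch.
The function name and the example start time are assumptions made for illustration.
How a real implementation estimates the preparation time is outside the scope of this specification.
\begin{lstlisting}[]
from datetime import datetime, timedelta


def latest_prepare_start(starttime: datetime, prepare_duration: timedelta) -> datetime:
    """Latest time at which PREPARING can begin and still be READY by starttime."""
    return starttime - prepare_duration


start = datetime(2023, 9, 18, 9, 0, 0)
begin = latest_prepare_start(start, timedelta(hours=2))
print(begin.isoformat())  # 2023-09-18T07:00:00
\end{lstlisting}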
Once all the resources are ready, the \execworkerclass{} changes the \workerjob{}
\codeword{phase} to \codeword{READY} to indicate that all the resources
are ready and the \workerjob{} is waiting to start.
The \execworkerclass{} will then wait until the \codeword{starttime} declared in the \execoffer{}
at which point it will start executing the \workerjob{} and change the \workerjob{} \codeword{phase}
to \codeword{RUNNING}.
When the \workerjob{} finishes executing, because the \dockercontainer{} finished executing,
the user closed their \jupyternotebook{}, or the \codeword{maxduration} was reached,
the \execworkerclass{} will change the \workerjob{} \codeword{phase} to \codeword{RELEASING} and
begin the process of releasing the resources.
If the \workerjob{} includes some persistent storage that should last beyond the end of the \workerjob{},
then part of the \teardown{} process may involve transferring results from the \workerjob{}
onto the persistent storage before the local storage is released.
When the \teardown{} process completes, the \workerjob{} \codeword{phase} is changed to \codeword{COMPLETED}.
If an error occurs at any time in the process, the \workerjob{} \codeword{phase} is changed to \codeword{FAILED}.
This includes any errors that occur during the \teardown{} process, for example if
the \execworkerclass{} is unable to transfer the results onto persistent storage.
In that case the \workerjob{} \codeword{phase} is changed to \codeword{FAILED}, even if the main part of the
execution completed successfully.
This is because any subsequent workflow steps depend not only on the execution
completing, but also on the \teardown{} data transfers completing, so that the results from this step
are in the right place for the next step to access them.
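The sequence of execution states can be summarised as a simple state machine.
The following Python sketch uses the state names from the lifecycle list.
The function and variable names are hypothetical, and cancellation by the user is left out for brevity.
\begin{lstlisting}[]
# Normal forward path through the execution states described above.
SEQUENCE = ["WAITING", "PREPARING", "READY", "RUNNING", "RELEASING", "COMPLETED"]
TERMINAL = {"COMPLETED", "CANCELLED", "FAILED"}


def next_state(current: str, error: bool = False) -> str:
    """Return the state that follows current, or FAILED if an error occurred."""
    if current in TERMINAL:
        raise ValueError("no transition from terminal state " + current)
    if error:
        return "FAILED"  # an error at any step, including releasing, fails the job
    return SEQUENCE[SEQUENCE.index(current) + 1]


state = "WAITING"
while state not in TERMINAL:
    state = next_state(state)
print(state)  # COMPLETED
\end{lstlisting}
Note that an error while releasing the resources leads to \codeword{FAILED} rather than \codeword{COMPLETED}, matching the reasoning given above.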
\subsubsection{Update options}
\label{subsub-update-options}
Each \execsession{} has a unique URL that the client can use to monitor and update
its state.
The \execbrokerservice{} response for an \execsession{} includes a list of options
that the user may use to update or modify the \execsession{}.
Which options are available will depend on the current \codeword{state} of the
\execsession{} and the identity and permissions of the authenticated user.
If the \execsession{} is still being offered, then the list of available options
allows the user to accept or reject the offer by updating the \codeword{state} of
the \execsession{} to \codeword{ACCEPTED} or \codeword{REJECTED}.
\begin{lstlisting}[]
uuid: "2e164a1b-7ff6-11ef-8412-4bc36fe2face"
href: "http://..../sessions/2e164a1b-7ff6-11ef-8412-4bc36fe2face"
state: 'OFFERED'
expires: "2023-09-18T07:05:21"
....
....
options:
  - type: "uri:enum-value-option"
    path: "state"
    values:
      - "ACCEPTED"
      - "REJECTED"
\end{lstlisting}
Once the \execoffer{} has been accepted and the \execsession{} has started to
execute, then the list of available options will only allow the user to cancel
the execution.
\begin{lstlisting}[]
uuid: "2e164a1b-7ff6-11ef-8412-4bc36fe2face"
href: "http://..../sessions/2e164a1b-7ff6-11ef-8412-4bc36fe2face"
state: 'ACCEPTED'
expires: "2023-09-18T07:05:21"
....
....
options:
  - type: "uri:enum-value-option"
    path: "state"
    values:
      - "CANCELLED"
\end{lstlisting}
Using a dynamic list of options in this way enables the \execbrokerservice{}
to communicate to the client what actions the user is able to take
over the lifetime of an \execsession{}.
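A service might build the options list from the current state along the following lines.
This Python sketch is illustrative only.
The function name and the lookup table are hypothetical, the table entries beyond \codeword{OFFERED} and \codeword{ACCEPTED} are assumptions, and the sketch ignores the identity and permissions of the user.
\begin{lstlisting}[]
from typing import Dict, List

# Values the user may write to the "state" field, keyed by current state.
# The OFFERED and ACCEPTED rows follow the two examples above. Allowing
# CANCELLED in the other non-terminal states is an assumption of this sketch.
ALLOWED_UPDATES: Dict[str, List[str]] = {
    "OFFERED": ["ACCEPTED", "REJECTED"],
    "ACCEPTED": ["CANCELLED"],
    "WAITING": ["CANCELLED"],
    "PREPARING": ["CANCELLED"],
    "READY": ["CANCELLED"],
    "RUNNING": ["CANCELLED"],
}


def build_options(state: str) -> List[dict]:
    """Return the options list for a session in the given state."""
    values = ALLOWED_UPDATES.get(state, [])
    if not values:
        return []  # terminal states offer no further updates
    return [{"type": "uri:enum-value-option", "path": "state", "values": values}]


print(build_options("OFFERED"))
print(build_options("COMPLETED"))
\end{lstlisting}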
\section{The data model}
\label{sect-data-model}
As outlined in section \ref{subsub-openapi-specification}, this section of the document is
intended to complement the \openapi{} specification published alongside this document.
The \openapi{} specification defines the technical details of the
\executionbroker{} \datamodel{}.
This document uses use cases and examples to describe the reasoning
behind some of the design choices.
The two documents are intended to be read together.
The machine-readable \openapi{} specification defines the \textit{what},
and the human-readable text document describes the \textit{why}.
\subsection{The \executable{}}
\label{sub-executable}
At the simplest level the client just needs to check whether a platform is able to execute a particular
type of \excutabletask{}.
For example, \textit{``Is this platform able to run a \jupyternotebook{}?''}
In order to do this, the request needs to specify the task type, e.g. \jupyternotebook{},
along with details about it, e.g. where to fetch the notebook from.
The information in this part of the \datamodel{} will be different for each type of \executable{}.
Rather than try to model every possible type of \executable{} in one large \datamodel{},
the \datamodel{} for each type is described in an extension to the core \datamodel{}.
The \datamodel{} uses a common pattern for polymorphic
\footurl{https://swagger.io/docs/specification/v3_0/data-models/inheritance-and-polymorphism/\#polymorphism}
types based on a discriminator
\footurl{https://swagger.io/docs/specification/v3_0/data-models/inheritance-and-polymorphism/\#discriminator}
value to indicate the type of thing it is describing, followed by the specific
details for that type.
This is implemented in the \openapi{} specification as an abstract base class
containing common fields like a name and \uuid{} identifier, followed by a list
of derived types and their type identifiers.
\begin{lstlisting}[]
[+# OpenAPI schema+]
AbstractExecutable:
  type: object
  discriminator:
    propertyName: type
    mapping:
      "uri:java-program-1.0": 'JavaProgram'
      "uri:python-program-1.0": 'PythonProgram'
      "uri:docker-container-1.0": 'DockerContainer'
      "uri:singular-container-1.0": 'SingularContainer'
      "uri:jupyter-notebook-1.0": 'JupyterNotebook'
      ....
  properties:
    name:
      description: >
        A human readable name, assigned by the client.
      type: string
    uuid:
      description: >
        A machine readable UUID, assigned by the server.
      type: string
      format: uuid
    type:
      description: >
        The type discriminator.
      type: string
      format: uri
\end{lstlisting}
The derived types extend this abstract base class to include the details needed to
describe their particular type of \executablething{}.
This pattern of using a \codeword{type} discriminator to identify the type of thing, followed
by the type-specific details, is used in several places in the \executionbroker{} \datamodel{}.
This enables us to keep the core \datamodel{} relatively small, defining the common aspects
needed to describe an \executablething{} and the resources it needs while allowing the
\datamodel{} to be extended to describe a wide range of different types of things.
This pattern makes it easy for projects outside the core \ivoa{} community to add new
types of \executablething{s} and resources appropriate for their science domain.
Implementations do not need to understand all of the different possible types of \executable{}.
If a service does not recognise a particular type, it can simply reply \codeword{NO}.
\begin{lstlisting}[]
Request - Can this platform execute <unknown-type> ?
Response - NO
\end{lstlisting}
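A minimal Python sketch of this behaviour is shown below.
The function name and the set of supported types are hypothetical, and the sketch reduces the response to a plain \codeword{YES} or \codeword{NO}, whereas a real service would return an \execofferset{} and would also inspect the type-specific details.
\begin{lstlisting}[]
# Type discriminators that this hypothetical platform knows how to execute.
SUPPORTED_TYPES = {
    "uri:docker-container-1.0",
    "uri:jupyter-notebook-1.0",
}


def can_execute(executable: dict) -> str:
    """Answer YES or NO based only on the type discriminator."""
    if executable.get("type") in SUPPORTED_TYPES:
        return "YES"
    return "NO"  # unknown or missing types are simply declined


print(can_execute({"type": "uri:jupyter-notebook-1.0"}))
print(can_execute({"type": "uri:unknown-type-1.0"}))
\end{lstlisting}
Because unknown types fall through to \codeword{NO}, new types can be added to the \datamodel{} without breaking existing services.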
\subsubsection{\dockercontainer{}}
\label{subsub-dockercontainer}