|
| 1 | +.. _task_management: |
| 2 | + |
| 3 | +================================= |
| 4 | +Task Management |
| 5 | +================================= |
| 6 | +.. currentmodule:: qlib |
| 7 | + |
| 8 | + |
| 9 | +Introduction |
| 10 | +============= |
| 11 | + |
| 12 | +The `Workflow <../component/introduction.html>`_ part introduces how to run a research workflow in a loosely-coupled way, but ``qrun`` can only execute one ``task`` at a time. |
| 13 | +To automatically generate and execute different tasks, ``Task Management`` provides a whole process including `Task Generating`_, `Task Storing`_, `Task Training`_ and `Task Collecting`_. |
| 14 | +With this module, users can run their ``task`` automatically over different periods, with different losses, or even with different models. |
| 15 | + |
| 16 | +This whole process can be used in `Online Serving <../component/online.html>`_. |
| 17 | + |
| 18 | +An example of the entire process is shown `here <https://github.com/microsoft/qlib/tree/main/examples/model_rolling/task_manager_rolling.py>`_. |
| 19 | + |
| 20 | +Task Generating |
| 21 | +=============== |
| 22 | +A ``task`` consists of `Model`, `Dataset`, `Record`, or anything added by users. |
| 23 | +The specific task template can be viewed in |
| 24 | +`Task Section <../component/workflow.html#task-section>`_. |
| 25 | +Even though the task template is fixed, users can customize their own ``TaskGen`` to generate different ``task`` from the task template. |
| 26 | + |
| 27 | +Here is the base class of ``TaskGen``: |
| 28 | + |
| 29 | +.. autoclass:: qlib.workflow.task.gen.TaskGen |
| 30 | + :members: |
| 31 | + |
| 32 | +``Qlib`` provides a class `RollingGen <https://github.com/microsoft/qlib/tree/main/qlib/workflow/task/gen.py>`_ to generate a list of ``task`` whose datasets cover different date segments. |
| 33 | +This class allows users to verify the effect of data from different periods on the model in one experiment. More information is `here <../reference/api.html#TaskGen>`_. |
| 34 | + |
| 35 | +Task Storing |
| 36 | +=============== |
| 37 | +To achieve higher efficiency and make cluster operation possible, ``TaskManager`` stores all tasks in `MongoDB <https://www.mongodb.com/>`_. |
| 38 | +``TaskManager`` can fetch undone tasks automatically and manage the lifecycle of a set of tasks with error handling. |
| 39 | +Users **MUST** finish the configuration of `MongoDB <https://www.mongodb.com/>`_ when using this module. |
| 40 | + |
| 41 | +To use ``TaskManager``, users need to provide the MongoDB URL and database name, either in `initialization <../start/initialization.html#Parameters>`_ or with a statement like this. |
| 42 | + |
| 43 | +    .. code-block:: python |
| 44 | + |
| 45 | +        from qlib.config import C |
| 46 | +        C["mongo"] = { |
| 47 | +            "task_url": "mongodb://localhost:27017/",  # your MongoDB url |
| 48 | +            "task_db_name": "rolling_db",  # database name |
| 49 | +        } |
| 50 | + |
| 51 | +.. autoclass:: qlib.workflow.task.manage.TaskManager |
| 52 | + :members: |
| 53 | + |
| 54 | +More information about ``TaskManager`` can be found `here <../reference/api.html#TaskManager>`_. |
| 55 | + |
| 56 | +Task Training |
| 57 | +=============== |
| 58 | +After generating and storing the ``task`` list, it's time to run every ``task`` in the *WAITING* status. |
| 59 | +``Qlib`` provides a method called ``run_task`` to run the ``task`` in the task pool; users can also customize how tasks are executed. |
| 60 | +An easy way to get the ``task_func`` is to use ``qlib.model.trainer.task_train`` directly. |
| 61 | +It will run the whole workflow defined by ``task``, which includes *Model*, *Dataset*, and *Record*. |
| 62 | + |
| 63 | +.. autofunction:: qlib.workflow.task.manage.run_task |
| 64 | + |
| 65 | +Meanwhile, ``Qlib`` provides a module called ``Trainer``. |
| 66 | + |
| 67 | +.. autoclass:: qlib.model.trainer.Trainer |
| 68 | + :members: |
| 69 | + |
| 70 | +``Trainer`` will train a list of tasks and return a list of model recorders. |
| 71 | +``Qlib`` offers two kinds of ``Trainer``: ``TrainerR`` is the simplest, and ``TrainerRM`` is based on ``TaskManager`` to help manage the task lifecycle automatically. |
| 72 | +If you do not want to use ``TaskManager`` to manage tasks, using ``TrainerR`` to train a list of tasks generated by ``TaskGen`` is enough. |
| 73 | +`Here <../reference/api.html#Trainer>`_ are the details about different ``Trainer``. |
| 74 | + |
| 75 | +Task Collecting |
| 76 | +=============== |
| 77 | +To collect the results of ``task`` after training, ``Qlib`` provides `Collector <../reference/api.html#Collector>`_, `Group <../reference/api.html#Group>`_ and `Ensemble <../reference/api.html#Ensemble>`_ to collect the results in a readable, expandable and loosely-coupled way. |
| 78 | + |
| 79 | +`Collector <../reference/api.html#Collector>`_ can collect objects from anywhere and process them, for example by merging, grouping, or averaging. It works in two steps: ``collect`` (collect anything into a dict) and ``process_collect`` (process the collected dict). |
| 80 | + |
| 81 | +`Group <../reference/api.html#Group>`_ also has two steps: ``group`` (group a set of objects based on `group_func` and turn them into a dict) and ``reduce`` (turn a dict into an ensemble based on some rule). |
| 82 | +For example: {(A,B,C1): object, (A,B,C2): object} ---``group``---> {(A,B): {C1: object, C2: object}} ---``reduce``---> {(A,B): object} |
| 83 | + |
| 84 | +`Ensemble <../reference/api.html#Ensemble>`_ can merge the objects in an ensemble. |
| 85 | +For example: {C1: object, C2: object} ---``Ensemble``---> object |
| 86 | + |
| 87 | +So the hierarchy is as follows: the second step of ``Collector`` corresponds to ``Group``, and the second step of ``Group`` corresponds to ``Ensemble``. |
| 88 | + |
| 89 | +For more information, please see `Collector <../reference/api.html#Collector>`_, `Group <../reference/api.html#Group>`_ and `Ensemble <../reference/api.html#Ensemble>`_, or the `example <https://github.com/microsoft/qlib/tree/main/examples/model_rolling/task_manager_rolling.py>`_. |
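
The ``group`` and ``reduce`` steps described in the Task Collecting section can be illustrated with a small plain-Python sketch. It does not use Qlib's actual ``Collector``, ``Group`` or ``Ensemble`` classes: the helpers ``group``, ``reduce_groups`` and ``average_ensemble`` are hypothetical names introduced only for illustration, and plain floats stand in for the collected objects. Averaging is just one possible ensemble rule, chosen here as an assumption.

```python
from collections import defaultdict
from statistics import mean


def group(collected, group_func):
    """Nest values under the outer key that group_func derives from each key.

    group_func maps a collected key to an (outer_key, inner_key) pair.
    """
    grouped = defaultdict(dict)
    for key, value in collected.items():
        outer_key, inner_key = group_func(key)
        grouped[outer_key][inner_key] = value
    return dict(grouped)


def average_ensemble(members):
    """Merge {C1: object, C2: object} into one object (assumed rule: the mean)."""
    return mean(list(members.values()))


def reduce_groups(grouped, ensemble_func):
    """Apply the ensemble rule to every inner dict, leaving one object per outer key."""
    return {outer_key: ensemble_func(members) for outer_key, members in grouped.items()}


# {(A,B,C1): object, (A,B,C2): object}
collected = {("A", "B", "C1"): 0.25, ("A", "B", "C2"): 0.75}

# group: {(A,B): {C1: object, C2: object}}
grouped = group(collected, lambda key: (key[:-1], key[-1]))

# reduce: {(A,B): object}
reduced = reduce_groups(grouped, average_ensemble)

print(grouped)
print(reduced)
```

Here the last element of each key is treated as the member to merge over, so the two ``(A, B, ...)`` entries collapse into a single ``(A, B)`` entry. A different ``group_func`` or ensemble rule would change which entries are merged and how.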