diff --git a/README.RU.md b/README.RU.md index 2dcaeff..f77aa26 100644 --- a/README.RU.md +++ b/README.RU.md @@ -3,7 +3,7 @@ #### Plot Manager для засева Chia: https://www.chia.net/ [English](README.md) / [Русский](README.RU.md) -![The view of the manager](https://i.imgur.com/SmMDD0Q.png "View") +![The view of the manager](https://i.imgur.com/hIhjXt0.png "View") ##### Development Version: v0.0.1 @@ -63,7 +63,51 @@ * Пожалуйста, перешлите этот вопрос в Keybase или на вкладку Discussion. -## Установка +## All Commands [Нужен перевод] + +##### Example Usage of Commands +```text +> python3 manager.py start + +> python3 manager.py restart + +> python3 manager.py stop + +> python3 manager.py view + +> python3 manager.py status + +> python3 manager.py analyze_logs +``` + +### start + +This command will start the manager in the background. Once you start it, it will always be running unless all jobs have had their `max_plots` completed or there is an error. Errors will be logged in a file created `debug.log` + +### stop + +This command will terminate the manager in the background. It does not stop running plots, it will only stop new plots from getting created. + +### restart + +This command will run start and stop sequentially. + +### view + +This command will show the view that you can use to keep track of your running plots. This will get updated every X seconds defined by your `config.yaml`. + +### status + +This command will a single snapshot of the view. It will not loop. + +### analyze_logs + +This command will analyze all completed plot logs in your log folder and calculate the proper weights and line ends for your computer's configuration. Just populate the returned values under the `progress` section in your `config.yaml`. This only impacts the progress bar. + + +## Установка [Нужен перевод] + +#### NOTE: If `python` does not work, please try `python3`. Установка этой библиотеки проста. Ниже я приложил подробные инструкции, которые помогут вам начать работу. @@ -77,8 +121,10 @@ 2. Активация виртуальной среды. Это обязательно делать *каждый раз* открывая новое окно. * Пример Windows: `venv\Scripts\activate` * Пример Linux: `. ./venv/bin/activate` или `source ./venv/bin/activate` + * Пример Mac OS: `/Applications/Chia.app/Contents/Resources/app.asar.unpacked/daemon/chia` 3. Убедитесь что появился префикс `(venv)` в подтверждение активации среды. Префикс будет меняться в зависимости от того, как вы её назвали. 5. Установите необходимые модули: `pip install -r requirements.txt` + * If you plan on using Notifications or Prometheus then run the following to install the required modules: `pip install -r requirements-notification.txt` 6. Скопируйте `config.yaml.default` и переименуйте в `config.yaml` в той же директории. 7. Отредактируйте и настройте config.yaml на ваши персональные установки. Ниже приведена дополнительная информация по этому вопросу. * Вам также нужно будет добавить параметр `chia_location`! Который должен указывать на ваш исполняемый файл chia. @@ -128,15 +174,31 @@ Plot manager работает на основе идеи заданий. Каж Различные настройки уведомлений при запуске Plot Manager'а и когда новое поле готово. +### instrumentation [Нужен перевод] + +Settings for enabling Prometheus to gather metrics. + +* `prometheus_enabled` - If enabled, metrics will be gathered and an HTTP server will start up to expose the metrics for Prometheus. +* `prometheus_port` - HTTP server port. 
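A minimal sketch of what this looks like in `config.yaml`, assuming the defaults shipped in `config.yaml.default` (port 9090) and that you only want to switch metrics on:

```yaml
instrumentation:
  # Turn on the Prometheus metrics endpoint; 9090 is the default port from config.yaml.default.
  prometheus_enabled: true
  prometheus_port: 9090
```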
+ +List of Metrics Gathered + +- **chia_running_plots**: A [Gauge](https://prometheus.io/docs/concepts/metric_types/#gauge) to see how many plots are currently being created. +- **chia_completed_plots**: A [Counter](https://prometheus.io/docs/concepts/metric_types/#counter) for completed plots. + ### progress * `phase_line_end` - параметр, который будет использоваться для определения того, когда заканчивается фаза. Предполагается, что этот параметр указывает на порядковый номер строки, на которой завершится фаза. Параметр используется механизмом вычисления прогресса вместе с существующим файлом журнала для вычисления процента прогресса. * `phase_weight` - вес, который следует присвоить каждой фазе в расчетах хода выполнения. Как правило, фазы 1 и 3 являются самыми длинными фазами, поэтому они будут иметь больший вес, чем другие. -### global +### global [Нужен перевод] * `max_concurrent` - Максимальное количество полей, которые может засеять ваша система. Менеджер не будет паралелльно запускать больше, чем это количество участков на протяжении всего времени. +* `max_for_phase_1` - The maximum number of plots that your system can run in phase 1. +* `minimum_minutes_between_jobs` - The minimum number of minutes before starting a new plotting job, this prevents multiple jobs from starting at the exact same time. This will alleviate congestion on destination drive. Set to 0 to disable. + +### job [Нужен перевод] -### job +Each job must have unique temporary directories. Настройки, которые будут использоваться каждым заданием. Обратите внимание, что у вас может быть несколько заданий, и каждое задание должно быть в формате YAML, чтобы оно было правильно интерпретировано. Почти все значения здесь будут переданы в исполняемый файл Chia. @@ -146,7 +208,7 @@ Plot manager работает на основе идеи заданий. Каж * `max_plots` - Максимальное количество заданий, выполняемых за один запуск менеджера. При любом перезапуске диспетчера эта переменная будет сброшена. Он здесь только для того, чтобы помочь в краткосрочном планировании засева. * [ОПЦИЯ]`farmer_public_key` - Ваш публичный ключ фермера. Если не указан, менеджер не будет передавать эту переменную исполняемому файлу chia, что приведет к использованию ваших ключей по умолчанию. Этот параметр необходим только в том случае, если на компьютере нет ваших учетных данных chia. * [ОПЦИЯ]`pool_public_key` - Ваш публичный ключ пула. Аналогично как и выше. -* `temporary_directory` - Временное место для засева. Можно указать только одну папку. Обычно размещается на SSD диске. +* `temporary_directory` - Временное место для засева. Может иметь одно или несколько значений. Обычно размещается на SSD диске. These directories must be unique from one another. * [ОПЦИЯ]`temporary2_directory` - Может иметь одно или несколько значений. Это необязательный параметр для использования второго временного каталога засева полей Chia. * `destination_directory` - Может иметь одно или несколько значений. Указывает на финальную директорию куда будет помещено готовое поле. Если вы укажете несколько, готовые поля будут размещаться по одному на каждый следующий диск поочереди. * `size` - соответствует размеру поля (сложности k). Здесь вам следует указывать например 32, 33, 34, 35 и т.д. @@ -156,11 +218,24 @@ Plot manager работает на основе идеи заданий. Каж * `memory_buffer` - Объем памяти, который вы хотите выделить задаче. * `max_concurrent` - Максимальное количество участков для этой задачи на всё время. 
* `max_concurrent_with_start_early` - Максимальное количество участков для этой задачи в любой момент времени, включая фазы, которые начались раньше. +* `initial_delay_minutes` - This is the initial delay that is used when initiate the first job. It is only ever considered once. If you restart manager, it will still adhere to this value. * `stagger_minutes` - Количество минут ожидания перед запуском следующего задания. Вы можете установить это значение равным нулю, если хотите, чтобы ваши засевы запускались немедленно, когда это позволяют одновременные ограничения * `max_for_phase_1` - Максимальное число засевов в фазе 1 для этой задачи. * `concurrency_start_early_phase` - Фаза, в которой вы хотите начать засеивание заранее. Рекомендуется использовать 4. * `concurrency_start_early_phase_delay` - Максимальное количество минут ожидания до запуска нового участка при обнаружении ранней фазы запуска. * `temporary2_destination_sync` - Представлять каталог назначения как каталог второй временный каталог. Эти два каталога будут синхронизированы, так что они всегда будут представлены как одно и то же значение. +* `exclude_final_directory` - Whether to skip adding `destination_directory` to harvester for farming. This is a Chia feature. +* `skip_full_destinations` - When this is enabled it will calculate the sizes of all running plots and the future plot to determine if there is enough space left on the drive to start a job. If there is not, it will skip the destination and move onto the next one. Once all are full, it will disable the job. +* `unix_process_priority` - UNIX Only. This is the priority that plots will be given when they are spawned. UNIX values must be between -20 and 19. The higher the value, the lower the priority of the process. +* `windows_process_priority` - Windows Only. This is the priority that plots will be given when they are spawned. Windows values vary and should be set to one of the following values: + * 16384 `BELOW_NORMAL_PRIORITY_CLASS` + * 32 `NORMAL_PRIORITY_CLASS` + * 32768 `ABOVE_NORMAL_PRIORITY_CLASS` + * 128 `HIGH_PRIORITY_CLASS` + * 256 `REALTIME_PRIORITY_CLASS` +* `enable_cpu_affinity` - Enable or disable cpu affinity for plot processes. Systems that plot and harvest may see improved harvester or node performance when excluding one or two threads for plotting process. +* `cpu_affinity` - List of cpu (or threads) to allocate for plot processes. The default example assumes you have a hyper-threaded 4 core CPU (8 logical cores). This config will restrict plot processes to use logical cores 0-5, leaving logical cores 6 and 7 for other processes (6 restricted, 2 free). + ### Перевод на Русский Оригинальный текст на Английском языке Вы можете найти по адресу [https://github.com/swar/Swar-Chia-Plot-Manager](https://github.com/swar/Swar-Chia-Plot-Manager) diff --git a/README.md b/README.md index 2022347..88070d4 100644 --- a/README.md +++ b/README.md @@ -3,9 +3,9 @@ #### A plot manager for Chia plotting: https://www.chia.net/ [English](README.md) / [Русский](README.RU.md) -![The view of the manager](https://i.imgur.com/SmMDD0Q.png "View") +![The view of the manager](https://i.imgur.com/hIhjXt0.png "View") -##### Development Version: v0.0.1 +##### Development Version: v0.1.0 This is a cross-platform Chia Plot Manager that will work on the major operating systems. This is not a plotter. The purpose of this library is to manage your plotting and kick off new plots with the settings that you configure. 
Everyone's system is unique so customization is an important feature that was engraved into this library. @@ -62,10 +62,54 @@ Please do not use GitHub issues for questions or support regarding your own pers * Please forward this question to Keybase or the Discussion tab. +## All Commands + +##### Example Usage of Commands +```text +> python3 manager.py start + +> python3 manager.py restart + +> python3 manager.py stop + +> python3 manager.py view + +> python3 manager.py status + +> python3 manager.py analyze_logs +``` + +### start + +This command will start the manager in the background. Once you start it, it will always be running unless all jobs have had their `max_plots` completed or there is an error. Errors will be logged in a file named `debug.log`. + +### stop + +This command will terminate the manager in the background. It does not stop running plots; it will only stop new plots from getting created. + +### restart + +This command will run stop and then start sequentially. + +### view + +This command will show the view that you can use to keep track of your running plots. This will get updated every X seconds, as defined in your `config.yaml`. + +### status + +This command will show a single snapshot of the view. It will not loop. + +### analyze_logs + +This command will analyze all completed plot logs in your log folder and calculate the proper weights and line ends for your computer's configuration. Just populate the returned values under the `progress` section in your `config.yaml`. This only impacts the progress bar. + + ## Installation The installation of this library is straightforward. I have attached detailed instructions below that should help you get started. +#### NOTE: If `python` does not work, please try `python3`. + 1. Download and Install Python 3.7 or higher: https://www.python.org/ 2. `git clone` this repo or download it. 3. Open CommandPrompt / PowerShell / Terminal and `cd` into the main library folder. @@ -76,8 +120,10 @@ The installation of this library is straightforward. I have attached detailed in 2. Activate the virtual environment. This must be done *every single time* you open a new window. * Example Windows: `venv\Scripts\activate` * Example Linux: `. ./venv/bin/activate` or `source ./venv/bin/activate` + * Example Mac OS: `source ./venv/bin/activate` (note: `/Applications/Chia.app/Contents/Resources/app.asar.unpacked/daemon/chia` is the chia executable path used for `chia_location`, not a virtual environment activation command) 3. Confirm that it has activated by seeing the `(venv)` prefix. The prefix will change depending on what you named it. 5. Install the required modules: `pip install -r requirements.txt` + * If you plan on using Notifications or Prometheus, run the following to install the required modules: `pip install -r requirements-notification.txt` 6. Copy `config.yaml.default` and name it as `config.yaml` in the same directory. 7. Edit and set up the config.yaml to your own personal settings. There is more help on this below. * You will need to add the `chia_location` as well! This should point to your chia executable. @@ -128,6 +174,18 @@ These are the settings that will be used by the view. These are different settings in order to send notifications when the plot manager starts and when a plot has been completed. +### instrumentation + +Settings for enabling Prometheus to gather metrics. + +* `prometheus_enabled` - If enabled, metrics will be gathered and an HTTP server will start up to expose the metrics for Prometheus. +* `prometheus_port` - HTTP server port. 
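As a rough illustration of the scrape side, a Prometheus server could then be pointed at this port; the job name, target host, and interval below are placeholders, and the port must match your `prometheus_port` setting:

```yaml
# prometheus.yml on the Prometheus server (example values only)
scrape_configs:
  - job_name: swar_plot_manager          # arbitrary label
    scrape_interval: 30s
    static_configs:
      - targets: ['plotting-host:9090']  # host running manager.py, port = prometheus_port
```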
+ +List of Metrics Gathered + +- **chia_running_plots**: A [Gauge](https://prometheus.io/docs/concepts/metric_types/#gauge) to see how many plots are currently being created. +- **chia_completed_plots**: A [Counter](https://prometheus.io/docs/concepts/metric_types/#counter) for completed plots. + ### progress * `phase_line_end` - These are the settings that will be used to dictate when a phase ends in the progress bar. It is supposed to reflect the line at which the phase will end so the progress calculations can use that information with the existing log file to calculate a progress percent. @@ -135,9 +193,13 @@ These are different settings in order to send notifications when the plot manage ### global * `max_concurrent` - The maximum number of plots that your system can run. The manager will not kick off more than this number of plots total over time. +* `max_for_phase_1` - The maximum number of plots that your system can run in phase 1. +* `minimum_minutes_between_jobs` - The minimum number of minutes before starting a new plotting job, this prevents multiple jobs from starting at the exact same time. This will alleviate congestion on destination drive. Set to 0 to disable. ### job +Each job must have unique temporary directories. + These are the settings that will be used by each job. Please note you can have multiple jobs and each job should be in YAML format in order for it to be interpreted correctly. Almost all the values here will be passed into the Chia executable file. Check for more details on the Chia CLI here: https://github.com/Chia-Network/chia-blockchain/wiki/CLI-Commands-Reference @@ -146,7 +208,7 @@ Check for more details on the Chia CLI here: https://github.com/Chia-Network/chi * `max_plots` - This is the maximum number of jobs to make in one run of the manager. Any restarts to manager will reset this variable. It is only here to help with short term plotting. * [OPTIONAL]`farmer_public_key` - Your farmer public key. If none is provided, it will not pass in this variable to the chia executable which results in your default keys being used. This is only needed if you have chia set up on a machine that does not have your credentials. * [OPTIONAL]`pool_public_key` - Your pool public key. Same information as the above. -* `temporary_directory` - Only a single directory should be passed into here. This is where the plotting will take place. +* `temporary_directory` - Can be a single value or a list of values. This is where the plotting will take place. If you provide a list, it will cycle through each drive one by one. These directories must be unique from one another. * [OPTIONAL]`temporary2_directory` - Can be a single value or a list of values. This is an optional parameter to use in case you want to use the temporary2 directory functionality of Chia plotting. * `destination_directory` - Can be a single value or a list of values. This is the final directory where the plot will be transferred once it is completed. If you provide a list, it will cycle through each drive one by one. * `size` - This refers to the k size of the plot. You would type in something like 32, 33, 34, 35... in here. @@ -156,8 +218,20 @@ Check for more details on the Chia CLI here: https://github.com/Chia-Network/chi * `memory_buffer` - The amount of memory you want to allocate to the process. * `max_concurrent` - The maximum number of plots to have for this job at any given time. 
* `max_concurrent_with_start_early` - The maximum number of plots to have for this job at any given time including phases that started early. +* `initial_delay_minutes` - This is the initial delay that is used when initiate the first job. It is only ever considered once. If you restart manager, it will still adhere to this value. * `stagger_minutes` - The amount of minutes to wait before the next plot for this job can get kicked off. You can even set this to zero if you want your plots to get kicked off immediately when the concurrent limits allow for it. * `max_for_phase_1` - The maximum number of plots on phase 1 for this job. * `concurrency_start_early_phase` - The phase in which you want to start a plot early. It is recommended to use 4 for this field. * `concurrency_start_early_phase_delay` - The maximum number of minutes to wait before a new plot gets kicked off when the start early phase has been detected. * `temporary2_destination_sync` - This field will always submit the destination directory as the temporary2 directory. These two directories will be in sync so that they will always be submitted as the same value. +* `exclude_final_directory` - Whether to skip adding `destination_directory` to harvester for farming. This is a Chia feature. +* `skip_full_destinations` - When this is enabled it will calculate the sizes of all running plots and the future plot to determine if there is enough space left on the drive to start a job. If there is not, it will skip the destination and move onto the next one. Once all are full, it will disable the job. +* `unix_process_priority` - UNIX Only. This is the priority that plots will be given when they are spawned. UNIX values must be between -20 and 19. The higher the value, the lower the priority of the process. +* `windows_process_priority` - Windows Only. This is the priority that plots will be given when they are spawned. Windows values vary and should be set to one of the following values: + * 16384 `BELOW_NORMAL_PRIORITY_CLASS` + * 32 `NORMAL_PRIORITY_CLASS` + * 32768 `ABOVE_NORMAL_PRIORITY_CLASS` + * 128 `HIGH_PRIORITY_CLASS` + * 256 `REALTIME_PRIORITY_CLASS` +* `enable_cpu_affinity` - Enable or disable cpu affinity for plot processes. Systems that plot and harvest may see improved harvester or node performance when excluding one or two threads for plotting process. +* `cpu_affinity` - List of cpu (or threads) to allocate for plot processes. The default example assumes you have a hyper-threaded 4 core CPU (8 logical cores). This config will restrict plot processes to use logical cores 0-5, leaving logical cores 6 and 7 for other processes (6 restricted, 2 free). diff --git a/VERSION b/VERSION index 95e94cd..f81cb99 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -v0.0.1 \ No newline at end of file +v0.1.0b \ No newline at end of file diff --git a/VERSIONLOG.md b/VERSIONLOG.md index d19bb69..d9d6aa2 100644 --- a/VERSIONLOG.md +++ b/VERSIONLOG.md @@ -1,32 +1,40 @@ # Version Log The version log history will be kept in this file. -### v.0.1.0b -**THIS BRANCH IS CURRENTLY IN DEVELOPMENT!** The first feature packed improvement of the plot manager. +### v.0.1.0 +The first feature packed improvement of the plot manager. #### New Features - Adding `exclude_final_directory` as an option in the `config.yaml`. ([#35](https://github.com/swar/Swar-Chia-Plot-Manager/pull/35)) - Skipping `manager.log` as a file and renaming to `debug.log`. 
([#38](https://github.com/swar/Swar-Chia-Plot-Manager/pull/38)) -- Added destination directory skipping when a drive is full. It will calculate size of total running plots and the predicted size of the new plot prior to making that judgement. ([#36](https://github.com/swar/Swar-Chia-Plot-Manager/pull/36), [#193](https://github.com/swar/Swar-Chia-Plot-Manager/pull/193)) +- Added destination directory skipping when a drive is full using `skip_full_destinations` at the job level. It will calculate size of total running plots and the predicted size of the new plot prior to making that judgement. ([#36](https://github.com/swar/Swar-Chia-Plot-Manager/pull/36), [#193](https://github.com/swar/Swar-Chia-Plot-Manager/pull/193)) +- Added global setting for `max_for_phase_1`. +- Added global setting for `minimum_minutes_between_jobs`. ([#380](https://github.com/swar/Swar-Chia-Plot-Manager/pull/380), [#468](https://github.com/swar/Swar-Chia-Plot-Manager/pull/468)) - Added list support for temporary directories. This will cycle through all temporary directories in the order that they are listed for a job. ([#150](https://github.com/swar/Swar-Chia-Plot-Manager/pull/150), [#153](https://github.com/swar/Swar-Chia-Plot-Manager/pull/153/files), [#182](https://github.com/swar/Swar-Chia-Plot-Manager/pull/182)) - Added CPU affinity support on the job level. This allows you to select and dedicate specific threads to your jobs. ([#134](https://github.com/swar/Swar-Chia-Plot-Manager/pull/134), [#281](https://github.com/swar/Swar-Chia-Plot-Manager/pull/281)) - Added process priority levels on the job level. This allows you to set the priority levels to whatever you choose. Some people want low priority, while others want higher priorities. ([#282](https://github.com/swar/Swar-Chia-Plot-Manager/pull/282)) - Added an option to delay a job by a set number of minutes. If you started manager and there is a stagger for the job, it will use the initial delay only if it is longer than the stagger. ([#283](https://github.com/swar/Swar-Chia-Plot-Manager/pull/283)) - Added an option in `manager.py` to spit out a single instance of the view using the `status` argument as well as `json` format of the jobs. ([#300](https://github.com/swar/Swar-Chia-Plot-Manager/pull/300), [#374](https://github.com/swar/Swar-Chia-Plot-Manager/pull/374)) - Added support for Telegram notifications. ([#316](https://github.com/swar/Swar-Chia-Plot-Manager/pull/316), [#364](https://github.com/swar/Swar-Chia-Plot-Manager/pull/364)) +- Added support for IFTTT webhooks. ([#425](https://github.com/swar/Swar-Chia-Plot-Manager/pull/425), [#471](https://github.com/swar/Swar-Chia-Plot-Manager/pull/471)) - Added support for instrumentation using Prometheus ([#87](https://github.com/swar/Swar-Chia-Plot-Manager/pull/87), [#196](https://github.com/swar/Swar-Chia-Plot-Manager/pull/196)) #### Changes - Switching notification imports to a separate requirements file and turning them into lazy imports. ([#159](https://github.com/swar/Swar-Chia-Plot-Manager/pull/159), [196](https://github.com/swar/Swar-Chia-Plot-Manager/pull/196)) -- Reworked the Drives Table in the view to include associated jobs. This includes minor tweaks to the display to remove ambiguity such as renaming plots to "#". ([#191](https://github.com/swar/Swar-Chia-Plot-Manager/pull/191), [#368](https://github.com/swar/Swar-Chia-Plot-Manager/pull/368)) +- Reworked the Drives Table in the view to include associated jobs. 
This includes minor tweaks to the display to remove ambiguity such as renaming plots to "#". ([#191](https://github.com/swar/Swar-Chia-Plot-Manager/pull/191), [#368](https://github.com/swar/Swar-Chia-Plot-Manager/pull/368), [#406](https://github.com/swar/Swar-Chia-Plot-Manager/pull/406)) +- Adding basic checks that will break and have more detailed error messaging to assist in end-user interaction. Also, I was tired of getting the same repeat questions over and over again. +- Adding more psutil error handling. +- Jobs are now unique based on temporary directories. #### Bug Fixes - Fixed a bug where `max_plots` was not working properly. It was counting running plots when you restarted manager. Now it will only count new plots kicked off. -- Fixed a bug in elpased_time column where elapsed days greater than 24 hours were resulting in calculations being off by a day. ([#190](https://github.com/swar/Swar-Chia-Plot-Manager/pull/190)) +- Fixed a bug in elapsed_time column where elapsed days greater than 24 hours were resulting in calculations being off by a day. ([#190](https://github.com/swar/Swar-Chia-Plot-Manager/pull/190)) - Skipping processes that result in an `AccessDenied` error when finding manager processes. ([#147](https://github.com/swar/Swar-Chia-Plot-Manager/pull/147)) - Fixed a bug where psutil going stale on Linux users was not allowing the script to restart on its own. ([#197](https://github.com/swar/Swar-Chia-Plot-Manager/pull/197)) - Fixed a bug where NFS drives weren't being identified. ([#284](https://github.com/swar/Swar-Chia-Plot-Manager/pull/284)) - Removed the hardcoded next log check date in the view. +- Fixed a bug where NoSuchProcess error pops up when viewing opened files. + ### v0.0.1 This is the initial public release of the plot manager with additional bug fixes to account for edge cases on various operating systems. diff --git a/config.yaml.default b/config.yaml.default index 4e3d038..3f75870 100644 --- a/config.yaml.default +++ b/config.yaml.default @@ -3,6 +3,7 @@ # WINDOWS EXAMPLE: C:\Users\Swar\AppData\Local\chia-blockchain\app-1.1.5\resources\app.asar.unpacked\daemon\chia.exe # LINUX EXAMPLE: /usr/lib/chia-blockchain/resources/app.asar.unpacked/daemon/chia # LINUX2 EXAMPLE: /home/swar/chia-blockchain/venv/bin/chia +# MAC OS EXAMPLE: /Applications/Chia.app/Contents/Resources/app.asar.unpacked/daemon/chia chia_location: @@ -48,6 +49,10 @@ notifications: notify_discord: false discord_webhook_url: https://discord.com/api/webhooks/0000000000000000/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX + # IFTTT, ref https://ifttt.com/maker_webhooks, and this function will send title as value1 and message as value2. + notify_ifttt: false + ifttt_webhook_url: https://maker.ifttt.com/trigger/{event}/with/key/{api_key} + # PLAY AUDIO SOUND notify_sound: false song: audio.mp3 @@ -57,6 +62,10 @@ notifications: pushover_user_key: xx pushover_api_key: xx + # TELEGRAM + notify_telegram: false + telegram_token: xxxxx + # TWILIO notify_twilio: false twilio_account_sid: xxxxx @@ -65,6 +74,12 @@ notifications: twilio_to_phone: +1234657890 +instrumentation: + # This setting is here in case you wanted to enable instrumentation using Prometheus. + prometheus_enabled: false + prometheus_port: 9090 + + progress: # phase_line_end: These are the settings that will be used to dictate when a phase ends in the progress bar. 
It is # supposed to reflect the line at which the phase will end so the progress calculations can use that @@ -86,7 +101,13 @@ global: # # max_concurrent: The maximum number of plots that your system can run. The manager will not kick off more than this # number of plots total over time. + # max_for_phase_1: The maximum number of plots that your system can run in phase 1. + # minimum_minutes_between_jobs: The minimum number of minutes before starting a new plotting job, this prevents + # multiple jobs from starting at the exact same time. This will alleviate congestion + # on destination drive. Set to 0 to disable. max_concurrent: 10 + max_for_phase_1: 3 + minimum_minutes_between_jobs: 5 jobs: @@ -105,7 +126,8 @@ jobs: # you have chia set up on a machine that does not have your credentials. # [OPTIONAL] pool_public_key: Your pool public key. Same information as the above. # - # temporary_directory: Only a single directory should be passed into here. This is where the plotting will take place. + # temporary_directory: Can be a single value or a list of values. This is where the plotting will take place. If you + # provide a list, it will cycle through each drive one by one. # [OPTIONAL] temporary2_directory: Can be a single value or a list of values. This is an optional parameter to use in # case you want to use the temporary2 directory functionality of Chia plotting. # destination_directory: Can be a single value or a list of values. This is the final directory where the plot will be @@ -121,6 +143,8 @@ jobs: # max_concurrent: The maximum number of plots to have for this job at any given time. # max_concurrent_with_start_early: The maximum number of plots to have for this job at any given time including # phases that started early. + # initial_delay_minutes: This is the initial delay that is used when initiate the first job. It is only ever + # considered once. If you restart manager, it will still adhere to this value. # stagger_minutes: The amount of minutes to wait before the next plot for this job can get kicked off. You can even set this to # zero if you want your plots to get kicked off immediately when the concurrent limits allow for it. # max_for_phase_1: The maximum number of plots on phase 1 for this job. @@ -131,6 +155,25 @@ jobs: # temporary2_destination_sync: This field will always submit the destination directory as the temporary2 directory. # These two directories will be in sync so that they will always be submitted as the # same value. + # exclude_final_directory: Whether to skip adding `destination_directory` to harvester for farming + # skip_full_destinations: When this is enabled it will calculate the sizes of all running plots and the future plot + # to determine if there is enough space left on the drive to start a job. If there is not, + # it will skip the destination and move onto the next one. Once all are full, it will + # disable the job. + # unix_process_priority: UNIX Only. This is the priority that plots will be given when they are spawned. UNIX values + # must be between -20 and 19. The higher the value, the lower the priority of the process. + # windows_process_priority: Windows Only. This is the priority that plots will be given when they are spawned. 
+ # Windows values vary and should be set to one of the following values: + # - 16384 (BELOW_NORMAL_PRIORITY_CLASS) + # - 32 (NORMAL_PRIORITY_CLASS) + # - 32768 (ABOVE_NORMAL_PRIORITY_CLASS) + # - 128 (HIGH_PRIORITY_CLASS) + # - 256 (REALTIME_PRIORITY_CLASS) + # enable_cpu_affinity: Enable or disable cpu affinity for plot processes. Systems that plot and harvest may see + # improved harvester or node performance when excluding one or two threads for plotting process. + # cpu_affinity: List of cpu (or threads) to allocate for plot processes. The default example assumes you have + # a hyper-threaded 4 core CPU (8 logical cores). This config will restrict plot processes to use + # logical cores 0-5, leaving logical cores 6 and 7 for other processes (6 restricted, 2 free). - name: micron max_plots: 999 farmer_public_key: @@ -145,17 +188,26 @@ jobs: memory_buffer: 4000 max_concurrent: 6 max_concurrent_with_start_early: 7 + initial_delay_minutes: 0 stagger_minutes: 60 max_for_phase_1: 2 concurrency_start_early_phase: 4 concurrency_start_early_phase_delay: 0 temporary2_destination_sync: false + exclude_final_directory: false + skip_full_destinations: true + unix_process_priority: 10 + windows_process_priority: 32 + enable_cpu_affinity: false + cpu_affinity: [ 0, 1, 2, 3, 4, 5 ] - name: inland max_plots: 999 farmer_public_key: pool_public_key: - temporary_directory: Y:\Plotter + temporary_directory: + - Y:\Plotter1 + - Y:\Plotter2 temporary2_directory: - J:\Plots - K:\Plots @@ -169,8 +221,15 @@ jobs: memory_buffer: 4000 max_concurrent: 2 max_concurrent_with_start_early: 3 + initial_delay_minutes: 0 stagger_minutes: 180 max_for_phase_1: 1 concurrency_start_early_phase: 4 concurrency_start_early_phase_delay: 0 temporary2_destination_sync: false + exclude_final_directory: false + skip_full_destinations: true + unix_process_priority: 10 + windows_process_priority: 32 + enable_cpu_affinity: false + cpu_affinity: [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23 ] diff --git a/manager.py b/manager.py index b5c030d..7f448a3 100644 --- a/manager.py +++ b/manager.py @@ -1,17 +1,18 @@ import argparse from plotmanager.library.utilities.exceptions import InvalidArgumentException -from plotmanager.library.utilities.commands import start_manager, stop_manager, view, analyze_logs +from plotmanager.library.utilities.commands import start_manager, stop_manager, view, json_output, analyze_logs parser = argparse.ArgumentParser(description='This is the central manager for Swar\'s Chia Plot Manager.') help_description = ''' -There are a few different actions that you can use: "start", "restart", "stop", "view", and "analyze_logs". "start" will -start a manager process. If one already exists, it will display an error message. "restart" will try to kill any -existing manager and start a new one. "stop" will terminate the manager, but all existing plots will be completed. -"view" can be used to display an updating table that will show the progress of your plots. Once a manager has started it -will always be running in the background unless an error occurs. This field is case-sensitive. +There are a few different actions that you can use: "start", "restart", "stop", "view", "status", "json", and +"analyze_logs". "start" will start a manager process. If one already exists, it will display an error message. +"restart" will try to kill any existing manager and start a new one. "stop" will terminate the manager, but all +existing plots will be completed. 
"view" can be used to display an updating table that will show the progress of your +plots. Once a manager has started it will always be running in the background unless an error occurs. This field is +case-sensitive. "analyze_logs" is a helper command that will scan all the logs in your log_directory to get your custom settings for the progress settings in the YAML file. @@ -34,9 +35,13 @@ stop_manager() elif args.action == 'view': view() +elif args.action == 'json': + json_output() +elif args.action == 'status': + view(loop=False) elif args.action == 'analyze_logs': analyze_logs() else: - error_message = 'Invalid action provided. The valid options are "start", "restart", "stop", "view", and ' \ + error_message = 'Invalid action provided. The valid options are "start", "restart", "stop", "view", "status", "json" and ' \ '"analyze_logs".' raise InvalidArgumentException(error_message) diff --git a/plotmanager/library/commands/plots.py b/plotmanager/library/commands/plots.py index 321657e..50bc34e 100644 --- a/plotmanager/library/commands/plots.py +++ b/plotmanager/library/commands/plots.py @@ -1,5 +1,6 @@ def create(size, memory_buffer, temporary_directory, destination_directory, threads, buckets, bitfield, - chia_location='chia', temporary2_directory=None, farmer_public_key=None, pool_public_key=None): + chia_location='chia', temporary2_directory=None, farmer_public_key=None, pool_public_key=None, + exclude_final_directory=False): flags = dict( k=size, b=memory_buffer, @@ -16,6 +17,8 @@ def create(size, memory_buffer, temporary_directory, destination_directory, thre flags['p'] = pool_public_key if bitfield is False: flags['e'] = '' + if exclude_final_directory: + flags['x'] = '' data = [chia_location, 'plots', 'create'] for key, value in flags.items(): diff --git a/plotmanager/library/parse/configuration.py b/plotmanager/library/parse/configuration.py index 8dc1b8c..53170b4 100644 --- a/plotmanager/library/parse/configuration.py +++ b/plotmanager/library/parse/configuration.py @@ -54,24 +54,53 @@ def _get_jobs(config): return config['jobs'] -def _get_global_max_concurrent_config(config): +def _get_global_config(config): if 'global' not in config: raise InvalidYAMLConfigException('Failed to find global parameter in the YAML.') - if 'max_concurrent' not in config['global']: - raise InvalidYAMLConfigException('Failed to find max_concurrent in the global parameter in the YAML.') - max_concurrent = config['global']['max_concurrent'] + global_config = config['global'] + expected_parameters = ['max_concurrent', 'max_for_phase_1', 'minimum_minutes_between_jobs'] + _check_parameters(parameter=global_config, expected_parameters=expected_parameters, parameter_type='global') + max_concurrent = global_config['max_concurrent'] + max_for_phase_1 = global_config['max_for_phase_1'] + minimum_minutes_between_jobs = global_config['minimum_minutes_between_jobs'] if not isinstance(max_concurrent, int): raise Exception('global -> max_concurrent should be a integer value.') - return max_concurrent + if not isinstance(max_for_phase_1, int): + raise Exception('global -> max_for_phase_1 should be a integer value.') + if not isinstance(minimum_minutes_between_jobs, int): + raise Exception('global -> max_concurrent should be a integer value.') + return max_concurrent, max_for_phase_1, minimum_minutes_between_jobs def _get_notifications_settings(config): - if 'notifications' not in config: - raise InvalidYAMLConfigException('Failed to find notifications parameter in the YAML.') - notifications = 
config['notifications'] - expected_parameters = ['notify_discord', 'discord_webhook_url', 'notify_sound', 'song', 'notify_pushover', - 'pushover_user_key', 'pushover_api_key'] - _check_parameters(parameter=notifications, expected_parameters=expected_parameters, parameter_type='notification') + notifications = config.get('notifications', None) + if not notifications: + notifications = {} + + if 'notify_discord' in notifications and notifications['notify_discord']: + _check_parameters(parameter=notifications, expected_parameters=['discord_webhook_url'], + parameter_type='notification') + + if 'notify_ifttt' in notifications and notifications['notify_ifttt']: + _check_parameters(parameter=notifications, expected_parameters=['ifttt_webhook_url'], + parameter_type='notification') + + if 'notify_sound' in notifications and notifications['notify_sound']: + _check_parameters(parameter=notifications, expected_parameters=['song'], + parameter_type='notification') + + if 'notify_pushover' in notifications and notifications['notify_pushover']: + _check_parameters(parameter=notifications, expected_parameters=['pushover_user_key', 'pushover_api_key'], + parameter_type='notification') + + if 'notify_telegram' in notifications and notifications['notify_telegram']: + _check_parameters(parameter=notifications, expected_parameters=['telegram_token'], + parameter_type='notification') + + if 'notify_twilio' in notifications and notifications['notify_twilio']: + _check_parameters(parameter=notifications, expected_parameters=['discord_webhook_url'], + parameter_type='notification') + return notifications @@ -85,6 +114,13 @@ def _get_view_settings(config): return view +def _get_instrumentation_settings(config): + if 'instrumentation' not in config: + raise InvalidYAMLConfigException('Failed to find instrumentation parameter in the YAML.') + instrumentation = config.get('instrumentation', {}) + return instrumentation + + def _check_parameters(parameter, expected_parameters, parameter_type): failed_checks = [] checks = expected_parameters @@ -106,10 +142,12 @@ def get_config_info(): if not os.path.exists(log_directory): os.makedirs(log_directory) jobs = _get_jobs(config=config) - max_concurrent = _get_global_max_concurrent_config(config=config) + max_concurrent, max_for_phase_1, minimum_minutes_between_jobs = _get_global_config(config=config) progress_settings = _get_progress_settings(config=config) notification_settings = _get_notifications_settings(config=config) view_settings = _get_view_settings(config=config) + instrumentation_settings = _get_instrumentation_settings(config=config) - return chia_location, log_directory, jobs, manager_check_interval, max_concurrent, \ - progress_settings, notification_settings, log_level, view_settings + return chia_location, log_directory, jobs, manager_check_interval, max_concurrent, max_for_phase_1, \ + minimum_minutes_between_jobs, progress_settings, notification_settings, log_level, view_settings, \ + instrumentation_settings diff --git a/plotmanager/library/utilities/commands.py b/plotmanager/library/utilities/commands.py index 71a20a2..1212feb 100644 --- a/plotmanager/library/utilities/commands.py +++ b/plotmanager/library/utilities/commands.py @@ -8,12 +8,14 @@ from datetime import datetime, timedelta from plotmanager.library.parse.configuration import get_config_info +from plotmanager.library.utilities.configuration import test_configuration from plotmanager.library.utilities.exceptions import ManagerError, TerminationException from plotmanager.library.utilities.jobs 
import load_jobs from plotmanager.library.utilities.log import analyze_log_dates, check_log_progress, analyze_log_times from plotmanager.library.utilities.notifications import send_notifications -from plotmanager.library.utilities.print import print_view -from plotmanager.library.utilities.processes import is_windows, get_manager_processes, get_running_plots, start_process +from plotmanager.library.utilities.print import print_view, print_json +from plotmanager.library.utilities.processes import is_windows, get_manager_processes, get_running_plots, \ + start_process, identify_drive, get_system_drives def start_manager(): @@ -24,12 +26,18 @@ def start_manager(): stateless_manager_path = os.path.join(directory, 'stateless-manager.py') if not os.path.exists(stateless_manager_path): raise FileNotFoundError('Failed to find stateless-manager.') - manager_log_file_path = os.path.join(directory, 'manager.log') - manager_log_file = open(manager_log_file_path, 'a') + debug_log_file_path = os.path.join(directory, 'debug.log') + debug_log_file = open(debug_log_file_path, 'a') python_file_path = sys.executable - chia_location, log_directory, jobs, manager_check_interval, max_concurrent, progress_settings, \ - notification_settings, debug_level, view_settings = get_config_info() + chia_location, log_directory, config_jobs, manager_check_interval, max_concurrent, max_for_phase_1, \ + minimum_minutes_between_jobs, progress_settings, notification_settings, debug_level, view_settings, \ + instrumentation_settings = get_config_info() + + load_jobs(config_jobs) + + test_configuration(chia_location=chia_location, notification_settings=notification_settings, + instrumentation_settings=instrumentation_settings) extra_args = [] if is_windows(): @@ -41,10 +49,10 @@ def start_manager(): python_file_path = pythonw_file_path args = [python_file_path, stateless_manager_path] + extra_args - start_process(args=args, log_file=manager_log_file) + start_process(args=args, log_file=debug_log_file) time.sleep(3) if not get_manager_processes(): - raise ManagerError('Failed to start Manager. Please look at manager.log for more details on the error. It is in the same folder as manager.py.') + raise ManagerError('Failed to start Manager. Please look at debug.log for more details on the error. 
It is in the same folder as manager.py.') send_notifications( title='Plot manager started', @@ -69,31 +77,79 @@ def stop_manager(): print("Successfully stopped manager processes.") -def view(): - chia_location, log_directory, config_jobs, manager_check_interval, max_concurrent, progress_settings, \ - notification_settings, debug_level, view_settings = get_config_info() +def json_output(): + chia_location, log_directory, config_jobs, manager_check_interval, max_concurrent, max_for_phase_1, \ + minimum_minutes_between_jobs, progress_settings, notification_settings, debug_level, view_settings, \ + instrumentation_settings = get_config_info() + + system_drives = get_system_drives() + + drives = {'temp': [], 'temp2': [], 'dest': []} + jobs = load_jobs(config_jobs) + for job in jobs: + directories = { + 'temp': job.temporary_directory, + 'dest': job.destination_directory, + 'temp2': job.temporary2_directory, + } + for key, directory_list in directories.items(): + if directory_list is None: + continue + if not isinstance(directory_list, list): + directory_list = [directory_list] + for directory in directory_list: + drive = identify_drive(file_path=directory, drives=system_drives) + if drive in drives[key]: + continue + drives[key].append(drive) + + running_work = {} + + jobs = load_jobs(config_jobs) + jobs, running_work = get_running_plots(jobs=jobs, running_work=running_work, + instrumentation_settings=instrumentation_settings) + check_log_progress(jobs=jobs, running_work=running_work, progress_settings=progress_settings, + notification_settings=notification_settings, view_settings=view_settings, + instrumentation_settings=instrumentation_settings) + print_json(jobs=jobs, running_work=running_work, view_settings=view_settings) + + has_file = False + if len(running_work.values()) == 0: + has_file = True + for work in running_work.values(): + if not work.log_file: + continue + has_file = True + break + if not has_file: + print("Restarting view due to psutil going stale...") + system_args = [f'"{sys.executable}"'] + sys.argv + os.execv(sys.executable, system_args) + exit() + + +def view(loop=True): + chia_location, log_directory, config_jobs, manager_check_interval, max_concurrent, max_for_phase_1, \ + minimum_minutes_between_jobs, progress_settings, notification_settings, debug_level, view_settings, \ + instrumentation_settings = get_config_info() view_check_interval = view_settings['check_interval'] + system_drives = get_system_drives() analysis = {'files': {}} drives = {'temp': [], 'temp2': [], 'dest': []} jobs = load_jobs(config_jobs) for job in jobs: - drive = job.temporary_directory.split('\\')[0] - drives['temp'].append(drive) directories = { 'dest': job.destination_directory, + 'temp': job.temporary_directory, 'temp2': job.temporary2_directory, } for key, directory_list in directories.items(): if directory_list is None: continue - if isinstance(directory_list, list): - for directory in directory_list: - drive = directory.split('\\')[0] - if drive in drives[key]: - continue - drives[key].append(drive) - else: - drive = directory_list.split('\\')[0] + if not isinstance(directory_list, list): + directory_list = [directory_list] + for directory in directory_list: + drive = identify_drive(file_path=directory, drives=system_drives) if drive in drives[key]: continue drives[key].append(drive) @@ -103,11 +159,16 @@ def view(): try: analysis = analyze_log_dates(log_directory=log_directory, analysis=analysis) jobs = load_jobs(config_jobs) - jobs, running_work = get_running_plots(jobs=jobs, 
running_work=running_work) + jobs, running_work = get_running_plots(jobs=jobs, running_work=running_work, + instrumentation_settings=instrumentation_settings) check_log_progress(jobs=jobs, running_work=running_work, progress_settings=progress_settings, - notification_settings=notification_settings, view_settings=view_settings) + notification_settings=notification_settings, view_settings=view_settings, + instrumentation_settings=instrumentation_settings) print_view(jobs=jobs, running_work=running_work, analysis=analysis, drives=drives, - next_log_check=datetime.now() + timedelta(seconds=60), view_settings=view_settings) + next_log_check=datetime.now() + timedelta(seconds=view_check_interval), + view_settings=view_settings, loop=loop) + if not loop: + break time.sleep(view_check_interval) has_file = False if len(running_work.values()) == 0: @@ -119,7 +180,7 @@ def view(): break if not has_file: print("Restarting view due to psutil going stale...") - system_args = [f'"{sys.executable}"'] + sys.argv + system_args = ['python'] + sys.argv os.execv(sys.executable, system_args) except KeyboardInterrupt: print("Stopped view.") @@ -127,6 +188,7 @@ def view(): def analyze_logs(): - chia_location, log_directory, jobs, manager_check_interval, max_concurrent, progress_settings, \ - notification_settings, debug_level, view_settings = get_config_info() + chia_location, log_directory, config_jobs, manager_check_interval, max_concurrent, max_for_phase_1, \ + minimum_minutes_between_jobs, progress_settings, notification_settings, debug_level, view_settings, \ + instrumentation_settings = get_config_info() analyze_log_times(log_directory) diff --git a/plotmanager/library/utilities/configuration.py b/plotmanager/library/utilities/configuration.py new file mode 100644 index 0000000..c1b13d2 --- /dev/null +++ b/plotmanager/library/utilities/configuration.py @@ -0,0 +1,50 @@ +import os + +from plotmanager.library.utilities.exceptions import InvalidChiaLocationException, MissingImportError + + +def test_configuration(chia_location, notification_settings, instrumentation_settings): + if not os.path.exists(chia_location): + raise InvalidChiaLocationException('The chia_location in your config.yaml does not exist. Please confirm if ' + 'you have the right version. Also confirm if you have a space after the ' + 'colon. "chia_location: " not "chia_location:"') + + if notification_settings.get('notify_discord'): + try: + import discord_notify + except ImportError: + raise MissingImportError('Failed to find import "discord_notify". Be sure to run "pip install -r ' + 'requirements-notification.txt".') + if notification_settings.get('notify_sound'): + try: + import playsound + except ImportError: + raise MissingImportError('Failed to find import "playsound". Be sure to run "pip install -r ' + 'requirements-notification.txt".') + if notification_settings.get('notify_pushover'): + try: + import pushover + except ImportError: + raise MissingImportError('Failed to find import "pushover". Be sure to run "pip install -r ' + 'requirements-notification.txt".') + + if instrumentation_settings.get('notify_telegram'): + try: + import telegram_notifier + except ImportError: + raise MissingImportError('Failed to find import "telegram_notifier". Be sure to run "pip install -r ' + 'requirements-notification.txt".') + + if instrumentation_settings.get('notify_ifttt'): + try: + import requests + except ImportError: + raise MissingImportError('Failed to find import "requests". 
Be sure to run "pip install -r ' + 'requirements-notification.txt".') + + if instrumentation_settings.get('prometheus_enabled'): + try: + import prometheus_client + except ImportError: + raise MissingImportError('Failed to find import "prometheus_client". Be sure to run "pip install -r ' + 'requirements-notification.txt".') diff --git a/plotmanager/library/utilities/exceptions.py b/plotmanager/library/utilities/exceptions.py index 53a6749..1109564 100644 --- a/plotmanager/library/utilities/exceptions.py +++ b/plotmanager/library/utilities/exceptions.py @@ -1,9 +1,17 @@ -class InvalidYAMLConfigException(Exception): +class InvalidArgumentException(Exception): pass -class InvalidArgumentException(Exception): +class InvalidChiaLocationException(Exception): + pass + + +class InvalidConfigurationSetting(Exception): + pass + + +class InvalidYAMLConfigException(Exception): pass @@ -11,5 +19,9 @@ class ManagerError(Exception): pass +class MissingImportError(Exception): + pass + + class TerminationException(Exception): pass diff --git a/plotmanager/library/utilities/instrumentation.py b/plotmanager/library/utilities/instrumentation.py new file mode 100644 index 0000000..e4ded8e --- /dev/null +++ b/plotmanager/library/utilities/instrumentation.py @@ -0,0 +1,40 @@ +import socket +import logging + +PROCESSED = False +GAUGE_PLOTS_RUNNING = None +COUNTER_PLOTS_COMPLETED = None + + +def _get_metrics(instrumentation_settings): + global PROCESSED + if instrumentation_settings.get('prometheus_enabled', False) and not PROCESSED: + from prometheus_client import Counter, Gauge, start_http_server + global GAUGE_PLOTS_RUNNING + global COUNTER_PLOTS_COMPLETED + GAUGE_PLOTS_RUNNING = Gauge('chia_running_plots', 'Number of running plots', ['hostname', 'queue']) + COUNTER_PLOTS_COMPLETED = Counter('chia_completed_plots', 'Total completed plots', ['hostname', 'queue']) + port = instrumentation_settings.get('prometheus_port', 9090) + logging.info(f'Prometheus port: {port}') + start_http_server(port) + PROCESSED = True + + +def set_plots_running(total_running_plots, job_name, instrumentation_settings): + _get_metrics(instrumentation_settings=instrumentation_settings) + if instrumentation_settings.get('prometheus_enabled', False) and GAUGE_PLOTS_RUNNING: + logging.info(f'Prometheus: Setting running plots {job_name}') + hostname = socket.gethostname() + GAUGE_PLOTS_RUNNING.labels(hostname=hostname, queue=job_name).set(total_running_plots) + else: + logging.debug('Prometheus instrumentation not enabled') + + +def increment_plots_completed(increment, job_name, instrumentation_settings): + _get_metrics(instrumentation_settings=instrumentation_settings) + if instrumentation_settings.get('prometheus_enabled') and COUNTER_PLOTS_COMPLETED: + logging.info(f'Prometheus: Incrementing plots {job_name}') + hostname = socket.gethostname() + COUNTER_PLOTS_COMPLETED.labels(hostname=hostname, queue=job_name).inc(increment) + else: + logging.debug('Prometheus instrumentation not enabled') diff --git a/plotmanager/library/utilities/jobs.py b/plotmanager/library/utilities/jobs.py index 87572a6..07a7c97 100644 --- a/plotmanager/library/utilities/jobs.py +++ b/plotmanager/library/utilities/jobs.py @@ -5,53 +5,112 @@ from datetime import datetime, timedelta from plotmanager.library.commands import plots -from plotmanager.library.utilities.processes import is_windows, start_process +from plotmanager.library.utilities.exceptions import InvalidConfigurationSetting +from plotmanager.library.utilities.processes import identify_drive, is_windows, 
start_process from plotmanager.library.utilities.objects import Job, Work from plotmanager.library.utilities.log import get_log_file_name def has_active_jobs_and_work(jobs): for job in jobs: - if job.total_completed < job.max_plots: + if job.total_kicked_off < job.max_plots: return True return False -def get_target_directories(job): +def get_target_directories(job, drives_free_space): job_offset = job.total_completed + job.total_running + if job.skip_full_destinations: + logging.info('Checking for full destinations.') + job = check_valid_destinations(job, drives_free_space) destination_directory = job.destination_directory + temporary_directory = job.temporary_directory temporary2_directory = job.temporary2_directory + if not destination_directory: + return None, None, None, job + if isinstance(job.destination_directory, list): destination_directory = job.destination_directory[job_offset % len(job.destination_directory)] + if isinstance(job.temporary_directory, list): + temporary_directory = job.temporary_directory[job_offset % len(job.temporary_directory)] if isinstance(job.temporary2_directory, list): temporary2_directory = job.temporary2_directory[job_offset % len(job.temporary2_directory)] - return destination_directory, temporary2_directory + return destination_directory, temporary_directory, temporary2_directory, job + + +def check_valid_destinations(job, drives_free_space): + job_size = determine_job_size(job.size) + drives = list(drives_free_space.keys()) + destination_directories = job.destination_directory + if not isinstance(destination_directories, list): + destination_directories = [destination_directories] + + valid_destinations = [] + for directory in destination_directories: + drive = identify_drive(file_path=directory, drives=drives) + logging.info(f'Drive "{drive}" has {drives_free_space[drive]} free space.') + if drives_free_space[drive] is None or drives_free_space[drive] >= job_size: + valid_destinations.append(directory) + continue + logging.error(f'Drive "{drive}" does not have enough space. This directory will be skipped.') + + if not valid_destinations: + job.max_plots = 0 + logging.error(f'Job "{job.name}" has no more destination directories with enough space for more work.') + job.destination_directory = valid_destinations + return job + def load_jobs(config_jobs): jobs = [] + checked_job_names = [] + checked_temporary_directories = [] for info in config_jobs: job = deepcopy(Job()) job.total_running = 0 job.name = info['name'] + if job.name in checked_job_names: + raise InvalidConfigurationSetting(f'Found the same job name for multiple jobs. Job names should be unique. 
' + f'Duplicate: {job.name}')
+        checked_job_names.append(info['name'])
         job.max_plots = info['max_plots']
         job.farmer_public_key = info.get('farmer_public_key', None)
         job.pool_public_key = info.get('pool_public_key', None)
         job.max_concurrent = info['max_concurrent']
         job.max_concurrent_with_start_early = info['max_concurrent_with_start_early']
+
+        if job.max_concurrent_with_start_early < job.max_concurrent:
+            raise InvalidConfigurationSetting('Your "max_concurrent_with_start_early" value must be greater than or '
+                                              'equal to your "max_concurrent" value.')
+        job.initial_delay_minutes = info.get('initial_delay_minutes', 0)
+        if not job.initial_delay_minutes:
+            job.initial_delay_minutes = 0
         job.stagger_minutes = info.get('stagger_minutes', None)
         job.max_for_phase_1 = info.get('max_for_phase_1', None)
         job.concurrency_start_early_phase = info.get('concurrency_start_early_phase', None)
         job.concurrency_start_early_phase_delay = info.get('concurrency_start_early_phase_delay', None)
         job.temporary2_destination_sync = info.get('temporary2_destination_sync', False)
-
-        job.temporary_directory = info['temporary_directory']
+        job.exclude_final_directory = info.get('exclude_final_directory', False)
+        job.skip_full_destinations = info.get('skip_full_destinations', True)
+
+        temporary_directory = info['temporary_directory']
+        if not isinstance(temporary_directory, list):
+            temporary_directory = [temporary_directory]
+        for directory in temporary_directory:
+            if directory not in checked_temporary_directories:
+                checked_temporary_directories.append(directory)
+                continue
+            raise InvalidConfigurationSetting(f'You cannot use the same temporary directory for more than one job: '
+                                              f'{directory}')
+        job.temporary_directory = temporary_directory
         job.destination_directory = info['destination_directory']
         temporary2_directory = info.get('temporary2_directory', None)
@@ -64,18 +123,86 @@ def load_jobs(config_jobs):
         job.threads = info['threads']
         job.buckets = info['buckets']
         job.memory_buffer = info['memory_buffer']
+
+        job.unix_process_priority = info.get('unix_process_priority', 10)
+        if not -20 <= job.unix_process_priority <= 19:
+            raise InvalidConfigurationSetting('UNIX Process Priority must be between -20 and 19.')
+        job.windows_process_priority = info.get('windows_process_priority', 32)
+        if job.windows_process_priority not in [64, 16384, 32, 32768, 128, 256]:
+            raise InvalidConfigurationSetting('Windows Process Priority must be one of the following: [64, 16384, 32, '
+                                              '32768, 128, 256]. Please view README for more details. If you don\'t '
+                                              'know what you are doing, please use 32.')
+
+        job.enable_cpu_affinity = info.get('enable_cpu_affinity', False)
+        if job.enable_cpu_affinity:
+            job.cpu_affinity = info['cpu_affinity']
+
         jobs.append(job)
 
     return jobs
 
 
-def monitor_jobs_to_start(jobs, running_work, max_concurrent, next_job_work, chia_location, log_directory, next_log_check):
+def determine_job_size(k_size):
+    try:
+        k_size = int(k_size)
+    except ValueError:
+        return 0
+    base_k_size = 32
+    size = 109000000000
+    if k_size < base_k_size:
+        # Why 2.058? Just some quick math.
+        size /= pow(2.058, base_k_size-k_size)
+    if k_size > base_k_size:
+        # Why 2.06? Just some quick math from my current plots.
+ size *= pow(2.06, k_size-base_k_size) + return size + + +def monitor_jobs_to_start(jobs, running_work, max_concurrent, max_for_phase_1, next_job_work, chia_location, + log_directory, next_log_check, minimum_minutes_between_jobs, system_drives): + drives_free_space = {} + for job in jobs: + directories = [job.destination_directory] + if isinstance(job.destination_directory, list): + directories = job.destination_directory + for directory in directories: + drive = identify_drive(file_path=directory, drives=system_drives) + if drive in drives_free_space: + continue + try: + free_space = psutil.disk_usage(drive).free + except: + logging.exception(f"Failed to get disk_usage of drive {drive}.") + # I need to do this because if Manager fails, I don't want it to break. + free_space = None + drives_free_space[drive] = free_space + + logging.info(f'Free space before checking active jobs: {drives_free_space}') + for pid, work in running_work.items(): + drive = work.destination_drive + if drive not in drives_free_space or drives_free_space[drive] is None: + continue + work_size = determine_job_size(work.k_size) + drives_free_space[drive] -= work_size + logging.info(drive) + logging.info(f'Free space after checking active jobs: {drives_free_space}') + + total_phase_1_count = 0 + for pid in running_work.keys(): + if running_work[pid].current_phase > 1: + continue + total_phase_1_count += 1 + for i, job in enumerate(jobs): logging.info(f'Checking to queue work for job: {job.name}') if len(running_work.values()) >= max_concurrent: logging.info(f'Global concurrent limit met, skipping. Running plots: {len(running_work.values())}, ' f'Max global concurrent limit: {max_concurrent}') continue + if total_phase_1_count >= max_for_phase_1: + logging.info(f'Global max for phase 1 limit has been met, skipping. Count: {total_phase_1_count}, ' + f'Setting Max: {max_for_phase_1}') + continue phase_1_count = 0 for pid in job.running_work: if running_work[pid].current_phase > 1: @@ -83,11 +210,11 @@ def monitor_jobs_to_start(jobs, running_work, max_concurrent, next_job_work, chi phase_1_count += 1 logging.info(f'Total jobs in phase 1: {phase_1_count}') if job.max_for_phase_1 and phase_1_count >= job.max_for_phase_1: - logging.info(f'Max for phase 1 met, skipping. Max: {job.max_for_phase_1}') + logging.info(f'Job max for phase 1 met, skipping. Max: {job.max_for_phase_1}') continue - if job.total_completed >= job.max_plots: - logging.info(f'Job\'s total completed greater than or equal to max plots, skipping. Total Completed: ' - f'{job.total_completed}, Max Plots: {job.max_plots}') + if job.total_kicked_off >= job.max_plots: + logging.info(f'Job\'s total kicked off greater than or equal to max plots, skipping. Total Kicked Off: ' + f'{job.total_kicked_off}, Max Plots: {job.max_plots}') continue if job.name in next_job_work and next_job_work[job.name] > datetime.now(): logging.info(f'Waiting for job stagger, skipping. Next allowable time: {next_job_work[job.name]}') @@ -116,24 +243,49 @@ def monitor_jobs_to_start(jobs, running_work, max_concurrent, next_job_work, chi if job.stagger_minutes: next_job_work[job.name] = datetime.now() + timedelta(minutes=job.stagger_minutes) logging.info(f'Calculating new job stagger time. Next stagger kickoff: {next_job_work[job.name]}') - job, work = start_work(job=job, chia_location=chia_location, log_directory=log_directory) + if minimum_minutes_between_jobs: + logging.info(f'Setting a minimum stagger for all jobs. 
{minimum_minutes_between_jobs}') + minimum_stagger = datetime.now() + timedelta(minutes=minimum_minutes_between_jobs) + for j in jobs: + if next_job_work[j.name] > minimum_stagger: + logging.info(f'Skipping stagger for {j.name}. Stagger is larger than minimum_minutes_between_jobs. ' + f'Min: {minimum_stagger}, Current: {next_job_work[j.name]}') + continue + logging.info(f'Setting a new stagger for {j.name}. minimum_minutes_between_jobs is larger than assigned ' + f'stagger. Min: {minimum_stagger}, Current: {next_job_work[j.name]}') + next_job_work[j.name] = minimum_stagger + + job, work = start_work( + job=job, + chia_location=chia_location, + log_directory=log_directory, + drives_free_space=drives_free_space, + ) jobs[i] = deepcopy(job) + if work is None: + continue + total_phase_1_count += 1 next_log_check = datetime.now() running_work[work.pid] = work return jobs, running_work, next_job_work, next_log_check -def start_work(job, chia_location, log_directory): +def start_work(job, chia_location, log_directory, drives_free_space): logging.info(f'Starting new plot for job: {job.name}') - nice_val = 10 + nice_val = job.unix_process_priority if is_windows(): - nice_val = psutil.NORMAL_PRIORITY_CLASS + nice_val = job.windows_process_priority now = datetime.now() log_file_path = get_log_file_name(log_directory, job, now) logging.info(f'Job log file path: {log_file_path}') - destination_directory, temporary2_directory = get_target_directories(job) + destination_directory, temporary_directory, temporary2_directory, job = \ + get_target_directories(job, drives_free_space=drives_free_space) + if not destination_directory: + return job, None + + logging.info(f'Job temporary directory: {temporary_directory}') logging.info(f'Job destination directory: {destination_directory}') work = deepcopy(Work()) @@ -155,12 +307,13 @@ def start_work(job, chia_location, log_directory): pool_public_key=job.pool_public_key, size=job.size, memory_buffer=job.memory_buffer, - temporary_directory=job.temporary_directory, + temporary_directory=temporary_directory, temporary2_directory=temporary2_directory, destination_directory=destination_directory, threads=job.threads, buckets=job.buckets, bitfield=job.bitfield, + exclude_final_directory=job.exclude_final_directory, ) logging.info(f'Starting with plot command: {plot_command}') @@ -173,9 +326,14 @@ def start_work(job, chia_location, log_directory): logging.info(f'Setting priority level: {nice_val}') psutil.Process(pid).nice(nice_val) logging.info(f'Set priority level') + if job.enable_cpu_affinity: + logging.info(f'Setting process cpu affinity: {job.cpu_affinity}') + psutil.Process(pid).cpu_affinity(job.cpu_affinity) + logging.info(f'Set process cpu affinity') work.pid = pid job.total_running += 1 + job.total_kicked_off += 1 job.running_work = job.running_work + [pid] logging.info(f'Job total running: {job.total_running}') logging.info(f'Job running: {job.running_work}') diff --git a/plotmanager/library/utilities/log.py b/plotmanager/library/utilities/log.py index eb0f992..3944c8b 100644 --- a/plotmanager/library/utilities/log.py +++ b/plotmanager/library/utilities/log.py @@ -5,6 +5,7 @@ import re import socket +from plotmanager.library.utilities.instrumentation import increment_plots_completed from plotmanager.library.utilities.notifications import send_notifications from plotmanager.library.utilities.print import pretty_print_time @@ -154,7 +155,8 @@ def get_progress(line_count, progress_settings): return progress -def check_log_progress(jobs, running_work, 
progress_settings, notification_settings, view_settings): +def check_log_progress(jobs, running_work, progress_settings, notification_settings, view_settings, + instrumentation_settings): for pid, work in list(running_work.items()): logging.info(f'Checking log progress for PID: {pid}') if not work.log_file: @@ -191,6 +193,7 @@ def check_log_progress(jobs, running_work, progress_settings, notification_setti job.running_work.remove(pid) job.total_running -= 1 job.total_completed += 1 + increment_plots_completed(increment=1, job_name=job.name, instrumentation_settings=instrumentation_settings) send_notifications( title='Plot Completed', diff --git a/plotmanager/library/utilities/notifications.py b/plotmanager/library/utilities/notifications.py index 002868b..b1bc8f9 100644 --- a/plotmanager/library/utilities/notifications.py +++ b/plotmanager/library/utilities/notifications.py @@ -1,19 +1,26 @@ -import discord_notify -import playsound -import pushover - - def _send_notifications(title, body, settings): if settings.get('notify_discord') is True: + import discord_notify notifier = discord_notify.Notifier(settings.get('discord_webhook_url')) notifier.send(body, print_message=False) if settings.get('notify_sound') is True: + import playsound playsound.playsound(settings.get('song')) if settings.get('notify_pushover') is True: + import pushover client = pushover.Client(settings.get('pushover_user_key'), api_token=settings.get('pushover_api_key')) client.send_message(body, title=title) + + if settings.get('notify_telegram') is True: + import telegram_notifier + notifier = telegram_notifier.TelegramNotifier(settings.get('telegram_token'), parse_mode="HTML") + notifier.send(body) + + if settings.get('notify_ifttt') is True: + import requests + requests.post(settings.get('ifttt_webhook_url'), data={'value1': title, 'value2': body}) def send_notifications(title, body, settings): diff --git a/plotmanager/library/utilities/objects.py b/plotmanager/library/utilities/objects.py index 6c7b3d8..b9736cf 100644 --- a/plotmanager/library/utilities/objects.py +++ b/plotmanager/library/utilities/objects.py @@ -6,12 +6,14 @@ class Job: pool_public_key = None total_running = 0 + total_kicked_off = 0 total_completed = 0 max_concurrent = 0 max_concurrent_with_start_early = 0 max_plots = 0 temporary2_destination_sync = None + initial_delay_minutes = None stagger_minutes = None max_for_phase_1 = None concurrency_start_early_phase = None @@ -22,12 +24,20 @@ class Job: temporary_directory = None temporary2_directory = None destination_directory = [] + exclude_final_directory = None + skip_full_destinations = None size = None bitfield = None threads = None buckets = None memory_buffer = None + unix_process_priority = 10 + windows_process_priority = 32 + + enable_cpu_affinity = False + cpu_affinity = [] + class Work: work_id = None diff --git a/plotmanager/library/utilities/print.py b/plotmanager/library/utilities/print.py index ef6cf4c..669ff90 100644 --- a/plotmanager/library/utilities/print.py +++ b/plotmanager/library/utilities/print.py @@ -1,17 +1,21 @@ import os import psutil +import json from datetime import datetime, timedelta -from plotmanager.library.utilities.processes import get_manager_processes, get_chia_drives +from plotmanager.library.utilities.processes import get_manager_processes -def _get_row_info(pid, running_work, view_settings): +def _get_row_info(pid, running_work, view_settings, as_raw_values=False): work = running_work[pid] phase_times = work.phase_times elapsed_time = (datetime.now() - 
work.datetime_start) - elapsed_time = pretty_print_time(elapsed_time.seconds) + elapsed_time = pretty_print_time(elapsed_time.seconds + elapsed_time.days * 86400) phase_time_log = [] + plot_id_prefix = '' + if work.plot_id: + plot_id_prefix = work.plot_id[0:7] for i in range(1, 5): if phase_times.get(i): phase_time_log.append(phase_times.get(i)) @@ -19,6 +23,7 @@ def _get_row_info(pid, running_work, view_settings): row = [ work.job.name if work.job else '?', work.k_size, + plot_id_prefix, pid, work.datetime_start.strftime(view_settings['datetime_format']), elapsed_time, @@ -27,7 +32,9 @@ def _get_row_info(pid, running_work, view_settings): work.progress, pretty_print_bytes(work.temp_file_size, 'gb', 0, " GiB"), ] - return [str(cell) for cell in row] + if not as_raw_values: + return [str(cell) for cell in row] + return row def pretty_print_bytes(size, size_type, significant_digits=2, suffix=''): @@ -66,52 +73,126 @@ def pretty_print_table(rows): return "\n".join(console) -def get_job_data(jobs, running_work, view_settings): +def get_job_data(jobs, running_work, view_settings, as_json=False): rows = [] - headers = ['num', 'job', 'k', 'pid', 'start', 'elapsed_time', 'phase', 'phase_times', 'progress', 'temp_size'] added_pids = [] for job in jobs: for pid in job.running_work: if pid not in running_work: continue - rows.append(_get_row_info(pid, running_work, view_settings)) + rows.append(_get_row_info(pid, running_work, view_settings, as_json)) added_pids.append(pid) for pid in running_work.keys(): if pid in added_pids: continue - rows.append(_get_row_info(pid, running_work, view_settings)) + rows.append(_get_row_info(pid, running_work, view_settings, as_json)) added_pids.append(pid) - rows.sort(key=lambda x: (x[4]), reverse=True) + rows.sort(key=lambda x: (x[5]), reverse=True) for i in range(len(rows)): rows[i] = [str(i+1)] + rows[i] - rows = [headers] + rows + if as_json: + jobs = dict(jobs=rows) + print(json.dumps(jobs, separators=(',', ':'))) + return jobs + return rows + + +def pretty_print_job_data(job_data): + headers = ['num', 'job', 'k', 'plot_id', 'pid', 'start', 'elapsed_time', 'phase', 'phase_times', 'progress', 'temp_size'] + rows = [headers] + job_data return pretty_print_table(rows) -def get_drive_data(drives): - chia_drives = get_chia_drives() - headers = ['type', 'drive', 'used', 'total', 'percent', 'plots'] - rows = [headers] - for drive_type, drives in drives.items(): - for drive in drives: +def get_drive_data(drives, running_work, job_data): + headers = ['type', 'drive', 'used', 'total', '%', '#', 'temp', 'dest'] + rows = [] + + pid_to_num = {} + for job in job_data: + pid_to_num[job[4]] = job[0] + + drive_types = {} + has_temp2 = False + for drive_type, all_drives in drives.items(): + for drive in all_drives: + if drive in drive_types: + drive_type_list = drive_types[drive] + else: + drive_type_list = ['-', '-', '-'] + if drive_type == 'temp': + drive_type_list[0] = 't' + elif drive_type == 'temp2': + has_temp2 = True + drive_type_list[1] = '2' + elif drive_type == 'dest': + drive_type_list[2] = 'd' + else: + raise Exception(f'Invalid drive type: {drive_type}') + drive_types[drive] = drive_type_list + + checked_drives = [] + for all_drives in drives.values(): + for drive in all_drives: + if drive in checked_drives: + continue + checked_drives.append(drive) + temp, temp2, dest = [], [], [] + for job in running_work: + if running_work[job].temporary_drive == drive: + temp.append(pid_to_num[str(running_work[job].pid)]) + if running_work[job].temporary2_drive == drive: + 
temp2.append(pid_to_num[str(running_work[job].pid)]) + if running_work[job].destination_drive == drive: + dest.append(pid_to_num[str(running_work[job].pid)]) + try: usage = psutil.disk_usage(drive) - except FileNotFoundError: + except (FileNotFoundError, TypeError): continue - rows.append([drive_type, drive, f'{pretty_print_bytes(usage.used, "tb", 2, "TiB")}', - f'{pretty_print_bytes(usage.total, "tb", 2, "TiB")}', f'{usage.percent}%', - str(chia_drives[drive_type].get(drive, '?'))]) + + counts = ['-', '-', '-'] + if temp: + counts[0] = str(len(temp)) + if temp2: + counts[1] = str(len(temp2)) + if dest: + counts[2] = str(len(dest)) + if not has_temp2: + del counts[1] + del drive_types[drive][1] + drive_type = '/'.join(drive_types[drive]) + + row = [ + drive_type, + drive, + f'{pretty_print_bytes(usage.used, "tb", 2, "TiB")}', + f'{pretty_print_bytes(usage.total, "tb", 2, "TiB")}', + f'{usage.percent}%', + '/'.join(counts), + '/'.join(temp), + '/'.join(dest), + ] + if has_temp2: + row.insert(-1, '/'.join(temp2)) + rows.append(row) + if has_temp2: + headers.insert(-1, 'temp2') + rows = [headers] + rows return pretty_print_table(rows) -def print_view(jobs, running_work, analysis, drives, next_log_check, view_settings): +def print_json(jobs, running_work, view_settings): + get_job_data(jobs=jobs, running_work=running_work, view_settings=view_settings, as_json=True) + + +def print_view(jobs, running_work, analysis, drives, next_log_check, view_settings, loop): # Job Table job_data = get_job_data(jobs=jobs, running_work=running_work, view_settings=view_settings) # Drive Table drive_data = '' if view_settings.get('include_drive_info'): - drive_data = get_drive_data(drives) + drive_data = get_drive_data(drives, running_work, job_data) manager_processes = get_manager_processes() @@ -119,7 +200,7 @@ def print_view(jobs, running_work, analysis, drives, next_log_check, view_settin os.system('cls') else: os.system('clear') - print(job_data) + print(pretty_print_job_data(job_data)) print(f'Manager Status: {"Running" if manager_processes else "Stopped"}') print() @@ -136,5 +217,6 @@ def print_view(jobs, running_work, analysis, drives, next_log_check, view_settin print(f'Plots Completed Yesterday: {analysis["summary"].get(datetime.now().date() - timedelta(days=1), 0)}') print(f'Plots Completed Today: {analysis["summary"].get(datetime.now().date(), 0)}') print() - print(f"Next log check at {next_log_check.strftime('%Y-%m-%d %H:%M:%S')}") + if loop: + print(f"Next log check at {next_log_check.strftime('%Y-%m-%d %H:%M:%S')}") print() diff --git a/plotmanager/library/utilities/processes.py b/plotmanager/library/utilities/processes.py index 898cb88..f1da7cd 100644 --- a/plotmanager/library/utilities/processes.py +++ b/plotmanager/library/utilities/processes.py @@ -9,6 +9,7 @@ from datetime import datetime from plotmanager.library.utilities.objects import Work +from plotmanager.library.utilities.instrumentation import set_plots_running def _contains_in_list(string, lst, case_insensitive=False): @@ -33,7 +34,7 @@ def get_manager_processes(): not _contains_in_list('stateless-manager.py', process.cmdline()): continue processes.append(process) - except psutil.NoSuchProcess: + except (psutil.NoSuchProcess, psutil.AccessDenied): pass return processes @@ -91,12 +92,12 @@ def get_chia_drives(): try: if chia_executable_name not in process.name() and 'python' not in process.name().lower(): continue - except psutil.AccessDenied: + except (psutil.AccessDenied, psutil.NoSuchProcess): continue try: if 'plots' not in 
process.cmdline() or 'create' not in process.cmdline(): continue - except psutil.ZombieProcess: + except (psutil.ZombieProcess, psutil.NoSuchProcess): continue commands = process.cmdline() temporary_drive, temporary2_drive, destination_drive = get_plot_drives(commands=commands) @@ -119,7 +120,7 @@ def get_chia_drives(): def get_system_drives(): drives = [] - for disk in psutil.disk_partitions(): + for disk in psutil.disk_partitions(all=True): drive = disk.mountpoint if is_windows(): drive = os.path.splitdrive(drive)[0] @@ -169,7 +170,7 @@ def get_temp_size(plot_id, temporary_directory, temporary2_directory): return temp_size -def get_running_plots(jobs, running_work): +def get_running_plots(jobs, running_work, instrumentation_settings): chia_processes = [] logging.info(f'Getting running plots') chia_executable_name = get_chia_executable_name() @@ -177,12 +178,12 @@ def get_running_plots(jobs, running_work): try: if chia_executable_name not in process.name() and 'python' not in process.name().lower(): continue - except psutil.AccessDenied: + except (psutil.AccessDenied, psutil.NoSuchProcess): continue try: if 'plots' not in process.cmdline() or 'create' not in process.cmdline(): continue - except psutil.ZombieProcess: + except (psutil.ZombieProcess, psutil.NoSuchProcess): continue if process.parent(): try: @@ -199,35 +200,33 @@ def get_running_plots(jobs, running_work): for datetime_start, process in chia_processes: logging.info(f'Finding log file for process: {process.pid}') log_file_path = None + commands = [] try: + commands = process.cmdline() for file in process.open_files(): if '.mui' == file.path[-4:]: continue if file.path[-4:] not in ['.log', '.txt']: continue + if file.path[-9:] == 'debug.log': + continue log_file_path = file.path logging.info(f'Found log file: {log_file_path}') break except (psutil.AccessDenied, RuntimeError): logging.info(f'Failed to find log file: {process.pid}') + except psutil.NoSuchProcess: + continue assumed_job = None logging.info(f'Finding associated job') - temporary_directory, temporary2_directory, destination_directory = get_plot_directories(commands=process.cmdline()) + temporary_directory, temporary2_directory, destination_directory = get_plot_directories(commands=commands) for job in jobs: - if temporary_directory != job.temporary_directory: + if isinstance(job.temporary_directory, list) and temporary_directory not in job.temporary_directory: continue - if destination_directory not in job.destination_directory: + if not isinstance(job.temporary_directory, list) and temporary_directory != job.temporary_directory: continue - if temporary2_directory: - job_temporary2_directory = job.temporary2_directory - if not isinstance(job.temporary2_directory, list): - job_temporary2_directory = [job.temporary2_directory] - if job.temporary2_destination_sync and temporary2_directory != destination_directory: - continue - if not job.temporary2_destination_sync and temporary2_directory not in job_temporary2_directory: - continue logging.info(f'Found job: {job.name}') assumed_job = job break @@ -239,8 +238,8 @@ def get_running_plots(jobs, running_work): temp_file_size = get_temp_size(plot_id=plot_id, temporary_directory=temporary_directory, temporary2_directory=temporary2_directory) - temporary_drive, temporary2_drive, destination_drive = get_plot_drives(commands=process.cmdline()) - k_size = get_plot_k_size(commands=process.cmdline()) + temporary_drive, temporary2_drive, destination_drive = get_plot_drives(commands=commands) + k_size = 
get_plot_k_size(commands=commands) work = deepcopy(Work()) work.job = assumed_job work.log_file = log_file_path @@ -252,6 +251,8 @@ def get_running_plots(jobs, running_work): work.work_id = assumed_job.current_work_id assumed_job.current_work_id += 1 assumed_job.total_running += 1 + set_plots_running(total_running_plots=assumed_job.total_running, job_name=assumed_job.name, + instrumentation_settings=instrumentation_settings) assumed_job.running_work = assumed_job.running_work + [process.pid] work.temporary_drive = temporary_drive work.temporary2_drive = temporary2_drive diff --git a/requirements-notification.txt b/requirements-notification.txt new file mode 100644 index 0000000..e60240a --- /dev/null +++ b/requirements-notification.txt @@ -0,0 +1,6 @@ +discord_notify==1.0.0 +playsound==1.2.2 +prometheus-client==0.10.1 +python_pushover==0.4 +telegram_notifier==0.2 +requests==2.25.1 diff --git a/requirements.txt b/requirements.txt index 839726f..9c26cfd 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,6 +1,3 @@ dateparser==1.0.0 -discord_notify==1.0.0 -playsound==1.2.2 psutil==5.8.0 -python_pushover==0.4 PyYAML==5.4.1 diff --git a/stateless-manager.py b/stateless-manager.py index 1ec1cec..b79d5d0 100644 --- a/stateless-manager.py +++ b/stateless-manager.py @@ -6,11 +6,12 @@ from plotmanager.library.parse.configuration import get_config_info from plotmanager.library.utilities.jobs import has_active_jobs_and_work, load_jobs, monitor_jobs_to_start from plotmanager.library.utilities.log import check_log_progress -from plotmanager.library.utilities.processes import get_running_plots +from plotmanager.library.utilities.processes import get_running_plots, get_system_drives -chia_location, log_directory, config_jobs, manager_check_interval, max_concurrent, progress_settings, \ - notification_settings, debug_level, view_settings = get_config_info() +chia_location, log_directory, config_jobs, manager_check_interval, max_concurrent, max_for_phase_1, \ + minimum_minutes_between_jobs, progress_settings, notification_settings, debug_level, view_settings, \ + instrumentation_settings = get_config_info() logging.basicConfig(format='%(asctime)s [%(levelname)s]: %(message)s', datefmt='%Y-%m-%d %H:%M:%S', level=debug_level) @@ -20,9 +21,12 @@ logging.info(f'Jobs: {config_jobs}') logging.info(f'Manager Check Interval: {manager_check_interval}') logging.info(f'Max Concurrent: {max_concurrent}') +logging.info(f'Max for Phase 1: {max_for_phase_1}') +logging.info(f'Minimum Minutes between Jobs: {minimum_minutes_between_jobs}') logging.info(f'Progress Settings: {progress_settings}') logging.info(f'Notification Settings: {notification_settings}') logging.info(f'View Settings: {view_settings}') +logging.info(f'Instrumentation Settings: {instrumentation_settings}') logging.info(f'Loading jobs into objects.') jobs = load_jobs(config_jobs) @@ -31,26 +35,58 @@ next_job_work = {} running_work = {} +logging.info(f'Grabbing system drives.') +system_drives = get_system_drives() +logging.info(f"Found System Drives: {system_drives}") + logging.info(f'Grabbing running plots.') -jobs, running_work = get_running_plots(jobs, running_work) +jobs, running_work = get_running_plots(jobs=jobs, running_work=running_work, + instrumentation_settings=instrumentation_settings) for job in jobs: + next_job_work[job.name] = datetime.now() max_date = None for pid in job.running_work: work = running_work[pid] start = work.datetime_start if not max_date or start > max_date: max_date = start + initial_delay_date = datetime.now() + 
timedelta(minutes=job.initial_delay_minutes)
+    if job.initial_delay_minutes:
+        next_job_work[job.name] = initial_delay_date
     if not max_date:
         continue
-    next_job_work[job.name] = max_date + timedelta(minutes=job.stagger_minutes)
+    max_date = max_date + timedelta(minutes=job.stagger_minutes)
+    if job.initial_delay_minutes and initial_delay_date > max_date:
+        logging.info(f'{job.name} Found. Setting initial delay date to {next_job_work[job.name]} which is '
+                     f'{job.initial_delay_minutes} minutes.')
+        continue
+    next_job_work[job.name] = max_date
     logging.info(f'{job.name} Found. Setting next stagger date to {next_job_work[job.name]}')
 
+if minimum_minutes_between_jobs and len(running_work.keys()) > 0:
+    logging.info(f'Checking to see if stagger needs to be altered due to minimum_minutes_between_jobs. '
+                 f'Value: {minimum_minutes_between_jobs}')
+    maximum_start_date = max([work.datetime_start for work in running_work.values()])
+    minimum_stagger = maximum_start_date + timedelta(minutes=minimum_minutes_between_jobs)
+    logging.info(f'All dates: {[work.datetime_start for work in running_work.values()]}')
+    logging.info(f'Calculated Latest Job Start Date: {maximum_start_date}')
+    logging.info(f'Calculated Minimum Stagger: {minimum_stagger}')
+    for job_name in next_job_work:
+        if next_job_work[job_name] > minimum_stagger:
+            logging.info(f'Skipping stagger for {job_name}. Stagger is larger than minimum_minutes_between_jobs. '
+                         f'Minimum: {minimum_stagger}, Current: {next_job_work[job_name]}')
+            continue
+        next_job_work[job_name] = minimum_stagger
+        logging.info(f'Setting a new stagger for {job_name}. minimum_minutes_between_jobs is larger than assigned '
+                     f'stagger. Minimum: {minimum_stagger}, Current: {next_job_work[job_name]}')
+
 logging.info(f'Starting loop.')
 while has_active_jobs_and_work(jobs):
     # CHECK LOGS FOR DELETED WORK
     logging.info(f'Checking log progress..')
     check_log_progress(jobs=jobs, running_work=running_work, progress_settings=progress_settings,
-                       notification_settings=notification_settings, view_settings=view_settings)
+                       notification_settings=notification_settings, view_settings=view_settings,
+                       instrumentation_settings=instrumentation_settings)
     next_log_check = datetime.now() + timedelta(seconds=manager_check_interval)
 
     # DETERMINE IF JOB NEEDS TO START
@@ -59,11 +95,16 @@
         jobs=jobs,
         running_work=running_work,
         max_concurrent=max_concurrent,
+        max_for_phase_1=max_for_phase_1,
         next_job_work=next_job_work,
         chia_location=chia_location,
         log_directory=log_directory,
         next_log_check=next_log_check,
+        minimum_minutes_between_jobs=minimum_minutes_between_jobs,
+        system_drives=system_drives,
     )
 
     logging.info(f'Sleeping for {manager_check_interval} seconds.')
     time.sleep(manager_check_interval)
+
+logging.info(f'Manager has exited loop because there are no more active jobs.')
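
The new instrumentation module is only exercised through `set_plots_running` and `increment_plots_completed`, both of which receive the `instrumentation` section of `config.yaml` as a plain dict. Below is a minimal sketch of how those helpers are expected to be called, assuming `prometheus-client` is installed; the job name and settings values are illustrative assumptions, not part of the change set.

```python
# Sketch only: exercises the helpers added in plotmanager/library/utilities/instrumentation.py.
# 'example-job' and the settings values below are illustrative assumptions.
from plotmanager.library.utilities.instrumentation import increment_plots_completed, set_plots_running

instrumentation_settings = {
    'prometheus_enabled': True,   # mirrors the `instrumentation` section of config.yaml
    'prometheus_port': 9090,      # module default; any free port works
}

# The first call lazily registers chia_running_plots / chia_completed_plots
# and starts the HTTP endpoint for Prometheus to scrape.
set_plots_running(total_running_plots=2, job_name='example-job',
                  instrumentation_settings=instrumentation_settings)

# check_log_progress() makes this call once per plot log that reaches completion.
increment_plots_completed(increment=1, job_name='example-job',
                          instrumentation_settings=instrumentation_settings)
```

With `prometheus_enabled` left off or set to false, both helpers fall through to the "Prometheus instrumentation not enabled" debug log and nothing is exported.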
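
The destination free-space check in `monitor_jobs_to_start` and `check_valid_destinations` budgets space per queued plot with `determine_job_size`. A quick sketch of the values it returns is shown below; the figures are approximate, and the 2.058/2.06 factors are simply the empirical constants baked into the function above.

```python
# Sketch only: shows how determine_job_size scales the 109 GB (k=32) baseline.
from plotmanager.library.utilities.jobs import determine_job_size

print(determine_job_size(32))    # 109000000000 bytes, the k=32 baseline
print(determine_job_size(33))    # ~2.2454e11 bytes, one k-step up multiplies by ~2.06
print(determine_job_size(31))    # ~5.2964e10 bytes, one k-step down divides by ~2.058
print(determine_job_size('??'))  # 0 -> size unknown, so the free-space check never skips the drive
```

Because a non-numeric `size` collapses to zero and effectively disables the destination guard, it is worth validating that config value separately.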