|
| 1 | +# Troubleshooting |
| 2 | + |
| 3 | +## Where is ...? |
| 4 | + |
| 5 | +### Instrument-specific plans & devices |
| 6 | + |
| 7 | +This is located in each instrument's `inst` script area, with their other instrument-specific scripts. |
| 8 | + |
| 9 | +`c:\Instrument\Settings\config\NDX<inst>\configurations\Python\inst\bluesky`. |
| 10 | + |
| 11 | +Instrument-specific bluesky plans will typically be defined in the `plans` module in the above area, and devices will |
| 12 | +be defined in a `devices` module. |
| 13 | + |
| 14 | +### This library (`ibex_bluesky_core`) |
| 15 | + |
| 16 | +In a clean installation of IBEX, `ibex_bluesky_core` is installed as a python module via pip into the |
| 17 | +[uktena](https://github.com/IsisComputingGroup/uktena) environment which is in `c:\instrument\apps\python3`. |
| 18 | + |
| 19 | +However, for instruments which have been testing bluesky and may have versions of `ibex_bluesky_core` with local |
| 20 | +modifications, a version of `ibex_bluesky_core` may be checked out into `c:\instrument\dev\ibex_bluesky_core` on the |
| 21 | +instrument, and this will have been editable-installed into the `uktena` environment. |
| 22 | + |
| 23 | +### Other bluesky libraries (`ophyd_async`, `bluesky`, ...) |
| 24 | + |
| 25 | +`ophyd_async` and `bluesky` are installed as python modules. via pip, into the |
| 26 | +[uktena](https://github.com/IsisComputingGroup/uktena) environment which is in `c:\instrument\apps\python3` because |
| 27 | +they are specified in `pyproject.toml` of `ibex_bluesky_core`, which itself is a dependency of `uktena`. |
| 28 | + |
| 29 | +### Logs |
| 30 | + |
| 31 | +Log files are stored in `C:\Instrument\Var\logs\bluesky`. |
| 32 | + |
| 33 | +```{note} |
| 34 | +Older log files will be moved by the log rotation script to |
| 35 | +`<isis share>\inst$\Backups$\stage-deleted\ndx<instrument>\Instrument\Var\logs\bluesky` |
| 36 | +``` |
| 37 | + |
| 38 | +The default log level is `INFO` and all messages from `ibex_bluesky_core`, `bluesky` and `ophyd_async` are captured. |
| 39 | + |
| 40 | +If you need to increase this to `DEBUG` to isolate an issue, you can do so using |
| 41 | +{py:obj}`ibex_bluesky_core.log.set_bluesky_log_levels`. See [logging documentation](logging.md) for a full example |
| 42 | +showing how to do this. |
| 43 | + |
| 44 | +### Scientist-facing data |
| 45 | + |
| 46 | +Scientist-facing output files are written to `<isis share>\inst$\NDX<inst>\user\test\scans\<current rb number>` by |
| 47 | +default. |
| 48 | + |
| 49 | +Custom file-output paths can be specified by passing extra arguments to |
| 50 | +{py:obj}`HumanReadableFileCallback<ibex_bluesky_core.callbacks.file_logger.HumanReadableFileCallback>` |
| 51 | +for the "human-readable" files, or |
| 52 | +{py:obj}`LiveFitLogger<ibex_bluesky_core.callbacks.fitting.livefit_logger.LiveFitLogger>` |
| 53 | +for the fit output files. These callbacks may be hidden behind |
| 54 | +{py:obj}`ISISCallbacks<ibex_bluesky_core.callbacks.ISISCallbacks>` which also allows specifying output paths. |
| 55 | + |
| 56 | +See [dae_scan manual system test](https://github.com/ISISComputingGroup/ibex_bluesky_core/blob/0.2.1/manual_system_tests/dae_scan.py#L83) |
| 57 | +for an example of how to configure these output paths using {py:obj}`ISISCallbacks<ibex_bluesky_core.callbacks.ISISCallbacks>`. |
| 58 | + |
| 59 | +### Raw diagnostic data |
| 60 | + |
| 61 | +Raw documents emitted by bluesky are stored in `C:\Instrument\Var\logs\bluesky\raw_documents` - these show the raw |
| 62 | +data emitted by bluesky scans. The filenames in this directory correspond to bluesky's scan ID, which is printed to |
| 63 | +the console at the end of each scan, and is also included as metadata in the scientist-facing output files. |
| 64 | + |
| 65 | +These files are written by {py:obj}`ibex_bluesky_core.callbacks.document_logger.DocLoggingCallback`, which is subscribed |
| 66 | +to the run engine by default in {py:obj}`ibex_bluesky_core.run_engine.get_run_engine`. |
| 67 | + |
| 68 | +```{note} |
| 69 | +Older raw documents will be moved by the log rotation script to |
| 70 | +`<isis share>\inst$\Backups$\stage-deleted\ndx<instrument>\Instrument\Var\logs\bluesky\raw_documents` |
| 71 | +``` |
| 72 | + |
| 73 | +--- |
| 74 | + |
| 75 | +## How do I ...? |
| 76 | + |
| 77 | +### Run a plan |
| 78 | + |
| 79 | +To run a plan, you need to pass it to the `RE` object, which is made for you by default, as an initialisation command, |
| 80 | +in IBEX GUI versions later than 2025.2. |
| 81 | + |
| 82 | +:::{tip} |
| 83 | +For versions of the GUI earlier than 2025.2, where the run engine is not created automatically, an `RE` object can be |
| 84 | +made manually using |
| 85 | +```python |
| 86 | +from ibex_bluesky_core.run_engine import get_run_engine |
| 87 | +RE = get_run_engine() |
| 88 | +``` |
| 89 | +::: |
| 90 | + |
| 91 | +For example, to run a plan, use: |
| 92 | + |
| 93 | +``` |
| 94 | +>>> RE(some_clever_plan(which, takes, some, arguments)) |
| 95 | +``` |
| 96 | + |
| 97 | +Note that without the `RE(...)` call, the plan **does nothing**: |
| 98 | + |
| 99 | +``` |
| 100 | +>>> some_clever_plan(which, takes, some, arguments) |
| 101 | +<generator object scan at 0x000002286311D080> |
| 102 | +``` |
| 103 | + |
| 104 | +### Pause/stop/resume/abort a plan |
| 105 | + |
| 106 | +While a plan is running, a single ctrl-c will stop gracefully after the next point has finished counting. A message |
| 107 | +like the following will be printed: |
| 108 | + |
| 109 | +> A 'deferred pause' has been requested. The RunEngine will pause at the next checkpoint. |
| 110 | +> To pause immediately, hit Ctrl+C again in the next 10 seconds. |
| 111 | +> Deferred pause acknowledged. Continuing to checkpoint. |
| 112 | +
|
| 113 | +Two ctrl-c keystrokes within 10 seconds of each other will stop the plan immediately. |
| 114 | + |
| 115 | +When a plan is paused, you will get a short exception message with a summary of your options: |
| 116 | + |
| 117 | +``` |
| 118 | +Pausing... |
| 119 | +Traceback (most recent call last): |
| 120 | + File "C:\Instrument\Apps\Python3\Lib\site-packages\IPython\core\interactiveshell.py", line 3579, in run_code |
| 121 | + exec(code_obj, self.user_global_ns, self.user_ns) |
| 122 | + File "<ipython-input-27-288a69b9f38e>", line 1, in <module> |
| 123 | + RE(bps.sleep(999)) |
| 124 | + File "C:\Instrument\Apps\Python3\Lib\site-packages\bluesky\run_engine.py", line 976, in __call__ |
| 125 | + raise RunEngineInterrupted(self.pause_msg) from None |
| 126 | +bluesky.utils.RunEngineInterrupted: |
| 127 | +Your RunEngine is entering a paused state. These are your options for changing |
| 128 | +the state of the RunEngine: |
| 129 | +
|
| 130 | +RE.resume() Resume the plan. |
| 131 | +RE.abort() Perform cleanup, then kill plan. Mark exit_stats='aborted'. |
| 132 | +RE.stop() Perform cleanup, then kill plan. Mark exit_status='success'. |
| 133 | +RE.halt() Emergency Stop: Do not perform cleanup --- just stop. |
| 134 | +``` |
| 135 | + |
| 136 | +At this point, you will be returned to an interactive shell. You must choose one of these options above before |
| 137 | +attempting further operations with the `RunEngine`. |
| 138 | + |
| 139 | +- `RE.resume()` - resumes the scan from the point at which it was interrupted. The plan will complete as usual. |
| 140 | +- `RE.stop()` and `RE.abort()` are functionally identical - the plan will gracefully terminate, including running any |
| 141 | +cleanup actions and calling registered callbacks. |
| 142 | +- `RE.halt()` tells the RunEngine to drop dead and not do **anything** on the way out. This includes cleanup handlers, |
| 143 | +which for example may return motors to sensible positions or return DAE configuration to a state in which data can be |
| 144 | +taken. |
| 145 | + |
| 146 | +If you are unsure exactly what state your `RunEngine` is in, `RE.state` can be used to check this. If you attempt to |
| 147 | +run another plan before choosing one of the options above, you will get an exception like: |
| 148 | + |
| 149 | +``` |
| 150 | +Traceback (most recent call last): |
| 151 | + File "C:\Instrument\Apps\Python3\Lib\site-packages\IPython\core\interactiveshell.py", line 3579, in run_code |
| 152 | + exec(code_obj, self.user_global_ns, self.user_ns) |
| 153 | + File "<ipython-input-29-288a69b9f38e>", line 1, in <module> |
| 154 | + RE(bps.sleep(999)) |
| 155 | + File "C:\Instrument\Apps\Python3\Lib\site-packages\bluesky\run_engine.py", line 920, in __call__ |
| 156 | + raise RuntimeError(f"The RunEngine is in a {self._state} state") |
| 157 | +RuntimeError: The RunEngine is in a paused state |
| 158 | +``` |
| 159 | + |
| 160 | +This means that you still need to choose how to continue from the previous interruption. |
| 161 | + |
| 162 | +### Get the result of a plan |
| 163 | + |
| 164 | +A plan will return a `RunEngineResult` object, which has a `plan_result` attribute: |
| 165 | + |
| 166 | +``` |
| 167 | +result = RE(some_plan()) |
| 168 | +result.plan_result |
| 169 | +``` |
| 170 | + |
| 171 | +If a plan was interrupted and resumed later, the result is returned by the `RE.resume()` call: |
| 172 | + |
| 173 | +``` |
| 174 | +RE(some_plan()) |
| 175 | +<KeyboardInterrupt> |
| 176 | +result = RE.resume() |
| 177 | +result.plan_result |
| 178 | +``` |
| 179 | + |
| 180 | +```{tip} |
| 181 | +If the result wasn't captured, it's still possible to access it in an IPython interactive shell using the special |
| 182 | +underscore variables, for example `result = _`. |
| 183 | +
|
| 184 | +This is not specific to bluesky - it's an IPython feature where `_` is bound to the return value of the last expression, |
| 185 | +`__` is bound to the result of the second-to-last expression, and so on. |
| 186 | +``` |
| 187 | + |
| 188 | +[Do not try to "hide" the `RunEngine` in a script/function](https://blueskyproject.io/bluesky/main/tutorial.html#plans-in-series). |
| 189 | +The `RE(...)` call should always be typed by the user, at the terminal. |
| 190 | + |
| 191 | +### Connect a device |
| 192 | + |
| 193 | +Bluesky plans need devices to be connected up-front. This is so that errors (such as PVs not existing) are detected |
| 194 | +as soon as possible, not mid-way through a scan. |
| 195 | + |
| 196 | +Top-level plan wrappers should use the `ensure_connected` plan stub from `ophyd_async`: |
| 197 | + |
| 198 | +```python |
| 199 | +from ophyd_async.plan_stubs import ensure_connected |
| 200 | + |
| 201 | + |
| 202 | +def top_level_plan(dae, block): |
| 203 | + yield from ensure_connected(dae, block) |
| 204 | + yield from scan(...) |
| 205 | +``` |
| 206 | + |
| 207 | +If you forget to do this, you will get a stack trace containing: |
| 208 | + |
| 209 | +``` |
| 210 | +NotImplementedError: No PV has been set as connect() has not been called |
| 211 | +``` |
| 212 | + |
| 213 | +### Debug `NotConnected` errors |
| 214 | + |
| 215 | +If `ophyd_async` cannot connect to a PV, you will get an error that looks like: |
| 216 | + |
| 217 | +``` |
| 218 | +sample_changer2: NotConnected: |
| 219 | + readback: NotConnected: ca://TE:NDW2922:CS:SB:sample_changer2 |
| 220 | +``` |
| 221 | + |
| 222 | +In the first instance, check that the relevant PV exists if you get it via `caget`. |
| 223 | + |
| 224 | +If the device is a {py:obj}`block_rw<ibex_bluesky_core.devices.block.BlockRw>`, but the `:SP` record does not exist, |
| 225 | +you may use a `block_w` instead. This will both read and write to the same PV. Similarly, if `blockname:SP:RBV` does |
| 226 | +not exist, a {py:obj}`block_rw_rbv<ibex_bluesky_core.devices.block.BlockRwRbv>` cannot be used. |
| 227 | + |
| 228 | +If the device is a {py:obj}`ibex_bluesky_core.devices.reflectometry.ReflParameter` and `redefine` fails to connect, |
| 229 | +pass `has_redefine=False` when constructing that parameter (this means you won't be able to redefine the position |
| 230 | +of this refl parameter). |
| 231 | + |
| 232 | +If instead you get a `TypeError` that looks like: |
| 233 | + |
| 234 | +``` |
| 235 | +sample_changer: NotConnected: |
| 236 | + readback: TypeError: TE:NDW2922:CS:SB:sample_changer with inferred datatype float cannot be coerced to str |
| 237 | +``` |
| 238 | + |
| 239 | +This is because the **datatype of the underlying PV** does not match the **declared type in bluesky**. `ophyd_async` |
| 240 | +will not allow you to connect a `block_r(str, "some_block")` if `"some_block"` is a float-type PV. Every signal in |
| 241 | +`ophyd_async` is strongly typed. |
| 242 | + |
| 243 | +### Change how `set` on a device behaves |
| 244 | + |
| 245 | +For a {py:obj}`writable block<ibex_bluesky_core.devices.block.BlockRw>`, a `write_config` argument can be specified. |
| 246 | +See the options on {py:obj}`BlockWriteConfig<ibex_bluesky_core.devices.block.BlockWriteConfig>` for detailed options. These are specified |
| 247 | +when first creating the `block` object, which is likely to be in |
| 248 | +`\Instrument\Settings\config\NDX<inst>\configurations\Python\inst\bluesky\devices.py`, or dynamically created as part |
| 249 | +of a wrapper plan. See [block documentation](../devices/blocks.md) for detailed documentation about how to set up block |
| 250 | +devices. |
| 251 | + |
| 252 | +For complex cases, where a `set` being complete depends on multiple conditions, a custom `ophyd_async` device is usually |
| 253 | +the cleanest way to accomplish this. These can be defined in an instrument's `inst` scripts area if they are specific |
| 254 | +to one instrument, or can be promoted to `ibex_bluesky_core` if generally useful. |
| 255 | + |
| 256 | +### Add extra sleeps into a plan |
| 257 | + |
| 258 | +Generally you should not need to insert sleeps at the *plan* level - prefer instead to modify *devices* so that they |
| 259 | +accurately report when they have finished setting (including any necessary settle times, for example). Writable blocks |
| 260 | +have a {py:obj}`BlockWriteConfig<ibex_bluesky_core.devices.block.BlockWriteConfig>` which has a `settle_time_s` |
| 261 | +parameter for this purpose. |
| 262 | + |
| 263 | +If you *really* need to sleep in a plan, rather than at the device level, use `yield from bps.sleep(duration_seconds)`. |
| 264 | +Do **not** just call `time.sleep()` in a plan - this will stall the event loop and interfere with keyboard interrupts, |
| 265 | +for example. |
| 266 | + |
| 267 | +### Add exception-handling to a plan |
| 268 | + |
| 269 | +In plans, because they are generators, you cannot just use `try-finally` or `try-except`. |
| 270 | + |
| 271 | +Instead, bluesky provides wrappers that implement error handling: |
| 272 | +[`finalize_wrapper`](https://blueskyproject.io/bluesky/main/generated/bluesky.preprocessors.contingency_wrapper.html#bluesky.preprocessors.contingency_wrapper) |
| 273 | +and |
| 274 | +[`contingency_wrapper`](https://blueskyproject.io/bluesky/main/generated/bluesky.preprocessors.finalize_wrapper.html#bluesky.preprocessors.finalize_wrapper). |
| 275 | + |
| 276 | +### Debug DAE failing to begin |
| 277 | + |
| 278 | +Check the DAE can begin a run without using bluesky. |
| 279 | + |
| 280 | +If you are counting using periods, and using the spectrum-data map, this imposes a limit of {math}`5000000` on number of |
| 281 | +`periods * (spectra+1) * (time channels+1)`. This means that scans with high numbers of points may fail, where those |
| 282 | +with fewer points will work. |
| 283 | + |
| 284 | +### Debug a failed fit |
| 285 | + |
| 286 | +Fitting will only ever start after at least as many points have been collected, as there are free parameters in the |
| 287 | +fitting model. For example, a guassian fit will not be attempted with less than 4 data points. This can occur if a |
| 288 | +scan is interrupted early, for example. |
| 289 | + |
| 290 | +Fits can also fail if the data is particularly poor. |
| 291 | + |
| 292 | +### Debug a bad fit |
| 293 | + |
| 294 | +Consider whether you can use a different fitting model. Some models are more prone to getting stuck in local minima than |
| 295 | +others. |
| 296 | + |
| 297 | +For peak-like data, `"com"` or `"max"` from bluesky's |
| 298 | +[`PeakStats callback`](https://blueskyproject.io/bluesky/main/callbacks.html#peakstats) can robustly give *an* answer, |
| 299 | +although often this will not be as good as a full fit and may not always be appropriate. |
| 300 | + |
| 301 | +### Debug keyboard-interrupt problems |
| 302 | + |
| 303 | +If a double ctrl-c does nothing, this is probably because the event loop is stalled - which is likely the result of a |
| 304 | +plan or device doing I/O or other synchronous operations directly, rather than via `asyncio` or `yield from`. |
| 305 | + |
| 306 | +Find the offending function and replace it with it's bluesky-native equivalent. Python's `asyncio` module has a debug |
| 307 | +mode which warns on long-running tasks - this can help identify the offending call. |
| 308 | + |
| 309 | +--- |
| 310 | + |
| 311 | +## What is this syntax ...? |
| 312 | + |
| 313 | +### `async def` / `await` / `asyncio` |
| 314 | + |
| 315 | +Functions declared as `async def` are [python **coroutines**](https://docs.python.org/3/library/asyncio-task.html#). |
| 316 | + |
| 317 | +In a bluesky context, if you see an `async def` function, it is likely that you are looking at either |
| 318 | +an `ophyd_async` device class, or a `RunEngine` message handler. |
| 319 | + |
| 320 | +In order to get the result of a coroutine, it needs to be `await`ed: |
| 321 | + |
| 322 | +```python |
| 323 | +async def foo(): |
| 324 | + result = await some_coroutine() |
| 325 | +``` |
| 326 | + |
| 327 | +In an `async def` function, do not perform blocking operations. For example, always use `await asyncio.sleep()`, |
| 328 | +and not `time.sleep()`. Similarly, do not use synchronous functions which perform I/O from a coroutine (e.g. calling |
| 329 | +`genie` APIs) - use the `async` APIs provided by `ophyd_async` instead. |
| 330 | + |
| 331 | +As a general rule of thumb, any function that could reasonably take more than a few tens of milliseconds, should not be |
| 332 | +called directly from a coroutine. |
| 333 | + |
| 334 | +### `yield` / `yield from` / `Generator` |
| 335 | + |
| 336 | +Functions containing `yield` and/or `yield from` are [python **generators**](https://wiki.python.org/moin/Generators). |
| 337 | + |
| 338 | +If you see `yield from` in a function, you are likely to be looking at a |
| 339 | +[bluesky plan](https://blueskyproject.io/bluesky/main/plans.html#). |
| 340 | + |
| 341 | +`yield from` is the syntax used to delegate to subplans: |
| 342 | + |
| 343 | +```python |
| 344 | +def some_plan(): |
| 345 | + yield from move(...) |
| 346 | + yield from scan(...) |
| 347 | +``` |
| 348 | + |
| 349 | +Will perform an initial move followed by a scan. |
| 350 | + |
| 351 | +Return values from plans can be captured: |
| 352 | + |
| 353 | +```python |
| 354 | +def plan_that_returns(): |
| 355 | + yield from something() |
| 356 | + return "hello, world" |
| 357 | + |
| 358 | +def plan(): |
| 359 | + returned_value = yield from plan_that_returns() |
| 360 | +``` |
| 361 | + |
| 362 | +Like `coroutines` above, long-running functions or functions which perform I/O should not be used in plans. Doing so |
| 363 | +can stall the `RunEngine`'s internal event loop, break keyboard-interrupts, and break rewindability. Use bluesky-native |
| 364 | +functionality instead - for example: |
| 365 | +- [`yield from bps.sleep()`](https://blueskyproject.io/bluesky/main/generated/bluesky.plan_stubs.sleep.html#bluesky.plan_stubs.sleep) instead of `time.sleep()` |
| 366 | +- [`yield from bps.mv(block_object, value)`](https://blueskyproject.io/bluesky/main/generated/bluesky.plan_stubs.mv.html#bluesky.plan_stubs.mv) instead of `g.cset(block_name, value)` |
| 367 | +- [`value = yield from bps.rd(block_object)`](https://blueskyproject.io/bluesky/main/generated/bluesky.plan_stubs.rd.html#bluesky.plan_stubs.rd) instead of `value = g.cget(block_name)["value"]` |
| 368 | + |
| 369 | +If you must call an external function which may be long running or do I/O from a plan, review |
| 370 | +[call_sync](../plan_stubs/external_code). |
0 commit comments