WIP - Improved HA stability #9854

Closed
wants to merge 25 commits into from

Commits on Oct 31, 2022

  1. Detect distributed setup early in activate process.

    Split the loading of enabled server plugins from the starting of those plugins, so that the presence of a distributed server manager can be detected before network listeners are established and storage is opened.
    This allows guarding of constructs that require the distributed plugin to be present and running, which currently experience a race condition between the network listeners starting and the distributed plugin fully starting.
    timw authored and pkendall64 committed Oct 31, 2022
    1694e6f
  2. 392c94b
  3. 06cf474
  4. Add distinct openInternal operation that bypasses online checks.

    The current usages of openNoAuthenticate include cases (like DB delta/full syncs) that need to bypass not only auth checks but also the distributed online status check.
    timw authored and pkendall64 committed Oct 31, 2022
    c6f96fd
  5. 3b88eca
  6. Use openInternal path for bypass access.

    timw authored and pkendall64 committed Oct 31, 2022
    5808351
  7. 345c9b6
  8. Add tracing support to executors.

    Errors in unbound tasks in executors that are launched from common points (e.g. OrientDBEmbedded#execute) are hard to trace.
    This change allows a task ID to be associated with each execution, which will be reported on any exception, and if debug logging is enabled, a full stack trace identifying the launching call site will be attached.
    timw authored and pkendall64 committed Oct 31, 2022
    11427e7
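
The tracing idea in the commit above can be sketched roughly as follows. This is a minimal illustration and not the code added by the PR: `TracingSketch`, `CallSite`, and `trace` are hypothetical names, and attaching the call site as a suppressed exception is an assumed design. Each task gets an ID that is reported on failure, and the submitting stack is captured only when debug is enabled.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of task tracing; not the classes introduced by this PR. */
final class TracingSketch {
  private static final AtomicLong NEXT_ID = new AtomicLong();

  /** Marker exception whose stack trace records where the task was submitted. */
  static final class CallSite extends Exception {
    CallSite(String taskId) {
      super("Task " + taskId + " was submitted from:");
    }
  }

  /** Wraps a task so that a failure is rethrown with its task ID attached. */
  static Runnable trace(String name, boolean debug, Runnable task) {
    final String taskId = name + "#" + NEXT_ID.incrementAndGet();
    // Capturing a stack trace has a cost, so only do it when debugging.
    final CallSite callSite = debug ? new CallSite(taskId) : null;
    return () -> {
      try {
        task.run();
      } catch (RuntimeException e) {
        RuntimeException wrapped = new RuntimeException("Task " + taskId + " failed", e);
        if (callSite != null) {
          // The suppressed entry points at the launching call site.
          wrapped.addSuppressed(callSite);
        }
        throw wrapped;
      }
    };
  }
}
```

A caller would submit `TracingSketch.trace("sync", debugEnabled, task)` to an ordinary `ExecutorService` instead of the raw task.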
  9. fb9170d
  10. Add a global executor for use instead of general one-off threads.

    This allows improved logging and tracing consistency over general use of new Thread().
    timw authored and pkendall64 committed Oct 31, 2022
    6ff2afd
  11. Defer setting running to end of distributed plugin startup.

    This prevents storage tasks that require the distributed status to be online from accessing distributed lifecycle objects that have not yet been set up (which shows up as NPEs during execution).
    timw authored and pkendall64 committed Oct 31, 2022
    0ce5af1
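
As a rough illustration of the ordering described in the commit above, the sketch below sets a `running` flag only after all lifecycle state has been assigned. `PluginSketch` and its members are hypothetical names, not the plugin's actual fields. The flag is `volatile`, so a reader that observes `running == true` is also guaranteed to see the earlier write to `lifecycle`.

```java
/** Hypothetical sketch of deferring the "running" flag; not the actual plugin code. */
final class PluginSketch {
  private Object lifecycle;          // stands in for distributed lifecycle objects
  private volatile boolean running;  // published last

  void startup() {
    // 1. Build every object that tasks may touch once the plugin is "running".
    lifecycle = new Object();
    // 2. Only then flip the flag; the volatile write publishes the state above.
    running = true;
  }

  /** Returns the lifecycle object, or null while startup has not completed. */
  Object lifecycleIfRunning() {
    return running ? lifecycle : null;
  }
}
```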
  12. Defer installation of databases until distributed plugin is online.

    This avoids accesses to uninitialised distributed state during initial database setup from cluster.
    timw authored and pkendall64 committed Oct 31, 2022
    1c49c99
  13. Sanity check that distributed plugin is online before attempting distributed lock.

    timw authored and pkendall64 committed Oct 31, 2022
    cdcec5b
  14. Fix waiting for last task in ViewManager close.

    Attempt to cancel the task before it executes, and only wait for it if it cannot be cancelled.
    Also removes double logging of the execution exception, and avoids the problem where get cannot be called after cancel.
    timw authored and pkendall64 committed Oct 31, 2022
    3c53320
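
The cancel-then-wait pattern from the commit above might look roughly like this with a plain `java.util.concurrent.Future`. `CloseSketch` and `cancelOrWait` are hypothetical names, and this is not the ViewManager code. Once `cancel` has returned true, `get` throws `CancellationException`, so the sketch calls `get` only when cancellation was refused.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical sketch of closing down a last pending task; not the ViewManager code. */
final class CloseSketch {
  /**
   * Returns true if the task was cancelled, false if it had already completed
   * (or was absent) and was collected with get().
   */
  static boolean cancelOrWait(Future<?> lastTask) {
    if (lastTask == null) {
      return false;
    }
    // cancel(false) never interrupts a task that is already running.
    if (lastTask.cancel(false)) {
      // After a successful cancel, get() would throw CancellationException,
      // so do not call it.
      return true;
    }
    try {
      lastTask.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
    } catch (ExecutionException e) {
      // Assumed to be logged where the task ran; logging again would duplicate it.
    }
    return false;
  }
}
```

Note that for a standard `FutureTask`, `cancel(false)` also returns true for a task that is already running, which then continues in the background. Code that must wait for a running task to finish needs a separate completion signal.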
  15. Use tracing executor service in OrientDBEmbedded

    Provide tracing overrides to aid in tracking async errors.
    timw authored and pkendall64 committed Oct 31, 2022
    db45760
  16. Use internal open for ViewManager init

    Allows registering live updates to succeed when the distributed plugin is not online.
    timw authored and pkendall64 committed Oct 31, 2022
    ad6363f
  17. Make ViewManager updates resilient to offline DB status.

    View update uses distributed state, which can break if view update occurs during a distributed state change, breaking the update loop.
    timw authored and pkendall64 committed Oct 31, 2022
    ede8fbf
  18. Log transaction ID on re-enqueue

    timw authored and pkendall64 committed Oct 31, 2022
    4ec87b1
  19. Move DistributedDatabase registration out of constructor.

    ODistributedDatabaseImpl construction registered the instance, leaking the this reference, and shut down the previous instance if present.
    The previous instance may not have been constructed fully however, so shutdown could NPE, resulting in the construction of the current instance aborting with uninitialised state, which would then be picked up by other threads finding it registered in the message service.
    
    This change externalises the construction into an atomic operation in the message service, and makes the state in the distributed database impl final.
    
    The warning about needing registration because of use "further in the call chain" appears to be spurious.
    timw authored and pkendall64 committed Oct 31, 2022
    8c8e67c
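
One way to picture the change in the commit above is a registry that performs "shut down the previous instance, construct, publish" as a single atomic step, so the constructor never hands out `this`. The sketch below is an assumption-laden illustration: `RegistrySketch` and `Db` are hypothetical stand-ins for the message service and ODistributedDatabaseImpl, and using `ConcurrentHashMap.compute` is an assumed mechanism, not necessarily what the PR does.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of atomic registration; not the actual message service. */
final class RegistrySketch {
  /** Stand-in for the distributed database; its identity state is final. */
  static final class Db {
    final String name;
    private volatile boolean shutdown;

    Db(String name) {
      this.name = name; // no self-registration here, so "this" cannot leak
    }

    void shutdown() {
      shutdown = true;
    }

    boolean isShutdown() {
      return shutdown;
    }
  }

  private final ConcurrentHashMap<String, Db> databases = new ConcurrentHashMap<>();

  /** Replaces any previous instance with a new, fully constructed one. */
  Db register(String name) {
    // ConcurrentHashMap.compute runs the function atomically for the key, so
    // other threads see either the old instance or the completed new one.
    return databases.compute(name, (key, previous) -> {
      if (previous != null) {
        previous.shutdown();
      }
      return new Db(key);
    });
  }

  Db get(String name) {
    return databases.get(name);
  }
}
```

Because the map only ever holds instances whose constructors have returned, a previous instance found during `register` is always safe to shut down.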
  20. d533fc8
  21. 99a0bfb
  22. 3d6d5e5
  23. Eliminate duplicate scheduleTask code.

    timw authored and pkendall64 committed Oct 31, 2022
    d042c99
  24. Guard database create to avoid partially initialised storage being created.
    
    If the plugin isn't online, initialisation of a newly created database will fail, resulting in a partially initialised database that will break when used (usually because the schema hasn't been loaded).
    timw authored and pkendall64 committed Oct 31, 2022
    5899101
  25. DEV: Expand HazelcastPlugin startup time to widen window to expose race conditions on startup.

    timw authored and pkendall64 committed Oct 31, 2022
    61e3af5