Skip to content

Commit

Permalink
Improvements that significantly reduce the chances of request timeouts (
Browse files Browse the repository at this point in the history
#7042)

Request Timeouts started plaguing IDE due to numerous `executionContext/***Visualization` requests. While caused by a bug they revealed a bigger problem in the Language Server when serving large amounts of requests:
1) Long and short lived jobs are fighting for various locks. Lock contention leads to some jobs waiting for a longer than desired leading to unexpected request timeouts. Increasing timeout value is just delaying the problem.
2) Requests coming from IDE are served almost instantly and handled by various commands. Commands can issue further jobs that serve request. We apparently have and always had a single-thread thread pool for serving such jobs, leading to immediate thread starvation.

Both reasons increase the chances of Request Timeouts when dealing with a large number of requests. For 2) I noticed that while we used to set the `enso-runtime-server.jobParallelism` option descriptor key to some machine-dependent value (most likely > 1), the value set would **only** be available for instrumentation. `JobExecutionEngine` where it is actually used would always get the default, i.e. a single-threaded ThreadPool. This means that this option descriptor was simply misused since its introduction. Moved that option to runtime options so that it can be set and retrieved during normal operation.

Adding parallelism intensified problem 1), because now we could execute multiple jobs and they would compete for resources. It also revealed a scenario for a yet another deadlock scenario, due to invalid order of lock acquisition. See `ExecuteJob` vs `UpsertVisualisationJob` order for details.

Still, a number of requests would continue to randomly timeout due to lock contention. It became apparent that
`Attach/Modify/Detach-VisualisationCmd` should not wait until a triggered `UpsertVisualisationJob` sends a response to the client; long and short lived jobs will always compete for resources and we cannot guarantee that they will not timeout that way. That is why the response is sent immediately from the command handler and not from the job executed after it.

This brings another problematic scenario:
1. `AttachVisualisationCmd` is executed, response sent to the client, `UpsertVisualisationJob` scheduled.
2. In the meantime `ModifyVisualisationCmd` comes and fails; command cannot find the visualization that will only be added by `UpsertVisualisationJob`, which might have not yet been scheduled to run.

Remedied that by checking visualisation-related jobs that are still in progress. It also allowed for cancelling jobs which results wouldn't be used anyway (`ModifyVisualisationCmd` sends its own `UpsertVisualisationJob`). This is not a theoretical scenario, it happened frequently on IDE startup.

This change does not fully solve the rather problematic setup of numerous locks, which are requested by short and long lived jobs. A better design should still be investigated. But it significantly reduces the chances of Request Timeouts which IDE had to deal with.

With this change I haven't been able to experience Request Timeouts for relatively modest projects anymore.

I added the possibility of logging wait times for locks to better investigate further problems.

Closes #7005
  • Loading branch information
hubertp authored and Frizi committed Jun 19, 2023
1 parent 83ad16c commit 3c05f61
Show file tree
Hide file tree
Showing 27 changed files with 336 additions and 116 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -826,6 +826,7 @@
- [Improve and colorize compiler's diagnostic messages][6931]
- [Execute some runtime commands synchronously to avoid race conditions][6998]
- [Scala 2.13.11 update][7010]
- [Improve parallel execution of commands and jobs in Language Server][7042]

[3227]: https://github.com/enso-org/enso/pull/3227
[3248]: https://github.com/enso-org/enso/pull/3248
Expand Down Expand Up @@ -944,6 +945,7 @@
[6931]: https://github.com/enso-org/enso/pull/6931
[6998]: https://github.com/enso-org/enso/pull/6998
[7010]: https://github.com/enso-org/enso/pull/7010
[7042]: https://github.com/enso-org/enso/pull/7042

# Enso 2.0.0-alpha.18 (2021-10-12)

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -291,7 +291,7 @@ class MainModule(serverConfig: LanguageServerConfig, logLevel: LogLevel) {
.option(RuntimeOptions.LOG_MASKING, Masking.isMaskingEnabled.toString)
.option(RuntimeOptions.EDITION_OVERRIDE, Info.currentEdition)
.option(
RuntimeServerInfo.JOB_PARALLELISM_OPTION,
RuntimeOptions.JOB_PARALLELISM,
Runtime.getRuntime.availableProcessors().toString
)
.option(RuntimeOptions.PREINITIALIZE, "js")
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,11 @@ public class RuntimeOptions {
INTERPRETER_SEQUENTIAL_COMMAND_EXECUTION)
.build();

public static final String JOB_PARALLELISM = interpreterOptionName("jobParallelism");
public static final OptionKey<Integer> JOB_PARALLELISM_KEY = new OptionKey<>(1);
public static final OptionDescriptor JOB_PARALLELISM_DESCRIPTOR =
OptionDescriptor.newBuilder(JOB_PARALLELISM_KEY, JOB_PARALLELISM).build();

public static final String ENABLE_PROJECT_SUGGESTIONS = optionName("enableProjectSuggestions");
public static final OptionKey<Boolean> ENABLE_PROJECT_SUGGESTIONS_KEY = new OptionKey<>(true);
private static final OptionDescriptor ENABLE_PROJECT_SUGGESTIONS_DESCRIPTOR =
Expand Down Expand Up @@ -127,6 +132,7 @@ public class RuntimeOptions {
LANGUAGE_HOME_OVERRIDE_DESCRIPTOR,
EDITION_OVERRIDE_DESCRIPTOR,
INTERPRETER_SEQUENTIAL_COMMAND_EXECUTION_DESCRIPTOR,
JOB_PARALLELISM_DESCRIPTOR,
DISABLE_IR_CACHES_DESCRIPTOR,
PREINITIALIZE_DESCRIPTOR,
WAIT_FOR_PENDING_SERIALIZATION_JOBS_DESCRIPTOR,
Expand Down
Original file line number Diff line number Diff line change
@@ -1,16 +1,11 @@
package org.enso.polyglot;

import org.graalvm.options.OptionDescriptor;
import org.graalvm.options.OptionKey;

/** Container for Runtime Server related constants. */
public class RuntimeServerInfo {
public static final String URI = "enso://runtime-server";
public static final String INSTRUMENT_NAME = "enso-runtime-server";
public static final String ENABLE_OPTION = INSTRUMENT_NAME + ".enable";

public static final String JOB_PARALLELISM_OPTION = INSTRUMENT_NAME + ".jobParallelism";
public static final OptionKey<Integer> JOB_PARALLELISM_KEY = new OptionKey<>(1);
public static final OptionDescriptor JOB_PARALLELISM_DESCRIPTOR =
OptionDescriptor.newBuilder(JOB_PARALLELISM_KEY, JOB_PARALLELISM_OPTION).build();
public static final OptionKey<String> ENABLE_OPTION_KEY = new OptionKey<>("");
}
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
package org.enso.interpreter.instrument.command;

import java.util.UUID;
import java.util.logging.Level;
import org.enso.interpreter.instrument.CacheInvalidation;
import org.enso.interpreter.instrument.InstrumentFrame;
import org.enso.interpreter.instrument.execution.RuntimeContext;
Expand Down Expand Up @@ -40,8 +41,10 @@ public Future<BoxedUnit> execute(RuntimeContext ctx, ExecutionContext ec) {

private void setExecutionEnvironment(
Runtime$Api$ExecutionEnvironment executionEnvironment, UUID contextId, RuntimeContext ctx) {
ctx.locking().acquireContextLock(contextId);
ctx.locking().acquireWriteCompilationLock();
var logger = ctx.executionService().getLogger();
var contextLockTimestamp = ctx.locking().acquireContextLock(contextId);
var writeLockTimestamp = ctx.locking().acquireWriteCompilationLock();

try {
Stack<InstrumentFrame> stack = ctx.contextManager().getStack(contextId);
ctx.jobControlPlane().abortJobs(contextId);
Expand All @@ -53,7 +56,17 @@ private void setExecutionEnvironment(
reply(new Runtime$Api$SetExecutionEnvironmentResponse(contextId), ctx);
} finally {
ctx.locking().releaseWriteCompilationLock();
logger.log(
Level.FINEST,
"Kept write compilation lock [SetExecutionEnvironmentCommand] for "
+ (System.currentTimeMillis() - writeLockTimestamp)
+ " milliseconds");
ctx.locking().releaseContextLock(contextId);
logger.log(
Level.FINEST,
"Kept context lock [SetExecutionEnvironmentCommand] for "
+ (System.currentTimeMillis() - contextLockTimestamp)
+ " milliseconds");
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ public Void run(RuntimeContext ctx) {
.getEnvironment()
.getOptions()
.get(RuntimeOptions.USE_GLOBAL_IR_CACHE_LOCATION_KEY);
ctx.locking().acquireWriteCompilationLock();
var writeLockTimestamp = ctx.locking().acquireWriteCompilationLock();
try {
ctx.executionService()
.getContext()
Expand All @@ -50,6 +50,13 @@ public Void run(RuntimeContext ctx) {
});
} finally {
ctx.locking().releaseWriteCompilationLock();
ctx.executionService()
.getLogger()
.log(
Level.FINEST,
"Kept write compilation lock [SetExecutionEnvironmentCommand] for "
+ (System.currentTimeMillis() - writeLockTimestamp)
+ " milliseconds");
}
return null;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,8 @@ package org.enso.interpreter.instrument.command
import org.enso.interpreter.instrument.execution.RuntimeContext
import org.enso.interpreter.instrument.job.{ExecuteJob, UpsertVisualisationJob}
import org.enso.polyglot.runtime.Runtime.Api
import org.enso.polyglot.runtime.Runtime.Api.RequestId

import java.util.logging.Level
import scala.concurrent.{ExecutionContext, Future}

/** A command that attaches a visualisation to an expression.
Expand All @@ -13,7 +13,7 @@ import scala.concurrent.{ExecutionContext, Future}
* @param request a request for a service
*/
class AttachVisualisationCmd(
maybeRequestId: Option[RequestId],
maybeRequestId: Option[Api.RequestId],
request: Api.AttachVisualisation
) extends AsynchronousCommand(maybeRequestId) {

Expand All @@ -22,8 +22,9 @@ class AttachVisualisationCmd(
ctx: RuntimeContext,
ec: ExecutionContext
): Future[Unit] = {
val contextId = request.visualisationConfig.executionContextId
ctx.locking.acquireContextLock(contextId)
val logger = ctx.executionService.getLogger
val contextId = request.visualisationConfig.executionContextId
val lockTimestamp = ctx.locking.acquireContextLock(contextId)
try {
if (doesContextExist) {
attachVisualisation()
Expand All @@ -32,6 +33,10 @@ class AttachVisualisationCmd(
}
} finally {
ctx.locking.releaseContextLock(contextId)
logger.log(
Level.FINEST,
s"Kept context lock [AttachVisualisationCmd] for ${System.currentTimeMillis() - lockTimestamp} milliseconds"
)
}
}

Expand All @@ -45,11 +50,13 @@ class AttachVisualisationCmd(
ctx: RuntimeContext,
ec: ExecutionContext
): Future[Unit] = {
ctx.endpoint.sendToClient(
Api.Response(maybeRequestId, Api.VisualisationAttached())
)
val maybeFutureExecutable =
ctx.jobProcessor.run(
new UpsertVisualisationJob(
maybeRequestId,
Api.VisualisationAttached(),
request.visualisationId,
request.expressionId,
request.visualisationConfig
Expand All @@ -73,4 +80,8 @@ class AttachVisualisationCmd(
}
}

override def toString: String = {
"AttachVisualizationCmd(visualizationId: " + request.visualisationId + ",expressionId=" + request.expressionId + ")"
}

}
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,8 @@ package org.enso.interpreter.instrument.command
import org.enso.interpreter.instrument.execution.RuntimeContext
import org.enso.polyglot.runtime.Runtime.Api

import java.util.logging.Level

/** A command that closes a file.
*
* @param request a request for a service
Expand All @@ -11,16 +13,29 @@ class CloseFileCmd(request: Api.CloseFileNotification)
extends SynchronousCommand(None) {

override def executeSynchronously(implicit ctx: RuntimeContext): Unit = {
ctx.locking.acquireReadCompilationLock()
ctx.locking.acquireFileLock(request.path)
ctx.locking.acquirePendingEditsLock()
val logger = ctx.executionService.getLogger
val readLockTimestamp = ctx.locking.acquireReadCompilationLock()
val fileLockTimestamp = ctx.locking.acquireFileLock(request.path)
val pendingEditsLockTimestamp = ctx.locking.acquirePendingEditsLock()
try {
ctx.state.pendingEdits.dequeue(request.path)
ctx.executionService.resetModuleSources(request.path)
} finally {
ctx.locking.releasePendingEditsLock()
logger.log(
Level.FINEST,
"Kept pending edits lock [CloseFileCmd] for " + (System.currentTimeMillis - pendingEditsLockTimestamp) + " milliseconds"
)
ctx.locking.releaseFileLock(request.path)
logger.log(
Level.FINEST,
"Kept file lock [CloseFileCmd] for " + (System.currentTimeMillis - fileLockTimestamp) + " milliseconds"
)
ctx.locking.releaseReadCompilationLock()
logger.log(
Level.FINEST,
"Kept read compilation lock [CloseFileCmd] for " + (System.currentTimeMillis - readLockTimestamp) + " milliseconds"
)
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ import org.enso.interpreter.instrument.execution.RuntimeContext
import org.enso.polyglot.runtime.Runtime.Api
import org.enso.polyglot.runtime.Runtime.Api.RequestId

import java.util.logging.Level
import scala.concurrent.{ExecutionContext, Future}

/** A command that destroys the specified execution context.
Expand Down Expand Up @@ -34,13 +35,18 @@ class DestroyContextCmd(
}

private def removeContext()(implicit ctx: RuntimeContext): Unit = {
val logger = ctx.executionService.getLogger
ctx.jobControlPlane.abortJobs(request.contextId)
ctx.locking.acquireContextLock(request.contextId)
val lockTimestamp = ctx.locking.acquireContextLock(request.contextId)
try {
ctx.contextManager.destroy(request.contextId)
reply(Api.DestroyContextResponse(request.contextId))
} finally {
ctx.locking.releaseContextLock(request.contextId)
logger.log(
Level.FINEST,
s"Kept context lock [DestroyContextCmd] for ${System.currentTimeMillis() - lockTimestamp} milliseconds"
)
ctx.locking.removeContextLock(request.contextId)
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ import org.enso.interpreter.instrument.job.DetachVisualisationJob
import org.enso.polyglot.runtime.Runtime.Api
import org.enso.polyglot.runtime.Runtime.Api.RequestId

import java.util.logging.Level
import scala.concurrent.{ExecutionContext, Future}

/** A command that detaches a visualisation from the expression.
Expand All @@ -22,7 +23,8 @@ class DetachVisualisationCmd(
ctx: RuntimeContext,
ec: ExecutionContext
): Future[Unit] = {
ctx.locking.acquireContextLock(request.contextId)
val logger = ctx.executionService.getLogger
val lockTimestamp = ctx.locking.acquireContextLock(request.contextId)
try {
if (doesContextExist) {
detachVisualization()
Expand All @@ -31,6 +33,10 @@ class DetachVisualisationCmd(
}
} finally {
ctx.locking.releaseContextLock(request.contextId)
logger.log(
Level.FINEST,
s"Kept context lock [DetachVisualisationCmd] for ${System.currentTimeMillis() - lockTimestamp} milliseconds"
)
}
}

Expand All @@ -40,16 +46,18 @@ class DetachVisualisationCmd(

private def detachVisualization()(implicit
ctx: RuntimeContext
): Future[Unit] =
): Future[Unit] = {
ctx.endpoint.sendToClient(
Api.Response(maybeRequestId, Api.VisualisationDetached())
)
ctx.jobProcessor.run(
new DetachVisualisationJob(
maybeRequestId,
request.visualisationId,
request.expressionId,
request.contextId,
Api.VisualisationDetached()
request.contextId
)
)
}

private def replyWithContextNotExistError()(implicit
ctx: RuntimeContext,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ import org.enso.interpreter.instrument.execution.model.PendingEdit
import org.enso.interpreter.instrument.job.{EnsureCompiledJob, ExecuteJob}
import org.enso.polyglot.runtime.Runtime.Api

import java.util.logging.Level
import scala.concurrent.{ExecutionContext, Future}

/** A command that performs edition of a file.
Expand All @@ -22,8 +23,9 @@ class EditFileCmd(request: Api.EditFileNotification)
ctx: RuntimeContext,
ec: ExecutionContext
): Future[Unit] = {
ctx.locking.acquireFileLock(request.path)
ctx.locking.acquirePendingEditsLock()
val logger = ctx.executionService.getLogger
val fileLockTimestamp = ctx.locking.acquireFileLock(request.path)
val pendingEditsLockTimestamp = ctx.locking.acquirePendingEditsLock()
try {
val edits =
request.edits.map(edit => PendingEdit.ApplyEdit(edit, request.execute))
Expand All @@ -36,7 +38,15 @@ class EditFileCmd(request: Api.EditFileNotification)
Future.successful(())
} finally {
ctx.locking.releasePendingEditsLock()
logger.log(
Level.FINEST,
"Kept pending edits lock [EditFileCmd] for " + (System.currentTimeMillis - pendingEditsLockTimestamp) + " milliseconds"
)
ctx.locking.releaseFileLock(request.path)
logger.log(
Level.FINEST,
"Kept file lock [EditFileCmd] for " + (System.currentTimeMillis - fileLockTimestamp) + " milliseconds"
)
}
}

Expand Down
Loading

0 comments on commit 3c05f61

Please sign in to comment.