Start WLS with -Dweblogic.SituationalConfig.failBootOnError=true #1117
…nd in situational config files
The change LGTM except that: (1) it'd be good to make the new behavior configurable via an environment variable (default to true); (2) it'd be good to update the config-override .md doc to explain the new behavior on a failure for 12.2.1.4 and later (the current doc explicitly details the current behavior, which is to 'keep on truckin' even though there are errors), and to perhaps even mention the env var in (1). In addition, we've been discussing additional 'bonus work' in Slack, which could be done by a different JIRA/change, or maybe in this change:
@TheFrogPad FYI
@@ -338,6 +338,11 @@ spec:

Incorrectly formatted override files may be accepted without warnings or errors and will not prevent WebLogic pods from booting. So, it is important to make sure that the template files are correct in a QA environment, otherwise your WebLogic Servers may start even though critically required overrides are failing to take effect.

On WebLogic servers that support the weblogic.SituationalConfig.failBootOnError system property ( Note: it is not supported in WebLogic 12.2.1.3 ),
WebLogic servers -> WebLogic Servers
weblogic.SituationalConfig.failBootOnError -> `weblogic.SituationalConfig.failBootOnError`
Note: it is not supported in WebLogic 12.2.1.3 -> Note: It is not supported in WebLogic Server 12.2.1.3
Fixed. WLS Version number also modified to 12.2.1.3.0
docs-source/content/userguide/managing-domains/configoverrides/_index.md
Outdated
@@ -338,6 +338,11 @@ spec:

Incorrectly formatted override files may be accepted without warnings or errors and will not prevent WebLogic pods from booting. So, it is important to make sure that the template files are correct in a QA environment, otherwise your WebLogic Servers may start even though critically required overrides are failing to take effect.

On WebLogic servers that support the weblogic.SituationalConfig.failBootOnError system property ( Note: it is not supported in WebLogic 12.2.1.3 ),
by default the WebLogic server will fail to boot if any situational configuration files are invalid.
This can be configured by using the `failBootOnSituationalError` attribute in the Domain spec to `false` to start up the WebLogic servers even with
(suggested sentence re-write) By setting the `failBootOnSituationalError` attribute in the Domain spec to `false`, you can start up WebLogic Servers even with incorrectly formatted override files.
The latest update removed the `failBootOnSituationalError` attribute from the Domain spec. The sentence now reads:
By setting the `FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR` environment variable in the Kubernetes containers for the WebLogic Servers to `false`, you can start up the WebLogic Servers even with incorrectly formatted override files.
@@ -366,6 +371,14 @@ Incorrectly formatted override files may be accepted without warnings or errors
* This directory, or a subdirectory within this directory, should contain each of your custom situational configuration files.
* If it doesn't, then this likely indicates your domain resource `configOverrides` was not set to match your custom override configuration map name, or that your custom override configuration map does not contain your override files.

* If the Administration Server pod do start but fails to reach ready state:
If the Administration Server pod do start -> If the Administration Server pod does start
Fixed. Thanks
@@ -366,6 +371,14 @@ Incorrectly formatted override files may be accepted without warnings or errors
* This directory, or a subdirectory within this directory, should contain each of your custom situational configuration files.
* If it doesn't, then this likely indicates your domain resource `configOverrides` was not set to match your custom override configuration map name, or that your custom override configuration map does not contain your override files.

* If the Administration Server pod do start but fails to reach ready state:
* Check if there is a message ` WebLogic server failed to start due to missing or invalid situational configuration files` in the Administration Server pod's `kubectl log`
* This suggests that the Administration Server failed to start may be caused by errors found in the a configuration override file.
Administration Server failed to start -> Administration Server failure to start
may be caused by errors -> may have been caused by errors
in the a configuration override -> in a configuration override
The sentence now reads:
This suggests that the Administration Server failure to start may have been caused by errors found in a configuration override file.
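The log check described in this thread can be scripted. The following is a minimal sketch, not part of the PR: the function name `has_sitcfg_failure` is hypothetical, and the pod name and namespace in the usage comment are placeholders. It only matches the message text quoted in the doc change.

```shell
#!/bin/bash
# Hypothetical helper (not from the PR): reads log text on stdin and succeeds
# if it contains the failure message quoted in the doc change.
has_sitcfg_failure() {
  grep -q "failed to start due to missing or invalid situational configuration files"
}

# Usage (pod name and namespace are placeholders):
#   kubectl logs domain1-admin-server -n my-namespace | has_sitcfg_failure \
#     && echo "Situational configuration error found in pod log"
```

Reading from stdin keeps the helper independent of how the log is obtained, so the same check works on a saved log file.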
docs/domains/Domain.md
Outdated
@@ -23,6 +23,7 @@ DomainSpec is a description of a domain.
| `domainHome` | string | The folder for the WebLogic Domain. Not required. Defaults to /shared/domains/domains/domainUID if domainHomeInImage is false Defaults to /u01/oracle/user_projects/domains/ if domainHomeInImage is true |
| `domainHomeInImage` | Boolean | True if this domain's home is defined in the docker image for the domain. Defaults to true. |
| `domainUID` | string | Domain unique identifier. Must be unique across the Kubernetes cluster. Not required. Defaults to the value of metadata.name |
| `failBootOnSituationalError` | Boolean | If true (the default), on WebLogic server that supports this feature, the WebLogic server boot would fail if any errors occur when applying situational configuration during server startup. If false, WebLogic server would start if there are errors in the situational configuration files, but some configuration overrides may be skipped. |
(suggested re-write) In WebLogic Server versions that support this feature (greater than 12.2.1.3.0):
- If true (the default), then WebLogic Server would fail to boot if any errors occurred when applying the situational configuration during server startup.
- If false, and if there were errors in the situational configuration files, then WebLogic Server would start but some configuration overrides may be skipped.
The `failBootOnSituationalError` attribute has been removed in the latest update.
@@ -366,6 +371,14 @@ Incorrectly formatted override files may be accepted without warnings or errors
* This directory, or a subdirectory within this directory, should contain each of your custom situational configuration files.
* If it doesn't, then this likely indicates your domain resource `configOverrides` was not set to match your custom override configuration map name, or that your custom override configuration map does not contain your override files.

* If the Administration Server pod do start but fails to reach ready state:
* Check if there is a message ` WebLogic server failed to start due to missing or invalid situational configuration files` in the Administration Server pod's `kubectl log`
Check if there is a message -> Check for this message
Fixed
docs-source/content/userguide/managing-domains/configoverrides/_index.md
Outdated
model/src/main/java/oracle/kubernetes/weblogic/domain/model/DomainSpec.java
Outdated
@@ -338,6 +338,11 @@ spec:

Incorrectly formatted override files may be accepted without warnings or errors and will not prevent WebLogic pods from booting. So, it is important to make sure that the template files are correct in a QA environment, otherwise your WebLogic Servers may start even though critically required overrides are failing to take effect.
Change 'will not prevent' to 'may not prevent'. (Due to the new fail-boot-on-error behavior)
Done
@@ -366,6 +371,14 @@ Incorrectly formatted override files may be accepted without warnings or errors
* This directory, or a subdirectory within this directory, should contain each of your custom situational configuration files.
* If it doesn't, then this likely indicates your domain resource `configOverrides` was not set to match your custom override configuration map name, or that your custom override configuration map does not contain your override files.

* If the Administration Server pod does start but fails to reach ready state:
Maybe 'ready state' --> 'ready state or tries to restart:'
Fixed
@@ -366,6 +371,14 @@ Incorrectly formatted override files may be accepted without warnings or errors
* This directory, or a subdirectory within this directory, should contain each of your custom situational configuration files.
* If it doesn't, then this likely indicates your domain resource `configOverrides` was not set to match your custom override configuration map name, or that your custom override configuration map does not contain your override files.

* If the Administration Server pod does start but fails to reach ready state:
Also, please move this bullet up above the previous bullet, so it appears just after the other 'does start' bullet.
Do you mean move this bullet to before line 370?
Yes.
Done
docs-source/content/userguide/managing-domains/configoverrides/_index.md
@@ -137,6 +149,9 @@ function copySitCfg() {
if [ $? = 0 ]; then
for local_fname in ${src_dir}/${fil_prefix}*.xml ; do
copyIfChanged $local_fname $tgt_dir/`basename ${local_fname/${fil_prefix}//}`
trace "Printing contents of situational configuration file $local_fname:"
file_content=`cat $local_fname`
echo "$file_content"
Can
```
file_content=`cat $local_fname`
echo "$file_content"
```
be replaced with
```
cat $local_fname
```
?
Reduced to one line:
echo `cat $local_fname`
"For details, please search your pod log or" | ||
"${SERVER_OUT_FILE} for the keyword 'situational'." | ||
) | ||
trace "${msg[*]}" |
Two comments:
- Looks like spaces are missing? Do you need a space just before each trailing double-quote?
- I keep going back and forth on this one, and I'm not sure if it's a concern, but I assume calling grep on the .out log every three seconds might actually incur significant overhead, maybe even enough to cause a blip big enough to affect latency-sensitive apps. Maybe call this less often than 'doShutdown' is checked?
For Tom's #2, what about running another script in the background, like tailLog.sh, that does the grep at a periodic interval? That interval could even have an environment variable to change the sleep window.
I'm not suggesting this to make things more complicated, but from trying to decipher the whole server lifecycle handling, the process looks rather intricate. Having a script to check the log means that the same script can also be run, or even exec'd, independently of the lifecycle or other k8s probes.
- No need to add spaces in each line.
- Created a monitorLog.sh script. Since we only monitor for the message indicating that the server failed to start due to situational configuration errors, the script also watches for BEA-000360 (Server started) and exits once the server has started. The sleep window defaults to 30 seconds and is configurable via an environment variable.
Nice! This looks much cleaner too...
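For readers following the thread, the monitoring approach described above might look roughly like the sketch below. This is not the actual monitorLog.sh from the PR: the function names are hypothetical, and matching on the bare IDs `BEA-141335` and `BEA-000360` is an assumption based on the IDs mentioned in this conversation. The 30-second default is the one stated in the reply above.

```shell
#!/bin/bash
# Sketch only; not the monitorLog.sh from the PR. Function names are hypothetical.

# Prints "failed", "started", or "pending" for the given server .out file.
check_server_out() {
  local out_file="$1"
  if grep -q "BEA-141335" "$out_file" 2>/dev/null; then
    echo "failed"      # situational config error reported
  elif grep -q "BEA-000360" "$out_file" 2>/dev/null; then
    echo "started"     # server reported that it started
  else
    echo "pending"     # neither message seen yet (or file not readable)
  fi
}

# Polls the .out file until the server has either started or failed to boot.
monitor_server_out() {
  local out_file="$1"
  local interval="${SERVER_OUT_MONITOR_INTERVAL:-30}"
  while true; do
    case "$(check_server_out "$out_file")" in
      failed)
        echo "WebLogic server failed to start due to missing or invalid situational configuration files. Please check ${out_file} for details"
        return 1 ;;
      started)
        return 0 ;;
    esac
    sleep "$interval"
  done
}

# Usage (assumes SERVER_OUT_FILE is set by the calling script):
#   monitor_server_out "${SERVER_OUT_FILE}" &
```

Because the loop returns as soon as the server reports that it started, the grep overhead is limited to the boot window rather than the whole server lifetime.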

* Look in your `DOMAIN_HOME/optconfig` directory.
* This directory, or a subdirectory within this directory, should contain each of your custom situational configuration files.
* If it doesn't, then this likely indicates your domain resource `configOverrides` was not set to match your custom override configuration map name, or that your custom override configuration map does not contain your override files.
I think this bullet still belongs in its original location.
E.g. current lines 379-381 are more clear if they stay in their original location...
move lines 379-381 back to before line 371?
back to being a sub-bullet of the 'If WebLogic pods do start, then:' bullet.
moved
@@ -137,6 +142,8 @@ function copySitCfg() {
if [ $? = 0 ]; then
for local_fname in ${src_dir}/${fil_prefix}*.xml ; do
copyIfChanged $local_fname $tgt_dir/`basename ${local_fname/${fil_prefix}//}`
trace "Printing contents of situational configuration file $local_fname:"
echo `cat $local_fname`
Why ``echo `cat $local_fname` `` instead of just `cat $local_fname`?
Agreed. Not sure why I didn't do that earlier.
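The difference between the two forms is more than style. An unquoted command substitution is word-split before `echo` sees it, so a multi-line XML file is printed on one line with runs of whitespace collapsed (and any glob characters in the file would be subject to expansion). A small demonstration, using a throwaway temp file as a stand-in for an override file:

```shell
#!/bin/bash
# Create a small multi-line file to stand in for a situational config file.
tmp_file=$(mktemp)
printf '<a>\n  <b>x</b>\n</a>\n' > "$tmp_file"

# Unquoted backticks: the output is word-split, so echo joins the words with
# single spaces and the line breaks and indentation are lost.
flattened=$(echo `cat "$tmp_file"`)

# Plain cat: the content passes through unchanged (the surrounding $(...) here
# only strips the trailing newline so the result can be stored in a variable).
preserved=$(cat "$tmp_file")

echo "$flattened"
echo "$preserved"
rm -f "$tmp_file"
```

For printing override files to a pod log, plain `cat` keeps the file readable exactly as written.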
@@ -102,6 +102,11 @@ function waitUntilShutdown() {
trace "Showing the server out file from ${SERVER_OUT_FILE}"
${SCRIPTPATH}/tailLog.sh ${SERVER_OUT_FILE} &
fi
FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR=${FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR:-true}
SERVER_OUT_MONITOR_INTERVAL=${SERVER_OUT_MONITOR_INTERVAL:-30}
Since the monitor script only runs until the server reports that it started successfully - (I don't recall the original change did that - maybe I missed that) - it seems like it'd be OK to run the monitor more frequently than every 30 seconds.
It was added in the latest change. Any suggestion on the interval? 10 seconds?
Maybe even back to 3 seconds? Boots can occur pretty quick these days.
Updated interval back to 3 seconds.
LGTM!
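As an illustration of how the two environment variables discussed in this thread could feed into server startup, here is a hedged sketch. The `${VAR:-default}` lines mirror the diff above, with the interval default set to the 3 seconds agreed in this thread. The function name `apply_sitcfg_defaults` and the use of `JAVA_OPTIONS` to carry the system property are assumptions for illustration, not confirmed details of the PR.

```shell
#!/bin/bash
# Sketch only. The function name and the use of JAVA_OPTIONS are assumptions.
apply_sitcfg_defaults() {
  # ${VAR:-default} keeps a value set on the container and falls back otherwise.
  FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR=${FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR:-true}
  SERVER_OUT_MONITOR_INTERVAL=${SERVER_OUT_MONITOR_INTERVAL:-3}

  # Pass the system property named in the PR title only when fail-on-error is enabled.
  if [ "${FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR}" = "true" ]; then
    JAVA_OPTIONS="-Dweblogic.SituationalConfig.failBootOnError=true ${JAVA_OPTIONS}"
  fi
  export JAVA_OPTIONS
}

# Usage:
#   apply_sitcfg_defaults
#   echo "JAVA_OPTIONS=${JAVA_OPTIONS}"
```

Setting `FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR=false` on the container leaves `JAVA_OPTIONS` untouched in this sketch, which corresponds to the documented opt-out.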
This feature requires a WebLogic Server version in the Docker image that supports the weblogic.SituationalConfig.failBootOnError system property.
By setting the `FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR` environment variable in the Kubernetes containers for the WebLogic Servers to `false`, customers can start up the WebLogic Servers even with incorrectly formatted override files.
Prints configuration overrides on server pod log
startServer.sh looks for BEA-141335, which suggests that WebLogic failed to start due to situational config errors, and logs the message "WebLogic server failed to start due to missing or invalid situational configuration files. Please check ${SERVER_OUT_FILE} for details" to the pod log.
Updated configoverrides/_index.md doc