Start WLS with -Dweblogic.SituationalConfig.failBootOnError=true #1117

Merged
merged 19 commits into from
Jul 1, 2019

Member
@alai8 alai8 commented Jun 18, 2019

This feature requires a Docker image with a WebLogic Server version that supports the weblogic.SituationalConfig.failBootOnError system property.

By setting the FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR environment variable in the Kubernetes containers for the WebLogic Servers to false, customers can start up the WebLogic Servers even with incorrectly formatted override files.
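A minimal sketch of how a start script could translate this environment variable into the system property named in the PR title. The `sitcfg_java_option` helper and the `JAVA_OPTIONS` variable name are assumptions for illustration and are not taken from the operator scripts.

```shell
#!/bin/bash
# Hypothetical helper: maps the environment variable to the JVM system property.
# Falls back to "true" when the variable is unset or empty, matching the
# default described in this PR.
sitcfg_java_option() {
  local fail_boot="${FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR:-true}"
  echo "-Dweblogic.SituationalConfig.failBootOnError=${fail_boot}"
}

# JAVA_OPTIONS is an assumed name for the variable holding JVM arguments.
# Append the option, adding a separating space only if it already has content.
JAVA_OPTIONS="${JAVA_OPTIONS:+${JAVA_OPTIONS} }$(sitcfg_java_option)"
echo "${JAVA_OPTIONS}"
```

Setting `FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR=false` in the container environment would then produce the `=false` form of the option.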

Prints configuration overrides on server pod log

startServer.sh looks for BEA-141335, which indicates that WebLogic failed to start because of situational configuration errors, and logs the message "WebLogic server failed to start due to missing or invalid situational configuration files. Please check ${SERVER_OUT_FILE} for details" to the pod log.
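The check described above could look roughly like the following. This is only a sketch: the function names are hypothetical, and `trace` here is a stand-in for the operator's own logging function.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the given server out file contains
# BEA-141335, the message ID this PR associates with a boot failure
# caused by situational configuration errors.
sitcfg_boot_failed() {
  local out_file="$1"
  [ -f "${out_file}" ] && grep -q "BEA-141335" "${out_file}"
}

# Stand-in for the operator's trace function (assumption).
trace() {
  echo "[trace] $*"
}

# Logs the message quoted in the PR description when the marker is present.
report_sitcfg_failure() {
  local out_file="$1"
  if sitcfg_boot_failed "${out_file}"; then
    trace "WebLogic server failed to start due to missing or invalid" \
          "situational configuration files. Please check ${out_file} for details"
  fi
}
```

A caller would invoke it as `report_sitcfg_failure "${SERVER_OUT_FILE}"`.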

Updated configoverrides/_index.md doc

@alai8 alai8 requested a review from tbarnes-us June 18, 2019 17:58
@tbarnes-us

The change LGTM except that:

(1) it'd be good to make the new behavior configurable via an environment variable (default to true).

(2) it'd be good to update the config-override .md doc to explain the new behavior on a failure for 12.2.1.4 and later (the current doc explicitly details the current behavior - which is to 'keep on truckin' even though there are errors), and to perhaps even mention the env var in (1).

In addition we've been discussing additional 'bonus work' in slack which could be done by a different JIRA/change, or maybe in this change:

  • When an error occurs, it'll be a generic 'situational configuration' error that might make it hard for many Kubernetes Operator administrators to figure out the root cause. So it'd be helpful if we could detect and report it with additional information. For example, we could hint that the customer's 'domain CR configuration overrides' may have an error.

  • One approach might be to grep the servers' stdout capture files during shutdown for the new error and log a more detailed exception to stdout. (Search for 'doShutdown' in the scripts.)

  • A more advanced approach would be to add an Event to the Domain CR.

@TheFrogPad FYI

@alai8 alai8 requested a review from rosemarymarano June 20, 2019 16:28
@@ -338,6 +338,11 @@ spec:

Incorrectly formatted override files may be accepted without warnings or errors and will not prevent WebLogic pods from booting. So, it is important to make sure that the template files are correct in a QA environment, otherwise your WebLogic Servers may start even though critically required overrides are failing to take effect.

On WebLogic servers that support the weblogic.SituationalConfig.failBootOnError system property ( Note: it is not supported in WebLogic 12.2.1.3 ),
Member

WebLogic servers -> WebLogic Servers
weblogic.SituationalConfig.failBootOnError -> `weblogic.SituationalConfig.failBootOnError`
Note: it is not supported in WebLogic 12.2.1.3 -> Note: It is not supported in WebLogic Server 12.2.1.3

Member Author

Fixed. WLS Version number also modified to 12.2.1.3.0

@@ -338,6 +338,11 @@ spec:

Incorrectly formatted override files may be accepted without warnings or errors and will not prevent WebLogic pods from booting. So, it is important to make sure that the template files are correct in a QA environment, otherwise your WebLogic Servers may start even though critically required overrides are failing to take effect.

On WebLogic servers that support the weblogic.SituationalConfig.failBootOnError system property ( Note: it is not supported in WebLogic 12.2.1.3 ),
by default the WebLogic server will fail to boot if any situational configuration files are invalid.
This can be configured by using the `failBootOnSituationalError` attribute in the Domain spec to `false` to start up the WebLogic servers even with
Member

(suggested sentence re-write) By setting the failBootOnSituationalError attribute in the Domain spec to false, you can start up WebLogic Servers even with incorrectly formatted override files.

Member Author

The latest update removed the failBootOnSituationalError attribute from the Domain spec. The sentence now reads:
By setting the FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR environment variable in the Kubernetes containers for the WebLogic Servers to false, you can start up the WebLogic Servers even with incorrectly formatted override files.

@@ -366,6 +371,14 @@ Incorrectly formatted override files may be accepted without warnings or errors
* This directory, or a subdirectory within this directory, should contain each of your custom situational configuration files.
* If it doesn't, then this likely indicates your domain resource `configOverrides` was not set to match your custom override configuration map name, or that your custom override configuration map does not contain your override files.

* If the Administration Server pod do start but fails to reach ready state:
Member

If the Administration Server pod do start -> If the Administration Server pod does start

Member Author

Fixed. Thanks

@@ -366,6 +371,14 @@ Incorrectly formatted override files may be accepted without warnings or errors
* This directory, or a subdirectory within this directory, should contain each of your custom situational configuration files.
* If it doesn't, then this likely indicates your domain resource `configOverrides` was not set to match your custom override configuration map name, or that your custom override configuration map does not contain your override files.

* If the Administration Server pod do start but fails to reach ready state:
* Check if there is a message ` WebLogic server failed to start due to missing or invalid situational configuration files` in the Administration Server pod's `kubectl log`
* This suggests that the Administration Server failed to start may be caused by errors found in the a configuration override file.
Member

Administration Server failed to start -> Administration Server failure to start
may be caused by errors -> may have been caused by errors
in the a configuration override -> in a configuration override

Member Author

The sentence now reads:
This suggests that the Administration Server failure to start may have been caused by errors found in a configuration override file.

@@ -23,6 +23,7 @@ DomainSpec is a description of a domain.
| `domainHome` | string | The folder for the WebLogic Domain. Not required. Defaults to /shared/domains/domains/domainUID if domainHomeInImage is false Defaults to /u01/oracle/user_projects/domains/ if domainHomeInImage is true |
| `domainHomeInImage` | Boolean | True if this domain's home is defined in the docker image for the domain. Defaults to true. |
| `domainUID` | string | Domain unique identifier. Must be unique across the Kubernetes cluster. Not required. Defaults to the value of metadata.name |
| `failBootOnSituationalError` | Boolean | If true (the default), on WebLogic server that supports this feature, the WebLogic server boot would fail if any errors occur when applying situational configuration during server startup. If false, WebLogic server would start if there are errors in the situational configuration files, but some configuration overrides may be skipped. |
Member

(suggested re-write) In WebLogic Server versions that support this feature (greater than 12.2.1.3.0):

  • If true (the default), then WebLogic Server would fail to boot if any errors occurred when applying the situational configuration during server startup.
  • If false, and if there were errors in the situational configuration files, then WebLogic Server would start but some configuration overrides may be skipped.

Member Author

The failBootOnSituationalError attribute has been removed in the latest update

@@ -366,6 +371,14 @@ Incorrectly formatted override files may be accepted without warnings or errors
* This directory, or a subdirectory within this directory, should contain each of your custom situational configuration files.
* If it doesn't, then this likely indicates your domain resource `configOverrides` was not set to match your custom override configuration map name, or that your custom override configuration map does not contain your override files.

* If the Administration Server pod do start but fails to reach ready state:
* Check if there is a message ` WebLogic server failed to start due to missing or invalid situational configuration files` in the Administration Server pod's `kubectl log`
Member

Check if there is a message -> Check for this message

Member Author

Fixed

@@ -338,6 +338,11 @@ spec:

Incorrectly formatted override files may be accepted without warnings or errors and will not prevent WebLogic pods from booting. So, it is important to make sure that the template files are correct in a QA environment, otherwise your WebLogic Servers may start even though critically required overrides are failing to take effect.

Change 'will not prevent' to 'may not prevent'. (Due to the new fail-boot-on-error behavior)

Member Author

Done

@@ -366,6 +371,14 @@ Incorrectly formatted override files may be accepted without warnings or errors
* This directory, or a subdirectory within this directory, should contain each of your custom situational configuration files.
* If it doesn't, then this likely indicates your domain resource `configOverrides` was not set to match your custom override configuration map name, or that your custom override configuration map does not contain your override files.

* If the Administration Server pod does start but fails to reach ready state:

Maybe 'ready state' --> 'ready state or tries to restart:'

Member Author

Fixed

@@ -366,6 +371,14 @@ Incorrectly formatted override files may be accepted without warnings or errors
* This directory, or a subdirectory within this directory, should contain each of your custom situational configuration files.
* If it doesn't, then this likely indicates your domain resource `configOverrides` was not set to match your custom override configuration map name, or that your custom override configuration map does not contain your override files.

* If the Administration Server pod does start but fails to reach ready state:

Also, please move this bullet up above the previous bullet, so it appears just after the other 'does start' bullet.

Member Author

Do you mean move this bullet to before line 370?

Yes.

Member Author

Done

@@ -137,6 +149,9 @@ function copySitCfg() {
if [ $? = 0 ]; then
for local_fname in ${src_dir}/${fil_prefix}*.xml ; do
copyIfChanged $local_fname $tgt_dir/`basename ${local_fname/${fil_prefix}//}`
trace "Printing contents of situational configuration file $local_fname:"
file_content=`cat $local_fname`
echo "$file_content"

Can

```
file_content=`cat $local_fname`
echo "$file_content"
```

be replaced with

```
cat $local_fname
```

?

Member Author

@alai8 alai8 Jun 26, 2019

Reduced to one line:
echo `cat $local_fname`

"For details, please search your pod log or"
"${SERVER_OUT_FILE} for the keyword 'situational'."
)
trace "${msg[*]}"

Two comments:

  1. Looks like spaces are missing? Do you need a space just before each trailing double-quote?
  2. I keep going back and forth on this one - not sure if it's a concern - but I assume calling grep on the .out log every three seconds might actually incur significant overhead - maybe even enough to cause a big enough blip so as to affect latency-sensitive apps. Maybe call this less often than 'doShutdown' is checked?

Member

For Tom's #2, what about running another script in the background, like tailLog.sh, that does the grep at a periodic interval? That interval could even have an environment variable to change the sleep window.

I'm not suggesting this to make things more complicated but from trying to decipher the whole server lifecycle handling the process looks rather intricate so having a script to check the log means that same script can also be run or even exec'd independent of the lifecycle or other k8s probes.

Member Author

  1. No need to add spaces in each line.
  2. Created the monitorLog.sh script. Because we only monitor for the message indicating that the server failed to start due to a situational configuration error, the script also watches for BEA-000360 (server started) and exits once the server has started. The sleep window defaults to 30 seconds and is configurable via an environment variable.
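For readers following along, the monitoring behavior described in item 2 could be sketched as below. This is an illustrative approximation, not the actual monitorLog.sh; the function name is hypothetical, while `SERVER_OUT_MONITOR_INTERVAL` and the two message IDs come from this PR.

```shell
#!/bin/bash
# Hypothetical sketch of a log monitor; not the actual monitorLog.sh.
# Polls the out file until it shows either a situational configuration
# boot failure (BEA-141335) or a successful server start (BEA-000360).
monitor_server_out() {
  local out_file="$1"
  # 30 seconds is the default stated in the comment above.
  local interval="${SERVER_OUT_MONITOR_INTERVAL:-30}"
  while true; do
    if grep -q "BEA-141335" "${out_file}" 2>/dev/null; then
      echo "WebLogic server failed to start due to missing or invalid situational configuration files."
      return 1
    fi
    if grep -q "BEA-000360" "${out_file}" 2>/dev/null; then
      # Server started, so there is nothing left to monitor.
      return 0
    fi
    sleep "${interval}"
  done
}
```

Run in the background (for example `monitor_server_out "${SERVER_OUT_FILE}" &`), the loop stops on its own once the server has started, so the grep cost is limited to the boot window.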

Member

Nice! This looks much cleaner too...


* Look in your `DOMAIN_HOME/optconfig` directory.
* This directory, or a subdirectory within this directory, should contain each of your custom situational configuration files.
* If it doesn't, then this likely indicates your domain resource `configOverrides` was not set to match your custom override configuration map name, or that your custom override configuration map does not contain your override files.

I think this bullet still belongs in its original location.

E.g. current lines 379-381 are more clear if they stay in their original location...

Member Author

move lines 379-381 back to before line 371?

back to being a sub-bullet of the 'If WebLogic pods do start, then:' bullet.

Member Author

moved

@@ -137,6 +142,8 @@ function copySitCfg() {
if [ $? = 0 ]; then
for local_fname in ${src_dir}/${fil_prefix}*.xml ; do
copyIfChanged $local_fname $tgt_dir/`basename ${local_fname/${fil_prefix}//}`
trace "Printing contents of situational configuration file $local_fname:"
echo `cat $local_fname`

Why echo `cat $local_fname` instead of just cat $local_fname?

Member Author

Agreed. Not sure why I didn't do that earlier.
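To illustrate why plain `cat` is preferable here, the following sketch compares the two forms on a small sample file. The file content is made up for the example.

```shell
#!/bin/bash
# Create a small multi-line sample file (hypothetical content).
sample_file="$(mktemp)"
printf '<a>\n  <b>value</b>\n</a>\n' > "${sample_file}"

# Unquoted command substitution: word splitting collapses newlines and
# indentation into single spaces, so the file prints on one line. The
# unquoted words are also subject to glob expansion.
flattened="$(echo `cat ${sample_file}`)"

# Plain cat: contents are printed unchanged, with line breaks preserved.
preserved="$(cat "${sample_file}")"

echo "${flattened}"
echo "${preserved}"
rm -f "${sample_file}"
```

For an XML override file, the flattened form is harder to read in the pod log, which is one reason to print it with `cat` directly.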

@@ -102,6 +102,11 @@ function waitUntilShutdown() {
trace "Showing the server out file from ${SERVER_OUT_FILE}"
${SCRIPTPATH}/tailLog.sh ${SERVER_OUT_FILE} &
fi
FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR=${FAIL_BOOT_ON_SITUATIONAL_CONFIG_ERROR:-true}
SERVER_OUT_MONITOR_INTERVAL=${SERVER_OUT_MONITOR_INTERVAL:-30}

Since the monitor script only runs until the server reports that it started successfully - (I don't recall the original change did that - maybe I missed that) - it seems like it'd be OK to run the monitor more frequently than every 30 seconds.

Member Author

It was added in the latest change. Any suggestion on the interval? 10 seconds?

Maybe even back to 3 seconds? Boots can occur pretty quick these days.

Member Author

Updated interval back to 3 seconds.

@tbarnes-us tbarnes-us left a comment

LGTM!

@rjeberhard rjeberhard merged commit aed2764 into develop Jul 1, 2019
@alai8 alai8 deleted the owls-70496 branch July 1, 2019 23:25