Description
We consistently see issues when we restart all services in a cluster using the Ambari UI (Stop All / Start All) where the Solr write locks are not cleared on HDFS:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Index dir 'hdfs://example.com:8020/solr/testindex/core_node4/data/index/' of core 'testindex_shard4_replica1' is already locked.
The most likely cause is another Solr server (or another solr core in this server) also configured to use this directory; other possible causes may be specific to lockType: hdfs
To correct the issue we must stop Solr and manually clear the lock on HDFS for each affected core, for example:
hadoop fs -rm /solr/testindex/core_node1/data/index/write.lock
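If several cores are affected, the per-core commands can be rolled into one pass. The snippet below is only a sketch: it assumes Solr is fully stopped and that the index data follows the /solr/<collection>/core_node<N>/data/index layout shown above (the collection name is taken from the example error).

# Sketch: clear stale Solr write locks for every core of one collection.
# Run only after confirming no Solr process is still using these indexes.
COLLECTION=testindex
# hadoop fs expands globs itself, so one command covers all core_node directories
hadoop fs -rm "/solr/${COLLECTION}/core_node*/data/index/write.lock"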
I saw a post on the Hortonworks Community board that suggested increasing the timeout of the stop process to prevent this problem:
Since all Solr data will be stored in the Hadoop filesystem, it is important to adjust the time Solr will take to shut down or "kill" the Solr process (whenever you execute "service solr stop/restart"). If this setting is not adjusted, Solr will try to shut down gracefully, and because that takes a bit more time when using HDFS, the script will simply kill the process and most of the time leave the indexes of your collections locked. If the index of a collection is locked, the following exception is shown after the startup routine: "org.apache.solr.common.SolrException: Index locked for write"
Increase the sleep time from 5 to 30 seconds in /opt/lucidworks-hdpsearch/solr/bin/solr
sed -i 's/(sleep 5)/(sleep 30)/g' /opt/lucidworks-hdpsearch/solr/bin/solr
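For anyone applying that workaround, a quick sanity check after the edit (assuming the script really does contain the (sleep 5) block the post refers to) is to grep for the remaining sleep values:

# Confirm the 5-second sleeps were replaced in the stop logic
grep -n "sleep 5\|sleep 30" /opt/lucidworks-hdpsearch/solr/bin/solr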
Can this problematic behavior be accounted for in the stop lifecycle of the Solr service instead?
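For illustration, what I have in mind is roughly the following: ask Solr to stop, then wait for the process to exit (up to a longer timeout) before Ambari moves on to stopping HDFS. This is only a sketch of the idea, not the mpack's actual stop handler; the binary path, port, process pattern, and timeout are all assumptions.

# Sketch of a more patient Solr stop sequence (not the mpack's real code)
SOLR_BIN=/opt/lucidworks-hdpsearch/solr/bin/solr   # assumed install path
SOLR_PORT=8983                                     # assumed Solr port
STOP_TIMEOUT=60                                    # give HDFS-backed cores time to release write.lock

# Ask Solr to stop gracefully
"${SOLR_BIN}" stop -p "${SOLR_PORT}"

# Wait until the Solr JVM is gone, up to STOP_TIMEOUT seconds, before proceeding
waited=0
while pgrep -f "solr.solr.home" >/dev/null 2>&1 && [ "${waited}" -lt "${STOP_TIMEOUT}" ]; do
  sleep 2
  waited=$((waited + 2))
done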
I am also curious about how Solr is actually clearing the locks when the service is configured to stop the DataNodes before Solr stops. This ordering comes from the following entry in role_command_order.json:
"SOLR_SERVER-STOP": [
"DATANODE-STOP",
"NAMENODE-STOP",
"SECONDARY_NAMENODE-STOP"
]
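If the intent is for Solr to release its HDFS write locks before the filesystem goes away, I would expect the dependency to point the other way, roughly as below (an untested sketch of role_command_order.json entries; I may be misreading how the mpack intends the stop ordering to work):

"DATANODE-STOP": ["SOLR_SERVER-STOP"],
"NAMENODE-STOP": ["SOLR_SERVER-STOP"],
"SECONDARY_NAMENODE-STOP": ["SOLR_SERVER-STOP"]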
Our environment details:
SOLR 5.5.2
HDP 2.6.2.0
solr-ambari-mpack-2.2.8