
SOLR service doesn't stop cleanly #24

@jackson-chris

Description

We consistently see issues where index locks are not cleared on HDFS when we restart all services in a cluster via the Ambari UI (Stop All / Start All):

 org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Index dir 'hdfs://example.com:8020/solr/testindex/core_node4/data/index/' of core 'testindex_shard4_replica1' is already locked. 
The most likely cause is another Solr server (or another solr core in this server) also configured to use this directory; other possible causes may be specific to lockType: hdfs

To correct the issue we must stop Solr and manually clear the lock on HDFS for each core, for example:

hadoop fs -rm /solr/testindex/core_node1/data/index/write.lock
hadoop fs -rm /solr/testindex/core_node2/data/index/write.lock
hadoop fs -rm /solr/testindex/core_node3/data/index/write.lock
hadoop fs -rm /solr/testindex/core_node4/data/index/write.lock
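
For reference, the HDFS shell accepts globs, so all stale locks for the collection can be cleared in one pass (assuming the layout above; double-check the pattern only matches lock files you intend to delete):

hadoop fs -rm /solr/testindex/core_node*/data/index/write.lock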

I saw a post on the Hortonworks Community board that suggested increasing the timeout for the stop process to prevent this problem:

Since all Solr data is stored in the Hadoop filesystem, it is important to adjust the time Solr waits before "killing" the Solr process (whenever you execute "service solr stop/restart"). If this setting is not adjusted, Solr tries to shut the process down gracefully, but because shutdown takes longer when using HDFS, it simply kills the process and usually leaves the indexes of your collections locked. If the index of a collection is locked, the following exception is shown after startup: "org.apache.solr.common.SolrException: Index locked for write".

Increase the sleep time from 5 to 30 seconds in /opt/lucidworks-hdpsearch/solr/bin/solr
sed -i 's/(sleep 5)/(sleep 30)/g' /opt/lucidworks-hdpsearch/solr/bin/solr
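
For context, here is a rough, paraphrased sketch of the stop pattern in bin/solr (not the actual script contents, and the pid file path is an assumption): it requests a graceful shutdown, waits a fixed interval, then force-kills whatever is still running. The forced kill is what strands write.lock on HDFS, which is why lengthening the wait helps:

SOLR_PID=$(cat /var/run/solr/solr-8983.pid)  # pid file location is an assumption
kill -TERM "$SOLR_PID"                       # request graceful shutdown so HDFS index locks get released
(sleep 30)                                   # was (sleep 5); closing HDFS directories takes longer
if ps -p "$SOLR_PID" > /dev/null; then
  kill -9 "$SOLR_PID"                        # forced kill; this is the step that leaves write.lock behind
fi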

Can this problematic behavior be accounted for in the stop lifecycle of the Solr service instead?

I am also curious how Solr can actually clear the locks when the service is configured to stop the DataNodes before Solr stops. This ordering comes from the following entry in role_command_order.json:

    "SOLR_SERVER-STOP": [
      "DATANODE-STOP",
      "NAMENODE-STOP",
      "SECONDARY_NAMENODE-STOP"
    ]
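
If the ordering itself is part of the problem, one possible fix (an untested suggestion, assuming the usual role_command_order.json semantics where the listed commands must finish before the keyed command runs) would be to invert the dependency so HDFS stays up until Solr has stopped:

    "DATANODE-STOP": [
      "SOLR_SERVER-STOP"
    ],
    "NAMENODE-STOP": [
      "SOLR_SERVER-STOP"
    ]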

Our environment details:
SOLR 5.5.2
HDP 2.6.2.0
solr-ambari-mpack-2.2.8
