
SOLR service doesn't stop cleanly #24

@jackson-chris

Description

We consistently see issues where index locks are not cleared on HDFS when we restart all services in a cluster via the Ambari UI (Stop All / Start All):

 org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Index dir 'hdfs://example.com:8020/solr/testindex/core_node4/data/index/' of core 'testindex_shard4_replica1' is already locked. 
The most likely cause is another Solr server (or another solr core in this server) also configured to use this directory; other possible causes may be specific to lockType: hdfs

To correct the issue we must stop Solr and manually clear the lock on HDFS for each core, for example:

hadoop fs -rm /solr/testindex/core_node1/data/index/write.lock
hadoop fs -rm /solr/testindex/core_node2/data/index/write.lock
hadoop fs -rm /solr/testindex/core_node3/data/index/write.lock
hadoop fs -rm /solr/testindex/core_node4/data/index/write.lock
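
For reference, the HDFS shell accepts globs, so all stale locks for the collection can be cleared in one pass (assuming the layout above; double-check the pattern only matches lock files you intend to delete):

hadoop fs -rm /solr/testindex/core_node*/data/index/write.lock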

I saw a post on the Hortonworks Community board that suggested increasing the timeout for the stop process to prevent this problem:

Since all Solr data is stored in the Hadoop filesystem, it is important to adjust the time Solr waits before "killing" the Solr process (whenever you execute "service solr stop/restart"). If this setting is not adjusted, Solr tries to shut the process down gracefully, but because shutdown takes longer when using HDFS, it simply kills the process and usually leaves the indexes of your collections locked. If the index of a collection is locked, the following exception is shown after startup: "org.apache.solr.common.SolrException: Index locked for write".

Increase the sleep time from 5 to 30 seconds in /opt/lucidworks-hdpsearch/solr/bin/solr
sed -i 's/(sleep 5)/(sleep 30)/g' /opt/lucidworks-hdpsearch/solr/bin/solr
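
For context, here is a rough, paraphrased sketch of the stop pattern in bin/solr (not the actual script contents, and the pid file path is an assumption): it requests a graceful shutdown, waits a fixed interval, then force-kills whatever is still running. The forced kill is what strands write.lock on HDFS, which is why lengthening the wait helps:

SOLR_PID=$(cat /var/run/solr/solr-8983.pid)  # pid file location is an assumption
kill -TERM "$SOLR_PID"                       # request graceful shutdown so HDFS index locks get released
(sleep 30)                                   # was (sleep 5); closing HDFS directories takes longer
if ps -p "$SOLR_PID" > /dev/null; then
  kill -9 "$SOLR_PID"                        # forced kill; this is the step that leaves write.lock behind
fi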

Can this problematic behavior be accounted for in the stop lifecycle of the Solr service instead?

I am also curious how Solr can actually clear the locks when the service is configured to stop the DataNodes before Solr stops. This ordering comes from the following entry in role_command_order.json:

    "SOLR_SERVER-STOP": [
      "DATANODE-STOP",
      "NAMENODE-STOP",
      "SECONDARY_NAMENODE-STOP"
    ]
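
If the ordering itself is part of the problem, one possible fix (an untested suggestion, assuming the usual role_command_order.json semantics where the listed commands must finish before the keyed command runs) would be to invert the dependency so HDFS stays up until Solr has stopped:

    "DATANODE-STOP": [
      "SOLR_SERVER-STOP"
    ],
    "NAMENODE-STOP": [
      "SOLR_SERVER-STOP"
    ]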

Our environment details:
SOLR 5.5.2
HDP 2.6.2.0
solr-ambari-mpack-2.2.8
