A Python script to detect duplicate documents in Elasticsearch. Once duplicates have been detected, it is straightforward to issue a delete operation to remove them.
For a full description of how this script works, including an analysis of its memory requirements, see: https://alexmarquardt.com/2018/07/23/deduplicating-documents-in-elasticsearch/
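
The general idea of duplicate detection can be illustrated with a minimal sketch. This is not the script itself. It assumes that two documents count as duplicates when they hold identical values in a chosen set of fields, and it groups document IDs by a hash of those values. The function name `find_duplicates`, the field names `title` and `author`, and the sample documents are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict


def find_duplicates(docs, keys):
    """Group document IDs by a hash of the values found in `keys`.

    `docs` is an iterable of dicts shaped like Elasticsearch hits:
    {"_id": "...", "_source": {...}}. Returns only the groups that
    contain more than one ID.
    """
    ids_by_hash = defaultdict(list)
    for doc in docs:
        source = doc["_source"]
        # Join the values of the fields that define a duplicate,
        # using a separator that is unlikely to appear in the data.
        combined = "\x1f".join(str(source.get(key)) for key in keys)
        digest = hashlib.sha256(combined.encode("utf-8")).hexdigest()
        ids_by_hash[digest].append(doc["_id"])
    # A hash shared by two or more IDs marks a set of duplicates.
    return {h: ids for h, ids in ids_by_hash.items() if len(ids) > 1}


# Hypothetical sample data standing in for documents read from an index.
sample_hits = [
    {"_id": "1", "_source": {"title": "a", "author": "x"}},
    {"_id": "2", "_source": {"title": "a", "author": "x"}},
    {"_id": "3", "_source": {"title": "b", "author": "x"}},
]

duplicates = find_duplicates(sample_hits, ["title", "author"])
for ids in duplicates.values():
    print("Duplicate document IDs:", ids)
```

Against a live cluster, the documents would typically be streamed from the index (for example with `elasticsearch.helpers.scan` from the official Python client) rather than held in a list. Because only a hash and the matching IDs are kept per distinct combination of values, memory use grows with the number of documents, not with their size.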