Commit
Merge branch 'main' into bump_example_compile_version
elasticmachine authored Sep 13, 2024
2 parents 59d1da8 + 86a88d7 commit 34fae4f
Showing 34 changed files with 1,162 additions and 216 deletions.
8 changes: 1 addition & 7 deletions .buildkite/pipelines/periodic.template.yml
Original file line number Diff line number Diff line change
@@ -46,7 +46,7 @@ steps:
matrix:
setup:
ES_RUNTIME_JAVA:
- openjdk17
- openjdk21
GRADLE_TASK:
- checkPart1
- checkPart2
@@ -88,10 +88,7 @@ steps:
matrix:
setup:
ES_RUNTIME_JAVA:
- graalvm-ce17
- openjdk17
- openjdk21
- openjdk22
- openjdk23
GRADLE_TASK:
- checkPart1
@@ -115,10 +112,7 @@ steps:
matrix:
setup:
ES_RUNTIME_JAVA:
- graalvm-ce17
- openjdk17
- openjdk21
- openjdk22
- openjdk23
BWC_VERSION: $BWC_LIST
agents:
8 changes: 1 addition & 7 deletions .buildkite/pipelines/periodic.yml
@@ -407,7 +407,7 @@ steps:
matrix:
setup:
ES_RUNTIME_JAVA:
- openjdk17
- openjdk21
GRADLE_TASK:
- checkPart1
- checkPart2
@@ -449,10 +449,7 @@ steps:
matrix:
setup:
ES_RUNTIME_JAVA:
- graalvm-ce17
- openjdk17
- openjdk21
- openjdk22
- openjdk23
GRADLE_TASK:
- checkPart1
@@ -476,10 +473,7 @@ steps:
matrix:
setup:
ES_RUNTIME_JAVA:
- graalvm-ce17
- openjdk17
- openjdk21
- openjdk22
- openjdk23
BWC_VERSION: ["8.15.2", "8.16.0", "9.0.0"]
agents:
@@ -32,6 +32,7 @@
"CRUD",
"Client",
"Cluster Coordination",
"Codec",
"Data streams",
"DLM",
"Discovery-Plugins",
5 changes: 5 additions & 0 deletions docs/changelog/111684.yaml
@@ -0,0 +1,5 @@
pr: 111684
summary: Write downloaded model parts async
area: Machine Learning
type: enhancement
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/112652.yaml
@@ -0,0 +1,5 @@
pr: 110399
summary: "[Inference API] alibabacloud ai search service support chunk infer to support semantic_text field"
area: Machine Learning
type: enhancement
issues: []
14 changes: 14 additions & 0 deletions docs/changelog/112665.yaml
@@ -0,0 +1,14 @@
pr: 112665
summary: Remove zstd feature flag for index codec best compression
area: Codec
type: enhancement
issues: []
highlight:
title: Enable ZStandard compression for indices with index.codec set to best_compression
body: |-
Previously, DEFLATE compression was used to compress stored fields in indices with the index.codec index setting
set to best_compression. With this change, ZStandard is used as the compression algorithm for stored fields in
indices with index.codec set to best_compression. Using ZStandard results in lower storage usage with similar
indexing throughput, depending on which options are used. Experiments with indexing logs have shown that
ZStandard offers ~12% lower storage usage and a ~14% higher indexing throughput compared to DEFLATE.
notable: true
5 changes: 5 additions & 0 deletions docs/changelog/112850.yaml
@@ -0,0 +1,5 @@
pr: 112850
summary: Fix synthetic source field names for multi-fields
area: Mapping
type: bug
issues: []
139 changes: 136 additions & 3 deletions docs/internal/DistributedArchitectureGuide.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/reference/ilm/actions/ilm-forcemerge.asciidoc
@@ -49,7 +49,7 @@ Number of segments to merge to. To fully merge the index, set to `1`.
`index_codec`::
(Optional, string)
Codec used to compress the document store. The only accepted value is
`best_compression`, which uses {wikipedia}/DEFLATE[DEFLATE] for a higher
`best_compression`, which uses {wikipedia}/Zstd[ZSTD] for a higher
compression ratio but slower stored fields performance. To use the default LZ4
codec, omit this argument.
+
12 changes: 7 additions & 5 deletions docs/reference/index-modules.asciidoc
@@ -76,14 +76,16 @@ breaking change].

The +default+ value compresses stored data with LZ4
compression, but this can be set to +best_compression+
which uses {wikipedia}/DEFLATE[DEFLATE] for a higher
compression ratio, at the expense of slower stored fields performance.
which uses {wikipedia}/Zstd[ZSTD] for a higher
compression ratio, at the expense of slower stored fields read performance.
If you are updating the compression type, the new one will be applied
after segments are merged. Segment merging can be forced using
<<indices-forcemerge,force merge>>. Experiments with indexing log datasets
have shown that `best_compression` gives up to ~18% lower storage usage in
the most ideal scenario compared to `default` while only minimally affecting
indexing throughput (~2%).
have shown that `best_compression` gives up to ~28% lower storage usage and
similar indexing throughput (sometimes a bit slower or faster, depending on the
other options used) compared to `default`, while increasing get-by-id latencies
by between ~10% and ~33%. The higher get-by-id latencies are not a concern for
many use cases, such as logging or metrics, since these don't rely on get-by-id
functionality (the Get APIs or searching by `_id`).
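To get a feel for the storage-vs-speed trade-off this paragraph describes, the sketch below compresses the same log-like payload at DEFLATE's fastest and best levels and compares output sizes. ZStandard itself requires a third-party binding (such as zstd-jni), so the JDK's built-in `Deflater` stands in here purely to illustrate the measurement idea; the numbers it prints are illustrative only and unrelated to the ~28% figure above.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Compress a repetitive, log-like payload at two DEFLATE levels and report
// the compressed sizes, illustrating the ratio-vs-speed trade-off.
final class CompressionTradeoff {
    static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length * 2 + 64];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf); // accumulate bytes produced per call
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        String line = "2024-09-13T12:00:00Z INFO shard [index1][0] merged 3 segments\n";
        byte[] payload = line.repeat(1000).getBytes(StandardCharsets.UTF_8);
        int fast = compressedSize(payload, Deflater.BEST_SPEED);
        int best = compressedSize(payload, Deflater.BEST_COMPRESSION);
        System.out.println("best_speed=" + fast + " best_compression=" + best);
    }
}
```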

[[index-mode-setting]] `index.mode`::
+
2 changes: 1 addition & 1 deletion docs/reference/query-dsl/knn-query.asciidoc
@@ -241,7 +241,7 @@ to <<nested-knn-search, top level nested kNN search>>:
* kNN search over nested dense_vectors diversifies the top results over
the top-level document
* `filter` over the top-level document metadata is supported and acts as a
post-filter
pre-filter
* `filter` over `nested` field metadata is not supported

A sample query can look like below:
@@ -9,7 +9,6 @@
char_filter:
- type: html_strip
escaped_tags: ["xxx", "yyy"]
read_ahead: 1024
- length: { tokens: 1 }
- match: { tokens.0.token: "\ntest<yyy>foo</yyy>\n" }

@@ -433,7 +433,7 @@ public MatchOnlyTextFieldType fieldType() {

@Override
protected SyntheticSourceSupport syntheticSourceSupport() {
var loader = new StringStoredFieldFieldLoader(fieldType().storedFieldNameForSyntheticSource(), leafName()) {
var loader = new StringStoredFieldFieldLoader(fieldType().storedFieldNameForSyntheticSource(), fieldType().name(), leafName()) {
@Override
protected void write(XContentBuilder b, Object value) throws IOException {
b.value((String) value);
@@ -582,7 +582,7 @@ protected void write(XContentBuilder b, Object value) throws IOException {

var kwd = TextFieldMapper.SyntheticSourceHelper.getKeywordFieldMapperForSyntheticSource(this);
if (kwd != null) {
return new SyntheticSourceSupport.Native(kwd.syntheticFieldLoader(leafName()));
return new SyntheticSourceSupport.Native(kwd.syntheticFieldLoader(fullPath(), leafName()));
}

return super.syntheticSourceSupport();
@@ -527,6 +527,9 @@ public String[] resolveNodes(String... nodes) {
* Returns the changes comparing this nodes to the provided nodes.
*/
public Delta delta(DiscoveryNodes other) {
if (this == other) {
return new Delta(this.masterNode, this.masterNode, localNodeId, List.of(), List.of());
}
final List<DiscoveryNode> removed = new ArrayList<>();
final List<DiscoveryNode> added = new ArrayList<>();
for (DiscoveryNode node : other) {
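The `this == other` fast path added to `delta` above skips building the added/removed lists entirely when a cluster-state update reuses the same `DiscoveryNodes` instance. A self-contained sketch of the same idea, using hypothetical `NodeSet`/`Delta` types (the real Elasticsearch classes also track master and local nodes):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for DiscoveryNodes.delta: compare two node-id sets and
// report which ids were added or removed since the previous set.
final class NodeSet {
    private final List<String> nodeIds;

    NodeSet(List<String> nodeIds) {
        this.nodeIds = List.copyOf(nodeIds);
    }

    record Delta(List<String> added, List<String> removed) {
        boolean hasChanges() {
            return added.isEmpty() == false || removed.isEmpty() == false;
        }
    }

    // "other" plays the role of the previous cluster state's nodes.
    Delta delta(NodeSet other) {
        // Fast path from the commit: an instance cannot differ from itself,
        // so avoid allocating and walking the lists altogether.
        if (this == other) {
            return new Delta(List.of(), List.of());
        }
        List<String> removed = new ArrayList<>();
        List<String> added = new ArrayList<>();
        for (String id : other.nodeIds) {
            if (this.nodeIds.contains(id) == false) {
                removed.add(id); // was present before, gone now
            }
        }
        for (String id : this.nodeIds) {
            if (other.nodeIds.contains(id) == false) {
                added.add(id); // present now, absent before
            }
        }
        return new Delta(added, removed);
    }
}
```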
@@ -53,15 +53,11 @@ public CodecService(@Nullable MapperService mapperService, BigArrays bigArrays)
}
codecs.put(LEGACY_DEFAULT_CODEC, legacyBestSpeedCodec);

codecs.put(
BEST_COMPRESSION_CODEC,
new PerFieldMapperCodec(Zstd814StoredFieldsFormat.Mode.BEST_COMPRESSION, mapperService, bigArrays)
);
Codec legacyBestCompressionCodec = new LegacyPerFieldMapperCodec(Lucene99Codec.Mode.BEST_COMPRESSION, mapperService, bigArrays);
if (ZSTD_STORED_FIELDS_FEATURE_FLAG.isEnabled()) {
codecs.put(
BEST_COMPRESSION_CODEC,
new PerFieldMapperCodec(Zstd814StoredFieldsFormat.Mode.BEST_COMPRESSION, mapperService, bigArrays)
);
} else {
codecs.put(BEST_COMPRESSION_CODEC, legacyBestCompressionCodec);
}
codecs.put(LEGACY_BEST_COMPRESSION_CODEC, legacyBestCompressionCodec);

codecs.put(LUCENE_DEFAULT_CODEC, Codec.getDefault());
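With the feature flag removed in the `CodecService` diff above, `best_compression` is now always bound to the Zstd-backed codec, while the DEFLATE-backed codec stays reachable under a legacy name. A toy sketch of the resulting registry shape (the String values are stand-ins for the real `Codec` instances):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the codec registry after this commit: no feature-flag branch,
// "best_compression" maps unconditionally to the Zstd-backed codec.
final class CodecRegistrySketch {
    static Map<String, String> buildRegistry() {
        Map<String, String> codecs = new HashMap<>();
        codecs.put("default", "lz4-stored-fields");
        codecs.put("best_compression", "zstd-stored-fields");           // always Zstd now
        codecs.put("legacy_best_compression", "deflate-stored-fields"); // opt-out path kept
        return codecs;
    }
}
```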
@@ -1037,13 +1037,13 @@ protected SyntheticSourceSupport syntheticSourceSupport() {
}

if (fieldType.stored() || hasDocValues) {
return new SyntheticSourceSupport.Native(syntheticFieldLoader(leafName()));
return new SyntheticSourceSupport.Native(syntheticFieldLoader(fullPath(), leafName()));
}

return super.syntheticSourceSupport();
}

public SourceLoader.SyntheticFieldLoader syntheticFieldLoader(String simpleName) {
public SourceLoader.SyntheticFieldLoader syntheticFieldLoader(String fullFieldName, String leafFieldName) {
assert fieldType.stored() || hasDocValues;

var layers = new ArrayList<CompositeSyntheticFieldLoader.Layer>();
@@ -1081,6 +1081,6 @@ protected void writeValue(Object value, XContentBuilder b) throws IOException {
});
}

return new CompositeSyntheticFieldLoader(simpleName, fullPath(), layers);
return new CompositeSyntheticFieldLoader(leafFieldName, fullFieldName, layers);
}
}
@@ -19,19 +19,25 @@
import static java.util.Collections.emptyList;

public abstract class StringStoredFieldFieldLoader implements SourceLoader.SyntheticFieldLoader {
private final String name;
private final String storedFieldLoaderName;
private final String fullName;
private final String simpleName;

private List<Object> values = emptyList();

public StringStoredFieldFieldLoader(String name, String simpleName) {
this.name = name;
public StringStoredFieldFieldLoader(String fullName, String simpleName) {
this(fullName, fullName, simpleName);
}

public StringStoredFieldFieldLoader(String storedFieldLoaderName, String fullName, String simpleName) {
this.storedFieldLoaderName = storedFieldLoaderName;
this.fullName = fullName;
this.simpleName = simpleName;
}

@Override
public final Stream<Map.Entry<String, StoredFieldLoader>> storedFieldLoaders() {
return Stream.of(Map.entry(name, newValues -> values = newValues));
return Stream.of(Map.entry(storedFieldLoaderName, newValues -> values = newValues));
}

@Override
@@ -72,6 +78,6 @@ public final DocValuesLoader docValuesLoader(LeafReader reader, int[] docIdsInLe

@Override
public String fieldName() {
return name;
return fullName;
}
}
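The new three-argument constructor above lets the stored-field lookup key diverge from the full field name reported for synthetic source, which is what the multi-field fix in this commit needs (a multi-field stores its values under a different name than the path it reports). A minimal holder illustrating the naming split (hypothetical class, not the real `StringStoredFieldFieldLoader`):

```java
// Three names, three roles, mirroring the constructor split in the diff.
final class LoaderNames {
    final String storedFieldLoaderName; // key into the stored-fields map
    final String fullName;              // dotted path returned by fieldName()
    final String simpleName;            // leaf name used when rendering output

    // Common case: load under the same name that is reported.
    LoaderNames(String fullName, String simpleName) {
        this(fullName, fullName, simpleName);
    }

    // Multi-field case: load under one name, report another.
    LoaderNames(String storedFieldLoaderName, String fullName, String simpleName) {
        this.storedFieldLoaderName = storedFieldLoaderName;
        this.fullName = fullName;
        this.simpleName = simpleName;
    }
}
```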
@@ -1462,7 +1462,7 @@ protected void write(XContentBuilder b, Object value) throws IOException {

var kwd = SyntheticSourceHelper.getKeywordFieldMapperForSyntheticSource(this);
if (kwd != null) {
return new SyntheticSourceSupport.Native(kwd.syntheticFieldLoader(leafName()));
return new SyntheticSourceSupport.Native(kwd.syntheticFieldLoader(fullPath(), leafName()));
}

return super.syntheticSourceSupport();
@@ -17,8 +17,6 @@
public class CodecIntegrationTests extends ESSingleNodeTestCase {

public void testCanConfigureLegacySettings() {
assumeTrue("Only when zstd_stored_fields feature flag is enabled", CodecService.ZSTD_STORED_FIELDS_FEATURE_FLAG.isEnabled());

createIndex("index1", Settings.builder().put("index.codec", "legacy_default").build());
var codec = client().admin().indices().prepareGetSettings("index1").execute().actionGet().getSetting("index1", "index.codec");
assertThat(codec, equalTo("legacy_default"));
@@ -29,8 +27,6 @@ public void testCanConfigureLegacySettings() {
}

public void testDefaultCodecLogsdb() {
assumeTrue("Only when zstd_stored_fields feature flag is enabled", CodecService.ZSTD_STORED_FIELDS_FEATURE_FLAG.isEnabled());

var indexService = createIndex("index1", Settings.builder().put("index.mode", "logsdb").build());
var storedFieldsFormat = (Zstd814StoredFieldsFormat) indexService.getShard(0)
.getEngineOrNull()
@@ -64,7 +64,6 @@ public void testDefault() throws Exception {
}

public void testBestCompression() throws Exception {
assumeTrue("Only when zstd_stored_fields feature flag is enabled", CodecService.ZSTD_STORED_FIELDS_FEATURE_FLAG.isEnabled());
Codec codec = createCodecService().codec("best_compression");
assertEquals(
"Zstd814StoredFieldsFormat(compressionMode=ZSTD(level=3), chunkSize=245760, maxDocsPerChunk=2048, blockShift=10)",
@@ -24,6 +24,7 @@
import org.elasticsearch.inference.SimilarityMeasure;
import org.elasticsearch.inference.TaskType;
import org.elasticsearch.rest.RestStatus;
import org.elasticsearch.xpack.inference.chunking.EmbeddingRequestChunker;
import org.elasticsearch.xpack.inference.external.action.alibabacloudsearch.AlibabaCloudSearchActionCreator;
import org.elasticsearch.xpack.inference.external.http.sender.DocumentsOnlyInput;
import org.elasticsearch.xpack.inference.external.http.sender.HttpRequestSender;
@@ -49,6 +50,7 @@
import static org.elasticsearch.xpack.inference.services.ServiceUtils.removeFromMapOrDefaultEmpty;
import static org.elasticsearch.xpack.inference.services.ServiceUtils.removeFromMapOrThrowIfNull;
import static org.elasticsearch.xpack.inference.services.ServiceUtils.throwIfNotEmptyMap;
import static org.elasticsearch.xpack.inference.services.alibabacloudsearch.AlibabaCloudSearchServiceFields.EMBEDDING_MAX_BATCH_SIZE;

public class AlibabaCloudSearchService extends SenderService {
public static final String NAME = AlibabaCloudSearchUtils.SERVICE_NAME;
@@ -253,7 +255,20 @@ protected void doChunkedInfer(
TimeValue timeout,
ActionListener<List<ChunkedInferenceServiceResults>> listener
) {
listener.onFailure(new ElasticsearchStatusException("Chunking not supported by the {} service", RestStatus.BAD_REQUEST, NAME));
if (model instanceof AlibabaCloudSearchModel == false) {
listener.onFailure(createInvalidModelException(model));
return;
}

AlibabaCloudSearchModel alibabaCloudSearchModel = (AlibabaCloudSearchModel) model;
var actionCreator = new AlibabaCloudSearchActionCreator(getSender(), getServiceComponents());

var batchedRequests = new EmbeddingRequestChunker(input, EMBEDDING_MAX_BATCH_SIZE, EmbeddingRequestChunker.EmbeddingType.FLOAT)
.batchRequestsWithListeners(listener);
for (var request : batchedRequests) {
var action = alibabaCloudSearchModel.accept(actionCreator, taskSettings, inputType);
action.execute(new DocumentsOnlyInput(request.batch().inputs()), timeout, request.listener());
}
}

/**
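The rewritten `doChunkedInfer` above routes inputs through an `EmbeddingRequestChunker` instead of rejecting chunked inference outright. A minimal sketch of the batching step it relies on, with a hypothetical `chunk` helper (not the Elasticsearch API), assuming a fixed batch size like the `EMBEDDING_MAX_BATCH_SIZE` of 32 introduced in this commit:

```java
import java.util.ArrayList;
import java.util.List;

// Split inference inputs into fixed-size batches, the way the chunker groups
// inputs before dispatching one request (and listener) per batch.
final class BatchingSketch {
    static List<List<String>> chunk(List<String> inputs, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int from = 0; from < inputs.size(); from += maxBatchSize) {
            int to = Math.min(from + maxBatchSize, inputs.size());
            batches.add(List.copyOf(inputs.subList(from, to)));
        }
        return batches;
    }
}
```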
@@ -0,0 +1,15 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0; you may not use this file except in compliance with the Elastic License
* 2.0.
*/

package org.elasticsearch.xpack.inference.services.alibabacloudsearch;

public class AlibabaCloudSearchServiceFields {
/**
* Taken from https://help.aliyun.com/zh/open-search/search-platform/developer-reference/text-embedding-api-details
*/
static final int EMBEDDING_MAX_BATCH_SIZE = 32;
}