Skip to content

[BUG] geoip() function fails with IP type field reference #4468

@alexey-temnikov

Description

@alexey-temnikov

Query Information

PPL Command/Query:

source=opensearch_dashboards_sample_data_logs
| eval IPToCountry=geoip("my-datasource", clientip, "country_iso_code")

Expected Result:
The query should successfully execute and return geolocation data for each IP address in the clientip field, similar to when a literal IP string is provided.

Actual Result:

{
  "error": {
    "reason": "There was internal problem at backend",
    "details": "java.sql.SQLException: Error while preparing plan [LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])\n  LogicalProject(referer=[$0], request=[$1], agent=[$2], extension=[$3], memory=[$4], ip=[$5], index=[$6], message=[$7], url=[$8], tags=[$9], geo=[$10], @timestamp=[$15], utc_time=[$16], bytes=[$17], machine=[$18], response=[$21], clientip=[$22], host=[$23], event=[$24], phpmemory=[$26], timestamp=[$27], IPToCountry=[GEOIP('my-datasource':VARCHAR, $22, 'country_iso_code':VARCHAR)])\n    CalciteLogicalIndexScan(table=[[OpenSearch, opensearch_dashboards_sample_data_logs]])\n]",
    "type": "RuntimeException"
  },
  "status": 500
}

Dataset Information

Dataset/Schema Type

  • Custom (OpenSearch Dashboards sample data)

Index Mapping

{
  "mappings": {
    "properties": {
      "clientip": { "type": "ip" },
      "host": { 
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      },
      "geo": {
        "properties": {
          "coordinates": { "type": "geo_point" },
          "dest": { "type": "keyword" },
          "src": { "type": "keyword" }
        }
      }
    }
  }
}

Sample Data

{
  "clientip": "223.87.60.27",
  "host": "artifacts.opensearch.org",
  "geo": {
    "srcdest": "IN:US",
    "src": "IN",
    "dest": "US",
    "coordinates": {
      "lat": 39.41042861,
      "lon": -88.8454325
    }
  }
}

Bug Description

Issue Summary:
The geoip() function fails when passed a field reference of type ip as the second parameter, but works correctly when passed a literal string IP address. This appears to be a type compatibility issue in the Calcite query planner where the IP type is not being accepted by the function's operand type checker which expects SqlTypeFamily.CHARACTER.

Steps to Reproduce:

  1. Load the OpenSearch Dashboards sample data (opensearch_dashboards_sample_data_logs)
  2. Configure a geospatial datasource (e.g., "my-datasource")
  3. Execute the failing query:
source=opensearch_dashboards_sample_data_logs
| eval IPToCountry=geoip("my-datasource", clientip, "country_iso_code")
  1. Observe the 500 error with "Error while preparing plan"

Comparison with Working Query:

source=opensearch_dashboards_sample_data_logs
| eval IPToCountry=geoip("my-datasource", "50.68.18.229", "country_iso_code")

This query succeeds because the literal string "50.68.18.229" is of type VARCHAR/CHARACTER.

Impact:
This bug prevents users from using the geoip() function with IP-typed fields in their indices, which is the primary use case for geolocation enrichment in log analysis and security analytics scenarios. Users must resort to explicit type casting as a workaround.

Environment Information

OpenSearch Version: 3.3.0-SNAPSHOT

Additional Details:

  • The issue is specific to the Calcite-based query engine
  • The clientip field has OpenSearch mapping type ip which maps to ExprCoreType.IP
  • The geoip function's operand type checker in GeoIpFunction.java only accepts SqlTypeFamily.CHARACTER types

Tentative Root Cause Analysis

Preliminary Analysis - Requires Verification

The root cause appears to be in /opensearch/src/main/java/org/opensearch/sql/opensearch/functions/GeoIpFunction.java at the getOperandMetadata() method (lines 68-74):

@Override
public UDFOperandMetadata getOperandMetadata() {
  return UDFOperandMetadata.wrap(
      (CompositeOperandTypeChecker)
          OperandTypes.CHARACTER_CHARACTER.or(
              OperandTypes.family(
                  SqlTypeFamily.CHARACTER, SqlTypeFamily.CHARACTER, SqlTypeFamily.CHARACTER)));
}

The operand type checker restricts the second parameter to SqlTypeFamily.CHARACTER types only. However, when a field of OpenSearch type ip is referenced, it is represented as ExprIPType in Calcite, which extends ExprJavaType and is backed by the ExprIpValue Java class. This IP type does not belong to the CHARACTER type family, causing the query planner to fail during type validation.

The IP type in OpenSearch is defined as:

  • OpenSearch mapping: "ip"ExprCoreType.IP (defined as IP(STRING) in ExprCoreType)
  • Calcite representation: ExprIPType (a custom UDT extending ExprJavaType)

While ExprCoreType.IP has STRING as its base type, the Calcite type system doesn't automatically recognize ExprIPType as belonging to SqlTypeFamily.CHARACTER.

Tentative Proposed Fix

Tentative and Preliminary Solution - Requires Verification:

Modify the getOperandMetadata() method in GeoIpFunction.java to accept IP types in addition to CHARACTER types:

@Override
public UDFOperandMetadata getOperandMetadata() {
  return UDFOperandMetadata.wrap(
      (CompositeOperandTypeChecker)
          OperandTypes.CHARACTER_CHARACTER.or(
              OperandTypes.family(
                  SqlTypeFamily.CHARACTER, SqlTypeFamily.CHARACTER, SqlTypeFamily.CHARACTER))
          .or(
              OperandTypes.family(
                  SqlTypeFamily.CHARACTER, SqlTypeFamily.ANY, SqlTypeFamily.CHARACTER)));
}

Alternatively, implement a custom operand type checker that explicitly accepts both CHARACTER and IP types for the second parameter, similar to how other IP-related functions like cidrmatch might handle this.

The implementation should also ensure that the fetchIpEnrichment method can handle ExprIpValue objects and convert them to strings appropriately.

Workaround

Users can work around this issue by explicitly casting the IP field to STRING:

source=opensearch_dashboards_sample_data_logs
| eval ip_str=CAST(clientip AS STRING)
| eval IPToCountry=geoip("my-datasource", ip_str, "country_iso_code")

This workaround successfully executes and returns the expected geolocation data.

Metadata

Metadata

Assignees

Labels

PPLPiped processing languagebugSomething isn't working

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions