Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PPL geoip function #781

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

PPL geoip function #781

wants to merge 1 commit into from

Conversation

salyh
Copy link
Contributor

@salyh salyh commented Oct 16, 2024

Description

PPL geoip function

Issues Resolved

#672

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Hendrik Saly <hendrik.saly@eliatra.com>
@salyh
Copy link
Contributor Author

salyh commented Oct 16, 2024

@YANG-DB @penghuo @lukasz-soszynski-eliatra @kt-eliatra @dr-lilienthal PPL geoip function syntax proposal

Please review and comment if needed

@YANG-DB
Copy link
Member

YANG-DB commented Oct 16, 2024

@salyh thanks
on my end looking good - only question is the datasource definition - how does the user creates a new datasource, can/should the datasource` be one of the following:

  • external API
  • table
  • file
    Lets add these capabilities ...

@YANG-DB
Copy link
Member

YANG-DB commented Oct 16, 2024

@LantaoJin @penghuo @dai-chen please add u'r comments here

@LantaoJin LantaoJin added Lang:PPL Pipe Processing Language support 0.6 labels Oct 17, 2024
@salyh
Copy link
Contributor Author

salyh commented Oct 17, 2024

@salyh thanks on my end looking good - only question is the datasource definition - how does the user creates a new datasource, can/should the datasource` be one of the following:

* external API

* table

* file
  Lets add these capabilities ...

@YANG-DB @LantaoJin @penghuo @dai-chen

datasource refers to the datasources provided by the ip2geo processor https://opensearch.org/docs/latest/ingest-pipelines/processors/ip2geo/

My idea was to leveage the already present capabilities of the ip2geo processor and extend it as necessary. Because it would avoid code duplication. When I read the docs of the processor correct, GeoLite2 is supported by now. I propose to add the above requested functionality to the processor. Then we try to use the code from the processor (as a library) to solve this issue. Otherwise we would reinvent the wheel.

And there is also https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/geoip/ in data prepper which could be facilitated

@YANG-DB YANG-DB self-requested a review October 17, 2024 21:58
@YANG-DB
Copy link
Member

YANG-DB commented Oct 17, 2024

@salyh thanks on my end looking good - only question is the datasource definition - how does the user creates a new datasource, can/should the datasource` be one of the following:

* external API

* table

* file
  Lets add these capabilities ...

@YANG-DB @LantaoJin @penghuo @dai-chen

datasource refers to the datasources provided by the ip2geo processor https://opensearch.org/docs/latest/ingest-pipelines/processors/ip2geo/

My idea was to leveage the already present capabilities of the ip2geo processor and extend it as necessary. Because it would avoid code duplication. When I read the docs of the processor correct, GeoLite2 is supported by now. I propose to add the above requested functionality to the processor. Then we try to use the code from the processor (as a library) to solve this issue. Otherwise we would reinvent the wheel.

And there is also https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/geoip/ in data prepper which could be facilitated

@salyh this sounds like a good idea !
@LantaoJin @penghuo - any comments ?

- See https://opensearch.org/docs/latest/ingest-pipelines/processors/ip2geo/


### New syntax definition in ANTLR
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a user doc, could we remove the ANTLR related information, what here needs is just a user readable syntax definition.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, the file is located ppl-lang/planning folder. I think we need an user doc instead of design doc. We can copy this design information to the PR description or issue description. And you can easily update your design without any code changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YANG-DB I did unterstand that the ppl-lang/planning folder is for design docs. When the design is approved it will be converted to a userdoc and moved out of the planning sub-folder. Am I wrong with that assumption?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was for roadmap. I prefer to using GitHub PR description and comments. It's fine to add design docs in code if you prefer to. But these docs are not for users, we need to write a new one later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@salyh yes - once design doc is approved we will move it with necessary updates to the regular commands folder

geoip function to add information about the geographical location of an IPv4 or IPv6 address

1. **Proposed syntax**
- `... | eval geoinfo = geoip([datasource,] ipAddress [,properties])`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does datasource mean? Could you use other meaningful usage rather that abc? abc is not straightforward user case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datasource refers to the datasources provided by the ip2geo processor, see #781 (comment)

@@ -781,6 +782,10 @@ coalesceFunctionName
: COALESCE
;

geoipFunction
: GEOIP LT_PRTHS (datasource = functionArg COMMA)? ipAddress = functionArg (COMMA properties = stringLiteral)? RT_PRTHS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datasource = functionArg

Any case for supporting functionArg instead of stringLiteral. Again, what does datasource mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no particular case for functionArg. We can change it to stringLiteral for sure

datasource refers to the datasources provided by the ip2geo processor, see #781 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.6 Lang:PPL Pipe Processing Language support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants