Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Filebeat apache access vhost #12778

Merged
merged 21 commits into from
Sep 5, 2019

Conversation

wixaw
Copy link
Contributor

@wixaw wixaw commented Jul 4, 2019

Hello

It's a PR for add vhost name in document :)

It's linked with this issue : #4859

++

@wixaw wixaw requested review from a team as code owners July 4, 2019 08:04
@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

1 similar comment
@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@wixaw wixaw changed the title asFilebeat apache access vhost Add Filebeat apache access vhost Jul 4, 2019
@wixaw
Copy link
Contributor Author

wixaw commented Aug 19, 2019

I think my PR has been forgotten? :) ( cc @jsoriano )

@jsoriano
Copy link
Member

@wixaw oops, sorry 😬 Let me take a look.

@jsoriano
Copy link
Member

ok to test

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wixaw for this PR, I have added some comments, let me know what you think.

There seems to be some conflicts with master, could you also update your branch?

@@ -4,6 +4,7 @@
"grok": {
"field": "message",
"patterns":[
"%{IPORHOST:http.vhost} %{IPORHOST:source.address} - %{DATA:user.name} \\[%{HTTPDATE:apache.access.time}\\] \"(?:%{WORD:http.request.method} %{DATA:url.original} HTTP/%{NUMBER:http.version}|-)?\" %{NUMBER:http.response.status_code:long} (?:%{NUMBER:http.response.body.bytes:long}|-)( \"%{DATA:http.request.referrer}\")?( \"%{DATA:user_agent.original}\")?",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To follow ECS, I think that the vhost should go to the source.domain.
And I wonder if in this case we can consider that the source address is always going to be an IP, then we could directly store it in source.ip.
Do you know of any case where both fields are domains?

Suggested change
"%{IPORHOST:http.vhost} %{IPORHOST:source.address} - %{DATA:user.name} \\[%{HTTPDATE:apache.access.time}\\] \"(?:%{WORD:http.request.method} %{DATA:url.original} HTTP/%{NUMBER:http.version}|-)?\" %{NUMBER:http.response.status_code:long} (?:%{NUMBER:http.response.body.bytes:long}|-)( \"%{DATA:http.request.referrer}\")?( \"%{DATA:user_agent.original}\")?",
"%{IPORHOST:source.domain} %{IPORHOST:source.ip} - %{DATA:user.name} \\[%{HTTPDATE:apache.access.time}\\] \"(?:%{WORD:http.request.method} %{DATA:url.original} HTTP/%{NUMBER:http.version}|-)?\" %{NUMBER:http.response.status_code:long} (?:%{NUMBER:http.response.body.bytes:long}|-)( \"%{DATA:http.request.referrer}\")?( \"%{DATA:user_agent.original}\")?",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the second field is never a domain

CHANGELOG.next.asciidoc Outdated Show resolved Hide resolved
filebeat/docs/modules/apache.asciidoc Show resolved Hide resolved
filebeat/module/apache/access/test/test.log Outdated Show resolved Hide resolved
@wixaw
Copy link
Contributor Author

wixaw commented Aug 19, 2019

Hummm
I'm sorry, I do not understand why Travis and Jenkins growl. Maybe my branch update is broken ?

@wixaw
Copy link
Contributor Author

wixaw commented Sep 4, 2019

Thanks,
It is not easy !
My two clusters (test and production) are in https. I can not find anything in the PR to establish this connection
( https://github.com/elastic/beats/pull/3214/files/d3a4a7eb5271b8096fdf3d691c20627c61ba6521 )

I wanted to use "make start" but i have this error :/

Step 3/11 : RUN wget https://beats-nightlies.s3.amazonaws.com/filebeat/filebeat-6.0.0-alpha1-SNAPSHOT-linux-x86_64.tar.gz -O filebeat.tar.gz
 ---> Running in f01f3080cd08

--2019-09-04 08:56:29--  https://beats-nightlies.s3.amazonaws.com/filebeat/filebeat-6.0.0-alpha1-SNAPSHOT-linux-x86_64.tar.gz
Resolving beats-nightlies.s3.amazonaws.com (beats-nightlies.s3.amazonaws.com)... 52.216.88.187
Connecting to beats-nightlies.s3.amazonaws.com (beats-nightlies.s3.amazonaws.com)|52.216.88.187|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-09-04 08:56:30 ERROR 404: Not Found.

we can see in "Dockerfile-filebeat" that it is not up to date

wixaw and others added 4 commits September 4, 2019 11:39
Co-Authored-By: Jaime Soriano Pastor <jaime.soriano@elastic.co>
Co-Authored-By: Jaime Soriano Pastor <jaime.soriano@elastic.co>
Co-Authored-By: Jaime Soriano Pastor <jaime.soriano@elastic.co>
@jsoriano
Copy link
Member

jsoriano commented Sep 4, 2019

It is not easy !

Yeah sorry, I agree that this could be more intuitive 😬

My two clusters (test and production) are in https. I can not find anything in the PR to establish this connection

Please don't use a real cluster for this, it can remove pipelines and other things, a local elasticsearch is enough, you can start it as you would start any other local elasticsearch.

If you want to use the same version as used in CI, an option to start it is using docker is to run this from the filebeat directory:

make start-environment

And then use docker inspect to get the IP of the elasticsearch container.

jsoriano
jsoriano previously approved these changes Sep 4, 2019
Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waiting for green

@Verdoso
Copy link

Verdoso commented Sep 4, 2019

A bit late to the party, but have you guys noticed that the name you are using for the new variable (the virtual host) is not really intuitive?
We will have now
source.domain -> This is not a domain, this the host name (one of them) of the SERVER
source.ip -> The IP of the CLIENT
source.address -> The host name, if available, of the CLIENT
So, all of the variables start with source. but the new one is a value from the server receiving the request, not the client generating the request as the other two. On top of that, the value is a host name, not a domain.
I fear the new name is going to confuse people more often than not.
My 2ec

@jsoriano
Copy link
Member

jsoriano commented Sep 4, 2019

@Verdoso thanks, actually this is a very valuable comment.

Regarding storing it as a hostname or as a domain, there have been many discussions about that in ECS, and at the end we use domain for most hostnames, except when identifying a host, that then host.name is used.

But it is true that using source for one of the server names is confusing

After some internal discussions we think that we should use destination.domain for these vhosts.

@wixaw would you like to make the change from source.domain to destination.domain?

@Verdoso
Copy link

Verdoso commented Sep 4, 2019

@jsoriano, thanks to you. If domain is used all over the place to mean hostnames, then at least it is consistent. destination.domain sounds fair enough.
Thanks!

@wixaw
Copy link
Contributor Author

wixaw commented Sep 5, 2019

Hello @jsoriano
I changed the field. :)
However, I still have a problem with the "tests/system/test_modules.py"
All tests are errored and in the list I do not have "test-vhost.log"

Exemple :

(python-env) 09:00:03 root@proxytest1:~/go/src/github.com/elastic/beats/filebeat# GENERATE=1 INTEGRATION_TESTS=1 ES_HOST=172.18.0.4 BEAT_STRICT_PERMS=false TESTING_FILEBEAT_MODULES=apache nosetests ./tests/system/test_modules.py
EEEEEEEE.
======================================================================
ERROR: test_fileset_file_0_apache (test_modules.Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/go/src/github.com/elastic/beats/filebeat/build/python-env/lib/python2.7/site-packages/parameterized/parameterized.py", line 392, in standalone_func
    return func(*(a + p.args), **p.kwargs)
  File "/root/go/src/github.com/elastic/beats/filebeat/tests/system/test_modules.py", line 106, in test_fileset_file
    cfgfile=cfgfile)
  File "/root/go/src/github.com/elastic/beats/filebeat/tests/system/test_modules.py", line 150, in run_on_file
    bufsize=0).wait()
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
-------------------- >> begin captured stdout << ---------------------
Using elasticsearch: http://172.18.0.4:9200
Testing apache/access on /root/go/src/github.com/elastic/beats/filebeat/tests/system/../../module/apache/access/test/darwin-2.4.23.log

@wixaw
Copy link
Contributor Author

wixaw commented Sep 5, 2019

I do not understand this, I do not see any suggestion of modification
image

@jsoriano
Copy link
Member

jsoriano commented Sep 5, 2019

OSError: [Errno 2] No such file or directory

Oh sorry 🤦‍♂️ I forgot to mention that you also need a test binary to execute. You can build it with go test -c, it will be built as filebeat.test.

My Waiting for green comment meant to wait for green CI builds before merging 🙂

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for renaming the field!

@jsoriano jsoriano added the v7.5.0 label Sep 5, 2019
@jsoriano
Copy link
Member

jsoriano commented Sep 5, 2019

Failures in CI are not related, merging this, thanks!

@jsoriano jsoriano merged commit 685f0ab into elastic:master Sep 5, 2019
@wixaw
Copy link
Contributor Author

wixaw commented Sep 5, 2019

yeah !!! thanks !

See you soon 😄

jsoriano added a commit that referenced this pull request Nov 20, 2019
It is common to modify the default log format in nginx to include
the http host used by the client to make the logged request.

This is similar to what we already did for Apache in #12778.

It also fixes an issue found on source address parsing when
source addresses are not IPs. For that, the pipeline works
with addresses as strings instead of IPs, and it only tries to
convert to IP to finally store the it in `source.ip` if it parses
as an IP.

Co-authored-by: poma <Semenov.Roman@mail.ru>
jsoriano added a commit to jsoriano/beats that referenced this pull request Nov 20, 2019
It is common to modify the default log format in nginx to include
the http host used by the client to make the logged request.

This is similar to what we already did for Apache in elastic#12778.

It also fixes an issue found on source address parsing when
source addresses are not IPs. For that, the pipeline works
with addresses as strings instead of IPs, and it only tries to
convert to IP to finally store the it in `source.ip` if it parses
as an IP.

Co-authored-by: poma <Semenov.Roman@mail.ru>
(cherry picked from commit 1967bc9)
jsoriano added a commit that referenced this pull request Nov 26, 2019
…ebeat module (#14654)

It is common to modify the default log format in nginx to include
the http host used by the client to make the logged request.

This is similar to what we already did for Apache in #12778.

It also fixes an issue found on source address parsing when
source addresses are not IPs. For that, the pipeline works
with addresses as strings instead of IPs, and it only tries to
convert to IP to finally store the it in `source.ip` if it parses
as an IP.

(cherry picked from commit 1967bc9)

Co-authored-by: poma <Semenov.Roman@mail.ru>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants