Following up from #44468: I run a git hosting service which has, in the past few weeks, received elevated levels of traffic from Google-owned IP addresses performing git clones. The shape of the requests is something like this:
74.125.182.164 - - [23/Feb/2021:00:41:34 +0000] "GET /~yoink00/zaplog/info/refs?service=git-upload-pack HTTP/2.0" 200 553 "-" "git/2.30.0" "-"
74.125.182.161 - - [23/Feb/2021:00:41:34 +0000] "GET /~yoink00/zaplog/info/refs?service=git-upload-pack HTTP/2.0" 200 553 "-" "git/2.30.0" "-"
74.125.182.161 - - [23/Feb/2021:00:41:34 +0000] "POST /~yoink00/zaplog/git-upload-pack HTTP/2.0" 200 56 "-" "git/2.30.0" "-"
74.125.182.161 - - [23/Feb/2021:00:41:34 +0000] "POST /~yoink00/zaplog/git-upload-pack HTTP/2.0" 200 9434 "-" "git/2.30.0" "-"
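These lines follow git's smart-HTTP pattern. Each clone or fetch starts with a `GET .../info/refs?service=git-upload-pack` (the ref advertisement) and is followed by one or more `POST .../git-upload-pack` requests. Counting the `GET`s per second therefore gives a rough rate of clone or fetch attempts. The sketch below is a minimal version of that count. It assumes the log format is the combined log format plus one extra trailing quoted field, which is inferred from the sample lines. It aggregates across client IPs because the requests arrive from several addresses.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed log format: combined log format plus one extra trailing quoted
# field, inferred from the sample lines in this issue.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

# First request of a smart-HTTP clone or fetch: the ref advertisement.
REF_ADVERT_SUFFIX = "/info/refs?service=git-upload-pack"


def parse(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


def ref_adverts_per_second(lines):
    """Count ref-advertisement GETs per second, across all client IPs."""
    counts = Counter()
    for line in lines:
        entry = parse(line)
        if entry and entry["method"] == "GET" and entry["path"].endswith(REF_ADVERT_SUFFIX):
            counts[entry["ts"]] += 1
    return counts


SAMPLE = [
    '74.125.182.164 - - [23/Feb/2021:00:41:34 +0000] "GET /~yoink00/zaplog/info/refs?service=git-upload-pack HTTP/2.0" 200 553 "-" "git/2.30.0" "-"',
    '74.125.182.161 - - [23/Feb/2021:00:41:34 +0000] "GET /~yoink00/zaplog/info/refs?service=git-upload-pack HTTP/2.0" 200 553 "-" "git/2.30.0" "-"',
    '74.125.182.161 - - [23/Feb/2021:00:41:34 +0000] "POST /~yoink00/zaplog/git-upload-pack HTTP/2.0" 200 56 "-" "git/2.30.0" "-"',
    '74.125.182.161 - - [23/Feb/2021:00:41:34 +0000] "POST /~yoink00/zaplog/git-upload-pack HTTP/2.0" 200 9434 "-" "git/2.30.0" "-"',
]

counts = ref_adverts_per_second(SAMPLE)
for second, n in sorted(counts.items()):
    print(second.isoformat(), n)  # 2021-02-23T00:41:34+00:00 2
```

Because `git fetch` produces the same request pattern, this counts clones and fetches together.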
What is the purpose of this traffic?
If it's crawling, it should set an appropriate User-Agent and respect robots.txt. The traffic arrives at a rate I would not consider reasonable for a crawler, up to several times per second, and git clones are more expensive to serve than other HTTP requests.
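Python's standard-library `urllib.robotparser` makes it cheap for a crawler to check robots.txt before cloning. The sketch below is a minimal example of that check. The `ExampleCrawler/1.0` user agent, the robots.txt contents, and the fallback delay are all hypothetical and were chosen only for illustration; none of them describe the traffic above. The sketch tests the first URL a smart-HTTP clone requests and spaces clones using the non-standard but widely used `Crawl-delay` directive.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler identity; a descriptive User-Agent lets operators
# recognise the traffic and target it in robots.txt.
USER_AGENT = "ExampleCrawler/1.0"

# Hypothetical robots.txt served by a git host.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /~alice/
"""

# Assumed fallback pause between clones when robots.txt sets no
# Crawl-delay; this is an illustrative policy, not a standard value.
DEFAULT_DELAY_SECONDS = 5


def load_rules(text):
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def may_clone(rules, repo_url):
    """Check the first URL a smart-HTTP clone requests against robots.txt."""
    return rules.can_fetch(USER_AGENT, repo_url + "/info/refs?service=git-upload-pack")


def seconds_between_clones(rules):
    """Use the site's Crawl-delay if present, else the assumed fallback."""
    delay = rules.crawl_delay(USER_AGENT)
    return delay if delay is not None else DEFAULT_DELAY_SECONDS


rules = load_rules(ROBOTS_TXT)
print(may_clone(rules, "https://git.example.com/~bob/project"))    # True
print(may_clone(rules, "https://git.example.com/~alice/project"))  # False
print(seconds_between_clones(rules))                               # 10
```

`RobotFileParser` matches `Allow`/`Disallow` paths by plain prefix and does not expand `*` wildcards inside paths. Rules meant for it should therefore be written as path prefixes.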