Add BunnyCDN log format support for SciELO Brasil collection #54
base: main
@copilot Use the fixture available at https://github.com/pitangainnovare/scielo_usage_counter/blob/v0.6.1/tests/fixtures/usage.bunny.log for the solution and for part of the tests. Keep in mind that the repository at https://github.com/pitangainnovare/scielo_usage_counter/blob/v0.6.1 implemented a solution for a legacy system, so its patterns and related pieces can be reused if you judge that appropriate.
- Updated requirements.txt to use scielo_log_validator@0.5.1
- Added PATTERN_BUNNYCDN_LOG_FORMAT to values.py for pipe-delimited logs
- Created opac_bunnynet.py translator (delegates to OPAC translator)
- Updated log_handler.py to detect and parse bunnynet format
- Added Unix timestamp date handling in format_date()
- Enhanced match_with_best_pattern() to detect pipe-delimited format
- Modified parse_line() to handle bunnynet-specific fields
- Created test fixture usage.bunnynet.log with sample logs
- Added comprehensive tests in test_opac_bunnynet.py and test_log_handler.py
- All new tests passing (6/6)

Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
@copilot Look at this content formatted as bunnynet lines: MISS 200 1755473649 29321 4339610 185.29.10.0 - http://www.scielo.br/j/rbb/a/qvkmfPDpQk4zZfSnWXJHrVQ/?lang=pt SE Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0 5dc9f2b1416a10af31321b8aad30b8f4 SE
SciELO Brasil Apache logs now arrive in BunnyCDN format (pipe-delimited, Unix timestamps) for files dated after 2025-08-16. This PR adds parsing and translation support for the new format while maintaining backward compatibility with the existing formats.
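As a rough illustration of the format described here, the sketch below detects a pipe-delimited line by counting delimiters, splits it into the 12 fields named in this PR, and converts the Unix timestamp to a UTC datetime. It is not the code in log_handler.py: the function names, the dictionary keys, and the sample line (the log line quoted in the conversation, with pipe delimiters assumed between its fields) are illustrative assumptions.

```python
from datetime import datetime, timezone

# Field order follows the PR description:
# CACHE|STATUS|UNIX_TS|BYTES|ZONE|IP|REFERER|URL|GEO|UA|REQ_ID|GEO
# The key names are hypothetical, chosen only for this sketch.
FIELDS = (
    "cache", "status", "unix_ts", "bytes", "zone", "ip",
    "referer", "url", "geo", "user_agent", "request_id", "geo_2",
)

# Assumption: the line quoted in the conversation, re-joined with pipes.
SAMPLE = (
    "MISS|200|1755473649|29321|4339610|185.29.10.0|-|"
    "http://www.scielo.br/j/rbb/a/qvkmfPDpQk4zZfSnWXJHrVQ/?lang=pt|SE|"
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) "
    "Gecko/20100101 Firefox/111.0|5dc9f2b1416a10af31321b8aad30b8f4|SE"
)


def looks_like_bunnynet(line: str) -> bool:
    """Heuristic detection: 12 fields means exactly 11 pipe delimiters."""
    return line.count("|") == len(FIELDS) - 1


def parse_bunnynet_line(line: str) -> dict:
    """Split one line into named fields and normalise the numeric ones."""
    parts = line.rstrip("\n").split("|")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    record["status"] = int(record["status"])
    record["bytes"] = int(record["bytes"])
    # The timestamp is taken to be in seconds since the epoch, as in the sample.
    record["timestamp"] = datetime.fromtimestamp(
        int(record["unix_ts"]), tz=timezone.utc
    )
    return record


if __name__ == "__main__":
    if looks_like_bunnynet(SAMPLE):
        parsed = parse_bunnynet_line(SAMPLE)
        print(parsed["timestamp"].isoformat(), parsed["status"], parsed["url"])
```

Counting delimiters is cheap but fragile: a user agent or URL that itself contains a pipe would break both detection and splitting, so a stricter regular expression may be the safer choice in production code.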
Changes
Log Format Detection & Parsing
- Added PATTERN_BUNNYCDN_LOG_FORMAT to values.py - matches 12-field pipe-delimited format: CACHE|STATUS|UNIX_TS|BYTES|ZONE|IP|REFERER|URL|GEO|UA|REQ_ID|GEO
- Updated log_handler.py with auto-detection (counts pipe delimiters) and Unix timestamp parsing

URL Translation
- Created BunnynetOPACBridge translator in translator/opac_bunnynet.py
- Delegates to URLTranslatorOPACSite since URL structure is identical between formats

Testing
- Added tests/fixtures/usage.bunnynet.log with representative samples
- Added tests/translator/test_opac_bunnynet.py for translator validation

Dependencies
- Updated scielo_log_validator from 0.4.0 to 0.5.1

Example bunnynet log line:
Warning
Firewall rules blocked me from connecting to one or more addresses.
I tried to connect to the following addresses, but was blocked by firewall rules:
- docs.bunny.net
  /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (dns block)
- https://api.github.com/repos/pitangainnovare/scielo_usage_counter/git/trees/impl/add-bunny-format
  /usr/bin/curl curl -s REDACTED 53 -j ACCEPT (http block)
- https://api.github.com/repos/scieloorg/scielo_log_validator/contents/
  /usr/bin/curl curl -s REDACTED (http block)

If you need me to access, download, or install something from one of these locations, you can either: