Copilot AI commented Feb 6, 2026

Extends URL translation pipeline to support SciELO Books platform logs for COUNTER R5 metrics computation. Implementation based on actual production log samples (Apache and Bunnynet formats). All code uses English naming conventions for methods, variables, and comments.

Implementation

New translator class (translator/books.py)

  • URLTranslatorBooksSite handles book and chapter URL patterns from production logs
  • Generates PIDs in format BOOK:{id}/CHAPTER:{id} for hierarchical content
  • Uses DEFAULT_SCIELO_ISSN (0000-0000) since books lack ISSNs
  • Intelligent chapter extraction from PDF filenames (e.g., author-ISBN-18.pdf → chapter 18)
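The PID format above can be illustrated with a minimal sketch. The helper name `build_book_pid` is hypothetical, and upper-casing the book ID is an assumption inferred from the sample output in this description (`BOOK:Y742K/CHAPTER:18`), not a confirmed rule of the translator.

```python
from typing import Optional


def build_book_pid(book_id: str, chapter_id: Optional[str] = None) -> str:
    """Build a hierarchical PID such as BOOK:Y742K/CHAPTER:18.

    Assumption: the book ID is upper-cased, as in the sample output of
    this PR description. The chapter part is omitted for whole-book hits.
    """
    pid = f"BOOK:{book_id.upper()}"
    if chapter_id:
        pid += f"/CHAPTER:{chapter_id}"
    return pid


print(build_book_pid("y742k", "18"))  # BOOK:Y742K/CHAPTER:18
print(build_book_pid("82r9t"))        # BOOK:82R9T
```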

Supported URL patterns (from production logs):

/id/{book_id}                              → abstract (investigations only)
/id/{book_id}/{chapter_number}             → full_text (requests + investigations)
/id/{book_id}/pdf/{filename}.pdf           → full_text (requests + investigations)

Examples from actual logs:

  • /id/q7gtd - Book landing page
  • /id/vdywc/03 - Chapter 03 page
  • /id/y742k/pdf/magalhaes-9788578791889-18.pdf - Chapter 18 PDF
  • /id/82r9t/pdf/sadek-9788579820342.pdf - Full book PDF (no chapter)
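As a rough illustration of how the three URL shapes could be told apart, here is a minimal sketch. The regexes and the `classify_books_path` helper are hypothetical stand-ins for `PATTERNS_BOOKS_SITE`, and the `html` media format for non-PDF pages is an assumption.

```python
import re
from typing import Optional

# Hypothetical regexes; the patterns in url_translator.py may differ.
BOOK_PDF = re.compile(r"^/id/(?P<book_id>[a-z0-9]+)/pdf/(?P<filename>[^/]+\.pdf)$")
BOOK_CHAPTER = re.compile(r"^/id/(?P<book_id>[a-z0-9]+)/(?P<chapter>\d+)$")
BOOK_LANDING = re.compile(r"^/id/(?P<book_id>[a-z0-9]+)$")

# (regex, content_type, media_format); "html" is an assumed label.
RULES = (
    (BOOK_PDF, "full_text", "pdf"),
    (BOOK_CHAPTER, "full_text", "html"),
    (BOOK_LANDING, "abstract", "html"),
)


def classify_books_path(path: str) -> Optional[dict]:
    """Return the captured groups plus content/media type, or None if no rule matches."""
    for regex, content_type, media_format in RULES:
        match = regex.match(path)
        if match:
            return {
                **match.groupdict(),
                "content_type": content_type,
                "media_format": media_format,
            }
    return None


for path in ("/id/q7gtd", "/id/vdywc/03", "/favicon.ico"):
    print(path, classify_books_path(path))
```

Paths that match none of the rules, such as `/favicon.ico` or `/wp-content/...`, return `None` and would be left to other translators or discarded.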

Integration (url_translator.py)

  • Added PATTERNS_BOOKS_SITE with 3 regex patterns matching production URLs
  • Prioritized before Classic Site patterns to avoid /pdf/ conflicts
  • Registered in translator identification pipeline alongside Classic, OPAC, Preprints, and Dataverse
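Why ordering matters can be shown with a small sketch of first-match dispatch. Both regexes below are illustrative stand-ins (the real Classic Site pattern is not shown in this PR), and `identify_site` is a hypothetical helper rather than the pipeline's actual function.

```python
import re

# Illustrative stand-ins only; not the repository's real patterns.
HYPOTHETICAL_BOOKS_PDF = re.compile(r"^/id/[a-z0-9]+/pdf/[^/]+\.pdf$")
HYPOTHETICAL_CLASSIC_PDF = re.compile(r"/pdf/")  # deliberately broad


def identify_site(path, ordered_patterns):
    """Return the label of the first pattern that matches, else None."""
    for label, regex in ordered_patterns:
        if regex.search(path):
            return label
    return None


BOOKS_FIRST = [("books", HYPOTHETICAL_BOOKS_PDF), ("classic", HYPOTHETICAL_CLASSIC_PDF)]
CLASSIC_FIRST = list(reversed(BOOKS_FIRST))

path = "/id/y742k/pdf/magalhaes-9788578791889-18.pdf"
print(identify_site(path, BOOKS_FIRST))    # books
print(identify_site(path, CLASSIC_FIRST))  # classic
```

With a broad `/pdf/` rule evaluated first, a Books PDF would be attributed to the wrong site, which is the conflict the prioritization is meant to avoid.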

Tests

  • 24 tests total (13 translator + 2 basic COUNTER + 9 comprehensive access counting)
  • Test coverage: ID extraction, PID generation, format detection, content classification, PDF chapter extraction
  • Updated fixtures with actual Apache log entries from production
  • New comprehensive access counting tests demonstrating real-world scenarios

Smart Chapter Detection

  • Extracts chapter numbers from PDF filenames using pattern: author-ISBN-CHAPTER.pdf
  • Handles both chapter-specific PDFs and full book PDFs
  • Examples:
    • magalhaes-9788578791889-18.pdf → Book: y742k, Chapter: 18
    • sadek-9788579820342.pdf → Book: 82r9t, Chapter: None (full book)
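A minimal sketch of the filename rule, assuming the ISBN is always a 13-digit run and the chapter, when present, is a trailing group of digits. The regex and the `extract_chapter` name are hypothetical and not taken from `translator/books.py`.

```python
import re
from typing import Optional

# Assumption: author-ISBN13[-CHAPTER].pdf, as in the production samples.
CHAPTER_IN_FILENAME = re.compile(r"-(?P<isbn>\d{13})(?:-(?P<chapter>\d+))?\.pdf$")


def extract_chapter(filename: str) -> Optional[str]:
    """Return the chapter number as a string, or None for a full-book PDF."""
    match = CHAPTER_IN_FILENAME.search(filename)
    return match.group("chapter") if match else None


print(extract_chapter("magalhaes-9788578791889-18.pdf"))  # 18
print(extract_chapter("sadek-9788579820342.pdf"))         # None
```

The chapter is kept as a string so that zero-padded values such as `01` survive unchanged.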

Comprehensive Access Counting Tests

New test class TestBooksAccessCounting with 9 concrete scenarios:

  1. Book Landing Page - Investigation only (no Request)
  2. Chapter HTML Page - Both Request and Investigation
  3. Chapter PDF Download - Both Request and Investigation
  4. Full Book PDF - Both Request and Investigation
  5. Multiple Accesses Same Book - Unique vs Total counting
  6. Double-Click Deduplication - 30-second rule (valid clicks)
  7. Rapid Click Filtering - 30-second rule (filtered clicks)
  8. Different Chapters Separate - Independent tracking per chapter
  9. Real-World Mixed Accesses - Multiple users, books, and chapters

Demonstration Script (tests/test_books_access_examples.py):

  • Interactive script showing 6 concrete examples with real URLs
  • Visual output demonstrating how each access is counted
  • Clear distinction between Requests (full-text) and Investigations (metadata)
  • Shows COUNTER R5 deduplication rules in action
  • Summary of counting rules

Run demo: PYTHONPATH=. python tests/test_books_access_examples.py

Access Counting Rules

INVESTIGATIONS ONLY (metadata/abstract view):

  • Book landing page: /id/{book_id} → 0 Requests, 1 Investigation

REQUESTS + INVESTIGATIONS (full-text access):

  • Chapter HTML page: /id/{book_id}/{chapter_number} → 1 Request, 1 Investigation
  • Chapter PDF: /id/{book_id}/pdf/filename-{chapter}.pdf → 1 Request, 1 Investigation
  • Full book PDF: /id/{book_id}/pdf/filename.pdf → 1 Request, 1 Investigation
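The two rule groups above reduce to a small mapping from content type to metric increments. The sketch below uses the `abstract` and `full_text` labels from this description; the function name `count_increments` is hypothetical.

```python
def count_increments(content_type: str) -> dict:
    """Map a content type to the Request/Investigation increments for one hit.

    Every counted hit is an Investigation; only full-text hits are also Requests.
    Unknown content types contribute nothing.
    """
    investigation = 1 if content_type in ("abstract", "full_text") else 0
    request = 1 if content_type == "full_text" else 0
    return {"requests": request, "investigations": investigation}


print(count_increments("abstract"))   # {'requests': 0, 'investigations': 1}
print(count_increments("full_text"))  # {'requests': 1, 'investigations': 1}
```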

DEDUPLICATION:

  • Clicks within 30 seconds are filtered (COUNTER R5 rule)
  • Each book/chapter combination tracked separately
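The 30-second rule can be sketched as follows. This is a simplified illustration, not the logic in `counter.py`: the user key, the tuple layout, and the choice to compare each click with the previous click on the same item are all assumptions.

```python
from collections import defaultdict

DOUBLE_CLICK_WINDOW = 30  # seconds


def count_after_double_click_filter(hits):
    """Count hits per (user, pid), collapsing clicks within 30 s of the previous one.

    hits: iterable of (user_key, pid, unix_timestamp) tuples.
    """
    last_seen = {}
    counts = defaultdict(int)
    for user, pid, ts in sorted(hits, key=lambda hit: hit[2]):
        key = (user, pid)
        previous = last_seen.get(key)
        if previous is None or ts - previous > DOUBLE_CLICK_WINDOW:
            counts[key] += 1
        last_seen[key] = ts  # the latest click becomes the new reference point
    return dict(counts)


CH18 = "BOOK:Y742K/CHAPTER:18"
CH19 = "BOOK:Y742K/CHAPTER:19"
hits = [("u1", CH18, 0), ("u1", CH18, 10), ("u1", CH19, 12), ("u1", CH18, 45)]
counts = count_after_double_click_filter(hits)
print(counts[("u1", CH18)])  # 2
print(counts[("u1", CH19)])  # 1
```

The click at 10 s is dropped as a double-click of the one at 0 s, the click at 45 s is counted because it is 35 s after the previous one, and chapter 19 is tracked independently of chapter 18.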

Log Format Support

Apache logs: Standard NCSA format (already supported by log_handler)
Bunnynet logs: Pipe-delimited format can be handled when PR #54 is merged
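For orientation, a pipe-delimited line of the kind shared in this conversation could be split as sketched below. The field names are guesses inferred from the sample lines, not from Bunnynet documentation or PR #54, and `parse_bunnynet_line` is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Guessed names for the nine fields that precede the user agent.
FIELDS_BEFORE_UA = (
    "cache_status", "status", "timestamp", "bytes_sent",
    "zone_id", "client_ip", "referer", "url", "edge_location",
)


def parse_bunnynet_line(line):
    """Split one pipe-delimited line into a dict, or return None if it is too short."""
    parts = line.rstrip("\n").split("|")
    if len(parts) < 12:
        return None
    record = dict(zip(FIELDS_BEFORE_UA, parts[:9]))
    # Re-join the middle in case the user agent itself contains a pipe.
    record["user_agent"] = "|".join(parts[9:-2])
    record["request_id"], record["country"] = parts[-2:]
    # The translator works on paths, so strip the scheme and host.
    record["path"] = urlsplit(record["url"]).path
    return record


sample = (
    "MISS|200|1767556636|23841|4384504|2001:41d0:203:9596::|-|"
    "http://books.scielo.org/id/96y5f|FR|GuzzleHttp/7|"
    "6798322ca8f8354b263cb02b03fb60c6|FR"
)
print(parse_bunnynet_line(sample)["path"])  # /id/96y5f
```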

Example Usage

from scielo_usage_counter.url_translator import URLTranslationManager

tm = URLTranslationManager(journals_metadata, articles_metadata)
result = tm.translate('/id/y742k/pdf/magalhaes-9788578791889-18.pdf')

# Returns:
# {
#   'pid_generic': 'BOOK:Y742K/CHAPTER:18',
#   'book_id': 'y742k',
#   'chapter_id': '18',
#   'content_type': 'full_text',
#   'media_format': 'pdf',
#   'scielo_issn': '0000-0000',
#   ...
# }
Original prompt

This section details the original issue you should resolve

<issue_title>Add support for logs originating from SciELO Livros</issue_title>
<issue_description>### Description of the new feature
The SciELO Livros site will have its accesses counted using the COUNTER R5 approach. This repository reads Apache log files (or Bunnynet ones, which is a separate issue being resolved) and turns them into countable units. Part of the conversion strategy is to use translate classes, which currently support logs originating from SciELO Classic, the new SciELO (OPAC), the new SciELO in alpha version (SciELO Saúde Pública), SciELO Preprints, and SciELO Data.

Acceptance criteria

  • Implement the translate class for SciELO Livros
  • Create tests for the new translate class
  • Adopt official fixtures (SciELO Livros logs) to be shared by user @pitangainnovare
  • Create SciELO Livros-specific tests for the COUNTER calculator, in counter.py

Attachments

  • The main module is the file url_translator.py
  • The translation classes for the other sites are in translate
  • Utility modules must be reused and are in utils

References

N/A</issue_description>

<agent_instructions>Follow coding best practices. Generate tests for all created methods using official fixtures. Add clear, concise English docstrings for every generated function or class. Keep code concise, non-repetitive, and well-organized. Match the style of the existing repository unless it conflicts with best practices.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



@pitangainnovare
Contributor

@copilot, I will provide fixtures in the format of SciELO Books logs. You will likely need to wait for these resources, but you can start working with the knowledge you already have. Crawling the SciELO Books site and understanding which links should be classified as Item Request or Item Investigation is a good task. The goal is to count accesses for books and book chapters according to COUNTER R5 (Unique Item Requests, Unique Item Investigations, Total Item Requests, and Total Item Investigations). Each URL in the logs should therefore be mapped to a specific Book or Chapter object and, depending on the page, marked as Request or Investigation. See https://cop5.countermetrics.org/en/5.0.3/03-specifications/03-counter-report-common-attributes-and-elements.html for more information.

@pitangainnovare
Contributor

@copilot Use English for method names, variable names, and comments.

Copilot AI and others added 3 commits February 6, 2026 01:01
Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
…arity

Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
Copilot AI changed the title [WIP] Add support for logs from SciELO Livros Add SciELO Livros (Books) translator for COUNTER R5 metrics Feb 6, 2026
Copilot AI requested a review from pitangainnovare February 6, 2026 01:13
Author

Copilot AI commented Feb 6, 2026

@copilot Use English for method names, variable names, and comments.

Understood. I'll rename all Portuguese terms to English. This includes:

  • Class name: URLTranslatorLivrosSite → URLTranslatorBooksSite
  • Pattern names: REGEX_LIVROS_SITE_* → REGEX_BOOKS_SITE_* and PATTERNS_LIVROS_SITE → PATTERNS_BOOKS_SITE
  • File name: livros.py → books.py
  • Test class: TestTranslatorLivros → TestTranslatorBooks
  • Test file: test_livros.py → test_books.py

Will keep "SciELO Livros" only in comments where referring to the actual product name.

@copilot, I will provide fixtures in the format of SciELO Books logs. You will likely need to wait for these resources, but you can start working with the knowledge you already have. Crawling the SciE...

The implementation follows COUNTER R5 specifications for Item Requests and Item Investigations:

Item Investigations (abstract content):

  • Book landing pages: /b/{book_id}, /book/{book_id}

Item Requests (full-text content):

  • Book chapters: /c/{book_id}/{chapter_id}, /chapter/{book_id}/{chapter_id}
  • PDF downloads: /pdf/{book_id}[/{chapter_id}]
  • EPUB downloads: /epub/{book_id}
  • Generic downloads: /download/{book_id}[/{chapter_id}]

Each URL is mapped to a book or chapter PID (e.g., BOOK:BOOK001 or BOOK:BOOK001/CHAPTER:CHAP01). The translator computes all four metrics (Unique/Total Item Requests and Unique/Total Item Investigations) according to COUNTER R5 standards.

Ready to integrate official fixtures when available. Commit: 2fd6f0e

Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
Copilot AI changed the title Add SciELO Livros (Books) translator for COUNTER R5 metrics Add SciELO Books translator for COUNTER R5 metrics Feb 6, 2026
@pitangainnovare
Contributor

@copilot I have log samples from SciELO Books. They come in two formats: Bunnynet and Apache. Sample lines from both follow:

Lines in Bunnynet format
MISS|200|1767562641|3975|4384504|2804:52a4:780:f03b::|https://books.scielo.org/id/vdywc/03|https://books.scielo.org/id/vdywc/03|BR|Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36|3cfdeef1939e1a9edfd5d6b945446f6c|BR
MISS|200|1767565867|8800423|4384504|23.97.109.0|-|https://books.scielo.org/id/47zrm/pdf/torres-9789978106815.pdf|BR|Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot|78b7cce6e3df9aa1623d1a2b28c70ef1|BR
MISS|200|1767558028|947834|4384504|66.249.74.0|-|https://books.scielo.org/id/bm9qp/pdf/machado-9788575412695-12.pdf|IL|Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.7390.122 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)|f3e494cbc0957f0a6703ae364ed357d4|US
MISS|200|1767558027|3599037|4384504|66.249.74.0|-|https://books.scielo.org/id/3hwrh/pdf/lourenco-9788570785169-04.pdf|IL|Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.7390.122 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)|5242a66af8f4f610f299723fd1d5a0b5|US
HIT|200|1767558021|300188|4384504|66.249.74.0|-|https://books.scielo.org/id/rmgfg/pdf/massi-9788568334560-05.pdf|IL|Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.7390.122 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)|0ddf1196ae1361dea4fc90dda91a2532|US
MISS|200|1767560235|807|4384504|3.18.169.0|-|http://books.scielo.org/id/mc9xp/pdf/tapias-9789587385434-23.pdf|IL|axios/1.13.2|ffa407e14619f09babcc42a0538868ee|US
HIT|302|1767549011|789|4384504|191.235.99.0|-|https://books.scielo.org/id/mqfvk/pdf/leote-9788568334652.pdf|BR|Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot|73f879053b3271e9b2a27530543f5314|BR
MISS|200|1767545900|315173|4384504|23.97.109.0|-|https://books.scielo.org/id/6pdyn/pdf/sousa-9788578791247-04.pdf|BR|Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot|0347b8a2b5e019b430e0696054ef0bf0|BR
HIT|302|1767545899|792|4384504|23.97.109.0|-|https://books.scielo.org/id/qfgmr/pdf/bonilla-9788523212063.pdf|BR|Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot|539fcd67cf3e38f1848d00b3188505ad|BR
HIT|200|1767566982|1086|4384504|179.106.178.0|https://books.scielo.org/id/pz254/pdf/rabelo-9788575412664-06.pdf|https://books.scielo.org/favicon.ico|BR|Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36|2e2d9472bf8c3ae515af6dc0fb3bf865|BR
MISS|200|1767566982|1259781|4384504|179.106.178.0|https://www.google.com/|https://books.scielo.org/id/pz254/pdf/rabelo-9788575412664-06.pdf|BR|Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36|5580ebfdaacd2a024a0eabd49c4b3266|BR
HIT|200|1767548728|19103|4384504|64.251.196.0|-|http://books.scielo.org/|IL|Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36|30ea19ffaebfb5e9cdd7ee6d36537f4f|US
MISS|200|1767567550|899360|4384504|72.208.151.0|-|https://books.scielo.org/id/5dcq3/pdf/alavarce-9788579830259-04.pdf|LA|Mozilla/5.0 (Linux; Android 3.2.2) AppleWebKit/532.1 (KHTML, like Gecko) Chrome/32.0.835.0 Safari/532.1|367006231d920aae0776578f244b2e3c|US
MISS|200|1767551051|4110503|4384504|74.7.36.0|-|https://books.scielo.org/id/n6jhb/pdf/abello-9789587845044-04.pdf|IL|Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot|ca2d89bbc8abdf76667f7c4035817a4d|US
MISS|200|1767553401|10516437|4384504|191.235.66.0|-|https://books.scielo.org/id/7f83x/pdf/porto-9788575415993-11.pdf|BR|Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot|bdcc91624064af605863b5fddee39e06|BR
MISS|200|1767544058|4333|4384504|52.167.144.0|-|https://books.scielo.org/id/mj4jm/11|LA|Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)|00fdbf4838971fb972353c3196e046dd|US
MISS|200|1767548911|10848130|4384504|51.8.155.0|-|https://books.scielo.org/id/5pk6k/pdf/salgado-9789978106822-17.pdf|IL|Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot|5723b10283301bd33307ddecc104402a|US
MISS|200|1767548910|10864874|4384504|51.8.155.0|-|https://books.scielo.org/id/89vk3/pdf/cabrera-9786287548633.pdf|IL|Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot|84dc173b2a05dac986fffd102217f4d6|US
MISS|200|1767548910|4044384|4384504|51.8.155.0|-|https://books.scielo.org/id/955tc/pdf/cortes-9786289572506.pdf|IL|Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot|d094072d677946054fd7fc8327e1a197|US
HIT|200|1767548908|255580|4384504|66.249.74.0|-|https://books.scielo.org/id/bywgd/pdf/mazzola-9788579836718-05.pdf|IL|Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.7390.122 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)|829e576dc04951b8c3376eb6e9aa041b|US
HIT|200|1767548908|8740453|4384504|51.8.155.0|-|https://books.scielo.org/id/w9k2y/pdf/behar-9786287501751.pdf|IL|Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot|055508b7860ddd82745d617cd0622de3|US
MISS|200|1767565463|2755803|4384504|80.79.6.0|-|https://books.scielo.org/id/mj4jm/pdf/rocha-9786559543830.pdf|FR|Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.1958|eafd37ac1993ecde4a4cd79aea0c928c|NL
MISS|200|1767551694|22121|4384504|169.150.236.0|http://books.scielo.org/|https://books.scielo.org/wp-content/uploads/2025/05/eugenia_site_pt-800x418.png|IL|Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36|ee2f362f3bf8dd5e052d18ed7e4b4a24|US
MISS|200|1767556636|23841|4384504|2001:41d0:203:9596::|-|http://books.scielo.org/id/96y5f|FR|GuzzleHttp/7|6798322ca8f8354b263cb02b03fb60c6|FR
MISS|200|1767565701|18745|4384504|15.229.52.0|-|http://books.scielo.org/|BR|Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.67 Safari/537.36|6cdb8795da6a22e15738f128891c6d46|BR
MISS|302|1767562905|862|4384504|191.235.66.0|-|https://books.scielo.org/id/8xvf4/pdf/almeida-9788574554402.pdf|BR|Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot|a11badde28767360be734f5e4117f228|BR
HIT|200|1767562905|2132|4384504|45.234.65.0|https://blog.scielo.org/|https://books.scielo.org/wp-content/uploads/2024/09/mastodon_26x26.png|BR|Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36|4f2f8dc3f528be56c511f78e24484676|BR
HIT|200|1767562905|2168|4384504|45.234.65.0|https://blog.scielo.org/|https://books.scielo.org/wp-content/uploads/2024/09/bluesky_26x26.png|BR|Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/143.0.0.0 Safari/537.36|58678ac8855e66820650d94578f58907|BR
MISS|200|1767560815|10517870|4384504|191.235.66.0|-|https://books.scielo.org/id/7f83x/pdf/porto-9788575415993-11.pdf|BR|Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot|800c5a63178a17a9cef5eced5f02b733|BR
MISS|200|1767534834|3798672|4384504|52.167.144.0|-|https://books.scielo.org/id/3hwrh/pdf/lourenco-9788570785169-06.pdf|LA|Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36|d58552b8b0fd4cb50a7702af914feabe|US

Lines in Apache format
45.229.23.28 - - [20/May/2023:23:59:49 -0300] "GET /id/y742k/pdf/magalhaes-9788578791889-18.pdf HTTP/2.0" 200 342341 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
45.229.23.28 - - [20/May/2023:23:59:50 -0300] "GET /favicon.ico HTTP/2.0" 200 894 "https://books.scielo.org/id/y742k/pdf/magalhaes-9788578791889-18.pdf" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
66.249.72.48 - - [20/May/2023:23:59:50 -0300] "GET /id/82r9t/pdf/sadek-9788579820342.pdf HTTP/1.1" 302 239 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.5672.126 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
179.0.70.35 - - [20/May/2023:23:59:53 -0300] "GET /id/5v9s3/pdf/rivera-9788575413036.pdf HTTP/2.0" 200 1873485 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
192.168.2.230 - - [20/May/2023:23:59:56 -0300] "GET /api/v1/changes/?since=103544 HTTP/1.1" 200 35 "-" "python-requests/2.2.1 CPython/2.7.14 Linux/3.10.0-1160.81.1.el7.x86_64"
192.168.169.125 - - [20/May/2023:23:59:56 -0300] "GET / HTTP/1.1" 200 14129 "-" "Zabbix"
179.0.70.35 - - [21/May/2023:00:00:13 -0300] "GET /id/sq6d8/pdf/deslandes-9788575413296.pdf HTTP/2.0" 200 1073438 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
181.67.82.230 - - [21/May/2023:00:00:16 -0300] "GET /favicon.ico HTTP/2.0" 200 894 "https://books.scielo.org/id/yjxdq/pdf/mororo-9788574554938-01.pdf" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
181.67.82.230 - - [21/May/2023:00:00:17 -0300] "GET /id/yjxdq/pdf/mororo-9788574554938-01.pdf HTTP/2.0" 200 408451 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
69.162.124.230 - - [21/May/2023:00:00:18 -0300] "HEAD / HTTP/1.1" 200 0 "http://books.scielo.org" "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)"
201.8.90.180 - - [21/May/2023:00:00:25 -0300] "GET /id/q7gtd HTTP/2.0" 200 4298 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
192.168.1.96 - - [21/May/2023:00:00:26 -0300] "POST /wp-cron.php?doing_wp_cron=1684637990.0746459960937500000000 HTTP/1.1" 200 0 "-" "WordPress/6.2.1; https://books.scielo.org"
152.70.222.62 - - [21/May/2023:00:00:26 -0300] "GET /feed HTTP/1.1" 301 0 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.65 Safari/535.11"
152.70.222.62 - - [21/May/2023:00:00:26 -0300] "GET /feed/ HTTP/1.1" 200 8469 "-" "Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.65 Safari/535.11"
152.70.222.62 - - [21/May/2023:00:00:28 -0300] "GET / HTTP/1.1" 200 14129 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
187.19.244.135 - - [21/May/2023:00:00:28 -0300] "GET /id/4ndgv HTTP/2.0" 200 4112 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
192.168.1.96 - - [21/May/2023:00:00:39 -0300] "GET /search/index.php?output=rss&site=livros&col=&lang=en&sort=publication_date+desc HTTP/1.1" 200 34222 "-" "WordPress/6.2.1; https://books.scielo.org"
192.168.169.125 - - [21/May/2023:00:00:39 -0300] "GET / HTTP/1.1" 200 14129 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"
192.168.169.125 - suporte.scielo [21/May/2023:00:00:40 -0300] "GET /id/gbvb4 HTTP/1.1" 200 3934 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"
192.168.169.125 - suporte.scielo [21/May/2023:00:00:40 -0300] "GET /id/jnkmq HTTP/1.1" 200 4680 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"
192.168.169.125 - suporte.scielo [21/May/2023:00:00:40 -0300] "GET /eduepb/ HTTP/1.1" 302 0 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"
192.168.169.125 - - [21/May/2023:00:00:41 -0300] "GET / HTTP/1.1" 200 14129 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"
187.19.244.135 - - [21/May/2023:00:00:42 -0300] "GET /id/4ndgv/pdf/paim-9788575413593-05.pdf HTTP/2.0" 200 195911 "https://books.scielo.org/id/4ndgv" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"
192.168.169.125 - suporte.scielo [21/May/2023:00:00:43 -0300] "GET /search/?lang=pt&index=tw&q=&filter=&submit=Pesquisar&sort=publication_date+desc HTTP/1.1" 200 30408 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"
192.168.169.125 - - [21/May/2023:00:00:43 -0300] "GET / HTTP/1.1" 200 14129 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"
192.168.169.125 - - [21/May/2023:00:00:44 -0300] "GET / HTTP/1.1" 200 14129 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"
192.168.169.125 - suporte.scielo [21/May/2023:00:00:45 -0300] "GET /id/gbvb4 HTTP/1.1" 200 3934 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"
192.168.169.125 - suporte.scielo [21/May/2023:00:00:45 -0300] "GET /id/jnkmq HTTP/1.1" 200 4680 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"
192.168.169.125 - suporte.scielo [21/May/2023:00:00:45 -0300] "GET /eduepb/ HTTP/1.1" 302 0 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"
192.168.169.125 - - [21/May/2023:00:00:46 -0300] "GET / HTTP/1.1" 200 14129 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.104 Safari/537.36"

Please adjust the PR to accommodate this data (it would need one pattern for Bunnynet and another that I believe already exists). PR #54 implements the Bunnynet format, but in the context of articles.

…rom logs

Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
Author

Copilot AI commented Feb 9, 2026

@copilot I have log samples from SciELO Books. They come in two formats: Bunnynet and Apache. Sample lines from both follow:

Lines in Bunnynet format
...

Updated the translator to support actual SciELO Books URL patterns from the log samples. The implementation now handles:

Apache log format (already supported - no special parser needed, uses standard Apache NCSA format)
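For readers unfamiliar with the format, the sketch below parses one of the Apache sample lines from this thread with a plain regex. It is only an illustration of the line layout; the repository's log_handler is what actually does this, and the `parse_apache_line` helper and its group names are hypothetical.

```python
import re
from datetime import datetime

# Combined log format: ip, identity, user, [time], "request", status, size, "referer", "agent"
COMBINED_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)


def parse_apache_line(line):
    """Return a dict of fields for one combined-format line, or None if it does not match."""
    match = COMBINED_LOG.match(line.strip())
    if not match:
        return None
    record = match.groupdict()
    # Example time value: 21/May/2023:00:00:25 -0300
    record["datetime"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


sample = (
    '201.8.90.180 - - [21/May/2023:00:00:25 -0300] "GET /id/q7gtd HTTP/2.0" '
    '200 4298 "https://www.google.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36"'
)
record = parse_apache_line(sample)
print(record["path"], record["status"])  # /id/q7gtd 200
```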

URL patterns from production logs:

  1. /id/{book_id} - Book landing page (abstract/investigations)
  2. /id/{book_id}/{chapter_number} - Chapter page (full_text/requests)
  3. /id/{book_id}/pdf/{filename}.pdf - PDF download (full_text/requests)

Smart chapter extraction from PDF filenames:

  • Detects pattern: author-ISBN-CHAPTER.pdf
  • Examples:
    • magalhaes-9788578791889-18.pdf → chapter 18
    • mororo-9788574554938-01.pdf → chapter 01
    • sadek-9788579820342.pdf → no chapter (full book PDF)

Bunnynet format: The log_handler already parses Apache-style logs. For Bunnynet-specific parsing (pipe-delimited format as shown in your samples), PR #54's approach can be adapted when that's merged.

All URL patterns validated with actual log samples. Tests updated with real URLs. Commit: 0061d00

Copilot AI changed the title Add SciELO Books translator for COUNTER R5 metrics Add SciELO Books translator for COUNTER R5 metrics with production URL support Feb 9, 2026
@pitangainnovare
Contributor

@copilot Please create or adapt the tests to detect accesses on SciELO Books. The goal is to count real accesses, per book and per book chapter, with concrete examples, in both modes (request or investigation).

…examples

Co-authored-by: pitangainnovare <158627036+pitangainnovare@users.noreply.github.com>
Author

Copilot AI commented Feb 9, 2026

@copilot Please create or adapt the tests to detect accesses on SciELO Books. The goal is to count real accesses, per book and per book chapter, with concrete examples, in both modes (requ...

Created comprehensive tests demonstrating real SciELO Books access counting with concrete examples:

9 new integration tests (TestBooksAccessCounting):

  1. Book landing page → 0 Requests, 1 Investigation
  2. Chapter HTML → 1 Request, 1 Investigation
  3. Chapter PDF → 1 Request, 1 Investigation
  4. Full book PDF → 1 Request, 1 Investigation
  5. Multiple accesses (Total vs Unique counting)
  6. 30-second deduplication (valid clicks)
  7. Rapid click filtering (filtered duplicates)
  8. Different chapters tracked separately
  9. Real-world mixed access scenario
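The difference between Total and Unique counting in scenarios 5 and 9 can be sketched as below. This is an illustration under stated assumptions rather than the code in `counter.py`: hits are assumed to be already double-click filtered, a "session" is represented by an opaque ID, and the `tally` helper is hypothetical.

```python
def tally(accesses):
    """Compute the four COUNTER R5 item metrics for a list of accesses.

    accesses: iterable of (session_id, pid, content_type) tuples.
    Totals count every hit; uniques count each (session, pid) pair once.
    """
    metrics = {
        "total_item_requests": 0,
        "total_item_investigations": 0,
        "unique_item_requests": 0,
        "unique_item_investigations": 0,
    }
    seen_requests, seen_investigations = set(), set()
    for session_id, pid, content_type in accesses:
        if content_type not in ("abstract", "full_text"):
            continue  # not a countable item access
        metrics["total_item_investigations"] += 1
        seen_investigations.add((session_id, pid))
        if content_type == "full_text":
            metrics["total_item_requests"] += 1
            seen_requests.add((session_id, pid))
    metrics["unique_item_requests"] = len(seen_requests)
    metrics["unique_item_investigations"] = len(seen_investigations)
    return metrics


accesses = [
    ("s1", "BOOK:Q7GTD", "abstract"),
    ("s1", "BOOK:Y742K/CHAPTER:18", "full_text"),
    ("s1", "BOOK:Y742K/CHAPTER:18", "full_text"),
    ("s2", "BOOK:Y742K/CHAPTER:18", "full_text"),
]
print(tally(accesses))
```

In this example the totals are 3 Requests and 4 Investigations, while the uniques are 2 Requests and 3 Investigations, because session s1 opened chapter 18 twice.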

Demonstration script (test_books_access_examples.py):

  • 6 concrete examples with actual production URLs
  • Shows exact counting for books and chapters
  • Distinguishes Requests (full-text) vs Investigations (metadata)
  • Demonstrates COUNTER R5 30-second deduplication rule

Results: All 24 tests passing (13 translator + 2 basic + 9 comprehensive)

Run demo: PYTHONPATH=. python tests/test_books_access_examples.py

Commit: cd98266

Copilot AI changed the title Add SciELO Books translator for COUNTER R5 metrics with production URL support Add SciELO Books translator for COUNTER R5 metrics with comprehensive access counting tests Feb 9, 2026
Development

Successfully merging this pull request may close these issues.

Add support for logs originating from SciELO Livros

2 participants