Fix out of date issues and add export functionality #15
- Update click to >=8.2.1 (from unversioned)
- Update requests to >=2.32.3 (from unversioned)
- Keep sqlite-utils at >=2.4.4 (already up to date)
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Update dependencies to latest versions
- Replace dict access with .get() method to handle missing keys gracefully
- Add comprehensive tests for both missing 'list' and 'since' key scenarios
- Prevents crashes when Pocket API returns unexpected response format
Fixes the stack trace: KeyError: 'list' at utils.py line 119
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
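The defensive key access described in this commit might look like the minimal sketch below. The function name `extract_items` and the sample payloads are hypothetical; the actual code in `utils.py` is not shown on this page.

```python
def extract_items(response_json):
    """Return (items, since) from a Pocket-style response without raising KeyError."""
    # dict.get() returns None (or a default) when the key is absent,
    # whereas response_json["list"] would raise KeyError.
    items = response_json.get("list") or {}
    since = response_json.get("since")
    return items, since


# A response missing both keys no longer crashes the caller.
print(extract_items({}))
print(extract_items({"list": {"1": {"item_id": "1"}}, "since": 5}))
```

The `or {}` also normalises a `None` or empty `list` value to an empty dict, so callers can iterate over the result unconditionally.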
- Add check for items table existence before enabling FTS
- Add comprehensive tests for ensure_fts function edge cases:
  * When no items table exists (should not crash)
  * When items table exists (should create FTS)
  * When FTS already exists (should skip creation)
Fixes the stack trace: sqlite3.OperationalError: no such table: items at utils.py line 68
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
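The three `ensure_fts` cases listed in this commit can be sketched with the standard-library `sqlite3` module. This is an illustration only: the project itself uses sqlite-utils, and the `items_fts` table name and the indexed columns are assumptions here, not taken from the PR.

```python
import sqlite3


def table_exists(conn, name):
    """Return True if a table (including a virtual table) or view with this name exists."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type IN ('table', 'view') AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def ensure_fts(conn):
    """Create a full-text index for `items` when possible; return True if one was created."""
    if not table_exists(conn, "items"):
        return False  # nothing has been fetched yet, so there is nothing to index
    if table_exists(conn, "items_fts"):
        return False  # index already present, skip creation
    # Assumed column names; populating the index is omitted from this sketch.
    conn.execute(
        "CREATE VIRTUAL TABLE items_fts USING fts5("
        "resolved_title, excerpt, content='items')"
    )
    return True
```

Checking `sqlite_master` first is what avoids the `no such table: items` error on an empty database.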
- Change requests.get() to requests.post() for /v3/get endpoint
- Change requests.get() to requests.post() for /v3/stats endpoint
- Restore full progress bar functionality with total item count
- /v3/stats endpoint is functional but undocumented
The core issue was HTTP method, not deprecated endpoints.
All API calls now use POST as required by Pocket API.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add --debug flag to enable detailed logging
- Log API requests, responses, and item processing
- Track offset progression and item counts
- Help identify why articles aren't being fetched
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Add proper Content-Type headers for API requests
- Use 'data' parameter instead of passing args directly to requests.post()
- Add error detection and logging for API error responses
- Update tests to use requests.post instead of requests.get
- Add test coverage for API error handling
Fixes the issue where API returns {'error': '...'} instead of data.
The Pocket API requires proper form-encoded requests with headers.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Reduce default page size from 500 to 50 items
- Add automatic fallback mechanism for 413 errors:
  * Detect 'Payload Too Large' errors
  * Automatically reduce page size by half (minimum 10)
  * Retry with smaller page size
- Add comprehensive test coverage for 413 error handling
- Continue processing with reduced page size instead of crashing
Fixes the issue where large accounts cause 413 errors due to excessive payload size when requesting complete item details.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
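A rough sketch of the halving fallback described in this commit follows. The `fetch_page` callable and the `PayloadTooLarge` exception are hypothetical stand-ins for the real HTTP call and its 413 detection; only the default of 50 and the minimum of 10 come from the commit message.

```python
class PayloadTooLarge(Exception):
    """Raised by the (hypothetical) fetch callable when the server answers HTTP 413."""


def fetch_all(fetch_page, page_size=50, min_page_size=10):
    """Collect every item, halving the page size whenever a 413 is signalled."""
    items = []
    offset = 0
    while True:
        try:
            page = fetch_page(offset, page_size)
        except PayloadTooLarge:
            if page_size <= min_page_size:
                raise  # already at the floor, so give up instead of looping forever
            page_size = max(min_page_size, page_size // 2)
            continue  # retry the same offset with the smaller page
        if not page:
            break  # an empty page means there is nothing left to fetch
        items.extend(page)
        offset += len(page)
    return items
```

Retrying at the same offset means no items are skipped when the page size shrinks mid-run.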
- Change error detection from checking key existence to checking value
- Only treat as error if error key has non-None value
- Add test for success case with 'error': None in response
- Fixes false positive error detection when API returns success
The Pocket API returns {'error': None, 'list': {...}} for successful
responses, so we need to check the error value, not just key presence.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
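The difference between checking for the key and checking its value can be shown in a couple of lines. The helper name `is_error_response` is hypothetical; the response shapes are the ones quoted in the commit message.

```python
def is_error_response(response_json):
    """Treat a response as an error only when 'error' carries a non-None value."""
    # `"error" in response_json` would be True for {'error': None, 'list': {...}},
    # which is a successful response, so the value is inspected instead.
    return response_json.get("error") is not None


print(is_error_response({"error": None, "list": {}}))  # success case
print(is_error_response({"error": "Invalid consumer key"}))  # illustrative error text
```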
- Handle numeric author_ids normally (existing schema)
- For string author_ids (alternative schema):
  * Treat the string as the author name
  * Generate deterministic integer ID using MD5 hash
  * Maintain integer author_id constraint in database
- Add comprehensive test coverage for:
  * String author IDs become names with generated IDs
  * Mixed numeric/string author IDs in same item
  * Consistent ID generation for same string values
Supports ~5-10% of Pocket items that use alternative author schema without breaking existing database structure or functionality.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
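One way to derive a stable integer from a string author ID with MD5 is sketched below. The function name `author_id_for` and the choice to keep 15 hex digits of the digest are assumptions for illustration; the PR does not say how the hash is reduced to an integer.

```python
import hashlib


def author_id_for(raw_id):
    """Return (integer_id, name); name is None when raw_id was already numeric."""
    text = str(raw_id)
    if text.isascii() and text.isdigit():
        return int(text), None  # existing schema: numeric ID, name comes from elsewhere
    # Alternative schema: the "ID" is really the author's name.
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    # 15 hex digits = 60 bits, which fits in SQLite's signed 64-bit INTEGER.
    return int(digest[:15], 16), text
```

Because MD5 is deterministic, the same author string always maps to the same ID across runs, unlike Python's built-in `hash()`, which is salted per process for strings.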
…tamps
- Replace timestamp-based 'since' parameter with offset-based approach
- Track number of existing items in database for incremental fetching
- Start fetching from offset = count of existing items
- Remove problematic since/timestamp logic that wasn't working
- Add test coverage for start_offset functionality
This properly resumes fetching from where it left off by continuing pagination from the number of items already stored in the database.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
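The offset-based resume can be sketched as a count of rows already stored. This uses the standard-library `sqlite3` module rather than sqlite-utils, and the helper name `starting_offset` is hypothetical.

```python
import sqlite3


def starting_offset(conn, fetch_all=False):
    """Return the pagination offset to resume from: the number of items already stored."""
    if fetch_all:
        return 0  # a full refetch always starts from the beginning
    has_items = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'items'"
    ).fetchone()
    if has_items is None:
        return 0  # fresh database, nothing stored yet
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Resuming this way only lines up with the server's pagination if items come back in a stable order between runs, which is an assumption of this sketch.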
Comprehensive fixes: API connectivity, error handling, retry logic, and progress tracking
Implement complete export system to sync Pocket items to Karakeep:
**New Export Command:**
- `pocket-to-sqlite export` with full CLI interface
- Supports filtering by status (unread/archived/deleted) and favorites
- Includes limit/offset for batching and resume capability
- Dry-run mode for preview without API calls
- Progress bar with accurate tracking
- Comprehensive error handling and logging
**KarakeepClient API Integration:**
- REST API client with Bearer token authentication
- Configurable base URL (defaults to localhost:3000)
- Progressive retry logic for timeouts, rate limits (429), and server errors
- 30-second request timeout to prevent hanging
- Automatic rate limiting with 1-second delays between requests
**Authentication Extension:**
- Extends existing auth.json to include karakeep_token and karakeep_base_url
- Maintains compatibility with existing Pocket authentication
- Validates required Karakeep credentials before export
**Data Mapping & Validation:**
- Maps Pocket fields to Karakeep bookmark format:
  - resolved_title/given_title → title
  - excerpt → summary
  - resolved_url/given_url → url
  - type → "link" (constant)
- Skips items without URLs with proper logging
- Handles missing/null fields gracefully
**Robust Error Handling:**
- Retry logic for network timeouts and server errors
- Graceful handling of malformed data
- Comprehensive error reporting with item-level details
- Resume capability via offset parameter
**Comprehensive Test Coverage:**
- 7 new test cases covering all functionality
- Tests for KarakeepClient retry logic and error handling
- Export function tests with filters and edge cases
- Mock-based testing for isolated unit tests
- Tests for dry-run preview functionality
**CLI Features:**
- `--filter-status [0|1|2]` - Filter by Pocket status
- `--filter-favorite` - Export only favorited items
- `--limit` / `--offset` - Batching and resume support
- `--dry-run` - Preview mode without API calls
- `--silent` - Suppress progress output
- `--debug` - Enable detailed logging
- `--auth FILE` - Custom auth file path
**Example Usage:**
```bash
# Export all items with progress bar
pocket-to-sqlite export pocket.db
# Export only favorites with custom auth
pocket-to-sqlite export pocket.db --filter-favorite --auth myauth.json
# Preview first 10 unread items
pocket-to-sqlite export pocket.db --filter-status 0 --limit 10 --dry-run
# Resume export from offset 1000
pocket-to-sqlite export pocket.db --offset 1000
```
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
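The field mapping listed under "Data Mapping & Validation" could be expressed roughly as follows. The function name `pocket_to_karakeep` is hypothetical, and two details are assumptions: the resolved field is preferred over the given one, and empty optional fields are left out of the payload.

```python
def pocket_to_karakeep(item):
    """Map a Pocket item dict to a Karakeep-style bookmark dict, or None if it has no URL."""
    url = item.get("resolved_url") or item.get("given_url")
    if not url:
        return None  # items without a URL are skipped by the export
    bookmark = {"type": "link", "url": url}  # type is the constant "link"
    title = item.get("resolved_title") or item.get("given_title")
    if title:
        bookmark["title"] = title
    excerpt = item.get("excerpt")
    if excerpt:
        bookmark["summary"] = excerpt
    return bookmark


print(pocket_to_karakeep({"given_url": "https://example.com", "given_title": "Example"}))
```

Using `.get(...) or ...` treats missing keys, `None`, and empty strings the same way, which matches the "handles missing/null fields gracefully" note.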
Improve KarakeepClient to properly handle real Karakeep API responses:
**API Response Handling:**
- Handle 201 success responses with full bookmark object structure
- Parse 400 error responses with {code, message} format
- Add proper error handling for client errors (400-level) without retry
- Log successful bookmark creation with bookmark ID
**Enhanced Error Handling:**
- Client errors (400-499) now show proper error codes and messages
- No retry attempts for validation errors and other client errors
- Maintain retry logic only for server errors and rate limits
- Improved error messages with Karakeep error codes
**Updated Test Coverage:**
- Tests now use actual Karakeep API response structure
- Added test for 400 error response format handling
- Mock responses match real API format with proper fields
- Verified both success and error path handling
**Verified Working Integration:**
- Successfully tested against real Karakeep API
- Confirmed 201 responses with bookmark ID generation
- Debug logging shows proper API interaction flow
- All authentication and request formatting working correctly
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
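The retry policy described in this commit (retry on 429 and server errors, fail immediately on other 4xx responses) can be sketched without any HTTP library. Here `send` is a hypothetical callable returning `(status_code, parsed_json_body)`, and the attempt count and delays are illustrative; timeout handling is left out.

```python
import time


def should_retry(status_code):
    """Retry only rate limits (429) and server errors (5xx)."""
    return status_code == 429 or 500 <= status_code < 600


def create_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns 201, hits a client error, or runs out of attempts."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 201:
            return body  # created: the body is the bookmark object
        if not should_retry(status) or attempt == max_attempts:
            # Error bodies are expected in {code, message} form.
            code = body.get("code", "unknown")
            message = body.get("message", "")
            raise RuntimeError(f"Karakeep request failed ({status}, {code}): {message}")
        sleep(base_delay * attempt)  # progressive delay before the next attempt
```

Passing `sleep` as a parameter keeps the function testable without real waiting.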
Add detailed section covering Karakeep export functionality:
**New Documentation Sections:**
- Authentication setup for Karakeep integration
- Basic export command usage and examples
- Comprehensive filtering options (status, favorites)
- Batching and resuming capabilities for large exports
- Dry-run preview functionality
- Additional command options (auth file, silent mode, debug)
- Notes about built-in retry logic and error handling
**Clear Examples for:**
- Status filtering (unread/archived/deleted items)
- Favorite filtering
- Batched exports with limit/offset
- Preview mode with dry-run
- Custom authentication file usage
- Debug and silent operation modes
The documentation follows existing README style and provides users with complete information for using the new export functionality.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add Karakeep export command with comprehensive functionality
- Enhanced KarakeepClient with get_all_tags() and add_tags_to_bookmark() methods
- Added tag caching and automatic ID generation for new tags
- Updated export logic to parse existing tags column and attach to Karakeep bookmarks
- Added graceful handling for databases with/without tags column
- Maintains backward compatibility with existing functionality
- All tests passing
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Fixed tag parsing to properly handle Pocket's nested JSON format: {"tag_name": {"tag": "tag_name", "item_id": "123"}}
- Extract tag names from the nested 'tag' field within each tag object
- Maintain fallback to key names if nested structure not found
- Updated comments with appropriate examples
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
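A sketch of the tag parsing described in this commit, using the nested format quoted above, follows. The function name `parse_tag_names` is hypothetical, and accepting either a JSON string or an already-decoded dict is an assumption about how the tags column is stored.

```python
import json


def parse_tag_names(tags_data):
    """Extract tag names from Pocket's nested tags structure."""
    if not tags_data:
        return []
    if isinstance(tags_data, str):
        try:
            tags_data = json.loads(tags_data)
        except json.JSONDecodeError:
            return []  # malformed JSON: treat the item as untagged
    if not isinstance(tags_data, dict):
        return []
    names = []
    for key, value in tags_data.items():
        if isinstance(value, dict) and value.get("tag"):
            names.append(value["tag"])  # preferred: the nested 'tag' field
        else:
            names.append(key)  # fallback: use the key name
    return names


print(parse_tag_names('{"python": {"tag": "python", "item_id": "123"}}'))
```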
- Add logging to show tags_data from database for each item
- Add logging to show parsed tag names and count
- Add specific logging for items with no tags vs no bookmark_id
- Help troubleshoot tag processing during export
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add tagging support to Karakeep export
Pull Request Overview
This PR updates dependency constraints, refactors the incremental fetch logic to use offsets, and introduces a new export command for exporting items to Karakeep.
- Bump `click` and `requests` versions to restore build compatibility
- Replace “since”-based fetching with offset-based logic and add a `--debug` flag
- Add `export` command (with filtering, dry-run, and progress support) and corresponding README docs
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pyproject.toml | Tighten click and requests version requirements to latest compatible releases |
| pocket_to_sqlite/cli.py | Refactor fetch to use offset-based fetching; add --debug to both fetch and new export command |
| README.md | Document new export command for Karakeep integration and usage examples |
Comments suppressed due to low confidence (3)
pocket_to_sqlite/cli.py:85
- [nitpick] The parameter name `all` shadows the built-in `all()` function. Consider renaming it to `fetch_all` for clarity.
def fetch(db_path, auth, all, silent, debug):
pocket_to_sqlite/cli.py:146
- The README mentions `karakeep_base_url` but the code does not validate or use it. Add a check for `karakeep_base_url` and ensure it’s passed to the export logic.
if "karakeep_token" not in auth_data:
pocket_to_sqlite/cli.py:114
- New `export` functionality lacks accompanying tests. Consider adding unit or integration tests to cover filtering, dry-run, and actual export flows.
@cli.command()
Fix improper usage of count() method
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Fixes:
As you might know, Pocket is ending its service almost exactly a month from today. See their notice here: https://support.mozilla.org/en-US/kb/future-of-pocket
On the 9th of July the service will go into 'export only mode'. They have said that the API will continue to be operational until October 8th, so this repo will not be useful after that date.
However, I've prepared a fix for some build issues I was getting on the current `main`, as it had stopped working for me. I also added export functionality targeting Karakeep, an open-source, self-hosted Pocket alternative; see https://github.com/karakeep-app/karakeep (I recently contributed something to it, and there is a good contributor community). For anyone who might still be using your project and wishes to preserve their collection, the update in this PR would be useful. Have a look and see if you want to accept it.
In any case, thanks for this, I found it a good way to connect my Pocket articles to some coding projects I was working on locally.