Greader API : get modified items #2566

Closed
Shinokuni opened this issue Oct 12, 2019 · 15 comments
Labels
API 🤝 API for other clients Documentation 📚

@Shinokuni

Hello,

I am writing an Android RSS client with FreshRSS support and I encountered a problem when syncing items using Greader API.

I would like to get from the server the items whose read state has been modified. Let's say I synchronize my item list with my Android client, then mark some items as read in the web client. I would like the mobile client to be notified that those items have been marked as read.

I already know that if an item contains user/-/state/com.google/read in the categories field of the returned JSON, like this:

"categories": [
    "user/-/state/com.google/reading-list",  
    "user/-/label/IT",
    "user/-/state/com.google/read"
]

it is read and I can mark it as read in the mobile client database.
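
As a minimal sketch of that check (Python is used only for illustration; `is_read` and `READ_TAG` are hypothetical names, not part of any client or of the API):

```python
READ_TAG = "user/-/state/com.google/read"

def is_read(item: dict) -> bool:
    """Return True when the item's categories include the read state tag."""
    return READ_TAG in item.get("categories", [])

# Item shaped like the JSON excerpt above
item = {
    "categories": [
        "user/-/state/com.google/reading-list",
        "user/-/label/IT",
        "user/-/state/com.google/read",
    ]
}
print(is_read(item))
```

An item without the read tag, or without a categories field at all, is treated as unread by this helper.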

When fetching items from the reader/api/0/stream/contents/user/-/state/com.google/reading-list endpoint with the ot parameter (a Unix timestamp), I get new items, and if they were read before the mobile client fetched them, they have user/-/state/com.google/read in their categories field. But if I mark an item older than the ot value as read, I won't get it.

Am I missing something that could solve this problem? If not, is there anything that can be done about it?

Otherwise, thanks for this awesome project that is FreshRSS!

@Alkarex Alkarex added API 🤝 API for other clients Documentation 📚 labels Oct 12, 2019
@Alkarex
Member

Alkarex commented Oct 12, 2019

Hello @Shinokuni and welcome :-)

First, I would like to say that getting the synchronisation strategy right is essential for a good client. Except for News+ and, to a lesser extent, EasyRSS, the other clients I have tested all have inefficient synchronisation strategies (in some cases very bad). By inefficient, I mean far too many requests, redundant requests, as well as requests that are expensive for the client and/or the server (leading to slow synchronisation, high battery consumption, high bandwidth consumption, high CPU usage on client and server, high database usage on server, etc.).

Therefore, I am always very pleased to provide the exact API calls to perform.
The following seven requests are what News+ does for its global synchronisation (see also full log below), which is both robust and efficient. No need to make a single additional request for that phase. I can also provide logs for other phases such as login, posting changes, etc. In case of doubt, I suggest you install News+ and check on your server the exact calls that are made, and do the same.

  1. /reader/api/0/tag/list
    • Full list of categories/folders and tags/labels, including (for Inoreader compatibility) the number of unread items in each tag/label
  2. /reader/api/0/subscription/list
    • Full list of subscriptions/feeds, including their category/folder.
    • This is where you get a distinction between categories/folders and tags/labels
  3. /reader/api/0/stream/contents/user/-/state/com.google/reading-list (with some filters in parameter to exclude read items with xt, and get only the new ones with ot, cf. log below)
    • List of new unread items and their content
    • The response contains among other things the read/unread state, the starred/not-starred state, and the tags/labels for each entry.
    • Since this request is very expensive for the client, the network, and the server, it is important to use the filters appropriately.
    • If there is no new item since the last synchronisation, the response should be empty, and therefore efficient
  4. /reader/api/0/stream/items/ids (with a filter in parameter to exclude read items with xt)
    • Longer list of unread items IDs
    • This allows updating the read/unread status of the local cache of articles - assuming the ones not in the list are read
  5. /reader/api/0/stream/contents/user/-/state/com.google/starred (with some filters in parameter to exclude read items with xt, and get only the new ones with ot)
    • List of new unread starred items and their content
    • If there is no new unread starred item since the last synchronisation, the response should be empty, and therefore efficient
    • This is a bit redundant with requests 3 and 6, but with the advantage of being able to retrieve a larger number of unread starred items.
  6. /reader/api/0/stream/contents/user/-/state/com.google/starred (with some other filters, which includes read starred items)
    • List of starred items (also read ones) and their content
  7. /reader/api/0/stream/items/ids (with a filter to get only starred ones)
    • Longer list of starred items IDs
    • This allows updating the starred/non-starred status of the local cache of articles - assuming the ones not in the list are not starred
    • Similar to request 4 but for the starred status
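
The seven requests above could be assembled roughly as follows. This is only a sketch: the parameter values (xt, ot, n, r, s, output) are copied from the News+ log in this comment, while `BASE`, `url` and `global_sync_urls` are hypothetical names and the server URL is a placeholder.

```python
from urllib.parse import urlencode

BASE = "https://example.org/api/greader.php"  # hypothetical server URL
READ = "user/-/state/com.google/read"
READING_LIST = "user/-/state/com.google/reading-list"
STARRED = "user/-/state/com.google/starred"

def url(path: str, **params) -> str:
    """Build a request URL, keeping '/' readable in the query string."""
    query = urlencode(params, safe="/")
    return f"{BASE}{path}?{query}" if query else f"{BASE}{path}"

def global_sync_urls(last_sync: int) -> list[str]:
    """Return the seven global synchronisation URLs, in order."""
    contents = "/reader/api/0/stream/contents/"
    ids = "/reader/api/0/stream/items/ids"
    return [
        url("/reader/api/0/tag/list", output="json"),                       # 1
        url("/reader/api/0/subscription/list", output="json"),              # 2
        url(contents + READING_LIST, xt=READ, ot=last_sync, n=1000, r="n"),  # 3
        url(ids, output="json", s=READING_LIST, xt=READ, n=10000, r="n"),   # 4
        url(contents + STARRED, xt=READ, ot=last_sync, n=1000, r="n"),      # 5
        url(contents + STARRED, n=1000, r="n"),                              # 6
        url(ids, output="json", s=STARRED, n=10000, r="n"),                  # 7
    ]

for line in global_sync_urls(1538978853):
    print(line)
```

Only requests 3 and 5 carry the ot timestamp; the two ID requests (4 and 7) are not restricted by date in the log, which is what lets them refresh the state of older cached items.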

It is also possible in News+ to synchronise / "pull for refresh" a specific category/folder, or feed, or tag/label, but that is only necessary when the user wants to get read items or more/older items than the global limit.

Full log:

[Mon, 08 Oct 2018 09:02:46 +0200] [debug] --- Array
(
  [date] => 2018-10-08T09:02:46+02:00
  [headers] => Array
    (
      [Connection] => Keep-Alive
      [Accept-Encoding] => gzip
      [Authorization] => GoogleLogin auth=test/ABCDEF0123456789
    )
  [_SERVER] => Array
    (
      [PATH_INFO] => /reader/api/0/tag/list
      [REQUEST_URI] => /api/greader.php/reader/api/0/tag/list?client=newsplus&output=json&ck=1538982165918
      [QUERY_STRING] => client=newsplus&output=json&ck=1538982165918
      [REQUEST_METHOD] => GET
      [HTTP_AUTHORIZATION] => GoogleLogin auth=test/ABCDEF0123456789
      [PHP_SELF] => /api/greader.php/reader/api/0/tag/list
    )
  [_GET] => Array
    (
      [client] => newsplus
      [output] => json
      [ck] => 1538982165918
    )
)

[Mon, 08 Oct 2018 09:02:46 +0200] [debug] --- Array
(
  [date] => 2018-10-08T09:02:46+02:00
  [headers] => Array
    (
      [Connection] => Keep-Alive
      [Accept-Encoding] => gzip
      [Authorization] => GoogleLogin auth=test/ABCDEF0123456789
    )
  [_SERVER] => Array
    (
      [PATH_TRANSLATED] => /usr/share/FreshRSS/reader/api/0/subscription/list
      [PATH_INFO] => /reader/api/0/subscription/list
      [REQUEST_URI] => /api/greader.php/reader/api/0/subscription/list?client=newsplus&output=json&ck=1538982165918
      [QUERY_STRING] => client=newsplus&output=json&ck=1538982165918
      [REQUEST_METHOD] => GET
      [HTTP_AUTHORIZATION] => GoogleLogin auth=test/ABCDEF0123456789
      [PHP_SELF] => /api/greader.php/reader/api/0/subscription/list
    )
  [_GET] => Array
    (
      [client] => newsplus
      [output] => json
      [ck] => 1538982165918
    )
)

[Mon, 08 Oct 2018 09:02:49 +0200] [debug] --- Array
(
  [date] => 2018-10-08T09:02:49+02:00
  [headers] => Array
    (
      [Connection] => Keep-Alive
      [Accept-Encoding] => gzip
      [Authorization] => GoogleLogin auth=test/ABCDEF0123456789
    )
  [_SERVER] => Array
    (
      [PATH_TRANSLATED] => /usr/share/FreshRSS/reader/api/0/stream/contents/user/-/state/com.google/reading-list
      [PATH_INFO] => /reader/api/0/stream/contents/user/-/state/com.google/reading-list
      [REQUEST_URI] => /api/greader.php/reader/api/0/stream/contents/user%2F-%2Fstate%2Fcom.google%2Freading-list?client=newsplus&ck=1538982165918&xt=user/-/state/com.google/read&ot=1538978853&n=1000&r=n
      [QUERY_STRING] => client=newsplus&ck=1538982165918&xt=user/-/state/com.google/read&ot=1538978853&n=1000&r=n
      [REQUEST_METHOD] => GET
      [HTTP_AUTHORIZATION] => GoogleLogin auth=test/ABCDEF0123456789
      [PHP_SELF] => /api/greader.php/reader/api/0/stream/contents/user/-/state/com.google/reading-list
    )
  [_GET] => Array
    (
      [client] => newsplus
      [ck] => 1538982165918
      [xt] => user/-/state/com.google/read
      [ot] => 1538978853
      [n] => 1000
      [r] => n
    )
)

[Mon, 08 Oct 2018 09:02:50 +0200] [debug] --- Array
(
  [date] => 2018-10-08T09:02:50+02:00
  [headers] => Array
    (
      [Connection] => Keep-Alive
      [Accept-Encoding] => gzip
      [Authorization] => GoogleLogin auth=test/ABCDEF0123456789
    )
  [_SERVER] => Array
    (
      [PATH_TRANSLATED] => /usr/share/FreshRSS/reader/api/0/stream/items/ids
      [PATH_INFO] => /reader/api/0/stream/items/ids
      [REQUEST_URI] => /api/greader.php/reader/api/0/stream/items/ids?output=json&s=user%2F-%2Fstate%2Fcom.google%2Freading-list&xt=user/-/state/com.google/read&n=10000&r=n
      [QUERY_STRING] => output=json&s=user%2F-%2Fstate%2Fcom.google%2Freading-list&xt=user/-/state/com.google/read&n=10000&r=n
      [REQUEST_METHOD] => GET
      [HTTP_AUTHORIZATION] => GoogleLogin auth=test/ABCDEF0123456789
      [PHP_SELF] => /api/greader.php/reader/api/0/stream/items/ids
    )
  [_GET] => Array
    (
      [output] => json
      [s] => user/-/state/com.google/reading-list
      [xt] => user/-/state/com.google/read
      [n] => 10000
      [r] => n
    )
)

[Mon, 08 Oct 2018 09:02:50 +0200] [debug] --- Array
(
  [date] => 2018-10-08T09:02:50+02:00
  [headers] => Array
    (
      [Connection] => Keep-Alive
      [Accept-Encoding] => gzip
      [Authorization] => GoogleLogin auth=test/ABCDEF0123456789
    )
  [_SERVER] => Array
    (
      [PATH_TRANSLATED] => /usr/share/FreshRSS/reader/api/0/stream/contents/user/-/state/com.google/starred
      [PATH_INFO] => /reader/api/0/stream/contents/user/-/state/com.google/starred
      [REQUEST_URI] => /api/greader.php/reader/api/0/stream/contents/user%2F-%2Fstate%2Fcom.google%2Fstarred?client=newsplus&ck=1538982165918&xt=user/-/state/com.google/read&ot=1538978853&n=1000&r=n
      [QUERY_STRING] => client=newsplus&ck=1538982165918&xt=user/-/state/com.google/read&ot=1538978853&n=1000&r=n
      [REQUEST_METHOD] => GET
      [HTTP_AUTHORIZATION] => GoogleLogin auth=test/ABCDEF0123456789
      [PHP_SELF] => /api/greader.php/reader/api/0/stream/contents/user/-/state/com.google/starred
    )
  [_GET] => Array
    (
      [client] => newsplus
      [ck] => 1538982165918
      [xt] => user/-/state/com.google/read
      [ot] => 1538978853
      [n] => 1000
      [r] => n
    )
)

[Mon, 08 Oct 2018 09:02:51 +0200] [debug] --- Array
(
  [date] => 2018-10-08T09:02:51+02:00
  [headers] => Array
    (
      [Connection] => Keep-Alive
      [Accept-Encoding] => gzip
      [Authorization] => GoogleLogin auth=test/ABCDEF0123456789
    )
  [_SERVER] => Array
    (
      [PATH_TRANSLATED] => /usr/share/FreshRSS/reader/api/0/stream/contents/user/-/state/com.google/starred
      [PATH_INFO] => /reader/api/0/stream/contents/user/-/state/com.google/starred
      [REQUEST_URI] => /api/greader.php/reader/api/0/stream/contents/user%2F-%2Fstate%2Fcom.google%2Fstarred?client=newsplus&ck=1538982165918&n=1000&r=n
      [QUERY_STRING] => client=newsplus&ck=1538982165918&n=1000&r=n
      [REQUEST_METHOD] => GET
      [HTTP_AUTHORIZATION] => GoogleLogin auth=test/ABCDEF0123456789
      [PHP_SELF] => /api/greader.php/reader/api/0/stream/contents/user/-/state/com.google/starred
    )
  [_GET] => Array
    (
      [client] => newsplus
      [ck] => 1538982165918
      [n] => 1000
      [r] => n
    )
)

[Mon, 08 Oct 2018 09:02:52 +0200] [debug] --- Array
(
  [date] => 2018-10-08T09:02:52+02:00
  [headers] => Array
    (
      [Connection] => Keep-Alive
      [Accept-Encoding] => gzip
      [Authorization] => GoogleLogin auth=test/ABCDEF0123456789
    )
  [_SERVER] => Array
    (
      [PATH_TRANSLATED] => /usr/share/FreshRSS/reader/api/0/stream/items/ids
      [PATH_INFO] => /reader/api/0/stream/items/ids
      [REQUEST_URI] => /api/greader.php/reader/api/0/stream/items/ids?output=json&s=user%2F-%2Fstate%2Fcom.google%2Fstarred&n=10000&r=n
      [QUERY_STRING] => output=json&s=user%2F-%2Fstate%2Fcom.google%2Fstarred&n=10000&r=n
      [REQUEST_METHOD] => GET
      [HTTP_AUTHORIZATION] => GoogleLogin auth=test/ABCDEF0123456789
      [PHP_SELF] => /api/greader.php/reader/api/0/stream/items/ids
    )
  [_GET] => Array
    (
      [output] => json
      [s] => user/-/state/com.google/starred
      [n] => 10000
      [r] => n
    )
)

Do not hesitate to ask again, but please consider this synchronisation strategy.

@Frenzie
Member

Frenzie commented Oct 12, 2019

@Alkarex Sounds like a good thing to stick in the docs which should make it easier to find through a search engine, maybe somewhere in https://freshrss.github.io/FreshRSS/en/developers/01_First_steps.html?

@Shinokuni
Author

First of all, thank you for your answer.

Here is the way Readrops handles synchronization.

Initial sync

One of the main functionalities of Readrops is to provide an offline experience. Therefore, a large quantity of items is fetched and stored locally during the initial synchronization.

Steps:

  1. Fetch feeds /reader/api/0/subscription/list
  2. Fetch folders /reader/api/0/tag/list
  3. Fetch only unread items, up to a maximum of 10k reader/api/0/stream/contents/user/-/state/com.google/reading-list

Classic sync

Steps:

  1. Push read items /reader/api/0/edit-tag
  2. Push unread items /reader/api/0/edit-tag
  3. Fetch feeds, to get new and updated feeds and know which feeds were deleted /reader/api/0/subscription/list
  4. Fetch folders, the same as for feeds /reader/api/0/tag/list
  5. Fetch new unread items since the last synchronization reader/api/0/stream/contents/user/-/state/com.google/reading-list

The way Readrops handles synchronization is more or less the same as what you described, except that Readrops doesn't fetch starred items and makes one query per item read state.

The initial point of my issue was to know whether there is a way to get the items modified since a given time. This would make it possible to keep a coherent read/unread state for all items across all platforms.

Do I have to conclude that there is no way to do this?

@Frenzie
Member

Frenzie commented Oct 12, 2019

One of the main functionalities of Readrops is to provide an offline experience.

Speaking just for myself of course, but I doubt I'd even consider using a third-party client except for the offline experience. ;-)

(That's why I currently use EasyRSS.)

@Shinokuni
Author

Speaking just for myself of course, but I doubt I'd even consider using a third-party client except for the offline experience. ;-)

(That's why I currently use EasyRSS.)

Yeah, I also believe it is important to have offline access to one's feeds. Personally, not having offline access wouldn't bother me that much, because the situations where I don't have any connection (that damned RER A) are infrequent and I can do something else.

@Alkarex
Member

Alkarex commented Oct 12, 2019

makes one query per item read state

@Shinokuni Could you please explain that again?

Please check requests 4 and 7.

@Shinokuni
Author

makes one query per item read state

@Shinokuni Could you please explain that again?

Yeah, sorry. I meant one request to mark items as read and one request to mark items as unread with /reader/api/0/edit-tag.

Please check requests 4 and 7.

This is interesting. If I use the ot parameter (newer than), do you know whether I will get the IDs of the most recently modified items or only of the most recently added items? Using ot would avoid fetching an arbitrary number of IDs to cover all modified items; I would get just the IDs of new and modified items.

@Alkarex
Member

Alkarex commented Oct 12, 2019

No, it is not the date when the items were modified, but the date when they were discovered / added to the database. These are still the best calls to get the states, as they only retrieve IDs.

@Shinokuni
Author

Does this mean that if I change the read state of an item created three months ago, I will have to fetch three months of item IDs to get it? In that case, it won't be useful because it is too expensive.

@Alkarex
Member

Alkarex commented Oct 12, 2019

I agree that the API could surely be improved. We could make some additions (I am open to that), but changing the behaviour of existing calls risks breaking other clients that follow the Google Reader API.
In any case, there are many more items on the server than on the client, so the client needs to make reasonable calls.

When you want the states and ask for the IDs, you ask only for the unread ones (the IDs not in the list are read). The length of that list is at most the number of unread articles on the server, and it can be limited by number and date, so it is not that bad.
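
A minimal sketch of that reconciliation, assuming the client keeps a mapping from item ID to read flag (`reconcile_read_state` is a hypothetical helper and the IDs are made up):

```python
def reconcile_read_state(local_read: dict[str, bool], unread_ids: set[str]) -> dict[str, bool]:
    """Return the new read flag of every locally cached item.

    An item is unread only when its ID appears in the server's unread list;
    every other cached item is assumed to be read.
    """
    return {item_id: item_id not in unread_ids for item_id in local_read}

# Local cache before the sync: item "b" was marked as read on the web client
local = {"a": False, "b": False, "c": True}
server_unread = {"a"}
print(reconcile_read_state(local, server_unread))
```

If the ID list is truncated by number or date, items older than the cut-off would be considered read by this rule, so the limit has to be chosen with the size of the local cache in mind.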

In practice, I have in general between 1k and 4k unread articles, 300k+ read articles, ~160 feeds, ~17 categories, 400+ favourites, ~10 tags. A full sync in News+ takes about ~3s.

@Shinokuni
Author

When you want the states and ask for the IDs, you ask only for the unread ones (the IDs not in the list are read). The length of that list is at most the number of unread articles on the server, and it can be limited by number and date, so it is not that bad.

You are right, it is not that bad. But I will still look for a way to limit this and avoid fetching all unread item IDs. My personal account has 4k unread items, so that is 4k local items to update if I fetch all of them. I don't mind for the initial sync, but for a classic one it is not insignificant.

I agree that the API could surely be improved. We could make some additions (I am open to that)

That's nice!
I see two possible ways to handle read state synchronization:

  • Return new and modified items with reader/api/0/stream/contents/user/-/state/com.google/reading-list. The item list would be sorted by last modified date, with the insertion date as the initial last modified date. This change could break existing API clients by returning items that already exist locally: if a client has no upsert strategy, it would create duplicates.
  • Return new and modified item IDs with /reader/api/0/stream/items/ids, applying the same strategy as in the first point. This wouldn't break anything because only the order would change.

Anyway, a big thanks for taking the time to answer me. I will investigate the /reader/api/0/stream/items/ids solution.

@Alkarex
Member

Alkarex commented Feb 29, 2020

@Shinokuni I have tested your client today, and it looks very good already, congrats :-)
#2798
Closing here, but do not hesitate to ask again, especially if you need any documentation / feedback

@Shinokuni
Author

Hello, as promised in readrops/Readrops#53 (comment), here is a post about FreshRSS synchronization in Readrops. Due to a lack of time, I wasn't able to make it sooner.

Issues

I recently (more or less) worked on adding FreshRSS starred items to Readrops, which made me revisit item read state synchronization. I didn't really encounter problems managing the requests, but the database side was more difficult.

First, SQLite limits the number of bound parameters in a statement, and therefore the number of arguments you can give to an IN operator, to 999 (the default SQLITE_MAX_VARIABLE_NUMBER in SQLite versions before 3.32.0). It means that when you get more than a thousand item IDs with /reader/api/0/stream/items/ids, you have to split them and run multiple queries to update the state of each item, which is really slow. This also affects the star state synchronization.
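
For illustration, the splitting described above could look like the sketch below. It uses Python's sqlite3 module rather than the Android database layer, and the `item` table, its columns and the helper names are hypothetical, not the Readrops schema.

```python
import sqlite3

MAX_VARS = 999  # assumed per-statement parameter limit

def chunks(values, size=MAX_VARS):
    """Yield successive slices of at most `size` values."""
    for start in range(0, len(values), size):
        yield values[start:start + size]

def mark_unread(conn, unread_ids):
    """Set read = 0 for the given IDs, one UPDATE per batch of 999."""
    for batch in chunks(unread_ids):
        placeholders = ",".join("?" * len(batch))
        conn.execute(
            f"UPDATE item SET read = 0 WHERE remote_id IN ({placeholders})",
            batch,
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (remote_id TEXT PRIMARY KEY, read INTEGER NOT NULL)")
conn.executemany("INSERT INTO item VALUES (?, 1)", [(str(i),) for i in range(2500)])

mark_unread(conn, [str(i) for i in range(2000)])  # needs 3 UPDATE statements
unread = conn.execute("SELECT COUNT(*) FROM item WHERE read = 0").fetchone()[0]
print(unread)
```

The number of statements grows with the number of IDs, which is the cost this comment goes on to avoid.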

D/FreshRSSRepository: FreshRSS sync timer:      704 ms, server queries
D/FreshRSSRepository: FreshRSS sync timer:      9 ms, folders insertion
D/FreshRSSRepository: FreshRSS sync timer:      84 ms, feeds insertion
D/FreshRSSRepository: FreshRSS sync timer:      0 ms, items insertion
D/FreshRSSRepository: FreshRSS sync timer:      495 ms, starred items insertion
D/FreshRSSRepository: FreshRSS sync timer:      528 ms, update starred items state
D/FreshRSSRepository: FreshRSS sync timer:      1071 ms, reset read changes
D/FreshRSSRepository: FreshRSS sync timer:      2322 ms, reset star changes
D/FreshRSSRepository: FreshRSS sync timer: end, 5213 ms

Here is a log of the synchronization after I implemented fetching starred items. The starred items insertion checks for each item whether it already exists in the database and inserts it if not, which is pretty slow given that the test data contained only about 10 starred items. Then the star state of items is updated with the IDs from /reader/api/0/stream/items/ids, which is also very slow. Finally, the local read/star change flags, which indicate whether an item had one of these states modified, are reset.

This synchronization doesn't include the read state update and doesn't fetch any new items, yet it lasts 5 seconds, which is way too much. I had to improve all of this.

Solution

Requests strategy

Here is the new request strategy:

  1. folders: reader/api/0/tag/list
  2. feeds: reader/api/0/subscription/list
  3. new items: reader/api/0/stream/contents/user/-/state/com.google/reading-list
    • exclude starred items
    • only the new ones
  4. unread items ids: reader/api/0/stream/items/ids
    • only unread and not starred items ids
    • only the 5000 newest
  5. starred items: reader/api/0/stream/contents/user/-/state/com.google/starred
    • all starred items (read/unread)
    • only the 1000 newest

I don't make any further calls as it's not needed with the database strategy below.

Database strategy

I added a few new tables:

  • A table to store unread items ids from /reader/api/0/stream/items/ids
  • A table to store starred items from reader/api/0/stream/contents/user/-/state/com.google/starred

Instead of directly updating each item read state with the ids (limited to 999) in the query, all unread items ids are stored in a new table and then used to update the read state. Before inserting unread items ids in the new table, all old items ids from the previous synchronization are deleted. This process makes the update faster even if it's not perfect.
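
The staging-table idea can be sketched as follows, again with Python's sqlite3 module for illustration only. The `item` and `unread_id` tables and the `sync_read_state` function are hypothetical names and do not reflect the actual Readrops schema.

```python
import sqlite3

def sync_read_state(conn, unread_ids):
    """Replace the staged unread IDs, then update all read flags in one statement."""
    with conn:  # one transaction, committed when the block exits
        conn.execute("DELETE FROM unread_id")  # drop IDs from the previous sync
        conn.executemany(
            "INSERT OR IGNORE INTO unread_id (remote_id) VALUES (?)",
            ((item_id,) for item_id in unread_ids),
        )
        conn.execute(
            "UPDATE item SET read = CASE "
            "WHEN remote_id IN (SELECT remote_id FROM unread_id) THEN 0 ELSE 1 END"
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (remote_id TEXT PRIMARY KEY, read INTEGER NOT NULL)")
conn.execute("CREATE TABLE unread_id (remote_id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO item VALUES (?, 1)", [(str(i),) for i in range(1, 6)])

sync_read_state(conn, ["1", "3", "3"])
rows = conn.execute("SELECT remote_id FROM item WHERE read = 0 ORDER BY remote_id").fetchall()
print([r[0] for r in rows])
```

Each insert binds a single parameter and the final UPDATE binds none, so the 999-parameter limit no longer applies. Note that this sketch marks every item absent from the staged list as read, which is only exact when the fetched ID list is not truncated.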

Instead of dealing with starred item IDs, only the starred items themselves are fetched and stored in a separate table. This avoids any extra request or query to update the read and star state of starred items. Before the insertion, all items inserted during the previous synchronization are deleted. The fetch of starred items is limited to 1000 for performance.

Result

D/FreshRSSRepository: FreshRSS sync timer:      530 ms, server queries
D/FreshRSSRepository: FreshRSS sync timer:      10 ms, folders insertion
D/FreshRSSRepository: FreshRSS sync timer:      72 ms, feeds insertion
D/FreshRSSRepository: FreshRSS sync timer:      0 ms, items insertion
D/FreshRSSRepository: FreshRSS sync timer:      7 ms, starred items insertion
D/FreshRSSRepository: FreshRSS sync timer:      760 ms, insert and update items ids
D/FreshRSSRepository: FreshRSS sync timer: end, 1384 ms

The result is a lot better. Some steps were removed and others improved. Of course, this result was obtained under good conditions: fast Wi-Fi connection, fast phone (OP 6), no new items and very few starred items. A synchronization under less favorable conditions would last around 3 seconds.

Things left to do

I didn't write anything about pushing read/star state changes from Readrops. I have two solutions here:

  • Create a new table which will store all state changes. The changes will be pushed when synchronizing.
  • Push the update right after a change. For example, when a user clicks on an item, a request is made to push the read state change. This has several drawbacks: it produces a lot of tiny requests that could be merged into a single one, and if the internet connection isn't available, the change would still need to be stored somewhere so it can be pushed once the connection is back; otherwise it would never reach the server.

Feel free to suggest changes, I'm totally open to modifications.

@Alkarex
Member

Alkarex commented Dec 28, 2020

@Shinokuni Thanks for the update; that looks very good, congrats 👍
Regarding pushing state changes, I suggest a hybrid approach: you need to maintain a list of state changes anyway (e.g. to change the state of multiple articles at once, or to retry when a request fails or the phone is offline), and you can push it regularly (when synchronising, but also at significant events, for instance when changing view or before closing the app, if you can catch that).
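
A rough sketch of such a list of pending changes, kept in memory here for brevity (a real client would persist it). The `PendingChanges` class is hypothetical, and the form fields `a` (add tag), `r` (remove tag) and `i` (item ID) are an assumption based on the usual Google Reader API conventions for /reader/api/0/edit-tag; they are not taken from this thread.

```python
from collections import defaultdict

READ = "user/-/state/com.google/read"

class PendingChanges:
    """Collect state changes locally and group them into few edit-tag bodies."""

    def __init__(self):
        self._changes = {}  # (item_id, tag) -> True to add the tag, False to remove it

    def record(self, item_id, tag, add):
        # A later change to the same item and tag overrides the earlier one
        self._changes[(item_id, tag)] = add

    def batches(self):
        """Return one list of form fields per (tag, add/remove) combination."""
        grouped = defaultdict(list)
        for (item_id, tag), add in self._changes.items():
            grouped[(tag, add)].append(item_id)
        return [
            [("a" if add else "r", tag)] + [("i", item_id) for item_id in ids]
            for (tag, add), ids in grouped.items()
        ]

    def clear(self):
        """Forget the recorded changes, e.g. after a successful push."""
        self._changes.clear()

pending = PendingChanges()
pending.record("1", READ, True)
pending.record("2", READ, True)
pending.record("1", READ, False)  # the user marks item 1 as unread again
for body in pending.batches():
    print(body)
```

Each returned list could be sent as the body of one POST request, and `clear()` would be called only after the server has accepted it, so that a failed or offline push is retried at the next opportunity.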
Keep up the good work!

@Shinokuni
Author

Thanks for the suggestion, I will think about it!
