Duplicate published doc #700
Conversation
libs/database/src/publish.rs (Outdated)
```rust
if res.rows_affected() != item_count as u64 {
  tracing::warn!(
    "Failed to insert or replace publish collab meta batch, workspace_id: {}, publisher_uuid: {}, rows_affected: {}",
```
This is a partial failure: some rows might have been updated, some might not.
- If you want to be strongly consistent (either all collabs are duplicated or none of them are), you should roll back the transaction at this point (see the sketch after this comment).
- If you want to be elastic, we should probably return the `rows_affected` count and inform the user how many collabs we duplicated.
Also log both `rows_affected` and `item_count` so that we have a clue how many were missed.
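A minimal sketch of the strongly consistent option, assuming an explicit sqlx transaction and an `anyhow`-based error path (the helper name and signature are assumptions, not the project's code):

```rust
use sqlx::{Postgres, Transaction};

// Hypothetical helper: verify the whole batch was written, otherwise roll
// the transaction back so either all collabs are published or none are.
async fn ensure_all_rows_written(
    txn: Transaction<'_, Postgres>,
    rows_affected: u64,
    item_count: u64,
) -> anyhow::Result<()> {
    if rows_affected != item_count {
        // rollback() consumes the transaction and undoes every statement in it
        txn.rollback().await?;
        anyhow::bail!("publish batch incomplete: {rows_affected} of {item_count} rows affected");
    }
    txn.commit().await?;
    Ok(())
}
```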
@Horusiath IIRC `rows_affected` accounts for both modified and inserted rows. My intent is that every collab is either inserted or modified. I shall modify the code to log both `rows_affected` and `item_count` if they differ.
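A sketch of that adjusted warning, completing the truncated snippet above so both counts appear in the log line:

```rust
if res.rows_affected() != item_count as u64 {
    tracing::warn!(
        "Failed to insert or replace publish collab meta batch, workspace_id: {}, publisher_uuid: {}, rows_affected: {}, item_count: {}",
        workspace_id,
        publisher_uuid,
        res.rows_affected(),
        item_count,
    );
}
```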
```rust
#[inline]
pub async fn insert_or_replace_publish_collabs<'a, E: Executor<'a, Database = Postgres>>(
```
Generally speaking this might be a very heavy query. We know that individual collabs can be MBs in size. If there's no upper bound on their size/count, we might as well lock the table for an undefined amount of time.
There is an upper bound: 4MB for metadata, 128MB for the blob. That is checked in `post_publish_collabs_handler` when the request comes in. Do you think these upper bounds are reasonable?
Tbh, I think that 128MB could be used to DDoS us, but we can adjust it accordingly in the future.
There's also a max limit of 256MB for a single request in our load balancer on this particular endpoint. In addition, this also consumes the workspace storage quota, so it would only be a matter of a few tries before the user's limit is reached and they are blocked.
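For reference, a hedged sketch of the kind of upper-bound check described in this thread (the constant names, error type, and function are assumptions, not the actual handler code):

```rust
// Hypothetical limits matching the values discussed above.
const MAX_METADATA_SIZE: usize = 4 * 1024 * 1024; // 4MB per metadata
const MAX_BLOB_SIZE: usize = 128 * 1024 * 1024; // 128MB per blob

// Reject oversized items before any database work starts.
fn check_publish_item_size(meta_len: usize, blob_len: usize) -> Result<(), String> {
    if meta_len > MAX_METADATA_SIZE {
        return Err(format!("metadata too large: {meta_len} bytes"));
    }
    if blob_len > MAX_BLOB_SIZE {
        return Err(format!("blob too large: {blob_len} bytes"));
    }
    Ok(())
}
```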
```rust
if let Some(group) = self.group_manager.get_group(&object_id).await {
  let (collab_message_sender, _collab_message_receiver) = futures::channel::mpsc::channel(1);
  let (mut message_by_oid_sender, message_by_oid_receiver) = futures::channel::mpsc::channel(1);
```
Use `tokio::sync::mpsc::channel`: tokio is often more mature than futures. If the generic constraints allow it, you could also use `tokio::sync::oneshot`.
I have tried, but based on the `subscribe` function signature, it's not possible to use the tokio channel:

```rust
pub async fn subscribe<Sink, Stream>(
  &self,
  user: &RealtimeUser,
  subscriber_origin: CollabOrigin,
  sink: Sink,
  stream: Stream,
) where
  Sink: SinkExt<CollabMessage> + Clone + Send + Sync + Unpin + 'static,
  Stream: StreamExt<Item = MessageByObjectId> + Send + Sync + Unpin + 'static,
  <Sink as futures_util::Sink<CollabMessage>>::Error: std::error::Error + Send + Sync,
```
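An aside for context, not from the thread: tokio's `mpsc::Sender` does not implement the `futures_util::Sink` trait that this signature demands, which is the core incompatibility. The Stream half could be adapted with `tokio_stream::wrappers::ReceiverStream`, but the Sink half would need `tokio_util::sync::PollSender`, whose `Clone` and error-type bounds may not line up with the constraints above. A minimal, self-contained illustration of the Stream-side adapter, assuming the `tokio-stream` crate:

```rust
use futures_util::StreamExt;
use tokio_stream::wrappers::ReceiverStream;

#[tokio::main]
async fn main() {
    let (tx, rx) = tokio::sync::mpsc::channel::<String>(1);
    tx.send("hello".to_string()).await.unwrap();
    drop(tx); // close the channel so the stream terminates

    // ReceiverStream wraps the tokio receiver in a futures-compatible Stream.
    let mut stream = ReceiverStream::new(rx);
    while let Some(msg) = stream.next().await {
        println!("{msg}");
    }
}
```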
```rust
use shared_entity::dto::workspace_dto::FolderView;
use uuid::Uuid;

pub fn collab_folder_to_folder_view(folder: &Folder, depth: u32) -> FolderView {
```
This looks like a method that could be implemented directly on `FolderView`.
Apparently `FolderView` is in the `shared-entity` crate to share structures between server and client. Putting the implementation where you mentioned would mean that the `shared-entity` crate has to depend on `collab-folder` for `Folder`, which we want to avoid.
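One Rust-level constraint worth noting (an editorial aside, not from the thread): because of the orphan rule, a `From` conversion between these two foreign types could not live in the server crate anyway; it would have to go into `shared-entity` or `collab-folder`, forcing exactly the dependency described above. Illustrative only:

```rust
// This impl would only compile inside shared-entity (which defines
// FolderView) or collab-folder (which defines Folder), so one crate would
// have to depend on the other. The free function in the diff sidesteps
// that coupling entirely.
impl From<(&Folder, u32)> for FolderView {
    fn from((folder, depth): (&Folder, u32)) -> Self {
        // ...same traversal as collab_folder_to_folder_view (body elided)...
        todo!()
    }
}
```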
```rust
for publish_item in &publish_items {
  check_collab_publish_name(publish_item.meta.publish_name.as_str())?;
}
insert_or_replace_publish_collabs(pg_pool, workspace_id, publisher_uuid, publish_items).await?;
```
Since there's no explicit Postgres transaction initialization, I can only assume that every sqlx query executed here will be wrapped and committed independently.
Another issue is that publish_items has no upper bound check: we basically copy an unlimited number of collabs of unlimited size, with no timeouts.
There's only one sqlx operation for this API call, which is `insert_or_replace_publish_collabs`, so there's no need for a transaction. As mentioned above, the upper bound is checked when the API is called, before it reaches this part.
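For completeness, a hedged sketch of what the explicit-transaction variant would look like with sqlx, should several statements ever need to commit atomically (names follow the snippet above; this is not the current code):

```rust
// Begin an explicit transaction, run statements against it, commit once.
// The function is generic over Executor, so a &mut Transaction works too.
let mut txn = pg_pool.begin().await?;
insert_or_replace_publish_collabs(&mut *txn, workspace_id, publisher_uuid, publish_items)
    .await?;
// ...any further statements would share the same transaction here...
txn.commit().await?;
```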
src/biz/workspace/publish_dup.rs (Outdated)
```rust
// new view after deep copy
// this is the root of the document/database duplicated
let mut root_view = match self
  .deep_copy(&mut txn, uuid::Uuid::new_v4().to_string(), publish_view_id)
```
It's hard to say how CPU intensive the deep copy is. Ideally we don't want to hold transactions for too long (as we have an upper limit on concurrent connections in PgPool).
- If possible, we could prepare the blobs first, then begin the transaction and insert them.
- Alternatively, if pt. 1 uses too much memory, insert collabs one by one, each in a separate transaction, in their view parent order.
Maybe it would be a good idea to also add metrics to measure the time spent on duplicating?
I have changed it to accumulate all the collabs to insert without holding on to a Postgres txn, as you mentioned. We already have metrics measuring the frequency and latency of each HTTP endpoint (including this one); that should suffice for now. When we need more thorough analysis, I'll add more to /metrics.
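A rough sketch of the pattern the author describes, with all helper names hypothetical: do the CPU-heavy copy with no transaction open, then keep the write transaction short.

```rust
// Phase 1: deep-copy everything in memory; no database connection is held.
let mut prepared = Vec::with_capacity(views.len());
for view in views {
    prepared.push(deep_copy_view(&view)?); // hypothetical helper
}

// Phase 2: a short-lived transaction used only for the inserts.
let mut txn = pg_pool.begin().await?;
for collab in prepared {
    insert_collab(&mut *txn, &collab).await?; // hypothetical helper
}
txn.commit().await?;
```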
```rust
  .await?;

collab_storage
  .insert_new_collab_with_transaction(
```
How about using `insert_new_collab`? It will save the collab in memory and write to disk in the background.