This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

intermittent ERR_PUTTING_TO_ROUTING on IPFS.name.publish #4267

Closed
algorista opened this issue Dec 15, 2022 · 5 comments
Assignees
Labels
help wanted (Seeking public contribution on this issue) · kind/bug (A bug in existing code, including security flaws) · kind/not-helia-issue · need/analysis (Needs further analysis before proceeding)

Comments

@algorista

the problem

This exception happens practically every time I try to publish to IPNS, and it makes IPNS unusable unless there is a reasonable solution.

the debug

I debugged the source within the limits of my time and found the origin of the problem: the line below, where a .put is made in the DHTDatastore that forwards the .put to all peers in the libp2p DHT.

await drain(this._dht.put(key, value, options))
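For context, drain here (presumably from the it-drain family of utilities used across the js-ipfs stack) just consumes an async iterable to completion and discards the values. A minimal equivalent, which also shows why any error thrown by the underlying query surfaces directly from the call:

```javascript
// Minimal stand-in for the drain() used above: it exhausts the async
// iterable and throws as soon as the source throws, which is exactly
// how a single peer timeout can abort the whole put.
async function drain (source) {
  for await (const _ of source) {
    // values are discarded; an error thrown by the source propagates here
  }
}
```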

What happens is that almost always at least one peer times out, which triggers an exception. That exception propagates without any handling to the upper layer, which generalizes it as ERR_DB_WRITE_FAILED in Errors.dbWriteFailedError, and then a still higher layer generalizes it again as ERR_PUTTING_TO_ROUTING.

https://github.com/ipfs/js-datastore-core/blob/6851f672c3c60b5fff8131a7c450164aef50ffe9/src/tiered.js#L58

 async put (key, value, options) {
    try {
      await Promise.all(this.stores.map(store => store.put(key, value, options)))
    } catch (err) {
      throw Errors.dbWriteFailedError()
    }
  }
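Note how the catch above throws a fresh dbWriteFailedError and discards the original error entirely, which is why the timeout details never reach the caller. A hedged sketch (the TieredPut class name and stores shape are my own, not the datastore-core API) of how the underlying failure could be preserved instead:

```javascript
// Hypothetical variant of the tiered put that keeps the underlying
// failure attached as `cause`, so callers can tell a DHT timeout apart
// from a genuine local write failure. `stores` stands in for the tiered
// datastore's backing stores, each with an async put(key, value, options).
class TieredPut {
  constructor (stores) {
    this.stores = stores
  }

  async put (key, value, options) {
    try {
      await Promise.all(this.stores.map(store => store.put(key, value, options)))
    } catch (err) {
      // Wrap instead of swallowing: the original error stays reachable
      const wrapped = new Error('Write failed')
      wrapped.code = 'ERR_DB_WRITE_FAILED'
      wrapped.cause = err
      throw wrapped
    }
  }
}
```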

Below is my crazy-monkey-style debug code (I don't have time, I need this working now):

  async put (key, value, options) {
    try {
      //await drain(this._dht.put(key, value, options))

      console.log("\n\n\n\n\nLOG: DHTDatastore.put->item START");
      var source = this._dht.put(key, value, options);
      for await (const item of source) {
        console.log("\nLOG: DHTDatastore.put->item", this.log_count++, "->", item, "\n----------------------------------");
      }
      console.log("\nLOG: DHTDatastore.put->item END");
    } catch (/** @type {any} */ err) {
      console.log("EXCEPTION: DHTDatastore.put", err.stack)
      log.error(err)
      console.log("\\===========================/")

      throw err
    }
  }
LOG: DHTDatastore.put->item 957 -> {
  from: Ed25519PeerIdImpl [PeerId(12D3KooWBCNW4SjEjSYy5KgMDMPRp2eAs2PFSnMfASf6REh2DgAJ)] {
    type: 'Ed25519',
    multihash: Digest {
      code: 0,
      size: 36,
      digest: [Uint8Array],
      bytes: [Uint8Array]
    },
    privateKey: undefined,
    publicKey: Uint8Array(36) [
        8,   1,  18,  32,  20, 126, 215,  94, 102,
      237, 135,  58, 179,  38, 127,  31, 137, 206,
       63,  57, 203, 216, 240,  94, 238, 108, 115,
       56,  93,  38, 162,  58,  10, 252,   4, 199
    ]
  },
  error: [AggregateError: All promises were rejected] {
    code: 'ERR_TIMEOUT',
    [errors]: [ [AbortError], [Error], [Error] ]
  },
  name: 'QUERY_ERROR',
  type: 3
} 

----------------------------------
EXCEPTION: DHTDatastore.put Error: Query aborted
    at AbortSignal.<anonymous> (file:///home/algorista/prj-xxxxxx/xxxxxx-poc/node_modules/@libp2p/kad-dht/dist/src/query/query-path.js:145:37)
    at AbortSignal.[nodejs.internal.kHybridDispatch] (node:internal/event_target:736:20)
    at AbortSignal.dispatchEvent (node:internal/event_target:678:26)
    at abortSignal (node:internal/abort_controller:292:10)
    at AbortController.abort (node:internal/abort_controller:323:5)
    at AbortSignal.onAbort (/home/algorista/prj-xxxxxx/xxxxxx-poc/node_modules/any-signal/index.js:11:16)
    at AbortSignal.[nodejs.internal.kHybridDispatch] (node:internal/event_target:736:20)
    at AbortSignal.dispatchEvent (node:internal/event_target:678:26)
    at abortSignal (node:internal/abort_controller:292:10)
    at AbortController.abort (node:internal/abort_controller:323:5)
\===========================/
Uncaught:
Error: ipns record for f5uxa3ttf4ackcacciqqgqlnqgn647kdrz3alsdluldttcxd6orqsepev64b2zgmnnt36zfh could not be stored in the routing - Error: Write failed
    at Module.dbWriteFailedError (file:///home/algorista/prj-xxxxxx/xxxxxx-poc/node_modules/datastore-core/src/errors.js:23:16)
    at TieredDatastore.put (file:///home/algorista/prj-xxxxxx/xxxxxx-poc/node_modules/datastore-core/src/tiered.js:58:20)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at runNextTicks (node:internal/process/task_queues:65:3)
    at processTimers (node:internal/timers:499:9)
    at async IpnsPublisher._publishEntry (file:///home/algorista/prj-xxxxxx/xxxxxx-poc/node_modules/ipfs-core/src/ipns/publisher.js:90:19)
    at async IpnsPublisher._putRecordToRouting (file:///home/algorista/prj-xxxxxx/xxxxxx-poc/node_modules/ipfs-core/src/ipns/publisher.js:77:5)
    at async IPNS.publish (file:///home/algorista/prj-xxxxxx/xxxxxx-poc/node_modules/ipfs-core/src/ipns/index.js:47:7)
    at async publish (file:///home/algorista/prj-xxxxxx/xxxxxx-poc/node_modules/ipfs-core/src/components/name/publish.js:92:20)
    at IpnsPublisher._publishEntry (file:///home/algorista/prj-xxxxxx/xxxxxx-poc/node_modules/ipfs-core/src/ipns/publisher.js:99:21)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at runNextTicks (node:internal/process/task_queues:65:3)
    at processTimers (node:internal/timers:499:9)
    at async IpnsPublisher._putRecordToRouting (file:///home/algorista/prj-xxxxxx/xxxxxx-poc/node_modules/ipfs-core/src/ipns/publisher.js:77:5)
    at async IPNS.publish (file:///home/algorista/prj-xxxxxx/xxxxxx-poc/node_modules/ipfs-core/src/ipns/index.js:47:7)
    at async publish (file:///home/algorista/prj-xxxxxx/xxxxxx-poc/node_modules/ipfs-core/src/components/name/publish.js:92:20)
    at async REPL215:1:33 {
  code: 'ERR_PUTTING_TO_ROUTING'
}

I tried to identify whether the error came from a specific item of the drain, but that's not what happens: the exception appears at the end, just before the last item (I assume it's the last one; I can't guarantee that).

other potential implications

I believe this same bug also introduces a memory leak, a big one, but that is just a guess based on my experience with JS.

other 'issues' presenting the same problem

Below are two other issues that report the same problem. The palliative solutions users found were either a retry loop or restarting the node, since for a few moments after a restart all peers are freshly connected and unlikely to time out. I tested that second case and confirmed that right after a restart the DHT is not yet connected to any peer, so the publish does nothing, not even a timeout; it is not a solution.

#4202

#4023

the solution

I made a fork and introduced these modifications. It's not an elegant solution, but it does the trick, and I'll be able to get my app live later today.

algorista@0ee74f2

  async put (key, value, options) {
    try {
      // REMOVED: await drain(this._dht.put(key, value, options))
      const responses = this._dht.put(key, value, options)
      while (true) {
        let item
        try {
          item = await responses.next()
        } catch (e) {
          // swallow the per-peer abort; anything else still propagates
          if (e.message !== 'Query aborted') throw e
        }
        if (item && item.done) break
      }
    } catch (/** @type {any} */ err) {
      log.error(err)
      throw err
    }
  }

I believe the elegant solution would be to properly handle the abort event in the layers closest to libp2p, ensure that at least some peers replicated the IPNS record, and only throw if none were reached. I also think there's room for more parallelization in that .put. With a little time I will study this calmly.
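That idea could be sketched roughly as below. The event names and shapes here are assumptions based on the QUERY_ERROR events in my logs, not the exact libp2p-kad-dht API, so treat this as a sketch of the policy rather than a drop-in fix:

```javascript
// Sketch: consume the DHT put events, count how many peers actually
// accepted the record, and only fail when no peer succeeded.
async function putToAtLeastOnePeer (dht, key, value, options) {
  let successes = 0
  const errors = []

  try {
    for await (const event of dht.put(key, value, options)) {
      if (event.name === 'QUERY_ERROR') {
        // a single slow or un-dialable peer should not abort the query
        errors.push(event.error)
      } else {
        successes++
      }
    }
  } catch (err) {
    errors.push(err) // e.g. the trailing "Query aborted" error
  }

  if (successes === 0) {
    const err = new Error('Record was not stored by any peer')
    err.code = 'ERR_PUTTING_TO_ROUTING'
    err.errors = errors
    throw err
  }

  return successes
}
```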

final considerations

If anyone arrives here like I did, looking for an urgent solution, you might consider temporarily installing from my fork, or monkey-patching from there.

npm install 'https://github.com/algorista/js-ipfs.git#algorista'
@welcome

welcome bot commented Dec 15, 2022

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

@algorista
Author

I want to add that I couldn't get the GitHub fork to work with npm, nor could I build it through Lerna. So all that's left is to monkey-patch directly in the node_modules folder, which solves the problem on the server but not in the bundle that goes to the web user.

It's still a problem.

  Lerna (powered by Nx)   Cannot read properties of undefined (reading 'data')

@algorista
Author

I also want to add that the warning below must come from the same bug in the .put in the libp2p DHT, because each timeout adds an abort listener and overflows Node's default listener limit. After all, with almost 300 connections going on, dozens of timeouts can occur at any time.

(node:14920) MaxListenersExceededWarning: Possible EventTarget memory leak detected. 11 abort listeners added to [AbortSignal]. Use events.setMaxListeners() to increase limit
(Use `node --trace-warnings ...` to show where the warning was created)

@achingbrain achingbrain added kind/bug A bug in existing code (including security flaws) help wanted Seeking public contribution on this issue need/analysis Needs further analysis before proceeding labels Dec 16, 2022
@RangerMauve
Contributor

Was running into this too and ended up switching to Kubo with ipfs-http-client (which introduced another set of bugs 😛).

@achingbrain
Member

achingbrain commented May 22, 2023

js-IPFS is in the process of being deprecated; the replacement is Helia. Please see the State of IPFS in JS post for a bit of background and the migration guide for how to port your app over.

When running a network query to publish IPNS records, a number of sub-queries are run to find KAD-close peers - individual sub-queries can fail with a timeout or other error without the whole query being aborted, but if they all fail then the whole query will fail.

Unfortunately the set of peers you will query depends on the KAD-ID of the key you are trying to publish so it's very hard to replicate in a deterministic manner.

What will happen, though, is that some peers will be harder to dial than others, as they might be publishing private (e.g. un-dialable) addresses or might not support the right set of network transports.

Helia ships with the latest transports and will be where fixes are deployed from now on so please can you upgrade your app and see if the problem persists?
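The failure mode described above (individual sub-queries may fail, and only when every one fails does the whole query fail) can be illustrated with a small sketch. This is purely illustrative, not the libp2p implementation:

```javascript
// Run a set of independent sub-queries; tolerate individual failures,
// but fail the whole operation only when all of them fail, mirroring
// the aggregate behaviour seen in the ERR_TIMEOUT logs above.
async function runSubQueries (subQueries) {
  const results = await Promise.allSettled(subQueries.map(fn => fn()))
  const fulfilled = results.filter(r => r.status === 'fulfilled')

  if (fulfilled.length === 0) {
    throw new AggregateError(
      results.map(r => r.reason),
      'All promises were rejected'
    )
  }

  return fulfilled.map(r => r.value)
}
```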
