Skip to content

Conversation

@B4nan
Copy link
Member

@B4nan B4nan commented May 4, 2021

Packages:

  • @apify/consts
  • @apify/datastructures
  • @apify/git
  • @apify/hubspot_client
  • @apify/image_proxy_client
  • @apify/input_schema
  • @apify/log
  • @apify/markdown
  • @apify/salesforce_client
  • @apify/utilities

Other notable changes:

  • some of the typings are rather low effort (lots of any in some places, mainly salesforce and hubspot clients)
  • uses named exports except log where we keep the default export for logger instance
  • removes underscore from most of the packages (except salesforce/hubspot clients)
  • removed startsWith polyfill and newPromise and requestPromised methods
  • truncate method is duplicated so the log package does not depend on utilities (as those bring a lot of 3rd party deps)

Closes #131
Closes #95

BREAKING CHANGE:

  • old apify-shared package is now gone in favour of new @apify/* packages
  • all exports are now done via named exports instead of default exports (with exception of logger instance)
  • removed startsWith polyfill and newPromise and requestPromised methods

@B4nan B4nan force-pushed the monorepo branch 5 times, most recently from e09168b to 957e3e6 Compare May 4, 2021 15:31
@B4nan B4nan mentioned this pull request May 4, 2021
@mnmkng
Copy link
Member

mnmkng commented May 5, 2021

@mtrunkat @fnesveda I think it's important now to finally agree on the structure and principles behind this PR before we do a full code review. We haven't really done so in the connected issue.

I'm ok with the way it's done.

@B4nan B4nan force-pushed the monorepo branch 5 times, most recently from b457e4e to fe7d57b Compare May 5, 2021 11:46
@jancurn
Copy link
Member

jancurn commented May 5, 2021

@mtrunkat @fnesveda I think it's important now to finally agree on the structure and principles behind this PR before we do a full code review. We haven't really done so in the connected issue.

I'm ok with the way it's done.

I guess this decision also depends on the overall strategy for the naming of our open source NPM packages. For example, to say we're going to use @apify/xxx mainly internal tools and packages, and generic names for public reusable ones. But that means the "apify" brand will not be present in the generic public packages, which might or might not be cool.

What do you think?

@B4nan
Copy link
Member Author

B4nan commented May 5, 2021

So I managed to make the tests work with node < 15 too, and have some concrete points on what is needed for what.

  • Current setup uses NPM without workspaces, linking packages via lerna. It works well enough, but it requires us to have lock files (not just for the root project, but for all packages too), to be able to install the project in CI (don't ask me why, afaik its a clash between lerna and NPM, as lerna is fiddling with the lock files too).
  • If we use yarn (v1), we should be probably fine without lockfiles (it would definitely work with just the root level one - had that setup working elsewhere - and probably even without that one too).
  • If we use NPM workspaces, it works only on node 15+ (even if we manually upgrade NPM, again don't ask me why 🤷).
  • There is one small gotcha with the way it is now (NPM without workspaces) - if we want to use a dependency of a package in tests, we need to explicitly include it in the main project dev deps (with workspaces this works automatically).

Will focus on the linting and publishing setup now. Also planning to make the readme much more verbose, adding info about how to add new package, where to put dev deps, etc.

@mnmkng
Copy link
Member

mnmkng commented May 5, 2021

@jancurn We already had this discussion and figured that generic packages don't really take off unless someone spends non-trivial time marketing them. And since we neither want to market the packages nor keep up with potential community feature requests, we decided to just place them all under @apify. But that concerns only apify-shared. We still want to release puppeteer-scraper cheerio-scraper and so on under the hood of the SDK in the future.

@B4nan
Copy link
Member Author

B4nan commented May 11, 2021

Added contributing guide and implemented automatic releasing via lerna. Should be mostly ready for review, I have only few small ideas where to improve (namely precommit hook to check the commit message format).

Few words about the releasing:

  • package versions are independent
  • version bumps work based on the commit messages (see the contributing guide for details)
  • if we have a version bump in package that other packages depend on, we also get a version bump in the dependents (e.g. if we update consts package, we also get a bump in utilities and others)
  • root changelog is now automatically updated and consists only of a table that points to the child packages changelogs
  • we will need to add GH personal access token to the secrets to allow pushing from the CI (as we create the realease commits there)
  • we should disallow merge commits in the repo, in favour of squash merging (this way the changelogs will be generated based on the commit message, that the person merging the PR will have control over)
  • I have a testing repository for this here, if you are interested in how it looks in the wild: https://github.com/B4nan/lerna-test-v1
    • please ignore releases from yesterday or older (so look only at those from 11.5. onwards), as I was trying different lerna modes and things might look broken

Might be worth to split the utilities package, not sure where and how it is used, but it is kind of a mixture that depend on a lot of 3rd party packages as well as on 3 internal packages.

Lastly, I managed to make the NPM 7 workspaces work, so there is only one (root) lock file. This means that NPM 7 is needed to install the project locally.

@B4nan B4nan marked this pull request as ready for review May 11, 2021 14:15
Copy link
Member

@mnmkng mnmkng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not really read the code, because the tests are passing and it would take me a week. But I don't think that's a problem, because the packages are completely new, so a manual migration is needed anyway. Just commented on some nitpicks and typos.

@B4nan
Copy link
Member Author

B4nan commented May 14, 2021

Looking at the utilities package, as it is also used in SDK and now it still contains way too many 3rd party dependencies. Some ideas:

  • there is the deprecated request package used for a single method - requestPromised, I'd propose to get rid of that, if we use it somewhere, we should stop anyway as it is deprecated and we now have many good replacements
  • ImageProxyClient has an external dependency on create-hmac (which itself has 5 transitive deps, havent checked deeper), so it could go to its own package
  • validateProxyField method uses countries-list which is also huge, so worth separating (maybe there should be other methods moved to such new package?)
  • validateInputUsingValidator depends on ajv, maybe that one belongs more to @apify/input_schema package (where we depends on ajv)?
  • traverseObject depends on is-buffer, node already have Buffer.isBuffer() and that package is basically a one liner using it (although in a bit obfuscated way, e.g. obj.constructor.isBuffer === 'function' && obj.constructor.isBuffer(obj))

Thoughts?

@mnmkng
Copy link
Member

mnmkng commented May 14, 2021

request - yes, remove it
ImageProxyClient - makes sense to me to separate it
validateProxyField - is this releated to the proxy client above?
validateInputUsingValidator - not sure about this one, please check apify-cli, I have a hunch it might be used there
traverseObject - I think that check is there for browser compatibility, we can copy paste it and drop the dependency

@mtrunkat @jbartadev @gippy @jancurn any thoughts on those?

@B4nan
Copy link
Member Author

B4nan commented May 14, 2021

There are more candidates to be moved to input_schema package. All of those are dependent on that package, if we move them all, we would get rid of the dependency utilities -> input_schema (which again has some transitive deps)

  • validateProxyField
  • validateInputUsingValidator

(validateProxyField is where the countries-list is used, so that would go to input_schema dependencies, making the utilities package depend only on log and consts packages)

validateProxyField - is this releated to the proxy client above?

Not sure, probably not. It's a separate method, it is used only in validateInputUsingValidator in the repo.

validateInputUsingValidator - not sure about this one, please check apify-cli, I have a hunch it might be used there

Does it matter? We are talking about moving the code to different package, it will still be somewhere.

traverseObject - I think that check is there for browser compatibility, we can copy paste it and drop the dependency

Ok will inline it then, at least for now.

@B4nan
Copy link
Member Author

B4nan commented May 14, 2021

There is also makeInputJsFieldsReadable that adds cherow as dependency (which is JS parser, by the way a bit outdated - so dated that the repository is already removed on GH - with more recent replacements from the very same author, namely there is escaya that supports TS too).

Sounds like another things that should go to the input_schema package.

@mnmkng
Copy link
Member

mnmkng commented May 14, 2021

Does it matter? We are talking about moving the code to different package, it will still be somewhere.

I'm not saying we should not do it. I'm just saying we should have an idea about the usage first. But yeah, cli is pretty easy to update (if we actually do update it at some point).

It would be worth it to pull at least the apify-core repo and do a search for all those methods there, to have an idea about how widespread their usage is and how painful the migration from apify-shared to @apify/... would be.

@gippy
Copy link
Member

gippy commented May 14, 2021

The validateInputUsingValidator is used in app to correctly validate input in input UI and are required in storage. Should be no problem to import in from somewhere else then apify-shared.
The validateProxyField is probably only used in the validateInputUsingValidator function.

@B4nan
Copy link
Member Author

B4nan commented May 14, 2021

Looking at the create-hmac dependency, why don't we use native crypto package instead?

https://nodejs.org/api/crypto.html#crypto_class_hmac

It was like that originally, but changed in favour of the 3rd party package here:

0d4d024

But no reasoning behind it in the commit message 😢 Apparently the 3rd party one is browser ready? Maybe we should have a separate package for browser code so we have clear separation of what can use node API's and what not?

edit: just for reference, the 3rd party one is also quite fat:

➜  apify-shared-js git:(monorepo) npx howfat create-hmac
Dependencies: 11
Size: 245.95kb
Files: 96

edit2: looks like it is also used in the browser context:

https://github.com/apify/apify-core/search?q=ImageProxyClient

But it seems weird, as the class also uses Buffer which is a node only API, probably polyfilled...

@B4nan
Copy link
Member Author

B4nan commented May 14, 2021

Lol, cherow is really FAT :D

➜  apify-shared-js git:(monorepo) npx howfat cherow
Dependencies: 1270
Size: 56.69mb
Files: 15682

(yes, of course we have that pulled in everywhere, like in the SDK and all its dependents)

For comparison, here is escaya (from the same author, TS ready):

➜  mikro-orm git:(master) ✗ npx howfat escaya
Dependencies: 0
Size: 420.38kb
Files: 144

@B4nan
Copy link
Member Author

B4nan commented May 14, 2021

Managed to replace cherow with escaya, tests are passing. Will still make it a separate package, same for ImageProxyClient.

Or maybe the makeInputJsFieldsReadable could go into input_schema package, WDYT?

@B4nan
Copy link
Member Author

B4nan commented May 21, 2021

Added summary of what changed with regards to the monorepo redesign and automatic publishing into the repo wiki:

https://github.com/apify/apify-shared-js/wiki/The-monorepo-rewrite-story

@mnmkng
Copy link
Member

mnmkng commented May 24, 2021

@B4nan Can I merge this?

@B4nan
Copy link
Member Author

B4nan commented May 24, 2021

We need to add the personal access token to secrets first (with write permissions as it will be used for commiting to the repo). Should be called GH_TOKEN.

Ideally we should use some service account for that, I think I saw one recently in one of the repositories.

@B4nan B4nan merged commit 4a20c24 into master May 25, 2021
@B4nan B4nan deleted the monorepo branch May 25, 2021 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Monorepo rewrite deprecated request@2.88.2

6 participants