
🚢 Deploy! #1742

Closed
wants to merge 83 commits into gh-pages from master

Conversation

greysteil
Contributor

There are 82 commits on master waiting to be deployed.

If there's anything blocking this going out then I'd love to help :octocat:.

ayatk and others added 30 commits March 25, 2018 22:31
This resolves the following error occurring in production:

```
TypeError: Cannot read property 'slice' of null
/home/m/shields/lib/suggest.js in findSuggestions at line 40:33
  const userRepo = url.pathname.slice(1).split('/');
```
This resolves the following error occurring in production:

```
TypeError: Cannot read property 'slice' of null
/home/m/shields/lib/suggest.js in twitterPage at line 63:31
  const schema = url.protocol.slice(0, -1);
```
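A minimal sketch of the kind of guard these two fixes imply, assuming the code bails out when `pathname` or `protocol` is null before slicing (the function bodies below are illustrative, not the actual contents of `lib/suggest.js`):

```js
// Illustrative guards only; the real fixes in lib/suggest.js may differ.
function findSuggestions(url) {
  // url.pathname can be null for malformed input, so guard before slicing.
  if (!url || url.pathname == null) {
    return [];
  }
  return url.pathname.slice(1).split('/');
}

function twitterPage(url) {
  // Likewise, url.protocol can be null; return early instead of throwing.
  if (!url || url.protocol == null) {
    return undefined;
  }
  return url.protocol.slice(0, -1);
}
```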
Instead of centralizing examples, specify them from within a service.

* Avoid duplication in service loading + refactor
* Avoid duplication in URLs, rename uri -> url in BaseService
* Added missing try-catch block

* Added tests to cover malformed responses
* fix(package): update svgo to version 1.0.5
* Update package-lock
* Update invocation for SVGO 1.x
* Remove helper
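A minimal sketch of the "specify them from within a service" idea above, assuming each service exposes its own examples via a static getter (the class name and property shape are assumptions, not necessarily Shields' actual BaseService API):

```js
// Illustrative only; names and shapes are assumptions, not the real API.
class BaseService {} // stand-in for the real base class

class TravisBuild extends BaseService {
  // Each service declares its own badge examples rather than registering
  // them in a central examples file.
  static get examples() {
    return [{ title: 'Travis (.org)', previewUrl: 'travis/rust-lang/rust' }];
  }
}

console.log(TravisBuild.examples[0].title); // 'Travis (.org)'
```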
Make a clear distinction between programmer errors (“internal errors”) and runtime errors, and allow configuring the server to let programmer errors bubble up in development and unit testing. This saves a huge amount of time because it generates ordinary stack traces when things go wrong. And if these errors occur in production, we'll catch them and display **shields | internal error**, which is the equivalent of a 500 error.
This cleans up the work from #1582, clarifying concerns, removing a bit of duplication, and renaming for clarity.
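A minimal sketch of that distinction, assuming a configuration flag decides whether internal errors bubble up (the flag and class names are illustrative, not the exact ones introduced in #1582):

```js
// Illustrative only: separate programmer errors from expected runtime errors.
class InternalError extends Error {} // a bug in our own code
class RuntimeError extends Error {}  // an expected failure, e.g. a bad upstream response

function renderError(err, { handleInternalErrors }) {
  if (err instanceof InternalError && !handleInternalErrors) {
    // In development and unit tests, rethrow so we get an ordinary stack trace.
    throw err;
  }
  // In production, render the equivalent of a 500: "shields | internal error".
  return { label: 'shields', message: 'internal error' };
}

console.log(renderError(new InternalError('oops'), { handleInternalErrors: true }));
// => { label: 'shields', message: 'internal error' }
```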
* optimize cssColor regex

* Add comment & tests

* update test to not export cssColor

* [0-9] -> \d
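For illustration, a simplified hex-only version of such a check after the `[0-9]` → `\d` change (the actual cssColor regex in Shields also covers named colors and other notations):

```js
// Simplified illustration only: matches 3- or 6-digit hex colors such as '#4c1'.
const hexColor = /^#(?:[\da-fA-F]{3}){1,2}$/;

console.log(hexColor.test('#4c1')); // true
console.log(hexColor.test('blue')); // false (named colors need a separate check)
```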
* Add Stack Exchange logos

* Optimize Stack Exchange logos with svgomg

closes #1637
* Add dynamic yaml badge

* Forgot package lock

* Switch tests to yaml data source

* Add yaml to the dynamic badge maker options

* Reorder to match documentation examples

* Reordered dynamic types to be alphabetical

* Removed regex as pinned commit makes it unnecessary and fixed url

* Removed unused import

* Add more YAML MIME types

* Removed duplicate tests which don't differ between data types
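Conceptually, the dynamic YAML badge parses a fetched YAML document and extracts a single value. A minimal sketch, assuming `js-yaml` and a flat top-level key (the real implementation supports richer queries and several YAML MIME types):

```js
// Illustrative only; not the actual dynamic-badge code.
const yaml = require('js-yaml');

function dynamicYamlValue(rawYaml, key) {
  const data = yaml.safeLoad(rawYaml); // js-yaml 3.x API
  return data ? data[key] : undefined;
}

console.log(dynamicYamlValue('version: 1.2.3\n', 'version')); // '1.2.3'
```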
* set expires header corresponding to maxAge

* add tests
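A minimal sketch of "set expires header corresponding to maxAge", assuming a Node-style response object (names are illustrative):

```js
// Illustrative only: derive Expires from the same maxAge used for Cache-Control.
function setCacheHeaders(res, maxAgeSeconds) {
  res.setHeader('Cache-Control', `max-age=${maxAgeSeconds}`);
  res.setHeader('Expires', new Date(Date.now() + maxAgeSeconds * 1000).toUTCString());
}
```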
@shields-ci

Warnings

⚠️ This PR targets gh-pages - It is likely that the target branch should be master
⚠️ This PR modified services/cdnjs/cdnjs.js but not services/cdnjs/cdnjs.tester.js. That's okay so long as it's refactoring existing code.
⚠️ This PR modified services/clojars/clojars.js but not services/clojars/clojars.tester.js. That's okay so long as it's refactoring existing code.

Messages

📖 ✨ Thanks for your contribution to Shields, @greysteil!
📖 Thanks for contributing to our documentation. We ❤️ our documentarians!
📖 🎨 Thanks for submitting a logo. Please ensure your contribution follows our guidance for logo submissions.

Generated by 🚫 dangerJS

@RedSparr0w
Member

Closing as unfortunately this will not deploy to production.


See #1538 [Deploy status]

Quoting @paulmelnikow:

Deploys usually happen every 1–3 weeks. Thaddée [@espadrine], who has limited time on this project, is the only sysadmin. He's working on giving me access to deploy and logs, but doing so is complicated because the hosting account (and maybe the servers too) are shared with other services he runs.

It has been almost 3 months since the last deploy (Mar 25) so hopefully there will be one soon.
🤞

@RedSparr0w RedSparr0w closed this Jun 20, 2018
@greysteil
Contributor Author

Ah, 👍, and thanks both for all your work on this project!

@espadrine
Member

espadrine commented Jun 20, 2018

The commit currently deployed is 57a1bf2, which is from May 30.

(I typically deploy every two weeks, although last weekend was spent with the step-family, preparing for my sister's wedding. We were in a rural area of French Bourgogne, which is known for its wine, but not for its Internet access.)

@greysteil What fundamental change are you not seeing? Is there a commit that should have been deployed, but hasn't? Maybe there was a regression?

@greysteil
Contributor Author

The commit that I (selfishly) want deployed is ca58d84, but when I saw the badge saying there were 81 commits awaiting deploy and looked into it, I figured a deploy was needed more generally, hence the PR.

Is there any way we can tell which commit was deployed most recently, and fix the badge on the readme? I wouldn't have created this PR and caused this trouble if I'd known it was only a few weeks since the last deploy.

[commits to be deployed badge]

@paulmelnikow
Member

Thanks for deploying the servers! For some reason gh-pages is not updating:

[screenshot: gh-pages branch not updating]

Hence the status badge being wrong, as well…

@paulmelnikow
Member

I know there's a bunch of tooling in the front-end deploy. If it's not working, let me know and I can fix whatever is broken and deploy 57a1bf2.

@espadrine
Member

@greysteil No trouble! You're right to ask.

@paulmelnikow I didn't notice that… I'll look into what is wrong tonight.

By the way, with Sentry working, do you feel OK with making deployments yourself? It is not ideal, but you have the ability to deploy individual servers or all of them at once, so this deployment workflow may work for you (it is pretty close to what I do):

  1. Look at new commits compared to the latest deployed hash.
  2. Check that it works on localhost.
  3. Deploy a single server (say s0).
  4. Look it up by IP by loading https://192.99.59.72/index.html in your browser.
  5. Check Sentry for new errors for a bit.
  6. If it looks OK, deploy all servers.
  7. Keep an eye on Sentry.

@espadrine
Member

Looks like there indeed is something a bit broken:

```
$ make website
…
$ make deploy-gh-pages
(LONG_CACHE=true BASE_URL=https://img.shields.io npm run build && \
git checkout -B gh-pages master && \
cp build/index.html index.html && \
cp -r build/_next next && \
pushd next/*/page && mv {_,}error && popd && \
sed -i 's,/_next/,./next/,g' index.html $(find next -type f) && \
sed -i 's,_error,error,g' index.html $(find next -type f) && \
git add -f build index.html next && \
git commit -m '[DEPLOY] Build index.html' && \
git push -f origin gh-pages:gh-pages) || git checkout master

> gh-badges@1.3.0 prebuild /home/tyl/file/github/gh-badges
> npm run depcheck


> gh-badges@1.3.0 depcheck /home/tyl/file/github/gh-badges
> check-node-version --node ">= 8.0"


> gh-badges@1.3.0 build /home/tyl/file/github/gh-badges
> npm run examples && next build && next export -o build/


> gh-badges@1.3.0 examples /home/tyl/file/github/gh-badges
> node lib/export-badge-examples-cli.js > badge-examples.json

> Using external babel configuration
> Location: "/home/tyl/file/github/gh-badges/package.json"
  using build directory: /home/tyl/file/github/gh-badges/.next
  copying "static" directory
  exporting path: /

Switched to and reset branch 'gh-pages'
~/file/github/gh-badges/next/c8b571e7-c012-44f1-9a20-e4eba9321dd0/page ~/file/github/gh-badges
mv: cannot stat '_error': No such file or directory
…
```

I don't quite know what it is; you are welcome to take a stab at it!

@RedSparr0w
Member

Thanks for the update @espadrine 😄,
I probably should have tried some of the badges instead of relying solely on the gh-pages branch.

As for the above problem, would changing this line to something like this fix it?

```
-pushd next/*/page && mv {_,}error && popd && \
// If the _error directory is needed:
+pushd next/*/page && (mv {_,}error || mkdir error\_error) && popd && \
// If not:
+pushd next/*/page && (mv {_,}error || true) && popd && \
```

@platan
Member

platan commented Jul 9, 2018

Can we somehow update gh-pages? Or should we wait for the next deployment?
The 'commits to be deployed' badge and the 'Deployment status in PR' feature are based on the content of this branch.

@paulmelnikow
Member

@platan I'll look into this tonight.

By the way, with Sentry working, do you feel OK with making deployments yourself? It is not ideal, but you have the ability to deploy individual servers or all of them at once, so this deployment workflow may work for you (it is pretty close to what I do):

@espadrine,

As a stopgap, this seems okay. It's possible that things could break in a way that crashes the servers without getting a message to Sentry. Without logs, we won't know what's causing the problems. We also can't monitor server resources, see our traffic, or control the load balancer.

Could we make a plan to get this project onto dedicated hosting, so that no part of it is dependent on any one person?

@espadrine
Member

We can switch the VPS instances one by one to a shared account. That said, SSH access will still need to be heavily restricted. Also, a proper setup, where each deployment requires validation by another member and all SSH sessions are logged in a tamper-proof way that other members can inspect, would require quite a bit of work.

@paulmelnikow
Member

Security is important and your time is limited. How about we take this opportunity to move to a PaaS? Badgen is running on Zeit Now and is kicking our butt.

On a PaaS there’s basically no need for SSH. Deploys are managed through Git and logged. Config changes are logged too, or could be managed using a separate config repo. We can configure Github to require multiple signoffs before merging (either to master or to a separate production branch), and configure the PaaS to deploy from that.

Keeping things secure is one challenge. There are many others with which we’ve become intimately familiar. We need to restart the process when it dies or has been running for a long time, manage configuration, deployment, and scale adjustments, rotate and distribute logs, balance requests across servers and distribute them across regions, and prevent abuse and denial-of-service attacks. There are more: monitoring uptime, monitoring request time, monitoring memory usage, deploying security updates, setting SSH config.

For each one I name, there’s another I haven’t thought of. For each that’s been solved, there's another that hasn’t. When we do have solutions, they aren’t fully documented. A lot of them may exist only in the file system or in one person’s head, not in the documentation or the codebase.

Because these solutions are custom, the knowledge of them is concentrated. My concerted effort and the concerted effort of other part-time volunteer maintainers aren’t enough. We simply don’t know enough about what’s happening.

Another consequence of custom solutions is that problems get re-solved for Shields even though they have already been solved many times elsewhere. That can be fun for learning, but it is inefficient in terms of effort.

This is how I’d propose to move forward:

  1. Over the next few weeks we migrate Shields to a PaaS. This would simplify the admin work so a concerted effort of 2–3 maintainers can make virtually any changes that are needed in the future.
  2. Modify the code to accommodate a cloud data store for the app’s persistent data.
  3. Make sure post-deployment monitoring doesn’t fall on one person.
  4. Use required Github reviews so push access to the servers can be limited.
  5. Employ some automation around the deployment process to keep drudgery to a minimum.

With a PaaS, by making certain architectural concessions (such as treating storage as an attached resource rather than depending on a persistent disk), many of these responsibilities can be delegated to the platform or managed using an existing integration or recipe. The problems can be solved at scale, instead of once per application, so the solutions are better tested, and the docs are better too. Because the contract between the application and the underlying OS is small, the application can be ported to another platform fairly straightforwardly. For Shields, which is a mostly stateless application with lots of part-time volunteer maintainers, this seems a particularly good fit.
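To make the "attached resource" concession concrete, here is a minimal sketch in which the app reads its data-store location from configuration instead of assuming a local disk (the Redis client and the REDIS_URL variable are assumptions for illustration, not what Shields actually uses):

```js
// Illustrative only: treat persistent storage as an attached resource.
const redis = require('redis');

// On a PaaS the data store is provisioned separately and its address is
// injected via configuration, so the app never depends on a persistent disk.
const client = redis.createClient(process.env.REDIS_URL || 'redis://localhost:6379');
```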

In an ideal arrangement, contributors at all levels make changes and see the results of those changes in our “product”, and quickly. A few maintainers can work together to make a big change, and this can happen even if one person on the team is too busy to weigh in. All that cooperative effort has a satisfying payoff: seeing the results of the work. It’s exciting and fun, it drives repeat contributions, and it leads to deeper and deeper engagement with the project.

We do a good job of roping in first-time contributors to make small contributions, and of supporting them with documentation, testing, and review comments. In the past year many people have contributed patches. However, we then frustrate them. Weeks after these PRs land comes the inevitable question: when does this show up in production? Then we explain the bottleneck in our deploy process and why it’s difficult for us to scale this work out to a larger team. This has become so prevalent that the maintainers implemented a badge showing the deploy status of a PR, along with a probot workflow that posts a comment with that badge so contributors can check when their change is live. This is demotivating, both to the contributors and to the maintainers providing support.

@chris48s
Member

I'm not going to express a strong preference for a particular solution because I don't have access to the logs, monitoring, etc., so it's difficult for me to make an informed judgement on the best way forward given traffic, finances, and so on.

I would like to see us move towards better performance, more frequent deploys and the ability to share ops burden with more maintainers (without compromising security, etc). Happy to pitch in with that where appropriate.

One thing I will add to @paulmelnikow's suggestion is that because everything is served from behind CloudFront, it should be possible to test a new setup (e.g. a PaaS) without doing a 'big bang' migration. We could load-balance some percentage of traffic to a new solution and the rest to the existing 3xVPS setup. That would allow a low-risk trial on real-world traffic, with the option to back out or conduct a phased migration.

@paulmelnikow
Member

We could load-balance some percentage of traffic to a new solution and the rest to the existing 3xVPS setup. That would allow a low-risk trial on real-world traffic, with the option to back out or conduct a phased migration.

Love that idea!

I'm not going to express a strong preference for a particular solution because I don't have access to the logs, monitoring, etc., so it's difficult for me to make an informed judgement on the best way forward given traffic, finances, and so on.

I don't have access to logs or monitoring either, though I could speak to the finances. For a while I was arguing against new hosting for cost reasons. However things have changed. We've gotten some big donations and have some good-sized regular donations too. I took the liberty of reaching out to Zeit about donating hosting, though if that doesn't come through, we could pay our own way.

Clearing up any uncertainty about the cost would be another good reason for the load-balancer approach you're suggesting. We can get better cost numbers that way, and still have an option to back things out.

@paulmelnikow
Member

A week ago I wrote @espadrine, following up on the above. I’m concerned. If we don't fix our reliability problems now, I'm afraid our users will go elsewhere and this project will die.
