2018 DevOps Days Baltimore, Part 2

DevOpsDays Baltimore 2018

Where - Baltimore, MD
When - 22-23 March 2018
Official Site - https://www.devopsdays.org/events/2018-baltimore/welcome/
Schedule - http://devopsdaysbaltimore2018.busyconf.com/schedule
YouTube Channel - https://www.youtube.com/channel/UCxpCqO1jg-xyTB8uZo049Bw
Slides - https://www.slideshare.net/devopdsaysbaltimore/presentations

Summary

The annual DevOpsDays conference in Baltimore took place on 22-23 March 2018 at UMBC's Columbus Center on the Inner Harbor. These notes supplement, rather than replace, Chad Koepf's notes (2018 DevOps Days Baltimore) from the same event. They do not cover every session, since I could not attend all of them or take precise notes on each, but they are as comprehensive as I could make them.

Day 1

Black Mirror Season 5: DevOps (General Session 2)

Security should be integrated from the start. Move security to the left in the pipeline. It should NOT be an afterthought, and it MUST be automated.

DevOps is NOT a series of 'automated builds'; it should enable teams to go from business concept to production environments.

Do not make security a burden on teams, but an enabler of speed. High performing teams get this! They spend 50% less time remedying security issues, and have 27% more automated tests.

Why was it possible for the Equifax hack to occur? Simple: Poor security, manually pushing security updates, few automated tests and processes.

Automated builds are NOT CI/CD.

Everyone can contribute in a true DevOps environment. There are no silos between groups/departments, only bridges, so everyone can put something in.

Bottom line from this talk - shift left. The more we push things left, the more control we have over the process end-to-end, and the more we control our own destiny. It makes our products safer, makes our company's future safer, and makes a better environment for us to work in.

Panel Discussion - Don't Believe the Hype: How We Navigated Federal Tech Policy to Bring Modern Development Practices to the Government.

Populated by representatives from HHS Digital Service, USDS, and Nava Corp.

There were a lot of issues w/ATOs (Authority to Operate): government leads wanted to do CI/CD, but also wanted the ATO updated every time there was a release. The panelists had to educate the government leaders that they could either do CI/CD or update the ATO on every release, but not both.

Not having an actual build/development environment is a hindrance, and they're trying to build one now.

Ignite Talks

1 - Avoiding the Pitfalls of Non-Technical Managers (Blackstone Federal)

Common pitfalls

Communications - You don't know what you're managing and you don't know what you're talking about.
Lack of trust - Team stops talking to you, and they tell you everything is fine.
Not properly managing risk - You're not able to properly manage the risk on your project, and you need to be constantly learning to avoid this. Take an hour a week with your tech lead and go over your infrastructure and/or your code base to learn about what you do. Teach this to others, so that way others can sit down w/one another for an hour a week. This helps to create jacks of all trades, including yourself.

Know your strengths

Be a good manager
Know how to run a good meeting - have an agenda, stick to it, keep it short, make sure there's value to a meeting.
Ask questions - talk to your team members, and ask "What's one thing I can do to make this team better?"

Be good at what you're good at, and know a little about everything.

2 - Definition of Done for DevSecOps

Traditional 'Done' Model: Code is committed, builds w/out error, passes tests, code reviewed, etc. A team needs to add to that in order to be effective.

The team needs to ask itself: "Do we have a viable candidate for production?"

Examples of static analysis tools - PHP Mess Detector, SonarQube, Pylint, etc.

Peer Review the code.

User role testing - test that each kind of user can do what they're supposed to do and can't do what they aren't.
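As a rough illustration of user role testing, the sketch below checks both directions: allowed actions succeed and forbidden ones are denied. The roles, actions, and the `is_allowed` helper are hypothetical and were not shown in the talk; a real application would call its own authorization layer instead.

```python
# Hypothetical roles and actions for illustration only; not from the talk.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "delete"},
}


def is_allowed(role: str, action: str) -> bool:
    """Return True if the role may perform the action; unknown roles get nothing."""
    return action in ROLE_PERMISSIONS.get(role, set())


# Expected behavior, written out independently of the implementation.
# It covers what each role can do AND what it must not be able to do.
EXPECTED = [
    ("viewer", "read", True),
    ("viewer", "write", False),
    ("viewer", "delete", False),
    ("editor", "write", True),
    ("editor", "delete", False),
    ("admin", "delete", True),
    ("guest", "read", False),  # an unknown role is denied by default
]


def run_role_checks() -> list:
    """Return the (role, action) pairs whose actual result differs from the expectation."""
    return [
        (role, action)
        for role, action, expected in EXPECTED
        if is_allowed(role, action) != expected
    ]


if __name__ == "__main__":
    failures = run_role_checks()
    print("all role checks passed" if not failures else f"failed: {failures}")
```

Listing the negative cases explicitly is the point: a permissions bug that grants too much access will not show up in tests that only exercise the happy path.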

Security scanning

Repeatable, reliable deployments

Bill of materials - libraries of what you're using, what software you're using for builds & deployments, etc. Keep a list of this.
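One small piece of a bill of materials can be generated automatically. The sketch below, which assumes a Python project, lists the installed Python distributions and their versions using the standard library's `importlib.metadata`. The function name is my own, and a complete bill of materials would also need to cover OS packages, build tools, and deployment tooling.

```python
from importlib import metadata


def python_bill_of_materials() -> list:
    """Return sorted 'name==version' strings for every installed Python distribution."""
    entries = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip distributions with missing metadata
            entries.add(f"{name}=={version}")
    return sorted(entries, key=str.lower)


if __name__ == "__main__":
    for entry in python_bill_of_materials():
        print(entry)
```

Committing the output alongside each release gives a record of what was actually shipped, which makes it easier to answer "are we affected?" when a vulnerability is announced.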

Update your system constantly. Do incremental upgrades instead of all at once.

Lock down your system (OpenSCAP, Fail2Ban, etc.).

Monitor your logs and the app and server.

Assess the risk of every change you put into the system, and accept those risks.

Reflect, consider your situation, and keep improving. No definition of 'done' is set in stone.

3 - The Internship - Running a killer summer internship program

Contrast Security is in the 4th year of their internship program.

Why do you want an internship program? Because it's your pipeline for developers. You're up against a lot of competition, and you need to establish your brand.

Establishing their brand:

Contrast has a Github page, a Stack Overflow page, and a well-built website.
They use the city of Baltimore as part of their marketing pitch.
'Cats' (Project Meow) are their branding and marketing schtick.

Recruiting:

Go to university career fairs.
They look for and hire engineers that do products.
How to recruit the best & brightest at your table: FOOD. It's part of their culture.
Be cautious when hiring college kids - parents will be/get involved.

Internship:

Internship planning process is a year long. Everyone at the company gets involved.
Buddy up interns w/mentors (3 mentors throughout the year).
Do a Shark Tank-style event by inviting their VCs to take part and have their interns pick an idea and run with it.

4 - Weekender's Guide to On-Call

Somebody has to be on call all the time in case something breaks in the middle of the night and/or on the weekends.

Having good metrics and good ways to display said metrics (graphs/charts) helps w/determining what needs to be done and when you may experience problems.

Recurring alerts - if the same alert keeps firing, the underlying cause needs to be fixed.
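A minimal sketch of how recurring alerts might be spotted from an on-call history. The alert names and the threshold of three are assumptions for illustration; in practice the history would come from whatever paging or alerting tool the team uses.

```python
from collections import Counter


def recurring_alerts(alert_names, threshold=3):
    """Return {alert name: count} for alerts that fired at least `threshold` times."""
    counts = Counter(alert_names)
    return {name: n for name, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Hypothetical week of pages, one entry per time an alert fired.
    history = ["disk_full", "high_latency", "disk_full", "cert_expiry", "disk_full"]
    print(recurring_alerts(history))
```

Anything this returns is a candidate for a permanent fix rather than another manual response at 3 am.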

5 - Elegant Weapons for a More Civilized Age

Play to your Team's Strengths. If you have people who know a certain language, do stuff in that language. Use platforms you already use.

Allow for Automation - installation, configuration, updates, and usage. If it ain't automated, it's broken.

Architecture/Decision Log - Can be as simple as a decisions.md file. Put it through code review.

Play it safe - Spend your Innovation Points wisely. Make nice and boring decisions so you won't wake anyone up at 3 am.

'Boring' is just another word for 'elegant' & 'reliable.'

Day 2

Talk #1 - Production Testing Through Monitoring

Testing is required. Most people do some level of testing (does the code compile? did it run? etc. very basic), but in some organizations the testing baseline is very rigorous. But testing is not enough.

There are many different types of testing - unit testing, functional testing, resilience testing, performance testing, etc. These aren't the only ones, but they are some of the biggest (they have whole conferences on them).

Testing can give a false sense of security. It tends to be very deterministic. It's hard to truly do randomized testing b/c of the multitude of variables and types of scenarios.

The Data Problem - the type, quality, and quantity (frequency) of the data you're collecting.

Wolfe+585 - Google this. It's a real name. And it will break your user-input forms.
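To make the form-breaking point concrete, here is a sketch of a name validator that avoids the usual traps: short length caps, ASCII-only patterns, and bans on hyphens or apostrophes. The function name and the 1000-character cap are my own assumptions, not something presented in the talk.

```python
import unicodedata

MAX_NAME_LENGTH = 1000  # assumed generous cap; tune to what your storage layer can hold


def validate_name(raw: str) -> str:
    """Normalize whitespace and reject only empty, oversized, or control-character input."""
    name = " ".join(raw.split())  # trim and collapse runs of whitespace
    if not name:
        raise ValueError("name is required")
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError("name is too long")
    if any(unicodedata.category(ch) == "Cc" for ch in name):
        raise ValueError("name contains control characters")
    return name


if __name__ == "__main__":
    # A surname several hundred characters long should be accepted, not truncated.
    very_long_surname = "Wolfe" + "x" * 585
    print(len(validate_name(very_long_surname)))
    print(validate_name("  José   O'Neil-Smith "))
```

The validator deliberately says nothing about which letters are "valid" in a name; it only rejects input that cannot be stored or displayed safely.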

Users (n) - distributed fault injection test suite for production.

You can test all you want, but you can't predict the future, so the code still might fail: lack of foresight, too many use cases, changed assumptions, etc. You can NEVER have enough tests to catch everything.

The goal of testing is to win the confidence game. We want to be reasonably confident that the code we are going to push will not break production.

Testing is good for 'known knowns.' Testing is not so good for the 'unknown unknowns.' This is where monitoring comes in.

Why monitor? B/c software is never perfect, systems are complex, external dependencies are a worry, proactive is better than reactive, and b/c things change A LOT in production.

What to monitor? "In God we trust, all others we monitor." Systems, databases, apps, integration points, caching systems, performance, user behavior (whether features are still used, or being used in ways you didn't expect, etc). Is it enough? Is it too much? Remember, 'servers working' doesn't mean 'site is working.'

"I don't give a f*** if the datacenter is on fire as long as I am still making money." - CEO

Most people use monitoring to focus on the wrong things. The question is how to know whether the business is working, not just whether the technology is working. Knowing that tells us where to funnel our money and where to put our focus to better maximize our efficiency.
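As a sketch of monitoring the business rather than the servers, the check below compares a recent business metric (for example, orders per minute) against a baseline. The metric, the sample values, and the 50% tolerance are assumptions for illustration and were not given in the talk.

```python
from statistics import mean


def business_metric_healthy(recent, baseline, max_drop=0.5):
    """Return True if the recent average has not dropped more than `max_drop` below baseline."""
    if not recent or not baseline:
        raise ValueError("need at least one sample in each series")
    baseline_avg = mean(baseline)
    if baseline_avg == 0:
        return True  # no baseline activity to compare against
    drop = (baseline_avg - mean(recent)) / baseline_avg
    return drop <= max_drop


if __name__ == "__main__":
    # Hypothetical orders-per-minute samples.
    usual = [100, 110, 90]
    print(business_metric_healthy([80, 90], usual))      # modest dip
    print(business_metric_healthy([40, 45, 35], usual))  # severe drop
```

A check like this can fail while every server reports healthy, which is exactly the gap between technology monitoring and business monitoring.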

We need to be smarter about how we talk about monitoring (versus 'monitor all the things'). Often when people say 'monitoring' they actually mean 'observability.' Add 'observability' to 'all the things'. This means that you don't necessarily have monitoring of everything at once, but you have enough information to know what's going on, and you can go find it quickly.

Sometimes we don't know there's a problem until there's a business impact. If something is impacting the business (e.g. revenue) and it's not being monitored, monitoring for it should be added after the fact.

Instrumentation is never done.

In Conclusion:

We need testing AND monitoring, not testing OR monitoring.
You've got to understand the organization you're in. What's its mission, what's its purpose, what's the point of having it? Understand these.
Add observability to everything you can. It doesn't mean you have to screen everything and have years of log files. It means you have to be able to get enough information to know what's going on, and determine if the system is healthy or not.
Monitor things that are impactful.
Alert only on actionable emergencies.
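The last two points can be sketched as a simple routing rule: page a human only when an alert is both critical and actionable, and send everything else somewhere quieter. The severity levels and destination names are assumptions for illustration, not a scheme from the talk.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    severity: str     # assumed levels: "critical", "warning", "info"
    actionable: bool  # can someone do something about it right now?


def route(alert: Alert) -> str:
    """Decide where an alert goes: 'page', 'ticket', or 'dashboard'."""
    if alert.severity == "critical" and alert.actionable:
        return "page"       # wake someone up
    if alert.actionable:
        return "ticket"     # fix during working hours
    return "dashboard"      # keep it visible, but do not interrupt anyone


if __name__ == "__main__":
    print(route(Alert("checkout_errors", "critical", True)))
    print(route(Alert("disk_70_percent", "warning", True)))
    print(route(Alert("upstream_provider_outage", "critical", False)))
```

Non-actionable alerts still land on a dashboard, so the information stays observable without paging anyone.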

Talk #3 - Disaster Resilience

Is it possible to operate in a reduced mode in the event of a specific disaster?

There's always a certain degree of 'brokenness' to our software, so the key is to not have so much that our customers notice it.

The area between 'success' and 'failure' is really nebulous and undefined.

It helps to know what to do when certain types of failures occur (loss of electricity, loss of water, loss of coolant, etc.)

Reduced service is better than no service.
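A minimal sketch of operating in a reduced mode: try the full-featured path first and fall back to something simpler when it fails. The recommendation-service scenario and all function names are hypothetical.

```python
def with_fallback(primary, fallback):
    """Call primary(); if it raises, return fallback() instead.

    Returns a (result, degraded) tuple so callers and monitoring can tell
    when the reduced mode was used.
    """
    try:
        return primary(), False
    except Exception:
        return fallback(), True


def live_recommendations():
    # Simulated outage of a hypothetical downstream service.
    raise ConnectionError("recommendation service unavailable")


def cached_recommendations():
    # Static, possibly stale data that is still better than an error page.
    return ["bestseller-1", "bestseller-2"]


if __name__ == "__main__":
    items, degraded = with_fallback(live_recommendations, cached_recommendations)
    print(items, "(degraded)" if degraded else "(full service)")
```

Returning the `degraded` flag matters: reduced mode should be visible in monitoring, so a fallback does not quietly become the permanent state.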

Recommendations

I have two major recommendations based on my experience at this conference:

We should tell our government partners to attend this conference next year. Last year I recommended the ADV Summit to my guys, and they found it very useful, so I'll recommend this next year. Driving more government developers to conferences like this can not only help them share their experiences with other developers, but potentially sow the seeds of new ideas in their minds to take back to their offices.
We need a more robust ACES presence at DevOpsDays. There are A LOT of young and aspiring developers here, as well as industry insiders, looking for work and collaboration efforts. Taking advantage of this invites the possibility of creating a strong pipeline of new and experienced developers, while also getting ACES' name out there more prominently.

