Commit 212c9fa

finish latest tech talk post
1 parent ddc8bf9 commit 212c9fa

1 file changed

+56
-56
lines changed

content/posts/171101-continuous-delivery-devops-you.markdown

Lines changed: 56 additions & 56 deletions
@@ -230,27 +230,23 @@ own experiences handling these difficult technical situations.)
230230

231231

232232
<img src="/img/171101-devops-cd-you/devops-cd-you.023.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Billing incident update blog post.">
233-
234233
One step is to figure out when the problem started and whether or not it
235234
is over. If it's not over, triage the specific problems and start
236235
communicating with customers. Be as accurate and transparent as possible.
237236

238237

239238
<img src="/img/171101-devops-cd-you/devops-cd-you.024.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Redis logo.">
240-
241239
The specific technical issue in this case was due to our misconfiguration of
242240
Redis instances.
243241

244242

245243
<img src="/img/171101-devops-cd-you/devops-cd-you.025.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Text that reads 'Root cause?'">
246-
247244
We know the particular technical failure was due to our Redis mishandling,
248245
but how do we look past the specific bit and get to a broader understanding
249246
of the processes that caused the issue?
250247

251248

252249
<img src="/img/171101-devops-cd-you/devops-cd-you.026.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Billing incident response from Twilio developer evangelist.">
253-
254250
Let's take a look at the resolution of the situation and then learn about
255251
the concepts and tools that could prevent future problems.
256252

@@ -262,61 +258,53 @@ own environments.
262258

263259

264260
<img src="/img/171101-devops-cd-you/devops-cd-you.027.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Twilio status page.">
265-
266261
Twilio became more transparent with the status of services, especially with
267262
showing partial failures and outages.
268263

269264

270265
<img src="/img/171101-devops-cd-you/devops-cd-you.028.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Twilio number of production deployments.">
271-
272266
Twilio was also deliberate in avoiding the accumulation of manual processes
273267
and controls that other organizations often put in place after failures. We
274268
doubled down on resiliency through automation to increase our ability to
275269
deploy to production.
276270

277271

278272
<img src="/img/171101-devops-cd-you/devops-cd-you.029.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Text that reads 'tools and concepts'.">
279-
280273
What are some of the tools and concepts we use at Twilio to prevent future
281274
failure scenarios?
282275

283276

284277
<img src="/img/171101-devops-cd-you/devops-cd-you.030.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Eventually you ship code into production that breaks your application.">
285-
286278
If you do not have the right tools and processes in place, eventually you
287279
end up with a broken production environment after shipping code. What is
288280
one tool we can use to be confident that the code going into production is
289281
not broken?
290282

291283

292284
<img src="/img/171101-devops-cd-you/devops-cd-you.031.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Text that reads 'automated testing' with example code coverage in the background.">
293-
294-
Automated testing, in its many forms, such as unit testing, integration
295-
testing, security testing and performance testing, helps to ensure the
296-
integrity of the code. You need to automate because manual testing is too
297-
slow.
285+
Automated [testing](/testing.html), in its many forms, such as unit testing,
286+
integration testing, security testing and performance testing, helps to
287+
ensure the integrity of the code. You need to automate because manual
288+
testing is too slow.
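
As a rough illustration of what a unit test looks like, here is a minimal sketch using Python's built-in `unittest` module. The `prorate_charge` billing helper and its rules are hypothetical, made up for this example, and are not code from the incident described above.

```python
import unittest


def prorate_charge(monthly_price_cents, days_used, days_in_month=30):
    """Hypothetical billing helper: charge only for the days used."""
    if not 0 <= days_used <= days_in_month:
        raise ValueError("days_used must be between 0 and days_in_month")
    # Integer math avoids floating point rounding surprises with money.
    return monthly_price_cents * days_used // days_in_month


class ProrateChargeTest(unittest.TestCase):
    def test_full_month_charges_full_price(self):
        self.assertEqual(prorate_charge(3000, 30), 3000)

    def test_half_month_charges_half_price(self):
        self.assertEqual(prorate_charge(3000, 15), 1500)

    def test_invalid_day_count_is_rejected(self):
        with self.assertRaises(ValueError):
            prorate_charge(3000, 31)
```

Saved in a file, these tests can be run with `python -m unittest <filename>`, which is the kind of command a build server would run on every commit.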
298289

299290
Other important tools that fall into the automated testing bucket but are
300291
not traditionally thought of as a "test case" include code coverage and
301-
code metrics (such as Cyclomatic Complexity).
292+
[code metrics](/code-metrics.html) (such as Cyclomatic Complexity).
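
To make the idea of a code metric concrete, the sketch below approximates Cyclomatic Complexity with Python's standard `ast` module by counting decision points. Which node types count as a branch is a simplifying assumption here; dedicated metrics tools use more complete rules.

```python
import ast

# Node types treated as one extra branch each (a simplifying assumption).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source):
    """Approximate complexity: 1 plus the number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


SAMPLE = """
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
"""

# One base path, two ifs, one for loop and one "and" give a score of 5.
print(cyclomatic_complexity(SAMPLE))
```

A build could fail, or at least warn, when a function's score climbs past a threshold the team agrees on.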
302293

303294

304295
<img src="/img/171101-devops-cd-you/devops-cd-you.032.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Automated tests in dev only deploy to production when they are successful.">
305-
306296
Awesome, now you only deploy to production when a big batch of automated
307297
test cases ensure the integrity of your code. All good, right?
308298

309299

310300
<img src="/img/171101-devops-cd-you/devops-cd-you.033.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Bugs can still occur in production.">
311-
312301
Err, well no. Stuff can still break in production, especially in environments
313302
where for various reasons you do not have the same exact data in test
314303
that you do in production. Your automated tests and code metrics will
315304
simply not catch every last scenario that could go wrong in production.
316305

317306

318307
<img src="/img/171101-devops-cd-you/devops-cd-you.034.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Text that reads 'monitoring and alerting' with New Relic dashboard in the background.">
319-
320308
When something goes wrong with your application, you need monitoring to
321309
know what the problem is, and alerting to tell the right folks. Traditionally,
322310
the "right" people were in operations. But over time many organizations
@@ -325,7 +313,6 @@ developers who wrote the code that had the problem.
325313

326314

327315
<img src="/img/171101-devops-cd-you/devops-cd-you.035.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="When something breaks in prod, your developers know about it and can fix the problem.">
328-
329316
A critical piece of DevOps is ensuring the appropriate developers
330317
are carrying the pagers. It sucks to carry the pager and get woken up in the
331318
middle of the night, but it's a heck of a lot easier to debug the code that
@@ -339,14 +326,12 @@ something will blow up on you later on at a less convenient time.
339326

340327

341328
<img src="/img/171101-devops-cd-you/devops-cd-you.036.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="When production is running smoothly with many tests, does that increase the chance of black swan-type events?">
342-
343329
Typically, though, you find that there are still plenty of production errors
344330
even when you have defensive code in place with a huge swath of the most
345331
important parts of your codebase being constantly tested.
346332

347333

348334
<img src="/img/171101-devops-cd-you/devops-cd-you.037.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Text that reads 'Chaos engineering' with the chaos engineering monkey logo in the background.">
349-
350335
That's where a concept known as "chaos engineering" can come in. Chaos
351336
engineering breaks parts of your production environment on a schedule and
352337
even unscheduled basis. This is a very advanced technique; you are not going
@@ -355,121 +340,136 @@ or appropriate controls in place.
355340

356341

357342
<img src="/img/171101-devops-cd-you/devops-cd-you.038.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Chaos engineering introduces intentional failures in your infrastructure both on a scheduled and unscheduled basis.">
358-
359343
By deliberately introducing failures, especially during the day when your
360344
well-caffeinated team can address the issues and put further safeguards in
361345
place, you make your production environment more resilient.
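
A toy version of the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `with_chaos`, `fetch_balance` and the retry safeguard are hypothetical names invented for this example, and real chaos engineering tools operate on infrastructure rather than on a single function.

```python
import random


class ChaosError(RuntimeError):
    """Raised when a failure is injected on purpose."""


def with_chaos(func, failure_rate, rng):
    """Wrap func so that a fraction of calls fail deliberately."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise ChaosError(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def fetch_balance(account_id):
    # Stand-in for a call to a real dependency.
    return {"account_id": account_id, "balance_cents": 1200}


def fetch_balance_with_retry(fetch, account_id, attempts=3):
    """The safeguard under test: retry, then give up loudly."""
    for _ in range(attempts):
        try:
            return fetch(account_id)
        except ChaosError:
            continue
    raise RuntimeError("dependency unavailable after retries")


# Inject failures into roughly 30% of calls and see how the safeguard holds up.
flaky_fetch = with_chaos(fetch_balance, failure_rate=0.3, rng=random.Random(42))
results = []
for _ in range(100):
    try:
        fetch_balance_with_retry(flaky_fetch, "AC123")
        results.append("ok")
    except RuntimeError:
        results.append("failed")
print(results.count("ok"), "succeeded;", results.count("failed"), "failed")
```

The point of the exercise is the second number: every request that still fails despite the retries marks a gap in your safeguards that you would rather find during the day than at 3 a.m.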
362346

363347

364348
<img src="/img/171101-devops-cd-you/devops-cd-you.039.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Text that reads '1. other peoples money' with money in the background.">
365-
366349
We talked about the failure in Twilio's payments infrastructure several years
367350
ago that led us to ultimately become more resilient to failure by putting
368351
appropriate automation in place.
369352

370353

371354
<img src="/img/171101-devops-cd-you/devops-cd-you.040.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Text that reads '2. other peoples lives' with people in the background.">
372-
373355
Screwing with other people's money is really bad, and so is messing with
374356
people's lives.
375357

376358

377359
<img src="/img/171101-devops-cd-you/devops-cd-you.041.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Text that reads 'War on Terror' with an exploded vehicle in the background.">
378-
379360
Let's discuss a scenario where human lives were at stake.
380361

381362
To be explicit about this next scenario, I'm only going to talk about public
382363
information, so my cleared folks in the audience can relax.
383364

384365

385366
<img src="/img/171101-devops-cd-you/devops-cd-you.042.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="U.S. military and civilian casualties in Iraq.">
386-
387367
During the height of U.S. forces' Iraq surge in 2007, more improvised explosive
388368
devices were killing and maiming soldiers and civilians than ever before. It
389369
was an incredible tragedy that contributed to the uncertainty of the time in
390370
the country.
391371

372+
373+
<img src="/img/171101-devops-cd-you/devops-cd-you.043.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Biometrics devices.">
392374
However, efforts in biometrics were one part of the puzzle that helped to
393375
prevent more attacks, as shown in this picture from General Petraeus' report
394376
to Congress.
395377

396378

397-
<img src="/img/171101-devops-cd-you/devops-cd-you.043.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Biometrics devices.">
398-
399-
...
400-
401-
402379
<img src="/img/171101-devops-cd-you/devops-cd-you.044.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Eclipse IDE.">
403-
404-
...
380+
One major challenge with the project was a terrible manual build process that
381+
literally involved clicking buttons in an integrated
382+
[development environment](/development-environments.html) to create the
383+
application artifacts. The process was too manual and the end result was that
384+
the latest version of the software took far too long to get into production.
405385

406386

407387
<img src="/img/171101-devops-cd-you/devops-cd-you.045.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="The situation did not have reasonable deployments to dev or to production.">
408-
409-
...
388+
We did not have automated deployments to a development environment, staging
389+
or production.
410390

411391

412392
<img src="/img/171101-devops-cd-you/devops-cd-you.046.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Start somewhere, automate your deployments to dev environment.">
393+
Our team had to start somewhere, but with a lack of approved tools, all we
394+
had available to us was shell scripts. But shell scripts were a start. We were
395+
able to make a very brittle but repeatable, automated deployment process to
396+
a development environment.
413397

414-
...
398+
There is still a huge glaring issue though: until the code is actually
399+
deployed to production it does not provide any value for the users.
415400

416401

417402
<img src="/img/171101-devops-cd-you/devops-cd-you.047.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Some environments have tricky issues with automated prod deployments like disconnected networks.">
403+
In this case, we could never fully automate the deployment because we had to
404+
burn to a CD before moving to a physically different computer network. The
405+
team could automate just about everything else though, and that really mattered
406+
for iteration and speed to deployment.
418407

419-
...
408+
You do the best you can with the tools at your disposal.
420409

421410

422411
<img src="/img/171101-devops-cd-you/devops-cd-you.048.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Text that reads 'Tools and concepts'.">
423-
424-
...
412+
What are the tools and concepts behind automating deployments?
425413

426414

427415
<img src="/img/171101-devops-cd-you/devops-cd-you.049.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Several development teams commit to a Git repository.">
428-
429-
...
416+
Source code is stored in a
417+
[source control (or version control)](/source-control.html) repository.
418+
Source control is the start of the automation process, but what do we need
419+
to get the code into various environments using a repeatable, automated
420+
process?
430421

431422

432423
<img src="/img/171101-devops-cd-you/devops-cd-you.050.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Text that reads 'continuous integration' with a screenshot of Jenkins dashboard in the background.">
433-
434-
...
424+
This is where [continuous integration](/continuous-integration.html) comes
425+
in. Continuous integration takes your code from the version control system,
426+
builds it, tests it and calculates the appropriate code metrics before the
427+
code is deployed to an environment.
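
The core behavior of a continuous integration server can be sketched as a list of stages that run in order and stop at the first failure. The stage functions below are empty placeholders invented for this example; a real server such as Jenkins would run actual checkout, build and test commands.

```python
def run_pipeline(stages):
    """Run (name, callable) stages in order and stop at the first failure."""
    completed = []
    for name, stage in stages:
        try:
            stage()
        except Exception as exc:
            # A failed stage blocks everything after it, including deployment.
            return {"passed": False, "completed": completed,
                    "failed_stage": name, "error": str(exc)}
        completed.append(name)
    return {"passed": True, "completed": completed,
            "failed_stage": None, "error": None}


def checkout_code():
    pass  # e.g. pull the latest commit from version control


def build():
    pass  # e.g. compile or package the application


def run_tests():
    raise AssertionError("1 test failed")  # simulate a broken commit


def calculate_metrics():
    pass  # e.g. code coverage and complexity


result = run_pipeline([
    ("checkout", checkout_code),
    ("build", build),
    ("test", run_tests),
    ("metrics", calculate_metrics),
])
print(result)
```

Because the simulated test stage fails, the metrics stage never runs and nothing would be deployed, which is exactly the guarantee you want from the pipeline.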
435428

436429

437430
<img src="/img/171101-devops-cd-you/devops-cd-you.051.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Add a continuous integration server to build the code that is committed to your source control repository.">
438-
439-
...
431+
Now we have a continuous integration server hooked up to source control, but
432+
this picture still looks odd.
440433

441434

442435
<img src="/img/171101-devops-cd-you/devops-cd-you.052.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="How do we automate the building of these environments and the deployments themselves?">
443-
444-
...
436+
Technically, continuous integration does not handle the details of the build
437+
and how to configure individual execution environments.
445438

446439

447440
<img src="/img/171101-devops-cd-you/devops-cd-you.053.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Text that reads 'configuration management' with a screenshot of Ansible AWX in the background.">
448-
449-
...
441+
[Configuration management](/configuration-management.html) tools handle the
442+
setup of application code and environments.
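
One property these tools commonly aim for is idempotence: you describe the state you want, and running the tool a second time changes nothing. The sketch below shows that idea with a hypothetical `ensure_line` helper and a made-up `app.conf` setting; it is not the API of Ansible or any other tool.

```python
import tempfile
from pathlib import Path


def ensure_line(path, line):
    """Make sure `line` is present in the file; return True if it changed."""
    path = Path(path)
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state, so do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "app.conf"
    first = ensure_line(config, "max_connections = 100")
    second = ensure_line(config, "max_connections = 100")
    print(first, second)  # True False
```

The first run makes a change and the second reports none, so the same script can be applied safely to a development, staging or production environment as often as needed.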
450443

451444

452445
<img src="/img/171101-devops-cd-you/devops-cd-you.054.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Agile sprints deliver code to a development environment and then automate the deployment into production.">
453-
454-
...
446+
Those two scenarios provided some context for why DevOps and Continuous
447+
Delivery matter to organizations in varying industries. When you have high
448+
performing teams working via the Agile development methodology, you will
449+
encounter a set of problems that are not solvable by doing Agile "better". You
450+
need the tools and concepts we talked about today as well as a slew of other
451+
engineering practices to get that new code into production.
455452

456453

457454
<img src="/img/171101-devops-cd-you/devops-cd-you.055.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Review list of continuous delivery tools.">
458-
459-
...
455+
The tools and concepts we covered today were
456+
[automated testing](/testing.html), [monitoring](/monitoring.html), chaos
457+
engineering, [continuous integration](/continuous-integration.html) and
458+
[configuration management](/configuration-management.html).
460459

461460

462461
<img src="/img/171101-devops-cd-you/devops-cd-you.056.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="A list of more concepts and tools for continuous delivery.">
463-
464-
...
462+
There are many other practices you will need as you continue your journey.
463+
You can learn about
464+
[all of them on Full Stack Python](/table-of-contents.html).
465465

466466

467467
<img src="/img/171101-devops-cd-you/devops-cd-you.057.jpg" width="100%" class="technical-diagram img-rounded" style="border: 1px solid #aaa" alt="Thank you slide.">
468468

469469
That's all for today. My name is [Matt Makai](/about-author.html)
470470
and I'm a software developer at [Twilio](/twilio.html) and the
471-
author of [Full Stack Python](https://www.fullstackpython.com/),
472-
thank you very much.
471+
author of [Full Stack Python](https://www.fullstackpython.com/).
472+
Thank you very much.
473473

474474

475475
----
