The goal of this handbook is to give you clarity on DevOps:
- Understand what DevOps is (in simple words)
- Know what's possible with DevOps (in simple goals)
- Get simple "when-to-do-what" DevOps guidelines

I added a bonus at the bottom of the article.
It's a production-ready setup example you can take inspiration from.

Who this article is for
You might be a founder who wishes to get started with DevOps the right way.
You might be the CTO of a 1,000-employee company who wants a few simple principles.
Or maybe you're a Software Engineer who wants to understand whether your company's DevOps approach is a good one.
If you're looking for a simple DevOps playbook, this is it.

Understand the desired result
Two things your company needs to be able to do
- Serve its product to customers
- Build and improve the product

Abilities you need to build, improve, and serve software
- Run experiments and test changes

DevOps has a simple meaning
Developers and Operators share responsibility for building and improving the system.
In practice:
- Developers are responsible for "Operate" tasks
- DevOps Engineers are responsible for enabling developers to "Operate", AND for doing some of it themselves
Operate = provision, monitor, secure, configure, deploy, scale.

Choose a balance: Enabler, Doer, or Automator
The DevOps role will end up as a balance between:
- Enabler: Provides the tools and knowledge to fulfill the DevOps goals
- Doer: Does the tasks that fulfill the DevOps goals
- Automator: Automates any repeated operation

Know what you should enable, do, or automate
- Provision infrastructure
- Secure the system
- Deploy workloads
- Monitor the system
- Recover from issues
- Scale up or down
- Track & test changes
- Automate processes

Choose the right tools
- Has state management = Saves time automating state-aware processes (e.g., Terraform)
- Has a big community & good docs = Saves time dealing with common issues (e.g., Kubernetes)
- Has multiple interface types (API, CLI, UI) = Saves time integrating with the existing system (e.g., Vault)

Set useful goals
Adopting these DevOps goals will point you in the right direction:
- One-Click Environments: makes e2e tests easy and quick
- Atomic Commits: provides confidence that a tested change will work in production
- Separate the Shared & Env-Specific Parts: enables e2e tests as the company scales up
If you want to learn about more useful DevOps goals, feel free to book a free consultation here.

Enablers: Choose the Tools-to-Knowledge Balance
Developers can either have the knowledge or the tools to do something.
- More knowledge-reliance: if you want the developers to contribute to the DevOps efforts
- More tools-reliance: if you want to abstract the operations from the developers
If the balance between the two is not intentional, it's accidental.

Doers: Have a good reason to do it
- Is it a one-time task?
- Does it teach you how the developers work?
- Are you directly accountable for the results of the task?

If you answered "no" to the above questions, enable or automate it instead.

Doing more = Learning the system's use cases
Doing too much = Not scalable, too much knowledge-reliance

Automators: Have a good reason to automate it
- Did it happen before?
- Is it likely to happen again?
- Will automating it take less time than doing it?
- Will automating it teach you an important company process?
If you answered "yes" to at least 2 of the 4 questions, automate it!
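The rule above can be written as a tiny checklist function. This is a minimal sketch. The `AutomationCandidate` fields and the `should_automate` helper are hypothetical names, and the threshold reads "2 out of the 4" as "at least 2".

```python
from dataclasses import dataclass, astuple


@dataclass
class AutomationCandidate:
    """Answers to the four automation questions (hypothetical structure)."""
    happened_before: bool       # Did it happen before?
    likely_to_recur: bool       # Is it likely to happen again?
    cheaper_to_automate: bool   # Will automating it take less time than doing it?
    teaches_key_process: bool   # Will automating it teach you an important company process?


def should_automate(candidate: AutomationCandidate, threshold: int = 2) -> bool:
    """Return True when at least `threshold` of the four answers are "yes"."""
    yes_count = sum(astuple(candidate))  # True counts as 1, False as 0
    return yes_count >= threshold


# A recurring manual DB restore that is quick to script: 3 "yes" answers.
restore = AutomationCandidate(True, True, True, False)
# A one-off migration nobody expects to repeat: 1 "yes" answer.
migration = AutomationCandidate(False, False, False, True)

print(should_automate(restore))    # True
print(should_automate(migration))  # False
```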
More automation = Less reliance on knowledge to operate the system.
Too much automation = No system awareness.

P.S. - you can also enable developers to automate it.

Create available DevOps Capacity
A company's DevOps needs come in spikes.
One month you need 2 DevOps Engineers, and half of that the next month.
Switching between big efforts and small tasks is common.
This is especially true for new companies.
Break the assumption that "DevOps tasks must be done by a DevOps Engineer".

There are 3 types of DevOps capacity:
- Non-Flexible: A full-time DevOps Engineer on the team
- Semi-Flexible: Key developers that can contribute to the DevOps goals
- Fully-Flexible: A flexible DevOps Services company or freelancer
You can read more about calculating the DevOps capacity your company needs here.

When to focus on what: Common Dilemmas
When: You work alone, and the system is simple

Focus: On simplifying development: Dockerize your apps and create a post-commit pipeline that runs tests

When: You need to be able to create new environments quickly (for development or for clients)

Focus: On implementing "One-Click Environments" using IaC (e.g., Terraform) + a deployment tool (depends on the platform)

When: You want to e2e test every code modification, but there are many code modifications

Focus: On splitting the "One-Click Env" into a "base" with shared resources and an "env" with env-specific resources

When: You want to unify & standardize how you deploy, monitor, scale, configure, and secure your workloads

Focus: On implementing an orchestrator such as Kubernetes

When: You have many moving parts and want to be certain that a tested change will work

Focus: On implementing GitOps and considering a Monorepo (the sooner the better)

When: You want the DevOps efforts to be done by the dev team

Focus: On using "actual" IaC tools (Pulumi TypeScript/Python) and full "how to operate" documentation (see above)

Never:
- Invest lots of time in new tech without a strong reason

Always:
- Have your code in Git
- Monitor the basic stuff: CPU, Memory, Disk, Network, App Logs, Cloud Costs
- Architect for high availability
- Test before you deploy

BONUS: An example setup for a CTO approaching Production

2 AWS Accounts
- One for development and staging
- Another for production

Monorepo in GitHub
- Docker Compose for local development

2 Infrastructure-as-Code projects: 'base' & 'apps'
- base = shared resources (e.g., VPC, RDS, ECS Cluster, EKS Cluster)
- apps = env-specific resources (e.g., Lambda Functions, ECS Services, Kubernetes Namespaces)
- A config file per environment

GitHub Actions Workflow: Development workflow
- Check out a branch, then develop and test changes locally
- On Pull Request creation: deploy a Pull-Request "apps" environment onto the "development" environment's "base"
- On merge to main: deploy an "apps" environment from the "main" branch onto the "development" environment's "base"
- Manual: deploy from the "main" branch onto the "staging" / "production" environment's "base"
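The workflow above boils down to a mapping from a trigger to a deployment target. This is a minimal sketch of that mapping in plain Python, not a GitHub Actions workflow file. The event names, the `pr-<number>` environment naming, and the `plan_deployment` helper are hypothetical. `refs/pull/<number>/head` is the ref GitHub exposes for a pull request's head commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    source_ref: str  # Git ref to deploy from
    apps_env: str    # name of the "apps" environment to create or update
    base_env: str    # which environment's "base" it lands on


def plan_deployment(event: str, *, pr_number: Optional[int] = None,
                    manual_target: Optional[str] = None) -> Optional[Deployment]:
    """Map a workflow trigger to a deployment (hypothetical event names)."""
    if event == "pull_request":
        if pr_number is None:
            raise ValueError("pull_request events need a PR number")
        # Each PR gets its own "apps" environment on the development base.
        return Deployment(f"refs/pull/{pr_number}/head", f"pr-{pr_number}", "development")
    if event == "merge_to_main":
        return Deployment("main", "main", "development")
    if event == "manual":
        if manual_target not in ("staging", "production"):
            raise ValueError("manual deploys target 'staging' or 'production'")
        # Staging and production only ever receive code from "main".
        return Deployment("main", manual_target, manual_target)
    # Local development (checkout + test) deploys nothing.
    return None


print(plan_deployment("pull_request", pr_number=42))
# Deployment(source_ref='refs/pull/42/head', apps_env='pr-42', base_env='development')
```

Keeping this routing in one place, whether in a script or in the workflow's `on:` triggers, makes it easy to check that no event can accidentally deploy a branch other than "main" to production.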
Notes:
- Avoid mentioning an environment's name in the code for conditional resource deployment
- Use each environment's config file to declare whether a resource should be created
- Could be implemented using Terraform, Terragrunt, Pulumi, CDK, and other IaC tools
- Production should have 2 instances of every workload for high availability
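The first two notes can be illustrated with a minimal sketch. The config keys, resource names, and the `resources_for` helper are hypothetical. The configs are inline dicts rather than files so that the example stays self-contained. The point is that the code branches on flags and never on an environment's name.

```python
from typing import Dict, List

# Hypothetical per-environment configs, shown inline as dicts.
# In a real setup these would be the per-environment config files.
ENV_CONFIGS: Dict[str, dict] = {
    "development": {"create_read_replica": False, "create_waf": False, "replicas": 1},
    "staging": {"create_read_replica": True, "create_waf": True, "replicas": 1},
    "production": {"create_read_replica": True, "create_waf": True, "replicas": 2},
}


def resources_for(config: dict) -> List[str]:
    """Decide which resources to create from config flags alone.

    There is no `if env == "production"` anywhere, so adding an
    environment only requires adding a config entry.
    """
    # Workload instance count comes from config (2 in production for HA).
    resources = [f"app-service x{config['replicas']}"]
    if config["create_read_replica"]:
        resources.append("db-read-replica")
    if config["create_waf"]:
        resources.append("waf")
    return resources


for env, cfg in ENV_CONFIGS.items():
    print(env, resources_for(cfg))
# development ['app-service x1']
# staging ['app-service x1', 'db-read-replica', 'waf']
# production ['app-service x2', 'db-read-replica', 'waf']
```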
If you'd like to see this setup in your startup, click here to book a call 👇🏼

P.S. - I'll be updating this page occasionally, so you might want to visit again.

Another Bonus: DevOps Dictionary for Human Beings
| Term | Definition | Tools |
| --- | --- | --- |
| Environment | A working instance of the entire system | |
| CI (Continuous Integration) | Enables developers to collaborate by agreeing on a single source of truth (master/main) | Jenkins, GitHub Actions, GitLab CI |
| CD (Continuous Delivery) | Create an artifact that's ready for production (tested, tagged) | JFrog Artifactory, Nexus, AWS ECR |
| CD (Continuous Deployment) | Every available deliverable (artifact) gets deployed automatically | ArgoCD, Jenkins, AWS CodeDeploy |
| Monitoring / Observability | Collect metrics/traces/logs from apps and infrastructure, analyze and display them, and set up alerts | Prometheus, Jaeger, Elasticsearch, Fluentd, OpenTelemetry |
| Infrastructure | The resources on which the workloads run, in which the data is stored, and through which the network flows | Servers, Databases, Network Routers & Switches |
| Cloud Infrastructure | Same as the above, but specifically in the cloud | AWS EC2, AWS RDS, GCP Compute Engine, Azure Virtual Machines |
| Cloud | Computing & Data services served from remote locations for you to build your system | AWS, Azure, GCP |
| Containerization & Virtualization | Technologies that use kernel & OS features to create virtual machines or isolate processes (AKA run containers) | Docker, vSphere, KVM |
| Secrets Management | Storing and retrieving sensitive configurations (e.g., tokens, passwords) | HashiCorp Vault, AWS Secrets Manager, SealedSecrets |
| Configuration Management | Usually refers to preparing servers for workloads (e.g., creating directories & files, starting processes) | Ansible, Chef, Puppet |
| Version Control | Saving the code in a versioned way (Git) | GitHub, GitLab |
| GitOps | Keeping the system in the same state as it's described in Git | Flux, ArgoCD, Jenkins |
| Monorepo | All of the company's code is in one Git repository | Nx, Turborepo |
| Polyrepo | Multiple Git repositories for different components | |
| IaC (Infrastructure-as-Code) | Creating Cloud infrastructure with idempotent code and state management | Terraform, Pulumi, CDK, Crossplane |
| Deployment | Execute, serve, or install the artifacts | ArgoCD, Jenkins, AWS CodeDeploy, Scripts (Bash, Python, etc.) |
| Orchestrator | Dynamically allocating workloads to a pool of nodes | Kubernetes, Nomad, AWS ECS |
| Authentication & Authorization | Making sure each person, workload, or resource has access only to what's necessary (other workloads and resources) | AWS IAM, OpenID, OpenVPN, Twingate, Istio |
| Service Discovery | Exposing available workloads using DNS | Consul, CoreDNS |
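The IaC entry in the dictionary mentions idempotent code and state management. The following is a minimal sketch of what that combination buys you, under simplifying assumptions: resources are plain dicts, and the `plan`/`apply_plan` helpers are hypothetical. Real tools such as Terraform also handle dependency graphs, provider API calls, and drift detection. Read this only as the core idea of diffing the desired configuration against the recorded state.

```python
from typing import Dict, List, Tuple

Resources = Dict[str, dict]  # resource name -> attributes


def plan(desired: Resources, state: Resources) -> List[Tuple[str, str]]:
    """Compare desired resources with the recorded state.

    Returns (action, name) pairs; an empty list means nothing to change.
    """
    actions = []
    for name, attrs in desired.items():
        if name not in state:
            actions.append(("create", name))
        elif state[name] != attrs:
            actions.append(("update", name))
    for name in state:
        if name not in desired:
            actions.append(("delete", name))
    return actions


def apply_plan(desired: Resources, state: Resources) -> Resources:
    """Simulate executing the plan and return the new recorded state."""
    for action, name in plan(desired, state):
        print(action, name)
    # After a successful apply, the recorded state matches the desired config.
    return {name: dict(attrs) for name, attrs in desired.items()}


desired = {"vpc": {"cidr": "10.0.0.0/16"}, "db": {"size": "small"}}
state: Resources = {}
state = apply_plan(desired, state)  # prints: create vpc, create db
desired["db"]["size"] = "large"
state = apply_plan(desired, state)  # prints: update db
print(plan(desired, state))         # [] : running again changes nothing
```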




