A curated list of Site Reliability and Production Engineering resources.
Updated Jun 10, 2024
Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE).
A Chaos Engineering Platform for Kubernetes.
A curated list of Chaos Engineering resources.
An easy-to-use and powerful chaos engineering experiment toolkit, open-sourced by Alibaba.
Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
Chaos testing, network emulation, and stress testing tool for containers
A collection of postmortem templates
A curated list of Site Reliability and Production Engineering Tools
Web UI for Jaeger
Making on-call suck less for engineers
This repository includes resources that are more than sufficient to prepare for a Google interview if you are applying for a software engineer or site reliability engineer position.
On-Call Assistant for Prometheus Alerts - Get a head start on fixing alerts with AI investigation
What to Read to Learn More About DevOps
Curated list of good SRE interview questions.
A chaos engineering platform for supporting the complete fault drill lifecycle.
Open-source AI copilot that lets you chat with your observability data and code 🧙♂️
A role-playing game for incident management training
Google Site Reliability Engineering book converted to audio
OpenShift Guide. Learn about the Red Hat OpenShift Container Platform, Data Science, Code Ready Containers, Podman, Buildah, and Kubernetes.