Site Reliability Engineer specialising in Kubernetes, cloud infrastructure, observability, and platform engineering. 5+ years of ownership, zero excuses.
🚧 GitHub is currently being refreshed.
About me
"You don't always need to have everything figured out. Sometimes growth comes from taking the next opportunity, embracing uncertainty, and being willing to learn along the way."
I grew up in Varanasi in a large family where resources were stretched thin — which, looking back, taught me to be resourceful before I ever touched a terminal. My parents' insistence on good education was the first investment in what became a career defined by ownership and reliability.
My path to engineering wasn't a straight line. After graduation, I didn't get placed through campus recruitment. I spent months preparing for government exams before realising it simply wasn't where my energy belonged. That honest self-assessment changed everything.
I enrolled at CDAC, earned a strong rank, survived a pandemic, and landed my first role. From bare-metal Kubernetes clusters to production systems serving millions of event-goers — every step has been about solving real problems, not just moving up a ladder.
Today I work at BookMyShow SEA, where reliability isn't a buzzword — it's a promise to millions of users trying to book the tickets they've been waiting months for. I own that promise end-to-end.
Career journey
By the numbers
Not vanity metrics — real outcomes that affected real systems and real teams.
Professional experience
Core expertise
From wiring up a bare-metal cluster to tuning Prometheus alert rules at 1am — these are the tools I reach for and trust.
Featured work
Challenge — Cloud spend growing faster than scale
Conducted a comprehensive audit of GKE workload resource requests against actual usage. Identified systemic over-provisioning across namespaces: teams had set "safe" limits that no workload ever came close to hitting. Implemented right-sizing recommendations, node pool reconfiguration, and committed-use discount strategies. Result: an immediate 30% reduction in cloud spend, with a clear roadmap to 45%.
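A rough illustration of the request-versus-usage comparison behind an audit like this, as a minimal Python sketch. The workload names, the figures, and the 1.2x headroom factor are hypothetical and chosen only for the example. They are not numbers from the actual audit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    name: str              # hypothetical workload name
    cpu_request_m: int     # requested CPU, in millicores
    cpu_p95_usage_m: int   # observed 95th-percentile CPU usage, in millicores


def recommend_request(w: Workload, headroom: float = 1.2) -> int:
    """Suggest a CPU request: observed p95 usage plus headroom,
    never higher than what is already requested."""
    suggested = round(w.cpu_p95_usage_m * headroom)
    return min(suggested, w.cpu_request_m)


def savings_fraction(workloads: List[Workload], headroom: float = 1.2) -> float:
    """Fraction of total requested CPU that right-sizing would free up."""
    current = sum(w.cpu_request_m for w in workloads)
    proposed = sum(recommend_request(w, headroom) for w in workloads)
    return (current - proposed) / current


# Illustrative data only.
sample = [
    Workload("checkout", cpu_request_m=2000, cpu_p95_usage_m=400),
    Workload("search", cpu_request_m=1000, cpu_p95_usage_m=700),
]
```

Calling `savings_fraction(sample)` reports how much of the requested CPU is idle headroom under these assumptions. In practice the usage figures would come from the metrics backend rather than hard-coded values.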
Challenge — No identity layer for microservices
Designed and deployed a high-availability Keycloak cluster on GKE with PostgreSQL backend, Infinispan session cache, and NGINX Ingress TLS termination. Built Helm charts for reproducible deployments and documented the runbook for zero-downtime upgrades.
Challenge — Flying blind during incidents
Deployed a full observability stack integrating Prometheus, Grafana, and SigNoz across the GKE cluster. Designed alert rules that fire on SLO burn rate rather than raw thresholds — cutting alert noise dramatically. Built dashboards that made the on-call experience genuinely useful instead of stressful.
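For readers unfamiliar with burn-rate alerting, here is a minimal Python sketch of the idea. It assumes a 99.9% SLO and the commonly cited multi-window threshold of 14.4 (a rate that would spend 2% of a 30-day error budget in one hour). These values and the function names are illustrative assumptions, not the rules running in the cluster.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' the error budget is being spent.

    A burn rate of 1.0 means the budget would last exactly the SLO period.
    """
    error_budget = 1.0 - slo_target
    return error_ratio / error_budget


def should_page(
    long_window_error_ratio: float,
    short_window_error_ratio: float,
    slo_target: float = 0.999,   # assumed SLO, for illustration
    threshold: float = 14.4,     # assumed fast-burn threshold
) -> bool:
    """Page only when both a long and a short window are burning fast.

    The long window shows the problem is significant. The short window
    shows it is still happening, so the alert clears soon after recovery.
    """
    return (
        burn_rate(long_window_error_ratio, slo_target) >= threshold
        and burn_rate(short_window_error_ratio, slo_target) >= threshold
    )
```

With these assumptions, a sustained 2% error ratio in both windows pages, while a brief spike that has already recovered in the short window does not. That is where most of the noise reduction comes from compared with a raw error-count threshold.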
Challenge — Unreliable async messaging at scale
Deployed a production-grade RabbitMQ cluster on GKE using the Operator pattern, with quorum queues for durability and cluster-level monitoring integrated into Grafana. Designed the topology to survive node failures without message loss.
Challenge — Manual infrastructure = drift and toil
Codified GCP infrastructure end-to-end using Terraform — VPCs, GKE clusters, IAM bindings, firewall rules, and service accounts. Introduced modular repo structure enabling teams to provision environments consistently, reducing drift and eliminating the "works in staging" problem.
Engineering philosophy
If you do something twice, write a script. If you do it three times, make it a service. Toil is a sign that the system needs work, not the human.
Reliability is not a constraint on shipping velocity. A system that works 99% of the time and ships fast is inferior to one that works 99.9% of the time and ships slightly slower.
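To put 99% and 99.9% in concrete terms, a quick back-of-the-envelope calculation in Python. The 30-day window is an assumption chosen purely for illustration.

```python
def allowed_downtime_minutes(availability: float, days: int = 30) -> float:
    """Minutes of downtime permitted over the window at a given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1.0 - availability)


two_nines = allowed_downtime_minutes(0.99)     # roughly 432 minutes
three_nines = allowed_downtime_minutes(0.999)  # roughly 43 minutes
```

Over 30 days, 99% allows about 7.2 hours of downtime and 99.9% allows about 43 minutes. That is a tenfold difference in how long users can be locked out.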
Decisions without data are guesses. Good observability isn't about having dashboards — it's about knowing what a healthy system looks like and getting paged when it doesn't.
Every layer of abstraction you add is a layer someone else has to debug at 3am. The right architecture is the simplest one that actually works.
The best platform is one developers don't think about. My job is to make the hard infrastructure problems invisible so product teams can focus on the problems they were actually hired to solve.
True ownership means writing the runbook and training your replacement. Knowledge that lives only in your head is a single point of failure. Bus factor should be greater than one.
Beyond the terminal
Engineering is what I do — but curiosity is who I am. Outside of work, I'm usually either travelling somewhere I've never been, watching a television series that has no right to be this good, or recently, getting embarrassingly competitive at pickleball.
I've started exploring writing — putting words together for the same reason I got into infrastructure: to make something that works. Maybe one day stand-up comedy too (the debugging skills transfer surprisingly well).
I'm from Varanasi. That city has a way of reminding you that not everything needs to be optimised. Some things just need to exist.
"What started as a journey without a clear destination has evolved into a career built on curiosity, accountability, and continuous learning."
Wali Hasan
Currently exploring
Get in touch
Whether it's a conversation about SRE practices, a role you think I'd be great at, or just an interesting infrastructure problem — my inbox is open.
Say hello
Mumbai, India · Open to remote and relocation