Database Excellence Stage
Mission
Keep GitLab’s databases running reliably through proactive health management, operational excellence, and strategic enablement. We maintain operational runway by identifying and mitigating saturation points, operate infrastructure with automated and scalable processes, and provide tools and frameworks that help teams build features sustainably. While our primary focus is GitLab.com, we are expanding our scope to provide database health frameworks and tooling that benefit self-managed customers as well.
Groups
This stage consists of the following groups:
Database Architecture
The Database Architecture group enables teams to build sustainably with data by providing decision frameworks for data placement, data growth controls, and coordinating the database review process across all datastores.
Priorities:
- Enabling teams to make sustainable data architecture decisions
- Preventing database performance issues before they reach production
- Establishing and maintaining data lifecycle best practices
Database Health
The Database Health group provides the monitoring, observability, and health frameworks that keep databases healthy across both GitLab.com and self-managed deployments, including shift-left identification of saturation points.
Priorities:
- Maintaining operational runway by proactively managing database saturation points
- Providing visibility into database health across all deployment types
- Optimizing database resource utilization and cost efficiency
Database Automation
The Database Automation group owns the automation frameworks, tools, and templates that make GitLab’s Postgres databases easier to operate at scale — replacing manual, bespoke processes with standardized, repeatable automation. All three teams contribute automations, but Database Automation owns the frameworks and manages the planning load for infrastructure changes.
Priorities:
- Replacing manual database operations with standardized, automated processes
- Building reusable tooling for database provisioning, configuration, and upgrades
- Enabling reliable, repeatable database operations across deployment types
| Name | Role |
|---|---|
| Biren Shah | Senior Database Reliability Engineer |
| Saad Ullah | Senior Site Reliability Engineer |
| Matt Kasa | Staff Backend Engineer, Database |
| Jon Jenkins | Senior Backend Engineer, Database |
Previous Teams
Previously, this stage consisted of two teams: Database Frameworks and Database Operations. These teams had very large, overlapping scopes covering our production database systems, but had different tools at their disposal. This caused two problems: the teams pursued different projects with the same goals but different tools, and each team had more scope than it could reasonably plan for or accomplish.
In Q1 of FY27, we reorganized the teams into their current structure in order to accomplish a few things:
- Narrow each team’s scope to prevent fatigue from jumping between projects and areas
- Provide more management support, allowing the teams to grow beyond their current size limitations
- Expand the department’s overall scope to include topics that impact self-managed customers
Database Frameworks
The Database Frameworks group managed the Rails application code that interfaces and communicates with our database systems.
Database Operations
The Database Operations group managed the infrastructure and automation that power GitLab.com’s PostgreSQL databases.
How We Work
Each team within Database Excellence is composed of a mix of backend engineers and reliability engineers (SRE/DBRE). The balance varies by team — Database Architecture and Database Health are primarily backend engineers, while Database Automation is primarily reliability engineers — but every team has both disciplines represented.
While each team has a distinct focus area, several responsibilities are shared across the entire stage. Database reviews are coordinated by Database Architecture but staffed by members of all three teams. On-call rotations draw from reliability engineers across the stage. Operational needs such as saturation mitigation and incident response are distributed across all teams rather than owned by any single group. Infrastructure management and database upgrades are also shared across teams, because the regional distribution of the three groups across AMER, EMEA, and APAC makes follow-the-sun coverage possible. This shared model ensures that operational knowledge stays broad and no single team becomes a bottleneck.
Requesting Help
Support Escalations
TBD
Reliability Requests
TBD
Tier-2 On-Call
Database Tier-2 is staffed as a 24/5 response with team members responding on a “Best Effort” basis. This means it’s possible that pages to this rotation may occasionally go unacknowledged. The limited availability of database operators has made it difficult to commit beyond that.
We may readdress this rotation in FY27-Q2 in response to the recent reorganization.
Long Term Stable Counterpart or Reviewer requests
Longer-term requests, such as stable counterparts or reviewers, are handled at the stage level. These requests should be submitted as a counterpart request.
Triage Rotations
TBA
Planning Process
TBA
Alexander Sosna
Prabakaran Murugesan
Leonardo da Rosa
Maxime Orefice
Vamshidhar Poralla
Alex Ives
Allison Browne
Backend Engineer
Krasimir Angelov
Mei Yang
Rafael Henchen
Simon Tomlinson