Staff Software Engineer, Fleet Reliability and Performance
Central Denmark Region, Denmark
Posted on Friday, March 31, 2023
About The Role
We build the foundations for all of Uber’s fleet of 100,000s of hosts or VMs by ensuring they are running optimally and are configured efficiently for the container platforms using the hosts. We observe and detect a broad range of reliability and quality problems through codified processes and automatically drive remediation.
We run generically across bare metal hosts and VMs and across our own on-prem data centers and multiple cloud vendors, and closely collaborate to develop integrations that ensure effective and automated management.
Internally we integrate with Uber’s stateful and stateless container scheduling platforms to run host operations in a safe and efficient way and use this to realize remediation of bad hosts or apply fleet-wide upgrades such as rolling out a new kernel.
We maintain the base OS image and handle kernel upgrades and configuration and provide high-fidelity host and container metrics to ensure secure and optimal performance for the workloads on the hosts.
Our team consists of a healthy combination of both junior and senior engineers with an array of experiences across the industry. We value ideas over hierarchy, always improving, getting things done through code, and having a measurable impact on the business.
What You Will Do
You will improve your software engineering, systems engineering, hardware/Linux OS/kernel knowledge, cloud knowledge, and infrastructure systems experience to investigate and decipher ambiguous problems in our production fleet while also contributing to planning, new systems design, and improvement of existing systems to enable even greater efficiency and insight.
- Contribute to planning, design and architecture, and building of systems, tooling, and observability in support of production server fleet reliability, and cloud expansion efforts
- Low-level debugging into host-level issues and generalization of detection
- Actively drive collaboration across multiple teams to create alignment and progress.
- Implement solutions in Go with a strong focus on clean, readable code with unit and integration test coverage.
- Take an active part in code change peer reviews to ensure quality and cross-collaboration occurs across the team.
- Contribute to engineering culture in terms of quality, monitoring, and on-call practices.
- Own part of the team’s charter and through that help set the longer-term direction for the team.
Qualifications
- 8+ years of experience
- BS, MS, or Ph.D. degree in computer science, similar technical field of study, or equivalent practical experience
- Background in multiple programming languages, e.g., C/C++, Python, Go, etc.
- Strong hands-on experience with Linux investigating and debugging performance problems
- An inherent aim to collaborate, both within the team and across the organization
- Excellent written and verbal interpersonal skills, and the ability to write detailed design documents and postmortems
- A belief that your team can accomplish more together than as separate individuals
- Attention to detail, particularly around software engineering fundamentals, testing methodologies, and quality
- Experience with cloud and migration to cloud is a plus
- Strong understanding of Linux kernel internals, e.g., ability to read and understand kernel code.
- Hands-on knowledge of Linux kernels, hardware performance evaluation, tuning and debugging.
- An understanding of server hardware at scale: data center network fundamentals, OS imaging, provisioning, distribution, and configuration deployment at a large scale
- Experience with large distributed systems.
- Experience with containerization software such as Kubernetes, Docker, and Mesos.
- Comfortable working with on-prem and cloud-based infrastructure (AWS, GCP).
Accommodations may be available based on religious and/or medical conditions, or as required by applicable law. To request accommodation, please get in touch with email@example.com.