The Agents of Webapp is a Site Reliability Engineering (SRE) team that manages Slack’s webserving infrastructure and is dedicated to making Slack reliable. We continuously seek to improve the visibility, speed, and safety of Slack’s “webapp” runtime, the components at the center of Slack’s distributed application architecture. We are a growing and changing team that welcomes new perspectives and strategies for addressing evolving challenges to reliability. We collaborate with many engineering teams at Slack to improve a shared runtime for the webapp: the infrastructure that enables continuous deployment of application code to meet the needs and expectations of millions of Slack users.
- You will help develop new deployment mechanisms for our webapp infrastructure, such as canary, A/B, blue/green, red-line, and other deployment patterns
- You will lead large engineering projects with mostly well-understood scope from start to finish
- You will define SLAs and SLOs for the Slack webapp, manage code deployments, fixes, and software updates, and automate our operational processes
- This team has operational responsibilities in addition to being a software development team. You will participate in the team’s on-call rotation, assist with triaging and addressing production issues, and respond to incidents at Slack that involve the webapp.
- You will directly support multiple components of Slack’s webserving infrastructure, including Apache, HHVM, Squid, Memcache, Docker, Kubernetes and AWS services.
- You will work with other teams to support additional software components at Slack that work in conjunction with the webapp, such as Consul, Envoy, HAProxy, Chef, Terraform, databases, and caching services.
- You will review code and get your code reviewed; mentor and be mentored by other engineers. Teamwork is what makes the dream work.
- A positive approach that embraces standard methodologies for software management and reliability, including unit testing, code review, design documentation, debugging, and troubleshooting.
- A passion for reliability, scaling patterns, uptime, and availability.
- Strong command of computer science fundamentals: data structures, algorithms, programming languages, distributed systems, and information retrieval
- Bachelor’s degree in Computer Science, Engineering or related field, or equivalent training or work experience
- A demonstrable history of thriving within a software development team, even if your roles have included traditional operations and/or infrastructure management duties.
- Professional experience with functional or imperative programming languages, e.g., PHP, Python, Ruby, Go, C, or Java (used without frameworks)
- Curiosity about how things work and a love of sharing that knowledge with others
- Experience managing critical production infrastructure, maintaining reliability and uptime, and having a “customer first” view of operational safety.
Vacancy Type: Full Time
Job Location: San Jose, CA, US
Application Deadline: N/A