As a Principal Engineer on the SRE team, you will work on our mission-critical eCommerce platforms and tools to ensure the highest levels of availability and reliability across all our applications. You will maintain high site uptime and availability while embracing rapid change and growth, applying a strong DevOps mindset of continuous delivery and site automation. Our SRE team has a standout opportunity to build tools, frameworks, and cloud platforms that will support our company’s growth over the next decade. This role requires deep technical knowledge, adaptability, hands-on execution, and a ruthless drive toward higher levels of availability and resiliency. You will define our standards for monitoring, alerting, scalability, and production readiness. You will also help with CDN implementation and with the site speed and performance engineering aspects of our eCommerce platform and services.
- Establish end-to-end monitoring and alerting on all critical components of the services.
- Work with engineering domain teams to make sure the applications are production ready, scalable, and reliable from the ground up.
- Lead and participate in performance tests; identify bottlenecks, opportunities for optimization, and capacity demands.
- Lead experienced SRE engineers who are familiar with clustering technologies – high availability, resiliency, and horizontal scaling.
- Lead the SRE team in conducting root cause analysis of critical business and production issues.
- Monitor and report on SLOs, performance, and capacity for a given application's services. Work with business and product owners to establish key performance indicators.
- Define and execute High Availability, Disaster Recovery, Sustained Resiliency, and Chaos Engineering tests.
- Lead SRE design reviews and work cross-functionally with Domain Engineering teams on operational readiness.
- Drive SRE engineers to pursue process improvements and automation opportunities.
- Identify and drive opportunities to improve automation, management, and visibility of services.
- Conduct OS tuning and optimization, and define system requirements for vertical scaling.
- Partner closely with enterprise architecture and infrastructure teams to support key business applications.
- Design, build, and operate cloud infrastructure to enable reliable and rapid deployment of microservices with effective monitoring and resilient operations.
- 6 years of experience working with IT Infrastructure and Operations
- 10 years of experience in software development or a related field
- 8 years of IT experience developing and implementing systems within an organization
- Experience working with Continuous Integration / Continuous Deployment tools
- Master’s degree in Computer Science, CIS, or related field
- 8 years of experience in Site Reliability Engineering
- 8 years of experience working with source code control systems
- 8 years of experience working on projects involving the implementation of solutions applying development lifecycles (SDLC)
- Bachelor’s degree in Computer Science, CIS, or related field
- 6 years of experience leading teams, with or without direct reports
- 8 years of experience working with defect or incident tracking software
- 8 years of experience working with application and integration middleware
Vacancy Type: Full Time
Job Location: Mission, TX, US
Application Deadline: N/A