Principal, Site Reliability Engineer
Sysco Corporate
Employment Type: Full time
Company name: US0623 Sysco Technologies, LLC
Compensation Range: The pay range provided is not indicative of Sysco’s actual pay range but is merely algorithmic and provided for generalized comparison. Factors that may be used to determine rate of pay include specific skills, work location, work experience and other individualized factors.
Description
Job Summary:
Drives impactful changes across the platform and holds sustained leadership roles. Responsible for the design and future direction of highly available, performant web and mobile applications, resilient and scalable systems, and metrics and monitoring. Responsible for defining best practices and for collaborating with and mentoring development, product, architecture, and leadership teams on reliability across the platform. Thinks ahead and acts to prevent issues before they occur through automation and careful analysis. Brings critical thinking and debugging skills for highly complex environments, including network packet analysis, Kubernetes, nginx, streaming (Kafka), edge networks, and caching, along with generalist knowledge of the application layer. Fully accountable for overall system reliability and performance.
Duties and Responsibilities:
Develop and refine strategy and process for all reliability tracking across the platform in conjunction with senior members of the team.
Lead strategic discussions to continue the evolution of flexibility and sustainability of the entire product suite.
Partner with support teams, DevOps, Engineering, and customers to inform decisions and implement improvements.
Ensure RCA findings related to reliability are addressed at the point of initial injection to prevent regression.
Look broadly across the platform for latent reliability issues and address them before they surface.
Provide the orchestration for the production environment by monitoring availability and taking a holistic view of system health.
Architect the software and systems to manage platform infrastructure and applications.
Document tribal knowledge and best practices, and review them annually.
Define the objectives for system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
Gather and analyze metrics for trending, performance tuning, and fault finding.
Partner with development teams to improve services through rigorous testing and release procedures.
Provide leadership for system design, platform management, and capacity planning.
Balance feature development speed and reliability with well-defined service level objectives.
Actively maintain a thorough understanding of system architecture, applications, and related integrations. Partner with the Platform team to understand and improve system monitoring and alerting.
Drive active-active multisite reliability targets.
Drive performance and reliability in a multi-cloud environment.
Implement enterprise-level procedures and processes.
Hands-on experience with the top cloud providers.
Education Required:
Bachelor’s degree in computer science, computer engineering or related field, or relevant training.
Education Preferred:
An equivalent combination of experience and education.
Experience Required:
8 years’ experience in a Site Reliability role.
8 years’ experience with enterprise cloud platforms.
Availability to work extended or off-cycle hours and participate in a 24/7 Site Reliability on-call rotation.
Experience Preferred:
8 years’ experience in a cloud operations / DevOps role.
Experience with AWS.
Experience with APM tools such as Datadog, New Relic, Nagios, or Splunk.
Experience in an agile development environment.
Physical Demands:
Reasonable accommodations will be made to enable individuals with disabilities to perform the essential functions of this job.
Overview
Sysco is the global leader in foodservice distribution. With over 71,000 colleagues and a fleet of over 13,000 vehicles, Sysco operates approximately 333 distribution facilities worldwide and serves more than 700,000 customer locations. We offer our colleagues the opportunity to grow personally and professionally, to contribute to the success of a dynamic organization, and to serve others in a manner that exceeds their expectations. We’re looking for talented, hard-working individuals to join our team. Come grow with us and let us show you why Sysco is at the heart of food and service.
AFFIRMATIVE ACTION STATEMENT
Applicants must be currently authorized to work in the United States. We are proud to be an Equal Opportunity and Affirmative Action employer, and consider qualified applicants without regard to race, color, creed, religion, ancestry, national origin, sex, sexual orientation, gender identity, age, disability, veteran status or any other protected factor under federal, state or local law. This opportunity is available through Sysco Corporation, its subsidiaries and affiliates.