Job Description


What we're looking for:

A Site Reliability Engineer to join our Technical Operations team in San Jose, CA. You will have experience operating large-scale SaaS infrastructure in both private cloud (in a data center) and public cloud platforms. You will have experience across the entire tech stack, and a strong background in network administration or engineering is required. You will be able to demonstrate with concrete examples that you are a self-motivated team player. Troubleshooting, automation, security, and architecture are key components of this role.

What you'll be doing:

Troubleshoot production issues in a 24/7 environment as part of a global team

Operationally manage all parts of the tech stack with a focus on the network side

Automate repeatable processes including, but not limited to, deployments, monitoring, and upgrades. Manage infrastructure as code

Secure the infrastructure using industry-standard best practices and identify weaknesses

Plan for capacity, reliability, and functionality while optimizing costs

Work cross-functionally with other teams to support and grow the product and the business

Work with and manage third-party vendors

Research and test new technologies to build for the future and speed up the development cycle

What we value:

5+ years of experience supporting a 24/7 production environment for a SaaS product and supporting software engineering teams

Strong knowledge of network concepts and protocols: TCP/IP, BGP, NAT, VoIP, DNS, DHCP, VPN, etc.

Strong knowledge of Linux administration, virtualization, storage, web/app servers, and databases

Fluency in at least one scripting language and experience with configuration management tools

Desired Skills and Experience:

Experience deploying and managing infrastructure on at least one public cloud provider

Experience managing containers

Location: San Jose, CA

Duration: Permanent