$3 - $5
Job Description
<p><b>Themesoft Inc.</b> is a global IT solutions provider and a Woman-Owned Minority Business Enterprise headquartered in Dallas, TX. With a strong presence across the US, Canada, India, Singapore, and Brazil, we specialize in digital transformation, consulting, and workforce solutions across diverse industries.</p>
<p><br></p>
<p><b>We are currently looking for a tech-savvy, results-driven professional for one of our leading clients.</b> If you're passionate about technology and want to grow in a dynamic, fast-paced environment, this could be the perfect fit for you!</p>
<p><br></p>
<p><b>Role: Azure Cloud Engineer (AI)</b></p>
<p><b>Location: Toronto, Canada (Hybrid, 3 days in office)</b></p>
<p><b>Duration: 6+ months</b></p>
<p><br></p>
<p><b>Cloud Engineer - AI Infrastructure</b></p>
<p><b>Role Overview</b></p>
<p>As a Cloud Engineer, you will implement and maintain scalable, secure, high-performance cloud infrastructure that supports AI/ML workloads. You'll work closely with platform, application, and data teams to ensure reliable operations and efficient delivery of AI services.</p>
<p><br></p>
<p><b>Key Responsibilities</b></p>
<p><b>Infrastructure & Platform Operations</b></p>
<ul>
<li>Deploy and manage cloud-native infrastructure for AI/ML workloads (GPU/CPU clusters, autoscaling, spot instances).</li>
<li>Configure and maintain networking components (Azure VNet, Private Link, peering) and high-availability/disaster-recovery (HA/DR) setups.</li>
<li>Operate storage and database systems, including Azure Data Lake Storage, relational databases, and vector databases and search libraries (FAISS, Milvus, Pinecone).</li>
<li>Implement IAM policies, secrets management (Azure Key Vault), and encryption standards.</li>
</ul>
<p><b>Observability & Reliability</b></p>
<ul>
<li>Set up monitoring for latency, throughput, GPU utilization, and cost metrics.</li>
<li>Integrate logging and tracing tools (OpenTelemetry) and maintain SLOs/SLIs for infrastructure services.</li>
<li>Support incident response and root cause analysis using SRE principles.</li>
</ul>
<p><b>CI/CD & Infrastructure Automation</b></p>
<ul>
<li>Build and maintain CI/CD pipelines using GitHub Actions or Azure DevOps.</li>
<li>Implement GitOps workflows for infrastructure as code (IaC) using Terraform or Bicep.</li>
<li>Create reusable IaC modules and templates for consistent deployments.</li>
</ul>
<p><b>FinOps & Cost Optimization</b></p>
<ul>
<li>Monitor and optimize GPU usage, caching strategies, and inference performance.</li>
<li>Support cost governance and reporting for AI infrastructure.</li>
</ul>
<p><b>Application Enablement</b></p>
<ul>
<li>Provide infrastructure support for APIs, microservices, and event-driven architectures.</li>
<li>Enable model-serving runtimes (TensorRT-LLM, vLLM, Triton/KServe).</li>
<li>Support retrieval-augmented generation (RAG) pipelines, including embeddings, chunking, and retrieval systems.</li>
</ul>
<p><b>Security & Compliance</b></p>
<ul>
<li>Apply defense-in-depth strategies: least-privilege IAM, private networking, and container image signing.</li>
<li>Ensure compliance with data residency, encryption, and audit requirements.</li>
</ul>
<p><br></p>
<p><b>Qualifications</b></p>
<ul>
<li>Bachelor's degree in Computer Science, Engineering, or a related field.</li>
<li>3-5 years of experience in cloud infrastructure (Azure preferred).</li>
<li>Hands-on experience with Kubernetes, Terraform/Bicep, and cloud networking.</li>
<li>Familiarity with AI/ML infrastructure components and model serving.</li>
<li>Proficiency in Python for automation; Go or TypeScript is a plus.</li>
</ul>
<p><br></p>
<p><b>Tech Stack</b></p>
<ul>
<li><b>Cloud & Infra</b>: Azure (AKS, Functions, Event Hubs, Key Vault), Terraform/Bicep, GitHub Actions</li>
<li><b>AI Infra</b>: Kubernetes, KServe/Triton, vLLM, TensorRT-LLM</li>
<li><b>Ops</b>: Prometheus, Grafana, OpenTelemetry, Argo CD</li>
<li><b>Data</b>: Feature stores (Feast), vector search (FAISS, Milvus), relational DBs</li>
<li><b>App Layer</b>: APIs, microservices, frontend/backend integration</li>
</ul>
<p><br></p>
<p><b>Success Metrics</b></p>
<ul>
<li><b>Reliability</b>: SLOs met, uptime maintained</li>
<li><b>Security</b>: No critical vulnerabilities, audit-ready infrastructure</li>
<li><b>Cost Efficiency</b>: Optimized GPU and infrastructure spend</li>
<li><b>Velocity</b>: Fast, reliable deployments</li>
<li><b>Collaboration</b>: Effective cross-team support and documentation</li>
</ul>
<p><br></p>
<p>Regards,</p>
<p>Parthasarathy K</p>
<p>Lead Recruiter</p>
<p>Work: <b></b> Ext: 306, Direct: </p>
<p>Themesoft Inc. | Themesoft Jobs</p>