Our team members are at the heart of everything we do. At Cencora, we are united in our responsibility to create healthier futures, and every person here is essential to us being able to deliver on that purpose. If you want to make a difference at the center of health, come join our innovative company and help us improve the lives of people and animals everywhere. Apply today!
Job Details

POSITION SUMMARY
The Internal Audit Data Analytics team is seeking an experienced Data Engineer to support the build-out and ongoing enhancement of Internal Audit’s Databricks-based analytics environment. This role focuses on designing, building, and maintaining scalable data pipelines and data lake solutions that support stand-alone audits, continuous auditing, and risk monitoring initiatives across the enterprise.
Reporting to the Data Analytics Sr. Manager – Internal Audit, the Data Engineer will play a critical role in enabling high-quality, governed, and automated data flows into Internal Audit’s Databricks environment. This position partners closely with auditors, data analysts, the IT organization, and business stakeholders to ensure reliable data ingestion, data quality, and availability of analytical datasets for use in audit execution, risk assessments, and strategic data-driven initiatives.
This is a hands-on engineering role requiring deep technical expertise in Databricks, cloud platforms (Azure preferred), data modeling, ETL/ELT design, and development of scalable data engineering solutions.
PRIMARY DUTIES AND RESPONSIBILITIES
Data Engineering & Pipeline Development
Design, build, and maintain large-scale, fault-tolerant data pipelines using Python/PySpark, Databricks, Delta Lake, and orchestration tools (e.g., Airflow, Azure Data Factory).
Develop and optimize ETL/ELT workflows to support ingestion, transformation, and modeling of large datasets into a Lakehouse using Delta Lake: batch ingestion from files, databases, and APIs; streaming using Structured Streaming; handling semi-structured data (JSON, Parquet, Avro); ELT patterns using Spark SQL/PySpark; incremental processing patterns; and orchestration via Databricks Jobs or external orchestrators (ADF, Airflow, etc.).
Implement CDC, incremental loads, and full refresh patterns; handle schema evolution and data reconciliation.
Develop and maintain curated data models (bronze/silver/gold) and support BI/analytics consumption.
Optimize performance and cost (partitioning, Z-ORDER, file sizing, caching, cluster policies, job tuning).
Implement scalable data lake and analytical platform architectures on Azure, ensuring security, governance, and cost efficiency.
Automate repeatable ingestion processes using infrastructure as code (IaC) and Continuous Integration (CI)/Continuous Delivery (CD) deployment methodologies.
Develop robust data models and semantic layers to facilitate analytical consumption by auditors and Data Analytics teammates.
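The CDC, incremental-load, and upsert patterns listed above can be illustrated with a minimal, framework-agnostic sketch (the `Change` record shape, names, and watermark convention are all hypothetical; on Databricks this logic would typically be expressed as a Delta Lake MERGE with a timestamp predicate):

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Change:
    """One change-data-capture record (structure is hypothetical)."""
    key: str       # business key, e.g. a customer id
    op: str        # 'U' = upsert, 'D' = delete
    payload: dict  # attribute values after the change
    ts: int        # source commit timestamp, used for the incremental window

def apply_changes(target: Dict[str, dict], changes: List[Change], watermark: int) -> int:
    """Apply only changes newer than the watermark (incremental load),
    upserting or deleting keys in the target table in timestamp order;
    returns the new watermark so the next run can resume from it."""
    for c in sorted(changes, key=lambda c: c.ts):
        if c.ts <= watermark:
            continue  # already processed in a previous run
        if c.op == 'D':
            target.pop(c.key, None)  # delete if present
        else:
            target[c.key] = c.payload  # insert or update
        watermark = c.ts
    return watermark

# Usage: one batch of changes applied to an empty curated table
silver: Dict[str, dict] = {}
batch = [
    Change("c1", "U", {"name": "Acme"}, ts=1),
    Change("c2", "U", {"name": "Beta"}, ts=2),
    Change("c1", "D", {}, ts=3),
]
wm = apply_changes(silver, batch, watermark=0)
```

The same semantics scale out in Spark: the watermark bounds each incremental read, and the upsert/delete branches map onto MERGE’s WHEN MATCHED / WHEN NOT MATCHED clauses.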
Data Quality, Monitoring & Governance
Create and manage data quality checks, anomaly detection routines, and automated alerting to ensure the accuracy and integrity of audit datasets and to support SLA-driven operations.
Establish repeatable processes for documenting data lineage, validation, reconciliation, and test coverage.
Implement scalable frameworks for metadata management, schema validation, and versioning of data pipelines.
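As a sketch of the kind of data quality checks described above (check names, thresholds, and the row-dict representation are illustrative; on Databricks such checks would typically run as Delta Live Tables expectations or PySpark validations feeding an alerting channel):

```python
from typing import Callable, Dict, List, Tuple

# A check maps a dataset (list of row dicts) to (passed, detail).
Check = Callable[[List[dict]], Tuple[bool, str]]

def not_null(column: str) -> Check:
    """Fail if any row has a null in the given column."""
    def check(rows: List[dict]) -> Tuple[bool, str]:
        nulls = sum(1 for r in rows if r.get(column) is None)
        return nulls == 0, f"{column}: {nulls} null value(s)"
    return check

def row_count_within(lo: int, hi: int) -> Check:
    """Crude volume anomaly detection: flag loads whose row count falls
    outside an expected band (bounds would come from historical baselines)."""
    def check(rows: List[dict]) -> Tuple[bool, str]:
        n = len(rows)
        return lo <= n <= hi, f"row count {n}, expected {lo}-{hi}"
    return check

def run_checks(rows: List[dict], checks: Dict[str, Check]) -> List[str]:
    """Evaluate all checks; return the names of failed checks
    (in a real pipeline these would trigger automated alerts)."""
    return [name for name, check in checks.items() if not check(rows)[0]]

# Usage: one failing and one passing check on a tiny batch
rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
checks = {"amount_not_null": not_null("amount"),
          "volume": row_count_within(1, 1000)}
failures = run_checks(rows, checks)
```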
Audit Collaboration & Analytics Support
Support IA audit execution by enabling access to clean, reliable, and well-documented datasets.
Provide SME-level guidance on data availability, data structures, pipeline behavior, and data limitations.
Standards, Innovation & Best Practices
Establish consistency in design patterns, coding approaches, documentation, and engineering standards.
Identify opportunities to modernize or optimize existing pipelines, architecture, or data processing patterns.
Contribute to the continuous improvement of the Internal Audit analytics program through automation, performance tuning, and new capability development.
Create and maintain technical documentation, runbooks, and onboarding guides.
Participate in code reviews and promote engineering best practices (testing, CI/CD, version control).
EXPERIENCE AND EDUCATIONAL REQUIREMENTS
Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Systems, Analytics, or a related discipline; equivalent work experience considered.
Minimum 3–5 years of relevant experience required; 5–7 years preferred, including 2–4 years of hands-on Data Engineering experience with Databricks.
Deep expertise working with Databricks, including cluster design, notebook development, Spark optimization, Delta Lake, Delta Live Tables, Unity Catalog (Centralized permissions, Data lineage, Table & schema access controls), and data governance/access controls.
Strong proficiency in Python, PySpark/Spark, and SQL; understanding of Spark architecture (driver, executors, stages, tasks, and shuffle), partitioning, and caching; performance tuning and optimization on large datasets.
Experience designing and managing large-scale data ingestion from complex enterprise systems (ERP, financial systems, operational platforms).
Hands-on experience with Azure (preferred), Amazon Web Services (AWS), or Google Cloud Platform (GCP) cloud services.
Solid understanding of data warehousing/Lakehouse concepts, Delta Lake (Delta table design, ACID transactions, schema enforcement and evolution, time travel, handling late-arriving data), and the medallion architecture.
Experience in creating and supporting end-to-end ETL/ELT workflows.
Experience handling semi‑structured data (JSON, Parquet, Avro).
Prior experience developing semantic models for analytics consumption.
Strong experience with data quality frameworks, validation routines, and monitoring strategies.
Experience with Git-based development and CI/CD practices.
Experience with cloud storage and services (Azure Data Lake Storage).
Experience with data integration tools (e.g., Fivetran, ADF).
Experience collaborating with US-based onshore teams is strongly preferred.
Databricks Data Engineer Professional or Associate Certification is preferred.
Azure Data Engineer Associate (DP‑203) Certification is preferred.
What Cencora offers
Benefit offerings outside the US may vary by country and will be aligned to local market practice. The eligibility and effective date may differ for some benefits and for team members covered under collective bargaining agreements.
Full time
Affiliated Companies: CENCORA BUSINESS SERVICES INDIA PRIVATE LIMITED
Equal Employment Opportunity
Cencora is committed to providing equal employment opportunity without regard to race, color, religion, sex, sexual orientation, gender identity, genetic information, national origin, age, disability, veteran status or membership in any other class protected by federal, state or local law.
The company’s continued success depends on the full and effective utilization of qualified individuals. Therefore, harassment is prohibited and all matters related to recruiting, training, compensation, benefits, promotions and transfers comply with equal opportunity principles and are non-discriminatory.
Cencora is committed to providing reasonable accommodations to individuals with disabilities during the employment process which are consistent with legal requirements. If you wish to request an accommodation while seeking employment, please call 888.692.2272 or email hrsc@cencora.com. We will make accommodation determinations on a request-by-request basis. Messages and emails regarding anything other than accommodation requests will not be returned.