[Remote] Mid-Level Data Engineer
Note: This is a remote position open to candidates in the USA. Simple Technology Solutions is a company that prioritizes its team members and offers flexibility for personal and professional growth. They are seeking a Mid-Level Data Engineer to join their federal data engineering team. The role involves building and maintaining ETL pipelines on a cloud-based Enterprise Data Platform using AWS.
Responsibilities
- Develop new ETL pipelines and data ingestion processes alongside senior engineers using AWS Glue (Spark-based, PySpark), MWAA (Airflow), Lambda, and SNS, fully conforming to the agency's Enterprise ETL Standards, ETL Common Library, and PEP 8 Python coding standards
- Integrate the agency's ETL Common Library into Glue jobs for standardized orchestration, error handling, metadata recording, and SNS notifications for all success and error job events
- Ingest structured and semi-structured datasets (CSV, XML, JSON, Avro, pipe-delimited) into S3 landing, raw, and curated zones using Apache Iceberg tables with Parquet as the default format; enforce transactional loading and prevent duplicate loads per dataset reporting period
- Configure static ETL metadata in the centralized PostgreSQL metadata store; ensure dynamic metadata records job status and timestamps for all key execution steps
- Monitor assigned production jobs and participate in operations support rotations; identify and escalate failed jobs and performance issues promptly to maintain data availability within contractually required ingestion timelines
- Ensure ETL Load Reports are populated in real time and ETL Gap Reports are updated weekly, covering all gaps since the start of the initial ingest process
- Build and maintain materialized views and semantic layer objects in Trino and Athena to ensure optimized query performance and consistent business logic
- Produce and maintain required documentation for each assigned dataset: Business Requirements, ETL Design Documents, Data Models (Mermaid format), Data Dictionaries, Mapping Documents, Deployment Documents, O&M Guides, and ETL Test Plans
- Write unit and integration tests achieving the 90% minimum code coverage threshold; complete security scans at least once per sprint as part of the Definition of Done
- Deploy ETL resources using CloudFormation templates through the agency CI/CD pipeline; submit Change Requests to the Change Control Board within required timelines
- Support transition of ETL jobs from other agency teams by verifying standards conformance, performing deployments, and validating data loads
- Support disaster recovery exercises, pre-production deployments, and ad hoc data requests as assigned
- Participate in 2-week sprint ceremonies, quarterly PI planning, backlog refinement, and agile delivery using JIRA and GitHub
Skills
- US Citizenship is required
- Bachelor's degree or higher in Computer Science, Information Systems, Data Engineering, or a related field is required
- 3-5 years of experience in data engineering or a closely related technical role is required
- Hands-on experience with Python (PEP 8), PySpark, and SQL for ETL pipeline development
- Experience with AWS services including Glue, S3, MWAA (Airflow), Lambda, SNS, and SQS
- Familiarity with the Apache Iceberg table format, Parquet and ORC file formats, and S3 data lake zone concepts
- Experience with PostgreSQL and basic familiarity with Redshift or Oracle
- Familiarity with Trino or Athena for query and semantic layer development
- Experience with CloudFormation, GitHub branching workflows, and CI/CD-integrated deployments
- Ability to produce clear ETL documentation including data models (Mermaid format) and data dictionaries
- Understanding of ETL metadata concepts including static and dynamic metadata, load reports, and gap reports
- Experience in agile development environments with sprint-based delivery
- Experience supporting IV&V and/or User Acceptance Testing (UAT) processes in a federal or technical program environment
- Experience with automated testing frameworks; ability to write unit and integration tests achieving defined code coverage thresholds
- Must be able to work 8am-5pm Eastern Time regardless of home location
- An active federal public trust suitability determination, or the ability to obtain one, is required
- Familiarity with FISMA, NIST 800-53, and OWASP ASVS Level 2 is a plus
Benefits
- Flexibility to help team members thrive personally and professionally
- Special incentives for team members living in qualified HUBZones
Company Overview