- Develop and support NGS pipelines using scientific workflow languages, following best practices such as unit testing, CI/CD, containerization, and code reviews.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies.
- Design, develop and maintain Application Programming Interfaces (APIs), microservices, and asynchronous queuing systems.
- Work with bioinformatics scientists, as key stakeholders, to resolve data-related technical issues and support their data infrastructure needs.
- Bachelor's or Master's degree in Software Engineering, Computer Science, Data Science/Engineering, Bioinformatics, Computer and Electrical Engineering, or a related field
- 3-5+ years (BS) or 2+ years (MS) of relevant experience
- Bioinformatics pipeline development experience
- NGS bioinformatics pipeline development experience
- Understanding of how data is generated and processed by sequencing instruments
- Experience with scientific workflow languages such as Nextflow, Snakemake, CWL, or WDL
- Fluency in one or more relevant programming languages (e.g., Python, Java, R)
- Experience across multiple tiers of an application, including databases, networking, operating systems, and containers
- Familiarity with standard tools and data formats used to process and analyze data from genomic resequencing projects
- Familiarity with data workflow development and ETL processes
- Strong communication skills in a collaborative environment
- AWS data platform experience: S3, Kinesis, DynamoDB, RDS
Bioinformatics, Snakemake, Nextflow, CWL, WDL, Pipeline, Workflow