Bioinformatics Software Engineer, Cancer Genomics Research Laboratory (req2036) job with Frederick National Laboratory


We are seeking an enthusiastic, creative, and collaborative bioinformatics software engineer to support pipeline development and analysis for our broad portfolio of genomic studies. If you have experience designing and deploying robust, reproducible, production-quality pipelines, then come join our talented team of bioinformaticians dedicated to understanding the genetics of cancer!

The Cancer Genomics Research Laboratory (CGR) investigates the contribution of germline and somatic genetic variation to cancer susceptibility and outcomes in support of the NCI’s Division of Cancer Epidemiology and Genetics (DCEG), the world’s most comprehensive cancer epidemiology research group. CGR is located at the NCI-Shady Grove campus in Gaithersburg, MD and operated by Leidos Biomedical Research, Inc. We care deeply about discovering the genetic and environmental determinants of cancer, and new approaches to cancer prevention, through our contributions to the molecular, genetic, and epidemiologic research of the 70+ investigators in DCEG. Our bioinformaticians have both the passion to learn and the opportunity to apply their skills to our rich and varied short- and long-read sequencing datasets, generated in support of DCEG’s multidisciplinary family- and population-based studies. Working in concert with the epidemiologists, biostatisticians, and basic research scientists in DCEG’s intramural research program, CGR conducts genome-wide association studies (GWAS) as well as targeted, whole-exome, and whole-genome sequencing studies, including analysis of germline and somatic variants, structural variation, copy number variation, metagenomics, transcriptomics, and more.


* Develop and maintain robust, tested pipelines for a wide variety of computational genomics applications, with an emphasis on scalability, portability, and thorough documentation
* Work with scientists and bioinformaticians to model and automate data analytics, visualization, and reporting workflows
* Implement and manage data analysis pipelines in cloud environments or on on-prem high-performance compute clusters
* Apply industry best practices in planning, executing, and documenting systems, software, and APIs
* Provide end-users with technical support for deployment and execution of complex pipelines


To be considered for this position, you must minimally meet the knowledge, skills, and abilities listed below:

* Possession of a Bachelor’s degree from an accredited college/university according to the Council for Higher Education Accreditation (CHEA) in computer science, software engineering, bioinformatics, statistics, or a related field, or four (4) years of relevant experience in lieu of a degree. Foreign degrees must be evaluated for U.S. equivalency
* In addition to educational requirements, a minimum of five (5) years of progressively responsible scientific software engineering and/or complex system management/bioinformatics experience
* Extensive pipeline development experience, including collaborative coding and use of source control (e.g. Git)
* Demonstrated expertise with the full software development lifecycle
* Experience with Snakemake, CWL, WDL, Nextflow or other workflow management systems
* Experience with various environment/dependency management tools (e.g. pip, venv, conda, mamba, renv)
* Experience managing large computational tasks in a Linux-based high-performance computing environment
* Excellent programming skills in at least one high-performance programming language such as C/C++, Java, or Rust, and at least one scripting language such as Bash, Python, Perl, or R
* Team-oriented with excellent written and verbal communication skills, organizational skills, and attention to detail; ability to organize and execute multiple projects in parallel both independently and as part of working groups
* Demonstrated experience with writing technical documentation of software
* Ability to obtain and maintain a security clearance


Candidates with these desired skills will be given preferential consideration:

* Master’s degree or PhD
* Experience in techniques used in bioinformatics, genomics, and statistical genetics, with expertise in computational approaches to genomic analyses
* Experience with software testing types including unit, integration, regression, and acceptance tests, as well as related packages (e.g. unittest, pytest, TAP)
* Experience with CI/CD (GitLab CI/CD, Travis CI, CircleCI, etc.)
* Experience with documentation tools such as Sphinx or Doxygen
* Experience with Google Cloud, AWS, or managed cloud environments
* Knowledge of various DevOps tools and technologies, such as Docker/Singularity, Kubernetes, Ansible/Terraform
* Familiarity with relational and NoSQL database design, data warehousing, and data modeling (e.g. MySQL, Amazon Redshift, MongoDB, Elasticsearch, FileMaker)
* A portfolio of open-source software engineering projects
