The Data Scientist uses professional bioinformatics concepts and applies computational procedures to resolve a variety of analysis and research issues. He/she works on assignments of moderate to extended scope where analysis of data requires a review of a variety of factors, and develops additional analyses as needed to achieve research objectives.

The Data Scientist will develop innovative approaches to apply AI/ML methods to clinical data sets and advise others on implementing effective approaches. He/she will be required to use the elasticity of the AWS Cloud for big-data-intensive (e.g. Hadoop/Spark) compute infrastructure and parallel system environments. He/she assists in creating pipelines and configurations on a Linux-based distributed file system for very large health data, hosted on premises as well as in the public cloud. Such data will include clinical, phenotypic and population-level data of several categories: structured, semi-structured and unstructured. Unstructured data includes genetic, text, image, and “messy” alphanumeric data. Strong AWS and Linux system administration skills will be needed to build scalable, general-purpose computational and inferential software tools to work with the data.

The Data Scientist will also evaluate third-party tools, especially for Next Generation Sequencing, Natural Language Processing (NLP), text mining and information retrieval, for adaptation into and use in our system, under guidance from the Bioinformatics manager. Further, the candidate will work towards ensuring system compliance with the university’s policies on privacy and security.
Data Scientist
8 Hours
Monday – Friday, 40 hours per week
99 – Policy-Covered (No Bargaining Unit)
Clinical Systems / IT Professionals
Required Qualifications
•    Bachelor’s degree in biological science, computational/programming, or related area and/or equivalent experience/training
•    Minimum 3 years of related experience
•    Thorough knowledge of bioinformatics methods, applications programming, web development and data structures
•    Thorough knowledge of bioinformatics programming design, modification and implementation
•    Understanding of relational databases, web interfaces, and operating systems
Preferred Qualifications
•    Doctoral degree in Computer Science or biomedical computation or related area
•    Strong project management skills
•    Thorough knowledge of modern biology and applicable field of research
•    Thorough knowledge of web, application and data security concepts and methods
•    Experience writing queries, functions, scripts and procedures with SQL and PL/SQL
•    Experience with ETL tools, data mapping, and validation (Informatica or equivalent)
•    Ability to investigate ETL pipeline and data querying performance and process failures and work to improve them
•    Experience with multiple database management systems; demonstrated ability to work on multiple tasks
•    Experience analyzing data on the order of tens of billions of records; strong database and big data structure design and querying skills
•    Experience applying machine learning, statistical or similar data science techniques to real-world data
•    Experience working with the following: AWS, Unix/Linux OS and shell scripting, Python, Java, C++, RStudio, Jupyter, IPython, Hadoop, Spark, CentOS, Ubuntu Linux
•    Experience applying machine learning and natural language processing, including text classification, information extraction, and clinical NLP 
•    Experience working with clinical databases that include medical record data, the OMOP common data model, and cancer-related terminologies and ontologies, and experience working with sequencing data
•    Experience working with UCSF data assets and data systems and knowledge of UCSF data sharing and security practices
•    Communication skills to work with both technical and non-technical personnel in multiple fields of expertise and at various levels in the organization
•    Ability to communicate technical information in a clear and concise manner
•    Ability to interface with management on a regular basis
•    Self-motivated; able to work independently or as part of a team, learn quickly, meet deadlines and demonstrate problem-solving skills
Full Time
Helen Diller Family Comprehensive Cancer Center
Department Description
The UCSF Helen Diller Family Comprehensive Cancer Center (HDFCCC) is one of only two cancer centers in the Bay Area to receive the prestigious designation of “comprehensive” from the National Cancer Institute (NCI). The HDFCCC integrates the work of researchers and clinicians dedicated to four fundamental pursuits: laboratory research into the causes of cancer progression; clinical research to translate new knowledge into viable treatments; compassionate, state-of-the-art patient care; and population research that can lead to improvements in prevention, early detection, and quality of life for those living with cancer. The twin pillars of precision medicine and precision population health guide research and treatment at HDFCCC.

The Bakar Computational Health Sciences Institute (BCHSI) is a critical component of a global UCSF initiative in Precision Medicine, which seeks to aggregate and integrate vast, disparate datasets to advance understanding of biological processes, determine mechanisms of disease, and inform diagnosis and treatment of patients.  Beginning with a base of excellent computational faculty dispersed among our four top-ranked professional schools (Dentistry, Medicine, Nursing and Pharmacy) and Graduate Division, superb research programs and outstanding Medical Center, BCHSI will establish a central convening center, hire additional faculty, and build programs for research and education. BCHSI will develop and enhance UCSF’s computational approaches and strategies in basic, translational, clinical and population-based biomedical research, working with partners in industry and academia where appropriate.  It will be a campus hub for computer scientists and for researchers who employ computation as a primary tool in their biomedical research.
About UCSF
The University of California, San Francisco (UCSF) is a leading university dedicated to promoting health worldwide through advanced biomedical research, graduate-level education in the life sciences and health professions, and excellence in patient care. It is the only campus in the 10-campus UC system dedicated exclusively to the health sciences. We bring together the world’s leading experts in nearly every area of health. We are home to five Nobel laureates who have advanced the understanding of cancer, neurodegenerative diseases, aging and stem cells.
Pride Values
UCSF is a diverse community made of people with many skills and talents. We seek candidates whose work experience or community service has prepared them to contribute to our commitment to professionalism, respect, integrity, diversity and excellence – also known as our PRIDE values.

In addition to our PRIDE values, UCSF is committed to equity, both in how we deliver care and in our workforce. We are committed to building a broadly diverse community, nurturing a culture that is welcoming and supportive, and engaging diverse ideas for the provision of culturally competent education, discovery, and patient care. Additional information about UCSF is available at

Join us to find a rewarding career contributing to improving healthcare worldwide.
Mission Bay (SF)

