Tag: CHR

Ubuntu Manpage: alleleCounts.pl – Generate tab seperated file with allelic counts and depth for each

Provided by: liballelecount-perl_4.2.1-1_all NAME alleleCounts.pl – Generate tab seperated file with allelic counts and depth for each specified locus. SYNOPSIS Where possible use the C version for large data (it’s also more configurable). alleleCounts.pl Required: -bam -b BAM/CRAM file (expects co-located index) – if CRAM see ‘-ref’ -output -o Output…

Continue Reading Ubuntu Manpage: alleleCounts.pl – Generate tab seperated file with allelic counts and depth for each

How to modify VCF file?

Hi community, I have a question: the SNP position in vcf file is from GRCh37/hg19, I need to change the position to GRCh38. So, I used UCSC liftover to replace the hg19 pos by GRCh38 pos and deleted some SNPs, then sorted the pos and saved to a new vcf…

Continue Reading How to modify VCF file?

python – Matching two files(vcf to maf) using a dictionaries, and appending the contents

annotation_file ##INFO=<ID=ClinVar_CLNSIG,Number=.,xxx ##INFO=<ID=ClinVar_CLNREVSTAT,Number=.,yyy ##INFO=<ID=ClinVar_CLNDN,Number=.zzz #CHROM POS ID REF ALT QUAL FILTER INFO chr1 10145 . AAC A 101.83 . AC=2;AF=0.067;AN=30;aaa chr1 10146 . AC A 98.25 . AC=2;AF=0.083;AN=24;bbb chr1 10146 . AC * 79.25 . AC=2;AF=0.083;AN=24;ccc chr1 10439 . AC A 81.33 . AC=1;AF=0.008333;AN=120;ddd chr1 10450 . T G 53.09…

Continue Reading python – Matching two files(vcf to maf) using a dictionaries, and appending the contents

Using featureCounts and downloading Rsubread

Using featureCounts and downloading Rsubread 1 @4769e097 Last seen 23 hours ago United Kingdom I am trying to perform a count per gene analysis using featureCounts in R. I have downloaded the gtf file and edited it within R to only contain the gene ID, chr, start, end, and strand,…

Continue Reading Using featureCounts and downloading Rsubread

Obtain equivalent variant ids (chr-pos-ref-alt) for GRCh37 and GRCh38

Obtain equivalent variant ids (chr-pos-ref-alt) for GRCh37 and GRCh38 0 Hi all, I want to obtain the equivalent variant id (chr-pos-ref-alt) from GRCh38 in GRCh37. This is to deal with some variants poorly lifted over. To exemplify, see the variant gnomad.broadinstitute.org/variant/10-17838942-A-G?dataset=gnomad_r3 It has two equivalents in GRCh37. I want to…

Continue Reading Obtain equivalent variant ids (chr-pos-ref-alt) for GRCh37 and GRCh38

Missing data per site

Hi, I want to calculate statistics of missing data per each site in my vcf file. Using vcftools –missing-site gives wrong stats for several sites. Is there is any other way to calculate it? Thank you! I have 36 samples and here is an example of the vcftools –missing-site output…

Continue Reading Missing data per site

Getting variants information programatically by quering chromosome, position, reference and alternative (chr pos ref alt)

Getting variants information programatically by quering chromosome, position, reference and alternative (chr pos ref alt) 0 Hello.Is there a command line program or an API for getting variants’ data such as reference sequences (in the NM_########.# structure), HGVS’s (such as c.673T>C) and condition ID value (such as 616843), clinical significance…

Continue Reading Getting variants information programatically by quering chromosome, position, reference and alternative (chr pos ref alt)

19MID0006_AdcVis_Assessment-3.pdf – 19MID0006-UDHAYAKUMAR.P library(ggplot2) # Warning: package ‘ggplot2’ was built under R version 4.1.2 # Importing

19MID0006-UDHAYAKUMAR.Plibrary(ggplot2)## Warning: package ‘ggplot2’ was built under R version 4.1.2# Importing the datasetpikachu<-read.csv(“C:/Users/P.UDHAYAKUMARPERUMAL/Downloads/pokemon.csv”) head(pikachu)Name<chr>Type.1<chr>Type.2<chr>Total<int>HP<int>Attack<int>Defense<int>Sp..Atk<int>Sp..Def<int>1 BulbasaurGrassPoison31845494965652 IvysaurGrassPoison40560626380803 VenusaurGrassPoison5258082831001004VenusaurMega VenusaurGrassPoison625801001231221205 CharmanderFire30939524360506 CharmeleonFire40558645880656 rows | 1-10 of 13 columns(1)Categorical v/s Categorical(1.1)Stacked Barchart(Name v/s Type.1)df1<-ggplot(pikachu, aes(x = Name,fill = Type.1)) + geom_bar(position = “stack”) df1+theme(axis.text.x=element_text(angle=90))(1.2)Grouped Barchart(Type.1 v/s Type.2)df2<-ggplot(pikachu,aes(x = Type.1,fill = Type.2)) + geom_bar(position…

Continue Reading 19MID0006_AdcVis_Assessment-3.pdf – 19MID0006-UDHAYAKUMAR.P library(ggplot2) # Warning: package ‘ggplot2’ was built under R version 4.1.2 # Importing

different result using minimap2 and pbmm2

Hi all! I am analysing CSS Pacbio data and each sample came from different run, in particular I have three files for each sample. I tested both pbmm2 and minimap2 to align my long reads, after getting the consensus sequences. This is the command I used to run mnimap2: minimap2…

Continue Reading different result using minimap2 and pbmm2

pjotrp/sambamba – sambamba – Genenetwork

10 years ago ​ 10 years ago ​ ​ ​ ​ ​ ​ ​ ​ ​ ​ ​ ​ ​ ​ ​ ​ ​ 10 years ago ​ ​ ​ ​ ​ ​ 10 years ago 10 years ago 10 years ago ​ ​ 10 years ago ​ 10 years…

Continue Reading pjotrp/sambamba – sambamba – Genenetwork

RStudio AI Weblog: pins 0.4: Versioning

A new version of pins is available on CRAN today, which adds support for versioning your datasets and DigitalOcean Spaces boards! As a quick recap, the pins package allows you to cache, discover and share resources. You can use pins in a wide range of situations, from downloading a dataset…

Continue Reading RStudio AI Weblog: pins 0.4: Versioning

WGBS data presentation

WGBS data presentation 0 Hi All, I have the data from WGBS where I have the input format which includes columns of Chr, start position, end position, average methylation values. I wanted to show the global methylation patterns using plot as shown in the reference figure. Please help me in…

Continue Reading WGBS data presentation

I wonder if someone please explain what secondary, supplementary, duplicates and paired in sequencing mean in samtools flagstat

I wonder if someone please explain what secondary, supplementary, duplicates and paired in sequencing mean in samtools flagstat 0 ”194492 + 0 in total (QC-passed reads + QC-failed reads) 80 + 0 secondary 0 + 0 supplementary 0 + 0 duplicates 193804 + 0 mapped (99.65% : N/A) 194412 +…

Continue Reading I wonder if someone please explain what secondary, supplementary, duplicates and paired in sequencing mean in samtools flagstat

Optimize a script that extract features from Fasta file using biopython

Hey, I have a script that extract features from a large fasta file (1767 MB) using biopython. I am sending it as a bash job via ssh remote server. The job is running for two days now.. Is there a way to optimize my script? I think maybe the problem…

Continue Reading Optimize a script that extract features from Fasta file using biopython

chromoMap function – RDocumentation

Usage chromoMap( ch.files, data.files, title = c(), ch_gap = 5, ploidy = 1, top_margin = 25, left_margin = 50, chr_width = 15, chr_length = 4, chr_color = c(“black”), data_based_color_map = FALSE, segment_annotation = FALSE, lg_x = 0, lg_y = 0, data_type = c(“numeric”, “categorical”), labels = FALSE, canvas_width = NULL,…

Continue Reading chromoMap function – RDocumentation

Ubuntu Manpage: sambamba-view – tool for extracting information from SAM/BAM files

Provided by: sambamba_0.8.2+dfsg-2_amd64 NAME sambamba-view – tool for extracting information from SAM/BAM files SYNOPSIS sambamba view OPTIONS <input.bam | input.sam> [region1 […]] DESCRIPTION sambamba view allows to efficiently filter SAM/BAM files for alignments satisfying various conditions, as well as access its SAM header and information about reference sequences. In order…

Continue Reading Ubuntu Manpage: sambamba-view – tool for extracting information from SAM/BAM files

bedtools -u not giving unique files

bedtools -u not giving unique files 1 The following are the steps Im following: First step to extract sample using bed file is this (here the bedfile is input bedfile converted to Hg38): tabix -h -R Hg19_to_Hg38_sorted.bed.gz gnomad.genomes.v{g_version}.hgdp_tgp.chr{chr}.vcf.bgz | perl {vcftools} -c {sample_name} > {sample_name}_out.vcf’ output({sample_name}_out.vcf’) chr2 113982416 rs56177103 TATAAAATAAAATAAA…

Continue Reading bedtools -u not giving unique files

Does anyone know how to get the headers for a bam.tdf file converted to a bedgraph file?

Does anyone know how to get the headers for a bam.tdf file converted to a bedgraph file? 0 I followed this thread: Conversion from tdf to bed format Converted like this: igvtools tdftobedgraph file.tdf file.bedgraph Now I have a bedgraph without headers but I have no idea what the last…

Continue Reading Does anyone know how to get the headers for a bam.tdf file converted to a bedgraph file?

split gtex genotype data by chromosomes.

Hello, I used and edited the command line to use –vcf to import vcf file. I used these commands: for chr in $(seq 1 22); do      plink –vcf /dbGAP/GTEx_Analysis_2017-06-05_v8_WholeExomeSeq_979Indiv_VEP_annot.vcf.gz            –chr $chr            –recode            –out…

Continue Reading split gtex genotype data by chromosomes.

rs532111960 RefSNP Report – dbSNP

Help Variant Details tab shows known variant placements on genomic sequences: chromosomes (NC_), RefSeqGene, pseudogenes or genomic regions (NG_), and in a separate table: on transcripts (NM_) and protein sequences (NP_). The corresponding transcript and protein locations are listed in adjacent lines, along with molecular consequences from Sequence Ontology. When…

Continue Reading rs532111960 RefSNP Report – dbSNP

sam – Use Htslib to create auxilary tags in bam file C++

I am creating a threaded c++ file where i generate in silico bam files, using header, DNA sequence and read information. First i use bam_init1() to create the bam1_t structure just named “b”. Then i use bam_set1 to create the actual sequence entry in the bam file bam_set1(b,read_id_length,READ_ID,flag,chr_idx,min_beg,mapq,n_cigar,cigar,-1,-1,0,strlen(DNAsequence),DNAsequence,quality_string,l_aux) And finally…

Continue Reading sam – Use Htslib to create auxilary tags in bam file C++

use tcgabiolinks package to download TCGA data

TCGA Data download in terms of ease of use ,RTCGA The bag should be better , And because it’s already downloaded data , The use is relatively stable . But also because of the downloaded data , There is no guarantee that the data is new .TCGAbiolinks The package is…

Continue Reading use tcgabiolinks package to download TCGA data

Find Transposon Element insertions using long reads (nanopore), by alignment directly. (minimap2)

find_te_ins is designed to find Transposon Element (TE) insertions using long reads (nanopore), by alignment directly. (minimap2) Install $ git clone github.com/bakerwm/find_te_ins.git&#13; $ cd find_te_ins Change the following variables upon your condition: genome_fa and te_fa in line-10 and line-11; $ bash run_pipe.sh run_pipe.sh Prerequisite minimap2 – 2.17-r974-dirty, align long…

Continue Reading Find Transposon Element insertions using long reads (nanopore), by alignment directly. (minimap2)

criteria for pvOutputThreshold = 1e-100 in EQTLMATRIX and creation of geneloc.txt for ciseQTL

criteria for pvOutputThreshold = 1e-100 in EQTLMATRIX and creation of geneloc.txt for ciseQTL 0 I would like to know the following points in MATRIXEQTL: (i) what is good threshold value for pvOutputThreshold = 1e-5 / 1e-6 / 1e-100 in MATRIXeQTL ? (ii). Best model for eQTL Linear/cross_linear/Anova when I don’t…

Continue Reading criteria for pvOutputThreshold = 1e-100 in EQTLMATRIX and creation of geneloc.txt for ciseQTL

rs9789283 RefSNP Report – dbSNP

Help Variant Details tab shows known variant placements on genomic sequences: chromosomes (NC_), RefSeqGene, pseudogenes or genomic regions (NG_), and in a separate table: on transcripts (NM_) and protein sequences (NP_). The corresponding transcript and protein locations are listed in adjacent lines, along with molecular consequences from Sequence Ontology. When…

Continue Reading rs9789283 RefSNP Report – dbSNP

Samtools flagstat confusing result of a merged bam file

Hi, I am a bioinformatics student and I am struggling with an issue, I had paired-end fastq files for one sample with some low-quality bases at the end and adapter contamination, so I went and I trimmed my reads with trimmomatic, it gave me 4 files that I used for…

Continue Reading Samtools flagstat confusing result of a merged bam file

Plotting date intervals in ggplot2

I have a dataset which has a bunch of date intervals (i.e. POSIXct format start dates and end dates). In the example provided, let’s say it’s each period is associated to when someone was in school or out of school. I’m interested in plotting the data in ggplot2, each row…

Continue Reading Plotting date intervals in ggplot2

Convert DNAStringSet to a list of elements in R? (Error in seq[[1]][[“seq”]] : subscript out of bounds in R)

I have a bed file which contains DNA sequences information as follow: ** track name=”194″ description=”194 methylation (sites)” color=0,60,120 useScore=1 chr1 15864 15866 FALSE 894 + chr1 534241 534243 FALSE 921 – chr1 710096 710098 FALSE 729 + chr1 714176 714178 FALSE 12 – chr1 720864 720866 FALSE 988 -…

Continue Reading Convert DNAStringSet to a list of elements in R? (Error in seq[[1]][[“seq”]] : subscript out of bounds in R)

Ubuntu Manpage: samtools reheader – replaces the header in the input file

Provided by: samtools_1.13-2_amd64 NAME samtools reheader – replaces the header in the input file SYNOPSIS samtools reheader [-iP] [-c CMD | in.header.sam ] in.bam DESCRIPTION Replace the header in in.bam with the header in in.header.sam. This command is much faster than replacing the header with a BAM→SAM→BAM conversion. By default…

Continue Reading Ubuntu Manpage: samtools reheader – replaces the header in the input file

[SOLVED] changing the order of input changes samtools merge ouput

I realized that this is a stupid mistake I have made. Since samtools do not overwrite the files by default, the output that I get from samtools merge output.bam f2.bam f1.bam wan’t what I thought it was below is my original post ++++++++++++++++++++++++++ I’m using samtool/1.9.0 and I’m trying to…

Continue Reading [SOLVED] changing the order of input changes samtools merge ouput

The Genetic Architecture of Sleep Health Scores in the UK

Introduction Sleep is a complex neurological and physiological state. It is defined as a natural and reversible state of reduced responsiveness to external stimuli and relative inactivity, accompanied by a loss of consciousness.1 Sleep disorders can be classified as seven major categories: insomnia disorders, sleep-related breathing disorders, central disorders of…

Continue Reading The Genetic Architecture of Sleep Health Scores in the UK

Overestimation of number of reads from nanopore data (flagstat)

Same issue as mentioned on the minimap2 tool: github.com/lh3/minimap2/issues/236#issue-361097444 For example nanopore reads aligned to the host transcriptome the flagstat output is: 5953480 + 0 in total (QC-passed reads + QC-failed reads) 2961480 + 0 secondary 22696 + 0 supplementary 0 + 0 duplicates 4195469 + 0 mapped (70.47% :…

Continue Reading Overestimation of number of reads from nanopore data (flagstat)

Samtools flagstat

Samtools flagstat 1 I aligned my ONT sequencing run with minimap2, subsequently I filtered the file using samtools view -b -F 256 aln_transcriptome_sorted_6.bam -o filtered_aln_transcriptome_6.bam to end up with primary alignments only. When I run samtools flagstat on the filtered file I get the following output: 3502608 + 0 in…

Continue Reading Samtools flagstat

r – Changing chr to datetime on RStudio

Want to improve this question? Update the question so it’s on-topic for Cross Validated. Closed 4 hours ago. I would like to change a chr column in my dataframe to a datetime column. Date Value 1 01/7/20 13:05 100 2 01/7/20 13:15 102 3…

Continue Reading r – Changing chr to datetime on RStudio

Regions File Format – ANGSD-wrapper/angsd-wrapper Wiki

ANGSD-wrapper prefers the regions file to be formatted as chr_name:start_position-end_position. Below, we will create a toy BED file as an example and show how we can go from BED file format to ANGSD-wrapper’s regions file format. Create toy BED file Let’s create an example BED file. You can run the…

Continue Reading Regions File Format – ANGSD-wrapper/angsd-wrapper Wiki

Monocle3 differential expression failed when active.assay is not “RNA”

after run estimate_size_factors, data with active.assay = ‘integrated’ works too, but no deg in the result. > [email protected] = ‘integrated’ > cds_raw <- as.cell_data_set(seurat_object) Warning: Monocle 3 trajectories require cluster partitions, which Seurat does not calculate. Please run ‘cluster_cells’ on your cell_data_set object > cds <- cluster_cells(cds_raw) > pr_graph_test_res <-…

Continue Reading Monocle3 differential expression failed when active.assay is not “RNA”

Failed to instantiate plugin dbNSFP in VEP

Failed to instantiate plugin dbNSFP in VEP 0 Hi Team, My VEP (version 105, installed by perl INSTALL.pl) works well. But I face some problems to use dbNSFP plugin (also installed by perl INSTALL.pl) with VEP tool. My dbNSFP version 4.2a was installed by the following code without any warning…

Continue Reading Failed to instantiate plugin dbNSFP in VEP

How to convert bedgraph file with bins into GRanges object?

You could convert your bedGraph bins from hg18 to hg19 using liftover, so you can overlap them with your peaks. You would read them into a GRanges object, then hand this to the liftover function to translate from hg18 to hg19, then unlist the results to get back a regular…

Continue Reading How to convert bedgraph file with bins into GRanges object?

laboratory jobs in germany

We wish you a good luck and have a prosperous career. Working at Labcorp | Jobs and Careers at Labcorp 15 GNeuS Postdoc Positions in Neutron Science of 24 Months Each (Full-time Job) FZJ – Forschungszentrum Jülich. What other similar jobs are there to Laboratory jobs in Germany? Clinical Laboratory…

Continue Reading laboratory jobs in germany

snakemake truncating shell codes

snakemake truncating shell codes 0 I’m trying to change the chromosome number notation from [0-9XY] to Chr[0-9XY] using the samtools reheader in the shell command of the snakemake. rule rename: input: os.path.join(config[“input”], “{sample}.bam”), output: os.path.join(config[“output”], “new_sample/{sample}_chr.bam”) log: os.path.join(config[“log”], “samtools/{sample}”) shell: “samtools view -H {input} | sed -e ‘s/SN:([0-9XY]*)/SN:chr1/’ -e ‘s/SN:MT/SN:chrM/’…

Continue Reading snakemake truncating shell codes

RStudio AI Weblog: Coaching ImageNet with R

ImageNet (Deng et al. 2009) is a picture database organized in keeping with the WordNet (Miller 1995) hierarchy which, traditionally, has been utilized in pc imaginative and prescient benchmarks and analysis. Nonetheless, it was not till AlexNet (Krizhevsky, Sutskever, and Hinton 2012) demonstrated the effectivity of deep studying utilizing convolutional…

Continue Reading RStudio AI Weblog: Coaching ImageNet with R

r – ggplot: Try to plot boxplots with geom_rect on its background, but keep having error with object “variable” not found

I was almost desperate with this error after working on this for 4 hrs, googled and looked from past posts already. Here is my data structure: str(tcga_exp) ‘data.frame’: 11775 obs. of 5 variables: $ cohort: chr “BRCA-Basal.Tumor” “BRCA-LumA.Tumor” “BRCA-LumB.Tumor” “BRCA-LumA.Tumor” … $ exp : num 6.35 5.54 6.56 5.05 5.98…

Continue Reading r – ggplot: Try to plot boxplots with geom_rect on its background, but keep having error with object “variable” not found

Create junctions from Bed file for IGV visualization

Create junctions from Bed file for IGV visualization 0 Any advice for creating junctions file from a bed-like file? My bed file looks like this: chr start end chr star end I have tried to copy the format used in TopHat (junctions file). But I can’t see the junctions in…

Continue Reading Create junctions from Bed file for IGV visualization

r – RSQlite – Find values with most occurences in group

I’m using RSQlite to import Datasets from an SQlite-Database. There are multiple millions of observations within the Database. Therefor I’d like to do as much as possible of Data selection and aggregation within the Database. At some point I need to aggregate a character variable. I want to get the…

Continue Reading r – RSQlite – Find values with most occurences in group

Doubt samtools flagstat

Doubt samtools flagstat 0 I’d like to see the percentage of how many sequences align with my decrementing sequence and I’ve come to this sample table. But wanted to know what use? The mapped (80.94% : N/A)or properly paired (0.06% : N/A) percentage? 1036193 + 0 in total (QC-passed reads…

Continue Reading Doubt samtools flagstat

find positions of a short sequence in a genome

Here’s a demo Python script you can modify for your use, which suggests the rough principle: #!/usr/bin/env python import sys import re bed = “””chr1t0t10tABCDEFGHIJ chr1t5t15tFGHIJABCDO chr1t10t20tABCDOPABCD””” string_to_match = sys.argv[1] pattern = re.compile(string_to_match) for line in bed.split(“n”): (chr, start, stop, id) = line.split(“t”) for match in pattern.finditer(id): sys.stdout.write(“t”.join([chr, str(int(start) +…

Continue Reading find positions of a short sequence in a genome

GitHub – AI-sandbox/gnomix

This repository includes a python implemenation of Gnomix, a fast and accurate local ancestry method. Gnomix can be used in two ways: training a model from scratch using reference training data or loading a pre-trained Gnomix model (see Pre-Trained Models below) In both cases the models are used to infer…

Continue Reading GitHub – AI-sandbox/gnomix

Convert SNP IDs as chr:pos:effect allele:ref allele to rsIDs

Convert SNP IDs as chr:pos:effect allele:ref allele to rsIDs 0 I have a set of 58000 SNPs for which the SNP ID is in the format of: chr:pos:effect allele:ref allele (Grch37 build), but I need to convert this to rsID where one is available for the SNP. I’ve tried using…

Continue Reading Convert SNP IDs as chr:pos:effect allele:ref allele to rsIDs

Homer finds same peak multiple times

I am using Homer to identify peaks in RNA-seq data and then determine differential expression by counting reads per peak. Homer has a lovely package that does just this: getDifferentialPeaksReplicates.pl. The issue is that for some reason Homer returns the same peak multiple times in its final output (Bonus question:…

Continue Reading Homer finds same peak multiple times

Is the Ensembl GRCh38 genome assembly more up to date than the UniProtKB online database?

Dear all, I am working with a list of Ensembl accession codes for a desired group of proteins. I have downloaded the protein annotations related to the genome assembly GRCH38. I fetched the genomic coordinates from UniProtKB API service using the Ensembl accession codes. The service provide a protein annotation…

Continue Reading Is the Ensembl GRCh38 genome assembly more up to date than the UniProtKB online database?

clusterProfiler won’t read gene list

clusterProfiler won’t read gene list 0 So I have a list of DE genes that I would like to analyse for enriched GO and KEGG terms. I was going to use clusterProfiler for this, but I can’t seem to get past constructing the gene list. I have followed the vignette…

Continue Reading clusterProfiler won’t read gene list

Forge a BSgenome data package

My supervisor has requested that I create coverage plots to visualize BAM alignments of RNA-Seq data. I though a good way to do this would be to use Gviz. We work on the model legume Medicago truncatula which does not have a BSgenome package so I though I’d try and…

Continue Reading Forge a BSgenome data package

How can I find reads for specific elements in a bam file?

Hi, I have a specific set of 1,009 elements in a bed file that I am interested in. I also have bam files which I would like to process to know the number of reads for these specific elements (for comparison purposes). I understand some simple uses of samtools commands,…

Continue Reading How can I find reads for specific elements in a bam file?

Question about ROH analysis by Plink 1.9

Hi all, I have recently tried to estimate runs of homozygosity (ROH) from my vcf file by using plink 1.9. I ran following code to generate binary files that plink required: plink –vcf myfile.vcf –make-bed –out out_name –no-sex –no-parents –no-fid –no-pheno –allow-extra-chr This vcf file only contains one individual and…

Continue Reading Question about ROH analysis by Plink 1.9

Gene coordinates for hg19

Gene coordinates for hg19 0 Hi, is there a list which gives for each gene its starting coordinate (chr:pos) and its ending one with respect to the hg19 reference genome? I have a list of positions on hg19 expressed as chr:pos and I have to assign each one to the…

Continue Reading Gene coordinates for hg19

Obtaining The Snp Rs Number With The Chromosomal Position

Obtaining The Snp Rs Number With The Chromosomal Position 3 This question is similar to this one (Get rs number based on position). I have a text file with SNPs in the chr:position format 10:71086 10:72876 10:75794 I was wondering if there is an R package (or perhaps one in…

Continue Reading Obtaining The Snp Rs Number With The Chromosomal Position

Legacy genetics of Arachis cardenasii in the peanut crop shows the profound benefits of international seed exchange

Significance A great challenge for humanity is feeding its growing population while minimizing ecosystem damage and climate change. Here, we uncover the global benefits arising from the introduction of one wild species accession to peanut-breeding programs decades ago. This work emphasizes the importance of biodiversity to crop improvement: peanut cultivars…

Continue Reading Legacy genetics of Arachis cardenasii in the peanut crop shows the profound benefits of international seed exchange

Phasing with SHAPEIT

Edit June 7, 2020: The code below is for pre-phasing with SHAPEIT2. For phased imputation using the output of SHAPEIT2 and ultimate production of phased VCFs, see my answer here: A: ERROR: You must specify a valid interval for imputation using the -int argument, So, the steps are usually: pre-phasing…

Continue Reading Phasing with SHAPEIT

Produce PCA bi-plot for 1000 Genomes Phase III

Note1 – Previous version: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old) Note2 – this data is for hg19 / GRCh37 Note3 – GRCh38 data is available HERE The tutorial has been updated based on the 1000 Genomes Phase III imputed genotypes. The original tutorial was…

Continue Reading Produce PCA bi-plot for 1000 Genomes Phase III

Get gene names from rs SNP ids

Gene to rs id library(biomaRt) ## It might take long time to process if many genes (>50) in the list. ## hgnc_gene_symbols.txt is the file that has the list of gene symbols one per line. genes <- read.table(“~/hgnc_gene_symbols.txt”) ensembl = useMart(“ensembl”, dataset=”hsapiens_gene_ensembl”) dbsnp = useMart(“snp”, dataset = “hsapiens_snp”) getHGNC2ENSG =…

Continue Reading Get gene names from rs SNP ids

Row names and probe names does not match in topTable output

Row names and probe names does not match in topTable output 0 Hello I am using limma to analyze differential methylation on a 850k Illumina array, and set up my model as recommended by the user guide. Today I noticed after running topTable() that the rownames in the result data…

Continue Reading Row names and probe names does not match in topTable output

The result of plink –freq is filled with NA

The result of plink –freq is filled with NA 0 I downloaded the vcf file. Then I used plink to convert it to a bed file and calculated the array frequency. However, the result of plink –freq was filled with NA. Can anyone give us an opinion? command ① ./plink –vcf…

Continue Reading The result of plink –freq is filled with NA

MAPQ (Mapping quality) of 0 for most reads from BWA-MEM2 (with no secondary alignment or other apparent reason)

Hello, I got a very weird output from BWA-mem2 – most of the reads have mapping quality of 0, even though there is no secondary alignment or anything else suspicious. I got sequencing data that was aligned with Novoalign to hg18, the data was bam files. I needed to realign…

Continue Reading MAPQ (Mapping quality) of 0 for most reads from BWA-MEM2 (with no secondary alignment or other apparent reason)

Manhattan plot how to reduce space between axis and axis-labels

Manhattan plot how to reduce space between axis and axis-labels 0 Hello everyone, I have plotted the Manhattan plot via qqman in R. However, it leaves huge white space between axis ticks and axis labels and also between axis labels and axis labs. Could someone offer any tip to reduce…

Continue Reading Manhattan plot how to reduce space between axis and axis-labels

Samtools Depth Option For More Than One Bam Files

Samtools Depth Option For More Than One Bam Files 1 Hi everyone, I’ve been stuck on this for several days. I want to use the samtools depth command but not only for a single bam file. I need to find a way to include all my bam files downloaded in…

Continue Reading Samtools Depth Option For More Than One Bam Files

PRSice-2 without Ref SNP ID

PRSice-2 without Ref SNP ID 1 Does PRSice-2 support a base tile that has chromosome number/name and chromosome position instead of reference SNP ID in the base file? I’m trying to calculate PRS scores using a weights file from the PGS catalog with ~6 million variants. The file has only…

Continue Reading PRSice-2 without Ref SNP ID

Get rsID for a list of SNPs in an entire GWAS sumstats file

Here is a fairly efficient way to do this; assuming hg38 and BEDOPS and standard Unix tools installed. $ bedmap –echo –echo-map-id –delim ‘t’ <(awk ‘{n=split($0,a,/[:_]/); print “chr”a[1]”t”a[2]”t”a[2]+1″t”a[3]”https://www.biostars.org/”a[4];}’ sumstats.txt | sort-bed -) <(wget -qO- hgdownload.cse.ucsc.edu/goldenPath/hg38/database/snp150.txt.gz | gunzip -c | cut -f2-5 | sort-bed -) > answer.bed This gets around making…

Continue Reading Get rsID for a list of SNPs in an entire GWAS sumstats file

Fasta.fai file error

Fasta.fai file error 0 Hi, I have been struggling with an error in bedtools intersect. The command I am trying to run is as follows bedtools intersect -a sorted.vcf -b nstd166.GRCh38.variant_call_chr.vcf.gz -wo -sorted -f 0.8 -r -g Homo_sapiens_assembly38.fasta.fai For some of the files that I am assessing, I don’t get…

Continue Reading Fasta.fai file error

karyoploteR: uncircle your genomes

Hi all, I’d like to present karyoploteR, an R/Bioconductor package we have developed to plot any data on any genome in non-circular layouts. The goal of this project was to develop a tool as flexible as Circos, but easier to use and representing genomes as straight lines instead of circles,…

Continue Reading karyoploteR: uncircle your genomes

List of human protein coding genes with given name (known function?)

List of human protein coding genes with given name (known function?) 2 Hello, To put it simply, I am doing differential expression analysis on human RNA-seq data and I want to focus my analysis of genes that are: 1) Protein coding, so no SNOR or MIR 2) Genes with a…

Continue Reading List of human protein coding genes with given name (known function?)

vcf file analysis

vcf file analysis 0 Hello everyone, I have 22 vcf file for each chr. They were in genome build hg19 so I did a liftover and convert them to hg38 genome build. Now I need just chrom and position values from these vcf files and merge them together into a…

Continue Reading vcf file analysis

PLINK Haplotype blocks estimation not working

Hi, I am using PLINK to estimate haplotype blocks using Gabriel’s method. I am using the following command plink –file Chr$PBS_ARRAY_INDEX –noweb –all –blocks –ld-window-kb 500 And it seemed to be working just fine but when job finished no blocks were called at all. The log file does not mention…

Continue Reading PLINK Haplotype blocks estimation not working

Find overlaping sequences with pyranges from overlap

I am trying to replicate the mergeByOverlap function from R BioConductor in python using the pyranges package. In R the code would be: gr.snp <- with(gr.snp, GRanges(chr, IRanges(start, end),rsid=gr.snp$rsid)) snp.annotated <- data.frame(mergeByOverlaps(gr.snp, gencode, maxgap=2000, type=”start”)) which returns: nrow(snp.annotated) [1] 34 colnames(snp.annotated) [1] “gr.snp.seqnames” “gr.snp.start” [3] “gr.snp.end” “gr.snp.width” [5] “gr.snp.strand” “gr.snp.rsid”…

Continue Reading Find overlaping sequences with pyranges from overlap

methylation beta distribution (minfi generated)

methylation beta distribution (minfi generated) 0 Hello, I am analyzing EPIC methylation array and did necessary filtering for cross-reactive probes, common snps, excluded XY chr. ~10% of my samples cluster separately (which I am calling “outliers” for now) than the rest. Since these samples are collected from human brain with…

Continue Reading methylation beta distribution (minfi generated)

Indels statistics

Hi, I have a vcf statistics for heterozygote and homozygote cases and I would like to find matches with my maf file. The issue is that the reference field in maf file is different and it exlcudes nucleotides in alternative states, e.g. if you have a ref CAA and alternative…

Continue Reading Indels statistics

phase_trio.sh | searchcode

phase_trio.sh | searchcode PageRenderTime 24ms CodeModel.GetById 16ms app.highlight 5ms RepoModel.GetById 1ms app.codeStats 0ms /Phase/phase_trio.sh github.com/BioinformaticsArchive/fCNV Shell |…

Continue Reading phase_trio.sh | searchcode

Color label of rainfall plot drawn by KaryoploteR

You can use the standard legend() command as outlined in this issue here: support.bioconductor.org/p/124328/ Minimal example based on bernatgel.github.io/karyoploter_tutorial//Examples/Rainfall/Rainfall.html : library(karyoploteR) somatic.mutations <- read.table(file=”ftp://ftp.sanger.ac.uk/pub/cancer/AlexandrovEtAl/somatic_mutation_data/Pancreas/Pancreas_raw_mutations_data.txt”, header=FALSE, sep=”t”, stringsAsFactors=FALSE) somatic.mutations <- setNames(somatic.mutations, c(“sample”, “mut.type”, “chr”, “start”, “end”, “ref”, “alt”, “origin”)) somatic.mutations <- split(somatic.mutations, somatic.mutations$sample) sm <- somatic.mutations[[“APGI_1992”]] sm.gr <- toGRanges(sm[,c(“chr”, “start”, “end”,…

Continue Reading Color label of rainfall plot drawn by KaryoploteR

Rscript match

Rscript match 0 I have two dataframe. One is vcf. Its content is : ** head(vcf) X.CHROM POS ID CHROM_POS 1 chr1 100000421 rs1047982323 chr1_100000421 2 chr1 100000827 rs1375386196 chr1_100000827 3 chr1 100001753 rs866745787 chr1_100001753 4 chr1 100001904 rs1416462966 chr1_100001904 5 chr1 100002334 rs1220478954 chr1_100002334 6 chr1 100002490 rs181634796 chr1_100002490**…

Continue Reading Rscript match

bash script

bash script 3 Hello everyone, I have a file like this: RSID1 RSID2 chr1_169894240_G_T_b38 chr1_169894240_G_T_b38 chr1_169894240_G_T_b38 chr1_169891332_G_A_b38 chr1_169891332_G_A_b38 chr1_169891332_G_A_b38 chr1_169661963_G_A_b38 chr1_169661963_G_A_b38 chr1_169661963_G_A_b38 chr1_169697456_A_T_b38 chr1_169697456_A_T_b38 chr1_169697456_A_T_b38 chr1_27636786_T_C_b38 chr1_27636786_T_C_b38 chr1_196651787_C_T_b38 chr1_196651787_C_T_b38 chr6_143501715_T_C_b38 chr6_143501715_T_C_b38 I want to extract info just like: chr1_169894240 chr1_169894240. I don’t want to have other info. I just want…

Continue Reading bash script

spots not filling the whole tissue image

Issue with Seurat SpatialPlot: spots not filling the whole tissue image 0 In Seurat, SpatialPlot generates a plot with an enlarged/expanded image of tissue section as compared to the original spot image. This seems to happen on the relatively small image with a number of spots around 500. I ‘d…

Continue Reading spots not filling the whole tissue image

How To Filter Mapped Reads With Samtools

Hi, You get a bam (machine readable sam) file after mapping, and it contains information about mapped and unmapped reads. To get the unmapped reads from a bam file use: samtools view -f 4 file.bam > unmapped.sam the output will be in sam to get the output in bam, use:…

Continue Reading How To Filter Mapped Reads With Samtools

working with .gmt files

working with .gmt files 3 Hi! I have downloaded a pathway data set in .gmt format form the GSEA website. I’m wondering how can I properly read this data set in R. Could anyone help me? Thank you!   myposts • 9.5k views • link updated 2 hours ago by…

Continue Reading working with .gmt files

Looking up Gene IDs in R

Looking up Gene IDs in R 1 Hello, Given a list of gene names, I need to create a table containing the Ensemble ID, chromosome, start, end of that gene. Example: ## ens_id gene view chr start end ## 1: ENSG00000243485 MIR1302-2HG Gene Expression chr1 29553 30267 ## 2: ENSG00000237613…

Continue Reading Looking up Gene IDs in R

Platypus

Platypus 0 Hi, I’m super new to WGS and bioinformatics, but I’m a classic software data scientist, so I know enough to be annoying. I’m using Platypus too call variants on 100X WGS via Nebula Genomics. I found an odd series of calls and am not sure if this is…

Continue Reading Platypus

IOError [errno 2] No such file or directory: ‘-o’

TopHat error: IOError [errno 2] No such file or directory: ‘-o’ 2 Hello everyone. I’m now running tophat in our server. First, I just simply tried “tophat -p 8 -G <gtf_file> <ref_genome> <fa1><fa2>” and it worked. Then I wrote a for loop scripts but It reported error: [2019-04-03 10:49:16] Beginning…

Continue Reading IOError [errno 2] No such file or directory: ‘-o’

Parsing snp result

Parsing snp result 1 I am trying to parse dbSNP results into data frame in python, I got the result as “bytes” and I wonder if there is a way to parse it into dataframe. I tried multiple xml packages (xml, lxml) but they are not able to separate the…

Continue Reading Parsing snp result

How to properly combine two bam files of a paired-end data

How to properly combine two bam files of a paired-end data 3 Hi all! I am mapping a paired-end read separately using bowtie2. After that, I want to combine the two bam file into one for downstream analysis. How to properly do this combination? I tried: samtools sort -n R1.bam…

Continue Reading How to properly combine two bam files of a paired-end data

PLINK ASSOC understanding the results

PLINK ASSOC understanding the results 1 Hello to all, I have 10 vcf files – 5 female fish and 5 male fish I have merged all 10 fish to one vcf file.(all_fish.vcf) I performed the plink association analysis on all 10 fish with the command: -noweb –const-fid –allow-no-sex –allow-extra-chr –pheno…

Continue Reading PLINK ASSOC understanding the results

How to set variant FILTER in a VCF file based on overlap with regions in a BED file

I figured out how to do the annotation using BCFTools. 2 steps are needed. Input BED file requires 1 for each region where the annotation should be set Chr_01 1000 2000 1 Chr_05 5000 6000 1 Input header file: ##INFO=<ID=BAD_REGION,Number=0,Type=Flag,Description=”My bad region for some reason”> bgzip and tabix the bed…

Continue Reading How to set variant FILTER in a VCF file based on overlap with regions in a BED file

Allele frequency calculation

Allele frequency calculation 0 Hello everyone, I use vcf tools to find AF values by using this command: vcftools –gzvcf $SUBSET_VCF –freq2 –out $OUT –max-alleles 2 The output I got from this is: chr pos nalleles nchr a1 a2 <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 22 16050408 2 846…

Continue Reading Allele frequency calculation

extract list of SNPs from multiple chr{1:22}.bgen files using plink2

extract list of SNPs from multiple chr{1:22}.bgen files using plink2 1 hello, I have extracted out list of snps based on the maf cutoff 0,,0.0001, 0.001,0.01,0.1,.55,1.0. I am running plink2 to extract this list from .bgen files for individual chromosomes using the following code plink2 –chr{1:22}.bgen –extract maf1_snps for imputed…

Continue Reading extract list of SNPs from multiple chr{1:22}.bgen files using plink2

How to separate sub-families from transposons sequence based fasta files?

How to separate sub-families from transposons sequence based fasta files? 1 I’m working on the classification of transposable elements. I want to retrieve sequences of their sub-classes in separate files. Is there any code or tool present to separate their sub-families because dataset contains thousands of sequence entries for different…

Continue Reading How to separate sub-families from transposons sequence based fasta files?

Converting an S4 object into a dataframe in R

I have an S4 object named ‘res’ which I got while using an R package called RDAVIDWebService. I can’t seem to find a way to convert this object into a dataframe in R. I tried using the function ‘as.data.frame(res)’ but it throws this error: > as.data.frame(res) Error in as.data.frame.default(res) :…

Continue Reading Converting an S4 object into a dataframe in R