Tag: BED

organizing a Bed file for bedtools getfasta

I am trying to use bedtools getfasta on some bed files, but the issue is that the peaks bed file columns are mixed up such that the first column with the chromosome names contains the peak location as well for some of…

Plotting date intervals in ggplot2

I have a dataset which has a bunch of date intervals (i.e. POSIXct format start dates and end dates). In the example provided, let’s say each period corresponds to when someone was in school or out of school. I’m interested in plotting the data in ggplot2, each row…

Convert DNAStringSet to a list of elements in R? (Error in seq[[1]][["seq"]] : subscript out of bounds in R)

I have a bed file which contains DNA sequence information as follows: track name="194" description="194 methylation (sites)" color=0,60,120 useScore=1 chr1 15864 15866 FALSE 894 + chr1 534241 534243 FALSE 921 - chr1 710096 710098 FALSE 729 + chr1 714176 714178 FALSE 12 - chr1 720864 720866 FALSE 988 -…

Failure to detect mutations in U2AF1 due to changes in the GRCh38 reference sequence

Materials and Methods. Genomic data was collected as part of the MDS National History Study or The Cancer Genome Atlas project and consented appropriately under those protocols [8: Sekeres M.A., Gore S.D., Stablein D.M., DiFronzo N., Abel G.A., DeZern A.E., Troy J.D., Rollison D.E., Thomas J.W., Waclawiw M.A., Liu J.J.…]

Bioconda faststructure – gitmetadata

I am using the conda env of faststructure from the bioconda channel and got these error messages. Could it be that the bioconda package needs to be updated? Best regards: python structure.py structure.py:3: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88 import fastStructure structure.py:4: RuntimeWarning: numpy.dtype size changed,…

processing in strelka2 with multiples bam file in directory

If I manually tell strelka2 to use these three bam files below, then I get the desired results of 3 individual genome files in results/variants. xxx_00.bam yyy_01.bam zzz_02.bam ${path_to_strelka}/bin/configureStrelkaGermlineWorkflow.py --bam xxx_00.bam --bam yyy_01.bam --bam zzz_02 --referenceFasta <fasta> --callRegions <.bed.gz> --runDir…

Getting peak heights from TF chIP-seq data (wig file)

Hello everyone, I have TF ChIP seq data from NCBI GEO in wig format. I converted wig to bedgraph and then used MACS peak caller to get bed narrowpeak files. I further uploaded file on genome browser to get graphical map…

Regions File Format – ANGSD-wrapper/angsd-wrapper Wiki

ANGSD-wrapper prefers the regions file to be formatted as chr_name:start_position-end_position. Below, we will create a toy BED file as an example and show how we can go from BED file format to ANGSD-wrapper’s regions file format. Create toy BED file: Let’s create an example BED file. You can run the…
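
The conversion described above can be sketched in a few lines of Python. This is only an illustration, not the wiki's own procedure: the function name `bed_to_regions` and the `one_based` switch are hypothetical, and whether the start coordinate should be shifted from BED's 0-based convention is an assumption to check against the ANGSD-wrapper documentation.

```python
def bed_to_regions(bed_lines, one_based=False):
    """Turn BED lines into 'chr:start-end' strings.

    one_based is a hypothetical switch: when True, the 0-based BED start
    is shifted by +1. By default the coordinates are copied unchanged.
    """
    regions = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        start = int(start) + 1 if one_based else int(start)
        regions.append(f"{chrom}:{start}-{int(end)}")
    return regions


# Toy BED content invented for illustration.
toy_bed = ["chr1\t100\t200", "chr2\t0\t50\tpeak_a"]
for region in bed_to_regions(toy_bed):
    print(region)
```

Extra BED columns (name, score, strand) are ignored, since only the first three fields are needed for a region string.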

bedtools intersect error: Invalid record in file

Hello to all, I am trying to run bedtools intersect with a VCF file and a BED file (my goal is to add the depth data to my VCF). I get an error running this command: bedtools intersect -a depth.bed -b fish.vcf -wa -wb > $out The error: “Error: Invalid record…

What is RNAcentral? | RNAcentral

RNAcentral is a database of non-coding RNA sequences that aggregates ncRNA data from over 40 member resources known as Expert Databases. Non-coding RNAs: Similar to mRNAs, non-coding RNAs (ncRNAs) are transcribed from DNA but are not translated into proteins. NcRNAs are found in all organisms and have a broad range…

bedtools genomecov problem with merged bam

Hi, I was using purge_haplotigs, and in that workflow the first step was to use bedtools genomecov, so I moved here. I have three paired-end datasets: Illumina WGS reads, HiC reads, and Chicago sequencing reads. I aligned the paired-end reads of Illumina WGS to the genome,…

How to convert bedgraph file with bins into GRanges object?

You could convert your bedGraph bins from hg18 to hg19 using liftover, so you can overlap them with your peaks. You would read them into a GRanges object, then hand this to the liftover function to translate from hg18 to hg19, then unlist the results to get back a regular…

Systems biology analysis of human genomes points to key pathways conferring spina bifida risk

Significance Genetic investigations of most structural birth defects, including spina bifida (SB), congenital heart disease, and craniofacial anomalies, have been underpowered for genome-wide association studies because of their rarity, genetic heterogeneity, incomplete penetrance, and environmental influences. Our systems biology strategy to investigate SB predisposition controls for population stratification and avoids…

P-value cut-off to identify SNPs at ChIP-seq peaks? [BedTools]

Hi all, I have a bed file of SNPs and also an H3K27ac ChIP-seq .broadpeak file from Roadmap epigenome… I want to find the SNPs in my list that intersect an H3K27ac peak using BedTools intersect. However, should I filter the…

Reference panel data to be used for GCTA-COJO

I performed a genome-wide meta-analysis based on summary statistics from the four cohorts to identify significant loci. Next, I would like to perform a conditional analysis using GCTA-COJO to search for SNPs independent of significant lead SNPs. I know that GCTA…

Arrange the size of subplots in plotHeatmap deeptools figure

Hi all, I am trying to generate a 10×10 (10 bw files + 10 bed files) figure using deeptools but I am having trouble arranging the size of subfigures. I want the subfigures in the same size but if I…

computeMatrix in deeptool is Running with no result

Hi All, I wonder if someone can help me in explaining what to input on the -R <bed file> argument of the code below? computeMatrix scale-regions -S <bigwig file(s)> -R <bed file> -b 1000 what I did for example, I download…

A matrix sample for Profile plots and heatmaps of Computematrix, deepTools

Hi everyone, I have a count matrix from feature counts and of course, couple of peak (.bed) files. I want to visualize the peaks all together to show the coverage and overall comparing. I was going to use…

Using STAR SJ.out.tab file to identify novel ncRNAs

Hi All, I am attempting to identify novel ncRNAs from a circadian RNAseq dataset. Specifically I have a ribo-depleted RNAseq timecourse with 31 samples (sample every 2 hours for 60hrs). I have run STAR (code below). I am trying to follow…

Piranha Peak-Calling with multiple replicates

I am trying to call RNA-Protein interaction peaks by using Piranha software. I have multiple replicates for each experiment and the control data, and I can’t seem to understand how to combine them into one Piranha query. For example, if I was to call…

PLINK sanity check – Bioinformatics Stack Exchange

I am a new user of PLINK and am analysing some SNP data for the first time. After creating a .bim file with $ plink --file my_data --make-bed I notice that for several SNPs my data is different from dbSNP e.g. rs145496306: BIM file: A G dbSNP: G>A,T rs3813199: BIM…

Genome Bioinformatics Analyst – Pittsburgh

**Description** UPMC Presbyterian is hiring a Genome Bioinformatics Analyst to join the Molecular and Genomic Pathology Laboratory (MGP) team! This role will work a daylight schedule Monday through Friday. No weekends or holidays are required! The Molecular and Genomic Pathology Laboratory (MGP) is a dynamic state-of-the-art clinical laboratory that prides…

how to add reference alleles to VCF?

I’m converting gVCFs to VCF, but the reference alleles are missing. An example below: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 180525_FD02929177 1 97547947 . T . . . DP=31 GT:DP:RGQ 0/0:31:81 1 97915614 . C . . . DP=40…

SNP extraction

I want to extract specific SNPs of interest that I have in a text file into an additive genetics model, so that each SNP can be in a 0/1/2 format for each subject, using genetic info from PLINK (.bed, .bim, and .fam files). How can I do…

Senior Bioinformatics Scientist II/ Staff Bioinformatics Scientist

Inscripta was founded in 2015 and recently launched the world’s first benchtop Digital Genome Engineering platform. The company is growing aggressively, investing in its leadership, team, and technology with a recent $150 million financing round led by Fidelity and T. Rowe Price. The company’s advanced CRISPR-based platform, consisting of an instrument, reagents,…

bedops bedmap operation in python

I noticed there is a conda version of the bedops bedmap function available. I’ve been struggling to use it though. Could someone refer me to any documentation of its usage in a python script please. Have a great day. Thanks in advance.

How to call LOH with FreeC

Good morning, I am trying to infer loss of heterozygosity (LOH) from WGS data using Freec. For this purpose, I am using these parameters in the “[BAF]” section of the configuration file: [BAF] makePileup = My_somaticVCF.vcf.gz fastaFile = hg19.fa SNPfile = hg19_snp142.SingleDiNucl.1based.txt.gz When…

How To Write Data In A Granges Object To A Bed File.

Given a GRanges object: gr <- GRanges(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), ranges = IRanges(1:10, end = 7:16, names = head(letters, 10)), strand = Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2))) You can simply: df <- data.frame(seqnames=seqnames(gr), starts=start(gr)-1, ends=end(gr), names=c(rep(".", length(gr))), scores=c(rep(".", length(gr))),…
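
The key step in the R snippet above is `start(gr)-1`: GRanges coordinates are 1-based and closed, while BED is 0-based and half-open, so only the start shifts. A minimal Python sketch of the same coordinate conversion follows. The function name `ranges_to_bed_rows` and the tuple layout are hypothetical, and mapping an unspecified `*` strand to `.` is an assumption rather than something shown in the excerpt.

```python
def ranges_to_bed_rows(ranges):
    """Convert (chrom, start, end, strand) tuples with 1-based, closed
    coordinates into tab-separated BED6 rows (0-based, half-open)."""
    rows = []
    for chrom, start, end, strand in ranges:
        # Assumption: an unspecified strand ("*") is written as "." in BED.
        bed_strand = "." if strand == "*" else strand
        # Only the start moves; the closed end equals the half-open end.
        rows.append("\t".join([chrom, str(start - 1), str(end), ".", ".", bed_strand]))
    return rows


# Toy ranges invented for illustration.
for row in ranges_to_bed_rows([("chr1", 1, 7, "-"), ("chr2", 2, 8, "*")]):
    print(row)
```

The name and score columns are filled with `.` placeholders, mirroring the excerpt.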

Intersecting Roadmap’s Histone ChIP-seq data using BedTools?!?!??!

Hi all, I want to prioritise my list of 112 SNPs by looking at those that lie within open chromatin regions and/or active promoter/enhancer histone marks. In order to do that I have downloaded several Histone modification ChIP-seq data from Roadmap epigenomics…

4 Dead Infants, a Convicted Mother, and a Genetic Mystery

That evening she wrote an email to Folbigg’s lawyer and said she was in. As she dug into the investigation, she assumed that her scientific work would help guide the legal system closer to the truth. She had no idea that over the course of two all-consuming years, she would…

Create junctions from Bed file for IGV visualization

Any advice for creating a junctions file from a bed-like file? My bed file looks like this: chr start end chr start end I have tried to copy the format used in TopHat (junctions file). But I can’t see the junctions in…

Merging genotyping array VCFs and then running kinship analysis

Hello, I have about 4200 array genotyping VCFs (from the Illumina Infinium CoreExome-24 Kit) and I have merged them using bcftools merge. The chip has 500K exonic SNPs. These are trio data – which means 1700 of them are probands,…

One-hot encoding for PLINK or VCF

I want to write an autoencoder for SNP data. Is there an established way to one-hot-encode binary PLINK or VCF input? I believe that can be done by manipulating PLINK’s bed file but am afraid to do something wrong. By one-hot encoding I…

find positions of a short sequence in a genome

Here’s a demo Python script you can modify for your use, which suggests the rough principle: #!/usr/bin/env python import sys import re bed = """chr1\t0\t10\tABCDEFGHIJ chr1\t5\t15\tFGHIJABCDO chr1\t10\t20\tABCDOPABCD""" string_to_match = sys.argv[1] pattern = re.compile(string_to_match) for line in bed.split("\n"): (chr, start, stop, id) = line.split("\t") for match in pattern.finditer(id): sys.stdout.write("\t".join([chr, str(int(start) +…
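
Because the excerpt is cut off mid-statement, here is a self-contained sketch of the same rough principle: scan the sequence column of BED-like records and report each match in genomic coordinates. It is not the original answer's script. The function name `find_matches` is hypothetical, the query is treated as a literal string via `re.escape` (the original compiles it as a regular expression), and reporting `start + match offset` as the genomic position is an assumption about what the truncated line goes on to do.

```python
import re


def find_matches(bed_text, motif):
    """Return (chrom, hit_start, hit_end, motif) for every non-overlapping
    occurrence of motif in the 4th (sequence) column of BED-like text."""
    pattern = re.compile(re.escape(motif))  # literal match, an assumption
    hits = []
    for line in bed_text.splitlines():
        chrom, start, _stop, seq = line.split("\t")
        for m in pattern.finditer(seq):
            # Shift the in-sequence offsets by the record's genomic start.
            hits.append((chrom, int(start) + m.start(), int(start) + m.end(), motif))
    return hits


# Same toy records as in the excerpt, with real tab and newline characters.
bed = "chr1\t0\t10\tABCDEFGHIJ\nchr1\t5\t15\tFGHIJABCDO\nchr1\t10\t20\tABCDOPABCD"
for hit in find_matches(bed, "ABCD"):
    print(*hit, sep="\t")
```

Note that `finditer` reports non-overlapping matches only, so overlapping occurrences of a repetitive motif would need a lookahead pattern instead.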

Plink --merge-list only outputting fam

I am attempting to merge Plink files towards an algorithm I am using (CookHLA). I have already made the bed/bim/fam files for my vcf files. Now, I want to merge them into a single file so I can progress with the algorithm. To do…

Count 5’End Mapped To A Specific Genomic Position

I got several SAM/BAM files, and I am interested in 5’ends of the mapped reads. Are there any tools or scripts to count how many 5’ends are mapped at a specific genomic position? N.B. I am not trying to count the…

P-values far too high for quantitative regenie phenotype

Hi all, I’m having some trouble running regenie (v2.2.4) on a quantitative phenotype for a large cohort. I’m testing a standard height GWAS with heights rounded to the nearest integer. I’ve tried a few different tests to see where the issue…

How to extract genomic upstream region of a protein identified by its NCBI accession number?

I have a list of NCBI protein accession numbers. I would like to extract out the upstream genomic region of the corresponding gene’s nucleotide sequence. I will be thankful to you if you can…

Changing the sample IDs of a bed/bim/fam PLINK fileset

Hi everyone, I am working with a genotype set that is not identified with the sample IDs that I want. However, I do have a lookup table which I can use in R to get the right identification when I…

Aro Biotherapeutics hiring Investigator, Genetics & Bioinformatics in Philadelphia, Pennsylvania, United States

About Aro BioTx Join the team at Aro Biotherapeutics creating breakthrough biotherapeutics based on Centyrin oligonucleotide conjugates. Centyrins are small protein domains based on the fibronectin domains of human Tenascin C that combine the affinity and specificity properties of antibodies with the stability and tissue penetration properties of small molecules….

bedtools getfasta concatenating sequences

Hi, I have a bed file containing exons of the genes. The name field is specified with the name of the gene, like (ENSG***). When I run bedtools getfasta I get the sequences of each exon separately. Is there a standard way in order to concatenate…

Bedtools: Merging Many Bed Files

I am using the algorithm CookHLA for my research. As part of its preparation, I need to feed it a bed file representing at least 100 of my samples. I have made the bed files for 500 samples using samtools and bedtools in a…

Create combined CpG and non-CpG bedgraphs for DNA methylation using Bismark

Hello, previously, I was using Bismark with the '--comprehensive' option to generate the individual bedgraph files from which we made the bigWigs. For one particular figure, my boss wants me to generate merged bdg files so that we…

How to transform a whole-genome callset into whole-exome callset?

Hi all, I have a callset from whole-genome data, and I want to transform it into an exome callset by extracting the variants using an exome target interval. I obtained two exome target lists, one from 1KG project…

Submit sequence data to NCBI

Data provision and standards. GEO sequence submission procedures are designed to encourage provision of MINSEQE elements: Thorough descriptions of the biological samples under investigation, and procedures to which they were subjected. Thorough descriptions of the protocols used to generate and process the data. Request updates to accessioned records per the…

How to pipe awk of bed file into samtools to extract fasta sequences?

I have a bed file (seq.bed) that contains “queryID queryStart queryEnd”. Following is the example (the content of seq.bed file). SRR5892231.6 28 178 SRR5892231.7 4 307 SRR5892231.7 16 307 SRR5892231.9 216 408 I would like to…

How can I find reads for specific elements in a bam file?

Hi, I have a specific set of 1,009 elements in a bed file that I am interested in. I also have bam files which I would like to process to know the number of reads for these specific elements (for comparison purposes). I understand some simple uses of samtools commands,…

Haplotype divergence supports long-term asexuality in the oribatid mite Oppiella nova

Significance Putatively ancient asexual species pose a challenge to theory because they appear to escape the predicted negative long-term consequences of asexuality. Although long-term asexuality is difficult to demonstrate, specific signatures of haplotype divergence, called the “Meselson effect,” are regarded as strong support for long-term asexuality. Here, we provide evidence…

Question about ROH analysis by Plink 1.9

Hi all, I have recently tried to estimate runs of homozygosity (ROH) from my vcf file by using plink 1.9. I ran the following code to generate the binary files that plink requires: plink --vcf myfile.vcf --make-bed --out out_name --no-sex --no-parents --no-fid --no-pheno --allow-extra-chr This vcf file only contains one individual and…

Database for Enhancers with Coordinates

Can anyone recommend some good databases for extracting bed files with enhancer coordinates? I have used UCSC in the past; I was hoping to find some alternatives.

Visulization of raw 4C-seq reads in UCSC

I’m trying to create bedGraph files to view raw and normalised reads from a 4C-seq experiment to view in UCSC for two biological replicates. Is there a simple way to do this? I’ve tried using bamCoverage and expected to get peaks for…

How to download BED file with all the fields?

Hello, my goal: to download a certain BED file from the UCSC website that contains all these fields: bin chrom chromStart chromEnd name score strand signalValue pValue qValue peak I will describe my actions and my problem: – I go…

how to to download a BED file from ucsc to directory using linux

Hello, my goal: to download a BED file as described here to my directory using linux commands. In the meantime, I am trying to download the wanted file directly in the following way: my…

Why may BOLT-LMM and SAIGE (quantitative, linear-mixed model) yield different results when ran on the absolutely the same dataset?

As a validation experiment, I have run the same GWAS of a quantitative phenotype derived from the UKBiobank, alongside the genomic data from the UKBiobank, once using the program BOLT-LMM and once using SAIGE linear mixed model (with selected quantitative trait tag). I wanted to see if the results would…

Agilent Sure Select .bed file v6 and v8

Hi, I would like to differentiate between the Agilent SureSelect .bed file versions v6 & v8. The release notes are unclear and I couldn’t arrive at any inference. I would like to differentiate and get the genomic regions and gene names as…

Dnaman software manual

Dnaman software manual DNADynamo DNA Sequence Analysis SoftwareCLC Genomics Workbench – Qiagen They have sent a man to locate the key. He was making notes with a slim gold pen on a Gucci pad? When I married my husband, that all…

How to get the nucleotide sequence through ORF information?

I have a file with ORF information, including the start position and end position on the chromosome. At first I wanted to create a bed file, and then use the getFastaFromBed of bedtools to get the sequence. But I found…

Exon coordinates and sequence

I did it like that: 1- Download refGene.txt.gz and hg19.fasta from the UCSC goldenpath. ( note: convert hg19.2bit to hg19.fa using twoBitToFa ) 2- Create a bed file with exon coordinates using my awk script // to_transcript.awk BEGIN { OFS="\t" } { name=$2 name2=$13 sens = $4 == "+" ?…

SNP exon region UCSC

How can I get SNPs in only the exon regions of the genome with UCSC? UCSC returns all the SNPs of a gene region, and there is no filter option to get only the exon regions. Thanks.

Phenotype file for eQTL analysis using GEMMA

Hello All, I appreciate it if someone could direct me in this regard. I am running eQTL analysis using GEMMA software. I have corrected the expression file with all samples (280 samples) and the genotype file is (170). I have a couple…

Answer: Highly mapped to introns

I think your problem is that your bed file doesn’t match the genome/gtf you used. I think it’s too old. My $gtf is the version 104 one like yours. zcat hg19_Ensembl_gene.bed.gz | head chr1 **66999065** 67210057 **ENST00000237247** 0 + 67000041 67208778 0 27 25,123,64,25,84,57,55,176,12,12,25,52,86,93,75,501,81,128,127,60,112,156,133,203,65,165,1302, 0,863,92464,99687,100697,106394,109427,110161,127130,134147,137612,138561,139898,143621,146295,148486,150724,155765,156807,162051,185911,195881,200365,205952,207275,207889,209690, grep ENST00000237247 $gtf 1 havana…

Remove related samples using plink

Hi, I generated pairwise IBD (PI_HAT) using plink1.9 --genome option. I have >200,000 samples, so I used --parallel and combined the sub files using cat. Is there a way to remove related samples using the output file .genome.gz ? I read about --rel-cutoff but…

Highly mapped to introns

Hi, I am analyzing RNA-seq data from human blood samples. I checked the read distribution using RSeQC read_distribution after mapping by STAR. Usually, I get more than 80% of reads mapped to exons. However, at this time, the result showed only several % were mapped…

Phasing with SHAPEIT

Edit June 7, 2020: The code below is for pre-phasing with SHAPEIT2. For phased imputation using the output of SHAPEIT2 and ultimate production of phased VCFs, see my answer here: A: ERROR: You must specify a valid interval for imputation using the -int argument, So, the steps are usually: pre-phasing…

Intersecting compressed gVCF with bed file

This may be a ridiculously simple question to ask but, I have a compressed genomic VCF file generated by the Strelka germline variant caller, with lines like the following, where no variation was detected: chr1 27394730 . T . . PASS END=27394756;BLOCKAVG_min30p3a GT:GQX:DP:DPF:MIN_DP…

Produce PCA bi-plot for 1000 Genomes Phase III

Note1 – Previous version: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old) Note2 – this data is for hg19 / GRCh37 Note3 – GRCh38 data is available HERE The tutorial has been updated based on the 1000 Genomes Phase III imputed genotypes. The original tutorial was…

Merge regions in bedgraph file

I have a bedgraph file with the chromosome, start and end point, and the coverage: CM000994.3 10167710 10167711 95 CM000994.3 10167718 10167720 95 I want to merge regions that are close together. With a bed file I could use something like this: bedtools merge…

merge chipseq peaks with bedtools/other tool

# this should do it, concatenate peak locations in all peaks, sort them and merge cat A B C …. | sort -k1,1 -k2,2n | mergeBed -i stdin > locations.bed To know which files the peaks co-ordinates are merged from, you need to have an identifier in each file before…
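
The concatenate, sort, and merge pipeline above can be mirrored in plain Python to show what the merge step does to the pooled peaks. This is a sketch under stated assumptions, not a replacement for bedtools: the function name `merge_intervals` is hypothetical, intervals are taken as (chrom, start, end) tuples, and touching (book-ended) intervals are merged along with overlapping ones, which is my understanding of the default `bedtools merge` behaviour.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals.

    Sorting the tuples orders them by chromosome name, then start,
    analogous to `sort -k1,1 -k2,2n` in the shell pipeline.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Same chromosome and no gap: extend the current block.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(block) for block in merged]


# Peaks pooled from several hypothetical files (invented toy data).
pooled = [("chr1", 10, 20), ("chr1", 15, 30), ("chr2", 5, 8), ("chr1", 30, 40)]
for chrom, start, end in merge_intervals(pooled):
    print(chrom, start, end, sep="\t")
```

Tracking which input file each merged peak came from would need an extra identifier column carried through the merge, as the excerpt goes on to note.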

Feature selection

Hello, I am starting out with bulk ATAC data as bed files that include the read counts. I want to use this data for a package called MOFA, which requires these preprocessing steps: Normalisation: For count-based data such as RNA-seq or ATAC-seq we recommend size factor normalisation…

Tau t gromacs manual

Tau t gromacs manual TAU at TACC – TACC User PortalBiomolecules | Free Full-Text | Electrostatics of Tau GN Drive Tau | The Gundam Wiki | Fandom Kirk took a break and sent out for coffee when a patrolman brought him…

The result of plink --freq is filled with NA

I downloaded the vcf file. Then I used plink to convert it to a bed file and calculated the allele frequency. However, the result of plink --freq was filled with NA. Can anyone give us an opinion? command ① ./plink --vcf…

Extracting exons and transcripts from gff3/gtf

I was just doing something similar about a week ago. You may be able to accomplish this using the GenomicFeatures R package. First load up the following in R: library(GenomicFeatures) library(GenomicRanges) library(rtracklayer) Then you will need to get the chromosome sizes file, which you can generate with directions from this…

How to obtain zero-based coordinates read depth using bedtools coverage for a specific region?

Disclaimer: I may use coverage and ‘mean read depth’ interchangeably in this post. I’m referring to the average, per-base read depth. I’m running and comparing some mean coverage estimates for some specific bed regions on my bam files using bedtools; however, I’m having trouble finding the correct way to do…

Quick Way To Combine Two Datasets Using Only Common Markers

Is there a quick way to combine two datasets so that only the common markers are kept? Currently, if I have two datasets, I have to first get the intersection of the two BIM/MAP files, then extract those markers…

MAPQ (Mapping quality) of 0 for most reads from BWA-MEM2 (with no secondary alignment or other apparent reason)

Hello, I got a very weird output from BWA-mem2 – most of the reads have mapping quality of 0, even though there is no secondary alignment or anything else suspicious. I got sequencing data that was aligned with Novoalign to hg18, the data was bam files. I needed to realign…

Gromacs 4.5.4 manual

Gromacs 4.5.4 manual OntheStabilityofNegativelyChargedPlatelets inCalcium g_energy(1) [debian man page] Gromacs User Manual Version 4.6The defence of Königsberg had cost the lives of 42,000 German soldiers and 25,000 civilians. He yanked the seaman out and shouted at him to report straight to…

Continue Reading Gromacs 4.5.4 manual

read count to gene

I am using this command to get read counts per gene with bedtools intersect: samtools view -Shu -q10 -@ 20 UE-2955-CMLib12_sorted.bam | bedtools intersect -c -a GCA_900659725.1_ASM90065972v1_genomic.gff -b stdin > UE-2955-CMLib{i}_intersect_counts2.bed The command works for other files but not for one file. Which…

Continue Reading read count to gene

How to get Read Counts from MACS2 output files?

Hello, I am working with GEO datasets that supply both bigWig (bw) and bed files for each ATAC sample. I need the read counts/pileup value for downstream analysis, but the 6+4 narrowPeak file format from MACS2 does not include…

Continue Reading How to get Read Counts from MACS2 output files?

Samtools Depth Option For More Than One Bam Files

Hi everyone, I’ve been stuck on this for several days. I want to use the samtools depth command but not only for a single bam file. I need to find a way to include all my bam files downloaded in…

Continue Reading Samtools Depth Option For More Than One Bam Files

Normalization and differential analysis in ATAC-seq data

Hello everyone! I would like to know if anyone has experience with normalization and differential analysis on ATAC-seq data. After using MACS2 for the peak calling, how can we use DESeq2 or edgeR on these data? Has anyone tried this? What is the…

Continue Reading Normalization and differential analysis in ATAC-seq data

How is better perform the analyze the somatic mutations? (the mutations of my interest gene)

Hi all, I have 14 proteins of interest and want to know how their genetic status changes during cancer (which somatic mutations occur in their genes?). To this end, I started analysis on the…

Continue Reading How is better perform the analyze the somatic mutations? (the mutations of my interest gene)

GIAB Benchmark (High Confidence) Bed Files

Hi all, I haven’t used Genome in a Bottle for a couple of years. When I did use it, I recall I would download samples in VCF format for: AshkenaziTrio (three each) NA12878 (only one) ChineseTrio (three each) I would then download what…

Continue Reading GIAB Benchmark (High Confidence) Bed Files

List of codon numbers in a panel

Hello everyone! My group sequenced multiple hotspots of a panel of genes. Now they want me to create a list of what was sequenced, with the gene name, the exon and the codon number. I have reached the point to know…

Continue Reading List of codon numbers in a panel

GREAT genome ontology top hits criteria?

Hi, when running a GREAT analysis of a bed file using the default settings, what are the criteria by which the top few hits are shown in each of the gene ontology categories? For example, in ‘mouse phenotype single KO’ I get 8…

Continue Reading GREAT genome ontology top hits criteria?

GREAT difference between ‘mouse phenotype’ and ‘mouse phenotype single KO’

Hi, I’ve been trying to find information on what the difference is between the ‘mouse phenotype’ and ‘mouse phenotype single KO’ categories that are generated in GREAT genome ontology analyses but haven’t been able to find any detailed explanation….

Continue Reading GREAT difference between ‘mouse phenotype’ and ‘mouse phenotype single KO’

Get rsID for a list of SNPs in an entire GWAS sumstats file

Here is a fairly efficient way to do this, assuming hg38 and that BEDOPS and standard Unix tools are installed. $ bedmap --echo --echo-map-id --delim '\t' <(awk '{n=split($0,a,/[:_]/); print "chr"a[1]"\t"a[2]"\t"a[2]+1"\t"a[3]"/"a[4];}' sumstats.txt | sort-bed -) <(wget -qO- hgdownload.cse.ucsc.edu/goldenPath/hg38/database/snp150.txt.gz | gunzip -c | cut -f2-5 | sort-bed -) > answer.bed This gets around making…
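The awk step in that command turns each variant identifier into a one-base BED interval before mapping. A minimal Python sketch of that conversion follows, assuming identifiers shaped like `1:12345_A_G` (chromosome, position, and two alleles separated by `:` and `_`) and assuming the two alleles are joined with `/` in the fourth column; the function name `variant_id_to_bed` is hypothetical and the input layout is an assumption, since the excerpt does not show the sumstats file.

```python
import re


def variant_id_to_bed(variant_id):
    """Convert an ID such as '1:12345_A_G' into a tab-separated BED line.

    Mirrors the awk logic: split on ':' or '_', prefix the chromosome
    with 'chr', and use [pos, pos + 1) as the interval.
    """
    fields = re.split(r"[:_]", variant_id.strip())
    if len(fields) < 4:
        raise ValueError(f"unexpected variant ID: {variant_id!r}")
    chrom, pos, allele_1, allele_2 = fields[0], int(fields[1]), fields[2], fields[3]
    return f"chr{chrom}\t{pos}\t{pos + 1}\t{allele_1}/{allele_2}"


if __name__ == "__main__":
    print(variant_id_to_bed("1:12345_A_G"))
```

The output would still need sorting (the command uses `sort-bed`) before it is passed to `bedmap`.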

Continue Reading Get rsID for a list of SNPs in an entire GWAS sumstats file

Fasta.fai file error

Hi, I have been struggling with an error in bedtools intersect. The command I am trying to run is as follows: bedtools intersect -a sorted.vcf -b nstd166.GRCh38.variant_call_chr.vcf.gz -wo -sorted -f 0.8 -r -g Homo_sapiens_assembly38.fasta.fai For some of the files that I am assessing, I don’t get…

Continue Reading Fasta.fai file error

How To Uncompress The 1000 Genome Vcf.Gz File

Hello, can somebody tell me how to uncompress 1000 Genomes vcf.gz files? I am performing an RNA-editing analysis and would like to subtract annotated SNPs/INDELs. I have already done so using dbSNP data with bedtools intersect, but am still stuck with…

Continue Reading How To Uncompress The 1000 Genome Vcf.Gz File

How to get the sequence differences between multiple bacterial genomes

I am working on some closely related bacterial species (complete genomes from NCBI). I would like to extract the sequence differences between them. To be more specific, I want to find unique sequences (50-100 nt) in each of…

Continue Reading How to get the sequence differences between multiple bacterial genomes

Tool for calculating base-level error rate in WGS.

I am seeking a tool to calculate substitution, insertion, deletion error rates at a per-base sequenced level. It has to take a genomic region bed file and vcf files since I don’t want to count germline variants and would like to…

Continue Reading Tool for calculating base-level error rate in WGS.

converting multiple bam files to bed

Hi all, I have previous experience in R, but for the past few months I have been trying new things in Python (JupyterLab). I have a directory with different files. Some of them are ‘.bam’ files. My objective is to obtain ‘.bed’…

Continue Reading converting multiple bam files to bed

“intersectBed” does not appear to be installed or on the path, so this method is disabled. Please install a more recent version of BEDTools and re-import to use this method

from keras.layers import Conv2D from keras.layers import AveragePooling2D from janggu import inputlayer from janggu import outputconv from janggu import DnaConv2D from janggu.data import ReduceDim # load the dataset which consists of # 1) a reference genome REFGENOME = resource_filename('janggu', 'resources/pseudo_genome.fa') # 2) ROI contains regions spanning positive and negative examples…

Continue Reading “intersectBed” does not appear to be installed or on the path, so this method is disabled. Please install a more recent version of BEDTools and re-import to use this method

Diffbind adding my own consensus peaks

I’m not entirely sure what you are actually trying to do. Are you trying to add this peakset to use as a consensus peakset for counting? If so you can specify it as a parameter to dba.count() by setting peaks=merged_narrowPeak_sorted without having to load it to look like a sample…

Continue Reading Diffbind adding my own consensus peaks

comparing variants between two VCF files

I have two VCF files (e.g. SV1.vcf.gz, SV2.vcf.gz) and a bed file (reg.bed). I would like to compare the variants among them in the BED regions. The comparison includes the common variants and unique variants present in SV1 and SV2. I am currently…

Continue Reading comparing variants between two VCF files

Can I use the summits.bed from MACS2 on HOMER

I understand that to run HOMER you need BED files, so could I use the BED output file from “macs2 callpeak” to run “findMotifsGenome.pl”?…

Continue Reading Can I use the summits.bed from MACS2 on HOMER

how to get exon regions for this gene w/ build 19 coordinates

I am trying to get the build 19 equivalent of this table of exons for a gene of interest from here: 22 42126574 42126752 22 42126851 42126992 22 42127447 42127634 22 42127842 42127983 22 42128174 42128350 22…

Continue Reading how to get exon regions for this gene w/ build 19 coordinates

Solvuu hiring Bioinformatics Engineer in United States

Summary At Solvuu, we are building technology to revolutionize bioinformatics and data science. We are seeking an accomplished, self-motivated and ambitious bioinformatics engineer with a strong track record in developing, executing, and maintaining bioinformatics pipelines on AWS for biotech R&D. The successful candidate will have the opportunity to drive and…

Continue Reading Solvuu hiring Bioinformatics Engineer in United States

Accepted bedtools 2.30.0+dfsg-2 (source) into unstable

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 Format: 1.8 Date: Thu, 02 Sep 2021 06:54:44 +0200 Source: bedtools Architecture: source Version: 2.30.0+dfsg-2 Distribution: unstable Urgency: medium Maintainer: Debian Med Packaging Team <debian-med-packag…@lists.alioth.debian.org> Changed-By: Andreas Tille <ti…@debian.org> Changes: bedtools (2.30.0+dfsg-2) unstable; urgency=medium . [ Steffen Möller ] * Update metadata – indent,…

Continue Reading Accepted bedtools 2.30.0+dfsg-2 (source) into unstable

Problem with HOMER -find

I am trying to search for specific motifs in my set of promoters. It should be possible with the option -find *.motif I know for sure that there are some occurrences because I have found them with the search function in Notepad. However, HOMER…

Continue Reading Problem with HOMER -find

convert genomic bigWig file to transcriptome space

Hi all, Is anyone aware of a function to convert a bw file mapped to a genome to map to a transcriptome (of said genome), where the input would be the genomic bw file and gff/gtf/bed annotation and output a single ‘transcriptomic’…

Continue Reading convert genomic bigWig file to transcriptome space