Tag: CHR

Samtools flagstat confusing result of a merged bam file

Hi, I am a bioinformatics student and I am struggling with an issue. I had paired-end FASTQ files for one sample with some low-quality bases at the end of the reads and adapter contamination, so I trimmed my reads with Trimmomatic. It gave me 4 files that I used for…

Plotting date intervals in ggplot2

I have a dataset with a number of date intervals (i.e. POSIXct start dates and end dates). In the example provided, let's say each period corresponds to when someone was in school or out of school. I'm interested in plotting the data in ggplot2, each row…

Convert DNAStringSet to a list of elements in R? (Error in seq[[1]][["seq"]] : subscript out of bounds in R)

I have a BED file which contains DNA sequence information as follows:
track name="194" description="194 methylation (sites)" color=0,60,120 useScore=1
chr1 15864 15866 FALSE 894 +
chr1 534241 534243 FALSE 921 -
chr1 710096 710098 FALSE 729 +
chr1 714176 714178 FALSE 12 -
chr1 720864 720866 FALSE 988 -…

Ubuntu Manpage: samtools reheader – replaces the header in the input file

Provided by: samtools_1.13-2_amd64
NAME
samtools reheader – replaces the header in the input file
SYNOPSIS
samtools reheader [-iP] [-c CMD | in.header.sam ] in.bam
DESCRIPTION
Replace the header in in.bam with the header in in.header.sam. This command is much faster than replacing the header with a BAM→SAM→BAM conversion. By default…

[SOLVED] changing the order of input changes samtools merge output

I realized that this was a stupid mistake on my part. Since samtools does not overwrite existing files by default, the output I got from samtools merge output.bam f2.bam f1.bam wasn't what I thought it was. Below is my original post:
++++++++++++++++++++++++++
I'm using samtool/1.9.0 and I'm trying to…

The Genetic Architecture of Sleep Health Scores in the UK

Introduction: Sleep is a complex neurological and physiological state. It is defined as a natural and reversible state of reduced responsiveness to external stimuli and relative inactivity, accompanied by a loss of consciousness [1]. Sleep disorders can be classified into seven major categories: insomnia disorders, sleep-related breathing disorders, central disorders of…

Overestimation of number of reads from nanopore data (flagstat)

Same issue as reported for minimap2: github.com/lh3/minimap2/issues/236#issue-361097444. For example, for nanopore reads aligned to the host transcriptome, the flagstat output is:
5953480 + 0 in total (QC-passed reads + QC-failed reads)
2961480 + 0 secondary
22696 + 0 supplementary
0 + 0 duplicates
4195469 + 0 mapped (70.47% :…
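The "in total" line of `samtools flagstat` counts alignment records, not reads. Secondary and supplementary alignments are therefore included in it, and the same holds for the "mapped" line.

Below is a minimal Python sketch that parses flagstat text and subtracts those records. It makes three assumptions:
- The QC-passed column is the one of interest.
- Every secondary and supplementary record is also counted as mapped, as is normally the case.
- The BAM keeps its unmapped reads, so each input read has one primary record.

The counts are the ones quoted in this excerpt. The mapped line's truncated percentage is omitted.

```python
import re

# A flagstat line looks like "2961480 + 0 secondary": QC-passed + QC-failed, then a label
LINE_RE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def parse_flagstat(text):
    """Return {label: (qc_passed, qc_failed)} parsed from samtools flagstat text."""
    counts = {}
    for line in text.strip().splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            # Drop any parenthesised suffix, e.g. "in total (QC-passed ...)" -> "in total"
            label = m.group(3).split(" (")[0]
            counts[label] = (int(m.group(1)), int(m.group(2)))
    return counts


def primary_counts(counts):
    """Estimate (primary records, primary mapped records) from the QC-passed column.

    Assumes every secondary and supplementary record is also counted as mapped.
    """
    extra = counts["secondary"][0] + counts["supplementary"][0]
    return counts["in total"][0] - extra, counts["mapped"][0] - extra


flagstat_text = """
5953480 + 0 in total (QC-passed reads + QC-failed reads)
2961480 + 0 secondary
22696 + 0 supplementary
0 + 0 duplicates
4195469 + 0 mapped
"""

primary, primary_mapped = primary_counts(parse_flagstat(flagstat_text))
print(primary, primary_mapped, f"{primary_mapped / primary:.2%}")
```

With these numbers, the fraction of primary records that are mapped is far lower than the 70.47% that flagstat reports over all records.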

Samtools flagstat

I aligned my ONT sequencing run with minimap2 and subsequently filtered the file using samtools view -b -F 256 aln_transcriptome_sorted_6.bam -o filtered_aln_transcriptome_6.bam to end up with primary alignments only. When I run samtools flagstat on the filtered file I get the following output: 3502608 + 0 in…
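`-F 256` drops only records with the secondary bit (0x100) set. Supplementary alignments (bit 0x800, i.e. 2048) still pass. Long reads with split or chimeric alignments often have these, so the filtered file can still hold more than one record per read.

Excluding both bits with `-F 2304` (hex `0x900`) should leave only primary records. The Python sketch below mimics the `-F` bit test. The FLAG values are hypothetical, chosen only to cover each case.

```python
# SAM FLAG bits relevant here (values from the SAM specification)
UNMAPPED = 0x4          # 4
REVERSE = 0x10          # 16
SECONDARY = 0x100       # 256
SUPPLEMENTARY = 0x800   # 2048


def kept_by_exclude(flag, exclude_mask):
    """Mimic `samtools view -F MASK`: drop the record if ANY bit of MASK is set."""
    return (flag & exclude_mask) == 0


# Hypothetical FLAGs: forward primary, reverse primary, secondary,
# supplementary, reverse supplementary, unmapped
flags = [0, REVERSE, SECONDARY, SUPPLEMENTARY, SUPPLEMENTARY | REVERSE, UNMAPPED]

only_256 = [f for f in flags if kept_by_exclude(f, SECONDARY)]
only_2304 = [f for f in flags if kept_by_exclude(f, SECONDARY | SUPPLEMENTARY)]
print("-F 256 keeps: ", only_256)
print("-F 2304 keeps:", only_2304)
```

Unmapped records (FLAG 4) pass both filters. If they are present in the BAM, flagstat still counts them in "in total".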

r – Changing chr to datetime on RStudio

I would like to change a chr column in my dataframe to a datetime column.
   Date          Value
1  01/7/20 13:05 100
2  01/7/20 13:15 102
3…

Regions File Format – ANGSD-wrapper/angsd-wrapper Wiki

ANGSD-wrapper prefers the regions file to be formatted as chr_name:start_position-end_position. Below, we will create a toy BED file as an example and show how we can go from BED file format to ANGSD-wrapper’s regions file format. Create toy BED file Let’s create an example BED file. You can run the…
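A BED interval is 0-based and half-open, so converting it to `chr_name:start_position-end_position` involves a coordinate decision, not just string formatting. The Python sketch below assumes the regions file expects 1-based, inclusive coordinates, as samtools-style region strings do; it adds 1 to the BED start and keeps the end.

That assumption is worth checking against the ANGSD-wrapper documentation. Passing `one_based=False` copies the BED numbers unchanged. The toy BED records are hypothetical.

```python
def bed_to_region(line, one_based=True):
    """Turn one BED line into a 'chr:start-end' region string.

    BED is 0-based, half-open. With one_based=True (an assumption about the
    target format), start is shifted by +1 and end is kept, giving a 1-based,
    inclusive interval that covers the same bases.
    """
    chrom, start, end = line.split()[:3]
    start, end = int(start), int(end)
    if one_based:
        start += 1
    return f"{chrom}:{start}-{end}"


def bed_to_regions(lines, one_based=True):
    """Convert BED lines, skipping blank, comment, 'track' and 'browser' lines."""
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        regions.append(bed_to_region(line, one_based))
    return regions


# Hypothetical toy BED records
toy_bed = ["track name=toy\n", "chr1\t0\t100\n", "chr2\t999\t2000\n"]
print("\n".join(bed_to_regions(toy_bed)))
```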

Monocle3 differential expression failed when active.assay is not “RNA”

After running estimate_size_factors, data with active.assay = 'integrated' also works, but there are no DEGs in the result.
> seurat_object@active.assay = 'integrated'
> cds_raw <- as.cell_data_set(seurat_object)
Warning: Monocle 3 trajectories require cluster partitions, which Seurat does not calculate. Please run 'cluster_cells' on your cell_data_set object
> cds <- cluster_cells(cds_raw)
> pr_graph_test_res <-…

Failed to instantiate plugin dbNSFP in VEP

Hi Team, my VEP (version 105, installed with perl INSTALL.pl) works well, but I have some problems using the dbNSFP plugin (also installed with perl INSTALL.pl) with VEP. My dbNSFP version 4.2a was installed with the following code without any warning…

How to convert bedgraph file with bins into GRanges object?

You could convert your bedGraph bins from hg18 to hg19 using liftover, so you can overlap them with your peaks. You would read them into a GRanges object, then hand this to the liftover function to translate from hg18 to hg19, then unlist the results to get back a regular…

Laboratory jobs in Germany

We wish you good luck and a prosperous career. Working at Labcorp | Jobs and Careers at Labcorp. 15 GNeuS Postdoc Positions in Neutron Science of 24 Months Each (Full-time Job), FZJ – Forschungszentrum Jülich. What other jobs are similar to laboratory jobs in Germany? Clinical Laboratory…

snakemake truncating shell codes

I'm trying to change the chromosome name notation from [0-9XY] to Chr[0-9XY] using samtools reheader in the shell command of a Snakemake rule.
rule rename:
    input: os.path.join(config["input"], "{sample}.bam"),
    output: os.path.join(config["output"], "new_sample/{sample}_chr.bam")
    log: os.path.join(config["log"], "samtools/{sample}")
    shell: "samtools view -H {input} | sed -e 's/SN:([0-9XY]*)/SN:chr1/' -e 's/SN:MT/SN:chrM/'…
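As shown, the first `sed` expression replaces every match with the literal text `chr1`. In basic `sed`, unescaped parentheses are also literal characters. A backreference was presumably intended, and the backslashes may simply have been lost in this excerpt.

If a backreference such as `\1` is written inside the Snakefile's Python string, it needs escaping (`\\1`) or a raw string. Otherwise Python turns it into a control character.

The Python sketch below shows the intended renaming on `@SQ` header lines. It assumes a `chr` prefix (adjust to `Chr` if needed) and anchors the match so that contigs like `GL000192.1` are left alone. The header lines are illustrative.

```python
import re

# 1-22, X, Y -> chr1..chr22, chrX, chrY; MT -> chrM.
# The lookahead requires the SN value to end at a tab or end of line,
# so names such as "GL000192.1" do not match.
SN_NUMERIC = re.compile(r"\tSN:([0-9]+|X|Y)(?=\t|$)")
SN_MITO = re.compile(r"\tSN:MT(?=\t|$)")


def rename_header_line(line):
    """Add a 'chr' prefix to the SN: field of an @SQ header line."""
    if not line.startswith("@SQ"):
        return line
    line = SN_NUMERIC.sub(r"\tSN:chr\1", line)
    return SN_MITO.sub("\tSN:chrM", line)


header = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:1\tLN:249250621",
    "@SQ\tSN:MT\tLN:16569",
    "@SQ\tSN:GL000192.1\tLN:547496",
]
for line in header:
    print(rename_header_line(line))
```

Written to a file, the renamed header can then be given to `samtools reheader` as `in.header.sam`.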

RStudio AI Blog: Training ImageNet with R

ImageNet (Deng et al. 2009) is an image database organized according to the WordNet (Miller 1995) hierarchy which, historically, has been used in computer vision benchmarks and research. However, it was not until AlexNet (Krizhevsky, Sutskever, and Hinton 2012) demonstrated the efficiency of deep learning using convolutional…

r – ggplot: Try to plot boxplots with geom_rect on its background, but keep having error with object “variable” not found

I was getting desperate with this error after working on it for 4 hours; I have already googled and looked through past posts. Here is my data structure:
str(tcga_exp)
'data.frame': 11775 obs. of 5 variables:
 $ cohort: chr "BRCA-Basal.Tumor" "BRCA-LumA.Tumor" "BRCA-LumB.Tumor" "BRCA-LumA.Tumor" …
 $ exp   : num 6.35 5.54 6.56 5.05 5.98…

Create junctions from Bed file for IGV visualization

Any advice for creating a junctions file from a BED-like file? My BED file looks like this: chr start end chr start end. I have tried to copy the format used by TopHat (junctions file), but I can't see the junctions in…

r – RSQLite – Find values with most occurrences in group

I'm using RSQLite to import datasets from an SQLite database. There are several million observations in the database, so I'd like to do as much of the data selection and aggregation as possible within the database. At some point I need to aggregate a character variable. I want to get the…
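One way to keep the "most frequent value per group" aggregation inside the database is to count (group, value) pairs in a subquery and then take the maximum count per group. The outer query relies on SQLite's documented "bare column" behaviour: with a single `MAX()` aggregate, the other selected columns come from the row that holds the maximum. Ties are resolved arbitrarily.

The sketch below uses Python's built-in `sqlite3` with a hypothetical table `obs(grp, val)`. The same SQL string should also work through `DBI::dbGetQuery()` with RSQLite.

```python
import sqlite3


def most_frequent_per_group(conn, table, group_col, value_col):
    """Return (group, most frequent value, count) per group, computed in SQLite.

    Identifiers cannot be bound as SQL parameters, so only pass trusted names.
    Ties between equally frequent values are resolved arbitrarily by SQLite.
    """
    sql = f"""
        SELECT {group_col}, {value_col}, MAX(n) AS n
        FROM (
            SELECT {group_col}, {value_col}, COUNT(*) AS n
            FROM {table}
            GROUP BY {group_col}, {value_col}
        )
        GROUP BY {group_col}
        ORDER BY {group_col}
    """
    return conn.execute(sql).fetchall()


# Hypothetical in-memory example table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (grp TEXT, val TEXT)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("a", "y"),
     ("b", "z"), ("b", "y"), ("b", "z"), ("b", "z")],
)
result = most_frequent_per_group(conn, "obs", "grp", "val")
print(result)
```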

Doubt samtools flagstat

I'd like to see the percentage of sequences that align to my decrementing sequence, and I've come to this sample table, but I want to know which value to use: the mapped (80.94% : N/A) or the properly paired (0.06% : N/A) percentage?
1036193 + 0 in total (QC-passed reads…

find positions of a short sequence in a genome

Here's a demo Python script you can modify for your use, which suggests the rough principle:
#!/usr/bin/env python
import sys
import re
bed = """chr1\t0\t10\tABCDEFGHIJ
chr1\t5\t15\tFGHIJABCDO
chr1\t10\t20\tABCDOPABCD"""
string_to_match = sys.argv[1]
pattern = re.compile(string_to_match)
for line in bed.split("\n"):
    (chr, start, stop, id) = line.split("\t")
    for match in pattern.finditer(id):
        sys.stdout.write("\t".join([chr, str(int(start) +…
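The same principle is written out below as a complete, self-contained function. This is a separate sketch, not a reconstruction of the truncated script. It reuses the toy records from the script and makes two changes:
- It escapes the query (`re.escape`), so it is matched literally rather than as a regular expression.
- It uses a zero-width lookahead, so overlapping hits such as "AA" inside "AAA" are all reported.

Each hit is reported in genome coordinates: the record's BED start plus the offset in the sequence, as a 0-based, half-open interval.

```python
import re

# Toy records from the demo script: (chrom, start, stop, sequence)
BED_WITH_SEQ = [
    ("chr1", 0, 10, "ABCDEFGHIJ"),
    ("chr1", 5, 15, "FGHIJABCDO"),
    ("chr1", 10, 20, "ABCDOPABCD"),
]


def find_motif(records, motif):
    """Return (chrom, start, end) for every occurrence of motif, 0-based half-open."""
    # Lookahead makes the match zero-width, so overlapping occurrences are found
    pattern = re.compile(f"(?=({re.escape(motif)}))")
    hits = []
    for chrom, start, _stop, seq in records:
        for m in pattern.finditer(seq):
            hit_start = start + m.start()
            hits.append((chrom, hit_start, hit_start + len(motif)))
    return hits


for hit in find_motif(BED_WITH_SEQ, "ABCD"):
    print("\t".join(map(str, hit)))
```

Because the toy intervals overlap, the same genomic hit can be reported twice (here at chr1:10-14). Piping the output through `sort -u` or de-duplicating in Python would remove the repeats.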

Convert SNP IDs as chr:pos:effect allele:ref allele to rsIDs

I have a set of 58,000 SNPs whose IDs are in the format chr:pos:effect allele:ref allele (GRCh37 build), but I need to convert them to rsIDs wherever one is available for the SNP. I've tried using…

Homer finds same peak multiple times

I am using Homer to identify peaks in RNA-seq data and then determine differential expression by counting reads per peak. Homer has a lovely package that does just this: getDifferentialPeaksReplicates.pl. The issue is that for some reason Homer returns the same peak multiple times in its final output (Bonus question:…

Is the Ensembl GRCh38 genome assembly more up to date than the UniProtKB online database?

Dear all, I am working with a list of Ensembl accession codes for a group of proteins of interest. I have downloaded the protein annotations related to the GRCh38 genome assembly. I fetched the genomic coordinates from the UniProtKB API service using the Ensembl accession codes. The service provides a protein annotation…

clusterProfiler won’t read gene list

So I have a list of DE genes that I would like to analyse for enriched GO and KEGG terms. I was going to use clusterProfiler for this, but I can't seem to get past constructing the gene list. I have followed the vignette…

Forge a BSgenome data package

My supervisor has asked me to create coverage plots to visualize BAM alignments of RNA-Seq data. I thought a good way to do this would be to use Gviz. We work on the model legume Medicago truncatula, which does not have a BSgenome package, so I thought I'd try and…

How can I find reads for specific elements in a bam file?

Hi, I have a specific set of 1,009 elements in a bed file that I am interested in. I also have bam files which I would like to process to know the number of reads for these specific elements (for comparison purposes). I understand some simple uses of samtools commands,…

Question about ROH analysis by Plink 1.9

Hi all, I recently tried to estimate runs of homozygosity (ROH) from my VCF file using PLINK 1.9. I ran the following code to generate the binary files that PLINK requires:
plink --vcf myfile.vcf --make-bed --out out_name --no-sex --no-parents --no-fid --no-pheno --allow-extra-chr
This VCF file only contains one individual and…

Gene coordinates for hg19

Hi, is there a list that gives, for each gene, its start coordinate (chr:pos) and its end coordinate with respect to the hg19 reference genome? I have a list of positions on hg19 expressed as chr:pos and I have to assign each one to the…

Obtaining The Snp Rs Number With The Chromosomal Position

This question is similar to this one (Get rs number based on position). I have a text file with SNPs in the chr:position format, e.g. 10:71086 10:72876 10:75794. I was wondering if there is an R package (or perhaps one in…

Legacy genetics of Arachis cardenasii in the peanut crop shows the profound benefits of international seed exchange

Significance A great challenge for humanity is feeding its growing population while minimizing ecosystem damage and climate change. Here, we uncover the global benefits arising from the introduction of one wild species accession to peanut-breeding programs decades ago. This work emphasizes the importance of biodiversity to crop improvement: peanut cultivars…

Phasing with SHAPEIT

Edit June 7, 2020: The code below is for pre-phasing with SHAPEIT2. For phased imputation using the output of SHAPEIT2 and ultimate production of phased VCFs, see my answer here: A: ERROR: You must specify a valid interval for imputation using the -int argument. So, the steps are usually: pre-phasing…

Produce PCA bi-plot for 1000 Genomes Phase III

Note1 – Previous version: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old) Note2 – this data is for hg19 / GRCh37 Note3 – GRCh38 data is available HERE The tutorial has been updated based on the 1000 Genomes Phase III imputed genotypes. The original tutorial was…

Get gene names from rs SNP ids

Gene to rs id
library(biomaRt)
## It might take a long time to process if there are many genes (>50) in the list.
## hgnc_gene_symbols.txt is the file that has the list of gene symbols, one per line.
genes <- read.table("~/hgnc_gene_symbols.txt")
ensembl = useMart("ensembl", dataset="hsapiens_gene_ensembl")
dbsnp = useMart("snp", dataset = "hsapiens_snp")
getHGNC2ENSG =…

Row names and probe names do not match in topTable output

Hello, I am using limma to analyze differential methylation on an 850k Illumina array, and set up my model as recommended by the user guide. Today I noticed after running topTable() that the rownames in the result data…

The result of plink --freq is filled with NA

I downloaded the VCF file. Then I used PLINK to convert it to a bed file and calculated the allele frequency. However, the result of plink --freq was filled with NA. Can anyone give me an opinion? Command ①: ./plink --vcf…

MAPQ (Mapping quality) of 0 for most reads from BWA-MEM2 (with no secondary alignment or other apparent reason)

Hello, I got a very weird output from BWA-MEM2: most of the reads have a mapping quality of 0, even though there is no secondary alignment or anything else suspicious. I received sequencing data, as BAM files, that had been aligned to hg18 with Novoalign. I needed to realign…

Manhattan plot how to reduce space between axis and axis-labels

Hello everyone, I have plotted a Manhattan plot with qqman in R. However, it leaves huge white space between the axis ticks and axis labels, and also between the axis labels and the axis labs. Could someone offer any tips to reduce…

Samtools Depth Option For More Than One Bam Files

Hi everyone, I've been stuck on this for several days. I want to use the samtools depth command, but not only for a single BAM file. I need to find a way to include all my BAM files downloaded in…

PRSice-2 without Ref SNP ID

Does PRSice-2 support a base file that has chromosome number/name and chromosome position instead of the reference SNP ID? I'm trying to calculate PRS using a weights file from the PGS Catalog with ~6 million variants. The file has only…

Get rsID for a list of SNPs in an entire GWAS sumstats file

Here is a fairly efficient way to do this, assuming hg38 and that BEDOPS and standard Unix tools are installed:
$ bedmap --echo --echo-map-id --delim '\t' <(awk '{n=split($0,a,/[:_]/); print "chr"a[1]"\t"a[2]"\t"a[2]+1"\t"a[3]"/"a[4];}' sumstats.txt | sort-bed -) <(wget -qO- hgdownload.cse.ucsc.edu/goldenPath/hg38/database/snp150.txt.gz | gunzip -c | cut -f2-5 | sort-bed -) > answer.bed
This gets around making…

Fasta.fai file error

Hi, I have been struggling with an error in bedtools intersect. The command I am trying to run is:
bedtools intersect -a sorted.vcf -b nstd166.GRCh38.variant_call_chr.vcf.gz -wo -sorted -f 0.8 -r -g Homo_sapiens_assembly38.fasta.fai
For some of the files that I am assessing, I don't get…

karyoploteR: uncircle your genomes

Hi all, I’d like to present karyoploteR, an R/Bioconductor package we have developed to plot any data on any genome in non-circular layouts. The goal of this project was to develop a tool as flexible as Circos, but easier to use and representing genomes as straight lines instead of circles,…

List of human protein coding genes with given name (known function?)

Hello, to put it simply, I am doing differential expression analysis on human RNA-seq data and I want to focus my analysis on genes that are: 1) protein coding, so no SNOR or MIR; 2) genes with a…

vcf file analysis

Hello everyone, I have 22 VCF files, one for each chromosome. They were in genome build hg19, so I did a liftover and converted them to the hg38 build. Now I need just the chrom and position values from these VCF files, and to merge them together into a…
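Pulling only CHROM and POS out of several VCFs is a line-by-line job: skip the `#` header lines and keep the first two tab-separated columns. The Python sketch below does that and writes one combined tab-separated table. It handles plain or gzipped files.

The file names `chr{1..22}.hg38.vcf.gz` and the output name are hypothetical placeholders for the lifted-over files. Tools such as `bcftools query -f '%CHROM\t%POS\n'` are a common alternative.

```python
import gzip


def chrom_pos_from_lines(lines):
    """Yield (CHROM, POS) from VCF text lines, skipping header and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t", 2)[:2]
        yield chrom, int(pos)


def merge_chrom_pos(vcf_paths, out_path):
    """Write CHROM and POS from several (optionally gzipped) VCFs to one TSV file."""
    with open(out_path, "w") as out:
        out.write("CHROM\tPOS\n")
        for path in vcf_paths:
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rt") as fh:
                for chrom, pos in chrom_pos_from_lines(fh):
                    out.write(f"{chrom}\t{pos}\n")


# Hypothetical names for the 22 lifted-over per-chromosome VCFs
paths = [f"chr{i}.hg38.vcf.gz" for i in range(1, 23)]
# merge_chrom_pos(paths, "chrom_pos_hg38.tsv")
```

After a liftover, some records can land on a different chromosome or out of order, so sorting the combined table afterwards may be worthwhile.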

PLINK Haplotype blocks estimation not working

Hi, I am using PLINK to estimate haplotype blocks using Gabriel's method. I am using the following command:
plink --file Chr$PBS_ARRAY_INDEX --noweb --all --blocks --ld-window-kb 500
It seemed to be working just fine, but when the job finished no blocks were called at all. The log file does not mention…

Find overlapping sequences with pyranges from overlap

I am trying to replicate the mergeByOverlaps function from R/Bioconductor in Python using the pyranges package. In R the code would be:
gr.snp <- with(gr.snp, GRanges(chr, IRanges(start, end), rsid=gr.snp$rsid))
snp.annotated <- data.frame(mergeByOverlaps(gr.snp, gencode, maxgap=2000, type="start"))
which returns:
nrow(snp.annotated)
[1] 34
colnames(snp.annotated)
[1] "gr.snp.seqnames" "gr.snp.start"
[3] "gr.snp.end"      "gr.snp.width"
[5] "gr.snp.strand"   "gr.snp.rsid"…

methylation beta distribution (minfi generated)

Hello, I am analyzing EPIC methylation array data. I did the necessary filtering for cross-reactive probes and common SNPs, and excluded the X and Y chromosomes. About 10% of my samples (which I am calling "outliers" for now) cluster separately from the rest. Since these samples are collected from human brain with…

Indels statistics

Hi, I have VCF statistics for heterozygous and homozygous cases and I would like to find matches with my MAF file. The issue is that the reference field in the MAF file is different: it excludes nucleotides in alternative states, e.g. if you have a ref CAA and alternative…
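The mismatch comes from a padding convention. VCF indels include the reference base preceding the event, so a deletion appears as REF `CAA` / ALT `C`. MAF files drop that shared base and use `-` for the empty allele, shifting the position accordingly.

The Python sketch below converts one VCF record so that it can be matched against MAF fields. It assumes GDC-style MAF conventions:
- A deletion starts at the first deleted base.
- An insertion is anchored at the base before the inserted sequence.

It trims only the shared leading bases. It does not left-normalize indels or split multi-allelic records.

```python
def vcf_to_maf_alleles(pos, ref, alt):
    """Convert one VCF (POS, REF, ALT) to MAF-style (start, ref, alt).

    Assumed MAF conventions: deletions start at the first deleted base;
    insertions start at the base preceding the inserted sequence.
    """
    shared = 0
    # Count the leading bases shared by REF and ALT (the VCF padding base)
    while shared < min(len(ref), len(alt)) and ref[shared] == alt[shared]:
        shared += 1
    maf_ref = ref[shared:] or "-"
    maf_alt = alt[shared:] or "-"
    start = pos + shared
    if maf_ref == "-":  # insertion: anchor on the base before the insert
        start -= 1
    return start, maf_ref, maf_alt


print(vcf_to_maf_alleles(100, "CAA", "C"))   # deletion
print(vcf_to_maf_alleles(100, "C", "CTT"))   # insertion
print(vcf_to_maf_alleles(100, "A", "G"))     # SNV, unchanged
```

Once both files use the same representation, matching can be done on (chromosome, start, ref, alt) tuples.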

phase_trio.sh | searchcode

/Phase/phase_trio.sh github.com/BioinformaticsArchive/fCNV Shell |…

Color label of rainfall plot drawn by KaryoploteR

You can use the standard legend() command as outlined in this issue: support.bioconductor.org/p/124328/
Minimal example based on bernatgel.github.io/karyoploter_tutorial//Examples/Rainfall/Rainfall.html:
library(karyoploteR)
somatic.mutations <- read.table(file="ftp://ftp.sanger.ac.uk/pub/cancer/AlexandrovEtAl/somatic_mutation_data/Pancreas/Pancreas_raw_mutations_data.txt", header=FALSE, sep="\t", stringsAsFactors=FALSE)
somatic.mutations <- setNames(somatic.mutations, c("sample", "mut.type", "chr", "start", "end", "ref", "alt", "origin"))
somatic.mutations <- split(somatic.mutations, somatic.mutations$sample)
sm <- somatic.mutations[["APGI_1992"]]
sm.gr <- toGRanges(sm[,c("chr", "start", "end",…

Rscript match

I have two data frames. One is vcf. Its content is:
head(vcf)
  X.CHROM       POS           ID      CHROM_POS
1    chr1 100000421 rs1047982323 chr1_100000421
2    chr1 100000827 rs1375386196 chr1_100000827
3    chr1 100001753  rs866745787 chr1_100001753
4    chr1 100001904 rs1416462966 chr1_100001904
5    chr1 100002334 rs1220478954 chr1_100002334
6    chr1 100002490  rs181634796 chr1_100002490
…

bash script

Hello everyone, I have a file like this: RSID1 RSID2 chr1_169894240_G_T_b38 chr1_169894240_G_T_b38 chr1_169894240_G_T_b38 chr1_169891332_G_A_b38 chr1_169891332_G_A_b38 chr1_169891332_G_A_b38 chr1_169661963_G_A_b38 chr1_169661963_G_A_b38 chr1_169661963_G_A_b38 chr1_169697456_A_T_b38 chr1_169697456_A_T_b38 chr1_169697456_A_T_b38 chr1_27636786_T_C_b38 chr1_27636786_T_C_b38 chr1_196651787_C_T_b38 chr1_196651787_C_T_b38 chr6_143501715_T_C_b38 chr6_143501715_T_C_b38 I want to extract info just like: chr1_169894240 chr1_169894240. I don't want to have other info. I just want…
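Each ID has the shape `chrom_pos_ref_alt_build`, so keeping the chromosome and position amounts to keeping the first two `_`-separated pieces of every field.

The Python sketch below applies that to each whitespace-separated field of a line. Fields without an underscore, such as the `RSID1 RSID2` header, pass through unchanged. Output fields are joined with a single space, which is an arbitrary choice.

```python
def chrom_pos_id(variant_id):
    """'chr1_169894240_G_T_b38' -> 'chr1_169894240' (first two '_' fields)."""
    return "_".join(variant_id.split("_")[:2])


def trim_line(line):
    """Apply chrom_pos_id to every whitespace-separated field of a line."""
    return " ".join(chrom_pos_id(field) for field in line.split())


example = [
    "RSID1 RSID2",
    "chr1_169894240_G_T_b38 chr1_169894240_G_T_b38",
    "chr6_143501715_T_C_b38 chr6_143501715_T_C_b38",
]
for line in example:
    print(trim_line(line))
```

The same idea in awk would split each field on `_` and print the first two pieces joined by `_`.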

spots not filling the whole tissue image

In Seurat, SpatialPlot generates a plot with an enlarged/expanded image of the tissue section compared to the original spot image. This seems to happen with relatively small images that have around 500 spots. I'd…

How To Filter Mapped Reads With Samtools

Hi, you get a BAM file (the binary, compressed form of SAM) after mapping, and it contains information about both mapped and unmapped reads. To get the unmapped reads from a BAM file, use:
samtools view -f 4 file.bam > unmapped.sam
The output will be in SAM format; to get the output in BAM, use:…
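The two filtering options work on the FLAG column in opposite ways:
- `-f INT` keeps a record only if all bits of INT are set.
- `-F INT` drops a record if any bit of INT is set.

So `-f 4` selects unmapped reads (bit 0x4) and `-F 4` selects mapped ones. The Python sketch below mimics both tests on hypothetical FLAG values. For paired data, `-f 12` (bits 4 + 8) would similarly require both the read and its mate to be unmapped.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def samtools_filter(flag, require=0, exclude=0):
    """Mimic `samtools view -f REQUIRE -F EXCLUDE` for a single FLAG value."""
    return (flag & require) == require and (flag & exclude) == 0


# Hypothetical FLAGs: mapped forward, mapped reverse, unmapped single-end,
# unmapped read 1 and read 2 of a pair whose mate is also unmapped
flags = [0, 16, 4, 77, 141]

unmapped = [f for f in flags if samtools_filter(f, require=UNMAPPED)]  # like -f 4
mapped = [f for f in flags if samtools_filter(f, exclude=UNMAPPED)]    # like -F 4
print("-f 4:", unmapped)
print("-F 4:", mapped)
```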

working with .gmt files

Hi! I have downloaded a pathway data set in .gmt format from the GSEA website. I'm wondering how I can properly read this data set in R. Could anyone help me? Thank you!
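A .gmt file is plain tab-separated text with one gene set per line. The first field is the gene set name, the second is a description (often a URL), and the remaining fields are the member genes.

In R, `fgsea::gmtPathways()` and `GSEABase::getGmt()` read this format. The language-neutral sketch below shows the structure with a small Python parser. The gene set names and genes in the example are hypothetical.

```python
def read_gmt(lines):
    """Parse .gmt lines into {gene_set_name: [genes]}.

    Field 1 is the set name, field 2 a description (ignored here),
    and the remaining tab-separated fields are the genes.
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]  # drop empty trailing fields
    return gene_sets


example = [
    "SET_A\tdesc\tTP53\tBRCA1\n",
    "SET_B\tdesc\tEGFR\t\n",
]
print(read_gmt(example))
# For a real file: with open("pathways.gmt") as fh: sets = read_gmt(fh)
```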

Looking up Gene IDs in R

Hello, given a list of gene names, I need to create a table containing the Ensembl ID, chromosome, start and end of each gene. Example:
## ens_id gene view chr start end
## 1: ENSG00000243485 MIR1302-2HG Gene Expression chr1 29553 30267
## 2: ENSG00000237613…

Platypus

Hi, I'm super new to WGS and bioinformatics, but I'm a classic software data scientist, so I know enough to be annoying. I'm using Platypus to call variants on 100X WGS data from Nebula Genomics. I found an odd series of calls and am not sure if this is…

IOError [errno 2] No such file or directory: ‘-o’

Hello everyone. I'm now running TopHat on our server. First, I simply tried "tophat -p 8 -G <gtf_file> <ref_genome> <fa1> <fa2>" and it worked. Then I wrote a for-loop script, but it reported an error: [2019-04-03 10:49:16] Beginning…

Parsing snp result

I am trying to parse dbSNP results into a data frame in Python. I got the result as "bytes" and I wonder if there is a way to parse it into a data frame. I tried multiple XML packages (xml, lxml) but they are not able to separate the…

How to properly combine two bam files of a paired-end data

Hi all! I am mapping the two reads of a paired-end library separately using bowtie2. After that, I want to combine the two BAM files into one for downstream analysis. How do I properly do this combination? I tried: samtools sort -n R1.bam…

PLINK ASSOC understanding the results

Hello to all, I have 10 VCF files (5 female fish and 5 male fish), and I have merged all 10 fish into one VCF file (all_fish.vcf). I performed the PLINK association analysis on all 10 fish with the command:
--noweb --const-fid --allow-no-sex --allow-extra-chr --pheno…

How to set variant FILTER in a VCF file based on overlap with regions in a BED file

I figured out how to do the annotation using BCFtools. Two steps are needed.
Input BED file, with a 1 in the fourth column for each region where the annotation should be set:
Chr_01 1000 2000 1
Chr_05 5000 6000 1
Input header file:
##INFO=<ID=BAD_REGION,Number=0,Type=Flag,Description="My bad region for some reason">
bgzip and tabix the bed…
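The subtle part of matching VCF records to BED regions is the coordinate systems. BED intervals are 0-based and half-open, while VCF `POS` is 1-based. A BED line `Chr_01 1000 2000` therefore covers VCF positions 1001 to 2000.

The Python sketch below shows only that overlap logic for single-position variants. It is not how bcftools implements the annotation, and indels spanning a region boundary may be treated differently there. The variant positions are hypothetical; the regions are the ones from the BED example.

```python
def flag_bad_regions(variants, bed_regions):
    """Return (chrom, pos, in_region) for each VCF (chrom, pos).

    bed_regions are 0-based, half-open (chrom, start, end) intervals;
    VCF POS is 1-based, so POS overlaps a region when start < POS <= end.
    """
    by_chrom = {}
    for chrom, start, end in bed_regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    flagged = []
    for chrom, pos in variants:
        hit = any(start < pos <= end for start, end in by_chrom.get(chrom, []))
        flagged.append((chrom, pos, hit))
    return flagged


regions = [("Chr_01", 1000, 2000), ("Chr_05", 5000, 6000)]
variants = [("Chr_01", 1000), ("Chr_01", 1001), ("Chr_01", 2000),
            ("Chr_01", 2001), ("Chr_05", 5500), ("Chr_02", 1500)]
for record in flag_bad_regions(variants, regions):
    print(record)
```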

Allele frequency calculation

Hello everyone, I use vcftools to get AF values with this command:
vcftools --gzvcf $SUBSET_VCF --freq2 --out $OUT --max-alleles 2
The output I got from this is:
  chr   pos      nalleles nchr  a1    a2
  <dbl> <dbl>    <dbl>    <dbl> <dbl> <dbl>
1 22    16050408 2        846…

extract list of SNPs from multiple chr{1:22}.bgen files using plink2

Hello, I have extracted lists of SNPs based on the MAF cutoffs 0, 0.0001, 0.001, 0.01, 0.1, .55, 1.0. I am running plink2 to extract each list from the .bgen files for individual chromosomes using the following code: plink2 --chr{1:22}.bgen --extract maf1_snps for imputed…

How to separate sub-families from transposons sequence based fasta files?

I'm working on the classification of transposable elements. I want to retrieve the sequences of their subclasses into separate files. Is there any code or tool available to separate their sub-families? The dataset contains thousands of sequence entries for different…
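If the FASTA headers carry the classification, splitting by sub-family is a grouping step. The Python sketch below assumes RepeatModeler/RepeatMasker-style headers, in which the label follows a `#` (e.g. `>name#LTR/Gypsy`). That header format is an assumption about this dataset, and the parsing line would need to change for other naming schemes.

Each group is written to its own file, with `/` in the label replaced by `_` to make a valid file name. The example records are hypothetical.

```python
import os
from collections import defaultdict


def parse_fasta(lines):
    """Yield (header, sequence) pairs from FASTA text lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif line:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def group_by_family(records):
    """Group records by the label after '#' in the ID (assumed 'name#Class/Family')."""
    groups = defaultdict(list)
    for header, seq in records:
        name = header.split()[0]
        label = name.split("#", 1)[1] if "#" in name else "Unknown"
        groups[label].append((header, seq))
    return groups


def write_groups(groups, out_dir="."):
    """Write one FASTA file per label, e.g. LTR_Gypsy.fa."""
    for label, records in groups.items():
        path = os.path.join(out_dir, label.replace("/", "_") + ".fa")
        with open(path, "w") as fh:
            for header, seq in records:
                fh.write(f">{header}\n{seq}\n")


example = [">rnd-1_family-1#LTR/Gypsy\n", "ACGT\n", "AC\n",
           ">rnd-1_family-2#DNA/hAT\n", "GGCC\n",
           ">rnd-2_family-5#LTR/Gypsy\n", "TT\n"]
groups = group_by_family(parse_fasta(example))
print({label: len(recs) for label, recs in groups.items()})
# write_groups(groups, "split_by_family")
```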

Converting an S4 object into a dataframe in R

I have an S4 object named 'res' which I got while using an R package called RDAVIDWebService. I can't seem to find a way to convert this object into a dataframe in R. I tried using the function 'as.data.frame(res)' but it throws this error:
> as.data.frame(res)
Error in as.data.frame.default(res) :…
