Tag: CHR
where to obtain hg19 reference transcriptome in gencode
where to obtain hg19 reference transcriptome in gencode 1 Hi, I want to use salmon to generate an index and later use it again to get a table of counts that I can use for differential expression analysis with the hg19 version. For the hg38 reference genome I have had…
impute tool /imputing GTs _ generated imputed file has unexpected no. of cols
Hello, I’m trying to use impute tool to impute more snps using ref panels with my GT data, i followed impute2 tutorial, its example/toy data works fine, then I used the faster impute4 and impute5 tools for mydata as the run gets killed on impute2 with my data. anyways, the…
r – Error adding significance levels in ggplot boxplot after changing the stats table
I am trying to add significance levels to a geom_boxplot by using stats_pvalue_manual. Since I have a 12-level factor I ordered the data by mean size. I only keep the significant comparisons that are closest to each other. If I use the stats table everything works (but looks horrible). Here…
Phylogenomic analysis, cryptic species discovery, and DNA barcoding of the genus Cibotium in China based on plastome data
Introduction Traditional Chinese herbal medicine plays an indispensable role in the treatment of multiple diseases in China and other developing countries (Newman et al., 2008). Apart from the traditional usage, many medicinal plants, such as Artemisia annua L. (artemisinin, Tu, 2016), Huperzia javanica (Sw.) C. Y. Yang (Huperzine A, Zangara, 2003;…
How to calculate TPM from featureCounts output
How to calculate TPM from featureCounts output 0 I would like to find the TPM counts for the GSE102073 study. When i downloaded the raw data from GEO, the raw data are featureCounts output. First part of the file: # Program:featureCounts v1.4.3-p1; Command:”/data/NYGC/Software/Subread/subread-1.4.3-p1-Linux-x86_64/bin/featureCounts” “-s” “2” “-a” “/data/NYGC/Resources/ENCODE/Gencode/gencode.v18.annotation.gtf” “-o” “/data/analysis/LevineD/Project_LEV_01204_RNA_2014-01-30/Sample_JB4853/featureCounts/Sample_JB4853_counts.txt” “/data/analysis/LevineD/Project_LEV_01204_RNA_2014-01-30/Sample_JB4853/STAR_alignment/Sample_JB4853_Aligned.out.WithReadGroup.sorted.bam”…
r – ggplot – reorder_within – Order month
I am plotting a bar graph in RSTUDIO. However, the axes x show in different month order. Using the command reoder_within, doesn’t work for the purpose. Follow below the ME. ggplot(de, aes(fill=Cidade, y = Leitura , x = Mes ))+geom_bar(position=’dodge’, stat=”identity”) Generate the follow plot: Plot My purpose is modify…
How to extract haplotype data from phased bcf files
How to extract haplotype data from phased bcf files 1 Hello, I have filtered/processed phased bcf files from wgs. I would like to extract the haplotype data per sample, so that I have a tab delim file which looks like this: Sample Chr Pos hap1 hap2 AW23 chr1 1234 A…
How to determine the exact version of hg38 if I have only the FASTA file
How to determine the exact version of hg38 if I have only the FASTA file 1 I have a FASTA file which contains hg38 assembly. It contains the primary contigs, alt contigs, decoy, HLA, mito. How do I determine the exact version of hg38 based on the FASTA? Here some…
What Are The Most Common Stupid Mistakes In Bioinformatics?
Forum:What Are The Most Common Stupid Mistakes In Bioinformatics? 78 While I of course never have stupid mistakes…ahem…I have many “friends” who: forget to check both strands generate random genomic sites without avoiding masked (NNN) gaps confuse genome freezes and even species but I’m sure there are some other very…
A score file with chr & positions instead of SNP IDs
Hello, I am calculating PRSs using the command: plink2 \–bfile plink_file \–score score_file 1 2 3 header cols=+scoresums,-scoreavgs \–out PRS_sum Currently, the columns in my score_file are: SNP IDs, effect allele, beta weights. I was wondering if possible to give instead of SNP IDs in the first column, chrs and…
problems with MAF for MutSigCV (vcf2maf)
I am trying to run MutSIgCV and got stuck with this error: MutSigCV allsamples.md.tc.ir.br.pr.ug.dbsnp.vep.maf \ “$anno”exome_full192.coverage.txt \ “$anno”gene.covariates.txt \ my_results \ “$anno”mutation_type_dictionary_file.txt \ “$anno”chr_files_hg19 ====================================== MutSigCV v1.4 (c) Mike Lawrence and Gaddy Getz Broad Institute of MIT and Harvard ====================================== MutSigCV: PREPROCESS ——————– Loading mutation_file… Error using MutSigCV>MutSig_preprocess (line 246)…
DanMAC5: a browser of aggregated sequence variants from 8,671 whole genome sequenced Danish individuals | BMC Genomic Data
Demographics Data from three studies were included: Dan-NICAD: 1,649 individuals with symptoms of obstructive coronary artery disease, predominantly chest pain, undergoing coronary computed tomography angiography. In total, 52% were females, the mean age was 57 years (+/- 9 SD), median coronary artery calcium score were 0 [0–82] and 24% of…
ExomeDepth error in getBamCounts when adding a fasta reference
ExomeDepth error in getBamCounts when adding a fasta reference 1 Whenever I try to add a reference fasta file for computing the GC content in the GetBamCounts function: my.countsV6 <- getBamCounts(bed.frame =AgilentV6, bam.files = BAMFiles, include.chr = TRUE, referenceFasta = “data/hg19.fa” ) I get an error like this: Reference fasta…
How to plot combined bar graph in R
How to plot combined bar graph in R 1 After running the following function count_table <- table(Sample.@meta.data$seurat_clusters, Sample@meta.data$orig.ident, Sample@meta.data$singlr_labels) View(count_table) str(count_table) ‘table’ int [1:15, 1:2, 1:20] 0 0 0 0 0 0 2 0 273 0 … – attr(*, “dimnames”)=List of 3 ..$ : chr [1:15] “0” “1” “2” “3”…
PhaeoEpiView: an epigenome browser of the newly assembled genome of the model diatom Phaeodactylum tricornutum
Despite being an established model, the genome and annotation of P. tricornutum are not concordant and the existing epigenetic resources generated so far lack a well-defined framework for accurate and user friendly utilization. To address this limitation, we sought to establish a coherent resource rendering the multiple genomic and epigenomic…
Allele frequency visualization
Allele frequency visualization 2 Hello dears all, Actually I have a .frq file (allele frequency file) which It is generated from .vcf file through VCFtools and it is included by five populations (The vcf file is generated by Stacks pipeline). But the point is, I don’t know how can I…
How to add conditional annotation from other column of R dataframe in ggplot2
I’m trying to make a plot from a dataframe of over 300k rows where the peaks and valleys will be annotated from another column rather than the x and y. How can I do that..!! Dataframe : chr start end.x StoZ.x Hscore end.y StoZ.y Tier Gene 1 chr1 1 10000…
r – How to customize y-axis intervals and marks in ggplot with limited intervals and 40 unit increments?
I am trying to make a ggplot, but I am not sure how to change the y-axis intervals or show specific intervals on the y-axis. How do you make the y-axis of a ggplot go up by 40, with the min being 40 and the max being 200, but only…
count protein-coding genes per contig
count protein-coding genes per contig 1 If you have an annotation file, as for example, the following GTF from human: 1 havana gene 11869 14409 . + . gene_id “ENSG00000223972”; gene_version “5”; gene_name “DDX11L1”; gene_source “havana”; gene_biotype “transcribed_unprocessed_pseudogene”; 1 havana transcript 11869 14409 . + . gene_id “ENSG00000223972”; gene_version “5”;…
Mbedtls PK – Loading the private and public key into the same context – Generic
Hello, Is there any hack that makes it possible to load a private key into a pk context where you’ve only loaded a public key (or vise versa)? I can see, that it is possible to instantiate a single context with both, but I cannot parse both PEM files into…
Edit and re-head BAM file
Edit and re-head BAM file 0 Hi there I have a BAM file which needs to be edited and re-headed. Now, I’m aware of how to do so the problem is that for some reason the sed command I’m using does not catch the sequence I have to remove… Below,…
Removing indels in VCF file
Hello, I am trying to do something very simple, but running into confusing behaviour. I have a VCF file of multiple samples and want to remove all indels so that I can generate sequences with identical coordinates with bcftools consensus. I removed indels by specifying bcftools view –include ‘TYPE=”snp” ||…
PLINK: Whole genome data analysis toolset – dbSNP
1. Introduction 2. Basic information 3. Downloads and general notes 4. Command reference table 5. Basic usage/data formats 6. Data management 7. Summary stats 8. Inclusion trim 9. Population stratification 10. IBS/IBD estimation 11. Bond 12. Family-based community 13. Permutation process 14. LD calculations 15. Multimarker tests 16. Conditional haplotype…
–update-name not working – possibly because of comma delim?
Hi there, I have two separate sets of gwas data (‘A’ and ‘B’) with a file per chr. I am running a targeted gene/pathway analysis so I have filtered out snps per each gene location (209 snps = 209 filtered gene-specific files). For dataset A I ran the following commands:…
KING kinship inference error
KING kinship inference error 2 Hello, I’m trying to use KING for estimating kinship between individuals. I have a vcf file that resulted from a de novo assembly of ddrad seq data. First, I tried to convert my vcf file to .bed file using Plink2.0, but it failed saying that…
Genetic immune escape landscape in primary and metastatic cancer
Inference of HLA-I tumor status with LILAC Inference of the correct HLA-I tumor status is fundamental to identifying GIE alterations (Fig. 1a), to estimate the neoepitope repertoire and burden and to predict the response to immune checkpoint inhibitors14,15 (ICIs). We have developed LILAC, a framework that performs HLA-I typing for…
A draft human pangenome reference
Sample selection We identified parent–child trios from the 1KG in which the child cell line banked within the NHGRI Sample Repository for Human Genetic Research at the Coriell Institute for Medical Research was listed as having zero expansions and two or fewer passages, and rank-ordered representative individuals as follows. Loci…
Finding TF motifs enriched in series of ATAC-seq peaks (using fimo?)
Finding TF motifs enriched in series of ATAC-seq peaks (using fimo?) 1 Hi all, I have a series of peaks located in a .txt file (chr / start / end) and would like to know if there are tf motifs enriched in each of the individual peaks. For eg, I…
How can I create more accurate admixture plot like this?
How can I create more accurate admixture plot like this? 0 I made simple vcf file processing as plink –vcf Asp_multicall2.vcf.gz –double-id –allow-extra-chr \ –set-missing-var-ids @:# \ –set-missing-var-ids @:# \ for K in 1 2 3 4 5 do admixture –cv plink.bed $K | tee log${K}.out done As you can…
Annotate vcf file using GNOMAD
Hi, I use a loop for that. Something like this to inspire you: # Enter folder where gnomAD data are here: gnomAD=”/path/to/gnomAD/database/release/3.1.2/gnomad.genomes.v3.1.2.sites.” # Enter the folder where your results are and will be annotated further cd /path/to/your/results/folder/ # Enter the name of the final results’ file from SnpSift ann=”results.ann.gnomAD.genomes.v3.1.2.vcf” #…
Will this method for genomic population structure analysis with VCF file help good for my work?
Will this method for genomic population structure analysis with VCF file help good for my work? 0 I was finding method for handling multi-calling VCF file and convert format for analyzing population genomics for fungi, which means dividing species or strains by genotype difference. So I found this site’s instruction….
Jobs lost as assets of biotech company are sold off
Administrators confirmed 30 jobs have been lost at Leeds-headquartered biotech business, 4D Pharma. Interpath, which has been handling the administration of the group, confirmed the intellectual property of 4D Pharma, including its patent library, pipeline assets, and bioinformatics platform technology, was sold to CJ Bioscience Inc (CJ). CJ is an independent…
Introduction to Computational Evolutionary Biology
R is a very flexible programming language, and it allows developers to create their own data structures (called classes) for their packages. Over the years, some packages have become so popular that the classes they use to store data are now used the “standard” representations for particular types of data….
Technical Enzymes Market Revenue Analysis, Company Revenue Share, Global Forecast till 2032
Reports And Data Technical Enzymes Market report also sheds light on supply chains and the changes in the trends of the upstream raw materials and downstream distributors. NEW YORK, NY, UNITED STATES, May 8, 2023 /EINPresswire.com/ — The global market size for technical enzymes was USD 12.1 billion in 2022…
r – unable to plot sd bars in ggplot
I was trying to insert the sd bars in geom_point in a data of decomposition rates of different time observations and species using the code below. My data have two factors time and specie and the varable “rem”. I used the aggregate function to group species and time and I…
How to read large size tab separated multiple files in R exe using SLURM?
I’m attempting to write a function that reads several tab-separated files (thousands) in SLURM via sbatch using R exe code, does some tasks, and then outputs tab-separated files. The program works effectively with files smaller than 2 GB but fails to execute when the file size exceeds 2 GB. However,…
High number of duplicates and low percentage properly paired
High number of duplicates and low percentage properly paired 0 I have some paired end sequencing data that I have trimmed using cutadapt. It was sequenced on an illumina novaseq 6000 and is low coverage RADseq data (2-3x). My cutadapt script used forward and reverse adapters from illumina : cutadapt…
Tophat: Genome indexing error
Hi everyone, I have 2 samples, paired end RNA sequence from illumina. I have done the QC and trimming. now after triiming with trimgalore I have A_R1.val_fq A_R2.val_fq B_R1.val_fq B_R2.val_fq Now I performed tophat for genome mapping. I ran this command tophat -p4 -G GRCh38_latest_genomic.fna -o tophat_result bowtie_index A_R1_val.fq but…
Obtaining TPM values from STAR alignment and counts with featurecounts using R’s tidyverse syntax (dplyr and tidyr)
Hello! I have a table of counts that I got by aligning rna seq samples with STAR and using featureCounts, and my goal is to get TPM values for each gene of the table. As a first step, I imported my table into R and modified it a bit to…
Low SNP Overlap with Michigan 1KG and TopMed reference panel
I extracted three samples (HG02024 – HG02026) from the 1000 Genomes Project’s 30x alignment files, employing the Genome Analysis Toolkit (GATK) best practice pipeline. This process involved performing base quality score recalibration, identifying and removing duplicate reads, utilizing the HaplotypeCaller to generate a genomic VCF (gVCF) file, and calling variants…
r – Add label at the top af a barplote using a diferent column – ggplot2
I have a dataframe in R #creating a dataset codigo <- paste0(“COD”, sprintf(“%03d”, 1:4)) vgene <- paste0(“VG”, LETTERS[1:8]) set.seed(123) codigo <- sample(codigos, 40, replace = TRUE) vgene <- sample(vgene, 40, replace = TRUE) datos_vgene <- data.frame(codigo, vgene) codigo vgene 1 COD001 VGG 2 COD002 VGG 3 COD003 VGC 4 COD004…
r – How to add multiple simulation plot lines to plot generated with ggplot?
A few changes to make this happen: Fix a typo, “Confidence Interval” -> “…vals” I recommend named values in scale_color_manual instead of values and labels; it should be the same, I’ve found myself in do-loops in the past where the resolution was to do this. Pivot/reshape the simPaths_df data so…
r – ggplot x-axis labels with all x values, multiple lines on geom_line
I’m plotting ggplot with geom_line. The x-axis is the YEAR, and y-axis is a continuous variable N. How can I ggplot all and of the years on the x-axis? Applying factor(YEAR) on the aesthetic doesn’t seems to work because I’m splitting by type on linetype = STATUS Also the usual…
Can’t convert .ped .map into .bim .fam .bed. Why?
(PLINK): Can’t convert .ped .map into .bim .fam .bed. Why? 0 Hi. I have a .ped and .map that I have previously created from a .vcf. Note that this VCF was NOT created by aligning reads to a reference genome, but rather by de novo (using the “STACKS” pipeline). I…
Bioconductor – Bioconductor 3.17 Released
Home Bioconductor 3.17 Released April 26, 2023 Bioconductors: We are pleased to announce Bioconductor 3.17, consisting of 2230 software packages, 419 experiment data packages, 912 annotation packages, 27 workflows and 3 books. There are 79 new software packages, 7 new data experiment packages, no new annotation packages, 2 new workflows,…
error in Genome Mepping by BWA tools in Linux
$ gmap_build -D:\btau8refflat.gtf Unknown option: D:btau8refflat.gtf -k flag not specified, so building main hash table with default 15-mers -j flag not specified, so building regional hash tables with default 6-mers gmap_build: Builds a gmap database for a genome to be used by GMAP or GSNAP. Part of GMAP package, version…
Potential segfault bug in featureCounts using long read data
Hi, I think I might have found a bug in featureCounts from Rsubread (v2.12.3). I am trying to find reads overlapping exon junctions from a personalised reference, using Nanopore long read BAMs. I am afraid I cannot share fully reproducible code as I am using my own reference, but this…
Merge the same gene and get percentage of gene from the output file of mosdepth tool
Merge the same gene and get percentage of gene from the output file of mosdepth tool 0 I have get this output by using mosdepth. It give me coverage at 1X, 5X,10X,15X ,20X , 30X. In some regions genes are repeated as you can see in the example. Can be…
Should you specify “-p” for paired end reads using featureCounts?
Should you specify “-p” for paired end reads using featureCounts? 0 I’m trying to understand whether or not I should be using the -p flag for featureCounts. Here’s the explanation: -p If specified, fragments (or templates) will be counted instead of reads. This option is only applicable for paired-end reads;…
Gnu Parallel – Parallelize Serial Command Line Programs Without Changing Them
Article describing tool (for citations): O. Tange (2011): GNU Parallel – The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47. Author’s website for obtaining code: www.gnu.org/software/parallel/ All new computers have multiple cores. Many bioinformatics tools are serial in nature and will therefore not use the multiple cores. However, many…
Illumina HumanHT-12 V3.0 expression beadchip reading data
Edit November 28, 2020: Further reproducible code: A: GPL6883_HumanRef-8_V3_0_R0_11282963_A (illumina expression beadchip) — Most Illumina ‘chip’ studies that I have seen on GEO do not contain the raw data IDAT files. You can start with the tab-delimited file, but will also require the annotation file (contained in the *_RAW.tar file),…
Highly variable hearing loss due to POU4F3 (c.37del) is revealed by longitudinal, frequency specific analyses
ADSNHL is caused by a novel variant in POU4F3 (DFNA15) Targeted Sanger sequencing for known DFNA/B pathogenic variants in the NL population yielded no hits. Genome-wide genotyping and linkage analysis on one branch of the North American kindred yielded three equally positive “logarithm of the odds” scores (LOD = 2.08) suggestive of…
plink produces does pruning according to log file but prune.in is full of dots
Hi, I’m running a big SNP database (160GB vcf file) I pruned with plink; plink –vcf SNPs_clean1.vcf –allow-extra-chr –indep-pairwise 50 10 0.1 –out SNP_50_10_01 It starts running producing the temporary files and the output files prune.in and prune.out). In the log file it shows having filtered the variants, 49595998 of…
What is the best way to retrieve gene name from variant mutation?
What is the best way to retrieve gene name from variant mutation? 0 Hi all! I was wondering if any of you may suggest a clean a fast way to retrieve gene name from a list of variants that look like this: CHR_POS_REF_ALT > chr1_33333_A_C Of course I can generate…
Manifest Reference Strand
Manifest Reference Strand 0 Hi I am trying to generate a vcf from SNP array data (genomestudio). So far I can convert to plink format from genomestudio and create a vcf via plink2 commands: plink2 \ –pedmap gs-out-plink \ –make-pgen \ –sort-vars \ –merge-x \ –out test-1 then: plink2 \…
chromosome identifier not found in FASTA file, prompting ‘skipping’ error, when chr identifier is there
Hi, I have a FASTA file that looks like this: >D00829_0044_ACADL6ANXX_PEdi_Ay_VS_2:6:1214:4434:71826 TGATCTTGGCCAAGTCACTTCCTCCTTCAGGAACATTGCAGTGG >D00829_0044_ACADL6ANXX_PEdi_Ay_VS_2:1:2203:12900:37094 ACTTTGGTCAAGTCACTTCCTCCTTCAGGAACATTGCAGTGG >D00829_0044_ACADL6ANXX_PEdi_Ay_VS_2:1:2213:1470:83635 Which I created through the code below. The .bam file in question is an aDNA file from a Neanderthal (cdna.eva.mpg.de/neandertal/Chagyrskaya/BAM/). samtools bam2fq chr22.rh.bam > chag.22.fq bioawk -c fastx ‘{print “>”$name”\n”$seq}’ chag.22.fq > chag.22.fa I also…
How to generate a .raw file from vcf in plink
How to generate a .raw file from vcf in plink 1 Hi all. I am trying to convert my vcf file to a readable format in R using plink. Specifically, I want a .raw file to follow up the workflow of a recent paper. I have tried: plink –vcf myfile.vcf.gz…
Help in replicating LDSC heritability estimates
Hi, I am trying to replicate the heritability estimates based on the insomnia GWAS summary statistics using LDSC. However, I have encountered a problem as my estimates seem to be only about half of the original estimates listed in Table S1. Despite my efforts to locate the error, I have…
adding transcripts ID to the REVEL score
adding transcripts ID to the REVEL score 0 I am annotating variants for clinical diagnosis. One of the data required by my team is REVEL. We work with specific transcripts so when clinicians do variant interpretation, they need to check if our preferred transcript is among the transcripts considered. REVEL…
Convert VCF files to input file format requiered by XP-CLR1.0 software
Convert VCF files to input file format requiered by XP-CLR1.0 software 0 I attempt to do a selective sweep analysis by using the software XP-CLR1.0. My genotypic data is VCF format so wonder if someone with experience on this package could recommend any tool to convert from VCF to the…
divide coordinates into equal number of bins
First you need to convert to GRanges. gr <- paste0(coords$Chr,’:’,coords$Start,’-‘,coords$End) This should be built in to the standard GRanges package: web.mit.edu/~r/current/arch/i386_linux26/lib/R/library/GenomicRanges/html/tile-methods.html gr <- GRanges( seqnames=Rle(c(“chr1”, “chr2”, “chr1”, “chr3”), c(1, 3, 2, 4)), ranges=IRanges(1:10, end=11), strand=Rle(strand(c(“-“, “+”, “*”, “+”, “-“)), c(1, 2, 2, 3, 2)), seqlengths=c(chr1=11, chr2=12, chr3=13)) # split every…
GATK g CNV code Aborts Before Running Main Code
GATK g CNV code Aborts Before Running Main Code – Bioinformatics Stack Exchange …
ggplot2 – R: ggplot is giving a wrong and obscure graph
Your code to convert the %m-$d-%Y dates into Date format is not working the way you intended. dates <- c(“1-2-2018”, “1-3-2018”) for (i in 1:length(dates)) { dates[[i]] <- as.Date(dates[[i]], format = “%m-%d-%Y”) } str(dates) #chr [1:2] “17533” “17534” Compare to dates <- c(“1-2-2018”, “1-3-2018”) dates <- as.Date(dates, format = “%m-%d-%Y”)…
Getting chromosome of unusual chromosome names e.g. ‘CHR_HSCHR8_8_CTG1’
Getting chromosome of unusual chromosome names e.g. ‘CHR_HSCHR8_8_CTG1’ 0 I made a biomaRt query: library(biomaRt) mart = useMart(‘ensembl’, dataset=”hsapiens_gene_ensembl”) genes = getBM(attributes = c(“chromosome_name”,”start_position”, “hgnc_symbol”, “uniprot_gn_symbol”, “uniprot_gn_id”), mart = mart, values = list(“protein_coding”,c(1:22))) Most of the chromosome_name values are regular numbers 1 to 22. However, some are unusual, such as…
rs3750846 RefSNP Report – dbSNP
ALFA Allele FrequencyThe ALFA project provide aggregate allele frequency from dbGaP. More information is available on the project page including descriptions, data access, and terms of use. Release Version: 20201027095038 Help Frequency tab displays a table of the reference and alternate allele frequencies reported by various studies and populations. Table lines,…
Reading BGEN file using Hail on a Spark cluster results in corrupted matrix table – Hail Query & hailctl
I have a BGEN file that I obtained from a PGEN file via PLINK2. The command I used is as follows: plink2 –pfile my_file –export bgen-1.2 bits=8 –out /some/path –output-chr chrM and I used hl.index_bgen() to create an idx2 index. Now when I have the BGEN file and the idx2…
Which ATAC-seq SAM/BAM fields are required for bigwig generation by deeptools bamCoverage?
Which ATAC-seq SAM/BAM fields are required for bigwig generation by deeptools bamCoverage? 0 Hi, I plan to convert SAM file into BED file using bedops sam2bed tool and perform “chr start and end coordinates” data points manipulation to break di-nucleosome and tri-nucleosome data points into mono-nucleosome data points, then convert…
Annotated dendogram (tree) from tbl_graph
Annotated dendogram (tree) from tbl_graph 0 @ee934b4b Last seen 15 hours ago France Hello, I have a problem because I have little experience with the tbl_graph format and how to use it well with ggtree (or other script to draw a tree). I have created a tbl_graph (from clustree) of…
10x Cellranger SN RNA-seq BAM file output looks off
I’m working with SN RNA-seq data that I processed using Cellranger v. 7.1.0 (which uses STAR for alignment with default settings). Using the samtools flagstat command to look at a BAM file output: 284890368 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 secondary 0 + 0…
Understanding how cohesin makes DNA loops in the human genome and its role in Cornelia de Lange syndrome
NIPBL KD affects cohesin-STAG1 and cohesin-STAG2 in opposite ways. a Asynchronously growing HeLa cells mock transfected (control) or transfected with siRNA against NIPBL (NIPBL KD) were analyzed by flow cytometry 72 h later. Contour plots of the indicated proteins in control (gray plots) and NIPBL KD cells (colored plots) were overlapped…
Life Expectancy Data from Kaggle v1
The data-set related to life expectancy and health factors for 193 countries has been collected from the same WHO data repository website and its corresponding economic data was collected from United Nation website. It was collected from WHO and United Nations website with the help of Deeksha Russell and Duan…
SNP ID (rsID) to Chr no. and Position
SNP ID (rsID) to Chr no. and Position 0 Hi, I have SNP arrays with rs ID, log R and BAF values, but I don’t have the ID’s chr. number and position. The platform that were used are: GPL6433 Illumina HumanHap550 Genotyping BeadChip v1 GPL6434 Illumina HumanHap550 Genotyping BeadChip v3…
Michigan Imputation server failed job
Michigan Imputation server failed job 0 This is the first time I use Michigan Imputation Server. My imputation worked as it should on all chromosomes (1 to 22) apart from chromosome 9, where the job starts and then it fails and I don’t know why and how to fix this…
how to get genomic range for hi-c datasets in hdf5 file
how to get genomic range for hi-c datasets in hdf5 file 0 Hello everyone, i have hi-c contact map in hdf5 file. I stored the data for chr9 in a variable. But it doesn’t have genomic range. Does anyone know how to add colnames and rownames with respect to genomic…
Something wrong with my code concerning bsseq visualization
Here is my code: #### library(bsseq) library(bsseqData) ## —-showData—————————————————————– data(BS.cancer.ex) BS.cancer.ex <- updateObject(BS.cancer.ex) BS.cancer.ex pData(BS.cancer.ex) ## —-smooth,eval=FALSE——————————————————– # BS.cancer.ex.fit <- BSmooth( # BSseq = BS.cancer.ex, # BPPARAM = MulticoreParam(workers = 1), # verbose = TRUE) ## —-showDataFit————————————————————– data(BS.cancer.ex.fit) BS.cancer.ex.fit <- updateObject(BS.cancer.ex.fit) BS.cancer.ex.fit ## —-cpgNumbers————————————————————— ## The average coverage of CpGs…
TSV with multi-allelic sites to VCF
TSV with multi-allelic sites to VCF 0 Dear all, I am trying to convert a TSV output of some tools to a valid VCF. This is what a sample of the file looks like: mutID Chr Position Ref Alt nANNO N_eps N_PSI tAD 474 chr2 50282084 GT G mappable 0.293935566…
UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0x8b in position 1: invalid start byte
I use python package to load and analysis vcf.gz files, for ch in chs: vcf_to_1240K_hdf(in_vcf_path = f”/mnt/gpfs/Users/wangjincheng/aDNA/All_data/Analysis4/19.genotype_imputation/GLIMPSE_test/GLIMPSE_ligated/merged_chr{ch}.vcf.gz”, path_vcf = f”/mnt/gpfs/Users/wangjincheng/aDNA/All_data/Analysis4/19.genotype_imputation/GLIMPSE_test/ancIBD/chr{ch}.vcf”, path_h5 = f”/mnt/gpfs/Users/wangjincheng/aDNA/All_data/Analysis4/19.genotype_imputation/GLIMPSE_test/ancIBD/chr{ch}.h5″, marker_path = f”/mnt/gpfs/Users/wangjincheng/aDNA/reference/data/filters/snps_bcftools_ch{ch}.csv”, map_path = f”/mnt/gpfs/Users/wangjincheng/aDNA/reference/data/afs/v51.1_1240k.snp”, af_path = f”/mnt/gpfs/Users/wangjincheng/aDNA/reference/data/afs/v51.1_1240k_AF_ch{ch}.tsv”, col_sample_af = “”, buffer_size=20000, chunk_width=8, chunk_length=20000, ch=ch) but get this error below: UnicodeDecodeError Traceback (most recent call last)…
vegan – Input distance matrix from QIIME2 to R
I am trying to coerce a distance matrix from QIIME2 to class dist in R. Upstream processing up to the generation of the distance matrix was performed as described in the QIIME2 document “Moving Pictures”. When I call as.dist, I get the following warning messages: imported_dist_matrix <- as.dist(dist_bray_df) ## Warning…
ggplot2 – ggplot: “No non-missing arguments to min/max; returning Inf”
I’m attempting to recreate this plot (my version: lat/lon by year), but keep getting these warnings after running the ggplot code: sms2 |> mutate(fCYR = factor(CYR)) |> ggplot(aes(x = Longitude, y = Latitude, fill = est, group = fCYR)) + geom_raster(aes(x = Longitude, y = Latitude, fill = est, group…
How to plot coverage and depth statistics of a bam file
This is the same problem I recently had, but I was not able to find a good solution. I was finally able to work a method which I am sharing here. While this question is 2 years old but I am hoping there are many individuals trying to work out…
qiime2 distance matrix as input for R – General Discussion
Hello, I’m trying to analyse skin Microbiota from human samples. (V3-V4 Illumina Miseq.) I finished the moving pictures tutorial up to the point of creating the distance matrices, everything looks neat. Now I’d like to take these results into R to create PCoA Plots. 1.) I exported the Bray-Curtis-Distance-Matrix from…
r – ggplot reorder y-axis by value of x-axis
I have this data set: bizdev company date category <chr> <chr> <date> <chr> 1 Jerry Harding Jones, Figueroa and Moon 2020-04-06 Mid-tier 2 Steve Vargas Stanley, Porter and Mccoy 2020-04-24 Indie 3 Steve Vargas Gardner and Sons 2020-04-19 Indie 4 Jerry Harding Parks-Chen 2020-04-11 Indie 5 Jerry Harding Bond Group…
TxDB.Hsapiens.UCSC.hg38.knownGene with locateVariants() identifying SNPs from various chromosome being part of the same gene
I am trying to annotate a list of SNPs using the hg38 genome (knownGene) and locateVariants(). The program is able to successfully run and provide “GeneIDs” for several of the loci. However, some GeneIDs are applied to SNPs in completely different regions and on completely different chromosomes. When I cross…
ChrX allele frequency in males and females
ChrX allele frequency in males and females 0 Hi, I have a question about the allele frequency output files (.frq) from VCFtools. My question is specific to chromosome X variants. I am writing out the allele frequencies for males and females separately to identify the variants with different allele frequencies….
[Solved] I need support for my R Studio class. I have a file as cvs and I…
I need support for my R Studio class. I have a file as cvs and I need to make an x-y scatter plot of date on the x-axis and hind-foot length on the y-axis, with different colors for the different genera. Image transcription text 25 surveys$date = ymd(paste(surveys$year, surveys$month, surveys$day,…
How to get human cDNA sequences together with UTR regions?
How to get human cDNA sequences together with UTR regions? 2 Dear all, I have downloaded the human genome and gtf files from Gencode. Based on these two files I want to generate a fasta file that has cDNA sequences including 5′ and 3′ UTRs for protein-coding genes only. What…
Find genes which are related to SNPs
Find genes which are related to SNPs 0 Hi Dears Is there any chance to get SNPEff or some files which work like that in NCBI or other dataset?? Actually I just want to find genes related to SNPs which are in a map file based on Oar_v4.0 or Oar_v3.1…
Batch correction and Deseq2
Hi, I would like to ask that the deta input fort Deseq2 is required a count metric. However, after I look into my count data, I would that there is a batch effect in my count data. How should I normalized my read count data before applying in Deseq2 and…
r – Cloropleth using ggplot
id field in GeoJSON is not handled by broom::tidy() as-is, so there are no State codes in the resulting data frame and join will not work as intended. Though the column named id is still there, thus left_join() will not throw an error. If you check your spdf_Brasil_MVI, you may…
How to create a graph that shows pairwise Fst values at single SNP loci across the all chromosomes in R
this is not a complete answer, but it may help to empower you to create your own graphs. i try to keep track of a lot of different visualization related tools (cmdcolin.github.io/awesome-genome-visualization/), but indeed, just hand-coding the solution in R is a good approach. I get my mind blown by…
samtools idxstats versus samtools view command
samtools idxstats versus samtools view command 1 Hi, I have mapped RNA-seq data to the human genome concatenated with a viral genome (26 chromosomes in total) with bowtie and need to get some numbers to calculate FPKM values manually for one viral gene, to retrieve the “total number of reads”…
Realigning BAM files to new reference
Hi, I am looking to create a panel of normals for a somatic variant caller. For normals, I have been provided with a set of WES bam files that have been preprocessed according to GATK best practices. However, they have been aligned to another reference genome than my case samples…
Failure working with the tmp directory
Nextflow Gatk DepthOfCoverage: Failure working with the tmp directory 0 I have nextflow workflow for which the process DepthOfCoverage failed to work with the defined tmp directory –tmp-dir tmp process pf_read_depth { tag “tag” scratch true publishDir … input: tuple val(pair_id), path(pf_bam) path refdir output: file(“final_${pair_id}.tsv”) script: “”” samtools index…
Python pgenlib assertion error when reading pgen file with flipped alleles
Hi Chris I want to use –ref-allele to create a .pgen file with a new set of reference alleles and use pgenlib in Python to read the data. I directly generated the .pgen file using plink2 \ –pfile /n/scratch3/users/j/jz286/imp_geno/ukb_imp_chr${CHROM}_v3\ –ref-allele ${REF_FILE} 10 3\ –make-pgen \ –out…
allele frequency from vcf
I am trying to figure out why the allele frequencies I get from Plink and vcftools are different from what I have on the vcf file. My vcf file was produced by GATK jointgenotyping and filtered using vcftools. I have 448 samples in there. Here is a line from the…
Extract reads within given region, and their mates
Extract reads within given region, and their mates 1 Hi there, I want to extract all the reads located inside a given location. I would like to extract also the mates of those reads, because I’m working with PE data. I know that I can extract the reads using samtools…
data wrangling – Expanding dataframe in R
An option in tidyverse would be to expand the data with uncount (similar to rep(…) in base R), then convert the ‘Dummy_N’ column to binary by creating a logical vector with the sequence (row_number()) and its value, grouped by ‘Group’ column library(dplyr)# version >= 1.1.0 library(tidyr) df %>% uncount(Total_N, .remove…
line 1 did not have 7 elements. –Putting bismark extractor files into methimpute
Error: line 1 did not have 7 elements. –Putting bismark extractor files into methimpute 0 file <- system.file(“extdata”,”file.txt”, package=”methimpute”) data(arabidopsis_chromosomes) arabidopsis_chromosomes$chromosome <- sub(‘chr’, ”, arabidopsis_chromosomes$chromosome) bismark.data <- importBismark(file, chrom.lengths=arabidopsis_chromosomes) I am trying to interpret my bismark results into methimpute but it gives me an error of : line 1 did…
Creating a trendline on a bar plot with ggplot2 in R
The discrete fill scale of your plot shows that the round variable you use for x and fill is not numeric, preventing geom_smooth from working properly. Reproduction of the issue library(dplyr) library(ggplot2) mydf <- iris %>% mutate(round = as.character(as.numeric(as.factor(Species)))) %>% group_by(round) %>% summarise(means = mean(Sepal.Length), se = sd(Sepal.Length)/sqrt(n())) %>% ungroup()…
Admixture error in removing loci with no genotypes
I am attempting to run ADMIXTURE on plink derived .bed files and continue to get the error “Error: detected that all genotypes are missing for a SNP locus. Please apply quality-control filters to remove such loci.” I have already performed mutiple quality control steps prior to this point. E.g: Filtered…