Tag: GREP

Merge multiple text files to create a combined dataframe and rename columns in R – General

Hi, I have multiple .txt files (each file contains four columns: an identifier Gene column, a raw_counts column, and other columns). I would like to merge those files into a combined dataframe using the common gene column. I was able to import multiple .txt files together, merge based on identifier column,…
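
For readers working outside R, the merge described in this excerpt can be sketched in Python. This is only a minimal sketch, not the poster's solution: it assumes tab-delimited files with a header row containing `Gene` and `raw_counts` columns, and the sample names and in-memory files are hypothetical stand-ins.

```python
import csv
import io

def merge_counts(tables, key="Gene", value="raw_counts"):
    """Merge {sample_name: file-like} tab-delimited tables on a shared key column.

    Returns (header, rows): one row per gene present in every table (inner join),
    with one count column per sample, named after the sample.
    """
    per_sample = {}
    for sample, handle in tables.items():
        reader = csv.DictReader(handle, delimiter="\t")
        per_sample[sample] = {row[key]: row[value] for row in reader}

    samples = list(per_sample)
    # Inner join: keep only genes found in all samples.
    common = set.intersection(*(set(d) for d in per_sample.values()))
    header = [key] + samples
    rows = [[gene] + [per_sample[s][gene] for s in samples] for gene in sorted(common)]
    return header, rows

# Hypothetical in-memory stand-ins for two of the .txt files.
a = io.StringIO("Gene\traw_counts\tlength\tother\nG1\t10\t100\tx\nG2\t5\t200\ty\n")
b = io.StringIO("Gene\traw_counts\tlength\tother\nG2\t7\t200\ty\nG1\t3\t100\tx\n")
header, rows = merge_counts({"sampleA": a, "sampleB": b})
```

Renaming the count columns to the sample names during the merge avoids the column-name clashes that arise when every file calls its count column `raw_counts`.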

Unable to install bioconda packages in conda environments

From your command line it appears you are on Windows. There are several versions of pybedtools on bioconda; however, if I grep through them, they are all for the Linux platform. If you’re on Windows 10, you could consider setting up the ‘Windows Subsystem for Linux’ (and possibly Xming), installing…

Detailed differences between sambamba and samtools

Three months in, my first post in the new student group: do false-positive mutations appear because duplicate marking is not enough? It tells the story of supplementary reads that GATK MarkDuplicates will not mark as duplicates. In response to this question, I began…

How to find unique and common genus by comparing 4 different files containing genus ?

Hi, I have 4 files containing genus names, generated from soil meta sequencing of different agricultural fields, like:
F1  F2  F3  F4
A   B   A   C
D   C   F   D
F   E   G   Y
H   T   Y…
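
One way to approach this comparison is with set operations. The sketch below is an illustration under assumptions, not an answer from the thread: the function name is hypothetical, and the toy sets only loosely follow the letters visible in the excerpt.

```python
def compare_sets(named_sets):
    """Return (common, unique).

    common: members present in every set.
    unique: maps each name to the members found in that set and no other.
    """
    common = set.intersection(*named_sets.values())
    unique = {}
    for name, members in named_sets.items():
        others = set().union(*(s for n, s in named_sets.items() if n != name))
        unique[name] = members - others
    return common, unique

# Hypothetical genus lists for four fields (one letter per genus).
fields = {
    "F1": {"A", "D", "F", "H"},
    "F2": {"B", "C", "E", "T"},
    "F3": {"A", "F", "G", "Y"},
    "F4": {"C", "D", "Y"},
}
common, unique = compare_sets(fields)
```

In practice each set would be built by reading one genus name per line from the corresponding file.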

[slurm-users] gres/gpu count lower than reported

Hello Fellow Slurm Admins,   I have a new Slurm installation that was working and running basic test jobs until I added gpu support. My worker nodes are now all in drain state, with gres/gpu count reported lower than configured (0 < 4)   This is in spite of the…

Filtering bam file based on depth determined through samtools depth

Hi All, I have a bam file and I calculated read depth using samtools depth, and I now want to filter the bam file to keep only the contigs whose depth falls within a certain range. I was…
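
The first half of this task, choosing contigs by depth, can be sketched as follows. This is a hedged illustration rather than the thread's answer: it assumes the three-column, tab-separated output of `samtools depth` (contig, position, depth) and uses the mean over the reported positions, which may skip zero-coverage positions unless they were requested. The function name and sample lines are hypothetical.

```python
from collections import defaultdict

def contigs_in_depth_range(depth_lines, low, high):
    """Return contigs whose mean depth over the reported positions lies in [low, high]."""
    total = defaultdict(int)
    count = defaultdict(int)
    for line in depth_lines:
        contig, _pos, depth = line.rstrip("\n").split("\t")
        total[contig] += int(depth)
        count[contig] += 1
    return [c for c in total if low <= total[c] / count[c] <= high]

# Hypothetical depth records: contig, 1-based position, depth.
depth_lines = ["c1\t1\t10\n", "c1\t2\t20\n", "c2\t1\t100\n", "c3\t1\t2\n"]
keep = contigs_in_depth_range(depth_lines, 5, 50)
```

The resulting contig names could then be given as regions to `samtools view` on an indexed BAM to produce the filtered file.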

deeptools plotHeatMap – Convert bed files to gene list?

I might suggest limiting your search to genes: gtf2bed < Mus_musculus.GRCm38.102.gtf | grep -w "gene" > sorted-mm10.genes.bed But that’s up to you. Otherwise, I think you may also get transcripts/exons, which may be more than you want. Again, up to you. If hnf4a-ko-downreg-clusters.bed is the file containing peaks, as described…
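
The idea of keeping only gene records can also be sketched in Python. Note that this is a slightly different technique from the `grep -w` in the excerpt: instead of matching the word anywhere on the line, it tests the feature-type column of the GTF (column 3) directly. The function name and the sample lines are hypothetical.

```python
def gene_records(gtf_lines):
    """Yield GTF records (as field lists) whose feature type, column 3, is exactly 'gene'."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[2] == "gene":
            yield fields

# Hypothetical GTF fragment with one gene, one transcript and one exon.
gtf = [
    "#!genome-build GRCm38\n",
    '1\thavana\tgene\t100\t500\t.\t+\t.\tgene_id "G1";\n',
    '1\thavana\ttranscript\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    '1\thavana\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
]
genes = list(gene_records(gtf))
```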

Fastp file merge append | Develop Paper

Interpretation of fastq file format: www.jianshu.com/p/39115d21ee17. Sometimes, the sequencing results of a species will return two sets of double-ended fastq files: r1.fq.gz l1.fq.gz r2.fq.gz l2.fq.gz. The content of the sequencing data is actually one piece, but it is divided into two parts during transmission. When we use it, we are used to merging it into a double ended…

Htseq is giving me 0 counts using the GFF3 of miRBase

Hello! I am trying to annotate a miRNA-seq so that it gives me mature miRNAs where I already have 5p and 3p. For this, I have used the index mm10.fa and the miRBase mmu.gff3. I have aligned with HISAT2 and am trying to count with HTSeq, however I get 0…

mbedtls_ctr_drbg_reseed_internal() goes straight to z_arm_usage_fault() in mbedtls cert req example – Nordic Q&A – Nordic DevZone

I am trying to get this example program github.com/ARMmbed/mbedtls/blob/mbedtls-2.11.0/programs/x509/cert_req.c (stripped out all the file handling stuff) to run on an nrf9160, but it fails to seed the RNG. I stepped through the mbedtls_ctr_drbg_seed() function, right to the call to mbedtls_ctr_drbg_reseed_internal(), in which the first line is this: if( ctx->entropy_len >…

Trouble with bedtools getfasta

I am trying to extract sequences from a .fasta file based on a bed file using bedtools getfasta and I am getting the following error. The command run was the following: bedtools getfasta -fi genomic.fasta -bed bedfile.bed -fo output.fasta WARNING. chromosome (chr1) was not found…

Pytorch cuda is unavailable even installed CUDA and pytorch with cuda. How to fix?

My environment is (Ubuntu 20.04 with NVIDIA GTX 1080Ti): $ nvidia-smi | grep CUDA | NVIDIA-SMI 470.74 Driver Version: 470.74 CUDA Version: 11.4 | $ nvcc -V nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2021 NVIDIA Corporation Built on Sun_Aug_15_21:14:11_PDT_2021 Cuda compilation tools, release 11.4, V11.4.120 Build cuda_11.4.r11.4/compiler.30300941_0 After…

How to fix deeptoolsintervals fatal error: python.h? ( Python, Python 3.X )

Problem: I tried installing with pip3 install deeptoolsintervals; the error is: deeptoolsintervals/tree/tree.c:1:20: fatal error: Python.h: No such file or directory compilation terminated. error: command ‘/usr/bin/x86_64-linux-gnu-gcc’ failed with exit code 1 I have Ubuntu 16.04.7 and Python versions installed: ls /usr/bin | grep python dh_python2 dh_python3…

Convert list of Accession Numbers to Full Taxonomy

Using NCBI Entrez direct. $ esearch -db assembly -query "GCA_000005845" | elink -target taxonomy | efetch -format native -mode xml | grep ScientificName | awk -F ">|<" 'BEGIN{ORS=", ";}{print $3;}' Escherichia coli str. K-12 substr. MG1655, cellular organisms, Bacteria, Proteobacteria, Gammaproteobacteria, Enterobacterales, Enterobacteriaceae, Escherichia, Escherichia coli, Escherichia coli K-12, If…
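
The `grep ScientificName | awk` step pulls the text out of each `ScientificName` element. The same extraction can be sketched with an XML parser instead of text matching. This is an illustration only: the XML string below is a trimmed, hypothetical stand-in for the efetch output, and the function name is an assumption.

```python
import xml.etree.ElementTree as ET

def scientific_names(xml_text):
    """Return the text of every ScientificName element, in document order."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("ScientificName")]

# Trimmed, hypothetical sample shaped like a taxonomy record.
sample = """
<TaxaSet>
  <Taxon>
    <ScientificName>Escherichia coli str. K-12 substr. MG1655</ScientificName>
    <LineageEx>
      <Taxon><ScientificName>cellular organisms</ScientificName></Taxon>
      <Taxon><ScientificName>Bacteria</ScientificName></Taxon>
    </LineageEx>
  </Taxon>
</TaxaSet>
"""
names = scientific_names(sample)
lineage = ", ".join(names)
```

Parsing the XML avoids depending on each element sitting on its own line, which the `awk -F ">|<"` approach quietly assumes.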

Ubuntu Manpage: samtools reheader – replaces the header in the input file

Provided by: samtools_1.13-2_amd64 NAME samtools reheader – replaces the header in the input file SYNOPSIS samtools reheader [-iP] [-c CMD | in.header.sam ] in.bam DESCRIPTION Replace the header in in.bam with the header in in.header.sam. This command is much faster than replacing the header with a BAM→SAM→BAM conversion. By default…

r – Avoiding eval-parse or do.call

I am trying to select a theme from ggplot2 based on some string given. For demo purposes, consider the following code: library(dplyr); library(ggplot2) mtcars %>% ggplot(aes(mpg, wt))+ geom_point() -> p all_ggplot2_funs <- getNamespaceExports("ggplot2") p + eval(parse(text=paste0(all_ggplot2_funs[grep("theme_", all_ggplot2_funs)][15], "()"))) This works fine and would allow me to use theme_minimal. However, from…

Average Read length

Hello Everyone! Is there a standard tool commonly used to calculate the average read length of fastq files? If yes please mention it here because I want to know the size of average reads of my fastq files so that I can decide the cutoff for…
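
For a quick check without a dedicated tool, the average can be computed directly from the file. The sketch below is a minimal illustration, assuming well-formed FASTQ with exactly four lines per record (no wrapped sequences); the function name and sample reads are hypothetical.

```python
def average_read_length(fastq_lines):
    """Mean length of the sequence lines, assuming strict 4-line FASTQ records."""
    total = reads = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each record holds the sequence
            total += len(line.strip())
            reads += 1
    return total / reads if reads else 0.0

# Hypothetical two-read FASTQ.
fastq = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "ACGTAC\n", "+\n", "IIIIII\n"]
mean_len = average_read_length(fastq)
```

For a gzipped file, the lines could come from `gzip.open(path, "rt")` instead of a list.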

Index of /~psgendb/doc/bioLegato/blreads

Name Last modified Size Description Parent Directory   –   SOAPdenovo2.hints.html 2019-05-04 15:52 3.9K   Trimmomatic.hints.html 2019-05-20 13:32 6.3K   Trinity.hints.html 2019-04-23 11:39 2.4K   adaptercheck.hints.html 2021-05-13 12:27 8.0K   adaptercheck.html 2021-05-12 17:45 4.9K   adaptercheck_output.png 2021-05-12 17:17 51K   fastq_pair.hints.html 2019-04-05 13:16 3.4K   gffcompare.hints.html 2018-07-18 14:05 3.2K  …

[BUG] (python-fastapi) OneOf class not generated

Bug Report Checklist Description For oneOf fields, the python-fastapi server generator creates a function that returns a OneOf* class, but that class itself is not generated. For example, for an API like /status: get: summary: Get the status of the upstream server responses: 200: description: successful operation content: application/json: schema:…

Extracting Number of SNPs via parsing MD tags

Hello all, I’m having a bit of difficulty wrapping my head around a task involving extracting the total number of SNPs from an alignment via creating a string parser/grep command which would be able to extract only the SNPs while ignoring indels. I am currently using a python script utilising…
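
A string parser of the kind described here can be sketched with a regular expression over the MD tag. This is a hedged sketch, not the poster's script: it counts mismatched bases (single letters) and skips deletions (runs introduced by `^`); insertions never appear in MD, and whether a mismatch is a true SNP is a separate question. The names and sample tags are hypothetical.

```python
import re

# An MD string is a sequence of: match-run lengths, deletions (^BASES), mismatched bases.
MD_TOKEN = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")

def count_mismatches(md):
    """Count single-base mismatches in an MD tag value, ignoring deleted bases."""
    return sum(1 for m in MD_TOKEN.finditer(md) if m.group(3))

n_simple = count_mismatches("10A5^AC6T3")   # two mismatches, one 2-base deletion
n_after_del = count_mismatches("5^AC0T3")   # mismatch directly after a deletion
n_perfect = count_mismatches("50")          # perfect match
```

Because the deletion alternative is tried before the single-letter one, the bases following `^` are consumed as one token and are not counted.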

In slurm; count number of folders in directory, encounter directory error

I have a lot of data to run through using slurm and I figured I could use a for loop sequence as it’s based on a range. The data takes a long time to generate so altering the output structure is not an option. The problem: When running my job…

query sequence is input sequence or its reverse complement

>sp|O14920.1|IKKB_HUMAN RecName: Full=Inhibitor of nuclear factor kappa-B kinase subunit beta; Short=I-kappa-B-kinase beta; Short=IKK-B; Short=IKK-beta; Short=IkBKB; AltName: Full=I-kappa-B kinase 2; Short=IKK2; AltName: Full=Nuclear factor NF-kappa-B inhibitor kinase beta; Short=NFKBIKB; AltName: Full=Serine/threonine protein kinase IKBKB MSWSPSLTTQTCGAWEMKERLGTGGFGNVIRWHNQETGEQIAIKQCRQELSPRNRERWCLEIQIMRRLTH PNVVAARDVPEGMQNLAPNDLPLLAMEYCQGGDLRKYLNQFENCCGLREGAILTLLSDIASALRYLHENR IIHRDLKPENIVLQQGEQRLIHKIIDLGYAKELDQGSLCTSFVGTLQYLAPELLEQQKYTVTVDYWSFGT LAFECITGFRPFLPNWQPVQWHSKVRQKSEVDIVVSEDLNGTVKFSSSLPYPNNLNSVLAERLEKWLQLM LMWHPRQRGTDPTYGPNGCFKALDDILNLKLVHILNMVTGTIHTYPVTEDESLQSLKARIQQDTGIPEED QELLQEAGLALIPDKPATQCISDGKLNEGHTLDMDLVFLFDNSKITYETQISPRPQPESVSCILQEPKRN LAFFQLRKVWGQVWHSIQTLKEDCNRLQQGQRAAMMNLLRNNSCLSKMKNSMASMSQQLKAKLDFFKTSI…

NCBI’s Efetch not working

Any help would be much appreciated. My goal is to run the following for loop to generate a list of sample_id (which is actually isolation site) for a list of SRAs. However I get an error (see below) for each and every SRA. for sra in `awk 'NR>1{print $1}' metadata.txt`…

Count 5’End Mapped To A Specific Genomic Position

I got several SAM/BAM files, and I am interested in the 5’ ends of the mapped reads. Are there any tools or scripts to count how many 5’ ends are mapped at a specific genomic position? N.B. I am not trying to count the…
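
One possible script for this is sketched below. It is an illustration under stated assumptions, not an answer from the thread: it reads SAM text lines, takes POS as the 5' end for forward-strand reads, and for reverse-strand reads (FLAG bit 16) takes the rightmost aligned reference base, computed from the reference-consuming CIGAR operations. Soft-clipped bases are ignored, and all names and sample records are hypothetical.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference

def five_prime_position(flag, pos, cigar):
    """1-based reference coordinate of the read's 5' end."""
    if flag & 16:  # reverse strand: 5' end is the rightmost aligned base
        ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
        return pos + ref_len - 1
    return pos

def count_five_prime_ends(sam_lines):
    """Count 5' ends per (chromosome, position, strand), skipping headers and unmapped reads."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        flag, chrom, pos, cigar = int(f[1]), f[2], int(f[3]), f[5]
        if flag & 4:
            continue
        strand = "-" if flag & 16 else "+"
        counts[(chrom, five_prime_position(flag, pos, cigar), strand)] += 1
    return counts

# Hypothetical SAM records.
sam = [
    "@HD\tVN:1.6\n",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n",
    "r2\t16\tchr1\t91\t60\t5M2D3M2S\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n",
    "r3\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
ends = count_five_prime_ends(sam)
```

For BAM input, the lines could come from piping `samtools view` output into the script.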

How to extract genomic upstream region of a protein identified by its NCBI accession number?

I have a list of NCBI protein accession numbers. I would like to extract out the upstream genomic region of the corresponding gene’s nucleotide sequence. I will be thankful to you if you can…

PhD Student Needed for Machine Learning (Deep Learning and Classical) in Molecular Biology

Several openings are available immediately (or as late as Fall 2022) Looking for a highly motivated PhD student for Computational Biology research, with an algorithm development focus. The Ecological and Evolutionary Signal-processing (EESI) and Informatics lab…

Use grep to loop a command in a script

Hello, I am doing a measurement of the HWE per Population. I have done this already without trouble with 10 populations, but now I’m doing it with 89 populations so I’d like to create a script. I use this command to create a list with all the populations and their…

Exon coordinates and sequence

I did it like this: 1- Download refGene.txt.gz and hg19.fasta from the UCSC goldenpath. ( note: convert hg19.2bit to hg19.fa using twoBitToFa ) 2- Create a bed file with exon coordinates using my awk script // to_transcript.awk BEGIN { OFS = "\t" } { name=$2 name2=$13 sens = $4 == "+" ?…

Correct way to make multiple comparisons on DESeq2?

I have a project where I have done RNA-seq (paired-end sequencing on Illumina HiSeq) of a worm at different days of development i.e. Ages 0-12. For each age, I have sequenced 3 replicate specimens. I’m new to DESeq2 and I was wondering if what I did below is correct. library(DESeq2)…

Answer: Highly mapped to introns

I think your problem is that your bed file doesn’t match the genome/gtf you used. I think it’s too old. My $gtf is the version 104 one like yours. zcat hg19_Ensembl_gene.bed.gz | head chr1 **66999065** 67210057 **ENST00000237247** 0 + 67000041 67208778 0 27 25,123,64,25,84,57,55,176,12,12,25,52,86,93,75,501,81,128,127,60,112,156,133,203,65,165,1302, 0,863,92464,99687,100697,106394,109427,110161,127130,134147,137612,138561,139898,143621,146295,148486,150724,155765,156807,162051,185911,195881,200365,205952,207275,207889,209690, grep ENST00000237247 $gtf 1 havana…

Converting between UCSC id and gene symbol with bioconductor annotation resources

You need to use the Homo.sapiens package to make that mapping. > library(Homo.sapiens) Loading required package: AnnotationDbi Loading required package: stats4 Loading required package: BiocGenerics Loading required package: parallel Attaching package: ‘BiocGenerics’ The following objects are masked from ‘package:parallel’: clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply,…

Produce PCA bi-plot for 1000 Genomes Phase III

Note1 – Previous version: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old) Note2 – this data is for hg19 / GRCh37 Note3 – GRCh38 data is available HERE The tutorial has been updated based on the 1000 Genomes Phase III imputed genotypes. The original tutorial was…

Exec format error in unmapped bam file

Hello, I created an unmapped bam file from a fastq file (sample 1). When I tried to search the bam file using a query name, I got the ‘Exec format error’. #1_ucheck.bam: unmapped bam file from Sample 1 fastq file code: samtools view 1_ucheck.bam |…

Quick Way To Combine Two Datasets Using Only Common Markers

Is there a quick way to combine two datasets so that only the common markers are kept? Currently, if I have two datasets, I have to first get the intersection of the two BIM/MAP files, then extract those markers…

Running htseq-count to “grab” long non coding gene_id names

Hi all, I’m new to bioinformatics, so bear with me. I am trying to find long non coding RNA from RNA-seq data. When I checked the human gtf file, there are 2 different types of long non coding RNA, “lnc_RNA” and “lncRNA”,…

keep only genes expressed in sample at the same time as a particular gene of interest

Filter DGElist object: keep only genes expressed in sample at the same time as a particular gene of interest 0 Hello, need some help here as I’m kind of stuck with the edgeR DGElist format. I have a DGE list named x with the following dimensions: > dim(x ) [1]…

Merge contigs in fasta file

Hello All, I am running variant calling on some species whose reference genomes have a very high number of contigs (sometimes >400,000). The variant caller I am using splits the job by the number of chromosomes, and is overwhelmed when this number is too…

Cleaning Blast results

Dear All, I am new at writing code and simple scripts. I have a local blastp result file that has multiple hits for a single accession number. I have been able to separate a file with all accession numbers and the “Score” line using the “grep” command. The file gave the results as…

command for common between three files

I have three text files and I want to know which entries differ between the 3 files and which are common to all 3 files. It looks like this: 1st file: hsa_circ_0072810 hsa_circ_0072811 hsa_circ_0072813 hsa_circ_0098750 hsa_circ_0125807 hsa_circ_0000295 hsa_circ_0134603 hsa_circ_0001196 hsa_circ_0097585 hsa_circ_0097586 hsa_circ_0006118 hsa_circ_0080950 hsa_circ_0102355 hsa_circ_0000175 hsa_circ_0000934…

List of human protein coding genes with given name (known function?)

Hello, To put it simply, I am doing differential expression analysis on human RNA-seq data and I want to focus my analysis on genes that are: 1) Protein coding, so no SNOR or MIR 2) Genes with a…

Extract fastq reads by lists of sequences

Hello, I have lists of sequences, and I would like to find the fastq reads that contain these sequences. Is there a tool or any possible programming approach to find fastq reads from specific lists of sequences? My lists of sequences look like the following,…

r – How to replace row names in DESeq2 rlogTransformation matrix with actual gene name info present on another sheet?

I’m new to R and DESeq2 and I’m trying to run differential expression as below: library(DESeq2) count_file_names <- grep("counts", list.files("HTSeq_counts"), value=T) host_type <- c("Damaged","Control") sample_information <- data.frame(sampleName = count_file_names, fileName = count_file_names, condition = host_type) DESeq_data <- DESeqDataSetFromHTSeqCount(sampleTable = sample_information, directory = "HTSeq_counts", design = ~condition) colData(DESeq_data)$condition <- factor(colData(DESeq_data)$condition,levels = c('Damaged','Control')) rld <-…

Filter dosage file by list of SNP IDs

Hello, does anyone by any chance know of a fast/computationally efficient way to select lines in a .dosage file if the first column’s SNP ID is also contained within a .txt document of SNP IDs? The .dosage file is in the…
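
A common pattern for this kind of filter is to load the ID list into a set and stream the large file once. The sketch below illustrates that idea under assumptions: the excerpt cuts off before describing the .dosage layout, so the whitespace-delimited format with the SNP ID in the first column, the function name, and the sample lines are all hypothetical.

```python
def filter_by_ids(lines, wanted_ids):
    """Yield the lines whose first whitespace-delimited field is in wanted_ids."""
    wanted = set(wanted_ids)  # set lookup keeps each membership test O(1)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.split(None, 1)[0] in wanted:
            yield line

# Hypothetical dosage records and SNP ID list.
dosage = ["rs1 A G 0.1 0.9\n", "rs2 C T 1.0 2.0\n", "rs3 A C 0.0 0.5\n"]
kept = list(filter_by_ids(dosage, ["rs3", "rs1"]))
```

Because the dosage file is read line by line, memory use depends on the size of the ID list rather than the size of the dosage file.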

geneiD-genetranscript annotations

Hello, Trying to generate a frame with 2 columns: transcript_id and gene_id, in Linux (gtf from Ensembl) grep -P -o 'ESNCAG\d{11} Equus_caballus.EquCab3.0.104.gtf' > ensecag.txt grep -P -o 'ESNCAT\d{11} Equus_caballus.EquCab3.0.104.gtf' > ensecat.txt wc -l enseca* # To see if both files have the same length They are not the same length:…
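
Extracting the two IDs independently is likely to give lists of different lengths, because gene records in a GTF carry a gene_id but no transcript_id. An alternative is to read both attributes from the same line, so each pair stays together. The following is a minimal sketch of that idea, not the thread's answer; the function name and the sample records (with made-up 11-digit IDs) are hypothetical.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]+)"')
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')

def transcript_gene_pairs(gtf_lines):
    """Return sorted unique (transcript_id, gene_id) pairs from GTF attribute columns."""
    pairs = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        attrs = line.rstrip("\n").split("\t")[-1]
        gene = GENE_ID.search(attrs)
        transcript = TRANSCRIPT_ID.search(attrs)
        if gene and transcript:  # gene-level records have no transcript_id
            pairs.add((transcript.group(1), gene.group(1)))
    return sorted(pairs)

# Hypothetical GTF fragment: one gene with two transcripts.
gtf = [
    '1\tensembl\tgene\t1\t100\t.\t+\t.\tgene_id "ENSECAG00000000001"; gene_version "1";\n',
    '1\tensembl\ttranscript\t1\t100\t.\t+\t.\tgene_id "ENSECAG00000000001"; transcript_id "ENSECAT00000000001";\n',
    '1\tensembl\texon\t1\t50\t.\t+\t.\tgene_id "ENSECAG00000000001"; transcript_id "ENSECAT00000000001";\n',
    '1\tensembl\ttranscript\t1\t100\t.\t+\t.\tgene_id "ENSECAG00000000001"; transcript_id "ENSECAT00000000002";\n',
]
pairs = transcript_gene_pairs(gtf)
```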

How to get the total genic and intergenic length of a chromosome?

It looks like you have a .gtf file. That means you can extract the exon lines from the .gtf file and count and sum up the exonic intervals. You can generate a sorted .bed file of exon coordinates by: grep -P '\texon\t' your.gtf | cut -f 1,4,5 | sort -k1,1…
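
The "extract, merge, and sum" idea can be sketched end to end in Python. This is an illustration under assumptions, not the answer's own pipeline (which is truncated above): it treats GTF coordinates as 1-based and inclusive, merges overlapping exons per chromosome so shared bases are counted once, and uses a hypothetical function name and sample records.

```python
from collections import defaultdict

def exonic_length(gtf_lines):
    """Return {chromosome: number of bases covered by at least one exon}."""
    by_chrom = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        f = line.split("\t")
        if len(f) >= 5 and f[2] == "exon":
            by_chrom[f[0]].append((int(f[3]), int(f[4])))

    totals = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        total = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:          # overlaps the current block: extend it
                cur_end = max(cur_end, end)
            else:                         # gap: close the block and start a new one
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        totals[chrom] = total
    return totals

# Hypothetical records: two overlapping exons and one separate exon on chromosome 1.
gtf = [
    '1\tsrc\tgene\t1\t300\t.\t+\t.\tgene_id "G1";\n',
    '1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "G1";\n',
    '1\tsrc\texon\t50\t150\t.\t+\t.\tgene_id "G1";\n',
    '1\tsrc\texon\t201\t300\t.\t+\t.\tgene_id "G1";\n',
]
totals = exonic_length(gtf)
```

The same merge applied to gene records instead of exon records would give the genic length, and subtracting that from the chromosome length would give the intergenic length.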

Split Fasta file and rename output files with contig names

Hello! I am trying to split a large fasta file (19,336 lines) into individual contigs. The file is set up as follows:
>k141_284136 flag=1 multi=3.0000 len=1875
AGCCTACATTGGCAAGGTACTGCTTTTGTCGCCCATCGTTGGCGAATTTGCTAATGAGAACACACGGAT
>k141_407195 flag=1 multi=5.0000 len=1723
GCCAGTAGTTTTCAGATTTTCAATTACTTTCTTTGCTTCTTTTAACGCAGCCGCAAAGTTGTCATCAAGTTCTCCACCCTGTGCAATATGTTTATATAGAATGCTGCTTACTTTGTCAGCAA
>k141_169332 flag=1 multi=3.0000 len=20
ATTATCCATCCTATTCATCGCTTGATGAAATGTTGCAAAATTCCAAAGATTTTCAGCGTCAAATCGTTCGTATATCCTAATTAAACACCGCTAAAAGTTATGTCTAAGCAATCTTTAA
I am…
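
A split of this kind can be sketched in a few lines of Python. This is a hedged sketch, not the thread's solution: it names each contig by the first word of its header (for example `k141_284136`), joins wrapped sequence lines, and writes one file per contig; the function names, the `.fasta` suffix, and the shortened sample sequences are assumptions.

```python
from pathlib import Path

def read_fasta(lines):
    """Return {contig_name: sequence}, naming each contig by the first word of its header."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}

def write_records(records, out_dir):
    """Write each contig to <out_dir>/<contig_name>.fasta."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, seq in records.items():
        (out / f"{name}.fasta").write_text(f">{name}\n{seq}\n")

# Hypothetical, shortened input in the layout shown in the excerpt.
fasta = [
    ">k141_284136 flag=1 multi=3.0000 len=1875\n",
    "AGCCTACATT\n",
    "GGCAAGG\n",
    ">k141_407195 flag=1 multi=5.0000 len=1723\n",
    "GCCAGTAG\n",
]
records = read_fasta(fasta)
```

Calling `write_records(records, "contigs")` would then create one file per contig in a `contigs` directory.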

Vcf file sorting

I got a vcf file from my instructor. It is VEP-annotated with over 50 options separated by ||. I noticed that the vcf is not arranged into the appropriate columns, so I decided to sort it. I used this code to sort my vcf file by position:…

More than one archive specified. Try –help.

Package: routine-update Version: 0.0.6 Severity: important Hi Andreas, when working on making sure the python-biopython watch file was appropriately fixed, I saw routine-update choke with the following error: $ routine-update gbp:info: Fetching from default remote for each branch gbp:info: Branch ‘master’ is already up to date. gbp:info: Branch ‘pristine-tar’ is already up to date. gbp:info: Branch…

SRA/ENA library layout is inconsistent with the data source

Project number: PRJNA505380 An example of Run accession: SRR8244780 Issue: Inconsistency between the library layout of Run and data source. According to the library layout labeled in both ENA and SRA, Runs in Bioproject PRJNA505380 should be paired-end reads data. But some of them only have a single fastq and without…

How To Filter Mapped Reads With Samtools

Hi, You get a bam (machine readable sam) file after mapping, and it contains information about mapped and unmapped reads. To get the unmapped reads from a bam file use: samtools view -f 4 file.bam > unmapped.sam The output will be in sam; to get the output in bam, use:…
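
The `-f 4` option selects records whose FLAG has bit 4 (read unmapped) set, and `-F 4` selects the complement. The bit test itself can be sketched in Python on SAM text lines; this is only an illustration of the flag logic, with a hypothetical function name and sample records.

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"

def split_by_mapping(sam_lines):
    """Return (mapped, unmapped) lists of alignment lines; header lines are skipped."""
    mapped, unmapped = [], []
    for line in sam_lines:
        if line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        (unmapped if flag & UNMAPPED else mapped).append(line)
    return mapped, unmapped

# Hypothetical records with FLAG values 0, 4, 77 and 99.
sam = [
    "@HD\tVN:1.6\n",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "r4\t99\tchr1\t200\t60\t4M\t=\t300\t104\tACGT\tIIII\n",
]
mapped, unmapped = split_by_mapping(sam)
```

A FLAG of 77 combines several bits (1 + 4 + 8 + 64), which is why the test uses a bitwise AND rather than comparing the FLAG to 4.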

Biostar Systems

Comment: STAR vs Novoalign IGV Browser visualization by chasem ▴ 10 That is good to know that it isn’t just my set of reads…still concerning, though. Comment: STAR vs Novoalign IGV Browser visualization by chasem ▴ 10 I was not expecting this, not sure what to make of it…

How to fix GTF files by adding specific strings into empty gene_id “”

Hi, I want to repair a GTF file by adding a unique string (such as Product name) to empty gene_id "". I would really appreciate it if anyone could provide any solution. For example: grep -m1 'gene_id…

Download nucleotide sequence with locus_tag

I have a list of locus_tags. My idea was to download them using esearch, but the downloaded file is not the desired gene; instead, the nucleotide sequence of the entire contig is downloaded. In this example, my gene of interest to download has 830…

TPM to logFC and pvalues

Hi, I assume you have to find differential expression between two cell lines (Cx and Dx groups). Since you need logFC and Pvalue, this R code can work. And you can use obtained matrix (mysample) to calculate FDR of your interest. mysample <- read.table("./mymatrix.csv", sep=",", header=TRUE) for(i in 2:nrow(mysample)) {…

Difference in alignment length between FASTA and HitTable

Hello all, I’ve a horrible feeling this is going to be a stupidly obvious answer but I’ve had no luck finding a similar question amongst the forum or in the BLAST manual. I’ve used BLAST on some sequences. I’ve then downloaded…

EOF marker absent in VCF

EOF marker absent in VCF – can this be safely ignored? 0 Hi, I generated a VCF file using a bcftools mpileup | bcftools call pipeline. I have done this before, and the file produced then looks fine. However, the log for this one had [W::bgzf_read_block] EOF marker is absent….

Error while subsetting VCF – error doesn’t check out with (z)grep

I’m using bcftools view -s to subset a VCF.gz file. I ran into an error: [E::vcf_parse_format] Number of columns at chr9:44897051 does not match the number of samples (90 vs 99) To look at this site, I ran…

Integrated Dimension Reduction Plot for CD4/CD8 sorted Feedback

Hello, I have recently adopted the Harvard Chan Bioinformatics Core guidelines for SC QC/Normalization/Clustering (hbctraining.github.io/scRNA-seq_online/schedule/links-to-lessons.html). I have integrated CD4+/CD8+ T cells from two time points. I recently received feedback that my integrated dimension reduction plot clustering looked problematic. Specifically, the…

How to separate sub-families from transposons sequence based fasta files?

I’m working on the classification of transposable elements. I want to retrieve sequences of their sub-classes in separate files. Is there any code or tool available to separate their sub-families, because the dataset contains thousands of sequence entries for different…

Where To Find Annotation File For Agilent Microarray?

An easier way that has [probably] only come about since this question was posted is via biomaRt in R. You can build annotation tables for Agilent 4×44 arrays for mouse and human as follows: require(biomaRt) Homo sapiens # agilent_wholegenome_4x44k_v1 mart <- useMart('ENSEMBL_MART_ENSEMBL') mart <- useDataset('hsapiens_gene_ensembl', mart) annotLookup <- getBM( mart…
