Category: gcta

Anyone could help me with cutadapt?

I’m trying to trim the primers from my Illumina sequences. I amplified with Diat_rbcL_F1 AGGTGAAGTAAAAGGTTCWTACTTAAA, Diat_rbcL_F2 AGGTGAAGTTAAAGGTTCWTAYTTAAA and Diat_rbcL_F3 AGGTGAAACTAAAGGTTCWTACTTAAA as forward primers, and Diat_rbcL_R1 5’CCTTCTAATTTACCWACWACTG 3’ (reverse complement: 3’CAGTWGTWGGTAAATTAGAAGG 5’) and Diat_rbcL_R2 5’CCTTCTAATTTACCWACAACAG 3’ (reverse complement: 3’CTGTTGTWGGTAAATTAGAAGG 5’) as reverse primers. First, I used PEAR to assemble the paired reads…
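The reverse complements quoted in the excerpt can be checked with a few lines of Python. This is only a sketch: the function name `reverse_complement` and the translation table are illustrative and not part of cutadapt or PEAR. The table covers the IUPAC ambiguity codes, so W (A or T) maps to itself and Y (C or T) maps to R.

```python
# IUPAC complement table: each code maps to the code for the complementary bases.
# W (A/T), S (C/G) and N are their own complements.
IUPAC_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVN",
    "TGCAYRSWMKVHDBN",
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence with IUPAC codes."""
    return seq.upper().translate(IUPAC_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Reverse primers as given in the post.
    for name, primer in [
        ("Diat_rbcL_R1", "CCTTCTAATTTACCWACWACTG"),
        ("Diat_rbcL_R2", "CCTTCTAATTTACCWACAACAG"),
    ]:
        print(name, reverse_complement(primer))
```

Both results agree with the sequences listed in the post. Read 5′ to 3′, these are the forms a reverse primer would take at the end of a merged read whose orientation starts with the forward primer.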


Download FASTA sequences for known viral reference genomes

Take a look at this report file for viral genomes. I would only need DNA based viruses, and ones that infect humans. You can filter/parse out the entries you need from it, then download the genome sequence using EntrezDirect: $ efetch -db nuccore -id NC_030449.1 -format fasta >NC_030449.1 Unidentified circular ssDNA…


Revision DNA sequencing and genotyping – 300820 – Genes, Genomics And

GGHH 300820 2021 Workshop: DNA and RNA sequencing and Genotyping. Why do we need to sequence the human genome? Why do we need to sequence genomes from different populations? DNA Sequencing. Targeted DNA sequencing (Sanger Sequencing): sequencing small targeted regions of the genome to test for the presence/absence of mutations…


samtools to count the number of reads mapped to each spike-in for each sample

My goal is to use STAR to create a new genome that includes the spike-ins listed below, by combining hg38.fa and the spike-in sequences. Once I have the genomes created, I’ll align FASTQs to this newly created…
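As a rough illustration of the counting step named in the title, the sketch below tallies mapped reads per reference name from SAM text (for example, the output of `samtools view` piped into a script). The function name and the decision to skip secondary and supplementary alignments are assumptions made here, not something the post specifies. The flag values 0x4, 0x100 and 0x800 come from the SAM format.

```python
from collections import Counter

# SAM FLAG bits: read unmapped, secondary alignment, supplementary alignment.
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def count_mapped_per_reference(sam_lines):
    """Count primary mapped records per reference name (SAM column 3)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, rname = int(fields[1]), fields[2]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        counts[rname] += 1
    return counts
```

Filtering the resulting counter by the spike-in sequence names would then give one count per spike-in for each sample.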


sam2tsv listing incorrect reference sequence & positions

Duplicate of: github.com/lindenb/jvarkit/issues/190. Hi, can anyone help me resolve the issue I’m having with sam2tsv? It is a nifty piece of software, but I have been encountering issues with the numbering of nucleotides it shows for the reference sequence. Here’s what sam2tsv tells me: The nucleotide string marked…


How to assemble read with a minimum 2 coverage per site

Hi, I have a query regarding read assembly. I have a BAM file from which I made a consensus sequence, but I want to make a consensus sequence with a minimum coverage of 2 per site instead of full coverage…


Exclude specified range of bases from multiple sequences in a FASTA file

Hi, I am trying to remove ranges of bases from sequences within a FASTA file, in multiple places, based on the header ID and the positions that I specify. For example, I have a file A.fa: >ID1 TTGTTCAACGGATCCACCTGTTGCCAAGAGTGCTTCAGTACATTGCTCACGGCTGAATCCCATATCCATCAAAGCACAAGATTTGAATTCACTCGAGGATCTGCTTCGTCGACCATTGGAAATGAAAAAATTACAATTACACATTGAATTTGTAAAGCTTGAAATTAATGAACTTACCAAAATAGATTTGCACACAGAAGCAACAGCTTGGCCGTGTTACAACTTGTAACGGGTAAAGACAAAATCGCTAACAACGGTTGTAGGCCACCATGTTCCACAAATTCACGACA…
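One way to approach this kind of task is sketched below. It assumes the positions are 1-based and inclusive, which the truncated excerpt does not confirm, and the function name `exclude_ranges` is made up for illustration. All positions are collected first and resolved against the original sequence, so removing one range does not shift the coordinates of the next.

```python
def exclude_ranges(seq, ranges):
    """Remove 1-based, inclusive (start, end) ranges from a sequence.

    Positions refer to the ORIGINAL sequence, so overlapping or
    unordered ranges are handled without any coordinate shifting.
    """
    drop = set()
    for start, end in ranges:
        drop.update(range(start - 1, end))  # convert to 0-based indices
    return "".join(base for i, base in enumerate(seq) if i not in drop)


if __name__ == "__main__":
    # Hypothetical ranges keyed by header ID; first 10 bases of ID1 from the post.
    to_remove = {"ID1": [(1, 3), (9, 10)]}
    records = {"ID1": "TTGTTCAACG"}
    for header, seq in records.items():
        print(f">{header}")
        print(exclude_ranges(seq, to_remove.get(header, [])))
```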


Identify Mapped reads and Unmapped Mate pairs

Hey, I have a sorted BAM file that I got by mapping against a reference. I want to identify two things from this BAM file. First, how many reads mapped to each gene/contig in my BAM file. Second, as this was paired-end data, I want…


How to make BLASTN be aware of short read?

I’m using blastn (anaconda.org/bioconda/blast) to find similar sequences of a target sequence against a FASTA file. But my read is quite short (68 bases). I realised that blastn won’t report any hit. But there is actually a very good one in the FASTA file after checking manually. Here is the…


biopython write fasta

Step 1 − Create a file named blast_example.fasta in the Biopython directory and give the sequence information below as input. 3. """Bio.SeqIO support for the "fasta" (aka FastA or Pearson) file format. Then we save this line of text to the output file: Now we have finished all the genes,…


grep command for fasta header

I used this command: grep -Fw -A 1 -f header.txt test.fa >test_result.fa, but it extracts only one header, not all of those present in my header.txt file. My header.txt file looks like: hsa_circ_0000006 hsa_circ_0000014 hsa_circ_0000015 hsa_circ_0000042 hsa_circ_0000070 hsa_circ_0000072 hsa_circ_0000131 hsa_circ_0000133 hsa_circ_0000160 hsa_circ_0000175 hsa_circ_0000211…
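For comparison, here is a small Python sketch of the same extraction. It is not the fix from the original thread, and the function name `select_records` is made up. Two assumptions are labeled in the code: the record ID is taken to be the first whitespace-delimited token of the header, and IDs are stripped of surrounding whitespace, which also removes the carriage returns left by Windows line endings (one possible reason a pattern file matches fewer lines than expected). Unlike `-A 1`, it also keeps sequences that span several lines.

```python
def select_records(fasta_lines, wanted_ids):
    """Yield the lines of every FASTA record whose ID is in wanted_ids."""
    # Assumption: stray whitespace (including '\r') is not part of an ID.
    wanted = {i.strip() for i in wanted_ids if i.strip()}
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # Assumption: the ID is the first token after '>'.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted
        if keep:
            yield line


if __name__ == "__main__":
    # File names follow the post: header.txt, test.fa, test_result.fa.
    with open("header.txt") as ids, open("test.fa") as fasta, \
            open("test_result.fa", "w") as out:
        out.writelines(select_records(fasta, ids))
```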


Understanding Conditional analysis

Hello, I’m new to data analysis and I’m having some trouble understanding the process, so I need help with a few questions. I am running a conditional analysis on chromosome 19 in relation to Alzheimer’s disease. I understand that conditional analysis tests whether SNPs have association independent…


cDNA: Genbank U77627

chaperonin containing TCP1, subunit 7 cDNA: Genbank AF506229   hi800a, 2191 and hi3979 are all in the 1st intron   cgacggcgacgagtcgactagagagttctaggtgcattgtgggtacctca cccGGCACCGTCCTCCTCTATTCCGCAATCATGATGgtgagtgtgtacat gcctgttttaccttcaaatcctttgtcatctgtccattttactttattct tggtgttttgacataaaagcctggggacaatatagcataaagtacatgtt tcttcgattatttttgttttcaatgaattcataagctgacgtttttcatt gatgctagttagctttgtggctaacgattcattcttcaatttgagaatgc cagtttgaattatgagctttgta hi2191 (5′-3′) ttgtgacc approx hi800a (3′-5′) caacgtgcaaaatgcgacgtctgtgtaataatgtgcacattttttttatt atgcttttattttatttttttcttcggattgttttaatcgtataggttga taaaggatgccatggtttagtttagcatcatttaggtaacgtcaatggag aaagagctaacggtaatgatcttttacctttaaaacctcaagggttgttc ctctcagctgtcactccaaaaaacgtaatggaacggttttttttctagga caaagacttttttattacagacacacaaagatgttagcgatttttgctta tgacagcaagttgtttgtatgacagcaagtcgtttgagaaaccctgaaca g hi3979 (5′-3′) cctttaactattagacgctaatgtttttatgatttccaagtaacttttat gaagcatatttaaaccttttcagaaaaaagtgatttacacacttggtaag attataataatctgcataatatggagaatgtcagtattattatatatgtg tactagtttggttttctcttaagagtactaaaattagttgtcattaaaaa aggcgacgtattaatgataatgcattaattcttgaattttttcctacagT CCACTCCAGTCATCCTCTTGAAAGAGGGCACAGACACCTCTCAGGGGGTC CCACAACTGGTCAGCAACATAAATGCCTGCCAGGTTGTGGCAGAGGCTGT…


Characterization and expression of DNA sequences encoding the growth hormone gene in African Pygmy Mouse (Mus minutoides)

Abstract: We determined the nucleotide sequence of the growth hormone (Gh) gene in Mus minutoides, one of the smallest mammals, which was predicted to be distinct in its functional regions between M. minutoides and Mus musculus. To investigate the evolutionary characteristics of Gh in M. minutoides, we constructed a phylogenetic…


How to extract unique mapped results from Bowtie2 bam results?

I used samtools view -bq 1 WG.bam > unique.bam. However, my results contain 54792 lines; why is it not 42097? After I have the subset of those reads, how can I extract them from the SAM or BAM file to create a…


Primer3 issue with the Sequence_ID

Hi everyone, I am trying to design primers for my sequences, but I get an error that I do not understand. I made my finputfile.txt and ran the command below: aka@aka:~/Desk/Primer$ /home/aka/primer3/src/primer3_core < input.txt > result.txt input.txt PRIMER_TASK=generic PRIMER_PICK_LEFT_PRIMER=1 PRIMER_PICK_INTERNAL_OLIGO=0 PRIMER_PICK_RIGHT_PRIMER=1…


Segemehl -D option doesn't work for allowing differences

I am trying to map some specific short reads (19-20 nt) against the long reads in a FASTA file (F1.fasta). I used the Segemehl tool to index the F1.fasta file (long reads) and then used the command line below to perform the alignment:…


Exec format error in unmapped bam file

Hello, I created an unmapped BAM file from a FASTQ file (sample 1). When I tried to search the BAM file by query name, I got the ‘Exec format error’. #1_ucheck.bam: unmapped BAM file from the Sample 1 FASTQ file. Code: samtools view 1_ucheck.bam |…


multiple fasta to single fasta

I have a huge reference genome with a lot of contigs. It looks something like this: >aalba5_s00000010 TTGTCTGCTTCACAGTACAGCTAGAAAATTATGAATTCATTTCCCCACATCAAGCAACCCCTGCTTATTC >aalba5_s00000011 ACTTGGAATGGGATCTTGTTGGGGGGCCAACAGAACCATAAGGGCAATGGCTGCAATCTTTGATAAGATC >aalba5_s00000012 TGTAGCAAACAGCTACGGAAAAATTTTAAAAATTTTCGAAATTTAAATCTGGGGTTCCCTTTCCTGTGTA GATGTATTCCCTTTTTAAAGGTTTTCCTAGGACTTGCAGTCATTAATGAGACGTCTTCTCATGATATCCT AATTTTTGGAAGATGCCTCCTACATCAGGAATCTTTGCTGCCACTTGTCTCTTTCATCAGCCAGATGTCT How can I split this so that each contig ends up in its own file, named after the contig…


Kaggle Learn Pandas

Listing results for Kaggle Learn pandas. Learn Pandas Tutorials, Kaggle Learn (Kaggle.com, All Courses): Prepares you for these Learn Courses: Geospatial Analysis, Data Cleaning. Tags: pandas. Instructor: Aleksey Bilogur, Educator. Aleksey is a civic data specialist and … 1. Creating, Reading and Writing 2. Indexing,…


Replace fasta header using bash : bioinformatics

Hello people, I got stuck with my new script and perhaps you can help me. Its goal is to take an input table with queries and subjects (produced by a local BLAST) and replace the query names with the subject names in the corresponding FASTA file. In detail, the table input file…


Calculation of the GRM in GCTA

Hi experts, I am new to the field. I have some questions about calculating the GRM with the GCTA software. Let’s say I have a population of genome data for N = 4800000, and the original genotype SNPs, and also the…


Split Fasta file and rename output files with contig names

Hello! I am trying to split a large FASTA file (19,336 lines) into individual contigs. The file is set up as follows: >k141_284136 flag=1 multi=3.0000 len=1875 AGCCTACATTGGCAAGGTACTGCTTTTGTCGCCCATCGTTGGCGAATTTGCTAATGAGAACACACGGAT >k141_407195 flag=1 multi=5.0000 len=1723 GCCAGTAGTTTTCAGATTTTCAATTACTTTCTTTGCTTCTTTTAACGCAGCCGCAAAGTTGTCATCAAGTTCTCCACCCTGTGCAATATGTTTATATAGAATGCTGCTTACTTTGTCAGCAA >k141_169332 flag=1 multi=3.0000 len=20 ATTATCCATCCTATTCATCGCTTGATGAAATGTTGCAAAATTCCAAAGATTTTCAGCGTCAAATCGTTCGTATATCCTAATTAAACACCGCTAAAAGTTATGTCTAAGCAATCTTTAA I am…
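A minimal sketch of one way to do this in Python follows. It assumes that the contig name is the first token of the header (so `k141_284136` for the first record above) and that names are safe to use as file names. The function name `split_fasta` and the `.fasta` extension are illustrative choices, not taken from the thread.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to <out_dir>/<contig>.fasta.

    Returns the list of files written, in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    handle = None
    try:
        with open(fasta_path) as fasta:
            for line in fasta:
                if line.startswith(">"):
                    if handle is not None:
                        handle.close()
                    # Assumption: the contig name is the first token after '>'.
                    name = line[1:].split()[0]
                    path = out_dir / f"{name}.fasta"
                    handle = open(path, "w")
                    written.append(path)
                if handle is not None:
                    handle.write(line)  # header and sequence lines alike
    finally:
        if handle is not None:
            handle.close()
    return written
```

Only one output file is open at a time, so the approach should scale to files with many contigs. A record whose name repeats an earlier one would overwrite that earlier file.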


samtools returns error – cigar and query sequence are of different length even though cigar and query sequence are the same length

I have written a program in Python that processes mapped results and outputs them as either a BAM or SAM file. It works fine until I have indels…


How to determine exact tRNA sequence

I use tRNAscan-SE v2.0.9 for tRNA prediction. One of the predicted tRNAs is trnG-GCC (with CORE 16.5). The sequence, at coordinates 91435-91506, is shown below: AGCGGAAGGATGAACCCTCAACCTCAGCCTTGGCAAGGCTATGCTCTACCATTAAGATTAAGCTATTTCCGC I also use the MITOFY webserver for annotation. The same trnG-GCC (with CORE 25.78) was predicted by the…


Disappearing CB, the bam tag after samtools sort -t CB

I’ve been trying to set up an analysis pipeline for RNA velocity in AWS EC2. I used one of the 10x datasets, 10k Peripheral blood mononuclear cells (PBMCs) from a healthy donor, Single Indexed, as a test model to set up the pipeline. For speed and cost savings, I first used samtools to…


Split fastq according to barcodes

Hello, everyone: I’ve recently been analyzing my scRNA-seq data. The first step is to split the FASTQ files according to my barcode file, which looks like this: sc1 AACGTGAT sc2 AAACATCG sc3 ATGCCTAA sc4 AGTGGTCA sc5 ACCACTGT sc6 ACATTGGC sc7 CAGATCTG sc8 CATCAAGT sc9 CGCTGATC sc10 ACAAGCTA sc11 CTGTAGCC sc12 AACGCTTA My…
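The excerpt is cut off before it says where the barcode sits in each read, so the sketch below makes a loud assumption: the 8-base barcode is the first 8 bases of the read sequence, matched exactly. The function names are illustrative. If the barcode lives in the header or in a separate index read, the lookup line would need to change.

```python
from collections import defaultdict
from itertools import islice


def read_barcodes(lines):
    """Parse 'sample barcode' lines into a {barcode: sample} dict."""
    barcodes = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            barcodes[fields[1]] = fields[0]
    return barcodes


def demultiplex(fastq_lines, barcodes, barcode_len=8):
    """Group 4-line FASTQ records by sample.

    ASSUMPTION: the barcode is the first `barcode_len` bases of the read.
    Reads with no exact barcode match go to the 'unassigned' bin.
    """
    bins = defaultdict(list)
    lines = iter(fastq_lines)
    while True:
        record = [line.rstrip("\n") for line in islice(lines, 4)]
        if len(record) < 4:  # end of input (or a truncated final record)
            break
        sample = barcodes.get(record[1][:barcode_len], "unassigned")
        bins[sample].append(record)
    return dict(bins)
```

Holding every record in memory is fine for a small test but not for a full sequencing run. A version for real data would write each record to a per-sample file as it is read.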


Download nucleotide sequence with locus_tag

I have a list of locus_tag values. My idea was to download them using esearch, but the downloaded file is not the desired gene; instead, the nucleotide sequence of the entire contig is downloaded. In this example, my gene of interest has 830…


Finding 16 mer not present in GRCh38

Thanks for the question – it has kept me busy this Sunday morning / afternoon. As implied by others, this poses a computational challenge but is not insurmountable. For motif searching generally, I usually use AWK. My approach here was to: generate all possible k-mers of the chosen size (run…
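The general idea (enumerate every possible k-mer, then discard those seen in the sequence) can be shown on a toy scale in Python. This is not the AWK approach from the answer, and the function name `absent_kmers` is made up. It considers only the given strand, and it is practical only for small k: at k = 16 there are 4^16 = 4,294,967,296 possible k-mers, which is too many to hold in a Python set.

```python
from itertools import product


def absent_kmers(sequence, k):
    """Return the sorted k-mers over ACGT that never occur in `sequence`.

    Toy-scale sketch: only the given strand is scanned, and all 4**k
    candidates are held in memory at once.
    """
    sequence = sequence.upper()
    present = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    all_kmers = {"".join(p) for p in product("ACGT", repeat=k)}
    return sorted(all_kmers - present)


if __name__ == "__main__":
    print(absent_kmers("AACCGGTT", 2))
```

Windows containing N or other non-ACGT characters are never among the candidates, so they are ignored in the subtraction.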


illumina adapter specifying and removing using fastp

Dear all, recently I have been asked to preprocess some FASTQ files produced by Illumina (I don’t know which machine produced the data). This is a record from one FASTQ file (forward): @A00957:111:H5MTHDSX2:3:1101:2718:1063 1:N:0:TCCGCGAA+AGGCTATA CTGACCTCAAGTGATCTACCCACCTCGGTCTCCCAAAGTGCTGGGATTACAGGCAGGAGCCACTGCCCCTGGCCCTAATCATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCCGCGAAATCTCGTATGCCGGCGTCTGCTTGAAA When I asked the company for the adapter sequences, they provided them as D710-501 TCCGCGAATATAGCCT…

