You can use pipe:
tmp_df = (
    df
    .drop("Gene type", axis=1)
    .rename(columns={
        "Gene stable ID": "ENSG",
        "Gene name": "gene_name",
        "miRBase accession": "MI",
        "miRBase ID": "mirna_name",
    })
    # pipe passes the renamed intermediate frame to the lambda as x
    .pipe(lambda x: x.assign(species=x.mirna_name.str[:3]))
)
tmp_df
Out[365]:
              ENSG gene_name         MI     mirna_name species
0  ENSG00000274494   MIR6832  MI0022677   hsa-mir-6832     hsa
1  ENSG00000283386  MIR4659B  MI0017291  hsa-mir-4659b     hsa
2  ENSG00000221456   MIR1202  MI0006334   hsa-mir-1202     hsa
3  ENSG00000199102   MIR302C  MI0000773   hsa-mir-302c     hsa
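In case it helps, pipe also accepts a named function plus extra arguments, which can read better than a lambda once the step grows. A minimal sketch, assuming a hypothetical helper add_species with an n_chars parameter (neither comes from the answer above). The sample rows are copied from the output above, and the "Gene type" values are placeholders:

```python
import pandas as pd

# Rows copied from the output above; "Gene type" values are placeholders
# (an assumption), since that column is dropped anyway.
df = pd.DataFrame({
    "Gene stable ID": ["ENSG00000274494", "ENSG00000283386",
                       "ENSG00000221456", "ENSG00000199102"],
    "Gene name": ["MIR6832", "MIR4659B", "MIR1202", "MIR302C"],
    "Gene type": ["miRNA"] * 4,
    "miRBase accession": ["MI0022677", "MI0017291", "MI0006334", "MI0000773"],
    "miRBase ID": ["hsa-mir-6832", "hsa-mir-4659b",
                   "hsa-mir-1202", "hsa-mir-302c"],
})


def add_species(frame, n_chars=3):
    """Hypothetical helper: derive 'species' from the mirna_name prefix."""
    return frame.assign(species=frame["mirna_name"].str[:n_chars])


tmp_df = (
    df
    .drop("Gene type", axis=1)
    .rename(columns={
        "Gene stable ID": "ENSG",
        "Gene name": "gene_name",
        "miRBase accession": "MI",
        "miRBase ID": "mirna_name",
    })
    # pipe calls add_species(intermediate_frame, n_chars=3)
    .pipe(add_species, n_chars=3)
)
print(tmp_df)
```

The helper receives the already renamed frame, so it can use mirna_name directly, and it can be reused or tested on its own.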
As @Tom pointed out, this can also be done without using pipe in this case:
(
    df
    .drop("Gene type", axis=1)
    .rename(columns={
        "Gene stable ID": "ENSG",
        "Gene name": "gene_name",
        "miRBase accession": "MI",
        "miRBase ID": "mirna_name",
    })
    # a callable passed to assign is evaluated on the renamed frame
    .assign(species=lambda x: x.mirna_name.str[:3])
)
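The reason the lambda is needed at all: assign evaluates a callable against the DataFrame it is called on (the renamed intermediate), whereas the original df never gets the new column name. A small sketch of that difference, using a cut-down two-row frame with placeholder "Gene type" values (an assumption):

```python
import pandas as pd

df = pd.DataFrame({
    "Gene type": ["miRNA", "miRNA"],  # placeholder values (assumption)
    "miRBase ID": ["hsa-mir-6832", "hsa-mir-4659b"],
})

renamed = (
    df
    .drop("Gene type", axis=1)
    .rename(columns={"miRBase ID": "mirna_name"})
)

# The callable receives the renamed frame, so "mirna_name" exists there.
with_species = renamed.assign(species=lambda x: x.mirna_name.str[:3])

# drop and rename returned new frames; df itself still has the old names,
# so attribute access to the new name fails on it.
try:
    df.mirna_name
    original_has_renamed = True
except AttributeError:
    original_has_renamed = False

print(with_species)
print("df has mirna_name:", original_has_renamed)
```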
result = (
    df
    .drop("Gene type", axis=1)
    .rename(columns={
        "Gene stable ID": "ENSG",
        "Gene name": "gene_name",
        "miRBase accession": "MI",
        "miRBase ID": "mirna_name",
    })
    # computed from the original df, so the original column name is used
    .assign(species=df["miRBase ID"].str[:3])
)
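Because drop and rename return new DataFrames and leave df untouched, you can still reference the column on the original df under its original name, df['miRBase ID'], when building the new column.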
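One caveat with referencing the original df, as far as I can tell: assign aligns a Series on the index, so this only matches row for row while the chain keeps the original index. A sketch of where it can go wrong, assuming a made-up three-row frame (the mmu row is hypothetical, added so the prefixes differ) and a chain that filters and then resets the index:

```python
import pandas as pd

# Made-up data for illustration; the "mmu" row is hypothetical.
df = pd.DataFrame({
    "miRBase ID": ["hsa-mir-6832", "mmu-mir-1202", "hsa-mir-302c"],
})

# Drop the first row, then renumber the remaining rows from 0.
renamed = (
    df
    .rename(columns={"miRBase ID": "mirna_name"})
    .iloc[1:]
    .reset_index(drop=True)
)

# Series from the original df: aligned by index label, not by position,
# so labels 0 and 1 of df are matched to the renumbered rows.
from_original = renamed.assign(species=df["miRBase ID"].str[:3])

# Callable: evaluated on the renamed frame itself, so rows always match.
from_lambda = renamed.assign(species=lambda x: x["mirna_name"].str[:3])

print(from_original)
print(from_lambda)
```

With the lambda form the species always comes from the row it sits in, which is why it is the safer choice once the chain filters or reindexes.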
I found pandas-ply, which introduces a magic symbol X for that purpose:
import pandas as pd
from pandas_ply import X, install_ply

# adds the ply_* methods to pandas objects
install_ply(pd)

(
    df
    .drop("Gene type", axis=1)
    .rename(columns={
        "Gene stable ID": "ENSG",
        "Gene name": "gene_name",
        "miRBase accession": "MI",
        "miRBase ID": "mirna_name",
    })
    # "*" keeps all existing columns; X stands for the intermediate frame
    .ply_select("*", species=X.mirna_name.str[:3])
)
It would be nice to have this in native pandas, though.