assign in pandas pipeline – Stackify

You can use pipe:

# Wrap the chain in parentheses so it can span several lines
tmp_df = (
    df
    .drop("Gene type", axis=1)
    .rename(columns={
        "Gene stable ID": "ENSG",
        "Gene name": "gene_name",
        "miRBase accession": "MI",
        "miRBase ID": "mirna_name"
    })
    .pipe(lambda x: x.assign(species=x.mirna_name.str[:3]))
)

tmp_df
Out[365]: 
              ENSG gene_name         MI     mirna_name species
0  ENSG00000274494   MIR6832  MI0022677   hsa-mir-6832     hsa
1  ENSG00000283386  MIR4659B  MI0017291  hsa-mir-4659b     hsa
2  ENSG00000221456   MIR1202  MI0006334   hsa-mir-1202     hsa
3  ENSG00000199102   MIR302C  MI0000773   hsa-mir-302c     hsa

As @Tom pointed out, this can also be done without using pipe in this case:

(
    df
    .drop("Gene type", axis=1)
    .rename(columns={
        "Gene stable ID": "ENSG",
        "Gene name": "gene_name",
        "miRBase accession": "MI",
        "miRBase ID": "mirna_name"
    })
    # assign accepts a callable, which receives the renamed frame
    .assign(species=lambda x: x.mirna_name.str[:3])
)
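To try both variants end to end, here is a minimal self-contained sketch. The input frame is rebuilt from the first two rows of the output shown above; the "Gene type" values are an assumption, since that column is dropped before the output is printed.

```python
import pandas as pd

# Hypothetical input, reconstructed from the printed output above.
# The "Gene type" values are assumed; the original data is not shown.
df = pd.DataFrame({
    "Gene stable ID": ["ENSG00000274494", "ENSG00000283386"],
    "Gene name": ["MIR6832", "MIR4659B"],
    "Gene type": ["miRNA", "miRNA"],
    "miRBase accession": ["MI0022677", "MI0017291"],
    "miRBase ID": ["hsa-mir-6832", "hsa-mir-4659b"],
})

new_names = {
    "Gene stable ID": "ENSG",
    "Gene name": "gene_name",
    "miRBase accession": "MI",
    "miRBase ID": "mirna_name",
}

# Variant 1: assign with a callable; x is the frame after drop and rename.
result = (
    df
    .drop("Gene type", axis=1)
    .rename(columns=new_names)
    .assign(species=lambda x: x.mirna_name.str[:3])
)

# Variant 2: the same step routed through pipe.
piped = (
    df
    .drop("Gene type", axis=1)
    .rename(columns=new_names)
    .pipe(lambda x: x.assign(species=x.mirna_name.str[:3]))
)

print(result)
```

Both variants should produce the same frame; the callable form is shorter because assign already passes the intermediate DataFrame to the lambda.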

result = (
    df
    .drop("Gene type", axis=1)
    .rename(columns={
        "Gene stable ID": "ENSG",
        "Gene name": "gene_name",
        "miRBase accession": "MI",
        "miRBase ID": "mirna_name"
    })
    # computed from the original df, so the original column name is used
    .assign(species=df["miRBase ID"].str[:3])
)

Here the new column is computed from the original df, so you reference the column by its original name, df['miRBase ID'], rather than by the renamed one.
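One thing to keep in mind with this approach: a Series passed to assign is aligned on the index, so it only lines up while the chain preserves the original index labels. The sketch below uses a small made-up frame (the "mmu-mir-1a" row is invented for illustration) to show a case where it works and a case where it silently does not.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "miRBase ID": ["hsa-mir-6832", "mmu-mir-1a", "hsa-mir-1202"],
})

is_human = df["miRBase ID"].str.startswith("hsa")

# Filtering keeps index labels 0 and 2, so the Series from the
# original df still aligns with the right rows.
aligned = (
    df[is_human]
    .rename(columns={"miRBase ID": "mirna_name"})
    .assign(species=df["miRBase ID"].str[:3])
)

# After reset_index the rows are relabelled 0 and 1, so label 1 now
# picks up the value from the original row 1 (the "mmu" row).
misaligned = (
    df[is_human]
    .reset_index(drop=True)
    .rename(columns={"miRBase ID": "mirna_name"})
    .assign(species=df["miRBase ID"].str[:3])
)

# A callable is evaluated on the intermediate frame, so it is not
# affected by the relabelling.
safe = (
    df[is_human]
    .reset_index(drop=True)
    .rename(columns={"miRBase ID": "mirna_name"})
    .assign(species=lambda x: x.mirna_name.str[:3])
)
```

If the chain only drops or renames columns, as in the answer above, the index is untouched and referencing the original df is fine.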

I found pandas-ply which introduces a magic symbol X for that purpose:

import pandas as pd
from pandas_ply import X, install_ply
install_ply(pd)

(
    df
    .drop("Gene type", axis=1)
    .rename(columns={
        "Gene stable ID": "ENSG",
        "Gene name": "gene_name",
        "miRBase accession": "MI",
        "miRBase ID": "mirna_name"
    })
    .ply_select("*", species=X.mirna_name.str[:3])
)

It would be nice to have this in native pandas, though.
