Learning the rules of the Scrabble game behind lymphocyte diversity

Reporter, terrier, retire, retro, port, trio, toe.

What do these seemingly random words have in common? They can all be formed from bits and pieces of the word “repertoire” (hats off to any Scrabble fans in the audience). This concept of adding, deleting, and rearranging the letters of a fixed character set to form a diverse collection of words is readily apparent in our everyday use of language, but it also turns out to be fundamental to the way our immune systems work. You see, in order for our B and T cells to recognize pathogens, they need B- and T-cell receptors specific to the dazzling variety of pathogens we may encounter over our lifetimes, including ones our bodies have never seen before. How do they manage to create such a diverse repertoire of receptors? The same way that I created the words above, except that instead of using the letters of a word, they randomly shuffle three types of genes (termed V-, D-, and J-genes) to form receptors in an intricate molecular ballet known as V(D)J recombination.

Each unique B- or T-cell receptor (BCR or TCR) has a variable binding region assembled from a single V-, D-, and J-gene, each chosen from a collection (or ‘word bank’, in keeping with the Scrabble theme) present in the human genome. Receptor diversity results from shuffling these V/D/J genes combined with the addition and deletion of nucleotides at their ends; in particular, the deletion of short DNA sequences at the ends of these genes is catalyzed by a protein called Artemis. While scientists have a decent grasp of the mechanism behind V/D/J shuffling, the mechanism of nucleotide deletion by Artemis is much less well understood, despite being crucial for generating a diverse immune repertoire. Magdalena Russell, a graduate student in the Matsen Group at Fred Hutch’s Public Health Sciences Division, is out to change that. Her recent publication in eLife, undertaken with support from Dr. Noah Simon of the UW School of Public Health and Dr. Phil Bradley of the Fred Hutch Public Health Sciences Division, takes a crack at Artemis’ mechanism of action using a statistical approach.
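To make the shuffle, trim, and add idea concrete, here is a minimal Python sketch of a toy recombination event. Everything in it is a hypothetical simplification for illustration: the gene names and sequences in the word bank are made up, the limits `max_trim` and `max_insert` are arbitrary, and the sketch ignores the real enzymology, including how Artemis actually decides where to cut.

```python
import random

# Hypothetical toy "word bank": these short sequences are made up for
# illustration; real V, D, and J genes are longer and far more numerous.
V_GENES = {"V1": "TGTGCCAGCAGT", "V2": "TGTGCCTGGAGT"}
D_GENES = {"D1": "GGGACAGGG", "D2": "GGGACTAGCGGG"}
J_GENES = {"J1": "AACACTGAAGCTTTC", "J2": "CTCCTACGAGCAGTAC"}
NUCLEOTIDES = "ACGT"


def trim(seq: str, left: int, right: int) -> str:
    """Delete `left` nucleotides from the start and `right` from the end."""
    # len(seq) - right (rather than -right) keeps right == 0 from emptying the string.
    return seq[left:len(seq) - right]


def random_insert(rng: random.Random, max_len: int) -> str:
    """Return 0..max_len randomly chosen nucleotides to add at a junction."""
    return "".join(rng.choice(NUCLEOTIDES) for _ in range(rng.randint(0, max_len)))


def recombine(rng: random.Random, max_trim: int = 3, max_insert: int = 3):
    """Shuffle: pick one gene of each type; then trim ends and add nucleotides."""
    v_name, v = rng.choice(sorted(V_GENES.items()))
    d_name, d = rng.choice(sorted(D_GENES.items()))
    j_name, j = rng.choice(sorted(J_GENES.items()))
    # Deletion: trim the end of V that faces D, both ends of D, and the start of J.
    v = trim(v, 0, rng.randint(0, max_trim))
    d = trim(d, rng.randint(0, max_trim), rng.randint(0, max_trim))
    j = trim(j, rng.randint(0, max_trim), 0)
    # Addition: random nucleotides at the V-D and D-J junctions.
    seq = v + random_insert(rng, max_insert) + d + random_insert(rng, max_insert) + j
    return (v_name, d_name, j_name), seq


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        genes, seq = recombine(rng)
        print(genes, seq)
```

Even with only two made-up genes of each type, the random trims and additions multiply the number of possible outputs far beyond the 2 × 2 × 2 = 8 gene combinations, which is the point of the Scrabble analogy.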

“Artemis first caught our eye as a result of a previous genome-wide association study (GWAS) which we undertook to identify genetic variants in individuals with affected TCR repertoires,” Russell noted. “It was previously known that Artemis was important for nucleotide trimming in this context, but as it turns out, the exact mechanism by which Artemis does its cutting is still not really understood.” Russell, a proud computational biologist, set out to fill this knowledge gap using statistical inference. Fancy mathematics aside, the concept that Russell and colleagues employed is relatively straightforward: using a previously generated dataset of TCR sequences from nearly 700 individuals, the team built a probabilistic model that predicts the trimming probability of a given input sequence. Russell then trained the model on a subset of the sequence data, instructing it to pay attention to certain interpretable features of the sequences (such as DNA shape, GC content, or sequence length). By changing which features the model used and examining which ones affect (or don’t affect) its prediction accuracy, Russell was able to identify relevant sequence features that provide clues to Artemis’ function in vivo.
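The train-then-compare-features strategy can be sketched in a few dozen lines of Python. This is a simplified stand-in, not the model from the eLife paper: it assumes a softmax (conditional-logit-style) model over candidate trim lengths, two made-up features (`trim_length` and the GC content of the nucleotides that would be removed), and simulated sequences in place of real TCR data. The names `MAX_TRIM`, `features`, `fit`, and `simulate` are hypothetical. The model is fit on one subset of the data and scored by average log-likelihood on the rest; a feature counts as informative if adding it improves that held-out score.

```python
import math
import random

MAX_TRIM = 6  # hypothetical: candidate trim lengths are 0..MAX_TRIM nucleotides


def features(seq, t, use):
    """Interpretable features of trimming `t` nucleotides off the end of `seq`."""
    removed = seq[len(seq) - t:]  # empty string when t == 0
    gc = sum(base in "GC" for base in removed) / t if t > 0 else 0.0
    available = {"trim_length": t / MAX_TRIM, "gc": gc}
    return [available[name] for name in use]


def feature_table(seq, use):
    """One feature vector per candidate trim length."""
    return [features(seq, t, use) for t in range(MAX_TRIM + 1)]


def trim_probs(table, w):
    """Softmax over candidates: P(t | seq) is proportional to exp(w . f(seq, t))."""
    scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in table]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit(data, use, steps=150, lr=2.0):
    """Maximize the average log-likelihood of observed trims by gradient ascent."""
    tables = [(feature_table(seq, use), t) for seq, t in data]
    w = [0.0] * len(use)
    for _ in range(steps):
        grad = [0.0] * len(use)
        for table, t_obs in tables:
            probs = trim_probs(table, w)
            for k in range(len(use)):
                expected = sum(p * f[k] for p, f in zip(probs, table))
                # Gradient of log P(t_obs) is observed minus expected features.
                grad[k] += table[t_obs][k] - expected
        w = [wi + lr * g / len(tables) for wi, g in zip(w, grad)]
    return w


def avg_log_lik(data, w, use):
    """Average held-out log-likelihood; higher means better predictions."""
    return sum(math.log(trim_probs(feature_table(seq, use), w)[t])
               for seq, t in data) / len(data)


def simulate(rng, n, true_w, use, seq_len=20):
    """Simulated (not real) sequences whose trims follow a known GC effect."""
    data = []
    for _ in range(n):
        seq = "".join(rng.choice("ACGT") for _ in range(seq_len))
        probs = trim_probs(feature_table(seq, use), true_w)
        data.append((seq, rng.choices(range(MAX_TRIM + 1), weights=probs)[0]))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    data = simulate(rng, 600, true_w=[0.0, 4.0], use=["trim_length", "gc"])
    train, test = data[:300], data[300:]  # fit on a subset, score on the rest
    for use in (["trim_length"], ["trim_length", "gc"]):
        w = fit(train, use)
        print(use, round(avg_log_lik(test, w, use), 3))
```

Because the simulated trims were generated with a built-in GC effect, the feature set that includes `gc` should earn a higher held-out log-likelihood than trim length alone. On real data, which features matter is exactly the open question the study set out to answer.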