When creating a neural network prediction model, you must set values for the architecture (number of hidden layers, number of hidden nodes in each layer, hidden activation function, etc.) and for training (optimizer, batch size, learning rate, etc.). In some scenarios you can manually experiment with these hyperparameter values. In other scenarios, you can set up lists of candidate values and then use random search or grid search over the lists.
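To make the idea of searching over lists concrete, here is a minimal sketch of grid search and random search. The candidate value lists and the evaluate() function are hypothetical placeholders, not part of the demo program shown later.

import itertools
import random

cand_n_hid = [8, 10, 16]     # number of hidden nodes
cand_lr = [0.01, 0.05]       # learning rates
cand_opt = ['sgd', 'adam']   # optimizers

def evaluate(n_hid, lr, opt):
  # placeholder: a real version would build and train a network
  # with these values and return a measure of model quality
  return random.random()

# grid search: evaluate every combination of candidate values
grid_results = []
for (nh, lr, op) in itertools.product(cand_n_hid, cand_lr, cand_opt):
  grid_results.append(((nh, lr, op), evaluate(nh, lr, op)))

# random search: evaluate a fixed number of random combinations
rand_results = []
for _ in range(5):
  combo = (random.choice(cand_n_hid), random.choice(cand_lr),
    random.choice(cand_opt))
  rand_results.append((combo, evaluate(*combo)))

best = max(grid_results + rand_results, key=lambda t: t[1])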
A more sophisticated approach is to use evolutionary optimization to find a good set of architecture and training values. This is a project I’ve been looking at recently. As part of my experiments, I put together a demo that parameterizes a network and training values and then computes a fitness value. The idea is best explained by code.
Suppose you want to predict the political leaning of a person (conservative = 0, moderate = 1, liberal = 2) from their sex (male = -1, female = +1), age (divided by 100), state of residence (one-hot encoded: Michigan = 1 0 0, Nebraska = 0 1 0, Oklahoma = 0 0 1), and income (divided by $100,000). Now, consider this code:
# first, create train_ds and test_ds

print("Setting 6-(10-10)-3 tanh 10 0.01 1000 SGD")
f = fitness(n_hid=10, activ='tanh', trn_ds=train_ds,
  tst_ds=test_ds, bs=10, lr=0.01, me=1000, opt="sgd")
print("Fitness = %0.4f " % f)
The fitness function creates a 6-(10-10)-3 neural network classifier with tanh() hidden node activation, and trains it using a batch size of 10, stochastic gradient descent with a learning rate of 0.01, and 1000 epochs. The return value is a measure of how good the network is, often called a fitness value in evolutionary optimization terminology.
The fitness() function is very short and simple because the function farms out most of the work to program-defined train() and accuracy_quick() functions:
def fitness(n_hid=10, activ='tanh', trn_ds=None, tst_ds=None,
  bs=10, lr=0.01, me=1000, opt="sgd"):
  T.manual_seed(1)  # prepare
  np.random.seed(1)
  net = Net(n_hid, activ).to(device)       # create
  net.train()
  train(net, trn_ds, bs, lr, me, opt)      # train
  net.eval()
  acc_train = accuracy_quick(net, trn_ds)  # evaluate
  acc_test = accuracy_quick(net, tst_ds)
  return (acc_train + acc_test) / 2
I decided to define fitness as the average of the accuracy of the trained network on the training and test data. This is something I need to give more thought to.
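The averaging scheme is just one possibility. A hypothetical alternative, not used in the demo, would be to weight test accuracy more heavily and penalize a large gap between train and test accuracy (a sign of overfitting). The weight and penalty values here are arbitrary assumptions:

def fitness_alt(acc_train, acc_test, w=0.75, penalty=0.50):
  # weighted accuracy minus an overfitting penalty (illustration only)
  gap = abs(acc_train - acc_test)
  return (w * acc_test) + ((1.0 - w) * acc_train) - (penalty * gap)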
I don’t believe it’s feasible to create a general-purpose framework for parameterization because each problem is significantly different. The real decisions are what to parameterize and what to hard-code. For example, my demo hard-codes the architecture with a fixed two hidden layers rather than a variable number of layers.
The parameterization is just the first part of an evolutionary optimization system. My next steps will be to add functions to generate random solutions, select two parent solutions, combine two parents to produce a child solution, and mutate child solutions.
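Here is a rough sketch of what those pieces might look like. The solution encoding (a list of indices into candidate value lists), the candidate lists, and all the function names are assumptions for illustration; they are not part of the demo program below. The solution_fitness() helper decodes a solution and calls the demo fitness() function.

import random

# candidate hyperparameter values (assumed lists, for illustration)
cand_lists = [
  [8, 10, 12, 16],        # n_hid
  ['tanh', 'relu'],       # activation
  [5, 10, 20],            # batch size
  [0.005, 0.01, 0.05],    # learning rate
  ['sgd', 'adam'],        # optimizer
]

def random_solution(rnd):
  # a solution is a list of indices, one per candidate list
  return [rnd.randrange(len(lst)) for lst in cand_lists]

def select_parent(population, fitnesses, rnd):
  # tournament selection: pick two solutions at random, keep the better
  (i, j) = rnd.sample(range(len(population)), 2)
  return population[i] if fitnesses[i] >= fitnesses[j] else population[j]

def crossover(parent1, parent2, rnd):
  # single-point crossover produces one child solution
  cut = rnd.randrange(1, len(parent1))
  return parent1[:cut] + parent2[cut:]

def mutate(soln, rnd, p_mut=0.20):
  # with small probability, replace an index with a random legal value
  for i in range(len(soln)):
    if rnd.random() < p_mut:
      soln[i] = rnd.randrange(len(cand_lists[i]))
  return soln

def solution_fitness(soln, train_ds, test_ds):
  # decode the indices and call the demo fitness() function
  (nh, ac, bs, lr, op) = [cand_lists[i][soln[i]] for i in range(5)]
  return fitness(n_hid=nh, activ=ac, trn_ds=train_ds,
    tst_ds=test_ds, bs=bs, lr=lr, me=1000, opt=op)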
Fascinating stuff (to me anyway).
Evolution has produced some strange animals. Left: Tullimonstrum, informally known as the Tully monster, is an extinct invertebrate that lived about 300 million years ago. It was about 14 inches long and had two primitive eye stalks. Right: Opabinia is an extinct arthropod that lived about 500 million years ago. It was about three inches long and had five eyes. Images like these in my head are one of several reasons why I don’t eat calamari.
Demo code below. The training and test data can be found at jamesmccaffrey.wordpress.com/2022/09/01/multi-class-classification-using-pytorch-1-12-1-on-windows-10-11/.
# people_politics_encoded.py
# predict politics type from sex, age, state, income
# PyTorch 2.0.1-CPU Anaconda3-2022.10  Python 3.9.13
# Windows 10/11
# experiment for hyperparameter evolutionary optimization

import numpy as np
import torch as T
device = T.device('cpu')  # apply to Tensor or Module

# -----------------------------------------------------------

class PeopleDataset(T.utils.data.Dataset):
  # sex  age    state    income   politics
  # -1   0.27   0 1 0    0.7610   2
  # +1   0.19   0 0 1    0.6550   0
  # sex: -1 = male, +1 = female
  # state: michigan, nebraska, oklahoma
  # politics: conservative, moderate, liberal

  def __init__(self, src_file):
    all_xy = np.loadtxt(src_file, usecols=range(0,7),
      delimiter="\t", comments="#", dtype=np.float32)
    tmp_x = all_xy[:,0:6]  # cols [0,6) = [0,5]
    tmp_y = all_xy[:,6]    # 1-D

    self.x_data = T.tensor(tmp_x,
      dtype=T.float32).to(device)
    self.y_data = T.tensor(tmp_y,
      dtype=T.int64).to(device)  # 1-D

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    preds = self.x_data[idx]
    trgts = self.y_data[idx]
    return preds, trgts  # as a Tuple

# -----------------------------------------------------------

class Net(T.nn.Module):
  def __init__(self, n_hid, activ='tanh'):
    super(Net, self).__init__()
    self.hid1 = T.nn.Linear(6, n_hid)  # 6-(nh-nh)-3
    self.hid2 = T.nn.Linear(n_hid, n_hid)
    self.oupt = T.nn.Linear(n_hid, 3)

    if activ == 'tanh':
      self.activ = T.nn.Tanh()
    elif activ == 'relu':
      self.activ = T.nn.ReLU()

    # use default weight init

  def forward(self, x):
    z = self.activ(self.hid1(x))
    z = self.activ(self.hid2(z))
    z = T.log_softmax(self.oupt(z), dim=1)  # NLLLoss()
    return z

# -----------------------------------------------------------

def accuracy_quick(model, dataset):
  # assumes model.eval()
  X = dataset[0:len(dataset)][0]
  Y = dataset[0:len(dataset)][1]
  with T.no_grad():
    oupt = model(X)  # [40,3] logits
  arg_maxs = T.argmax(oupt, dim=1)  # predicted class = index of largest logit
  num_correct = T.sum(Y==arg_maxs)
  acc = (num_correct * 1.0 / len(dataset))
  return acc.item()

# -----------------------------------------------------------

def train(net, ds, bs, lr, me, opt="sgd"):
  # dataset, bat_size, lrn_rate, max_epochs, optimizer
  train_ldr = T.utils.data.DataLoader(ds, batch_size=bs,
    shuffle=True)
  loss_func = T.nn.NLLLoss()
  if opt == 'sgd':
    optimizer = T.optim.SGD(net.parameters(), lr=lr)
  elif opt == 'adam':
    optimizer = T.optim.Adam(net.parameters(), lr=lr)

  print("\nStarting training ")
  le = me // 5  # log interval: 5 log prints
  for epoch in range(0, me):
    epoch_loss = 0.0  # for one full epoch
    for (batch_idx, batch) in enumerate(train_ldr):
      X = batch[0]  # inputs
      Y = batch[1]  # correct class/label/politics

      optimizer.zero_grad()
      oupt = net(X)
      loss_val = loss_func(oupt, Y)  # a tensor
      epoch_loss += loss_val.item()  # accumulate
      loss_val.backward()
      optimizer.step()

    if epoch % le == 0:
      print("epoch = %5d  |  loss = %10.4f" % \
        (epoch, epoch_loss))
  print("Done ")

# -----------------------------------------------------------

def fitness(n_hid=10, activ='tanh', trn_ds=None, tst_ds=None,
  bs=10, lr=0.01, me=1000, opt="sgd"):
  T.manual_seed(1)  # prepare
  np.random.seed(1)
  net = Net(n_hid, activ).to(device)       # create
  net.train()
  train(net, trn_ds, bs, lr, me, opt)      # train
  net.eval()
  acc_train = accuracy_quick(net, trn_ds)  # evaluate
  acc_test = accuracy_quick(net, tst_ds)
  return (acc_train + acc_test) / 2

# -----------------------------------------------------------

def main():
  # 0. get started
  print("\nBegin People predict politics type ")

  # 1. create Dataset objects
  print("\nCreating People Datasets ")

  train_file = ".\\Data\\people_train.txt"
  train_ds = PeopleDataset(train_file)  # 200 rows

  test_file = ".\\Data\\people_test.txt"
  test_ds = PeopleDataset(test_file)    # 40 rows

  # 2. compute fitness for architecture and train parameters
  print("\nSetting 6-(10-10)-3 tanh 10 0.01 1000 SGD")
  f = fitness(n_hid=10, activ='tanh', trn_ds=train_ds,
    tst_ds=test_ds, bs=10, lr=0.01, me=1000, opt="sgd")
  print("\nFitness = %0.4f " % f)

  print("\nSetting 6-(8-8)-3 relu 10 0.01 1000 Adam")
  f = fitness(n_hid=8, activ='relu', trn_ds=train_ds,
    tst_ds=test_ds, bs=10, lr=0.01, me=1000, opt="adam")
  print("\nFitness = %0.4f " % f)

  # 3. TODO: verify trained model is valid
  # 4. TODO: save trained model

  print("\nEnd People predict politics encoding demo")

if __name__ == "__main__":
  main()