4. Surname Classification with RNN

In this example, we classify surnames, represented as character sequences, by their nationality of origin.

Inferring demographic information (like nationality) from publicly observable data has applications ranging from product recommendations to ensuring fair outcomes for users across different demographics. However, demographic and other self-identifying attributes are collectively called “protected attributes”, and we must exercise care in the use of such attributes in modeling and in products. We begin by splitting each surname into its characters and treating them the same way we treated words. Aside from the data difference, character-level models are mostly similar to word-based models in structure and implementation.

from argparse import Namespace
import os
import json

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from tqdm.notebook import tqdm

4.1. Vocabulary, Vectorizer, Dataset

We treat each surname as a sequence of characters. As usual, we implement a dataset class that returns the vectorized surname as well as the integer representing its nationality. It also returns the length of the sequence, which is used in downstream computations to locate the final vector in the sequence. This is part of the familiar sequence of steps: implementing a Dataset, a Vectorizer, and a Vocabulary before the actual training can take place.

class Vocabulary(object):
    """Class to process text and extract vocabulary for mapping"""

    def __init__(self, token_to_idx=None):
        """
        Args:
            token_to_idx (dict): a pre-existing map of tokens to indices
        """
        if token_to_idx is None:
            token_to_idx = {}
        self._token_to_idx = token_to_idx

        self._idx_to_token = {idx: token
                              for token, idx in self._token_to_idx.items()}

    def to_serializable(self):
        """ returns a dictionary that can be serialized """
        return {'token_to_idx': self._token_to_idx}

    @classmethod
    def from_serializable(cls, contents):
        """ instantiates the Vocabulary from a serialized dictionary """
        return cls(**contents)

    def add_token(self, token):
        """Update mapping dicts based on the token.

        Args:
            token (str): the item to add into the Vocabulary
        Returns:
            index (int): the integer corresponding to the token
        """
        if token in self._token_to_idx:
            index = self._token_to_idx[token]
        else:
            index = len(self._token_to_idx)
            self._token_to_idx[token] = index
            self._idx_to_token[index] = token
        return index

    def add_many(self, tokens):
        """Add a list of tokens into the Vocabulary

        Args:
            tokens (list): a list of string tokens
        Returns:
            indices (list): a list of indices corresponding to the tokens
        """
        return [self.add_token(token) for token in tokens]

    def lookup_token(self, token):
        """Retrieve the index associated with the token

        Args:
            token (str): the token to look up
        Returns:
            index (int): the index corresponding to the token
        """
        return self._token_to_idx[token]

    def lookup_index(self, index):
        """Return the token associated with the index

        Args:
            index (int): the index to look up
        Returns:
            token (str): the token corresponding to the index
        Raises:
            KeyError: if the index is not in the Vocabulary
        """
        if index not in self._idx_to_token:
            raise KeyError("the index (%d) is not in the Vocabulary" % index)
        return self._idx_to_token[index]

    def __str__(self):
        return "<Vocabulary(size=%d)>" % len(self)

    def __len__(self):
        return len(self._token_to_idx)

The first stage in the vectorization pipeline is to map each character token in the surname to a unique integer. To accomplish this, we use the SequenceVocabulary data structure, which we first introduced and described in Section 5.2.2 of Document Classification using CNN and GloVe Embeddings. Recall that this data structure not only maps characters in the names to integers, but also utilizes four special-purpose tokens:

  • the UNK token,

  • the MASK token,

  • the BEGIN-OF-SEQUENCE token, and

  • the END-OF-SEQUENCE token.

The first two tokens are vital for language data: the UNK token is used for unseen Out-Of-Vocabulary (OOV) tokens in the input, and the MASK token enables handling variable-length inputs. The latter two tokens provide the model with sequence boundary features and are prepended and appended to the sequence, respectively.
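To make the role of the four special tokens concrete, here is a minimal, dependency-free sketch of how a surname could be turned into a padded index sequence. The tiny `token_to_idx` table and the helper name `encode` are assumptions made for illustration; they are not part of the notebook's classes.

```python
# Hypothetical toy vocabulary: the four special tokens come first,
# followed by a few characters seen during training.
token_to_idx = {"<MASK>": 0, "<UNK>": 1, "<BEGIN>": 2, "<END>": 3,
                "a": 4, "n": 5, "w": 6}


def encode(surname, max_length):
    """Map characters to indices, add boundary tokens, pad with MASK."""
    unk = token_to_idx["<UNK>"]
    indices = [token_to_idx["<BEGIN>"]]
    indices += [token_to_idx.get(ch, unk) for ch in surname]
    indices.append(token_to_idx["<END>"])
    length = len(indices)  # true length, before padding
    padding = [token_to_idx["<MASK>"]] * (max_length - length)
    return indices + padding, length


vector, length = encode("wan", max_length=8)
print(vector, length)   # [2, 6, 4, 5, 3, 0, 0, 0] 5

vector, length = encode("wang", max_length=8)  # "g" is out of vocabulary
print(vector, length)   # [2, 6, 4, 5, 1, 3, 0, 0] 6
```

The returned length is what later lets the model ignore the MASK positions and pick the vector at the last real time step.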

class SequenceVocabulary(Vocabulary):
    def __init__(self, token_to_idx=None, unk_token="<UNK>",
                 mask_token="<MASK>", begin_seq_token="<BEGIN>",
                 end_seq_token="<END>"):

        super(SequenceVocabulary, self).__init__(token_to_idx)

        self._mask_token = mask_token
        self._unk_token = unk_token
        self._begin_seq_token = begin_seq_token
        self._end_seq_token = end_seq_token

        # the mask token is added first, so its index is 0 in a fresh vocabulary
        self.mask_index = self.add_token(self._mask_token)
        self.unk_index = self.add_token(self._unk_token)
        self.begin_seq_index = self.add_token(self._begin_seq_token)
        self.end_seq_index = self.add_token(self._end_seq_token)

    def to_serializable(self):
        contents = super(SequenceVocabulary, self).to_serializable()
        contents.update({'unk_token': self._unk_token,
                         'mask_token': self._mask_token,
                         'begin_seq_token': self._begin_seq_token,
                         'end_seq_token': self._end_seq_token})
        return contents

    def lookup_token(self, token):
        """Retrieve the index associated with the token
          or the UNK index if token isn't present.

        Args:
            token (str): the token to look up
        Returns:
            index (int): the index corresponding to the token
        Notes:
            `unk_index` needs to be >=0 (having been added into the Vocabulary)
              for the UNK functionality
        """
        if self.unk_index >= 0:
            return self._token_to_idx.get(token, self.unk_index)
        else:
            return self._token_to_idx[token]

The overall vectorizer is SurnameVectorizer, which populates a SequenceVocabulary with the surname characters and a regular Vocabulary with the nationalities.

class SurnameVectorizer(object):
    """ The Vectorizer which coordinates the Vocabularies and puts them to use"""
    def __init__(self, char_vocab, nationality_vocab):
        """
        Args:
            char_vocab (SequenceVocabulary): maps characters to integers
            nationality_vocab (Vocabulary): maps nationalities to integers
        """
        self.char_vocab = char_vocab
        self.nationality_vocab = nationality_vocab

    def vectorize(self, surname, vector_length=-1):
        """
        Args:
            surname (str): the string of characters
            vector_length (int): an argument for forcing the length of index vector
        Returns:
            a tuple (out_vector, length): the padded index vector and the
                number of indices before padding
        """
        # BEGIN token, one index per character, END token
        indices = [self.char_vocab.begin_seq_index]
        indices.extend(self.char_vocab.lookup_token(token)
                       for token in surname)
        indices.append(self.char_vocab.end_seq_index)

        if vector_length < 0:
            vector_length = len(indices)

        # pad the remainder of the vector with the MASK index
        out_vector = np.zeros(vector_length, dtype=np.int64)
        out_vector[:len(indices)] = indices
        out_vector[len(indices):] = self.char_vocab.mask_index

        return out_vector, len(indices)

    @classmethod
    def from_dataframe(cls, surname_df):
        """Instantiate the vectorizer from the dataset dataframe

        Args:
            surname_df (pandas.DataFrame): the surnames dataset
        Returns:
            an instance of the SurnameVectorizer
        """
        char_vocab = SequenceVocabulary()
        nationality_vocab = Vocabulary()

        for index, row in surname_df.iterrows():
            for char in row.surname:
                char_vocab.add_token(char)
            nationality_vocab.add_token(row.nationality)

        return cls(char_vocab, nationality_vocab)

    @classmethod
    def from_serializable(cls, contents):
        char_vocab = SequenceVocabulary.from_serializable(contents['char_vocab'])
        nat_vocab = Vocabulary.from_serializable(contents['nationality_vocab'])

        return cls(char_vocab=char_vocab, nationality_vocab=nat_vocab)

    def to_serializable(self):
        return {'char_vocab': self.char_vocab.to_serializable(),
                'nationality_vocab': self.nationality_vocab.to_serializable()}


class SurnameDataset(Dataset):
    def __init__(self, surname_df, vectorizer):
        """
        Args:
            surname_df (pandas.DataFrame): the dataset
            vectorizer (SurnameVectorizer): vectorizer instantiated from dataset
        """
        self.surname_df = surname_df
        self._vectorizer = vectorizer

        # +2 leaves room for the BEGIN and END tokens
        self._max_seq_length = max(map(len, self.surname_df.surname)) + 2

        self.train_df = self.surname_df[self.surname_df.split=='train']
        self.train_size = len(self.train_df)

        self.val_df = self.surname_df[self.surname_df.split=='val']
        self.validation_size = len(self.val_df)

        self.test_df = self.surname_df[self.surname_df.split=='test']
        self.test_size = len(self.test_df)

        self._lookup_dict = {'train': (self.train_df, self.train_size),
                             'val': (self.val_df, self.validation_size),
                             'test': (self.test_df, self.test_size)}

        # select a default split so that len() and indexing work right away
        self.set_split('train')

        # Class weights: inverse training-set frequency, ordered by class index
        class_counts = self.train_df.nationality.value_counts().to_dict()
        def sort_key(item):
            return self._vectorizer.nationality_vocab.lookup_token(item[0])
        sorted_counts = sorted(class_counts.items(), key=sort_key)
        frequencies = [count for _, count in sorted_counts]
        self.class_weights = 1.0 / torch.tensor(frequencies, dtype=torch.float32)

    @classmethod
    def load_dataset_and_make_vectorizer(cls, surname_csv):
        """Load dataset and make a new vectorizer from scratch

        Args:
            surname_csv (str): location of the dataset
        Returns:
            an instance of SurnameDataset
        """
        surname_df = pd.read_csv(surname_csv)
        train_surname_df = surname_df[surname_df.split=='train']
        return cls(surname_df, SurnameVectorizer.from_dataframe(train_surname_df))

    @classmethod
    def load_dataset_and_load_vectorizer(cls, surname_csv, vectorizer_filepath):
        """Load dataset and the corresponding vectorizer.
        Used in the case in which the vectorizer has been cached for re-use

        Args:
            surname_csv (str): location of the dataset
            vectorizer_filepath (str): location of the saved vectorizer
        Returns:
            an instance of SurnameDataset
        """
        surname_df = pd.read_csv(surname_csv)
        vectorizer = cls.load_vectorizer_only(vectorizer_filepath)
        return cls(surname_df, vectorizer)

    @staticmethod
    def load_vectorizer_only(vectorizer_filepath):
        """a static method for loading the vectorizer from file

        Args:
            vectorizer_filepath (str): the location of the serialized vectorizer
        Returns:
            an instance of SurnameVectorizer
        """
        with open(vectorizer_filepath) as fp:
            return SurnameVectorizer.from_serializable(json.load(fp))

    def save_vectorizer(self, vectorizer_filepath):
        """saves the vectorizer to disk using json

        Args:
            vectorizer_filepath (str): the location to save the vectorizer
        """
        with open(vectorizer_filepath, "w") as fp:
            json.dump(self._vectorizer.to_serializable(), fp)

    def get_vectorizer(self):
        """ returns the vectorizer """
        return self._vectorizer

    def set_split(self, split="train"):
        """ selects the split ('train', 'val' or 'test') served by the dataset """
        self._target_split = split
        self._target_df, self._target_size = self._lookup_dict[split]

    def __len__(self):
        return self._target_size

    def __getitem__(self, index):
        """the primary entry point method for PyTorch datasets

        Args:
            index (int): the index to the data point
        Returns:
            a dictionary holding the data point's:
                features (x_data)
                label (y_target)
                feature length (x_length)
        """
        row = self._target_df.iloc[index]

        surname_vector, vec_length = \
            self._vectorizer.vectorize(row.surname, self._max_seq_length)

        nationality_index = \
            self._vectorizer.nationality_vocab.lookup_token(row.nationality)

        return {'x_data': surname_vector,
                'y_target': nationality_index,
                'x_length': vec_length}

    def get_num_batches(self, batch_size):
        """Given a batch size, return the number of batches in the dataset

        Args:
            batch_size (int)
        Returns:
            number of batches in the dataset
        """
        return len(self) // batch_size


def generate_batches(dataset, batch_size, shuffle=True,
                     drop_last=True, device="cpu"):
    """
    A generator function which wraps the PyTorch DataLoader. It will
      ensure each tensor is on the right device location.
    """
    dataloader = DataLoader(dataset=dataset, batch_size=batch_size,
                            shuffle=shuffle, drop_last=drop_last)

    for data_dict in dataloader:
        out_data_dict = {}
        for name, tensor in data_dict.items():
            out_data_dict[name] = data_dict[name].to(device)
        yield out_data_dict
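SurnameDataset weights each class by the inverse of its training-set frequency, so that rare nationalities contribute more to the loss per example. The sketch below reproduces that computation in plain Python; the label list and the `label_to_idx` mapping are made-up illustrative data, not taken from the surnames dataset.

```python
from collections import Counter

# Hypothetical training labels, heavily skewed towards one class.
train_labels = ["English"] * 6 + ["Russian"] * 3 + ["Korean"] * 1
# Hypothetical label vocabulary (label -> class index).
label_to_idx = {"English": 0, "Korean": 1, "Russian": 2}

counts = Counter(train_labels)
# Order the counts by class index so that class_weights[i] belongs to
# class i, which is the ordering a weighted loss function expects.
sorted_counts = sorted(counts.items(), key=lambda item: label_to_idx[item[0]])
class_weights = [1.0 / count for _, count in sorted_counts]

# The rarest class (Korean, index 1) receives the largest weight.
print(class_weights)
```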

4.2. Model

The SurnameClassifier model is composed of an Embedding layer, the ElmanRNN, and two Linear layers. We assume that the input to the model is a sequence of tokens that have already been mapped to integers by the SequenceVocabulary. The model first embeds the integers using the embedding layer. Then, using the RNN, sequence representation vectors are computed. These vectors represent the hidden state for each character in the surname. Because the goal is to classify each surname, the vector corresponding to the final character position in each surname is extracted. One way to think about this vector is that it is the result of passing over the entire input sequence, and hence it is a summary vector for the surname. These summary vectors are passed through the Linear layers to compute a prediction vector. The prediction vector is used in the training loss, or we can apply the softmax function to it to create a probability distribution over nationalities.
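As a rough illustration of that last step, the following plain-Python sketch turns a prediction vector into a probability distribution with softmax. The scores and the function are invented for the example; in the notebook the same operation is performed by F.softmax on a tensor.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing
    # and does not change the result.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical prediction vector for three nationalities.
prediction_vector = [2.0, 0.5, -1.0]
probabilities = softmax(prediction_vector)

# The predicted class is the index with the highest probability.
predicted_class = probabilities.index(max(probabilities))
print(probabilities, predicted_class)
```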

The arguments to the model are

  • the size of the embeddings - hyperparameter,

  • the number of embeddings (i.e., vocabulary size) - determined by the data,

  • the number of classes - determined by the data, and

  • the hidden state size of the RNN - hyperparameter.


Although the two hyperparameters can take on any value, it is usually good to start with small values that train quickly, as a way of verifying that the model actually works.
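One way to judge whether a configuration is small is to count its trainable parameters. The helper below is a back-of-the-envelope sketch, assuming the architecture used in this section: an embedding table, a single Elman cell with two bias vectors (as in torch.nn.RNNCell), a hidden-to-hidden Linear layer, and a hidden-to-classes Linear layer. The function name and the example sizes are hypothetical.

```python
def count_parameters(embedding_size, num_embeddings, num_classes,
                     rnn_hidden_size):
    """Count trainable parameters of the embedding + RNN + classifier stack."""
    embedding = num_embeddings * embedding_size
    # Elman cell: input-to-hidden and hidden-to-hidden weights plus two biases.
    rnn = (rnn_hidden_size * embedding_size
           + rnn_hidden_size * rnn_hidden_size
           + 2 * rnn_hidden_size)
    # fc1 maps hidden -> hidden, fc2 maps hidden -> classes (weights + bias).
    fc1 = rnn_hidden_size * rnn_hidden_size + rnn_hidden_size
    fc2 = rnn_hidden_size * num_classes + num_classes
    return embedding + rnn + fc1 + fc2


# Hypothetical sizes: 80 distinct characters, 18 nationalities.
small = count_parameters(embedding_size=16, num_embeddings=80,
                         num_classes=18, rnn_hidden_size=32)
large = count_parameters(embedding_size=100, num_embeddings=80,
                         num_classes=18, rnn_hidden_size=64)
print(small, large)  # 4530 23954
```

Note how the hidden state size enters quadratically, so doubling it grows the model much faster than doubling the embedding size.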

4.2.1. Retrieving the last vector of each sequence

You will notice that the forward() function in the model requires the lengths of the sequences. The lengths are used to retrieve the final vector of each sequence in the tensor that is returned from the RNN with a function named column_gather(), shown below. The function iterates over batch row indices and retrieves the vector that’s at the position indicated by the corresponding length of sequence.

def column_gather(y_out, x_lengths):
    '''Get a specific vector from each batch datapoint in `y_out`.

    More precisely, iterate over batch row indices, get the vector that's at
    the position indicated by the corresponding value in `x_lengths` at the row
    index.

    Args:
        y_out (torch.FloatTensor, torch.cuda.FloatTensor)
            shape: (batch, sequence, feature)
        x_lengths (torch.LongTensor, torch.cuda.LongTensor)
            shape: (batch,)

    Returns:
        y_out (torch.FloatTensor, torch.cuda.FloatTensor)
            shape: (batch, feature)
    '''
    # lengths are counts, so the last valid position is length - 1
    x_lengths = x_lengths.long().detach().cpu().numpy() - 1

    out = []
    for batch_index, column_index in enumerate(x_lengths):
        out.append(y_out[batch_index, column_index])

    return torch.stack(out)
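The same indexing can be illustrated without tensors. In this sketch, nested Python lists stand in for the (batch, sequence, feature) tensor; the function name `gather_last` and the data are invented for illustration.

```python
def gather_last(y_out, x_lengths):
    """Pick, for each sequence, the vector at its last real position."""
    # Position length - 1 holds the final non-padded time step.
    return [sequence[length - 1]
            for sequence, length in zip(y_out, x_lengths)]


# Hypothetical RNN output: batch of 2, sequence length 4, feature size 2.
# The second sequence has only 2 real steps; the rest is padding.
y_out = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
    [[1.1, 1.2], [1.3, 1.4], [9.9, 9.9], [9.9, 9.9]],
]
x_lengths = [4, 2]

print(gather_last(y_out, x_lengths))  # [[0.7, 0.8], [1.3, 1.4]]
```

Simply taking the last column would return the padding vector for the second sequence, which is why the lengths are carried along with every batch.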

4.2.2. The ElmanRNN model

class ElmanRNN(nn.Module):
    """ an Elman RNN built using the RNNCell """
    def __init__(self, input_size, hidden_size, batch_first=False):
        """
        Args:
            input_size (int): size of the input vectors
            hidden_size (int): size of the hidden state vectors
            batch_first (bool): whether the 0th dimension is batch
        """
        super(ElmanRNN, self).__init__()

        self.rnn_cell = nn.RNNCell(input_size, hidden_size)

        self.batch_first = batch_first
        self.hidden_size = hidden_size

    def _initial_hidden(self, batch_size):
        return torch.zeros((batch_size, self.hidden_size))

    def forward(self, x_in, initial_hidden=None):
        """The forward pass of the ElmanRNN

        Args:
            x_in (torch.Tensor): an input data tensor.
                If self.batch_first: x_in.shape = (batch, seq_size, feat_size)
                Else: x_in.shape = (seq_size, batch, feat_size)
            initial_hidden (torch.Tensor): the initial hidden state for the RNN
        Returns:
            hiddens (torch.Tensor): The outputs of the RNN at each time step.
                If self.batch_first: hiddens.shape = (batch, seq_size, hidden_size)
                Else: hiddens.shape = (seq_size, batch, hidden_size)
        """
        if self.batch_first:
            batch_size, seq_size, feat_size = x_in.size()
            # iterate over time steps, so put the sequence dimension first
            x_in = x_in.permute(1, 0, 2)
        else:
            seq_size, batch_size, feat_size = x_in.size()

        hiddens = []

        if initial_hidden is None:
            initial_hidden = self._initial_hidden(batch_size)
            initial_hidden = initial_hidden.to(x_in.device)

        hidden_t = initial_hidden

        for t in range(seq_size):
            hidden_t = self.rnn_cell(x_in[t], hidden_t)
            hiddens.append(hidden_t)

        hiddens = torch.stack(hiddens)

        if self.batch_first:
            hiddens = hiddens.permute(1, 0, 2)

        return hiddens
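For reference, each step of an Elman RNN computes h_t = tanh(W_ih x_t + b_ih + W_hh h_(t-1) + b_hh), which is what nn.RNNCell does with its default tanh nonlinearity. The plain-Python sketch below unrolls that recurrence over a short sequence; the weights and inputs are made-up numbers, and the helper names are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def elman_step(x_t, h_prev, w_ih, w_hh, b_ih, b_hh):
    """One recurrence step: tanh(W_ih x_t + b_ih + W_hh h_prev + b_hh)."""
    from_input = matvec(w_ih, x_t)
    from_hidden = matvec(w_hh, h_prev)
    return [math.tanh(i + bi + h + bh)
            for i, bi, h, bh in zip(from_input, b_ih, from_hidden, b_hh)]


# Hypothetical sizes: input_size=2, hidden_size=2.
w_ih = [[0.5, -0.5], [0.25, 0.75]]
w_hh = [[0.1, 0.0], [0.0, 0.1]]
b_ih = [0.0, 0.0]
b_hh = [0.0, 0.0]

sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [0.0, 0.0]          # zero initial hidden state
hiddens = []
for x_t in sequence:
    hidden = elman_step(x_t, hidden, w_ih, w_hh, b_ih, b_hh)
    hiddens.append(hidden)   # keep the hidden state of every time step

print(len(hiddens), hiddens[-1])
```

Each hidden state depends on the current input and on the previous hidden state, so the last one has seen the whole sequence.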

class SurnameClassifier(nn.Module):
    """ A Classifier with an RNN to extract features and an MLP to classify """
    def __init__(self, embedding_size, num_embeddings, num_classes,
                 rnn_hidden_size, batch_first=True, padding_idx=0):
        """
        Args:
            embedding_size (int): The size of the character embeddings
            num_embeddings (int): The number of characters to embed
            num_classes (int): The size of the prediction vector
                Note: the number of nationalities
            rnn_hidden_size (int): The size of the RNN's hidden state
            batch_first (bool): Informs whether the input tensors will
                have batch or the sequence on the 0th dimension
            padding_idx (int): The index for the tensor padding;
                see torch.nn.Embedding
        """
        super(SurnameClassifier, self).__init__()

        self.emb = nn.Embedding(num_embeddings=num_embeddings,
                                embedding_dim=embedding_size,
                                padding_idx=padding_idx)
        self.rnn = ElmanRNN(input_size=embedding_size,
                            hidden_size=rnn_hidden_size,
                            batch_first=batch_first)
        self.fc1 = nn.Linear(in_features=rnn_hidden_size,
                             out_features=rnn_hidden_size)
        self.fc2 = nn.Linear(in_features=rnn_hidden_size,
                             out_features=num_classes)

    def forward(self, x_in, x_lengths=None, apply_softmax=False):
        """The forward pass of the classifier

        Args:
            x_in (torch.Tensor): an input data tensor.
                x_in.shape should be (batch, input_dim)
            x_lengths (torch.Tensor): the lengths of each sequence in the batch.
                They are used to find the final vector of each sequence
            apply_softmax (bool): a flag for the softmax activation
                should be false if used with the Cross Entropy losses
        Returns:
            the resulting tensor. tensor.shape should be (batch, output_dim)
        """
        x_embedded = self.emb(x_in)
        y_out = self.rnn(x_embedded)

        if x_lengths is not None:
            y_out = column_gather(y_out, x_lengths)
        else:
            y_out = y_out[:, -1, :]

        # note: F.dropout defaults to training=True, so dropout stays active
        # in eval mode unless training=self.training is passed
        y_out = F.relu(self.fc1(F.dropout(y_out, 0.5)))
        y_out = self.fc2(F.dropout(y_out, 0.5))

        if apply_softmax:
            y_out = F.softmax(y_out, dim=1)

        return y_out


def set_seed_everywhere(seed, cuda):
    np.random.seed(seed)
    torch.manual_seed(seed)
    if cuda:
        torch.cuda.manual_seed_all(seed)

def handle_dirs(dirpath):
    if not os.path.exists(dirpath):
        os.makedirs(dirpath)

4.3. Settings

args = Namespace(
    # Data and path information
    # Model hyper parameter
    # Training hyper parameter
    # Runtime hyper parameter

# Check CUDA
if not torch.cuda.is_available():
    args.cuda = False

args.device = torch.device("cuda" if args.cuda else "cpu")
print("Using CUDA: {}".format(args.cuda))

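if args.expand_filepaths_to_save_dir:
    args.vectorizer_file = os.path.join(args.save_dir,
                                        args.vectorizer_file)

    args.model_state_file = os.path.join(args.save_dir,
                                         args.model_state_file)

# Set seed for reproducibility
set_seed_everywhere(args.seed, args.cuda)

# handle dirs
handle_dirs(args.save_dir)
Using CUDA: True

if args.reload_from_files and os.path.exists(args.vectorizer_file):
    # training from a checkpoint
    dataset = SurnameDataset.load_dataset_and_load_vectorizer(args.surname_csv,
                                                              args.vectorizer_file)
else:
    # create dataset and vectorizer
    dataset = SurnameDataset.load_dataset_and_make_vectorizer(args.surname_csv)
    dataset.save_vectorizer(args.vectorizer_file)

vectorizer = dataset.get_vectorizer()

# args.rnn_hidden_size is an assumed name for the RNN hidden-size setting
classifier = SurnameClassifier(embedding_size=args.char_embedding_size,
                               num_embeddings=len(vectorizer.char_vocab),
                               num_classes=len(vectorizer.nationality_vocab),
                               rnn_hidden_size=args.rnn_hidden_size,
                               padding_idx=vectorizer.char_vocab.mask_index)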
4.4. Training Routine

The training routine follows the standard formula. For a single batch of data, apply the model and compute the prediction vectors. Use the CrossEntropyLoss() loss function and the ground truth to compute a loss value. Using the loss value and an optimizer, compute the gradients and update the weights of the model using those gradients. Repeat this for each batch in the training data. Proceed similarly with the validation data, but set the model in eval mode and skip the backward pass and the weight update. The validation data is used only to give a less-biased sense of how the model is performing. Repeat this routine for a specific number of epochs or until a stopping condition is met (e.g., the loss is less than a threshold, or the loss stopped changing in the most recent two epochs).
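The stopping condition can be sketched independently of the model. The function below is a simplified illustration of patience-based early stopping on a list of validation losses, comparing each epoch with the previous one as update_train_state does in this section; the function name, the `patience` parameter, and the loss values are assumptions made for the example.

```python
def epochs_until_stop(val_losses, patience):
    """Return the index of the epoch at which training would stop,
    or None if the stopping criterion is never met."""
    steps_without_improvement = 0
    for epoch in range(1, len(val_losses)):
        if val_losses[epoch] >= val_losses[epoch - 1]:
            # Loss did not improve on the previous epoch.
            steps_without_improvement += 1
        else:
            # Loss decreased: reset the counter.
            steps_without_improvement = 0
        if steps_without_improvement >= patience:
            return epoch
    return None


# Hypothetical validation losses: improvement stalls after epoch 2.
val_losses = [2.9, 2.4, 2.1, 2.2, 2.3, 2.35, 2.0]
print(epochs_until_stop(val_losses, patience=3))  # 5
print(epochs_until_stop(val_losses, patience=5))  # None
```

A larger patience tolerates longer plateaus at the cost of extra epochs; with patience 5 the late improvement at epoch 6 is still reached.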

def make_train_state(args):
    return {'stop_early': False,
            'early_stopping_step': 0,
            'early_stopping_best_val': 1e8,
            'learning_rate': args.learning_rate,
            'epoch_index': 0,
            'train_loss': [],
            'train_acc': [],
            'val_loss': [],
            'val_acc': [],
            'test_loss': -1,
            'test_acc': -1,
            'model_filename': args.model_state_file}

def update_train_state(args, model, train_state):
    """Handle the training state updates.

    Components:
     - Early Stopping: Prevent overfitting.
     - Model Checkpoint: Model is saved if the model is better

    :param args: main arguments
    :param model: model to train
    :param train_state: a dictionary representing the training state values
    :returns:
        a new train_state
    """

    # Save one model at least
    if train_state['epoch_index'] == 0:
        torch.save(model.state_dict(), train_state['model_filename'])
        train_state['stop_early'] = False

    # Save model if performance improved
    elif train_state['epoch_index'] >= 1:
        loss_tm1, loss_t = train_state['val_loss'][-2:]

        # If loss worsened
        if loss_t >= loss_tm1:
            # Update step
            train_state['early_stopping_step'] += 1
        # Loss decreased
        else:
            # Save the best model
            if loss_t < train_state['early_stopping_best_val']:
                torch.save(model.state_dict(), train_state['model_filename'])
                train_state['early_stopping_best_val'] = loss_t

            # Reset early stopping step
            train_state['early_stopping_step'] = 0

        # Stop early ?
        train_state['stop_early'] = \
            train_state['early_stopping_step'] >= args.early_stopping_criteria

    return train_state

def compute_accuracy(y_pred, y_target):
    _, y_pred_indices = y_pred.max(dim=1)
    n_correct = torch.eq(y_pred_indices, y_target).sum().item()
    return n_correct / len(y_pred_indices) * 100

classifier = classifier.to(args.device)
dataset.class_weights = dataset.class_weights.to(args.device)
loss_func = nn.CrossEntropyLoss(dataset.class_weights)
optimizer = optim.Adam(classifier.parameters(), lr=args.learning_rate)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer=optimizer,
                                           mode='min', factor=0.5,

train_state = make_train_state(args)

# args.batch_size is an assumed name for the batch-size setting
epoch_bar = tqdm(desc='training routine',
                 total=args.num_epochs)

dataset.set_split('train')
train_bar = tqdm(desc='split=train',
                 total=dataset.get_num_batches(args.batch_size))
dataset.set_split('val')
val_bar = tqdm(desc='split=val',
               total=dataset.get_num_batches(args.batch_size))

try:
    for epoch_index in range(args.num_epochs):
        train_state['epoch_index'] = epoch_index

        # Iterate over training dataset

        # setup: batch generator, set loss and acc to 0, set train mode on
        dataset.set_split('train')
        batch_generator = generate_batches(dataset,
                                           batch_size=args.batch_size,
                                           device=args.device)
        running_loss = 0.0
        running_acc = 0.0
        classifier.train()

        for batch_index, batch_dict in enumerate(batch_generator):
            # the training routine is these 5 steps:

            # --------------------------------------
            # step 1. zero the gradients
            optimizer.zero_grad()

            # step 2. compute the output
            y_pred = classifier(x_in=batch_dict['x_data'],
                                x_lengths=batch_dict['x_length'])

            # step 3. compute the loss
            loss = loss_func(y_pred, batch_dict['y_target'])
            running_loss += (loss.item() - running_loss) / (batch_index + 1)

            # step 4. use loss to produce gradients
            loss.backward()

            # step 5. use optimizer to take gradient step
            optimizer.step()
            # -----------------------------------------
            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_target'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)

            # update bar
            train_bar.set_postfix(loss=running_loss, acc=running_acc, epoch=epoch_index)
            train_bar.update()

        train_state['train_loss'].append(running_loss)
        train_state['train_acc'].append(running_acc)

        # Iterate over val dataset

        # setup: batch generator, set loss and acc to 0; set eval mode on
        dataset.set_split('val')
        batch_generator = generate_batches(dataset,
                                           batch_size=args.batch_size,
                                           device=args.device)
        running_loss = 0.
        running_acc = 0.
        classifier.eval()

        for batch_index, batch_dict in enumerate(batch_generator):
            # compute the output
            y_pred = classifier(x_in=batch_dict['x_data'],
                                x_lengths=batch_dict['x_length'])

            # compute the loss
            loss = loss_func(y_pred, batch_dict['y_target'])
            running_loss += (loss.item() - running_loss) / (batch_index + 1)

            # compute the accuracy
            acc_t = compute_accuracy(y_pred, batch_dict['y_target'])
            running_acc += (acc_t - running_acc) / (batch_index + 1)
            val_bar.set_postfix(loss=running_loss, acc=running_acc, epoch=epoch_index)
            val_bar.update()

        # update_train_state reads the last two entries of val_loss
        train_state['val_loss'].append(running_loss)
        train_state['val_acc'].append(running_acc)

        train_state = update_train_state(args=args, model=classifier,
                                         train_state=train_state)

        # lower the learning rate when the validation loss plateaus
        scheduler.step(train_state['val_loss'][-1])

        train_bar.n = 0
        val_bar.n = 0
        epoch_bar.update()

        if train_state['stop_early']:
            break
except KeyboardInterrupt:
    print("Exiting loop")
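The loop above never stores per-batch losses; it keeps a running mean with the update running += (value - running) / (n + 1). A quick check in plain Python, using made-up batch losses, suggests that this incremental form matches the ordinary average:

```python
# Hypothetical per-batch loss values.
batch_losses = [2.5, 2.0, 1.5, 3.0]

running_loss = 0.0
for batch_index, loss_t in enumerate(batch_losses):
    # Incremental mean: move the estimate towards the new value
    # by 1 / (number of values seen so far).
    running_loss += (loss_t - running_loss) / (batch_index + 1)

plain_mean = sum(batch_losses) / len(batch_losses)
print(running_loss, plain_mean)  # 2.25 2.25
```

Because the estimate is a valid mean after every batch, it can be shown in the progress bar while the epoch is still running.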
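# compute the loss & accuracy on the test set using the best available model
classifier.load_state_dict(torch.load(train_state['model_filename']))

classifier = classifier.to(args.device)
dataset.class_weights = dataset.class_weights.to(args.device)
loss_func = nn.CrossEntropyLoss(dataset.class_weights)

# args.batch_size is an assumed name for the batch-size setting
dataset.set_split('test')
batch_generator = generate_batches(dataset,
                                   batch_size=args.batch_size,
                                   device=args.device)
running_loss = 0.
running_acc = 0.
classifier.eval()

for batch_index, batch_dict in enumerate(batch_generator):
    # compute the output
    y_pred = classifier(batch_dict['x_data'],
                        x_lengths=batch_dict['x_length'])

    # compute the loss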
    loss = loss_func(y_pred, batch_dict['y_target'])
    loss_t = loss.item()
    running_loss += (loss_t - running_loss) / (batch_index + 1)

    # compute the accuracy
    acc_t = compute_accuracy(y_pred, batch_dict['y_target'])
    running_acc += (acc_t - running_acc) / (batch_index + 1)

train_state['test_loss'] = running_loss
train_state['test_acc'] = running_acc
print("Test loss: {};".format(train_state['test_loss']))
print("Test Accuracy: {}".format(train_state['test_acc']))
Test loss: 1.8524044704437257;
Test Accuracy: 41.06249999999999

4.4.1. Inference

def predict_nationality(surname, classifier, vectorizer):
    vectorized_surname, vec_length = vectorizer.vectorize(surname)
    vectorized_surname = torch.tensor(vectorized_surname).unsqueeze(dim=0)
    vec_length = torch.tensor([vec_length], dtype=torch.int64)
    result = classifier(vectorized_surname, vec_length, apply_softmax=True)
    probability_values, indices = result.max(dim=1)
    index = indices.item()
    prob_value = probability_values.item()

    predicted_nationality = vectorizer.nationality_vocab.lookup_index(index)

    return {'nationality': predicted_nationality,
            'probability': prob_value,
            'surname': surname}

# surname = input("Enter a surname: ")
classifier = classifier.to("cpu")
for surname in ['McMahan', 'Nakamoto', 'Wan', 'Cho']:
    print(predict_nationality(surname, classifier, vectorizer))
{'nationality': 'Irish', 'probability': 0.4722009301185608, 'surname': 'McMahan'}
{'nationality': 'Japanese', 'probability': 0.7476573586463928, 'surname': 'Nakamoto'}
{'nationality': 'Chinese', 'probability': 0.6742308139801025, 'surname': 'Wan'}
{'nationality': 'Korean', 'probability': 0.3817710876464844, 'surname': 'Cho'}

Your Turn

  • Play with the hyperparameters to get a sense of what affects performance and by how much, and tabulate the results.

  • The model implemented here is general and not restricted to characters. The embedding layer in the model can map any discrete item in a sequence of discrete items. Consider reusing the code here for sentence or document classification.
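For the first exercise, one possible way to organise the experiments is a small grid over the hyperparameters, collected into a table. The sketch below only builds and prints the grid; the candidate values are arbitrary, and `train_and_evaluate` is a hypothetical stand-in for running the training routine with a given configuration (here it returns a placeholder instead of a real score).

```python
from itertools import product

# Arbitrary candidate values for the two model hyperparameters
# and the learning rate.
grid = {
    "char_embedding_size": [32, 100],
    "rnn_hidden_size": [32, 64],
    "learning_rate": [1e-2, 1e-3],
}


def train_and_evaluate(config):
    """Hypothetical placeholder: train with `config`, return val accuracy."""
    return None  # replace with a call into the real training routine


names = list(grid)
results = []
for values in product(*(grid[name] for name in names)):
    config = dict(zip(names, values))
    results.append({**config, "val_acc": train_and_evaluate(config)})

# Print a simple tab-separated table, one row per configuration.
print("\t".join(names + ["val_acc"]))
for row in results:
    print("\t".join(str(row[key]) for key in names + ["val_acc"]))
```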