Machine Learning and Text Classification BBC Texts

In this Google Colab notebook, I set up a shallow neural net and train it on a labeled data set of BBC News texts with category tags. For the full description of this project, see my post Supervised Machine Learning & BBC Text Classification on my portfolio website.

Libraries: Pandas, Numpy, PyTorch, Sklearn, Plotly
Data: www.kaggle.com/datasets/hgultekin/bbcnewsarchive

The data set has the following distribution:
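
A category distribution of this kind can be computed with pandas. The sketch below is illustrative only: it uses a tiny hand-made DataFrame in place of the Kaggle file, and the column names `category` and `content` are assumptions about the data layout, not taken from the notebook.

```python
import pandas as pd

# Toy stand-in for the BBC data; the real file and its column names may differ.
df = pd.DataFrame({
    "category": ["sport", "tech", "sport", "business", "politics", "sport"],
    "content": ["text a", "text b", "text c", "text d", "text e", "text f"],
})

# Number of articles per category, largest first.
counts = df["category"].value_counts()

# Share of each category as a fraction of all articles.
shares = df["category"].value_counts(normalize=True)

print(counts)
print(shares.round(2))
```

A strongly uneven distribution would be a reason to look at per-category metrics rather than overall accuracy alone.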

The neural net is set up like this:

import torch.nn as nn
import torch.optim as optim

class ClassificationNet(nn.Module):

    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(23699, 512),  # feature number, first layer size
            nn.Hardtanh(),          # hidden-layer activation
            nn.Linear(512, 5)       # we have 5 categories!
        )

    def forward(self, x):  # forward pass, input shape (data points, feature number)
        # Softmax turns the 5 output scores into class probabilities.
        # Note: nn.CrossEntropyLoss applies log-softmax internally, so it is
        # normally given the raw scores from self.layers(x) instead.
        sm = nn.Softmax(dim=1)
        x = sm(self.layers(x))
        return x

net = ClassificationNet()

# learning rate
lrt = 0.01
# optimizer
optimizer = optim.Adam(net.parameters(), lr=lrt)
# loss/criterion
criterion = nn.CrossEntropyLoss()
# epochs / number of training iterations
epochs = 50
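
The settings above might be used in a training loop along the following lines. This is a minimal sketch under stated assumptions, not the notebook's actual loop: the data are random tensors, the feature count is shrunk to 20 so it runs quickly, and `TinyNet`, `X_train`, `y_train`, `X_val` and `y_val` are hypothetical names. Unlike the class above, the sketch returns raw scores from `forward`, because `nn.CrossEntropyLoss` applies log-softmax itself.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

n_features, n_classes = 20, 5  # assumed toy sizes, not the real 23699 features

class TinyNet(nn.Module):  # hypothetical small stand-in for ClassificationNet
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_features, 16),
            nn.Hardtanh(),
            nn.Linear(16, n_classes),
        )

    def forward(self, x):
        return self.layers(x)  # raw scores (logits), no softmax

# Random stand-in data: float features, integer class labels in [0, 5).
X_train, y_train = torch.randn(64, n_features), torch.randint(0, n_classes, (64,))
X_val, y_val = torch.randn(16, n_features), torch.randint(0, n_classes, (16,))

net = TinyNet()
optimizer = optim.Adam(net.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
epochs = 50

train_losses, val_losses = [], []
for epoch in range(epochs):
    # One full-batch training step.
    net.train()
    optimizer.zero_grad()
    loss = criterion(net(X_train), y_train)
    loss.backward()
    optimizer.step()
    train_losses.append(loss.item())

    # Validation loss, computed without tracking gradients.
    net.eval()
    with torch.no_grad():
        val_losses.append(criterion(net(X_val), y_val).item())

print(f"train loss: {train_losses[0]:.3f} -> {train_losses[-1]:.3f}")
```

The two lists, `train_losses` and `val_losses`, hold one value per epoch and can be plotted against each other to compare training and validation loss.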

After training, we get the following graph for our training and validation loss:

Our evaluation on the test set produces the following confusion matrix:
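
A confusion matrix of this kind can be built with scikit-learn's `confusion_matrix`. The sketch below uses made-up labels and predictions purely for illustration; the category names and the variables `y_true` and `y_pred` are assumptions, and none of the numbers are results from this project.

```python
from sklearn.metrics import confusion_matrix

# Assumed category names, in the order used for rows and columns.
labels = ["business", "entertainment", "politics", "sport", "tech"]

# Made-up true categories and predictions, for illustration only.
y_true = ["sport", "sport", "tech", "business", "politics", "entertainment", "tech"]
y_pred = ["sport", "tech", "tech", "business", "politics", "entertainment", "tech"]

# Rows are true categories, columns are predicted categories,
# both in the order given by `labels`.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Overall accuracy: correct predictions (the diagonal) over all predictions.
accuracy = cm.trace() / cm.sum()
print(f"accuracy: {accuracy:.3f}")
```

Off-diagonal entries show which categories get mistaken for which; in this toy data, one sport article is predicted as tech.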
