
Investigating Intrinsic Intentionality in Neural Networks

This project investigates whether neural networks develop inherent, discoverable meaning in their weight structures - what philosophers might call "intrinsic intentionality". Specifically, I examine whether the role of output neurons (which digit they classify) can be determined solely from their weight patterns, without requiring knowledge of that specific network's training history.

Research Question

Can we determine what a neuron "means" (which digit it classifies) just by looking at its weights and their relationships to other neurons? If so, this suggests that neural networks develop consistent, interpretable internal structures that transcend individual training runs.

Methodology

  1. Training Base Networks

    • Multiple neural networks are trained on MNIST classification
    • Networks vary in architecture (hidden dimensions: 25-100 neurons)
    • Each network develops its own unique weights through training
  2. Decoding Experiment

    • Extract output layer weights from trained networks
    • Shuffle the neurons (rows) randomly
    • Use a Set Transformer to predict which neuron corresponds to which digit
    • Crucially, the decoder generalizes across base networks trained from different random seeds (a minimal sketch of this pipeline follows below)
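
A minimal sketch of this pipeline in PyTorch is shown below. The class and function names (`BaseMLP`, `make_decoder_example`) and the hyperparameters are illustrative assumptions, not this repository's actual code:

```python
# Illustrative sketch only: names and hyperparameters are assumptions,
# not this repository's actual code.
import torch
import torch.nn as nn

class BaseMLP(nn.Module):
    """Small MNIST classifier; hidden_dim was varied (25-100) across base networks."""
    def __init__(self, hidden_dim=50, dropout=0.0):
        super().__init__()
        self.body = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
        )
        self.out = nn.Linear(hidden_dim, 10)  # one output neuron per digit

    def forward(self, x):
        return self.out(self.body(x))

def make_decoder_example(model):
    """Extract the output-layer weights, shuffle the neuron (row) order, and
    return the shuffled rows plus the permutation the decoder must recover."""
    w = model.out.weight.detach()      # shape (10, hidden_dim): one row per output neuron
    perm = torch.randperm(w.shape[0])  # random reordering of the ten rows
    return w[perm], perm               # decoder sees the rows; label for row i is perm[i]
```

The decoder (a Set Transformer in this project) then consumes the set of weight rows and predicts each row's digit. Because the rows are shuffled, it can only rely on the weight vectors themselves and their relations to one another, never on their position.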

Project Structure

  • underlying/: Base neural networks trained on MNIST
  • decoder/: Set Transformer for weight pattern analysis

Results

The experiments demonstrate that neural networks do develop discoverable patterns in their output layer weights that indicate their functional role. I tested this hypothesis across three training paradigms:

  1. Untrained Networks (Control)

    • Random weights, no training
    • Decoder accuracy: ~10% (chance level)
    • Confirms the decoder isn't exploiting spurious patterns
  2. Standard Training

    • Normal backpropagation
    • Decoder accuracy: ~25%
    • Shows emergence of meaningful weight patterns
  3. Training with Dropout

    • Backpropagation with dropout
    • Decoder accuracy: ~75%
    • Significant improvement in decodability (a sketch of the decoding-accuracy evaluation follows this list)
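
Below is a hedged sketch of how decoding accuracy can be scored against the 10% chance baseline, reusing `make_decoder_example` from the methodology sketch. The `decoder` call signature and output shape are assumptions, not the repository's API:

```python
# Hypothetical evaluation loop; 'decoder' stands for any permutation-equivariant
# model (e.g. a Set Transformer) that scores each shuffled weight row against
# the ten digit classes.
def decoding_accuracy(decoder, base_models):
    correct, total = 0, 0
    for model in base_models:
        rows, labels = make_decoder_example(model)  # shuffled rows + true permutation
        logits = decoder(rows.unsqueeze(0))         # assumed output shape: (1, 10, 10)
        preds = logits.argmax(dim=-1).squeeze(0)    # predicted digit for each row
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total                          # chance level is 0.10 for ten digits
```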

Key Findings

  • Output neurons develop consistent, decodable patterns in their connectivity
  • Dropout dramatically improves decodability (25% → 75%)

Interpretation

The superior performance with dropout likely occurs because:

  • Dropout encourages neurons to rely on population activity rather than single-neuron pathways
  • Output neurons representing similar digits share more features in their input weights
    • This creates more consistent and detectable patterns in the relational weight structure (made concrete in the sketch below)
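
One way to make "relational weight structure" concrete is the matrix of pairwise cosine similarities between the output neurons' incoming weight vectors: shuffling the neurons only re-orders this matrix, so the pattern of relations survives. This is an illustrative probe under that assumption, not the decoder's actual mechanism:

```python
import torch.nn.functional as F

def relational_signature(model):
    """Pairwise cosine similarity between the output neurons' incoming weight
    vectors. Neurons for visually similar digits tend to have more similar rows,
    and the overall pattern is preserved (merely re-ordered) under shuffling."""
    w = model.out.weight.detach()  # (10, hidden_dim): one row per output neuron
    w = F.normalize(w, dim=1)      # unit-normalize each neuron's weight vector
    return w @ w.T                 # (10, 10) similarity matrix
```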

These results suggest that even simple MNIST-classifying ANNs structurally encode their representations, making them decodable without any knowledge of how the network was linked to the 'outside world' (i.e., its input pixels and output classes). Please consider reading my article for more details and for a discussion of the possible implications for philosophy of mind.