bfloat16

A lightweight C++ implementation of the Brain Floating Point (bfloat16) format.

Overview

bfloat16 is a 16-bit floating-point format developed by Google Brain for use in machine learning applications. It preserves the dynamic range of 32-bit floats by keeping the same 8-bit exponent, while reducing precision to 7 mantissa bits (versus 23 in float32).
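To make the precision trade-off concrete, the sketch below truncates a float32 to bfloat16 precision by zeroing the low 16 bits of its bit pattern. This is only an illustration of the format; real conversions usually round rather than truncate, and the function name is not part of this library:

```cpp
#include <cstdint>
#include <cstring>
#include <iostream>

// Truncate a float32 to bfloat16 precision by zeroing the low 16 bits.
// Illustrative only: it shows the 7-bit mantissa, but drops rounding.
float truncate_to_bfloat16(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));  // safe type-punning
    bits &= 0xFFFF0000u;                   // keep sign, exponent, top 7 mantissa bits
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

int main() {
    std::cout << truncate_to_bfloat16(1.0f + 0.0078125f) << "\n";  // 1.00781: 2^-7 survives
    std::cout << truncate_to_bfloat16(1.0f + 0.00390625f) << "\n"; // 1: 2^-8 is lost
    return 0;
}
```

Since the last kept mantissa bit is 2^-7, bfloat16 offers roughly 2 to 3 significant decimal digits.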

Features

  • Header-only C++ implementation
  • Conversions between bfloat16 and float32
  • Basic arithmetic operations
  • Standard mathematical functions
  • IEEE 754-style special values (NaN, infinities, signed zero)

Implementation Details

This implementation represents bfloat16 as:

  • 1 sign bit
  • 8 exponent bits (same as float32)
  • 7 mantissa bits

The format provides a balance between range and precision that is particularly suitable for neural network training and inference.
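Because the exponent field matches float32, conversion amounts to keeping the top 16 bits of a float32's bit pattern. Below is a minimal sketch of the standard technique with round-to-nearest-even; it is illustrative and not necessarily the routine this library uses:

```cpp
#include <cstdint>
#include <cstring>

// Convert float32 -> bfloat16 bits with round-to-nearest-even.
// NaN inputs need separate handling in production code: the bias add
// can carry out of the mantissa and corrupt the exponent field.
std::uint16_t float_to_bfloat16_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    // Bias is 0x7FFF plus the lowest kept bit, so ties round to even.
    std::uint32_t rounding_bias = 0x7FFF + ((bits >> 16) & 1);
    bits += rounding_bias;
    return static_cast<std::uint16_t>(bits >> 16);
}

// Convert bfloat16 bits -> float32 by shifting back into the high half.
float bfloat16_bits_to_float(std::uint16_t h) {
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

Widening back to float32 is exact, since every bfloat16 value is also representable as a float32.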

Building and Installation

This is a header-only library. Simply include the header files in your project.
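A minimal consumer might look like the following; the header name bfloat16.hpp and the type name bfloat16 are assumptions, so substitute the actual identifiers from this repository:

```cpp
#include <iostream>
#include "bfloat16.hpp"  // assumed header name; use the repository's actual header

int main() {
    // Construct from float32 literals; only 7 mantissa bits are kept.
    bfloat16 a(1.5f);
    bfloat16 b(0.25f);

    // Arithmetic is typically done by widening to float32 and rounding back.
    bfloat16 c = a + b;

    // Convert back to float32 for output.
    std::cout << static_cast<float>(c) << "\n";  // 1.75 is exactly representable
    return 0;
}
```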

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
