Unicode::Categories

Returns a list of the General Categories that the characters of a Unicode string belong to.

Unicode version: 17.0.0 (September 2025)

Gemfile

gem "unicode-categories"

Usage

require "unicode/categories"

# All general categories of a string
Unicode::Categories.categories("A 2") # => ["Lu", "Nd", "Zs"]
Unicode::Categories.categories("A 2", format: :long)
# => ["Decimal_Number", "Space_Separator", "Uppercase_Letter"]

# Also aliased as .of
Unicode::Categories.of("\u{10c50}") # => ["Cn"]

# Single character
Unicode::Categories.category("☼", format: :long) # => "Other_Symbol"

The list of categories is always sorted alphabetically.

Hints

Regex Matching

If you have a string and want to match a substring or character from a specific Unicode General Category, you do not need this gem. Instead, you can use Ruby's Regexp Unicode property syntax \p{}:

"Find decimal numbers (like 2 or 3) within a string".scan(/\p{Nd}+/) # => ["2", "3"]

See Idiosyncratic Ruby: Proper Unicoding for more info.
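The same property syntax covers other General Categories too. The following is a minimal sketch using only Ruby's built-in Regexp engine, not this gem. The sample string and variable names are made up for illustration.

```ruby
# Plain Ruby, no gem required. The sample text is purely illustrative.
text = "Order 66 costs €12 or $15"

# Runs of decimal digits (General Category Nd)
numbers = text.scan(/\p{Nd}+/)

# Currency symbols (General Category Sc)
currencies = text.scan(/\p{Sc}/)

# Long property names are accepted as well
uppercase = text.scan(/\p{Uppercase_Letter}/)

# A one-letter major class such as L matches all of its subcategories
letters_only = "Ruby".match?(/\A\p{L}+\z/)

p numbers      # ["66", "12", "15"]
p currencies   # ["€", "$"]
p uppercase    # ["O"]
p letters_only # true
```

Use \P{} (capital P) to match characters that are not in a given category.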

List of General Categories

You can retrieve a list of all General Categories like this:

require "unicode/categories"
puts \
  "Short | Long\n" +
  "------|-----\n" +
  Unicode::Categories.names(format: :table).to_a.map{ |r| "   %s | %s" % r }.join("\n")

Short	Long
Cc	Control
Cf	Format
Cn	Unassigned
Co	Private_Use
Cs	Surrogate
LC	Cased_Letter
Ll	Lowercase_Letter
Lm	Modifier_Letter
Lo	Other_Letter
Lt	Titlecase_Letter
Lu	Uppercase_Letter
Mc	Spacing_Mark
Me	Enclosing_Mark
Mn	Nonspacing_Mark
Nd	Decimal_Number
Nl	Letter_Number
No	Other_Number
Pc	Connector_Punctuation
Pd	Dash_Punctuation
Pe	Close_Punctuation
Pf	Final_Punctuation
Pi	Initial_Punctuation
Po	Other_Punctuation
Ps	Open_Punctuation
Sc	Currency_Symbol
Sk	Modifier_Symbol
Sm	Math_Symbol
So	Other_Symbol
Zl	Line_Separator
Zp	Paragraph_Separator
Zs	Space_Separator

See unicode-x for more Unicode-related micro libraries.

MIT License

Copyright (C) 2016-2025 Jan Lelis https://janlelis.com. Released under the MIT license.
Unicode data: https://www.unicode.org/copyright.html#Exhibit1
