cari

A Rust implementation of the popular HTML parsing utility pup.

Description

cari is a command-line utility for parsing and extracting data from HTML documents using CSS selectors. It serves as a drop-in replacement for pup, built on the actively maintained scraper/html5ever stack for modern CSS selection and DOM handling.

Features

Parse HTML from stdin or files
Extract data using CSS selectors
Multiple output formats:
- Prettified HTML (default)
- Plain text extraction
- JSON representation
- Attribute values
Support for pseudo-selectors:
- :first-child, :last-child, :nth-child(n)
- :first-of-type, :last-of-type, :nth-of-type(n)
- :not(), :contains(), :parent-of()
- :empty, :only-child, :only-of-type
Colorized output
Configurable indentation
Depth limiting for output
Character set support

Installation

From Source

To build and install cari from source:

git clone https://github.com/elliottophellia/cari.git
cd cari
cargo build --release

The compiled binary will be available at target/release/cari.

Using Cargo

cargo install cari

Usage

USAGE:
    cari [OPTIONS] [SELECTORS]... [DISPLAY]

OPTIONS:
    -h, --help              Show this help message
    -v, --version           Show version information
    -t, --text              Print text content
    -j, --json              Print as JSON
    --html                  Print as HTML (default)
    -a, --attr NAME         Print attribute value
    -i, --indent LEVEL      Indent output (spaces)
    -n, --number            Print the number of elements selected
    -p, --plain             Don't escape HTML
    --pre                   Preserve preformatted text
    -c, --color             Colorize output
    --no-color              Disable colorized output
    -f, --file FILE         Read input from file instead of stdin
    -l, --limit DEPTH       Restrict number of levels printed
    --charset CHARSET       Specify input charset (utf-8, latin-1, ascii)
    --escape-html           Escape HTML in text output (default)
    --no-escape-html        Do not escape HTML in text output

DISPLAY FUNCTIONS:
    text{}                  Print text content
    html{}                  Print as HTML
    json{}                  Print as JSON
    attr{NAME}              Print attribute value

Examples

Extract text from title elements

cat index.html | cari 'title'

Extract href attributes from links

cat index.html | cari 'a attr{href}'

Output JSON representation of paragraphs in divs

cat index.html | cari 'div > p' json{}

Extract src attributes from images

# Flag style
cat index.html | cari 'img' --attr src

# pup style
cat index.html | cari 'img' attr{src}

Read from file instead of stdin

cari -f index.html 'title'

Limit output depth

cari -l 2 -f page.html 'body'

Use in pipelines

curl -s https://example.com | cari 'main > p text{}'

Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

Changelog

See CHANGELOG.md for a detailed history of changes between versions.

License

                    GNU GENERAL PUBLIC LICENSE
                       Version 3, 29 June 2007

 Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.github/workflows		.github/workflows
src		src
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

cari

Description

Features

Installation

From Source

Using Cargo

Usage

Examples

Extract text from title elements

Extract href attributes from links

Output JSON representation of paragraphs in divs

Extract src attributes from images

Read from file instead of stdin

Limit output depth

Use in pipelines

Contributing

Changelog

License

About

Uh oh!

Releases 1

Uh oh!

Languages

License

elliottophellia/cari

Folders and files

Latest commit

History

Repository files navigation

cari

Description

Features

Installation

From Source

Using Cargo

Usage

Examples

Extract text from title elements

Extract href attributes from links

Output JSON representation of paragraphs in divs

Extract src attributes from images

Read from file instead of stdin

Limit output depth

Use in pipelines

Contributing

Changelog

License

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Uh oh!

Languages