Feat: Add Access Token Authentication for Private Repositories #257

Closed
38 changes: 36 additions & 2 deletions README.md
@@ -89,16 +89,39 @@ gitingest --help

This will write the digest in a text file (default `digest.txt`) in your current working directory.

### Accessing Private Repositories with Tokens

You can provide a Personal Access Token (PAT) to clone private repositories from supported platforms (GitHub, GitLab, Codeberg, Bitbucket).
**Important:** This token is used **only** for the clone operation and is **never stored or logged** by Gitingest.

1. **Generate a Token:** Go to your Git provider's settings (e.g., GitHub Developer settings) and generate a Personal Access Token. Grant it the minimum required scope, which is typically read access to repositories (e.g., `repo` scope on GitHub, `read_repository` on GitLab).
2. **Use the Token (CLI):** Pass the token using the `--access-token` option:

```bash
gitingest https://github.com/your-user/your-private-repo --access-token YOUR_TOKEN
gitingest https://gitlab.com/your-group/your-private-repo --access-token YOUR_TOKEN
```

*Security Note:* Be mindful of your shell history when passing tokens directly on the command line.
3. **Use the Token (Web UI):** Paste the token into the "Access Token (Optional, for private repos)" field on [gitingest.com](https://gitingest.com) or your self-hosted instance.

*Note: If using a token with an unsupported Git host, the token will be ignored.*
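
One way to act on the security note above is to read the token from an environment variable rather than typing it on the command line. A minimal sketch, assuming a hypothetical variable name `GITINGEST_TOKEN` and a hypothetical helper `token_from_env` (this PR defines neither); only the `access_token` parameter of `ingest` comes from the change itself:

```python
import os
from typing import Mapping, Optional


def token_from_env(
    name: str = "GITINGEST_TOKEN",  # hypothetical variable name, not defined by Gitingest
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the token stored in `name`, or None if it is unset or blank."""
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    return value or None


if __name__ == "__main__":
    token = token_from_env()
    if token is None:
        print("No token set; only public repositories can be cloned.")
    else:
        # Requires the gitingest package; `access_token` is the parameter added in this PR.
        from gitingest import ingest

        summary, tree, content = ingest(
            "https://github.com/your-user/your-private-repo",
            access_token=token,
        )
        print(summary)
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.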

## 🐍 Python package usage

```python
# Synchronous usage
from gitingest import ingest

# Public repo or local path
summary, tree, content = ingest("path/to/directory")

# or from URL
summary, tree, content = ingest("https://github.com/cyclotruc/gitingest")

# Private repo with token
summary, tree, content = ingest(
    "https://github.com/your-user/your-private-repo",
    access_token="YOUR_TOKEN"
)
```

By default, this won't write a file but can be enabled with the `output` argument.
@@ -108,7 +131,13 @@ By default, this won't write a file but can be enabled with the `output` argumen
from gitingest import ingest_async
import asyncio

# Public repo or local path
result = asyncio.run(ingest_async("path/to/directory"))

# Private repo with token
summary, tree, content = asyncio.run(
    ingest_async("https://gitlab.com/your-group/your-private-repo", access_token="YOUR_TOKEN")
)
```

### Jupyter notebook usage
@@ -119,6 +148,11 @@ from gitingest import ingest_async
# Use await directly in Jupyter
summary, tree, content = await ingest_async("path/to/directory")

# Private repo with token (use await directly in Jupyter)
summary, tree, content = await ingest_async(
    "https://github.com/your-user/your-private-repo",
    access_token="YOUR_TOKEN"
)
```

This is because Jupyter notebooks are asynchronous by default.
1 change: 1 addition & 0 deletions requirements-dev.txt
@@ -5,3 +5,4 @@ pre-commit
pylint
pytest
pytest-asyncio
pytest-mock
19 changes: 17 additions & 2 deletions src/gitingest/cli.py
@@ -18,13 +18,15 @@
@click.option("--exclude-pattern", "-e", multiple=True, help="Patterns to exclude")
@click.option("--include-pattern", "-i", multiple=True, help="Patterns to include")
@click.option("--branch", "-b", default=None, help="Branch to clone and ingest")
@click.option("--access-token", default=None, help="Access token for private repositories (e.g., GitHub, GitLab)")
def main(
    source: str,
    output: Optional[str],
    max_size: int,
    exclude_pattern: Tuple[str, ...],
    include_pattern: Tuple[str, ...],
    branch: Optional[str],
    access_token: Optional[str],
):
"""
Main entry point for the CLI. This function is called when the CLI is run as a script.
@@ -46,9 +48,11 @@ def main(
A tuple of patterns to include during the analysis. Only files matching these patterns will be processed.
branch : str, optional
The branch to clone (optional).
access_token : str, optional
Access token for private repositories (optional).
"""
# Main entry point for the CLI. This function is called when the CLI is run as a script.
- asyncio.run(_async_main(source, output, max_size, exclude_pattern, include_pattern, branch))
+ asyncio.run(_async_main(source, output, max_size, exclude_pattern, include_pattern, branch, access_token))


async def _async_main(
@@ -58,6 +62,7 @@ async def _async_main(
exclude_pattern: Tuple[str, ...],
include_pattern: Tuple[str, ...],
branch: Optional[str],
access_token: Optional[str],
) -> None:
"""
Analyze a directory or repository and create a text dump of its contents.
@@ -80,6 +85,8 @@ async def _async_main(
A tuple of patterns to include during the analysis. Only files matching these patterns will be processed.
branch : str, optional
The branch to clone (optional).
access_token : str, optional
Access token for private repositories (optional).

Raises
------
@@ -93,7 +100,15 @@ async def _async_main(

if not output:
output = OUTPUT_FILE_NAME
- summary, _, _ = await ingest_async(source, max_size, include_patterns, exclude_patterns, branch, output=output)
+ summary, _, _ = await ingest_async(
+     source=source,
+     max_file_size=max_size,
+     include_patterns=include_patterns,
+     exclude_patterns=exclude_patterns,
+     branch=branch,
+     output=output,
+     access_token=access_token,
+ )

click.echo(f"Analysis complete! Output written to: {output}")
click.echo("\nSummary:")
77 changes: 72 additions & 5 deletions src/gitingest/cloning.py
@@ -3,16 +3,28 @@
import os
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse

from gitingest.schemas import CloneConfig
from gitingest.utils.git_utils import check_repo_exists, ensure_git_installed, run_command
from gitingest.utils.timeout_wrapper import async_timeout

TIMEOUT: int = 60

# Known hosts and their token authentication methods (add more as needed)
# Method: 'prefix' (https://<token>@host/...),
# 'oauth2' (https://oauth2:<token>@host/...),
# 'user' (<user>:<token>@host - requires username, not implemented)
KNOWN_HOST_AUTH = {
    "github.com": {"method": "prefix"},
    "gitlab.com": {"method": "oauth2"},
    "codeberg.org": {"method": "prefix"},
    "bitbucket.org": {"method": "prefix", "user": "x-token-auth"},
}


@async_timeout(TIMEOUT)
- async def clone_repo(config: CloneConfig) -> None:
+ async def clone_repo(config: CloneConfig, access_token: Optional[str] = None) -> None:
"""
Clone a repository to a local path based on the provided configuration.

@@ -24,6 +36,8 @@ async def clone_repo(config: CloneConfig) -> None:
----------
config : CloneConfig
The configuration for cloning the repository.
access_token : str, optional
Access token for private repositories (optional).

Raises
------
@@ -46,9 +60,17 @@ async def clone_repo(config: CloneConfig) -> None:
except OSError as exc:
raise OSError(f"Failed to create parent directory {parent_dir}: {exc}") from exc

- # Check if the repository exists
- if not await check_repo_exists(url):
-     raise ValueError("Repository not found, make sure it is public")
+ # Construct authenticated URL based on host if token is provided
+ auth_url = build_auth_url(url, access_token)
+
+ # Skip existence check only if token is provided for a known host type
+ parsed = urlparse(url)
+ host = parsed.netloc.lower()
+ is_known_git_host = host in KNOWN_HOST_AUTH
+ if not (access_token and is_known_git_host):
+     exists = await check_repo_exists(url)
+     if not exists:
+         raise ValueError("Repository not found or inaccessible. If private, provide a token.")

clone_cmd = ["git", "clone", "--single-branch"]
# TODO re-enable --recurse-submodules
@@ -61,7 +83,7 @@ async def clone_repo(config: CloneConfig) -> None:
if branch and branch.lower() not in ("main", "master"):
clone_cmd += ["--branch", branch]

- clone_cmd += [url, local_path]
+ clone_cmd += [auth_url, local_path]

# Clone the repository
await ensure_git_installed()
@@ -83,3 +105,48 @@ async def clone_repo(config: CloneConfig) -> None:

# Check out the specific commit and/or subpath
await run_command(*checkout_cmd)
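
The rule that decides whether `check_repo_exists` runs can be read as a single predicate. A minimal sketch, assuming a hypothetical helper `should_check_exists` and a `KNOWN_HOSTS` set that mirrors the keys of `KNOWN_HOST_AUTH`; neither name is part of this PR:

```python
from typing import Optional
from urllib.parse import urlparse

# Hosts mirrored from the keys of KNOWN_HOST_AUTH in this diff.
KNOWN_HOSTS = {"github.com", "gitlab.com", "codeberg.org", "bitbucket.org"}


def should_check_exists(url: str, access_token: Optional[str]) -> bool:
    """Return True when the unauthenticated existence check should run."""
    host = urlparse(url).netloc.lower()
    # The check is skipped only when a token is given AND the host is known.
    return not (access_token and host in KNOWN_HOSTS)
```

Because the lookup compares `netloc`, a URL with an explicit port such as `https://github.com:443/...` is not matched and still goes through the unauthenticated check; comparing `urlparse(url).hostname` instead would ignore the port.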


def build_auth_url(url: str, access_token: Optional[str] = None) -> str:
    """
    Build an authenticated URL for cloning a repository.

    Parameters
    ----------
    url : str
        The original repository URL.
    access_token : str, optional
        Access token for private repositories (optional).

    Returns
    -------
    str
        The authenticated URL, or the original URL if no token is given or the host is not in KNOWN_HOST_AUTH.
    """
    parsed = urlparse(url)
    final_url = url

    # Without an access token, the original URL is returned unchanged
    if access_token:
        if not parsed.scheme or not parsed.netloc:
            print(f"Warning: Could not parse URL '{url}' for token auth: invalid URL")
            return url

        host = parsed.netloc.lower()
        host_info = KNOWN_HOST_AUTH.get(host)
        if host_info:
            method = host_info["method"]
            if method == "prefix":
                user = host_info.get("user")
                if user:
                    # e.g. bitbucket.org → x-token-auth:<token>@bitbucket.org/…
                    final_url = url.replace("https://", f"https://{user}:{access_token}@", 1)
                else:
                    # e.g. github.com, codeberg.org → <token>@host/…
                    final_url = url.replace("https://", f"https://{access_token}@", 1)

            elif method == "oauth2":
                # gitlab.com → oauth2:<token>@gitlab.com/…
                final_url = url.replace("https://", f"https://oauth2:{access_token}@", 1)

            # fall-through: shouldn't happen if KNOWN_HOST_AUTH is correct
    return final_url
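
`build_auth_url` inserts the token into the URL with a plain string replacement, so a token containing characters such as `/` or `@` would yield a malformed URL. A minimal sketch of an alternative that percent-encodes the credentials, plus a helper that masks them before a URL reaches a log or error message. Both function names (`with_credentials`, `redact_credentials`) are hypothetical and not part of this PR:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(url: str, token: str, user: str = "") -> str:
    """Return `url` with `user:token@` (or `token@`) placed before the host."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return url  # leave anything that is not a plain https URL untouched
    secret = quote(token, safe="")  # percent-encode every reserved character
    userinfo = f"{quote(user, safe='')}:{secret}" if user else secret
    host = parts.hostname
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, f"{userinfo}@{host}", parts.path, parts.query, parts.fragment))


def redact_credentials(url: str) -> str:
    """Replace any userinfo in `url` with `***` so it is safe to print."""
    parts = urlsplit(url)
    if "@" not in parts.netloc:
        return url
    host = parts.netloc.rsplit("@", 1)[1]
    return urlunsplit((parts.scheme, f"***@{host}", parts.path, parts.query, parts.fragment))
```

With these helpers, the GitLab form from the diff would be `with_credentials(url, token, user="oauth2")` and the Bitbucket form `with_credentials(url, token, user="x-token-auth")`.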
9 changes: 8 additions & 1 deletion src/gitingest/entrypoint.py
@@ -18,6 +18,7 @@ async def ingest_async(
exclude_patterns: Optional[Union[str, Set[str]]] = None,
branch: Optional[str] = None,
output: Optional[str] = None,
access_token: Optional[str] = None,
) -> Tuple[str, str, str]:
"""
Main entry point for ingesting a source and processing its contents.
@@ -41,6 +42,8 @@ async def ingest_async(
The branch to clone and ingest. If `None`, the default branch is used.
output : str, optional
File path where the summary and content should be written. If `None`, the results are not written to a file.
access_token : str, optional
Access token for private repositories (optional).

Returns
-------
@@ -71,7 +74,7 @@ async def ingest_async(
query.branch = selected_branch

clone_config = query.extract_clone_config()
- clone_coroutine = clone_repo(clone_config)
+ clone_coroutine = clone_repo(clone_config, access_token=access_token)

if inspect.iscoroutine(clone_coroutine):
if asyncio.get_event_loop().is_running():
@@ -103,6 +106,7 @@ def ingest(
exclude_patterns: Optional[Union[str, Set[str]]] = None,
branch: Optional[str] = None,
output: Optional[str] = None,
access_token: Optional[str] = None,
) -> Tuple[str, str, str]:
"""
Synchronous version of ingest_async.
@@ -126,6 +130,8 @@ def ingest(
The branch to clone and ingest. If `None`, the default branch is used.
output : str, optional
File path where the summary and content should be written. If `None`, the results are not written to a file.
access_token : str, optional
Access token for private repositories (optional).

Returns
-------
@@ -147,5 +153,6 @@ def ingest(
exclude_patterns=exclude_patterns,
branch=branch,
output=output,
access_token=access_token,
)
)