Andrew Guschin edited this page Apr 26, 2025 · 2 revisions

mm-git

Architecture

This service has no knowledge of the business logic behind any repo. Repo storage is a single flat directory, and every repo is named by an ID returned from the backend. The backend generates these IDs from the context of a particular repository. This service does not check the correctness of any given ID, because it doesn’t (and shouldn’t) have any information about it.

Access rules for each repo are stored in an ACL (access-control list) table. Each record in this table refers to a single user and a single repo, so any repo can have more than one user with access permissions. The permissions are Read and Write.
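A minimal sketch of such a table and a permission check, kept in memory for illustration. The names AclRecord and is_allowed are hypothetical, and the rule that Write also grants Read is an assumption that the text above does not state.

```python
from dataclasses import dataclass
from enum import Enum


class Permission(Enum):
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class AclRecord:
    """One ACL row: a single user, a single repo, one permission."""
    user_id: int
    repo_id: str
    permission: Permission


def is_allowed(acl, user_id, repo_id, needed):
    """Return True if some record grants `needed` to the user on the repo."""
    for record in acl:
        if record.user_id != user_id or record.repo_id != repo_id:
            continue
        # Assumption: a Write record also allows reading.
        if record.permission is needed or record.permission is Permission.WRITE:
            return True
    return False
```

With records (1, "r1", READ) and (2, "r1", WRITE), user 1 may only read r1, user 2 may read and write it, and any other user is refused.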

Access to repositories is granted via the SSH protocol or HTTP Basic Auth.

SSH Protocol

When a user connects to our server via SSH, sshd starts the default shell for that user. We can replace this shell with our own program to handle git commands.

To identify the connected user and verify their permissions, the option ExposeAuthInfo yes should be turned on in sshd_config. This exposes information about the authentication in the file referenced by $SSH_USER_AUTH. The public key should be in this file, so we should be able to identify the user from it. The authentication information for SSH is stored in a separate table, where a key can be identified by its fingerprint. Once we have the key fingerprint, we can look up the respective user and verify their ACL.
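The fingerprint step could look roughly like the sketch below. It assumes that each relevant line of the $SSH_USER_AUTH file has the form `publickey <key-type> <base64-blob>`; the function name fingerprints_from_auth_info is hypothetical. The fingerprint is computed the way OpenSSH prints SHA256 fingerprints: the unpadded base64 of the SHA-256 digest of the decoded key blob.

```python
import base64
import hashlib


def fingerprints_from_auth_info(text):
    """Return SHA256 fingerprints of all public keys listed in the auth info."""
    result = []
    for line in text.splitlines():
        parts = line.split()
        # Assumed line format: "publickey <key-type> <base64-blob>".
        if len(parts) < 3 or parts[0] != "publickey":
            continue
        blob = base64.b64decode(parts[2])
        digest = hashlib.sha256(blob).digest()
        encoded = base64.b64encode(digest).decode("ascii").rstrip("=")
        result.append("SHA256:" + encoded)
    return result
```

In the shell process, the input would be the contents of the file named by the SSH_USER_AUTH environment variable, and the resulting fingerprint would be the lookup key in the SSH key table.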

This approach implies that a user can have more than one SSH key, and that all of a user’s keys share the same ACL rather than having distinct ones.

HTTP Basic Auth

Note

This part is not designed yet. Password authentication should probably be done on the backend side. After successful authentication, authorization should be done on this side.

The specific details of this authentication and authorization process will be worked out later.

This kind of auth is necessary because some students may want to access their repos during class on the faculty’s hardware, where SSH access would be too cumbersome.

Service and Shell split

Incoming requests via SSH should be handled by a shell that is spawned as a separate process for each connection. Requests from the backend should be handled by a separate service process. This means that the binary for the service should work in two modes:

  • Run as a shell process that handles one request and exits.

  • Run as a service that handles many requests.

This is somewhat inconvenient, but manageable. The switch between modes could be done with command-line parameters.
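One possible way to switch modes, sketched under assumptions: sshd runs a user's login shell as `<shell> -c "<command>"` when the client requests a command, and a git client requests commands such as `git-upload-pack '<path>'`. So the presence of `-c` could select shell mode, and its absence service mode. The function names, the program name passed to argparse, and the mapping of git commands to "read"/"write" are all hypothetical.

```python
import argparse
import shlex

# Hypothetical mapping from git transport command to the permission it needs.
GIT_COMMANDS = {
    "git-upload-pack": "read",    # clone / fetch
    "git-receive-pack": "write",  # push
}


def parse_mode(argv):
    """Return ("shell", command) when invoked with -c, else ("service", None)."""
    parser = argparse.ArgumentParser(prog="mm-git")
    parser.add_argument("-c", dest="command", default=None,
                        help="command passed by sshd (shell mode)")
    args = parser.parse_args(argv)
    if args.command is not None:
        return ("shell", args.command)
    return ("service", None)


def parse_git_command(command):
    """Split a git transport command into (needed_permission, repo_path)."""
    parts = shlex.split(command)
    if len(parts) != 2 or parts[0] not in GIT_COMMANDS:
        raise ValueError(f"unsupported command: {command!r}")
    return GIT_COMMANDS[parts[0]], parts[1]
```

In shell mode the process would parse the single command, check the ACL, serve the request and exit; in service mode it would start listening for backend requests instead.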

Repo storage

All repos are stored in a flat directory and referenced by their ID. A repo ID is derived from the context of the repo, and this context is decided on the backend side.

Note

An ID could be derived from several possible fields:

  • course_id

  • task_id

  • group_id

  • user_id

An ID derived from course_id and user_id should correspond to a repo that is:

  1. Global for the whole course: this repo is not tied to any task in particular.

  2. Unique for each user: no two users have access to each other’s repo.

An ID derived from course_id and group_id should correspond to a repo that is:

  1. Global for the whole course: this repo is not tied to any task in particular.

  2. Unique for each group of users, for courses that are completed in teams rather than individually. Access to this repo is granted to any user belonging to group_id.

An ID derived from course_id alone should correspond to a repo that is accessible by anyone enrolled in the course.
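The derivation itself belongs to the backend and is not specified here. As an illustration only, one hypothetical scheme hashes whichever of the fields are present. The function name derive_repo_id, the canonical string format and the choice of SHA-256 are all assumptions.

```python
import hashlib


def derive_repo_id(course_id, task_id=None, group_id=None, user_id=None):
    """Derive a flat, hierarchy-free repo ID from the fields that are set."""
    fields = [
        ("course", course_id),
        ("task", task_id),
        ("group", group_id),
        ("user", user_id),
    ]
    # Field names are included so that e.g. group 5 and user 5 never collide.
    canonical = "/".join(f"{name}={value}"
                         for name, value in fields if value is not None)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this scheme, derive_repo_id(7, user_id=1) and derive_repo_id(7, user_id=2) name two different per-user course repos, while derive_repo_id(7) names the single repo shared by everyone enrolled in course 7.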

Name translation

To do something with a repo, the user needs to enter the URL of the repo somewhere. When a user starts to interact with repos, irrespective of the protocol used, we know their user_id. So we should strive to determine as much information about the repo as possible on our side.

Currently, the user needs to provide only a machine-friendly course_name, or a course_name and a task_name. From this we can determine everything needed to point our git shell to a particular repo with the correct ACL for each user.

For example, we get a request to the repo computer-graphics/task1.git. If this task is to be done individually, we resolve the request to a separate repo for each user, with an ID derived from course_id, task_id and user_id. If this task is global for the course, we point all requests to a single repo with an ID derived only from course_id and task_id.

This also applies to course-wide repos, such as computer-graphics.git.

All of the URLs should be in the following format:

All other information relating to ACLs, groups and other users should be computed on our side. This removes unnecessary technical information from the URL and leaves only the parts that are useful to the user.

For example, the repo URL git@example.com:computer-graphics/task1.git could resolve to:

  • Different repos for each user, if this task is done individually;

  • The same repo for users in the same group, if this task is done in groups;

  • Different repos for users belonging to different groups, if this task is done in groups;

  • etc.
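The cases above can be sketched as a small resolver that turns the path part of the URL plus the known user into the set of fields the repo ID would be derived from. The Mode enum, both function names and the use of names in place of course_id and task_id are assumptions made for illustration; a real implementation would look up the task's mode and the user's group itself.

```python
from enum import Enum


class Mode(Enum):
    INDIVIDUAL = "individual"  # one repo per user
    GROUP = "group"            # one repo per group
    SHARED = "shared"          # one repo for the whole course


def parse_repo_path(path):
    """Split 'course.git' or 'course/task.git' into (course_name, task_name)."""
    if not path.endswith(".git"):
        raise ValueError(f"not a repo path: {path!r}")
    parts = path[:-len(".git")].strip("/").split("/")
    if len(parts) == 1 and parts[0]:
        return parts[0], None
    if len(parts) == 2 and all(parts):
        return parts[0], parts[1]
    raise ValueError(f"not a repo path: {path!r}")


def resolve_fields(path, user_id, mode, group_id=None):
    """Return the fields that identify the repo this user should reach."""
    course, task = parse_repo_path(path)
    fields = {"course": course}
    if task is not None:
        fields["task"] = task
    if mode is Mode.INDIVIDUAL:
        fields["user"] = user_id
    elif mode is Mode.GROUP:
        if group_id is None:
            raise ValueError("group mode requires a group_id")
        fields["group"] = group_id
    return fields
```

Two users requesting computer-graphics/task1.git in INDIVIDUAL mode get different field sets, and therefore different repos. In GROUP mode, users with the same group_id get identical field sets and land in the same repo.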
