Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Serve current documents #32

Merged
merged 128 commits into from
Jan 29, 2024
Merged

feat: Serve current documents #32

merged 128 commits into from
Jan 29, 2024

Conversation

n-dusan
Copy link
Contributor

@n-dusan n-dusan commented Jan 2, 2024

Closes #18
Closes #29
Closes #31

This PR adds support for:

  1. Serving current blobs (documents) for git repositories. We use git HEAD commit to get current blob.
  2. Multi-host serve for git repositories based on guarded header value.

Support for serving current blobs for git repositories.

Add stelae serve cli command, which spins up the server and serves documents from Stele data repositories in an archive.

  • First we parse an archive and initialize Actix routes to git repositories given the Stele that's specified in config.toml. Note that Actix routes need to be registered dynamically, because the routing matching patterns depend on and are read from an external configuration file repositories.json.
  • For parsing, read and map all data repositories from repositories.json in Stele's authentication repo. Here's an example of a repositories.json file that expects to serve only two data repositories: open-law-html and law-static-assets.
  • Stele can reference other Stele in dependencies.json. To serve current blobs on referenced Stele, referenced Stele needs to specify top-level scopes in repositories.json.
  • After the archive is parsed, map and initialize all data git repositories to global, read-only AppState.

For testing framework, add tests for three possible archives for serving current documents are basic, multijursdiction, multihost.
Basic archive is an archive with one stele that has multiple data repositories. Multijursdiction archive is an example of Stele referencing other Stele with dependencies.json. Multihost archive is enabled by using the current_documents_guard in config.toml

Multi-host serve for git repositories based on guarded header value.

When config.toml sets a current_documents_guard header, we expect the HTTP requests to Stele to contain a guarded header. So requests which don't match the expected guarded header aren't resolved.

dgreisen and others added 30 commits January 1, 2024 16:36
Not sure what I'm doing wrong here. I'm trying to make sure that
all Stele are owned by parent Archive struct. But this is not
working.
only partially working. I am not able to clone the git2::Repository instance, so I'm instantiating within route loop. This should not be necessary
Since Actix routes get initialized in the way they are specified, we
need to sort repositories by routes.

To make this work, we need a way to serialize HashMap of repository values
into a Vector of tuples.
This is necessary because the HashMap does not preserve order. I thought
that simply serializing into BTreeMap instead would work, however, that
data structure preserves order by sorting keys. What we instead want is
to sort the collection by values (more specifically routes within the
repository custom attribute).

Based on a comment on stackoverflow [1], it seems highly unlikely we can
sort BTreeMap by values. Instead, we implement our own serde deserializing method
that deserializes a serde Map into a Vec<(String,Repository)> using
deserialize_with [2].

[1] - https://stackoverflow.com/questions/41220872/how-if-possible-to-sort-a-btreemap-by-value-in-rust
[2] - https://serde.rs/field-attrs.html#deserialize_with
Introduce a --individual flag. If wish to serve an individual stele, use
the bool flag, else we expect config.toml to contain the root stele for
the archive.
Made more to have name accessable from the struct, as opposed to a tuple
@n-dusan n-dusan added the enhancement New feature or request label Jan 2, 2024
@n-dusan n-dusan self-assigned this Jan 2, 2024
@n-dusan n-dusan force-pushed the ndusan/serve-current-documents branch from b3cfca9 to 557b65f Compare January 2, 2024 21:24
@n-dusan n-dusan force-pushed the ndusan/serve-current-documents branch from 557b65f to ab634da Compare January 2, 2024 21:37
@n-dusan n-dusan mentioned this pull request Jan 3, 2024
@n-dusan n-dusan requested a review from tombh January 3, 2024 14:04
Copy link
Contributor

@tombh tombh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! You're clearly a Rust dev now Dušan 😏

It's great seeing all the tests.

I mention it a couple of times in the code review comments, but more generally I have a question about whether the tested Git repos' activity has to be included in the main repo? What I mean is that I see various diffs in this PR referring to Git hooks, Git logs, etc. Can those changes not be made in the moment of the test? Allowing it all to be Git ignored?

src/stelae/archive.rs Outdated Show resolved Hide resolved
src/stelae/stele.rs Outdated Show resolved Hide resolved
src/stelae/stele.rs Outdated Show resolved Hide resolved
src/stelae/types/repositories.rs Show resolved Hide resolved
src/utils/archive.rs Show resolved Hide resolved
tests/archive_testtools/config.rs Outdated Show resolved Hide resolved
/// Information about a data repository.
///
/// This struct is used to initialize a data repository in the test suite.
pub struct TestDataRepositoryContext<'repo> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit surprised that lifetimes would be needed in tests. Though I wouldn't worry too much, if it's the easiest way to solve something then it's not a big deal, the tests hopefully don't dictate the design of the actual code.

Lifetimes can get really complicated, I'm just curious if they're really needed in test code. Is there some other "hack" to stop having to think about lifetimes in tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. We don't need to optimize testing framework by using string slices instead of owned Strings. Refactored and removed lifetimes

/// The name of the data repository.
pub name: &'repo str,
/// The paths of the data repository.
pub paths: Vec<Cow<'static, str>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think paths should be Vec<PathBuf> to match the type of Stele.archive_path. Which makes me wonder if other types in this struct need updating to match the application's underlying type.

What tips me off is that it should be rare that tests need special treatment like lifetimes and Cow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to Vec<PathBuf>, removed lifetimes and Cow

@@ -0,0 +1,174 @@
#!/usr/bin/perl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine that none of these Git hooks are ever used? I wonder if they could just be deleted? Or ignored?

@@ -0,0 +1,10 @@
0000000000000000000000000000000000000000 64215d0a7b9398be24035d95d574f329fd3140f5 n-dusan <nikolic.dusan.dey@gmail.com> 1686250251 +0200 commit (initial): Initial commit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can such Git activity be Git-ignored?

Copy link
Contributor Author

@n-dusan n-dusan Jan 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely. Added logs and hooks to gitignore

@n-dusan n-dusan force-pushed the ndusan/serve-current-documents branch from ad826db to 88dc3c8 Compare January 26, 2024 14:43
@n-dusan n-dusan requested a review from tombh January 26, 2024 15:19
@n-dusan
Copy link
Contributor Author

n-dusan commented Jan 26, 2024

Thanks, @tombh! I agree, definitely didn't need to have those Git hooks and logs, etc. I believe I've addressed all comments, so re-requested review.

/// `repositories.json` is expected to exist in /targets/repositories.json in the authentication repository.
/// # Examples
///
/// ```rust
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 🤓

"./a/b/c.html",
"./a/b/c/index.html",
];
paths.iter().map(|&x| PathBuf::from(x)).collect()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh yes, nice!

@n-dusan n-dusan merged commit 9aaed03 into main Jan 29, 2024
4 checks passed
@n-dusan n-dusan deleted the ndusan/serve-current-documents branch January 29, 2024 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for multihost Repository testing framework Serve current documents
3 participants