Conversation
Pull request overview
This pull request adds a Bilibili API reverse proxy service (biliproxy) to the existing IPv6 proxy pool project, along with code modernization to use inline string formatting throughout the codebase.
Key Changes:
- Adds a new `biliproxy` module that provides a reverse proxy for Bilibili APIs with WBI signature support, CORS headers, and Prometheus metrics
- Updates all string formatting from `format!("text {}", var)` to the modern inline syntax `format!("text {var}")`
- Adds optional CLI parameters for biliproxy configuration (`--biliproxy` and `--sessdata`)
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| src/biliproxy.rs | New module implementing Bilibili API reverse proxy with WBI signing, cover image proxying, external URL forwarding, scanner protection, and Prometheus metrics |
| src/main.rs | Integrates biliproxy service into main server startup, adds CLI parameters for biliproxy address and SESSDATA cookie |
| src/proxy.rs | Updates println!/eprintln! calls to use inline string formatting syntax |
| src/controller.rs | Updates println!/eprintln! calls to use inline string formatting syntax |
| Cargo.toml | Adds dependencies for biliproxy: reqwest, md-5, chrono, parking_lot, url, regex, lazy_static, hex, urlencoding, and prometheus; enables "env" feature for clap |
| Cargo.lock | Lock file updates for all new dependencies and their transitive dependencies |
src/biliproxy.rs
Outdated
```rust
Regex::new(r"/(test|tests)").unwrap(),
Regex::new(r"/\.well-known").unwrap(),
Regex::new(r"\.(php|jsp|asp|py)$").unwrap(),
Regex::new(r"/robots\.txt$").unwrap(),
```
The blocking pattern for robots.txt at line 66 conflicts with the explicit handler at line 736 that serves a robots.txt response. This means legitimate robots.txt requests will be blocked before reaching the handler, making the handler at line 736 unreachable. Either remove this pattern from the blocklist or remove the robots.txt handler.
Suggested change:

```diff
-Regex::new(r"/robots\.txt$").unwrap(),
```
src/biliproxy.rs
Outdated
```rust
let mut builder =
    Response::builder().status(StatusCode::from_u16(status.as_u16()).unwrap());
```
Potential panic: StatusCode::from_u16 can fail if the status code is invalid (not in range 100-599). While upstream responses are typically valid, it's safer to handle this case explicitly rather than panicking. Consider using a default status code like 500 or properly propagating the error.
Suggested change:

```diff
-let mut builder =
-    Response::builder().status(StatusCode::from_u16(status.as_u16()).unwrap());
+let mut builder = Response::builder().status(
+    StatusCode::from_u16(status.as_u16()).unwrap_or(StatusCode::INTERNAL_SERVER_ERROR),
+);
```
src/biliproxy.rs
Outdated
```rust
    "/external"
} else if path.starts_with("/x/") {
    // Keep first two segments for Bilibili API routes
    path.split('/').take(3).collect::<Vec<_>>().join("/").leak()
```
Memory leak: calling .leak() on a dynamically allocated string will cause a memory leak on every request to paths starting with "/x/". The leaked memory is never freed. Consider using a static cache or returning a Cow<'static, str> to avoid allocating on every call. For example, you could use a lazy_static HashMap to cache normalized routes, or return the owned String without leaking it.
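As a concrete illustration of the `Cow<'static, str>` alternative, a sketch (the function name `route_label` and the `/other` fallback are hypothetical, not from the PR):

```rust
use std::borrow::Cow;

// Sketch: dynamic "/x/..." paths return an owned String that is freed
// together with its metric label, while fixed routes stay borrowed —
// no `.leak()` and no unbounded memory growth per unique path.
fn route_label(path: &str) -> Cow<'static, str> {
    if path.starts_with("/x/") {
        // Keep the first two segments ("/x/<segment>") as the label.
        Cow::Owned(path.split('/').take(3).collect::<Vec<_>>().join("/"))
    } else {
        Cow::Borrowed("/other")
    }
}
```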
src/biliproxy.rs
Outdated
```rust
for pair in q.split('&') {
    if let Some((key, value)) = pair.split_once('=') {
        let decoded_key = urlencoding::decode(key).unwrap_or_else(|_| key.into());
        let decoded_value = urlencoding::decode(value).unwrap_or_else(|_| value.into());
        params.insert(decoded_key.into_owned(), decoded_value.into_owned());
```
Query parameter pairs without '=' are silently ignored: when a pair doesn't contain '=', it's skipped entirely. According to URL conventions, a query parameter without a value (e.g., "?flag") should be treated as having an empty string value. Consider handling this case: if there's no '=', treat the entire pair as a key with an empty value.
Suggested change:

```diff
-for pair in q.split('&') {
-    if let Some((key, value)) = pair.split_once('=') {
-        let decoded_key = urlencoding::decode(key).unwrap_or_else(|_| key.into());
-        let decoded_value = urlencoding::decode(value).unwrap_or_else(|_| value.into());
-        params.insert(decoded_key.into_owned(), decoded_value.into_owned());
+for pair in q.split('&') {
+    if pair.is_empty() {
+        continue;
+    }
+    if let Some((key, value)) = pair.split_once('=') {
+        let decoded_key = urlencoding::decode(key).unwrap_or_else(|_| key.into());
+        let decoded_value = urlencoding::decode(value).unwrap_or_else(|_| value.into());
+        params.insert(decoded_key.into_owned(), decoded_value.into_owned());
+    } else {
+        let key = pair;
+        let decoded_key = urlencoding::decode(key).unwrap_or_else(|_| key.into());
+        params.insert(decoded_key.into_owned(), String::new());
```
src/biliproxy.rs
Outdated
```rust
// External URL proxy
if path_without_slash.starts_with("http://") || path_without_slash.starts_with("https://") {
    return (path_without_slash.to_string(), false);
```
Security concern: The proxy allows unrestricted external URL forwarding. Any caller can use this proxy to make requests to arbitrary URLs by prepending them with a slash (e.g., "/https://internal-server/admin"). This can be abused for Server-Side Request Forgery (SSRF) attacks, allowing attackers to access internal resources, scan internal networks, or bypass firewalls. Consider restricting allowed target domains to a whitelist, or removing this feature entirely if not needed.
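A minimal host-allowlist sketch along these lines (the domain list and the `is_allowed_target` helper are hypothetical; a real implementation should use a proper URL parser such as the `url` crate rather than this naive scheme/host split):

```rust
// Hypothetical allowlist of upstream hosts this proxy may forward to.
const ALLOWED_HOSTS: &[&str] = &["api.bilibili.com", "i0.hdslb.com", "i1.hdslb.com"];

fn is_allowed_target(url: &str) -> bool {
    // Strip the scheme, then take everything up to the first '/', '?' or '#'.
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"));
    let Some(rest) = rest else { return false };
    let host = rest
        .split(|c| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    // Reject userinfo and explicit ports to keep the sketch conservative.
    if host.contains('@') || host.contains(':') {
        return false;
    }
    ALLOWED_HOSTS.contains(&host)
}
```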
src/biliproxy.rs
Outdated
```rust
async fn get_wbi_keys(&self) -> Result<(String, String), String> {
    // Check cache first
    {
        let cache = self.wbi_keys.read();
        if let Some(ref keys) = *cache {
            if Instant::now() < keys.expires_at {
                return Ok((keys.img_key.clone(), keys.sub_key.clone()));
            }
        }
    }

    // Fetch new keys
    let new_keys = self.fetch_wbi_keys().await?;
```
Potential race condition in WBI key caching: If multiple requests arrive concurrently when the cache is expired, they will all pass the cache check at line 260 and proceed to fetch new keys at line 267. This causes unnecessary duplicate API calls to Bilibili's /nav endpoint. Consider using a lock or atomic flag to ensure only one request fetches new keys while others wait for the result.
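One way to sketch the suggested single-flight refresh is a second lock plus a double-check after acquiring it (synchronous std types here for brevity; the actual module is async and uses `parking_lot`, so a `tokio::sync::Mutex` would be the closer fit — `KeyCache` and `CachedKeys` are illustrative names):

```rust
use std::sync::{Mutex, RwLock};
use std::time::{Duration, Instant};

struct CachedKeys {
    img_key: String,
    sub_key: String,
    expires_at: Instant,
}

struct KeyCache {
    keys: RwLock<Option<CachedKeys>>,
    refresh_lock: Mutex<()>, // serializes refreshes
}

impl KeyCache {
    // Fast path reads under the RwLock; on a miss, callers race for
    // `refresh_lock`, and each re-checks the cache after acquiring it,
    // so only the first caller actually performs the fetch.
    fn get_or_refresh(&self, fetch: impl Fn() -> (String, String)) -> (String, String) {
        if let Some(k) = self.keys.read().unwrap().as_ref() {
            if Instant::now() < k.expires_at {
                return (k.img_key.clone(), k.sub_key.clone());
            }
        }
        let _guard = self.refresh_lock.lock().unwrap();
        // Re-check: another caller may have refreshed while we waited.
        if let Some(k) = self.keys.read().unwrap().as_ref() {
            if Instant::now() < k.expires_at {
                return (k.img_key.clone(), k.sub_key.clone());
            }
        }
        let (img, sub) = fetch();
        *self.keys.write().unwrap() = Some(CachedKeys {
            img_key: img.clone(),
            sub_key: sub.clone(),
            expires_at: Instant::now() + Duration::from_secs(8 * 3600),
        });
        (img, sub)
    }
}
```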
src/biliproxy.rs
Outdated
```rust
(Method::GET, "/debug/wbi-keys") => match state.get_wbi_keys_info().await {
    Ok(info) => Ok(json_response(StatusCode::OK, &info)),
    Err(e) => Ok(json_response(
        StatusCode::INTERNAL_SERVER_ERROR,
        &ErrorResponse {
            error: e,
            message: None,
        },
    )),
```
Security concern: The /debug/wbi-keys endpoint exposes internal WBI signing keys without any authentication. While these keys expire after 8 hours and are obtained from Bilibili's public API, exposing them could help attackers understand or abuse the signing mechanism. Consider adding authentication to this debug endpoint or removing it in production builds.
Suggested change:

```diff
-(Method::GET, "/debug/wbi-keys") => match state.get_wbi_keys_info().await {
-    Ok(info) => Ok(json_response(StatusCode::OK, &info)),
-    Err(e) => Ok(json_response(
-        StatusCode::INTERNAL_SERVER_ERROR,
-        &ErrorResponse {
-            error: e,
-            message: None,
-        },
-    )),
+(Method::GET, "/debug/wbi-keys") => {
+    if cfg!(debug_assertions) {
+        match state.get_wbi_keys_info().await {
+            Ok(info) => Ok(json_response(StatusCode::OK, &info)),
+            Err(e) => Ok(json_response(
+                StatusCode::INTERNAL_SERVER_ERROR,
+                &ErrorResponse {
+                    error: e,
+                    message: None,
+                },
+            )),
+        }
+    } else {
+        Ok(json_response(
+            StatusCode::NOT_FOUND,
+            &ErrorResponse {
+                error: "Not Found".to_string(),
+                message: Some("This debug endpoint is not available in production builds".to_string()),
+            },
+        ))
+    }
```
Force-pushed from fe45c9f to a29a579.
Pull request overview
Copilot reviewed 7 out of 8 changed files in this pull request and generated 9 comments.
```yaml
- name: Install cross (Linux ARM64 musl)
  if: matrix.target == 'aarch64-unknown-linux-musl'
  run: cargo install cross --git https://github.com/cross-rs/cross
```
Installing cross from a git URL without pinning a specific version or commit (line 115) makes the build non-reproducible and potentially unstable. If the cross repository changes or becomes unavailable, the build will fail.
Consider either using a specific version from crates.io or pinning to a specific git commit hash.
Suggested change:

```diff
-run: cargo install cross --git https://github.com/cross-rs/cross
+run: cargo install cross --version 0.2.5
```
src/biliproxy.rs
Outdated
```rust
let keys = cache.as_ref().unwrap();

let now = Instant::now();
let expires_in = if keys.expires_at > now {
    (keys.expires_at - now).as_secs() as i64
```
There's a potential panic on line 396. The function calls get_wbi_keys() on line 394 which ensures the cache is populated, but then uses unwrap() to access it. However, there's a race condition: between the cache update in get_wbi_keys() (which releases the write lock) and acquiring the read lock on line 395, another thread could theoretically clear the cache.
A safer approach would be to return the WbiKeys directly from get_wbi_keys() or use expect() with a message explaining why this should be safe, or better yet, handle the None case explicitly even if it's theoretically impossible.
Suggested change:

```diff
-let keys = cache.as_ref().unwrap();
-let now = Instant::now();
-let expires_in = if keys.expires_at > now {
-    (keys.expires_at - now).as_secs() as i64
+let now = Instant::now();
+let expires_in = if let Some(ref keys) = cache.as_ref() {
+    if keys.expires_at > now {
+        (keys.expires_at - now).as_secs() as i64
+    } else {
+        0
+    }
```
src/biliproxy.rs
Outdated
```rust
// Keep first two segments for Bilibili API routes
path.split('/').take(3).collect::<Vec<_>>().join("/").leak()
```
Line 741 contains a memory leak. The .leak() method intentionally leaks memory by converting the String into a &'static str. This will accumulate leaked memory for every unique path starting with "/x/" that has more than 3 segments.
Since this is in a hot path (called for every request), this could lead to unbounded memory growth over time if the application receives many different API paths.
Consider using a different approach, such as:
- Pre-allocating a string and reusing it
- Returning a Cow<'static, str> and using owned strings for dynamic routes
- Using a bounded cache with an LRU eviction policy
- Simply returning a generic label like "/x" for all such paths
Suggested change:

```diff
-// Keep first two segments for Bilibili API routes
-path.split('/').take(3).collect::<Vec<_>>().join("/").leak()
+// Use a generic label for Bilibili API routes to avoid high cardinality and leaks
+"/x"
```
src/biliproxy.rs
Outdated
```rust
let (client, user_agent, _) = self.ipv6_pool.get_random_client();
let response = client
    .get(&target_url)
    .header("User-Agent", user_agent)
    .header("Referer", "https://www.bilibili.com/")
    .header(
        "Cookie",
        format!("DedeUserID={dede_user_id}; DedeUserID__ckMd5={dede_ck_md5}"),
    )
    .send()
    .await
    .map_err(|e| format!("Cover proxy error: {e}"))?;

let status = response.status();
let headers = response.headers().clone();
let body = response
    .bytes()
    .await
    .map_err(|e| format!("Failed to read cover body: {e}"))?;

println!("Cover: {} - {}", status.as_u16(), filename);

let mut builder =
    Response::builder().status(StatusCode::from_u16(status.as_u16()).unwrap());

// Copy relevant headers
for (key, value) in headers.iter() {
    let key_str = key.as_str().to_lowercase();
    if !["connection", "transfer-encoding", "content-encoding"].contains(&key_str.as_str())
    {
        if let Ok(hv) = HeaderValue::from_bytes(value.as_bytes()) {
            builder = builder.header(key.clone(), hv);
        }
    }
}

// Add CORS headers
builder = builder
    .header(ACCESS_CONTROL_ALLOW_ORIGIN, "*")
    .header(ACCESS_CONTROL_ALLOW_METHODS, "GET, OPTIONS");

self.ipv6_pool.maybe_rotate();
builder
    .body(full(body))
    .map_err(|e| format!("Failed to build cover response: {e}"))
```
The proxy_cover method doesn't have any retry logic, unlike proxy_request which has comprehensive retry handling with up to 5 attempts. If a cover image request fails due to a transient network issue or rate limiting, it will immediately fail.
Consider adding similar retry logic to proxy_cover for consistency and improved reliability, or document why cover requests are handled differently.
Suggested change:

```diff
-let (client, user_agent, _) = self.ipv6_pool.get_random_client();
-let response = client
-    .get(&target_url)
-    .header("User-Agent", user_agent)
-    .header("Referer", "https://www.bilibili.com/")
-    .header(
-        "Cookie",
-        format!("DedeUserID={dede_user_id}; DedeUserID__ckMd5={dede_ck_md5}"),
-    )
-    .send()
-    .await
-    .map_err(|e| format!("Cover proxy error: {e}"))?;
-let status = response.status();
-let headers = response.headers().clone();
-let body = response
-    .bytes()
-    .await
-    .map_err(|e| format!("Failed to read cover body: {e}"))?;
-println!("Cover: {} - {}", status.as_u16(), filename);
-let mut builder =
-    Response::builder().status(StatusCode::from_u16(status.as_u16()).unwrap());
-// Copy relevant headers
-for (key, value) in headers.iter() {
-    let key_str = key.as_str().to_lowercase();
-    if !["connection", "transfer-encoding", "content-encoding"].contains(&key_str.as_str())
-    {
-        if let Ok(hv) = HeaderValue::from_bytes(value.as_bytes()) {
-            builder = builder.header(key.clone(), hv);
-        }
-    }
-}
-// Add CORS headers
-builder = builder
-    .header(ACCESS_CONTROL_ALLOW_ORIGIN, "*")
-    .header(ACCESS_CONTROL_ALLOW_METHODS, "GET, OPTIONS");
-self.ipv6_pool.maybe_rotate();
-builder
-    .body(full(body))
-    .map_err(|e| format!("Failed to build cover response: {e}"))
+let mut last_err: Option<String> = None;
+for attempt in 1..=5 {
+    let (client, user_agent, _) = self.ipv6_pool.get_random_client();
+    let response = match client
+        .get(&target_url)
+        .header("User-Agent", user_agent)
+        .header("Referer", "https://www.bilibili.com/")
+        .header(
+            "Cookie",
+            format!("DedeUserID={dede_user_id}; DedeUserID__ckMd5={dede_ck_md5}"),
+        )
+        .send()
+        .await
+    {
+        Ok(resp) => resp,
+        Err(e) => {
+            last_err = Some(format!("Cover proxy error on attempt {attempt}/5: {e}"));
+            continue;
+        }
+    };
+    let status = response.status();
+    let headers = response.headers().clone();
+    let body = match response.bytes().await {
+        Ok(b) => b,
+        Err(e) => {
+            last_err = Some(format!("Failed to read cover body on attempt {attempt}/5: {e}"));
+            continue;
+        }
+    };
+    println!("Cover: {} - {}", status.as_u16(), filename);
+    let mut builder =
+        Response::builder().status(StatusCode::from_u16(status.as_u16()).unwrap());
+    // Copy relevant headers
+    for (key, value) in headers.iter() {
+        let key_str = key.as_str().to_lowercase();
+        if !["connection", "transfer-encoding", "content-encoding"]
+            .contains(&key_str.as_str())
+        {
+            if let Ok(hv) = HeaderValue::from_bytes(value.as_bytes()) {
+                builder = builder.header(key.clone(), hv);
+            }
+        }
+    }
+    // Add CORS headers
+    builder = builder
+        .header(ACCESS_CONTROL_ALLOW_ORIGIN, "*")
+        .header(ACCESS_CONTROL_ALLOW_METHODS, "GET, OPTIONS");
+    self.ipv6_pool.maybe_rotate();
+    return builder
+        .body(full(body))
+        .map_err(|e| format!("Failed to build cover response: {e}"));
+}
+// All attempts failed; rotate and return the last error
+self.ipv6_pool.maybe_rotate();
+Err(last_err.unwrap_or_else(|| "Cover proxy error: all attempts failed".to_string()))
```
src/biliproxy.rs
Outdated
```rust
// External URL proxy
if path_without_slash.starts_with("http://") || path_without_slash.starts_with("https://") {
    return (path_without_slash.to_string(), false);
}
```
The determine_target function allows proxying to arbitrary external URLs (lines 575-578). This is a significant security concern as it could be exploited for:
- Server-Side Request Forgery (SSRF) attacks - attackers could make the server send requests to internal services
- Bypassing network restrictions
- Port scanning of internal networks
- Accessing metadata endpoints in cloud environments (e.g., AWS metadata service at 169.254.169.254)
Consider either:
- Removing this feature if it's not essential
- Adding a whitelist of allowed domains
- Blocking requests to private IP ranges (RFC 1918, link-local, etc.)
- Adding authentication/authorization for this feature
- At minimum, documenting this as a security consideration in the deployment guide
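A sketch of the private-range check for literal-IP targets (the `is_private_target` helper is hypothetical; hostname targets would additionally need resolution-time checks to defend against DNS rebinding, which this sketch does not cover):

```rust
use std::net::IpAddr;

// Reject literal-IP targets in private, link-local, loopback, or
// unspecified ranges (RFC 1918, RFC 3927) before proxying, which
// covers cases like the cloud metadata address 169.254.169.254.
fn is_private_target(host: &str) -> bool {
    match host.parse::<IpAddr>() {
        Ok(IpAddr::V4(ip)) => {
            ip.is_private() || ip.is_loopback() || ip.is_link_local() || ip.is_unspecified()
        }
        Ok(IpAddr::V6(ip)) => ip.is_loopback() || ip.is_unspecified(),
        // Not a literal IP; a full implementation must resolve and re-check.
        Err(_) => false,
    }
}
```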
src/biliproxy.rs
Outdated
```rust
(Method::GET, "/debug/wbi-keys") => match state.get_wbi_keys_info().await {
    Ok(info) => json_response(StatusCode::OK, &info),
    Err(e) => json_response(
        StatusCode::INTERNAL_SERVER_ERROR,
        &ErrorResponse {
            error: e,
            message: None,
        },
    ),
},
```
The /debug/wbi-keys endpoint (line 840) exposes potentially sensitive WBI signing keys without any authentication. While these keys may be rotated periodically, exposing them could allow attackers to:
- Understand the signing mechanism
- Potentially craft their own signed requests
- Gain insights into rate limiting or other protective measures
Consider either:
- Removing this debug endpoint in production builds
- Adding authentication (e.g., API key, Basic Auth)
- Restricting access by IP address
- At minimum, clearly documenting this as a security consideration
Suggested change:

```diff
-(Method::GET, "/debug/wbi-keys") => match state.get_wbi_keys_info().await {
-    Ok(info) => json_response(StatusCode::OK, &info),
-    Err(e) => json_response(
-        StatusCode::INTERNAL_SERVER_ERROR,
-        &ErrorResponse {
-            error: e,
-            message: None,
-        },
-    ),
-},
+// Expose WBI keys only in debug builds to avoid leaking signing material in production.
+(Method::GET, "/debug/wbi-keys") if cfg!(debug_assertions) => {
+    match state.get_wbi_keys_info().await {
+        Ok(info) => json_response(StatusCode::OK, &info),
+        Err(e) => json_response(
+            StatusCode::INTERNAL_SERVER_ERROR,
+            &ErrorResponse {
+                error: e,
+                message: None,
+            },
+        ),
+    }
+}
+// In non-debug builds, do not expose WBI keys; return a generic 404 response instead.
+(Method::GET, "/debug/wbi-keys") => json_response(
+    StatusCode::NOT_FOUND,
+    &ErrorResponse {
+        error: "Not found".to_string(),
+        message: None,
+    },
+),
```
```toml
hex = "0.4"
urlencoding = "2"
prometheus = "0.14"
ua_generator = { git = "https://github.com/spider-rs/ua_generator.git" }
```
The dependency on ua_generator is using a git repository URL instead of a version from crates.io. This creates several issues:
- The build is not reproducible - different commits could be fetched at different times
- The specific commit is pinned in Cargo.lock (7e18e448), but this is fragile
- If the repository becomes unavailable, builds will fail
- It's harder to audit security issues
Consider either using a published version from crates.io or forking the repository into your organization if you need specific functionality not available in published versions.
Suggested change:

```diff
-ua_generator = { git = "https://github.com/spider-rs/ua_generator.git" }
+ua_generator = "0.4"
```
src/biliproxy.rs
Outdated
```rust
fn get_client_by_index(&self, index: usize) -> (reqwest::Client, String) {
    let clients = self.clients.read();
    (clients[index].0.clone(), clients[index].1.clone())
}
```
The get_client_by_index method doesn't perform bounds checking. If an invalid index is passed (e.g., from stale state or concurrent modifications), this will panic. Consider either:
- Returning an `Option` or `Result` to handle out-of-bounds gracefully
- Using modulo arithmetic to ensure the index is always valid (e.g., `index % clients.len()`)
- Adding an assertion with a clear error message

This is especially important since the index is stored and passed around in the retry logic (lines 521-544 in proxy_request).
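The modulo option could look like this (simplified to `(String, String)` pairs instead of the real `(reqwest::Client, String)` tuples; `pick_client` is an illustrative name, and the empty-pool case still needs an explicit check since `index % 0` would itself panic):

```rust
// Bounds-safe lookup: wrap the index with modulo so any stored index
// stays valid even if the pool shrank, and return None for an empty pool.
fn pick_client(clients: &[(String, String)], index: usize) -> Option<(String, String)> {
    if clients.is_empty() {
        return None;
    }
    let (client, ua) = &clients[index % clients.len()];
    Some((client.clone(), ua.clone()))
}
```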
src/main.rs
Outdated
```rust
/// Bilibili SESSDATA cookie for authenticated requests (optional)
#[arg(long = "sessdata", env = "BILIBILI_SESSDATA")]
sessdata: Option<String>,
```
The SESSDATA cookie value (line 41) is marked as optional and can be provided via environment variable. However, there's no validation of this value. If an attacker can control the environment (e.g., in a shared hosting scenario), they could potentially:
- Extract the SESSDATA from environment variables
- Use it to authenticate as the service account
Consider:
- Adding a warning in documentation that SESSDATA should be protected
- Clearing the environment variable after reading it
- Validating the format of SESSDATA
- Using a secrets management system instead of environment variables in production
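A loose format check along these lines could reject obviously malformed values at startup (the exact SESSDATA format is not publicly specified, so the character rules here are an assumption; `looks_like_sessdata` is a hypothetical helper):

```rust
// Loose sanity check for a SESSDATA value read from the environment.
// Rejects only clearly invalid input: empty strings, overlong values,
// and characters never valid inside a single cookie value
// (whitespace, ';', ',', '"').
fn looks_like_sessdata(value: &str) -> bool {
    !value.is_empty()
        && value.len() <= 512
        && value
            .chars()
            .all(|c| c.is_ascii_graphic() && c != ';' && c != ',' && c != '"')
}
```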
This reverts commit 1ec5e49.
No description provided.