2 changes: 1 addition & 1 deletion install/ci-vm/installation.md
@@ -36,7 +36,7 @@ Now, create a service account with sufficient permissions (at least "Google Batc

If you are not the owner of the GCP project you are working on, make sure you have sufficient permissions for creating and managing service accounts; if not, ask the project owner to grant them.

- Create a service account [here](https://cloud.google.com/storage/docs/creating-buckets)
- Create a service account [here](https://cloud.google.com/iam/docs/service-accounts-create)
- Choose any name for the service account, but grant it at least the "Google Batch Service Agent" role.

You might also want to understand roles in GCP; the official documentation is available [here](https://cloud.google.com/iam/docs/understanding-roles).
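
For reference, the same account can also be created from the command line. A minimal sketch using the gcloud CLI, where `my-project` and `platform-sa` are placeholder names and the exact role ID should be verified against the roles documentation above:

```sh
# Create the service account (the name is an example)
gcloud iam service-accounts create platform-sa \
    --display-name="Sample Platform service account"

# Grant the Batch role on the project (verify the exact role ID)
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:platform-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/batch.serviceAgent"
```
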
59 changes: 37 additions & 22 deletions install/installation.md
@@ -5,7 +5,7 @@
* Nginx (others are possible when modifying the sample download section)
* Python 3 (Flask and other dependencies)
* MySQL
* Pure-FTPD with mysql
* Pure-FTPD with MySQL (optional, only needed for FTP file uploads)

## Configuring Google Cloud Platform

@@ -40,9 +40,9 @@ For deployment of the platform on a Google Cloud VM instance, one would require
Windows Server 2019 Datacenter
- Boot disk type: Balanced persistent disk
- Size: 50GB
- Choose the service account as the service account you just created for the platform.
- Select the "Allow HTTP traffic" and "Allow HTTPS traffic" checkboxes.
- Navigate to Advanced options -> Networking -> Network Interfaces -> External IPv4 address, and click on Create IP Address and reserve a new static external IP address for the platform.
- Navigate to Security and choose the service account as the service account you just created for the platform.
- Navigate to Network and select the "Allow HTTP traffic" and "Allow HTTPS traffic" checkboxes.
- Under Network Interfaces -> default, reserve a new static external IPv4 address for the platform.
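
If you prefer the gcloud CLI over the Cloud Console, the same VM can be sketched roughly as below; the instance name, zone, machine type, and reserved address are assumptions to adapt to your project:

```sh
# Sketch only: adapt the name, zone, machine type, image and address to your setup
gcloud compute instances create sample-platform \
    --zone=us-central1-a \
    --machine-type=e2-medium \
    --boot-disk-type=pd-balanced \
    --boot-disk-size=50GB \
    --service-account=platform-sa@my-project.iam.gserviceaccount.com \
    --scopes=cloud-platform \
    --tags=http-server,https-server \
    --address=RESERVED_STATIC_IP
```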

2. Setting up firewall settings

@@ -67,26 +67,27 @@ Windows Server 2019 Datacenter
Clone the latest sample-platform repository from
https://github.com/CCExtractor/sample-platform.
Note that root (or sudo) is required for both installation and running the program.
The `sample-repository` directory needs to be accessible by `www-data`. The
The `sample-platform` directory needs to be accessible by `www-data`; the
recommended directory is thus `/var/www/`.

```
cd /var/www/
sudo git clone https://github.com/CCExtractor/sample-platform.git
```
Place the service account key file that you generated earlier at the root of the sample-platform folder.
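
For example, assuming you downloaded the key as `service-account-key.json` (the file name is an assumption; use the name of your key):

```sh
sudo cp ~/service-account-key.json /var/www/sample-platform/
```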

### Mounting the bucket
#### Mounting the bucket

Mounting on Linux OS can be done using [Google Cloud Storage FUSE](https://cloud.google.com/storage/docs/gcs-fuse).

Steps:
- Install gcsfuse using [official documentation](https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/docs/installing.md) or using the following script
- Install gcsfuse following the [official documentation](https://cloud.google.com/storage/docs/cloud-storage-fuse/install) or using the following script:
```
curl -L -O https://github.com/GoogleCloudPlatform/gcsfuse/releases/download/v0.39.2/gcsfuse_0.39.2_amd64.deb
sudo dpkg --install gcsfuse_0.39.2_amd64.deb
rm gcsfuse_0.39.2_amd64.deb
```
- Now, there are multiple ways to mount the bucket, official documentation [here](https://github.com/GoogleCloudPlatform/gcsfuse/blob/master/docs/mounting.md).
- Now, there are multiple ways to mount the bucket; see the official documentation [here](https://cloud.google.com/storage/docs/cloud-storage-fuse/mount-bucket).

For Ubuntu and derivatives, assuming `/repository` to be the location of samples to be configured, an entry can be added to the `/etc/fstab` file; replace _GCS_BUCKET_NAME_ with the name of the bucket created for the platform:
```
@@ -99,32 +100,33 @@ Steps:
sudo mount /repository
```

You may check if the mount was successful and if the bucket is accessible by running `ls /repository` command.
Note that **this directory needs to be accessible by the `www-data` user**; you can verify that the mount was successful by running `sudo -u www-data ls /repository`.
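
For illustration, a gcsfuse entry in `/etc/fstab` typically has the following shape; the bucket name, key file path and options here are assumptions (uid/gid 33 corresponds to `www-data` on Debian/Ubuntu):

```sh
# /etc/fstab sketch: mount the bucket at /repository via gcsfuse
GCS_BUCKET_NAME /repository gcsfuse rw,_netdev,allow_other,uid=33,gid=33,key_file=/var/www/sample-platform/service-account-key.json
```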

#### Troubleshooting: Mounting of Bucket

If you get a "permission denied" error for `/repository`, check the following:
1. Check that the service account you created has access to the GCS bucket.
2. Check the output of the `sudo mount /repository` command.
3. Check the directory permissions for `/repository`.
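
Concretely, these checks might look like:

```sh
# 1. Can the bucket be listed with your credentials?
gsutil ls gs://GCS_BUCKET_NAME

# 2. Does mounting report an error?
sudo mount /repository

# 3. Are the directory permissions readable by www-data?
ls -ld /repository
sudo -u www-data ls /repository
```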

Place the service account key file at the root of the sample-platform folder.

#### MySQL installation
The platform has been tested with MySQL v8.0 and Python 3.7 to 3.9.

It is recommended to install Python and MySQL beforehand to avoid any inconvenience. Here is the [installation guide](https://www.digitalocean.com/community/tutorials/how-to-install-mysql-on-ubuntu-22-04) for MySQL on Ubuntu 22.04.
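
A minimal sketch for Ubuntu (package names assumed; the linked guide covers secure configuration):

```sh
sudo apt update
sudo apt install -y mysql-server python3 python3-pip
```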

Next, navigate to the `install` folder and run `install.sh` with root
#### Installing The Platform
Next, navigate to the `install` folder and run `install.sh` with root
permissions.

```
cd sample-platform/install/
sudo ./install.sh
```

The `install.sh` script will download and update all the necessary dependencies. Once done, it will ask you to enter some details to set up the sample-platform. After filling in these details, the platform should be ready for use.

Please read the troubleshooting notes below in case of any error or doubt.
When the domain is asked during installation, enter the domain name that will run the platform. E.g., if the platform will run locally, enter `localhost` as the server name.

### Windows

@@ -156,9 +158,6 @@ or platform configuration.**
it's **recommended** to use a valid certificate.
[Let's Encrypt](https://letsencrypt.org/) offers free certificates. For local
testing, a self-signed certificate can be enough.
* When the server name is asked during installation, enter the domain name
that will run the platform. E.g., if the platform will run locally, enter
`localhost` as the server name.
* In case of a `502 Bad Gateway` response, the platform didn't start
correctly. Manually running `bootstrap_gunicorn.py` (as root!) can help to
determine what goes wrong. The snippet below shows how this can be done:
@@ -189,19 +188,27 @@ After the completion of the automated installation of the platform, the followin
- `TestResults/` - Directory containing regression test results
- `vm_data/` - Directory containing test-specific subfolders; each folder contains the files passed to the VM instance for testing, the test files and the CCExtractor build artefact.

Now for tests to run, we need to download the [CCExtractor testsuite](https://github.com/CCExtractor/ccx_testsuite) release file, extract and put it in `TestData/ci-linux` and `TestData/ci-windows` folders.
Now, for tests to run, we need to download the [CCExtractor testsuite](https://github.com/CCExtractor/ccx_testsuite) release file, extract it, and put it in the `TestData/ci-linux` and `TestData/ci-windows` folders.

## GCS configuration to serve file downloads using Signed URLs
You also need to create a shell script named `ccextractortester` and place it in both `ci-linux` and `ci-windows`. This script is meant to launch the testsuite binary; here is what it should look like for Linux:
```sh
#!/bin/bash
exec mono CCExtractorTester.exe "$@"
```
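
The script must be executable; assuming the platform lives in `/var/www/sample-platform`, something like:

```sh
sudo chmod +x /var/www/sample-platform/TestData/ci-linux/ccextractortester
```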

To serve file downloads directly from the private GCS bucket, Signed download URLs have been used.
## Setting up GitHub webhooks

The `serve_file_download` function in the `utility.py` file implements the generation of a signed URL for the file to be downloaded that would expire after a configured time limit (maximum limit: 7 days) and redirects the client to the URL.
Now that the server is running, you can queue new tests either manually (via `/custom/`) or automatically through GitHub webhooks.

For more information about Signed URLs, you can refer to the [official documentation](https://cloud.google.com/storage/docs/access-control/signed-urls).
To queue a test whenever a new commit/PR is made, you need to create a GitHub [webhook](https://docs.github.com/en/webhooks/about-webhooks) on the ccextractor repository (or a fork of it):
- Set the payload URL to `https://<your_domain>/start-ci`
- Set the content type to JSON.
- Enter the same secret that you used during installation (`GITHUB_CI_KEY`).
- Select the Push, PR and Issue events as triggers.
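
One way to generate a strong value for this secret, if you have not set one yet:

```sh
openssl rand -hex 20
```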

## Setting up a cron job to run tests

Now the server being running, new tests would be queued and therefore a cron job is to be setup to run those tests.
To run the new tests that are being queued up, a cron job is required.
The file `mod_ci/cron.py` needs to be run at periodic intervals. To set up a cron job, follow the steps below:
1. Open your terminal and enter the command `sudo crontab -e`.
2. To set up a cron job that runs this file every 10 minutes, append this at the bottom of the file:
@@ -210,6 +217,14 @@
```
Change the `/var/www/sample-platform` directory if you have installed the platform in a different directory.
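
For illustration, the appended entry would have roughly this shape (interpreter and path are assumptions):

```sh
*/10 * * * * cd /var/www/sample-platform && python3 mod_ci/cron.py
```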

## GCS configuration to serve file downloads using Signed URLs

To serve file downloads directly from the private GCS bucket, signed download URLs are used.

The `serve_file_download` function in the `utility.py` file generates a signed URL for the file to be downloaded, which expires after a configured time limit (maximum: 7 days), and redirects the client to that URL.

For more information about Signed URLs, you can refer to the [official documentation](https://cloud.google.com/storage/docs/access-control/signed-urls).
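
For manual testing, a similar signed URL can be generated with `gsutil`; the key path, bucket and object names below are assumptions:

```sh
gsutil signurl -d 1h /var/www/sample-platform/service-account-key.json gs://GCS_BUCKET_NAME/sample.ts
```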

## File upload size for HTTP

In order to accept big files through HTTP uploads, some files need to be
2 changes: 2 additions & 0 deletions install/sample_db.py
@@ -11,10 +11,12 @@
def run():
    from database import create_session
    from mod_auth.models import User
    from mod_customized.models import CustomizedTest
    from mod_home.models import CCExtractorVersion, GeneralData
    from mod_regression.models import (Category, InputType, OutputType,
                                       RegressionTest, RegressionTestOutput)
    from mod_sample.models import Sample
    from mod_test.models import Test
    from mod_upload.models import Upload

    db = create_session(sys.argv[1])
12 changes: 9 additions & 3 deletions mod_ci/controllers.py
@@ -1348,8 +1348,14 @@

for p in times:
    parts = p.time.split(',')
    start = datetime.datetime.strptime(parts[0], '%Y-%m-%d %H:%M:%S.%f')
    end = datetime.datetime.strptime(parts[-1], '%Y-%m-%d %H:%M:%S.%f')
    try:
        # Try parsing with microsecond precision first
        start = datetime.datetime.strptime(parts[0], '%Y-%m-%d %H:%M:%S.%f')
        end = datetime.datetime.strptime(parts[-1], '%Y-%m-%d %H:%M:%S.%f')
    except ValueError:
        # Fall back to format without microseconds
        start = datetime.datetime.strptime(parts[0], '%Y-%m-%d %H:%M:%S')
        end = datetime.datetime.strptime(parts[-1], '%Y-%m-%d %H:%M:%S')

    total_time += int((end - start).total_seconds())

if len(times) != 0:
@@ -1377,7 +1383,7 @@

last_running_test = end_time - start_time
updated_average = updated_average + last_running_test.total_seconds()
current_average.value = updated_average // number_test
current_average.value = 0 if number_test == 0 else updated_average // number_test

g.db.commit()
log.info(f'average time updated to {str(current_average.value)}')

3 changes: 2 additions & 1 deletion mod_test/controllers.py
@@ -151,6 +151,7 @@ def get_data_for_test(test, title=None) -> Dict[str, Any]:
average_prep_time = int(float(GeneralData.query.filter(GeneralData.key == prep_average_key).first().value))

test_progress_last_entry = g.db.query(func.max(TestProgress.test_id)).first()
last_test_id = test_progress_last_entry[0] if test_progress_last_entry is not None else 0
queued_gcp_instance = g.db.query(GcpInstance.test_id).filter(GcpInstance.test_id < test.id).subquery()
queued_gcp_instance_entries = g.db.query(Test.id).filter(
    and_(Test.id.in_(queued_gcp_instance), Test.platform == test.platform)
@@ -159,7 +160,7 @@
    TestProgress.timestamp))).filter(TestProgress.test_id.in_(queued_gcp_instance_entries)).group_by(
    TestProgress.test_id).all()
number_gcp_instance_test = g.db.query(Test.id).filter(
    and_(Test.id > test_progress_last_entry[0], Test.id < test.id, Test.platform == test.platform)
    and_(Test.id > last_test_id, Test.id < test.id, Test.platform == test.platform)
).count()
average_duration = float(GeneralData.query.filter(GeneralData.key == var_average).first().value)
queued_tests = number_gcp_instance_test