Skip to content

Conversation

@Adnan-cds
Copy link
Contributor

@Adnan-cds Adnan-cds commented Sep 20, 2024

What does this change?

Adds a submodule to store Webform submission data into a separate database for long term storage (LTS).

Personally identifiable information (PII) is optionally redacted before copying into the LTS database.

How to test

Please see the Setup and Inspection sections of the README.

How can we measure success?

Webform submissions should get copied into the LTS database during cron runs.

Have we considered potential risks?

PII redaction may miss relevant pieces information. Current redaction rules are as follows:

  • All Webform elements of type email, telephone, number, address, localgov_forms_dob.
  • Any element with the following happening in its machine id: name, mail, phone, date_of_birth, personal, title, gender, sex, ethnicity, post_code.
  • All textareas are cleaned of email, postcode, and any number. Email detection uses a simple regex which may miss complex email address formats.

Long term storage of Webform submission data in an off-site database.  Work in
progress.
Alternate Webform submission entity storage class for storing and retrieving
entities from the LTS database.
- PHP class to copy Webform submissions to Long term storage.
- Cron job to periodically copy Webform submissions to Long term storage.
- Deploy hook to copy Webform submissions to Long term storage.
- Personally Identifiable Information redaction before copying Webform
  submissions to Long term storage.
- A requirement hook implementation to warn against misconfiguration.
Fixed return value of LtsCopy::findCopyTargets().

Also, ensured that no Webform handlers run while Webform submissions are copied
to Long term storage.
Updated scope of Personally Identifiable Information.
Listing and viewing Webform submissions in Long term storage.
Fix for Personally Identifiable Information redaction.  Date elements from this
module were unaccounted for.
Spots and redacts any email address, postcode, and number within the text value
of a textarea field.
Tests following functionalities:
- LtsCopy's ability to copy both new and existing Webform submissions.
- LtsStorageForWebformSubmission's ability to copy into the LTS database.
- PIIRedactor's ability to correctly identify which Webform fields should be
  redacted.
PHPStan is incorrectly inferring the type of the $entity_type object used within
_localgov_forms_lts_get_webform_submission_storage_schema().

Inferred type is Drupal\Core\Entity\EntityTypeInterface.
Expected type is Drupal\Core\Entity\ContentEntityTypeInterface.
Actual type is Drupal\Core\Entity\ContentEntityType which is an instance of
Drupal\Core\Entity\ContentEntityTypeInterface.
@Adnan-cds Adnan-cds requested a review from finnlewis September 21, 2024 10:00
@finnlewis finnlewis self-assigned this Oct 17, 2024
@finnlewis
Copy link
Member

finnlewis commented Oct 17, 2024

Very cool @Adnan-cds .

One question, if we're removing / redacting personal information, what's the motivation for storing the data in a separate database?

@Adnan-cds
Copy link
Contributor Author

if we're removing / redacting personal information, what's the motivation for storing the data in a separate database?

This is primarily for reporting. The idea is, you connect the separate database with a reporting or data warehousing tool for further reporting and analysis.

@finnlewis finnlewis requested review from ekes and stephen-cox October 17, 2024 11:44
@finnlewis
Copy link
Member

@stephen-cox @ekes any chance of a quick code review on this one for @Adnan-cds ?

@finnlewis
Copy link
Member

@ekes is happy to read review the code this week.

@finnlewis
Copy link
Member

Discussing with @ekes and @Adnan-cds

The code looks good in general!

Discussing use cases, we think that it might be usefult to separate the redaction from the storage in a separate database:

a) some people might want redaction before storing int he Drupal database.
b) some people might want to store submissions in a separate LTS database without redacting

Also noting that this module was released today: https://www.drupal.org/project/webform_sanitize_submissions and there is also https://www.drupal.org/project/webform_sanitize

We should have a choice of PII redaction plugins before saving Webform
submissions in the long term storage database.

WIP.
@Adnan-cds Adnan-cds marked this pull request as draft October 29, 2024 10:38
Moved PII redaction functionality from the localgov_forms_lts submodule into the
localgov_forms module in the form of a plugin.
Adding the `localgov_forms_lts:copy` Drush command.  This command copies all
existing Webform submissions into the Long Term Storage (LTS).  This is
particularly useful soon after the localgov_forms_lts module has been installed.
A config form for:
- Activating or deactivating copying to LTS database.
- Selecting a PII redactor plugin if PII redaction is necessary.
These constants represent the newly added localgov_forms_lts module
configuration.
@Adnan-cds
Copy link
Contributor Author

Changes since last review:

  • Added a settings form for enabling or disabling LTS storage. This should be useful in dev/stage sites where there's no need for LTS. By default, LTS storage is disabled. Settings form path: /admin/structure/webform/config/submissions-lts
  • PII redaction functionality is now available as a plugin. The existing PII redaction code is present as a selectable plugin in the above mentioned settings form. By default, no plugin is selected meaning no PII redaction happens while Webform submissions are copied to LTS. I also looked at the two suggested modules (webform_sanitize_submissions and webform_sanitize) and found that their code are not really reusable. So added a plugin manager instead.
  • Removed the Drush deploy hook in favour of a Drush command drush localgov_forms_lts:copy. This gives some more control over the deployment process.
  • Updated the README accordingly.

Ready for review once again. @finnlewis @ekes

@Adnan-cds Adnan-cds marked this pull request as ready for review November 5, 2024 12:44
Copy link
Member

@ekes ekes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤯

Well it works, and it was copying stuff over to my database. Code review we've already discussed. So we're good there. I was wondering if it was possible to use the sample plugin PII redaction per-form; but I'm guessing not yet.

@Adnan-cds
Copy link
Contributor Author

Thanks for the review and approval. Must have taken a good few hours :(

I was wondering if it was possible to use the sample plugin PII redaction per-form; but I'm guessing not yet.

That's right. There is no per-form config settings. At some point, I would like to write another plugin that can be activated for each field in a Webform. But at this point, we just want to find out if this submodule runs into any unforeseen issues at scale.

@Adnan-cds Adnan-cds merged commit adc94e6 into 1.x Nov 20, 2024
@Adnan-cds Adnan-cds mentioned this pull request Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants