This is a tool for Lemmy administrators to easily check and clean all images in their pict-rs storage for illegal or unethical content.
Note: this script does not save any images locally and does not send images to any external services. All images are held in RAM only, checked, and then forgotten.
Due to the way lemmy and pict-rs work, instance admins do not have sufficient means to check for CSAM, which puts them at significant risk, as image thumbnails from foreign instances are cached to their own object storage by default.
There are two big potential problems:
- Malicious users can simply open a new post, upload an image, and cancel the post. That image will then be invisibly hosted by your instance among thousands of others, with a URL known only to the malicious user. That user could then anonymously forward the URL to your provider and try to take your lemmy instance down.
- Users on other instances with looser controls can upload CSAM posts, and if any user on your own instance is subscribed to those instances, those image thumbnails will be cached to your own instance. Even if the offending CSAM post is deleted, such images will persist in your object storage.
Lemmy safety goes directly through your pict-rs storage (either object storage or filesystem), scans each image for potential CSAM, and automatically deletes it, covering both problems in one go. You can also run this script constantly, to ensure no new such images survive.
The results are also written to an SQLite DB, which can then be used to follow up and discover the users and instances uploading such images.
Note: this tool is a blunt instrument. It is accurate enough to catch most CSAM, but not to flag only CSAM. Check the False positives and false negatives section.
This script uses your GPU to CLIP-interrogate images and then uses the results to determine whether each image is potential CSAM.
This means you need a GPU, and the more powerful your GPU, the faster you can process your images.
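Conceptually, the decision combines the CLIP interrogation terms with a notion of how explicit the image is. The sketch below is purely illustrative: the term list, the score, and the threshold are made up for this example and are not the checking library's actual heuristic.

```python
def looks_underage_nsfw(interrogation_terms, nsfw_score):
    """Illustrative only: flag images whose CLIP interrogation suggests
    both a minor subject and explicit content. The term list and the
    0.8 threshold are invented for this sketch."""
    underage_terms = {"child", "boy", "girl", "teen", "toddler"}
    is_underage = any(term in underage_terms for term in interrogation_terms)
    return is_underage and nsfw_score > 0.8
```

The real library is considerably more sophisticated, but this is the general shape of a CLIP-interrogation-based check.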
- Install python>=3.10
- Install requirements: `python -m pip install -r requirements.txt`
- Copy `env_example` to `.env`, then edit `.env` following the instructions below, based on the type of storage your pict-rs is using
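For reference, `.env` is a plain `KEY=VALUE` file. A minimal stdlib-only reader is sketched below; the actual script most likely uses a dotenv library, and the exact variable names depend on your storage type, so treat this purely as a format illustration.

```python
import os

def load_env(path=".env"):
    """Load KEY=VALUE pairs from a dotenv-style file into os.environ.
    Skips blank lines and '#' comments. Minimal sketch only."""
    with open(path) as env_file:
        for line in env_file:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip('"')
```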
Use this option if you have installed pictrs-safety and set your pict-rs to validate images.
- Start the script `lemmy_safety_pictrs.py`. Use `-t` to specify the number of threads. The more powerful your GPU, the more threads you can use.
This will run forever, polling pictrs-safety every 0.1 seconds for new images, and will return a boolean with the result of the CSAM detection.
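The loop can be sketched as follows. The callables are stand-ins (the script's real function names and the pictrs-safety API are not shown here), and a `max_passes` parameter is added purely so the loop can terminate in a test.

```python
import time

def poll_once(fetch_pending, classify, report):
    """One polling pass: classify each newly uploaded image and report
    the boolean verdict back to pictrs-safety."""
    for image_id, image_bytes in fetch_pending():
        report(image_id, classify(image_bytes))

def run_daemon(fetch_pending, classify, report, interval=0.1, max_passes=None):
    """Repeat poll_once forever (or for max_passes, to ease testing),
    sleeping `interval` seconds between passes."""
    passes = 0
    while max_passes is None or passes < max_passes:
        poll_once(fetch_pending, classify, report)
        time.sleep(interval)
        passes += 1
```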
Use this option when you have configured pict-rs to store its images in an AWS S3-compatible object storage.
- Add your object storage credentials and connection info to `.env`
- Start the script `fedi_safety_object_storage.py`
Use this option when your pict-rs is running on a remote Linux server to which you have SSH access.
- Add your pict-rs server SSH credentials and pict-rs paths to `.env`
- Start the script `fedi_safety_remote_storage.py`
Deleting pict-rs files requires an account with read/write access to them. You should also have set up public-key authentication for that account.
Use this option when your pict-rs runs on the same system as this script.
- Add your pict-rs file location to `.env`
- Start the script `fedi_safety_local_storage.py`
Deleting local storage pict-rs files requires an account with read/write access to the pict-rs files.
The script records all images checked in an SQLite DB called `lemmy_safety.db`, which prevents it from checking the same image twice.
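A sketch of how such a dedupe table might work; the table and column names here are assumptions for illustration, not the script's actual schema.

```python
import sqlite3

def open_db(path="lemmy_safety.db"):
    """Open (or create) the results DB with a simple checked-images table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checked_images ("
        "  image_id TEXT PRIMARY KEY,"
        "  is_csam INTEGER NOT NULL)"
    )
    return conn

def already_checked(conn, image_id):
    """True if this image was processed in an earlier run."""
    row = conn.execute(
        "SELECT 1 FROM checked_images WHERE image_id = ?", (image_id,)
    ).fetchone()
    return row is not None

def record_check(conn, image_id, is_csam):
    """Record the verdict so the image is never re-scanned."""
    conn.execute(
        "INSERT OR IGNORE INTO checked_images VALUES (?, ?)",
        (image_id, int(is_csam)),
    )
    conn.commit()
```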
The script has two modes: `--all` and daemon.
Running with the CLI arg `--all` will loop through all the images in your object storage and check each of them for CSAM. Any suspected image will be automatically deleted and its ID recorded in the DB for potential follow-up.
Running without the `--all` arg will make the script run constantly, checking all images uploaded in the past 20 minutes (can be changed using `--minutes`). Any suspected image will be automatically deleted and its ID recorded in the DB for potential follow-up. The daemon then endlessly repeats this process after a 30-second wait.
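The daemon's look-back window can be sketched with a small helper; this is a hypothetical function mirroring the `--minutes` default of 20, not code from the script.

```python
from datetime import datetime, timedelta, timezone

def cutoff_time(minutes=20, now=None):
    """Return the earliest upload timestamp a daemon pass should still
    inspect; anything older was covered by earlier passes."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(minutes=minutes)
```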
Please see the dedicated instructions.
The script can of course misdetect, as the CLIP model is not perfect. However, the library used for checking for CSAM has been robustly tested through the AI Horde and has an acceptable false-positive ratio given the risk of the alternatives.
If you are concerned about deleting too much or too little, or want to follow up before taking action, you can use the `--dry_run` CLI arg to mark suspected CSAM without deleting it.
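The flags discussed in this README could be wired up roughly like this; only `--all`, `--minutes`, `-t`, and `--dry_run` appear in the text, and the defaults shown are assumptions.

```python
import argparse

def build_parser():
    """Sketch of the CLI surface described in this README."""
    parser = argparse.ArgumentParser(
        description="Scan pict-rs storage for potential CSAM"
    )
    parser.add_argument("--all", action="store_true",
                        help="scan every stored image once instead of running as a daemon")
    parser.add_argument("--minutes", type=int, default=20,
                        help="daemon mode: how far back to look for new uploads")
    parser.add_argument("-t", "--threads", type=int, default=1,
                        help="worker threads; more powerful GPUs can use more")
    parser.add_argument("--dry_run", action="store_true",
                        help="record suspected CSAM in the DB without deleting it")
    return parser
```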
Roughly speaking, this tool will flag a lot of false positives. This is normal; you should be worried if it weren't catching any false positives, since that would mean potential CSAM slipping through.
On average, <1% of all your images will be flagged by this tool, most of which should either be NSFW or have children as subjects.
So yes, you will lose some legitimate images, but you all but ensure you won't host CSAM either. I will leave the cost-benefit calculation to you.
Other than the classic AGPL disclaimer that I make no guarantees about this tool, I also need to mention that different jurisdictions around the world take different approaches to CSAM. For example, some require that you report every potential positive to the authorities. How that works with a tool like this, which casts a very wide net, is unclear.
If you are worried enough, you should consult a local lawyer.
If you want to improve this tool, feel free to send PRs.
Alternatively, feel free to support my development efforts on Patreon or GitHub.