FormatForge — README

Overview

In simple terms: FormatForge turns a single photo into ready-to-use images for Amazon, Flipkart, Instagram, and more. Upload once and get high-quality, platform-specific assets, then re-edit any of them on the fly with plain-text instructions.

FormatForge is a Streamlit application that uses the Gemini image editing model to convert a single uploaded image into platform-ready assets (Amazon, Flipkart, Zomato, Swiggy, Instagram feed/story, OLX, Spotify album cover). The app accepts up to four source images, generates 1–4 variations per selected platform, and allows in-place modification of any generated image via a simple "Modify / Chat about this image" text box.

This README documents how the app works, how to run it locally, the code structure and important implementation details, and how the modify-workflow operates.

Quick start

Clone the repository (first step):

```
git clone https://github.com/iam-saiteja/SKU-Ready
cd "SKU-Ready"
```

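Ensure Python 3.10+ is installed, then create and activate a virtual environment. The command below is for Windows PowerShell; on macOS or Linux, activate with source .venv/bin/activate instead: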
```
python -m venv .venv; .\.venv\Scripts\Activate.ps1
```
Install required packages using the provided requirements.txt:
```
pip install -r requirements.txt
```
Run the app:
```
streamlit run work.py
```
Open the URL printed by Streamlit (usually http://localhost:8501) and enter your Gemini API key in the sidebar.

UI walkthrough

Sidebar: Enter your Gemini API key and inspect per-platform specifications.
Left column: Upload up to 4 images (jpg/png), pick platforms, choose how many images per platform (1–4), and optionally enter "Extra Instructions" (applies to all generations). Click Generate Formatted Images to start.
Right column: View generated assets grouped by platform. For each generated image you can:
- Download the image
- Enter a modification instruction in the "Modify / Chat about this image" box and click "Apply modification" to re-edit that generated image in-place.

Generation flow and prompts

The app encodes uploaded images to base64 and calls the Gemini image-editing model using the client.models.generate_content pattern with the model gemini-2.5-flash-image-preview.
Each platform has a strict prompt in PROMPTS containing mandatory changes and a resizing rule ("scale only, do not crop the subject"). When generating multiple angles, the app appends a simple angle hint such as "front view" or "left 45 degree angle".
If the optional Extra Instructions field is filled, its text is appended to the prompt for every generation.
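The prompt assembly described above can be sketched roughly as follows. This is a minimal illustration, not the code in work.py: the build_prompt helper, the ANGLE_HINTS list (beyond the two hints quoted above), and the sample PROMPTS entry are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical stand-ins: the real PROMPTS text and angle hints live in work.py.
PROMPTS = {
    "Amazon": "Place the product on a pure white background. "
              "Scale only, do not crop the subject.",
}
ANGLE_HINTS = ["front view", "left 45 degree angle",
               "right 45 degree angle", "top-down view"]


def build_prompt(platform: str, variation_index: int, total_variations: int,
                 extra_instructions: Optional[str] = None) -> str:
    """Compose the prompt for one generation request (illustrative only)."""
    parts = [PROMPTS[platform]]
    # An angle hint is only added when more than one variation is requested.
    if total_variations > 1:
        hint = ANGLE_HINTS[variation_index % len(ANGLE_HINTS)]
        parts.append(f"Angle: {hint}.")
    # Extra Instructions, when filled in, are appended to every generation.
    if extra_instructions and extra_instructions.strip():
        parts.append(f"Extra instructions: {extra_instructions.strip()}")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("Amazon", 1, 3, "add a soft shadow"))
```

Keeping the platform prompt first and the user text last means the mandatory platform rules always lead the request, whatever the user adds.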

Gemini Integration

FormatForge leverages Gemini 2.5 Flash Image's advanced image generation and editing capabilities to transform user-uploaded images into platform-optimized assets. Key features used include:

Image-to-Image Editing: The model accepts a text prompt and a PIL Image as input, generating edited images that adhere to specific requirements like background changes, cropping, and resizing rules.
Multi-Modal Response: Configured with response_modalities=['Text', 'Image'], the model returns both textual feedback and binary image data. The app reads the image from the response parts, either through a part's as_image() helper or through the inline_data field of the parts under response.candidates.
In-Place Modifications: For re-edits, the app sends the previously generated image back to the model with user-provided modification instructions, enabling iterative refinements without losing context.

These features are central to the app's core functionality, enabling automated, AI-driven image formatting and customization for e-commerce and social media platforms, ensuring high-quality outputs with minimal user intervention.
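To make the response handling more concrete, here is a hedged sketch of pulling the first image out of a multi-modal response. It assumes the response follows the candidates -> content -> parts -> inline_data shape; the first_image_b64 helper is hypothetical and is not the parser used in work.py. A SimpleNamespace object stands in for a real SDK response so the example runs without an API key.

```python
import base64
from types import SimpleNamespace
from typing import Any, Optional


def first_image_b64(response: Any) -> Optional[str]:
    """Return the first inline image as base64 text, or None if there is none."""
    for candidate in getattr(response, "candidates", None) or []:
        content = getattr(candidate, "content", None)
        for part in getattr(content, "parts", None) or []:
            inline = getattr(part, "inline_data", None)
            data = getattr(inline, "data", None)
            if not data:
                continue  # text-only part, keep looking
            if isinstance(data, bytes):
                return base64.b64encode(data).decode("ascii")
            # Assumption: a non-bytes payload is already base64 text.
            return str(data)
    return None


# Fake response: one text part followed by one image part.
fake = SimpleNamespace(candidates=[SimpleNamespace(content=SimpleNamespace(parts=[
    SimpleNamespace(text="done", inline_data=None),
    SimpleNamespace(text=None, inline_data=SimpleNamespace(data=b"\x89PNG")),
]))])

if __name__ == "__main__":
    print(first_image_b64(fake))
```

Using getattr with defaults keeps the helper tolerant of responses that carry only text, which is the case mentioned under Troubleshooting.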

Modify / re-edit workflow

Each generated image has a "Modify / Chat about this image" input. When you enter instructions and click Apply modification:
- The app marks the item busy and queues the modification.
- On rerun, the queued instruction is sent to the Gemini model along with the exact generated image as the input image.
- When the model returns an edited image, the app overwrites the same file on disk and updates the session-state entry for that image (flagged modified=True).
- The UI displays the updated image immediately.
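The queue-then-process pattern in the steps above can be sketched without Streamlit. In this illustration a plain dict stands in for st.session_state, and the function names, the "pending" key, and the per-image record layout are assumptions; the step that overwrites the file on disk is left out.

```python
from typing import Callable, Optional


def queue_modification(state: dict, image_id: str, instruction: str) -> None:
    """Mark the image busy and remember the instruction for the next rerun."""
    state["images"][image_id]["busy"] = True
    state["pending"] = {"image_id": image_id, "instruction": instruction}


def process_pending(state: dict,
                    edit_fn: Callable[[str, str], Optional[str]]) -> Optional[str]:
    """Run the queued edit, if any, and update the same image entry in place."""
    pending = state.pop("pending", None)
    if pending is None:
        return None
    item = state["images"][pending["image_id"]]
    try:
        # edit_fn stands in for the model call: (image_b64, instruction) -> image_b64
        new_b64 = edit_fn(item["b64"], pending["instruction"])
        if new_b64 is not None:
            item["b64"] = new_b64
            item["modified"] = True
    finally:
        item["busy"] = False  # always release the item, even if the call failed
    return pending["image_id"]


if __name__ == "__main__":
    state = {"images": {"amazon_0": {"b64": "AAAA", "busy": False, "modified": False}}}
    queue_modification(state, "amazon_0", "make the background lighter")
    process_pending(state, lambda b64, text: b64 + "BBBB")
    print(state["images"]["amazon_0"])
```

Splitting the work into a queue step and a process step matches how Streamlit reruns the script on every interaction: the click only records intent, and the next run performs the model call.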

Files and important functions

work.py — main Streamlit app. Key functions:
- encode_image(image: PIL.Image) -> str — encodes a PIL image to base64.
- validate_and_fix_b64(b64_str) -> Optional[str] — heuristically validates and repairs base64 image strings returned from the model.
- call_gemini_api(image_b64, prompt, platform) -> Optional[str] — calls the Gemini model and returns a base64-encoded image string.
- resize_image_file(path, width, height) — resizes saved files to exact pixel dimensions using Pillow LANCZOS resampling.
- safe_rerun() — attempts to call st.experimental_rerun() and falls back to toggling a session timestamp to force re-render.
generated/ — directory created by the app at runtime where generated images are saved.
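As an illustration of what heuristic base64 validation and repair can involve, the sketch below strips a data-URI prefix, removes whitespace, restores padding, and checks for a PNG or JPEG signature. It is an assumption about the general technique, not a copy of validate_and_fix_b64 in work.py, hence the _sketch suffix on the name.

```python
import base64
import binascii
import re
from typing import Optional

# File signatures for PNG and JPEG.
_MAGIC = (b"\x89PNG\r\n\x1a\n", b"\xff\xd8\xff")


def validate_and_fix_b64_sketch(b64_str: str) -> Optional[str]:
    """Return a cleaned base64 string if it decodes to a PNG/JPEG, else None."""
    if not b64_str:
        return None
    # Drop a data-URI prefix such as "data:image/png;base64,".
    if b64_str.startswith("data:") and "," in b64_str:
        b64_str = b64_str.split(",", 1)[1]
    # Remove whitespace and map URL-safe characters to the standard alphabet.
    cleaned = re.sub(r"\s+", "", b64_str).replace("-", "+").replace("_", "/")
    # Normalise '=' padding to a multiple of four characters.
    cleaned = cleaned.rstrip("=")
    cleaned += "=" * (-len(cleaned) % 4)
    try:
        raw = base64.b64decode(cleaned, validate=True)
    except (binascii.Error, ValueError):
        return None
    return cleaned if raw.startswith(_MAGIC) else None


if __name__ == "__main__":
    png_header = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
    encoded = base64.b64encode(png_header).decode("ascii")
    print(validate_and_fix_b64_sketch("data:image/png;base64," + encoded) == encoded)
```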

Platform specifications

The PLATFORMS dict in work.py defines precise requirements per platform, including size, aspect ratio, and required transformations. The app uses these both to compose prompts and to resize saved images after generation.
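For orientation, a per-platform spec table might look like the sketch below. The entries and field names are illustrative assumptions; the real PLATFORMS dict in work.py is authoritative. The fit_without_cropping helper shows one possible way to honor a "scale only, do not crop" rule by computing a letterboxed size. It is not how resize_image_file works, since that function resizes to the exact target dimensions.

```python
from typing import Dict, Tuple

# Illustrative values only; consult PLATFORMS in work.py for the real specs.
PLATFORMS_SKETCH: Dict[str, Dict[str, object]] = {
    "Instagram Feed": {"size": (1080, 1080), "notes": "square"},
    "Instagram Story": {"size": (1080, 1920), "notes": "vertical 9:16"},
}


def fit_without_cropping(src: Tuple[int, int],
                         target: Tuple[int, int]) -> Tuple[int, int, int, int]:
    """Return (new_w, new_h, offset_x, offset_y) for src scaled to fit inside target."""
    src_w, src_h = src
    tgt_w, tgt_h = target
    # The smaller ratio guarantees both dimensions fit, so nothing is cropped.
    scale = min(tgt_w / src_w, tgt_h / src_h)
    new_w = max(1, round(src_w * scale))
    new_h = max(1, round(src_h * scale))
    # Offsets centre the scaled image on the target canvas.
    return new_w, new_h, (tgt_w - new_w) // 2, (tgt_h - new_h) // 2


if __name__ == "__main__":
    size = PLATFORMS_SKETCH["Instagram Feed"]["size"]
    print(fit_without_cropping((2000, 1000), size))
```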

Security and limitations

API keys: Enter your Gemini API key in the sidebar. The app stores it in st.session_state only for the current browser session and does not persist it to disk.
Model responses: The app attempts to robustly parse different possible SDK response shapes. However, returned images depend on the model's behavior and prompt tuning.
Local storage: Generated images are saved into a local generated/ directory in the app folder. Remove or secure this directory as needed.

Troubleshooting

The app passes use_container_width rather than the deprecated use_column_width, so Streamlit deprecation warnings about use_column_width should not originate from work.py.
If rerun behavior does not immediately show updated images, try refreshing the browser; the app attempts to force a rerender but browser caching can interfere.
If the model returns non-image payloads, check logs printed in the Streamlit UI for debugging messages.

Development notes

The current implementation focuses on a simple, single-node developer flow. For production: add authentication, server-side storage, rate limits, robust retry/backoff, and user quotas.
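As one example of the retry/backoff hardening mentioned above, a small wrapper along these lines could sit around the model call. This is a sketch under assumptions: call_with_backoff is a hypothetical helper, it retries on any exception, and the attempt count and delays are placeholders to tune.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 4,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying failures with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1x, 2x, 4x ...), plus up to 25% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.25))
    raise RuntimeError("max_attempts must be at least 1")


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky() -> str:
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise RuntimeError("transient failure")
        return "ok"

    print(call_with_backoff(flaky, base_delay=0.01))
```

In a real deployment it would be safer to retry only on errors known to be transient, such as rate-limit or timeout responses, rather than on every exception.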

Contact

Email: iamsaitejathanniru@gmail.com
Website: https://thannirusaiteja.me
LinkedIn: https://linkedin.com/in/thannirusaiteja

Collaborate

Anyone is welcome to collaborate on this project. Recommended workflow:

Fork the repository (https://github.com/iam-saiteja/FormatForge).
Create a descriptive feature branch and make your changes.
Open a Pull Request describing the change and any testing steps.
Open an Issue first if you prefer to discuss the idea before coding.

For quick collaboration requests or questions, use the email above or open an issue on the repository.

License

This project is provided free of charge under the MIT License. Full credit should be given to the original author "iam-saiteja" when reusing or redistributing the project. See the LICENSE file for the full license text.
