A Spring Boot-based web application that provides a chat interface to interact with various Ollama AI models. This project demonstrates modern Java/Spring Boot development practices, including REST API design, error handling, retry mechanisms, and configuration management.
This application serves as a backend service that:
- Exposes a REST API for interacting with Ollama AI models
- Implements robust error handling and retry mechanisms
- Demonstrates clean architecture principles
- Provides configurable settings for different deployment scenarios
- Spring Boot REST API Development
- Retry Mechanisms with Spring Retry
- Configuration Management
- Exception Handling
- Dependency Injection
- API Documentation (via code comments)
- Interactive chat interface
- Multiple AI model support (switchable from UI)
- Persistent model selection using localStorage
- Responsive design
- Error handling and retry mechanism
- Configurable timeouts
- Real-time status updates
```
┌──────────────┐      ┌─────────────────────────────────────┐      ┌───────────────┐
│    Client    │─────▶│           Spring Boot App           │─────▶│    Ollama     │
│  (Browser/   │◀─────│  ┌────────────┐    ┌───────────┐    │◀─────│     API       │
│   Postman)   │      │  │ Controller │───▶│  Service  │    │      └───────────────┘
└──────────────┘      │  └────────────┘    └───────────┘    │
                      │        ▲                 ▲          │
                      │        │                 │          │
                      │  ┌─────┴──────┐   ┌──────┴───────┐  │
                      │  │   Retry    │   │  Validation  │  │
                      │  │ Mechanism  │   │   & Error    │  │
                      │  └────────────┘   └──────────────┘  │
                      └─────────────────────────────────────┘
```
- Responsibility: Handle HTTP requests and responses
- Key Features:
- REST endpoint for AI queries (`/api/ai/ask`); see the sketch below
- Request validation
- Error handling with proper HTTP status codes
- Clean separation of concerns
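A minimal sketch of what this controller might look like (signatures and messages here are illustrative, not the exact source):

```java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/ai")
public class OllamaController {

    private final OllamaService ollamaService;

    public OllamaController(OllamaService ollamaService) {
        this.ollamaService = ollamaService; // constructor injection keeps the controller testable
    }

    // GET /api/ai/ask?prompt=...&model=...
    @GetMapping("/ask")
    public ResponseEntity<String> ask(@RequestParam String prompt,
                                      @RequestParam(required = false) String model) {
        if (prompt.isBlank()) {
            return ResponseEntity.badRequest().body("prompt must not be empty"); // 400
        }
        return ResponseEntity.ok(ollamaService.askOllama(prompt, model)); // 200
    }
}
```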
- Responsibility: Business logic and external API communication
- Key Features:
- Retry mechanism for transient failures
- Timeout configuration
- Model selection
- Response processing
- Responsibility: Application-wide configuration
- Key Features:
- Retry policy with exponential backoff
- Configurable through properties
- Clean separation of configuration from business logic
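A hedged sketch of such a configuration class, assuming Spring Retry's `RetryTemplate` with an exponential backoff policy (values mirror the retry settings documented below):

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.Map;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.retry.annotation.EnableRetry;
import org.springframework.retry.backoff.ExponentialBackOffPolicy;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;

@Configuration
@EnableRetry // activates @Retryable on service methods
public class RetryConfig {

    @Bean
    public RetryTemplate retryTemplate() {
        RetryTemplate template = new RetryTemplate();

        // Retry up to 3 attempts, but only on transient I/O failures
        template.setRetryPolicy(new SimpleRetryPolicy(3,
                Map.of(SocketTimeoutException.class, true, IOException.class, true)));

        // Exponential backoff: 1s, 2s, 4s, ... capped at 10s between attempts
        ExponentialBackOffPolicy backOff = new ExponentialBackOffPolicy();
        backOff.setInitialInterval(1000);
        backOff.setMultiplier(2.0);
        backOff.setMaxInterval(10000);
        template.setBackOffPolicy(backOff);

        return template;
    }
}
```

`@EnableRetry` is what activates the `@Retryable` annotation used in the service layer; the `RetryTemplate` bean supports the programmatic style shown further down.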
- Java 17 or higher
- Maven
- Node.js (for frontend assets if needed)
- Ollama installed and running locally
1. Clone the repository

   ```bash
   git clone <repository-url>
   cd demo_ollama
   ```

2. Build the application

   ```bash
   mvn clean install
   ```

3. Start the application

   ```bash
   mvn spring-boot:run
   ```

4. Access the application at `http://localhost:8080`
Configuration is managed through `application.properties`:

```properties
# Server Configuration
server.port=8080

# Ollama API Configuration
ollama.api.url=http://localhost:11434/api/generate
# Default model
ollama.model=deepseek-coder:1.3b
# Request timeout (5 minutes)
ollama.timeout.seconds=300
```
Sends a prompt to the AI model and returns the response.
Query Parameters:
- `prompt` (required): The input text for the AI
- `model` (optional): The AI model to use (default: `llama3:8b`)
Example Request:

```
GET /api/ai/ask?prompt=Hello%20world&model=llama3:8b
```

Success Response (200 OK):

```
"Hello! How can I assist you today?"
```
```
src/
└── main/
    ├── java/
    │   └── com/example/ollamaAi/demo_ollama/
    │       ├── config/                        # Configuration classes
    │       │   └── WebConfig.java             # Web-specific configurations
    │       ├── DemoOllamaApplication.java     # Main application class
    │       ├── OllamaController.java          # REST Controller (API endpoints)
    │       ├── OllamaService.java             # Core business logic
    │       └── RetryConfig.java               # Retry mechanism configuration
    └── resources/
        ├── static/                            # Frontend assets
        │   ├── css/                           # Stylesheets
        │   ├── js/                            # Client-side JavaScript
        │   └── index.html                     # Main HTML file
        └── application.properties             # Application configuration
```
- Purpose: Handles HTTP requests and responses
- Key Methods:
  - `ask(String prompt, String model)`: Main endpoint handler
  - `handleTypeMismatch()`: Global exception handler
- Interview Talking Points:
- REST API design principles
- Exception handling strategies
- Request validation
- Response entity usage
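As a hedged illustration, the `handleTypeMismatch()` handler named above might look like this (the handled exception type is an assumption):

```java
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.method.annotation.MethodArgumentTypeMismatchException;

// Inside the controller: turns parameter-binding failures into a clean 400 response
@ExceptionHandler(MethodArgumentTypeMismatchException.class)
public ResponseEntity<String> handleTypeMismatch(MethodArgumentTypeMismatchException ex) {
    return ResponseEntity.status(HttpStatus.BAD_REQUEST)
            .body("Invalid value for parameter '" + ex.getName() + "'");
}
```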
- Purpose: Contains business logic and external API communication
- Key Features:
  - `@Retryable` for automatic retries
  - Configurable timeouts
- Model selection logic
- Interview Talking Points:
- Retry patterns
- Timeout handling
- External service integration
- Thread safety considerations
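One plausible way to wire the configurable timeout, assuming a `RestTemplate`-based HTTP client (a sketch under that assumption, not the exact source):

```java
import java.time.Duration;
import org.springframework.http.client.SimpleClientHttpRequestFactory;
import org.springframework.web.client.RestTemplate;

// Builds an HTTP client whose read timeout follows ollama.timeout.seconds
private RestTemplate buildClient(int timeoutSeconds) {
    SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory();
    factory.setConnectTimeout((int) Duration.ofSeconds(10).toMillis());          // fail fast on connect
    factory.setReadTimeout((int) Duration.ofSeconds(timeoutSeconds).toMillis()); // allow long generations
    return new RestTemplate(factory);
}
```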
- Purpose: Configures retry behavior
- Key Components:
  - `RetryTemplate` with exponential backoff
  - Configurable retry policies
- Interview Talking Points:
- Circuit breaker vs retry patterns
- Backoff strategies
- Configuration best practices
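For comparison, programmatic use of the `RetryTemplate` (an alternative to the `@Retryable` annotation shown later) might look like this illustrative wrapper:

```java
import org.springframework.retry.support.RetryTemplate;
import org.springframework.web.client.RestTemplate;

// The template re-invokes the callback on retryable failures, applying the backoff policy
public String callWithRetry(RetryTemplate retryTemplate, RestTemplate restTemplate,
                            String apiUrl, Object requestBody) {
    return retryTemplate.execute(context ->
            restTemplate.postForObject(apiUrl, requestBody, String.class));
}
```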
```mermaid
sequenceDiagram
    participant C as Client
    participant CT as Controller
    participant S as Service
    participant O as Ollama API
    C->>CT: GET /api/ai/ask?prompt=Hello&model=llama3:8b
    Note over CT: Validates input parameters
    CT->>S: askOllama("Hello", "llama3:8b")
```

```mermaid
sequenceDiagram
    participant CT as Controller
    participant S as Service
    participant RT as RetryTemplate
    participant O as Ollama API
    S->>RT: Execute with retry
    RT->>S: First attempt
    S->>O: HTTP POST /api/generate
    O-->>S: Response or Timeout
    alt Success
        S-->>RT: Return success
    else Failure
        RT->>RT: Wait (exponential backoff)
        RT->>S: Retry (max 3 attempts)
    end
    S-->>CT: Return response
```
- Success Path:
  - Service receives the response from Ollama
  - Processes and returns the response
  - Controller wraps it in a ResponseEntity with 200 OK
- Error Path:
  - If validation fails: 400 Bad Request
  - If the service fails after retries: 500 Internal Server Error
  - If the request times out: 504 Gateway Timeout
- Max Attempts: 3 (initial + 2 retries)
- Backoff: Exponential (1s, 2s, 4s)
- Max Delay: 10 seconds between retries
- Retry On: SocketTimeoutException, IOException
Answer: The application uses Spring Retry with the following configuration:
- Max Attempts: 3 (1 initial + 2 retries)
- Backoff Policy: Exponential (1s, 2s, 4s)
- Max Delay: 10 seconds between retries
- Retry Conditions: SocketTimeoutException and IOException
Implementation:
```java
@Retryable(
    value = {SocketTimeoutException.class, IOException.class},
    maxAttempts = 3,
    backoff = @Backoff(delay = 1000, multiplier = 2, maxDelay = 10000)
)
public String askOllama(String prompt, String model) {
    // Implementation
}
```
Answer: To handle rate limiting, I would:
- Implement a circuit breaker pattern using Resilience4j
- Add rate limit headers to responses
- Implement a token bucket or leaky bucket algorithm
- Add proper HTTP 429 (Too Many Requests) handling
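As a dependency-free sketch of the token-bucket idea (illustrative only, not the project's code):

```java
// Naive token bucket: refills ratePerSecond tokens each second, up to capacity
public class TokenBucket {
    private final long capacity;
    private final long ratePerSecond;
    private long tokens;
    private long lastRefillNanos = System.nanoTime();

    public TokenBucket(long capacity, long ratePerSecond) {
        this.capacity = capacity;
        this.ratePerSecond = ratePerSecond;
        this.tokens = capacity;
    }

    // Returns false when the caller should answer with HTTP 429 Too Many Requests
    public synchronized boolean tryConsume() {
        long now = System.nanoTime();
        long elapsedSeconds = (now - lastRefillNanos) / 1_000_000_000L;
        if (elapsedSeconds > 0) {
            tokens = Math.min(capacity, tokens + elapsedSeconds * ratePerSecond);
            lastRefillNanos += elapsedSeconds * 1_000_000_000L;
        }
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }
}
```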
Answer: Configuration is managed through:
- `application.properties` for environment-agnostic settings
- `@Value` annotations for property injection
- `@Configuration` classes for bean definitions
- Environment variables for sensitive data
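For instance, property injection via `@Value` might look like this (field names are illustrative; the text after the first `:` in a placeholder is a fallback default):

```java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

@Service
public class OllamaService {

    // Bound from application.properties at startup
    @Value("${ollama.api.url:http://localhost:11434/api/generate}")
    private String apiUrl;

    @Value("${ollama.timeout.seconds:300}")
    private int timeoutSeconds;
}
```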
Answer: Improvements could include:
- Custom exception hierarchy
- Global exception handler with `@ControllerAdvice`
- Detailed error responses with error codes
- Structured logging for better debugging
- Circuit breaker pattern for cascading failures
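A hedged sketch of the `@ControllerAdvice` approach with error codes (the exception type and payload shape are assumptions):

```java
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ControllerAdvice;
import org.springframework.web.bind.annotation.ExceptionHandler;

// Global handler: maps exceptions to a structured error body with a stable code
@ControllerAdvice
public class GlobalExceptionHandler {

    record ApiError(String code, String message) {}

    @ExceptionHandler(IllegalArgumentException.class)
    public ResponseEntity<ApiError> handleBadInput(IllegalArgumentException ex) {
        return ResponseEntity.status(HttpStatus.BAD_REQUEST)
                .body(new ApiError("BAD_INPUT", ex.getMessage()));
    }
}
```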
Answer:
- Spring's default thread-per-request model
- Stateless services ensure thread safety
- Connection pooling for HTTP client
- Timeout configurations prevent resource exhaustion
- Spring Boot 3.x: Core framework
- Spring Web: REST API support
- Spring Retry: Retry mechanism
- Lombok: Boilerplate reduction
- Jackson: JSON processing
- JUnit 5 & Mockito: Unit testing
- Spring Boot DevTools: Development tools
- HTML5, CSS3, JavaScript
- Fetch API for AJAX calls
- Responsive design
- Maven: Build tool
- Git: Version control
- IDE: IntelliJ IDEA or VS Code
- Add API documentation with Swagger/OpenAPI
- Implement request/response logging
- Add health check endpoint
- Unit test coverage improvement
- Add user authentication (JWT/OAuth2)
- Implement conversation history with database
- Add rate limiting
- Containerization with Docker
- Add WebSocket support for real-time updates
- Implement model performance metrics
- Add support for file uploads and processing
- Multi-tenant support
- Spring Boot Documentation: https://spring.io/projects/spring-boot
- Ollama API Documentation: https://ollama.ai/docs
- REST API Best Practices: https://restfulapi.net/
- Spring Retry: https://docs.spring.io/spring-batch/docs/current/reference/html/retry.html
- Microservices Patterns: https://microservices.io/patterns/
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a new Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.