Render.com RAG Service | James Murray

James Murray has built a scalable Flask-based retrieval-augmented generation (RAG) service hosted on Render.com. The service sits behind a PHP proxy, scales automatically with load, and answers AI-powered queries in real time.

The service autoscales from 0 to 32 instances based on CPU usage, with cold starts under 2 s. Connection pooling and Redis caching keep p95 latency at 300 ms.
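CPU-driven autoscaling is usually a target-tracking rule. The source does not describe Render's internal algorithm, so the sketch below is only an illustration. It uses the proportional formula of the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current × observed CPU / target CPU), clamped to the 0 to 32 instance range given above. `desired_instances` and the 70% CPU target are hypothetical.

```python
import math


def desired_instances(
    current: int,
    avg_cpu_percent: float,
    target_cpu_percent: float = 70.0,  # hypothetical target
    min_instances: int = 0,
    max_instances: int = 32,
) -> int:
    """Proportional target-tracking rule: ceil(current * observed / target), clamped."""
    if target_cpu_percent <= 0:
        raise ValueError("target_cpu_percent must be positive")
    if current <= 0:
        # No running instances means no CPU signal. Scaling up from zero has to
        # be triggered by an incoming request rather than by this rule.
        return min_instances
    raw = math.ceil(current * avg_cpu_percent / target_cpu_percent)
    # Clamp to the configured instance range.
    return max(min_instances, min(max_instances, raw))
```

For example, 4 instances at 90% CPU against a 60% target give `desired_instances(4, 90, 60) == 6`, while the same 4 instances at 30% shrink to 2. An idle fleet (0% CPU) scales to the 0-instance minimum.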

Key Features

Autoscaling RAG Endpoints: Render.com native scaling with zero config.
Flask Backend: Lightweight Flask app served by Gunicorn, ready for async workers.
PHP Proxy Integration: Authentication and rate limiting for requests coming from PHP apps.
Redis Caching: 5-minute TTL on hot queries.
Observability: OpenTelemetry + Render logs.
Graceful Degradation: Fallback to keyword search on LLM failure.
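The Redis caching and graceful-degradation features above can share one request path. The following minimal sketch performs a cache-aside lookup with a 300-second TTL and falls back to keyword search when the RAG/LLM call raises. It assumes a client exposing redis-py's `get(name)` and `setex(name, time, value)` methods. `answer_query`, `cache_key`, `InMemoryCache`, and the `rag:` key prefix are hypothetical names. `rag_answer` and `keyword_search` stand in for the real pipeline and search functions.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple

CACHE_TTL_SECONDS = 300  # 5-minute TTL for hot queries


def cache_key(query: str) -> str:
    """Normalize case and whitespace so near-duplicate queries share one entry."""
    normalized = " ".join(query.lower().split())
    return "rag:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class InMemoryCache:
    """Hypothetical stand-in for redis.Redis in local tests (get/setex only, TTL honored)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[bytes, float]] = {}

    def get(self, name: str) -> Optional[bytes]:
        entry = self._data.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[name]  # expired, like a Redis key past its TTL
            return None
        return value

    def setex(self, name: str, ttl: int, value: str) -> None:
        self._data[name] = (value.encode("utf-8"), time.monotonic() + ttl)


def answer_query(
    query: str,
    cache: Any,
    rag_answer: Callable[[str], Any],
    keyword_search: Callable[[str], Any],
) -> Dict[str, Any]:
    key = cache_key(query)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: skip retrieval and the LLM call
    try:
        result = {"answer": rag_answer(query), "source": "rag"}
    except Exception:
        # Graceful degradation: serve keyword-search results instead of an error.
        # This result is not cached, so the next request retries the LLM.
        return {"answer": keyword_search(query), "source": "keyword"}
    cache.setex(key, CACHE_TTL_SECONDS, json.dumps(result))
    return result
```

In production, you could pass a `redis.Redis` client built on a shared `redis.ConnectionPool` as `cache`, which is one way to get the connection pooling mentioned earlier. Degraded answers are deliberately not cached. Otherwise, a temporary LLM outage would serve keyword results for that query for the full five minutes.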

System Design & Architecture

The service is built to scale with traffic, adding or removing instances as load changes. Flask handles the backend logic, and the PHP proxy connects it to a PHP-based web interface.
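The backend's core retrieve-then-generate step can be sketched in plain Python. This minimal sketch assumes an in-memory list of (text, embedding) pairs, with brute-force cosine similarity standing in for a Weaviate vector search. `cosine`, `top_k`, and `build_prompt` are hypothetical helpers, and the prompt wording is illustrative rather than the service's actual prompt.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    docs: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[str]:
    """Return the texts of the k documents most similar to the query embedding."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, passages: Sequence[str]) -> str:
    """Number the retrieved passages and place them ahead of the question."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

In the deployed service, ranking would be delegated to Weaviate. Weaviate searches an approximate nearest-neighbour (HNSW) index instead of scoring every document. The prompt-assembly step stays the same, and its output is what gets sent to the LLM.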

Technical Stack

Backend: Flask + Gunicorn
Vector DB: Weaviate
Cache: Redis
Hosting: Render.com
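For the Flask + Gunicorn backend on Render, the Gunicorn config file is itself Python. The following `gunicorn.conf.py` is a hedged sketch. Render tells web services which port to bind through the `PORT` environment variable (10000 by default). The worker counts, the `GUNICORN_THREADS` variable, and the timeouts are hypothetical values to tune, not settings taken from the project.

```python
# gunicorn.conf.py: Gunicorn loads this file as ordinary Python.
import os

# Render routes traffic to the port given in $PORT; bind on all interfaces.
bind = f"0.0.0.0:{os.environ.get('PORT', '10000')}"

# WEB_CONCURRENCY is the variable Gunicorn itself reads for its default worker count.
workers = int(os.environ.get("WEB_CONCURRENCY", "2"))

# Threaded workers overlap I/O waits (vector DB, Redis, LLM API) within each process.
worker_class = "gthread"
threads = int(os.environ.get("GUNICORN_THREADS", "4"))  # hypothetical variable name

# Workers silent for longer than this are killed and restarted; LLM calls can be
# slow, so allow headroom.
timeout = 60

# Seconds to wait for the next request on a keep-alive connection.
keepalive = 5
```

You would start it with `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a hypothetical module and Flask object name. The `gthread` worker class gives some concurrency for I/O-bound requests without moving the Flask code to an async framework.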
