# Skinbase Vision Stack — Usage Guide

This document explains how to run and use the Skinbase Vision Stack (Gateway + CLIP, BLIP, YOLO, Qdrant services).

## Overview

- Services: `gateway`, `clip`, `blip`, `yolo`, `qdrant`, `qdrant-svc` (FastAPI each, except `qdrant`, which is the official Qdrant DB).
- Gateway is the public API endpoint; the other services are internal.

## Model overview

- **CLIP**: Contrastive Language–Image Pretraining — maps images and text into a shared embedding space. Used for zero-shot image tagging, similarity search, and returning ranked tags with confidence scores.
- **BLIP**: Bootstrapping Language-Image Pre-training — a vision–language model for image captioning and multimodal generation. BLIP produces human-readable captions (multiple `variants` supported) and can be tuned with `max_length`.
- **YOLO**: You Only Look Once — a family of real-time object-detection models. YOLO returns detected objects with `class`, `confidence`, and `bbox` (bounding-box coordinates); use `conf` to filter low-confidence detections.
- **Qdrant**: High-performance vector similarity search engine. Stores CLIP image embeddings and enables reverse image search (find similar images). The `qdrant-svc` wrapper auto-embeds images via CLIP before upserting.

## Prerequisites

- Docker Desktop (with `docker compose`) or an equivalent Docker environment.
- Recommended: at least 8 GB RAM for CPU-only operation; more for model memory or GPU use.
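The similarity scores that CLIP tagging and Qdrant search report come down to cosine similarity between embedding vectors. A minimal pure-Python sketch of the metric — the toy 4-dimensional vectors below stand in for real CLIP embeddings (typically 512-dimensional):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1.0, 1.0]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for 512-dim CLIP embeddings.
query  = [0.1, 0.3, 0.5, 0.2]
stored = [0.1, 0.3, 0.5, 0.2]   # identical image -> similarity 1.0
other  = [0.9, -0.2, 0.0, 0.1]  # unrelated image -> lower similarity

print(round(cosine_similarity(query, stored), 3))  # 1.0
```

A `score_threshold` in a search request simply discards results whose cosine similarity to the query embedding falls below this value.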
## Start the stack

Run from the repository root:

```bash
docker compose up -d --build
```

Stop:

```bash
docker compose down
```

View logs:

```bash
docker compose logs -f
docker compose logs -f gateway
```

## Health

Check the gateway health endpoint:

```bash
curl https://vision.klevze.net/health
```

## Universal analyze (ALL)

Analyze an image by URL (the gateway aggregates CLIP, BLIP, and YOLO):

```bash
curl -X POST https://vision.klevze.net/analyze/all \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
```

File upload (multipart):

```bash
curl -X POST https://vision.klevze.net/analyze/all/file \
  -F "file=@/path/to/image.webp" \
  -F "limit=5"
```

Parameters:

- `limit`: optional integer to limit returned tag/caption items.

## Individual services (via gateway)

These endpoints call a specific service through the gateway.

### CLIP — tags

URL request:

```bash
curl -X POST https://vision.klevze.net/analyze/clip \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
```

File upload:

```bash
curl -X POST https://vision.klevze.net/analyze/clip/file \
  -F "file=@/path/to/image.webp" \
  -F "limit=5"
```

Returns: a JSON list of tags with confidence scores.

### BLIP — captioning

URL request:

```bash
curl -X POST https://vision.klevze.net/analyze/blip \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","variants":3}'
```

File upload:

```bash
curl -X POST https://vision.klevze.net/analyze/blip/file \
  -F "file=@/path/to/image.webp" \
  -F "variants=3" \
  -F "max_length=60"
```

Parameters:

- `variants`: number of caption variants to return.
- `max_length`: optional maximum caption length.

Returns: one or more caption strings (optionally with scores).
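When requesting several `variants`, a client often just wants the single best caption. A client-side sketch of that post-processing, assuming an illustrative response shape of `[{"caption": ..., "score": ...}]` — this shape is an assumption for the example, not the confirmed gateway schema:

```python
def best_caption(variants):
    """Pick the highest-scoring caption; fall back to the first when scores are absent."""
    if not variants:
        return None
    if all("score" in v for v in variants):
        return max(variants, key=lambda v: v["score"])["caption"]
    return variants[0]["caption"]

# Illustrative payload -- not the confirmed gateway schema.
response = [
    {"caption": "a red abstract wallpaper", "score": 0.91},
    {"caption": "a painting of red shapes", "score": 0.74},
    {"caption": "an abstract background",   "score": 0.69},
]
print(best_caption(response))  # -> a red abstract wallpaper
```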
### YOLO — object detection

URL request:

```bash
curl -X POST https://vision.klevze.net/analyze/yolo \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","conf":0.25}'
```

File upload:

```bash
curl -X POST https://vision.klevze.net/analyze/yolo/file \
  -F "file=@/path/to/image.webp" \
  -F "conf=0.25"
```

Parameters:

- `conf`: confidence threshold (0.0–1.0).

Returns: detected objects with `class`, `confidence`, and `bbox` (bounding-box coordinates).

### Qdrant — vector storage & similarity search

The Qdrant integration lets you store image embeddings and find visually similar images. Embeddings are generated automatically by the CLIP service.

#### Upsert (store) an image by URL

```bash
curl -X POST https://vision.klevze.net/vectors/upsert \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","id":"img-001","metadata":{"category":"wallpaper","source":"upload"}}'
```

Parameters:

- `url` (required): image URL to embed and store.
- `id` (optional): custom string ID for the point; auto-generated if omitted.
- `metadata` (optional): arbitrary key–value payload stored alongside the vector.
- `collection` (optional): target collection name (defaults to `images`).

#### Upsert by file upload

```bash
curl -X POST https://vision.klevze.net/vectors/upsert/file \
  -F "file=@/path/to/image.webp" \
  -F 'id=img-002' \
  -F 'metadata_json={"category":"photo"}'
```

#### Upsert a pre-computed vector

```bash
curl -X POST https://vision.klevze.net/vectors/upsert/vector \
  -H "Content-Type: application/json" \
  -d '{"vector":[0.1,0.2,...],"id":"img-003","metadata":{"custom":"data"}}'
```

#### Search similar images by URL

```bash
curl -X POST https://vision.klevze.net/vectors/search \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
```

Parameters:

- `url` (required): query image URL.
- `limit` (optional, default 5): number of results.
- `score_threshold` (optional): minimum cosine similarity (0.0–1.0).
- `filter_metadata` (optional): filter results by metadata, e.g. `{"category":"wallpaper"}`.
- `collection` (optional): collection to search.

Returns: a list of `{"id", "score", "metadata"}` objects sorted by similarity.

#### Search by file upload

```bash
curl -X POST https://vision.klevze.net/vectors/search/file \
  -F "file=@/path/to/image.webp" \
  -F "limit=5"
```

#### Search by pre-computed vector

```bash
curl -X POST https://vision.klevze.net/vectors/search/vector \
  -H "Content-Type: application/json" \
  -d '{"vector":[0.1,0.2,...],"limit":5}'
```

#### Collection management

List all collections:

```bash
curl https://vision.klevze.net/vectors/collections
```

Get collection info:

```bash
curl https://vision.klevze.net/vectors/collections/images
```

Create a custom collection:

```bash
curl -X POST https://vision.klevze.net/vectors/collections \
  -H "Content-Type: application/json" \
  -d '{"name":"my_collection","vector_dim":512,"distance":"cosine"}'
```

Delete a collection:

```bash
curl -X DELETE https://vision.klevze.net/vectors/collections/my_collection
```

#### Delete points

```bash
curl -X POST https://vision.klevze.net/vectors/delete \
  -H "Content-Type: application/json" \
  -d '{"ids":["img-001","img-002"]}'
```

#### Get a point by ID

```bash
curl https://vision.klevze.net/vectors/points/img-001
```

## Request/Response notes

- For URL-based requests, use `Content-Type: application/json`.
- For uploads, use `multipart/form-data` with a `file` field.
- The gateway aggregates and normalizes outputs for `/analyze/all`.

## Running a single service

To run only one service via Docker Compose:

```bash
docker compose up -d --build clip
```

Or run it locally (Python environment) from the service folder:

```bash
# inside clip/ or blip/ or yolo/
uvicorn main:app --host 0.0.0.0 --port 8000
```

## Production tips

- Add authentication (API keys or OAuth) at the gateway.
- Add rate-limiting and per-client quotas.
- Keep model services on an internal Docker network.
- For GPU: enable the NVIDIA runtime and update service Dockerfiles / compose profiles.

## Troubleshooting

- Service fails to start: check `docker compose logs <service>` for model-load errors.
- High memory usage / OOM: increase host memory or reduce the model footprint; consider GPUs.
- Slow startup: model weights load on service startup — expect extra time.

## Extending

- Swap or update models in each service by editing that service's `main.py`.
- Add request validation, timeouts, and retries in the gateway to improve robustness.

## Files of interest

- `docker-compose.yml` — composition and service definitions.
- `gateway/` — gateway FastAPI server.
- `clip/`, `blip/`, `yolo/` — service implementations and Dockerfiles.
- `qdrant/` — Qdrant API wrapper service (FastAPI).
- `common/` — shared helpers (e.g., image I/O).