# Skinbase Vision Stack — Usage Guide

This document explains how to run and use the Skinbase Vision Stack (Gateway + CLIP, BLIP, YOLO, Qdrant services).

## Overview

- Services: `gateway`, `clip`, `blip`, `yolo`, `qdrant`, `qdrant-svc` (each a FastAPI app, except `qdrant`, which is the official Qdrant database).
- The gateway is the public API endpoint; the other services are internal.

## Model overview

- **CLIP**: Contrastive Language–Image Pretraining — maps images and text into a shared embedding space. Used for zero-shot image tagging, similarity search, and returning ranked tags with confidence scores.
- **BLIP**: Bootstrapping Language-Image Pre-training — a vision–language model for image captioning and multimodal generation. BLIP produces human-readable captions (multiple `variants` supported) and can be tuned with `max_length`.
- **YOLO**: You Only Look Once — a family of real-time object-detection models. YOLO returns detected objects with `class`, `confidence`, and `bbox` (bounding-box coordinates); use `conf` to filter low-confidence detections.
- **Qdrant**: a high-performance vector similarity search engine. It stores CLIP image embeddings and enables reverse image search (finding visually similar images). The `qdrant-svc` wrapper auto-embeds images via CLIP before upserting.

## Prerequisites

- Docker Desktop (with `docker compose`) or an equivalent Docker environment.
- Recommended: at least 8 GB RAM for CPU-only operation; more for larger models or GPU use.

## Start the stack

Before starting the stack, create a `.env` file for runtime secrets and environment overrides. Minimal example:

```bash
API_KEY=your_api_key_here
HUGGINGFACE_TOKEN=your_huggingface_token_here
```

Notes:

- `API_KEY` protects the gateway endpoints.
- `HUGGINGFACE_TOKEN` is required only if the configured BLIP model needs Hugging Face authentication.
- Startup uses container healthchecks, so the initial boot can take longer while models download and warm up.
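To make the "shared embedding space" idea concrete, here is a minimal sketch of the cosine similarity that CLIP-style retrieval (and Qdrant's `cosine` distance) is built on. This is illustrative only, not the services' actual code:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors.

    This is the score Qdrant's 'cosine' distance is based on:
    1.0 means identical direction, 0.0 means orthogonal.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical vectors score 1.0; orthogonal vectors score 0.0.
cosine_similarity([1.0, 0.0], [1.0, 0.0])  # → 1.0
```

In practice CLIP embeds both an image and a text prompt into this space, and tagging amounts to ranking candidate tag embeddings by this score against the image embedding.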
Run from the repository root:

```bash
docker compose up -d --build
```

Stop:

```bash
docker compose down
```

View logs:

```bash
docker compose logs -f
docker compose logs -f gateway
```

## Health

Check the gateway health endpoint:

```bash
curl https://vision.klevze.net/health
```

## Universal analyze (ALL)

Analyze an image by URL (the gateway aggregates CLIP, BLIP, and YOLO):

```bash
curl -X POST https://vision.klevze.net/analyze/all \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
```

File upload (multipart):

```bash
curl -X POST https://vision.klevze.net/analyze/all/file \
  -H "X-API-Key: $API_KEY" \
  -F "file=@/path/to/image.webp" \
  -F "limit=5"
```

Parameters:

- `limit`: optional integer limiting the number of returned tag/caption items.

## Individual services (via gateway)

These endpoints call a specific service through the gateway.

### CLIP — tags

URL request:

```bash
curl -X POST https://vision.klevze.net/analyze/clip \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
```

File upload:

```bash
curl -X POST https://vision.klevze.net/analyze/clip/file \
  -H "X-API-Key: $API_KEY" \
  -F "file=@/path/to/image.webp" \
  -F "limit=5"
```

Returns: a JSON list of tags with confidence scores.

### BLIP — captioning

URL request:

```bash
curl -X POST https://vision.klevze.net/analyze/blip \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","variants":3}'
```

File upload:

```bash
curl -X POST https://vision.klevze.net/analyze/blip/file \
  -H "X-API-Key: $API_KEY" \
  -F "file=@/path/to/image.webp" \
  -F "variants=3" \
  -F "max_length=60"
```

Parameters:

- `variants`: number of caption variants to return.
- `max_length`: optional maximum caption length.

Returns: one or more caption strings (optionally with scores).
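The URL-based requests above all share the same shape: an `X-API-Key` header, a JSON body with a `url`, and optional tuning parameters. A small helper can build them programmatically; this is a hedged sketch (the `build_analyze_request` helper is not part of the project), reading the key from the environment to match the `.env` example:

```python
import json
import os

def build_analyze_request(image_url, limit=5):
    """Build the headers and JSON body for a gateway analyze request,
    mirroring the curl examples in this guide. Pass the result to any
    HTTP client (e.g. requests.post(url, headers=h, data=b))."""
    headers = {
        # API key from the environment, matching the .env example above.
        "X-API-Key": os.environ.get("API_KEY", ""),
        "Content-Type": "application/json",
    }
    body = {"url": image_url, "limit": limit}
    return headers, json.dumps(body)

headers, body = build_analyze_request(
    "https://files.skinbase.org/img/aa/bb/cc/md.webp", limit=3
)
```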
### YOLO — object detection

URL request:

```bash
curl -X POST https://vision.klevze.net/analyze/yolo \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","conf":0.25}'
```

File upload:

```bash
curl -X POST https://vision.klevze.net/analyze/yolo/file \
  -H "X-API-Key: $API_KEY" \
  -F "file=@/path/to/image.webp" \
  -F "conf=0.25"
```

Parameters:

- `conf`: confidence threshold (0.0–1.0).

Returns: detected objects with `class`, `confidence`, and `bbox` (bounding-box coordinates).

### Qdrant — vector storage & similarity search

The Qdrant integration lets you store image embeddings and find visually similar images. Embeddings are generated automatically by the CLIP service.

Qdrant point IDs must be either an unsigned integer or a UUID string. If you send any other string value, the wrapper may replace it with a generated UUID and store the original value in metadata as `_original_id`.

#### Upsert (store) an image by URL

```bash
curl -X POST https://vision.klevze.net/vectors/upsert \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","id":"550e8400-e29b-41d4-a716-446655440000","metadata":{"category":"wallpaper","source":"upload"}}'
```

Parameters:

- `url` (required): image URL to embed and store.
- `id` (optional): point ID. Use an unsigned integer or a UUID string. If omitted, a UUID is auto-generated.
- `metadata` (optional): arbitrary key-value payload stored alongside the vector.
- `collection` (optional): target collection name (defaults to `images`).
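The point-ID rule described above can be sketched as follows. This is an illustrative reimplementation of the documented behavior, not the actual `qdrant-svc` code, and the helper name is an assumption:

```python
import uuid

def normalize_point_id(point_id, metadata=None):
    """Mirror the documented point-ID rule: Qdrant accepts an unsigned
    integer or a UUID string; any other string is replaced by a
    generated UUID, with the original kept as metadata['_original_id']."""
    metadata = dict(metadata or {})
    # Unsigned integers pass through unchanged.
    if isinstance(point_id, int) and point_id >= 0:
        return point_id, metadata
    # Valid UUID strings pass through unchanged.
    try:
        return str(uuid.UUID(str(point_id))), metadata
    except ValueError:
        # Anything else gets a generated UUID; the original value is
        # preserved so it can be looked up via /vectors/points/by-original-id.
        metadata["_original_id"] = point_id
        return str(uuid.uuid4()), metadata

normalize_point_id("img-001")  # generated UUID, {'_original_id': 'img-001'}
```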
#### Upsert by file upload

```bash
curl -X POST https://vision.klevze.net/vectors/upsert/file \
  -H "X-API-Key: $API_KEY" \
  -F "file=@/path/to/image.webp" \
  -F 'id=550e8400-e29b-41d4-a716-446655440001' \
  -F 'metadata_json={"category":"photo"}'
```

#### Upsert a pre-computed vector

```bash
curl -X POST https://vision.klevze.net/vectors/upsert/vector \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"vector":[0.1,0.2,...],"id":"550e8400-e29b-41d4-a716-446655440002","metadata":{"custom":"data"}}'
```

#### Search similar images by URL

```bash
curl -X POST https://vision.klevze.net/vectors/search \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
```

Parameters:

- `url` (required): query image URL.
- `limit` (optional, default 5): number of results.
- `score_threshold` (optional): minimum cosine similarity (0.0–1.0).
- `filter_metadata` (optional): filter results by metadata, e.g. `{"category":"wallpaper"}`.
- `collection` (optional): collection to search.

Returns: a list of `{"id", "score", "metadata"}` objects sorted by similarity.
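To show what `score_threshold` and `filter_metadata` do to the result list, here is a client-side sketch of the same filtering over `{"id", "score", "metadata"}` results. The service applies these filters server-side; this helper is illustrative only:

```python
def filter_results(results, score_threshold=None, filter_metadata=None):
    """Sketch of search-result filtering: drop results below the score
    threshold, drop results whose metadata does not match every key in
    filter_metadata, and return the rest sorted by score descending."""
    out = []
    for r in results:
        if score_threshold is not None and r["score"] < score_threshold:
            continue
        if filter_metadata and any(
            r.get("metadata", {}).get(k) != v for k, v in filter_metadata.items()
        ):
            continue
        out.append(r)
    return sorted(out, key=lambda r: r["score"], reverse=True)
```

For example, `score_threshold=0.5` with `filter_metadata={"category": "wallpaper"}` keeps only wallpaper results scoring at least 0.5.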
#### Search by file upload

```bash
curl -X POST https://vision.klevze.net/vectors/search/file \
  -H "X-API-Key: $API_KEY" \
  -F "file=@/path/to/image.webp" \
  -F "limit=5"
```

#### Search by pre-computed vector

```bash
curl -X POST https://vision.klevze.net/vectors/search/vector \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"vector":[0.1,0.2,...],"limit":5}'
```

#### Collection management

List all collections:

```bash
curl -H "X-API-Key: $API_KEY" https://vision.klevze.net/vectors/collections
```

Get collection info:

```bash
curl -H "X-API-Key: $API_KEY" https://vision.klevze.net/vectors/collections/images
```

Create a custom collection:

```bash
curl -X POST https://vision.klevze.net/vectors/collections \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name":"my_collection","vector_dim":512,"distance":"cosine"}'
```

Delete a collection:

```bash
curl -H "X-API-Key: $API_KEY" -X DELETE https://vision.klevze.net/vectors/collections/my_collection
```

#### Delete points

```bash
curl -X POST https://vision.klevze.net/vectors/delete \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"ids":["550e8400-e29b-41d4-a716-446655440000","550e8400-e29b-41d4-a716-446655440001"]}'
```

#### Get a point by ID

```bash
curl -H "X-API-Key: $API_KEY" https://vision.klevze.net/vectors/points/550e8400-e29b-41d4-a716-446655440000
```

#### Get a point by original application ID

If the wrapper had to replace your string `id` with a generated UUID, the original value is preserved in metadata as `_original_id`.

```bash
curl -H "X-API-Key: $API_KEY" https://vision.klevze.net/vectors/points/by-original-id/img-001
```

## Request/Response notes

- For URL requests, use `Content-Type: application/json`.
- For uploads, use `multipart/form-data` with a `file` field.
- Most gateway endpoints require the `X-API-Key` header.
- Remote image URLs must resolve to public hosts and return an image content type.
- The gateway aggregates and normalizes outputs for `/analyze/all`.
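When upserting or searching with a pre-computed vector, the vector length must match the collection's dimension (`vector_dim`, 512 in the create-collection example above). A small client-side builder can catch a mismatch before the request is sent; the `build_vector_upsert` helper below is a hedged sketch, not part of the API:

```python
def build_vector_upsert(vector, point_id=None, metadata=None, expected_dim=512):
    """Build the JSON body for a pre-computed-vector upsert, validating
    the vector length against the collection's vector_dim first (512 is
    assumed here, matching the create-collection example)."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"expected a {expected_dim}-dim vector, got {len(vector)}"
        )
    body = {"vector": list(vector)}
    if point_id is not None:
        body["id"] = point_id
    if metadata:
        body["metadata"] = metadata
    return body
```

Sending a wrong-length vector would otherwise be rejected by Qdrant at upsert time, so failing fast client-side gives a clearer error.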
## Running a single service

To run only one service via docker compose:

```bash
docker compose up -d --build clip
```

Or run it locally (in a Python environment) from the service folder:

```bash
# inside clip/ or blip/ or yolo/
uvicorn main:app --host 0.0.0.0 --port 8000
```

## Production tips

- Add authentication (API keys or OAuth) at the gateway.
- Add rate limiting and per-client quotas.
- Keep the model services on an internal Docker network.
- For GPU use: enable the NVIDIA runtime and update the service Dockerfiles / compose profiles.

## Troubleshooting

- Service fails to start: check `docker compose logs <service>` for model load errors.
- BLIP startup error about Hugging Face auth: set `HUGGINGFACE_TOKEN` in `.env` and rebuild `blip`.
- Qdrant upsert error about an invalid point ID: use a UUID or unsigned integer for `id`, or omit it and use the returned generated `id`.
- Image URL rejected before download: the URL may point to localhost, a private IP, a non-`http/https` scheme, or a non-image content type.
- High memory usage / OOM: increase host memory or reduce the model footprint; consider GPUs.
- Slow startup: model weights load on service startup — expect extra time.

## Extending

- Swap or update models in each service by editing that service's `main.py`.
- Add request validation, timeouts, and retries in the gateway to improve robustness.

## Files of interest

- `docker-compose.yml` — composition and service definitions.
- `gateway/` — gateway FastAPI server.
- `clip/`, `blip/`, `yolo/` — service implementations and Dockerfiles.
- `qdrant/` — Qdrant API wrapper service (FastAPI).
- `common/` — shared helpers (e.g., image I/O).
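The retry suggestion above can be sketched as a simple wrapper with exponential backoff, useful on the client side too while the stack is still warming up. This is an illustrative helper, not existing project code:

```python
import time

def with_retries(fn, attempts=3, backoff=0.5):
    """Call fn(), retrying up to `attempts` times with exponential
    backoff (0.5s, 1s, 2s, ...). Re-raises the last error if all
    attempts fail. In practice, catch only transport/5xx errors
    rather than every Exception."""
    last_exc = None
    for i in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
            if i < attempts - 1:
                time.sleep(backoff * (2 ** i))
    raise last_exc

# Example: wrap any gateway call, e.g.
# result = with_retries(lambda: client.post("/analyze/all", ...))
```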