# Skinbase Vision Stack — Usage Guide

This document explains how to run and use the Skinbase Vision Stack (Gateway + CLIP, BLIP, YOLO, Qdrant, Card Renderer, Maturity, and optional LLM services).

## Overview

- Services: `gateway`, `clip`, `blip`, `yolo`, `qdrant`, `qdrant-svc`, `card-renderer`, `maturity`, `llm` (FastAPI each except `qdrant`; `llm` is a thin FastAPI shim that manages an internal `llama-server` process).
- Gateway is the public API endpoint; the other services are internal.

## Model overview

- **CLIP**: Contrastive Language–Image Pretraining — maps images and text into a shared embedding space. Used for zero-shot image tagging, similarity search, and returning ranked tags with confidence scores.
- **BLIP**: Bootstrapping Language-Image Pre-training — a vision–language model for image captioning and multimodal generation. BLIP produces human-readable captions (multiple `variants` supported) and can be tuned with `max_length`.
- **YOLO**: You Only Look Once — a family of real-time object-detection models. YOLO returns detected objects with `class`, `confidence`, and `bbox` (bounding box coordinates); use `conf` to filter low-confidence detections.
- **Qdrant**: High-performance vector similarity search engine. Stores CLIP image embeddings and enables reverse image search (find similar images). The `qdrant-svc` wrapper auto-embeds images via CLIP before upserting.
- **Card Renderer**: Generates branded social-card images (e.g. Open Graph previews) from artwork images. Applies smart center-weighted cropping, gradient overlays, title/username/tag text, and an optional logo. Returns binary image bytes (WebP by default). Template: `nova-artwork-v1`.
- **Maturity**: Dedicated NSFW/maturity classifier. Accepts an image and returns a normalized safety signal including `maturity_label` (`safe`/`mature`), `confidence`, raw `score`, optional sublabels (e.g. `nsfw`), and an `action_hint` (`safe`, `review`, `flag_high`) designed for Nova moderation workflows.
Powered by `Falconsai/nsfw_image_detection` (ViT-based, HuggingFace). Thresholds are configurable via environment variables.
- **LLM**: Internal text-generation service backed by `llama.cpp` and a GGUF Qwen3 model. Exposed through the gateway for non-streaming chat completions and model discovery. Intended for Nova workflows such as creator bios, metadata suggestions, moderation helper text, and other short internal generation tasks.

## Prerequisites

- Docker Desktop (with `docker compose`) or a Docker environment.
- Recommended: at least 8GB RAM for CPU-only; more for model memory or GPU use.

## Start the stack

Before starting the stack, create a `.env` file for runtime secrets and environment overrides. Minimum example:

```bash
API_KEY=your_api_key_here
HUGGINGFACE_TOKEN=your_huggingface_token_here
```

Notes:

- `API_KEY` protects gateway endpoints.
- `HUGGINGFACE_TOKEN` is required if the configured BLIP model requires Hugging Face authentication.
- Startup uses container healthchecks, so initial boot can take longer while models download and warm up.

Optional maturity configuration (can be added to `.env` to override defaults):

```bash
MATURITY_MODEL=Falconsai/nsfw_image_detection
MATURITY_THRESHOLD_MATURE=0.80
MATURITY_THRESHOLD_REVIEW=0.60
MATURITY_ENABLED=true
```

- `MATURITY_THRESHOLD_MATURE`: score above this → `mature` + `flag_high` (default `0.80`).
- `MATURITY_THRESHOLD_REVIEW`: score above this but below mature threshold → `mature` + `review` (default `0.60`).
- `MATURITY_ENABLED`: set to `false` to disable maturity endpoints at the gateway without removing the service.
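The maturity variables above could be read and sanity-checked roughly as follows. This is a minimal sketch, not the maturity service's actual code: `MaturityConfig` and `load_maturity_config` are hypothetical names, treating any value other than `false` as enabled is an assumption, and so is the check that the review threshold must not exceed the mature threshold.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MaturityConfig:
    """Hypothetical container for the MATURITY_* settings."""
    model: str
    threshold_mature: float
    threshold_review: float
    enabled: bool


def load_maturity_config(env: Optional[Mapping[str, str]] = None) -> MaturityConfig:
    """Read MATURITY_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    cfg = MaturityConfig(
        model=env.get("MATURITY_MODEL", "Falconsai/nsfw_image_detection"),
        threshold_mature=float(env.get("MATURITY_THRESHOLD_MATURE", "0.80")),
        threshold_review=float(env.get("MATURITY_THRESHOLD_REVIEW", "0.60")),
        # Assumption: only the literal value "false" disables the endpoints.
        enabled=env.get("MATURITY_ENABLED", "true").strip().lower() != "false",
    )
    # Assumption: a review threshold above the mature threshold is a config error.
    if not 0.0 <= cfg.threshold_review <= cfg.threshold_mature <= 1.0:
        raise ValueError("expected 0 <= review threshold <= mature threshold <= 1")
    return cfg
```

Passing a plain dictionary instead of `os.environ` makes it easy to try a threshold pair before committing it to `.env`.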
Optional LLM configuration:

```bash
LLM_URL=http://llm:8080
LLM_ENABLED=false
LLM_TIMEOUT=120
LLM_DEFAULT_MODEL=qwen3-1.7b-instruct-q4_k_m
LLM_MAX_TOKENS_DEFAULT=256
LLM_MAX_TOKENS_HARD_LIMIT=1024
LLM_MAX_REQUEST_BYTES=65536

# Local llm profile only
MODEL_PATH=/models/Qwen3-1.7B-Instruct-Q4_K_M.gguf
LLM_CONTEXT_SIZE=4096
LLM_THREADS=4
LLM_GPU_LAYERS=0
LLM_EXTRA_ARGS=
```

Run from repository root:

```bash
docker compose up -d --build
```

That starts the default vision stack only. To also start the local LLM service:

```bash
docker compose --profile llm up -d --build
```

Before enabling the `llm` profile, provision the GGUF model described in [models/qwen3/README.md](models/qwen3/README.md) and set `LLM_ENABLED=true` in `.env`.

For small production hosts, the preferred setup is usually to keep the gateway local and point `LLM_URL` at a separate private LLM host:

```bash
LLM_ENABLED=true
LLM_URL=http://private-llm-host:8080
```

Stop:

```bash
docker compose down
```

View logs:

```bash
docker compose logs -f
docker compose logs -f gateway
```

## Health

Check the gateway health endpoint:

```bash
curl https://vision.klevze.net/health
```

Check LLM-specific gateway health:

```bash
curl -H "X-API-Key: " https://vision.klevze.net/ai/health
```

## LLM smoke test checklist

Use this sequence on a machine with Docker available after you have mounted the GGUF model and enabled the gateway with `LLM_ENABLED=true`.

1. Start the gateway with the `llm` profile.

```bash
docker compose --profile llm up -d --build gateway llm
```

2. Confirm the LLM service came up cleanly.

```bash
docker compose ps llm
docker compose logs --tail=100 llm
```

3. Check the repo-owned internal health endpoint.

```bash
curl http://127.0.0.1:8080/health
```

Expected fields: `status`, `model`, `context_size`, `threads`.

4. Confirm the gateway sees the LLM backend.

```bash
curl -H "X-API-Key: " http://127.0.0.1:8003/health
curl -H "X-API-Key: " http://127.0.0.1:8003/ai/health
```

5. Verify model discovery.
```bash
curl -H "X-API-Key: " http://127.0.0.1:8003/v1/models
curl -H "X-API-Key: " http://127.0.0.1:8003/ai/models
```

6. Run a small chat request through the gateway.

```bash
curl -X POST http://127.0.0.1:8003/v1/chat/completions \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a concise assistant for Skinbase Nova."},
      {"role": "user", "content": "Write one short admin help sentence about reviewing wallpaper metadata."}
    ],
    "max_tokens": 60,
    "stream": false
  }'
```

7. If startup or health fails, inspect the relevant logs.

```bash
docker compose logs --tail=200 llm
docker compose logs --tail=200 gateway
```

## Universal analyze (ALL)

Analyze an image by URL (gateway aggregates CLIP, BLIP, YOLO):

```bash
curl -X POST https://vision.klevze.net/analyze/all \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
```

File upload (multipart):

```bash
curl -X POST https://vision.klevze.net/analyze/all/file \
  -H "X-API-Key: " \
  -F "file=@/path/to/image.webp" \
  -F "limit=5"
```

Parameters:

- `limit`: optional integer to limit returned tag/caption items.

## Individual services (via gateway)

These endpoints call the specific service through the gateway.

### CLIP — tags

URL request:

```bash
curl -X POST https://vision.klevze.net/analyze/clip \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
```

File upload:

```bash
curl -X POST https://vision.klevze.net/analyze/clip/file \
  -H "X-API-Key: " \
  -F "file=@/path/to/image.webp" \
  -F "limit=5"
```

Return: JSON list of tags with confidence scores.
### BLIP — captioning

URL request:

```bash
curl -X POST https://vision.klevze.net/analyze/blip \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","variants":3}'
```

File upload:

```bash
curl -X POST https://vision.klevze.net/analyze/blip/file \
  -H "X-API-Key: " \
  -F "file=@/path/to/image.webp" \
  -F "variants=3" \
  -F "max_length=60"
```

Parameters:

- `variants`: number of caption variants to return.
- `max_length`: optional maximum caption length.

Return: one or more caption strings (optionally with scores).

### YOLO — object detection

URL request:

```bash
curl -X POST https://vision.klevze.net/analyze/yolo \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","conf":0.25}'
```

File upload:

```bash
curl -X POST https://vision.klevze.net/analyze/yolo/file \
  -H "X-API-Key: " \
  -F "file=@/path/to/image.webp" \
  -F "conf=0.25"
```

Parameters:

- `conf`: confidence threshold (0.0–1.0).

Return: detected objects with `class`, `confidence`, and `bbox` (bounding box coordinates).

### Maturity — NSFW / maturity analysis

Analyzes an image for mature or NSFW content and returns a structured signal intended for Nova moderation workflows.
URL request:

```bash
curl -X POST https://vision.klevze.net/analyze/maturity \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp"}'
```

File upload:

```bash
curl -X POST https://vision.klevze.net/analyze/maturity/file \
  -H "X-API-Key: " \
  -F "file=@/path/to/image.webp"
```

Example response:

```json
{
  "maturity_label": "mature",
  "confidence": 0.94,
  "score": 0.94,
  "labels": ["nsfw"],
  "model": "Falconsai/nsfw_image_detection",
  "threshold_used": 0.80,
  "analysis_time_ms": 183.0,
  "source": "maturity-service",
  "action_hint": "flag_high",
  "advisory": "High-confidence mature content detected"
}
```

Response fields:

| Field | Type | Description |
|---|---|---|
| `maturity_label` | string | `safe` or `mature` |
| `confidence` | float | Confidence in the label decision (0–1). For `safe`, this is `1 - score`. |
| `score` | float | Raw NSFW probability from the model (0–1). |
| `labels` | array | Sublabels when mature: currently `["nsfw"]`. Empty for safe results. |
| `model` | string | Model identifier / HuggingFace model ID. |
| `threshold_used` | float | The threshold value that determined the label. |
| `analysis_time_ms` | float | Inference time in milliseconds. |
| `source` | string | Always `maturity-service`. |
| `action_hint` | string | `safe`, `review`, or `flag_high`. Use this in Nova to drive blur/queue/flag decisions. |
| `advisory` | string | Short human-readable explanation. |

`action_hint` decision logic:

- `flag_high`: score ≥ `MATURITY_THRESHOLD_MATURE` (default 0.80) — high-confidence mature, flag for moderation.
- `review`: score ≥ `MATURITY_THRESHOLD_REVIEW` (default 0.60) but below mature threshold — possible mature, queue for human review.
- `safe`: score below both thresholds — content appears safe.

If the maturity service is unavailable the gateway returns a `502` or `503` error. **Nova must not treat a gateway failure as a `safe` result** — retry or queue for later processing.
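The decision logic and the failure rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the service's implementation: `classify_score` and `moderation_decision` are hypothetical names, the `>=` boundaries follow the decision list above, and setting `confidence` equal to `score` for mature results is inferred from the example response.

```python
from typing import Any, Dict, Optional


def classify_score(score: float,
                   mature_threshold: float = 0.80,
                   review_threshold: float = 0.60) -> Dict[str, Any]:
    """Map a raw NSFW probability to label, confidence, and action hint."""
    if score >= mature_threshold:
        label, hint = "mature", "flag_high"
    elif score >= review_threshold:
        label, hint = "mature", "review"
    else:
        label, hint = "safe", "safe"
    # Documented: for `safe`, confidence is 1 - score.
    # Assumption (from the example response): for `mature`, confidence is the score.
    confidence = score if label == "mature" else 1.0 - score
    return {
        "maturity_label": label,
        "confidence": confidence,
        "score": score,
        "labels": ["nsfw"] if label == "mature" else [],
        "action_hint": hint,
    }


def moderation_decision(status_code: int, body: Optional[Dict[str, Any]]) -> str:
    """Client-side rule: any gateway failure means 'retry', never 'safe'."""
    if status_code != 200 or not body or "action_hint" not in body:
        return "retry"
    return body["action_hint"]
```

A caller in Nova would blur or flag on `flag_high`, queue on `review`, and re-enqueue the image on `retry`.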
## LLM / Chat endpoints

The gateway validates requests, clamps `max_tokens` to configured limits, rejects oversized payloads, and normalizes downstream failures into JSON under an `error` key.

### OpenAI-style chat completions

```bash
curl -X POST https://vision.klevze.net/v1/chat/completions \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a concise assistant for Skinbase Nova."},
      {"role": "user", "content": "Write a short biography for a creator known for sci-fi environments."}
    ],
    "temperature": 0.7,
    "max_tokens": 220,
    "stream": false
  }'
```

Supported request fields:

- `messages` (required)
- `temperature`
- `max_tokens`
- `stream` (`false` only in v1)
- `top_p`
- `stop`
- `presence_penalty`
- `frequency_penalty`

Validation rules:

- At least one message is required.
- Roles must be `system`, `user`, or `assistant`.
- Empty message content is rejected.
- Oversized request bodies return `413`.
- `max_tokens` is clamped to `LLM_MAX_TOKENS_HARD_LIMIT`.
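The validation rules above might look roughly like this in code. It is a sketch, not the gateway's implementation: `ChatValidationError`, `body_too_large`, and `normalize_chat_request` are hypothetical names, and three behaviours are assumptions: whitespace-only content counts as empty, a missing `max_tokens` falls back to `LLM_MAX_TOKENS_DEFAULT`, and values below 1 are raised to 1.

```python
from typing import Any, Dict

ALLOWED_ROLES = {"system", "user", "assistant"}


class ChatValidationError(ValueError):
    """Hypothetical error type a gateway could map to HTTP 422."""


def body_too_large(raw_body: bytes, max_bytes: int = 65536) -> bool:
    """True when the body exceeds LLM_MAX_REQUEST_BYTES (would map to 413)."""
    return len(raw_body) > max_bytes


def normalize_chat_request(payload: Dict[str, Any],
                           default_max_tokens: int = 256,
                           hard_limit: int = 1024) -> Dict[str, Any]:
    """Validate a chat payload and clamp max_tokens to the hard limit."""
    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ChatValidationError("at least one message is required")
    for msg in messages:
        if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
            raise ChatValidationError("role must be system, user, or assistant")
        content = msg.get("content")
        # Assumption: whitespace-only content is treated as empty.
        if not isinstance(content, str) or not content.strip():
            raise ChatValidationError("message content must not be empty")
    if payload.get("stream", False):
        raise ChatValidationError("streaming is not supported")
    requested = payload.get("max_tokens")
    if requested is None:
        requested = default_max_tokens  # assumption: default applies when omitted
    clamped = max(1, min(int(requested), hard_limit))
    return {**payload, "max_tokens": clamped, "stream": False}
```

Clamping rather than rejecting an oversized `max_tokens` matches the rule list above, so callers that ask for too much still get a bounded completion.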
### Project-friendly chat response

```bash
curl -X POST https://vision.klevze.net/ai/chat \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful metadata assistant."},
      {"role": "user", "content": "Suggest five tags for a fantasy castle wallpaper."}
    ]
  }'
```

Example response:

```json
{
  "model": "qwen3-1.7b-instruct-q4_k_m",
  "content": "fantasy castle, moonlit fortress, medieval towers, epic landscape, digital painting",
  "finish_reason": "stop",
  "usage": {
    "prompt_tokens": 48,
    "completion_tokens": 19,
    "total_tokens": 67
  }
}
```

### Model discovery

```bash
curl -H "X-API-Key: " https://vision.klevze.net/v1/models
curl -H "X-API-Key: " https://vision.klevze.net/ai/models
```

### Failure modes

- `401`: missing or invalid API key
- `413`: request body exceeds `LLM_MAX_REQUEST_BYTES`
- `422`: validation failure or unsupported streaming request
- `503`: LLM disabled or upstream unavailable
- `504`: upstream timeout

## Vector DB (Qdrant)

Use the Qdrant gateway endpoints to store image embeddings and find visually similar images. Embeddings are generated automatically by the CLIP service.

Qdrant point IDs must be either an unsigned integer or a UUID string. If you send another string value, the wrapper may replace it with a generated UUID and store the original value in metadata as `_original_id`.

#### Upsert (store) an image by URL

```bash
curl -X POST https://vision.klevze.net/vectors/upsert \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","id":"550e8400-e29b-41d4-a716-446655440000","metadata":{"category":"wallpaper","source":"upload"}}'
```

Parameters:

- `url` (required): image URL to embed and store.
- `id` (optional): point ID. Use an unsigned integer or UUID string. If omitted, a UUID is auto-generated.
- `metadata` (optional): arbitrary key-value payload stored alongside the vector.
- `collection` (optional): target collection name (defaults to `images`).

#### Upsert by file upload

```bash
curl -X POST https://vision.klevze.net/vectors/upsert/file \
  -H "X-API-Key: " \
  -F "file=@/path/to/image.webp" \
  -F 'id=550e8400-e29b-41d4-a716-446655440001' \
  -F 'metadata_json={"category":"photo"}'
```

#### Upsert a pre-computed vector

```bash
curl -X POST https://vision.klevze.net/vectors/upsert/vector \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{"vector":[0.1,0.2,...],"id":"550e8400-e29b-41d4-a716-446655440002","metadata":{"custom":"data"}}'
```

#### Search similar images by URL

```bash
curl -X POST https://vision.klevze.net/vectors/search \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","limit":5}'
```

Parameters:

- `url` (required): query image URL.
- `limit` (optional, default 5): number of results.
- `score_threshold` (optional): minimum cosine similarity (0.0–1.0).
- `filter_metadata` (optional): filter results by payload fields, e.g. `{"is_public":true,"category_id":3}`.
- `collection` (optional): collection to search.
- `hnsw_ef` (optional, int): override the HNSW ef parameter at query time. Higher = better recall, slightly more latency.
- `exact` (optional, bool, default false): brute-force exact search. Avoid on large collections.
- `indexed_only` (optional, bool, default false): restrict search to fully indexed segments only. Useful during bulk ingest.

Return: list of `{"id", "score", "metadata"}` sorted by similarity.

#### Search by file upload

```bash
curl -X POST https://vision.klevze.net/vectors/search/file \
  -H "X-API-Key: " \
  -F "file=@/path/to/image.webp" \
  -F "limit=5" \
  -F 'filter_metadata_json={"is_public":true}'
```

All URL search parameters are available as form fields; use `filter_metadata_json` (JSON string) for filters.
#### Search by pre-computed vector

```bash
curl -X POST https://vision.klevze.net/vectors/search/vector \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{"vector":[0.1,0.2,...],"limit":5,"hnsw_ef":128}'
```

#### Collection management

List all collections:

```bash
curl -H "X-API-Key: " https://vision.klevze.net/vectors/collections
```

Get collection info:

```bash
curl -H "X-API-Key: " https://vision.klevze.net/vectors/collections/images
```

Create a custom collection:

```bash
curl -X POST https://vision.klevze.net/vectors/collections \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{"name":"my_collection","vector_dim":512,"distance":"cosine"}'
```

Delete a collection:

```bash
curl -H "X-API-Key: " -X DELETE https://vision.klevze.net/vectors/collections/my_collection
```

#### Full diagnostic inspect

Returns HNSW config, optimizer config, quantization, segment count, payload index coverage percentages, and RAM footprint estimate for every collection.

```bash
curl -H "X-API-Key: " https://vision.klevze.net/vectors/inspect
```

#### Payload index management

Payload indexes are critical for fast filtered vector search. Always create indexes for fields used in `filter_metadata` filters.
```bash
# List existing indexes
curl -H "X-API-Key: " https://vision.klevze.net/vectors/collections/images/indexes

# Create a single index
curl -X POST https://vision.klevze.net/vectors/collections/images/indexes \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{"field":"is_public","type":"bool"}'

# Ensure multiple indexes exist (idempotent — safe to run multiple times)
curl -X POST https://vision.klevze.net/vectors/collections/images/ensure-indexes \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{"fields":[{"field":"is_public","type":"bool"},{"field":"is_deleted","type":"bool"},{"field":"category_id","type":"integer"},{"field":"user_id","type":"keyword"}]}'
```

Supported index types: `keyword`, `integer`, `float`, `bool`, `geo`, `datetime`, `text`, `uuid`.

#### Collection configuration (HNSW / optimizer / quantization)

Updates HNSW, optimizer, or scalar quantization settings on an existing collection without data loss. HNSW graph and segment changes apply to newly created segments.

```bash
curl -X POST https://vision.klevze.net/vectors/collections/images/configure \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{
    "hnsw_m": 16,
    "hnsw_ef_construct": 200,
    "hnsw_on_disk": false,
    "indexing_threshold": 20000,
    "default_segment_number": 4,
    "quantization_type": "int8",
    "quantization_quantile": 0.99,
    "quantization_always_ram": true
  }'
```

Parameters:

- `hnsw_m` (int, 4–64): edges per node in the HNSW graph.
- `hnsw_ef_construct` (int, 10–1000): ef during index construction.
- `hnsw_on_disk` (bool): store HNSW graph on disk (saves RAM, slightly slower queries).
- `indexing_threshold` (int): minimum vector changes before a segment is indexed.
- `default_segment_number` (int, 1–32): target segment count for parallelism.
- `quantization_type` (string, `"int8"` or null): enable scalar quantization (~4× RAM reduction).
- `quantization_quantile` (float, 0.5–1.0, default 0.99): calibration quantile.
- `quantization_always_ram` (bool, default true): keep quantized vectors in RAM.

#### Delete points

```bash
curl -X POST https://vision.klevze.net/vectors/delete \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{"ids":["550e8400-e29b-41d4-a716-446655440000","550e8400-e29b-41d4-a716-446655440001"]}'
```

#### Get a point by ID

```bash
curl -H "X-API-Key: " https://vision.klevze.net/vectors/points/550e8400-e29b-41d4-a716-446655440000
```

#### Get a point by original application ID

If the wrapper had to replace your string `id` with a generated UUID, the original value is preserved in metadata as `_original_id`.

```bash
curl -H "X-API-Key: " https://vision.klevze.net/vectors/points/by-original-id/img-001
```

## Card Renderer

The card renderer generates branded social-card images from artwork photos. It applies smart center-weighted cropping, a gradient overlay, title/subtitle/username/category text, optional tags, and an optional logo. Default output: 1200×630 WebP (`nova-artwork-v1` template).

### List available templates

```bash
curl -H "X-API-Key: " https://vision.klevze.net/cards/templates
```

### Render a card from a URL

```bash
curl -X POST https://vision.klevze.net/cards/render \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://files.skinbase.org/img/aa/bb/cc/md.webp",
    "title": "Artwork Title",
    "subtitle": "Optional subtitle",
    "username": "@artist",
    "category": "Digital Art",
    "tags": ["surreal", "landscape"],
    "template": "nova-artwork-v1",
    "width": 1200,
    "height": 630,
    "output": "webp",
    "quality": 90,
    "show_logo": true
  }'
```

Returns binary image bytes with `Content-Type: image/webp`.

### Render a card from a file upload

```bash
curl -X POST https://vision.klevze.net/cards/render/file \
  -H "X-API-Key: " \
  -F "file=@/path/to/image.webp" \
  -F "title=Artwork Title" \
  -F "username=@artist" \
  -F "template=nova-artwork-v1" \
  -F "show_logo=true"
```

Returns binary image bytes.
### Get card layout metadata (no image rendered)

```bash
curl -X POST https://vision.klevze.net/cards/render/meta \
  -H "X-API-Key: " \
  -H "Content-Type: application/json" \
  -d '{"url":"https://files.skinbase.org/img/aa/bb/cc/md.webp","title":"Artwork Title"}'
```

Returns crop coordinates and layout data without producing an image.

## Request/Response notes

- For URL requests use `Content-Type: application/json`.
- For uploads use `multipart/form-data` with a `file` field.
- Most gateway endpoints require the `X-API-Key` header.
- Remote image URLs must resolve to public hosts and return an image content type.
- The gateway aggregates and normalizes outputs for `/analyze/all`.

## Running a single service

To run only one service via docker compose:

```bash
docker compose up -d --build clip
```

Or run locally (Python env) from the service folder:

```bash
# inside clip/ or blip/ or yolo/
uvicorn main:app --host 0.0.0.0 --port 8000
```

## Production tips

- Add authentication (API keys or OAuth) at the gateway.
- Add rate-limiting and per-client quotas.
- Keep model services on an internal Docker network.
- For GPU: enable NVIDIA runtime and update service Dockerfiles / compose profiles.

## Troubleshooting

- Service fails to start: check `docker compose logs ` for model load errors.
- BLIP startup error about Hugging Face auth: set `HUGGINGFACE_TOKEN` in `.env` and rebuild `blip`.
- Qdrant upsert error about invalid point ID: use a UUID or unsigned integer for `id`, or omit it and use the returned generated `id`.
- Image URL rejected before download: the URL may point to localhost, a private IP, a non-`http/https` scheme, or a non-image content type.
- High memory / OOM: increase host memory or reduce model footprint; consider GPUs.
- Slow startup: model weights load on service startup — expect extra time. The maturity service (`start_period: 90s`) may take longer on first boot as it downloads the classifier weights (~330 MB).
Mount `~/.cache/huggingface` as a volume to persist across rebuilds.
- Maturity endpoint returns `503`: `MATURITY_ENABLED` is set to `false` in environment configuration.
- Maturity endpoint returns `502`: the maturity container is unhealthy or still starting up; wait and retry.

## Extending

- Swap or update models in each service by editing that service's `main.py`.
- Add request validation, timeouts, and retries in the gateway to improve robustness.

## Files of interest

- `docker-compose.yml` — composition and service definitions.
- `gateway/` — gateway FastAPI server.
- `clip/`, `blip/`, `yolo/` — service implementations and Dockerfiles.
- `maturity/` — NSFW/maturity classifier service (ViT-based, HuggingFace `Falconsai/nsfw_image_detection`).
- `qdrant/` — Qdrant API wrapper service (FastAPI).
- `card-renderer/` — card rendering service (FastAPI).
- `common/` — shared helpers (e.g., image I/O).