# AI Biography
AI Biography is the Skinbase feature that generates short, grounded creator biographies from public profile data. It is designed to be conservative: it prefers a safe, concise summary over a flashy or speculative one.
This document explains how the feature works, what commands are available, where output is stored, and where users can see it.
## What AI Biography Does
AI Biography builds a normalized creator input payload from public data, sends that payload to the configured LLM provider, validates the generated text, and stores the result if it passes the rules.
The feature is not a general AI writing system. It is intentionally narrow:
- one biography per creator profile
- one paragraph only
- public data only
- strict validation before storage
- controlled retry when the first result fails
- manual edits and hidden states are protected
## End-to-End Flow
1. The system loads public creator data.
2. The input builder normalizes the data and computes a source hash.
3. The input is classified into a quality tier: `rich`, `medium`, or `sparse`.
4. The service checks whether the profile meets the minimum threshold for generation.
5. If the profile is too sparse, generation is suppressed.
6. If generation is allowed, the prompt builder creates the request payload.
7. The generator sends the payload to the configured provider.
8. The validator checks the returned biography text.
9. If validation fails, the generator retries once with a stricter prompt.
10. If the first attempt or the retry passes validation, the biography is stored.
11. If the retry also fails, the failure is recorded and no new biography is activated.
## Input Data
The normalized input can include:
- username
- member since year
- years on Skinbase
- public upload count
- featured work count
- download count
- top categories
- top tags
- best-performing work
- most productive year
- activity status
- milestone signals
- era signals
- evolution signals
The system only uses public, approved, visible creator data.
## Quality Tiers
AI Biography classifies the profile into one of three tiers:
- `rich` - long history, featured work, and multiple strong signals
- `medium` - some public activity, but not a deeply detailed profile
- `sparse` - very limited public signal
The tier affects both the prompt and the validation behavior.
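As a rough illustration, tier classification could be sketched as below. The field names (`years_active`, `featured_count`, `signal_count`) and every threshold are hypothetical; this document does not state the real cut-offs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreatorSignals:
    # Hypothetical, simplified view of the normalized input.
    years_active: int
    featured_count: int
    signal_count: int  # non-empty milestone / era / evolution signals


def classify_tier(s: CreatorSignals) -> str:
    """Return 'rich', 'medium', or 'sparse'. Thresholds are illustrative only."""
    # rich: long history, featured work, and multiple strong signals
    if s.years_active >= 5 and s.featured_count >= 1 and s.signal_count >= 3:
        return "rich"
    # medium: some public activity, but not a deeply detailed profile
    if s.years_active >= 1 or s.signal_count >= 1:
        return "medium"
    # sparse: very limited public signal
    return "sparse"
```

The real classifier may weigh more inputs. The point of the sketch is only that the tier is derived from the normalized input, not from the generated text.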
## Minimum Threshold
Before the system calls the selected provider, it checks whether the profile has enough data to justify a biography.
If the profile is too thin, generation is suppressed and the system returns the `suppressed_low_signal` action instead of producing weak filler text.
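A minimal sketch of such a gate, assuming the threshold is expressed as minimum counts. The parameter names, the default minimums, and the `generate` action name are assumptions for illustration; only `suppressed_low_signal` comes from this document.

```python
def generation_action(
    upload_count: int,
    signal_count: int,
    min_uploads: int = 3,   # hypothetical minimum
    min_signals: int = 1,   # hypothetical minimum
) -> str:
    """Decide whether the provider should be called at all."""
    if upload_count < min_uploads or signal_count < min_signals:
        # Too little public signal: skip the provider call entirely.
        return "suppressed_low_signal"
    return "generate"  # hypothetical name for the "proceed" action
```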
## Prompt Strategy
The prompt builder uses a versioned prompt family.
Current version:
- `v1.1`
Prompt behavior:
- normal prompt for standard profiles
- strict prompt for retry attempts
- sparse prompt for low-signal profiles that still pass the threshold
The prompt is designed to:
- avoid formulaic openings
- avoid hype language
- mention only the most meaningful signals
- keep the output to one paragraph
- discourage unsupported claims
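The choice between the three prompt variants could be sketched as follows. The function name, the payload keys, and the rule that a retry always takes the strict prompt (even for a sparse profile) are assumptions; only the variant names and the `v1.1` version come from this document.

```python
PROMPT_VERSION = "v1.1"  # current version named in this document


def select_prompt_variant(tier: str, is_retry: bool) -> str:
    """Pick 'normal', 'strict', or 'sparse' for one generation attempt."""
    if is_retry:
        return "strict"   # assumption: retry wins over the tier
    if tier == "sparse":
        return "sparse"   # low-signal profile that still passed the threshold
    return "normal"


def prompt_metadata(tier: str, is_retry: bool) -> dict:
    # Hypothetical payload fragment that records which prompt was used.
    return {
        "prompt_version": PROMPT_VERSION,
        "variant": select_prompt_variant(tier, is_retry),
    }
```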
## Provider Options
AI Biography can use one of several backends:
- `together` - Together.ai API using `google/gemma-3n-E4B-it` **(default)**
- `vision_gateway` - the existing Skinbase gateway using `/ai/chat`
- `gemini` - direct Google Gemini `generateContent` requests
- `home` - remote LM Studio using the OpenAI-compatible `/v1/chat/completions` API
Provider selection is config-driven. Together.ai is the default:
```env
AI_BIOGRAPHY_LLM_PROVIDER=together
TOGETHER_API_KEY=your_key_here
# AI_BIOGRAPHY_TOGETHER_MODEL=google/gemma-3n-E4B-it (default)
```
To use Gemini instead:
```env
AI_BIOGRAPHY_LLM_PROVIDER=gemini
GEMINI_API_KEY=...
AI_BIOGRAPHY_GEMINI_MODEL=gemini-flash-latest
```
To use the home LM Studio server instead:
```env
AI_BIOGRAPHY_LLM_PROVIDER=home
AI_BIOGRAPHY_HOME_BASE_URL=http://home.klevze.si:8200
AI_BIOGRAPHY_HOME_MODEL=qwen/qwen3.5-9b
```
To use the legacy Vision gateway:
```env
AI_BIOGRAPHY_LLM_PROVIDER=vision_gateway
```
The rest of the AI Biography pipeline stays the same. Input normalization, prompt versioning, validation, retry behavior, storage, and visibility rules do not change when the provider changes.
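To make the config-driven selection concrete, here is a small sketch that resolves the provider and model from environment-style settings. The variable names are the ones shown above; the function itself, and the choice to return no model when none is configured, are assumptions and not the application's actual resolver.

```python
from typing import Mapping, Optional, Tuple

KNOWN_PROVIDERS = ("together", "vision_gateway", "gemini", "home")

# Model variables shown in this document; vision_gateway has none listed.
MODEL_ENV = {
    "together": "AI_BIOGRAPHY_TOGETHER_MODEL",
    "gemini": "AI_BIOGRAPHY_GEMINI_MODEL",
    "home": "AI_BIOGRAPHY_HOME_MODEL",
}


def resolve_provider(env: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Return (provider, model) from a mapping such as os.environ."""
    provider = env.get("AI_BIOGRAPHY_LLM_PROVIDER", "together")
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    model_key = MODEL_ENV.get(provider)
    model = env.get(model_key) if model_key else None
    if provider == "together" and model is None:
        model = "google/gemma-3n-E4B-it"  # documented default
    return provider, model
```

Passing a plain mapping instead of reading `os.environ` directly keeps the sketch easy to test.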
## Validation Rules
The validator rejects text that is:
- too short
- too long
- multiple paragraphs
- markdown formatted
- repetitive or filler-heavy
- full of unsupported praise
- too rich-sounding for a sparse profile
It also checks for repeated phrases and common boilerplate patterns.
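A simplified sketch of a few of these checks is shown below. The length limits, the hype-word list, the error codes, and the repeated-trigram heuristic are all hypothetical stand-ins; the real validator's rules are not spelled out in this document.

```python
import re
from typing import List

MIN_CHARS = 80     # hypothetical lower bound
MAX_CHARS = 600    # hypothetical upper bound
HYPE_WORDS = {"legendary", "visionary", "world-class"}  # hypothetical list


def validate_biography(text: str) -> List[str]:
    """Return a list of error codes; an empty list means the text passed."""
    errors: List[str] = []
    stripped = text.strip()
    if len(stripped) < MIN_CHARS:
        errors.append("too_short")
    if len(stripped) > MAX_CHARS:
        errors.append("too_long")
    # A blank line between text blocks means more than one paragraph.
    if re.search(r"\n\s*\n", stripped):
        errors.append("multiple_paragraphs")
    # Headings, list bullets, bold markers, or backticks count as markdown.
    if re.search(r"(^|\n)\s*(#{1,6}\s|[-*]\s)|\*\*|`", stripped):
        errors.append("markdown")
    words = re.findall(r"[a-z'-]+", stripped.lower())
    if any(w in HYPE_WORDS for w in words):
        errors.append("unsupported_praise")
    # Any three-word sequence that appears twice is treated as repetition.
    trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    if len(trigrams) != len(set(trigrams)):
        errors.append("repeated_phrase")
    return errors
```

Returning every failed rule, rather than stopping at the first, makes it easier to record a useful `last_error_reason`.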
## Retry Behavior
If the first generated text fails validation, the generator retries exactly once with a stricter prompt.
Retry is used to reduce:
- generic phrasing
- excessive claims
- formatting mistakes
- weak opening lines
If the retry still fails, the failure is stored and the biography is not activated.
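The single-retry rule can be sketched as a loop over exactly two prompt variants. The callables `generate` and `validate` are hypothetical stand-ins for the provider call and the validator.

```python
from typing import Callable, List, Optional, Tuple


def generate_with_retry(
    generate: Callable[[str], str],
    validate: Callable[[str], List[str]],
) -> Tuple[Optional[str], List[str]]:
    """Try the normal prompt, then retry once with the strict prompt.

    Returns (text, []) on success, or (None, errors_from_last_attempt)
    when both attempts fail validation.
    """
    errors: List[str] = []
    for variant in ("normal", "strict"):  # first attempt, then one retry
        text = generate(variant)
        errors = validate(text)
        if not errors:
            return text, []
    # Both attempts failed: the caller records the failure and activates nothing.
    return None, errors
```

Because the loop is bounded by the tuple of variants, there is no way for a misbehaving provider to trigger more than two calls per run.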
## Stored Metadata
The biography record stores more than just the text.
Common metadata fields include:
- `text`
- `source_hash`
- `model`
- `prompt_version`
- `status`
- `is_hidden`
- `is_user_edited`
- `generated_at`
- `approved_at`
- `last_attempted_at`
- `last_error_code`
- `last_error_reason`
- `input_quality_tier`
- `generation_reason`
- `needs_review`
## Generation Reasons
The system tracks why a biography was generated.
Supported reasons include:
- `initial_generate`
- `manual_regenerate`
- `stale_refresh`
- `milestone_change`
- `era_change`
- `featured_change`
- `admin_batch`
- `retry_after_validation_failure`
## Where Output Goes
AI Biography output can appear in several places.
### 1. Public profile output
Public users can see the stored biography through:
- `GET /api/profile/{username}/ai-biography`
- the profile journey payload, which includes the biography when it is visible
The public endpoint never generates a biography. It only returns stored text.
### 2. Creator-facing status output
Authenticated creators can check the status of their biography through:
- `GET /api/creator/profile/ai-biography`
This payload can include:
- whether a biography exists
- whether it is hidden
- whether it is user-edited
- whether it needs review
- prompt version
- input quality tier
- generation reason
- last error details
### 3. Artisan command output
Admin and maintenance commands print directly to the terminal.
### 4. cPad review surface
Admins can review biographies in cPad at:
- `/cp/ai-biography`
This surface shows stored records, review flags, failures, hidden states, and rebuild controls.
### 5. Database storage
The canonical output is stored in the `creator_ai_biographies` table.
## Where Users Can See It
### Public visitors
Public visitors see the biography only when it is visible and stored.
If a biography is hidden, failed, or not yet generated, the public API returns null data.
### The creator
Creators can see their current biography state through the creator-facing status endpoint.
Depending on the frontend, this can be used to show:
- current biography text
- last generated time
- hidden state
- user-edited state
- needs-review state
- stale state
### Admins
Admins can inspect full metadata and review queues through the artisan commands.
They can also use the cPad review surface to:
- browse active and historical biography records
- filter by status, tier, visibility, and review state
- rebuild a creator biography
- mark records reviewed or flag them for review
- hide or re-show the active public biography
## Available Commands
### Generate biographies
```bash
php artisan ai-biography:generate {user_id}
php artisan ai-biography:generate --all
php artisan ai-biography:generate --stale
```
Options:
- `--provider=vision_gateway|vision|gemini|home` override the configured provider for this run
- `--prompt` print the initial system and user prompt that would be sent for each processed creator
- `--result` print the generated biography text to the console after a successful inline run
- `--skip-existing` skip creators who already have an active AI biography; this is most useful for single-user runs because the default batch mode is already missing-only
- `--force` overwrite user-edited biographies
- `--queue` dispatch jobs instead of running inline
- `--dry-run` list candidates without generating
- `--limit` cap batch size
- `--chunk` tune batch chunk size
### Inspect a biography
```bash
php artisan ai-biography:inspect {user_id}
```
Use this to view:
- current stored record
- quality tier
- staleness
- source hash
- failure metadata
- normalized input payload with `-v`
### Review queue
```bash
php artisan ai-biography:review-queue
```
Useful filters:
- `--tier=rich|medium|sparse`
- `--failed`
- `--needs-review`
- `--limit`
### Inspect provider health and models
```bash
php artisan ai-biography:providers
php artisan ai-biography:providers --provider=home
```
Options:
- `--provider=vision_gateway|vision|gemini|home` inspect only one provider
- `--limit` cap how many models are shown per provider
This command checks whether each configured provider is reachable and prints the available model IDs reported by that provider's models endpoint.
### Validate stored biographies
```bash
php artisan ai-biography:validate
php artisan ai-biography:validate {user_id}
```
Options:
- `--dry-run` report failures without updating records
- `--limit` cap batch size
This command re-runs the current validator against stored biographies and can flag outdated bios with `needs_review=true`.
## Visibility Rules
A biography is visible only when all of the following are true:
- the record is active
- the record is not hidden
- the status is visible
- the text is not empty
User-edited biographies are protected. If a new generation is attempted while the active biography is user-edited, the system stores a draft and marks the record with `needs_review=true` instead of silently replacing it.
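The four visibility conditions can be expressed as a single predicate. In this sketch the `is_active` field and the set of visible statuses are assumptions; `generated` is used because it is the status shown in the public payload example.

```python
from dataclasses import dataclass


@dataclass
class BiographyRecord:
    text: str
    status: str
    is_active: bool          # hypothetical field name for "the record is active"
    is_hidden: bool = False


VISIBLE_STATUSES = {"generated"}  # assumption: statuses that count as visible


def is_publicly_visible(record: BiographyRecord) -> bool:
    """True only when all four visibility conditions hold."""
    return (
        record.is_active
        and not record.is_hidden
        and record.status in VISIBLE_STATUSES
        and record.text.strip() != ""
    )
```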
## Stale Detection
AI Biography uses a source hash to detect when the underlying creator data changes.
Staleness is based on normalized public input, not on noisy micro-changes.
The system is intended to refresh for meaningful changes such as:
- featured work changes
- milestone changes
- era changes
- meaningful activity changes
- materially changed public profile data
It should not refresh just because of tiny download increments or ordering noise.
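One way to get a hash that ignores this kind of noise is to hash a reduced, canonical form of the input. The sketch below is an assumption about how that could work, not the actual hashing code: the field names are hypothetical, list order is removed by sorting, and the download count is reduced to its number of digits so small increments do not change the result.

```python
import hashlib
import json


def source_hash(normalized: dict) -> str:
    """Hash only the stable parts of a (hypothetical) normalized input."""
    stable = {
        "username": normalized.get("username"),
        "featured_count": normalized.get("featured_count", 0),
        # Sorting removes ordering noise in list-valued fields.
        "top_categories": sorted(normalized.get("top_categories", [])),
        "milestones": sorted(normalized.get("milestones", [])),
        # Coarse bucket: the digit count of the download total.
        "downloads_magnitude": len(str(int(normalized.get("download_count", 0)))),
    }
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_stale(stored_hash: str, normalized: dict) -> bool:
    # A record is stale when the current input hashes differently.
    return stored_hash != source_hash(normalized)
```

A bucket boundary (for example 9,999 to 10,000 downloads) still changes the hash, which is acceptable here because crossing an order of magnitude is arguably a meaningful change.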
## Public and Creator API Routes
### Public
- `GET /api/profile/{username}/ai-biography`
### Creator-facing
- `GET /api/creator/profile/ai-biography`
- `POST /api/creator/profile/ai-biography/generate`
- `POST /api/creator/profile/ai-biography/regenerate`
- `PATCH /api/creator/profile/ai-biography`
- `POST /api/creator/profile/ai-biography/hide`
- `POST /api/creator/profile/ai-biography/show`
### Admin cPad
- `GET /cp/ai-biography`
- `POST /cp/ai-biography/users/{user}/rebuild`
- `POST /cp/ai-biography/records/{biography}/approve`
- `POST /cp/ai-biography/records/{biography}/flag`
- `POST /cp/ai-biography/records/{biography}/hide`
- `POST /cp/ai-biography/records/{biography}/show`
## What the Public API Returns
The public endpoint returns stored biography data only.
If there is no visible biography, it returns null data.
Typical public payload shape:
```json
{
  "data": {
    "text": "...",
    "is_visible": true,
    "is_user_edited": false,
    "generated_at": "2026-04-14T20:00:00Z",
    "status": "generated"
  }
}
```
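A client consuming this payload needs to handle the null case. The helper below is a minimal sketch that assumes "null data" means the response body is `{"data": null}`; the function name is hypothetical.

```python
import json
from typing import Optional


def visible_biography_text(body: str) -> Optional[str]:
    """Extract the biography text from a public API response body.

    Returns None when there is no visible biography.
    """
    payload = json.loads(body)
    data = payload.get("data")
    if not data or not data.get("is_visible"):
        return None
    text = (data.get("text") or "").strip()
    return text or None
```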
## What the Creator Status API Returns
The creator-facing status endpoint returns more metadata than the public API.
It can include:
- has biography
- hidden state
- user-edited state
- needs-review state
- prompt version
- input quality tier
- generation reason
- generated time
- last attempt time
- last error code
- last error reason
## Practical Example
A typical successful flow looks like this:
1. creator has enough public signal
2. system classifies the profile as `rich`
3. prompt version `v1.1` is used
4. provider generates a biography
5. validator accepts the text
6. result is stored as `generated`
7. public profile can now show it
A sparse creator may instead:
1. fail the minimum threshold check
2. get suppressed with `suppressed_low_signal`
3. not receive a generated biography until more public signal exists
## Summary
AI Biography is a constrained, public-data-only generation pipeline with:
- versioned prompts
- strict validation before storage
- one controlled retry
- sparse-profile suppression
- safe user-edited protection
- admin inspection and validation tools
- clear public and creator-facing visibility rules
The main goal is not more AI. The goal is more trustworthy profile text.