SkinbaseNova/docs/discovery-personalization-engine.md
2026-03-28 19:15:39 +01:00


Discovery & Personalization Engine

Covers the trending system, following feed, personalized homepage, similar artworks, unified activity feed, and all input signal collection that powers the ranking formula.

This document also covers the v3 AI discovery layer: vision metadata extraction, vector indexing, AI similar-artwork search, reverse image search, and the hybrid feed section controls.


Table of Contents

  1. Architecture Overview
  2. Input Signal Collection
  3. Windowed Stats (views & downloads)
  4. Trending Engine
  5. Discover Routes
  6. Following Feed
  7. Personalized Homepage
  8. Similar Artworks API
  9. Unified Activity Feed
  10. Meilisearch Configuration
  11. Caching Strategy
  12. Scheduled Jobs
  13. Testing
  14. AI Discovery v3
  15. Operational Runbook

1. Architecture Overview

Browser
  │
  ├─ POST /api/art/{id}/view      → ArtworkViewController
  ├─ POST /api/art/{id}/download  → ArtworkDownloadController
  └─ POST /api/artworks/{id}/favorite / reactions / awards / comments
          │
          ▼
    ArtworkStatsService           UserStatsService
    artwork_stats (all-time +     user_statistics
    windowed counters)            └─ artwork_views_received_count
    artwork_downloads (log)           downloads_received_count
          │
          ▼
    skinbase:reset-windowed-stats   (nightly/weekly)
    └─ zeros views_24h / views_7d
    └─ recomputes downloads_24h / downloads_7d from log
          │
          ▼
    skinbase:recalculate-trending   (every 30 min)
    └─ bulk UPDATE artworks.trending_score_24h / _7d
    └─ dispatches IndexArtworkJob → Meilisearch
          │
          ▼
    Meilisearch index (artworks)
    └─ sortable: trending_score_7d, trending_score_24h, views, ...
    └─ filterable: author_id, tags, category, orientation, is_public, ...
          │
          ▼
    HomepageService / DiscoverController / SimilarArtworksController
    └─ Redis cache (5 min TTL)
          │
          ▼
    Inertia + React frontend

2. Input Signal Collection

2.1 View tracking — POST /api/art/{id}/view

Controller: App\Http\Controllers\Api\ArtworkViewController
Route name: api.art.view
Throttle: 5 requests per 10 minutes per IP

Deduplication (layered):

| Layer | Mechanism | Scope |
|---|---|---|
| Client-side | sessionStorage key sb_viewed_{id} set before the request | Browser tab lifetime |
| Server-side | $request->session()->put('art_viewed.{id}', true) | Laravel session lifetime |
| Throttle | throttle:5,10 route middleware | Per-IP per-artwork |

The React component ArtworkActions.jsx fires a useEffect on mount that checks sessionStorage first, then hits the endpoint. The response includes counted: true|false so callers can confirm whether the increment actually happened.
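
The server-side layer can be pictured with a small sketch. It assumes the session is a plain PHP array rather than the Laravel session store, and the function name shouldCountView() is hypothetical; only the art_viewed.{id} key comes from the description above.

```php
<?php
declare(strict_types=1);

/**
 * Returns true only the first time an artwork is seen in this session.
 * $session stands in for the Laravel session store (an assumption made
 * for illustration); the key mirrors the documented art_viewed.{id} flag.
 */
function shouldCountView(array &$session, int $artworkId): bool
{
    $key = "art_viewed.{$artworkId}";

    if (!empty($session[$key])) {
        return false; // already counted: the endpoint would answer counted: false
    }

    $session[$key] = true;

    return true; // first view: the caller increments and answers counted: true
}

$session = [];
$first  = shouldCountView($session, 42); // first view of artwork 42
$second = shouldCountView($session, 42); // repeat view in the same session
$other  = shouldCountView($session, 43); // a different artwork
```

Only the first call per artwork reports a countable view, which is the value a caller would surface as counted in the JSON response.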

What gets incremented:

artwork_stats.views          +1  (all-time)
artwork_stats.views_24h      +1  (zeroed nightly)
artwork_stats.views_7d       +1  (zeroed weekly)
user_statistics.artwork_views_received_count  +1  (creator aggregate)

All four increments go through ArtworkStatsService::incrementViews() with defer: true (queued in Redis when available, with a direct DB write as fallback).


2.2 Download tracking — POST /api/art/{id}/download

Controller: App\Http\Controllers\Api\ArtworkDownloadController
Route name: api.art.download
Throttle: 10 requests per minute per IP

The endpoint:

  1. Inserts a row in artwork_downloads (persistent event log with created_at)
  2. Increments artwork_stats.downloads, downloads_24h, downloads_7d
  3. Returns {"ok": true, "url": "<highest-res thumbnail URL>"} for the native browser download

The <a download> buttons in ArtworkActions.jsx call trackDownload() on click — a fire-and-forget fetch() POST. The actual browser download is triggered by the href/download attributes and is never blocked by the tracking request.

What gets incremented:

artwork_downloads             INSERT (event log, persisted forever)
artwork_stats.downloads       +1  (all-time)
artwork_stats.downloads_24h   +1  (recomputed from log nightly)
artwork_stats.downloads_7d    +1  (recomputed from log weekly)
user_statistics.downloads_received_count  +1  (creator aggregate)

The counter increments go through ArtworkStatsService::incrementDownloads() with defer: true.


2.3 Other signals (already existed)

| Signal | Endpoint / Service | Written to |
|---|---|---|
| Favorite toggle | POST /api/artworks/{id}/favorite | user_favorites, artwork_stats.favorites |
| Reaction toggle | POST /api/artworks/{id}/reactions | artwork_reactions |
| Award | ArtworkAwardController | artwork_award_stats.score_total |
| Comment | ArtworkCommentController | artwork_comments, activity_events |
| Follow | FollowService | user_followers, activity_events |

2.4 ArtworkStatsService — Redis deferral

When Redis is available all increments are pushed to a list key artwork_stats:deltas as JSON payloads. A separate job/command (processPendingFromRedis) drains the queue and applies bulk applyDelta() calls. If Redis is unavailable the service falls back transparently to a direct DB increment.

// Deferred (default for view/download controllers)
$svc->incrementViews($artworkId, 1, defer: true);

// Immediate: skips the Redis queue and writes to the DB right away
// (for callers that need instant feedback, as the favorites toggle does)
$svc->incrementDownloads($artworkId, 1, defer: false);
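
To illustrate why draining the queue in bulk is cheaper than applying each increment on its own, here is a minimal sketch of the aggregation step. The payload shape (artwork_id, field, value) and the function name aggregateDeltas() are assumptions for illustration; the real payload format used by ArtworkStatsService is not shown in this document.

```php
<?php
declare(strict_types=1);

/**
 * Collapses queued JSON payloads into one delta per artwork and field.
 * The payload shape {"artwork_id":..,"field":..,"value":..} is assumed.
 *
 * @param  string[] $payloads
 * @return array<int, array<string, int>>
 */
function aggregateDeltas(array $payloads): array
{
    $deltas = [];

    foreach ($payloads as $json) {
        $item = json_decode($json, true);

        if (!is_array($item) || !isset($item['artwork_id'], $item['field'], $item['value'])) {
            continue; // skip malformed entries rather than abort the batch
        }

        $id    = (int) $item['artwork_id'];
        $field = (string) $item['field'];

        $deltas[$id][$field] = ($deltas[$id][$field] ?? 0) + (int) $item['value'];
    }

    return $deltas;
}

$queue = [
    '{"artwork_id":7,"field":"views","value":1}',
    '{"artwork_id":7,"field":"views","value":1}',
    '{"artwork_id":9,"field":"downloads","value":1}',
    'not json',
];

$deltas = aggregateDeltas($queue);
```

Two queued view events for artwork 7 collapse into a single delta of 2, so one write per artwork and field is enough however many events arrived.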

3. Windowed Stats (views & downloads)

3.1 Why windowed columns?

The trending formula needs recent activity, not all-time totals. artwork_stats.views is a monotonically increasing counter — using it for trending would permanently favour old popular artworks and new artworks could never compete.

The solution is four cached window columns refreshed on a schedule:

| Column | Meaning | Reset cadence |
|---|---|---|
| views_24h | Views since the last nightly reset | Nightly at 03:30 |
| views_7d | Views since the last Monday reset | Weekly (Mon) at 03:30 |
| downloads_24h | Downloads in last 24 h | Nightly at 03:30 (recomputed from log) |
| downloads_7d | Downloads in last 7 days | Weekly (Mon) at 03:30 (recomputed from log) |

3.2 How views windowing works

No per-view event log exists (storing millions of view rows would be expensive). Instead:

  • Every view event increments views_24h and views_7d alongside views.
  • The reset command zeroes both columns. Artworks re-accumulate from the reset time onward.
  • Accuracy is "views since last reset" rather than a true sliding window, which is close enough for trending (error ≤ 1 day).

3.3 How downloads windowing works

artwork_downloads is a full event log with created_at. The reset command:

  1. Queries COUNT(*) FROM artwork_downloads WHERE artwork_id = ? AND created_at >= NOW() - {interval} for each artwork in chunks of 1000.
  2. Writes the exact count back to downloads_24h / downloads_7d.

This overwrites any drift from deferred Redis increments, making download windows always accurate at reset time.
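
The recomputation can be sketched in plain PHP over an in-memory log. This is an illustration only: the function name countDownloadsInWindow() is hypothetical, timestamps are Unix seconds for simplicity, and the real command reads from the artwork_downloads table in chunks rather than from an array.

```php
<?php
declare(strict_types=1);

/**
 * Counts log rows per artwork with created_at at or after the cutoff.
 * Each row is ['artwork_id' => int, 'created_at' => unix timestamp].
 *
 * @return array<int, int> artwork_id => downloads inside the window
 */
function countDownloadsInWindow(array $log, int $now, int $windowSeconds): array
{
    $cutoff = $now - $windowSeconds;
    $counts = [];

    foreach ($log as $row) {
        if ($row['created_at'] >= $cutoff) {
            $counts[$row['artwork_id']] = ($counts[$row['artwork_id']] ?? 0) + 1;
        }
    }

    return $counts;
}

$now = 1700000000;
$log = [
    ['artwork_id' => 1, 'created_at' => $now - 3600],      // 1 hour ago
    ['artwork_id' => 1, 'created_at' => $now - 90000],     // 25 hours ago
    ['artwork_id' => 2, 'created_at' => $now - 6 * 86400], // 6 days ago
];

$last24h = countDownloadsInWindow($log, $now, 86400);
$last7d  = countDownloadsInWindow($log, $now, 7 * 86400);
```

Because the counts come from the log rather than from the running counters, any increments lost or duplicated on the deferred path are corrected at reset time.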

3.4 Reset command

php artisan skinbase:reset-windowed-stats --period=24h
php artisan skinbase:reset-windowed-stats --period=7d

Uses a chunked PHP loop (no GREATEST() / INTERVAL MySQL syntax), so it works on both the production MySQL database and the SQLite test DB.


4. Trending Engine

4.1 Formula

score = (award_score   × 5.0)
      + (favorites     × 3.0)
      + (reactions     × 2.0)
      + (downloads_Xd  × 1.0)   ← windowed: 24h or 7d
      + (views_Xd      × 2.0)   ← windowed: 24h or 7d
      - (hours_since_published × 0.1)

score = max(score, 0)   ← clamped via GREATEST()

Weights are constants in TrendingService (W_AWARD, W_FAVORITE, etc.) — adjust without a schema change.
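
As a worked example, the formula can be written as a pure function. This is a sketch, not the TrendingService implementation (which runs as bulk SQL): W_AWARD and W_FAVORITE are named in this document, while W_REACTION, W_DOWNLOAD, W_VIEW, DECAY_PER_HOUR and the function name are assumed for illustration.

```php
<?php
declare(strict_types=1);

// Weights copied from the formula above. Only W_AWARD and W_FAVORITE are
// names mentioned in this document; the rest are illustrative.
const W_AWARD        = 5.0;
const W_FAVORITE     = 3.0;
const W_REACTION     = 2.0;
const W_DOWNLOAD     = 1.0;
const W_VIEW         = 2.0;
const DECAY_PER_HOUR = 0.1;

/** Computes the trending score for one artwork and clamps it at zero. */
function trendingScore(
    float $awardScore,
    int $favorites,
    int $reactions,
    int $windowDownloads,
    int $windowViews,
    float $hoursSincePublished
): float {
    $score = $awardScore * W_AWARD
        + $favorites * W_FAVORITE
        + $reactions * W_REACTION
        + $windowDownloads * W_DOWNLOAD
        + $windowViews * W_VIEW
        - $hoursSincePublished * DECAY_PER_HOUR;

    return max($score, 0.0); // same effect as GREATEST(score, 0) in SQL
}

// Two-day-old artwork with solid engagement:
// 10 + 30 + 10 + 20 + 200 - 4.8 = 265.2
$fresh = trendingScore(2.0, 10, 5, 20, 100, 48.0);

// Thirty-day-old artwork with one favorite: 3 - 72 is negative, so it clamps to 0
$old = trendingScore(0.0, 1, 0, 0, 0, 720.0);
```

The second case shows why the clamp matters: without it, age decay alone would push inactive artworks into negative scores.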

4.2 Output columns

| Artworks column | Meaning |
|---|---|
| trending_score_24h | Score using views_24h + downloads_24h; targets artworks ≤ 7 days old |
| trending_score_7d | Score using views_7d + downloads_7d; targets artworks ≤ 30 days old |
| last_trending_calculated_at | Timestamp of last calculation |

4.3 Recalculation command

php artisan skinbase:recalculate-trending --period=24h
php artisan skinbase:recalculate-trending --period=7d
php artisan skinbase:recalculate-trending --period=all
php artisan skinbase:recalculate-trending --period=7d --skip-index  # skip Meilisearch jobs
php artisan skinbase:recalculate-trending --chunk=500               # smaller DB chunks

Implementation: App\Services\TrendingService::recalculate()

  1. Chunks artworks published within the look-back window (chunkById(1000, ...)).
  2. Issues one bulk MySQL UPDATE ... WHERE id IN (...) per chunk — no per-artwork queries in the hot path.
  3. After each chunk, dispatches IndexArtworkJob per artwork to push updated scores to Meilisearch (skippable with --skip-index).

Note: The raw SQL uses GREATEST() and TIMESTAMPDIFF(HOUR, ...), which are MySQL functions that SQLite does not provide. The command is exercised against MySQL in production; the 4 related Pest tests are skipped on SQLite with a clear skip message.

4.4 Meilisearch sync after calculation

TrendingService::syncToSearchIndex() dispatches IndexArtworkJob for every artwork in the trending window. The job calls Artwork::searchable() which triggers toSearchableArray(), which includes trending_score_24h and trending_score_7d.


5. Discover Routes

All routes under /discover/* are registered in routes/web.php and handled by App\Http\Controllers\Web\DiscoverController. All except /discover/following sort in Meilisearch, with no SQL ORDER BY in the hot path; the following feed is sorted in the database.

| Route | Name | Sort key | Auth |
|---|---|---|---|
| /discover/trending | discover.trending | trending_score_7d:desc | No |
| /discover/fresh | discover.fresh | created_at:desc | No |
| /discover/top-rated | discover.top-rated | likes:desc | No |
| /discover/most-downloaded | discover.most-downloaded | downloads:desc | No |
| /discover/following | discover.following | created_at:desc (DB) | Yes |

6. Following Feed

Route: GET /discover/following (auth required)
Controller: DiscoverController::following()

Logic

1. Get user's following IDs from user_followers
2. If empty    → show empty state (see below)
3. If present  → Artwork::whereIn('user_id', $followingIds)
                          ->orderByDesc('published_at')
                          ->paginate(24)
                 + cached 1 min per user per page

Empty state

When the user follows nobody:

  • fallback_trending — up to 12 trending artworks (Meilisearch, with DB fallback)
  • fallback_creators — 8 most-followed verified users (ordered by user_statistics.followers_count)
  • empty: true flag passed to the view
  • The discoverTrending() call is wrapped in try/catch so a Meilisearch outage never breaks the empty state page
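
The outage handling described for the empty state follows a common pattern, sketched below. The helper name fetchWithFallback() and the two closures are hypothetical stand-ins for the Meilisearch-backed and DB-backed queries; only the 12-item cap and the empty flag come from the list above.

```php
<?php
declare(strict_types=1);

/**
 * Runs the primary fetcher and falls back to the secondary one on any error.
 *
 * @param callable(): array $primary  e.g. the Meilisearch-backed query
 * @param callable(): array $fallback e.g. the DB-backed query
 */
function fetchWithFallback(callable $primary, callable $fallback): array
{
    try {
        return $primary();
    } catch (Throwable $e) {
        // A search outage must not break the page, so use the fallback.
        return $fallback();
    }
}

// Simulated outage: the search-backed fetcher throws.
$searchDown = function (): array {
    throw new RuntimeException('search engine unreachable');
};
$fromDb = fn (): array => [['id' => 1], ['id' => 2]];

$emptyState = [
    'empty'             => true,
    'fallback_trending' => array_slice(fetchWithFallback($searchDown, $fromDb), 0, 12),
];
```

Catching Throwable rather than a specific exception class is a deliberate choice in this sketch, since the goal is that no search-side failure reaches the user.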

7. Personalized Homepage

Controller: HomeController::index()
Service: App\Services\HomepageService

Guest sections

[
  'hero'     => first featured artwork,
  'trending' => 12 artworks sorted by trending_score_7d,
  'fresh'    => 12 newest artworks,
  'tags'     => 12 most-used tags,
  'creators' => creator spotlight,
  'news'     => latest news posts,
]

Authenticated sections (personalized)

[
  'hero'           => same as guest,
  'from_following' => artworks from followed creators (up to 12, cached 1 min),
  'trending'       => same as guest,
  'by_tags'        => artworks matching user's top 5 tags,
  'by_categories'  => fresh uploads in user's top 3 favourite categories,
  'tags'           => same as guest,
  'creators'       => same as guest,
  'news'           => same as guest,
  'preferences'    => { top_tags, top_categories },
]

UserPreferenceService

App\Services\UserPreferenceService::build(User $user) — cached 5 min per user.

Computes preferences from the user's favourited artworks:

| Output key | Source |
|---|---|
| top_tags (up to 5) | Tags on artworks in artwork_favourites |
| top_categories (up to 3) | Categories on artworks in artwork_favourites |
| followed_creators | IDs from user_followers |
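
One plausible way to derive top_tags and top_categories is a frequency count over the user's favourited artworks. The sketch below assumes exactly that; the function name topValues() is hypothetical and the actual ranking logic inside UserPreferenceService may differ.

```php
<?php
declare(strict_types=1);

/**
 * Returns the $limit most frequent values across all favourited artworks.
 *
 * @param  array<int, string[]> $valuesPerArtwork one list of tags per artwork
 * @return string[]
 */
function topValues(array $valuesPerArtwork, int $limit): array
{
    $counts = [];

    foreach ($valuesPerArtwork as $values) {
        foreach ($values as $value) {
            $counts[$value] = ($counts[$value] ?? 0) + 1;
        }
    }

    arsort($counts); // highest frequency first

    return array_slice(array_keys($counts), 0, $limit);
}

// Tags on four favourited artworks (sample data)
$favouriteTags = [
    ['space', 'dark'],
    ['space', 'minimal'],
    ['space', 'dark'],
    ['anime'],
];

$topTags = topValues($favouriteTags, 2);
```

The same helper would serve for categories by passing category lists and a limit of 3.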

getTrending() — Meilisearch-first

Artwork::search('')
    ->options([
        'filter' => 'is_public = true AND is_approved = true',
        'sort'   => ['trending_score_7d:desc', 'trending_score_24h:desc', 'views:desc'],
    ])
    ->paginate($limit, 'page', 1);

Falls back to getTrendingFromDb() (orderByDesc('trending_score_7d'), with no correlated subqueries) when Meilisearch is unavailable.


8. Similar Artworks API

Route: GET /api/art/{id}/similar
Controller: App\Http\Controllers\Api\SimilarArtworksController
Route name: api.art.similar
Throttle: 60/min
Cache: 5 min per artwork ID
Max results: 12

Similarity algorithm

Meilisearch filters are built in priority order:

is_public = true
is_approved = true
id != {source_id}
author_id != {source_author_id}          ← same creator excluded
orientation = "{landscape|portrait}"     ← only for non-square (visual coherence)
(tags = "X" OR tags = "Y" OR ...)        ← tag overlap (primary signal)
  OR (if no tags)
(category = "X" OR ...)                  ← category fallback

Meilisearch's own ranking then sorts by relevance within those filters. Results are mapped to a slim JSON shape: {id, title, slug, thumb, url, author_id}.
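
The filter list above can be assembled into a single Meilisearch filter expression. The following sketch shows one way to do it; the function name buildSimilarFilter() and its parameters are assumptions, and the controller's actual code may build the expression differently.

```php
<?php
declare(strict_types=1);

/**
 * Builds the filter string: fixed visibility and exclusion clauses, an
 * optional orientation clause, then tag overlap or the category fallback.
 */
function buildSimilarFilter(
    int $sourceId,
    int $authorId,
    ?string $orientation,
    array $tags,
    array $categories
): string {
    $parts = [
        'is_public = true',
        'is_approved = true',
        "id != {$sourceId}",
        "author_id != {$authorId}",
    ];

    // Orientation is only applied to non-square artworks.
    if ($orientation === 'landscape' || $orientation === 'portrait') {
        $parts[] = sprintf('orientation = "%s"', $orientation);
    }

    // Tags are the primary signal; categories are used only when there are none.
    $field  = $tags !== [] ? 'tags' : 'category';
    $values = $tags !== [] ? $tags : $categories;

    if ($values !== []) {
        $clauses = array_map(
            fn (string $v): string => sprintf('%s = "%s"', $field, addcslashes($v, '"\\')),
            $values
        );
        $parts[] = '(' . implode(' OR ', $clauses) . ')';
    }

    return implode(' AND ', $parts);
}

$withTags    = buildSimilarFilter(10, 3, 'landscape', ['space', 'dark'], ['Wallpapers']);
$withoutTags = buildSimilarFilter(10, 3, 'square', [], ['Wallpapers']);
```

Quotes and backslashes in values are escaped so that a tag containing a double quote cannot break out of the expression.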


9. Unified Activity Feed

Route: GET /community/activity?type=global|following
Controller: App\Http\Controllers\Web\CommunityActivityController

activity_events schema

| Column | Type | Notes |
|---|---|---|
| id | bigint PK | |
| actor_id | bigint FK users | Who did the action |
| type | varchar | upload, comment, favorite, award, follow |
| target_type | varchar | artwork, user |
| target_id | bigint | ID of the target object |
| meta | json nullable | Extra data (e.g. award tier) |
| created_at | timestamp | No updated_at (events are immutable) |

Where events are recorded

| Event type | Recording point |
|---|---|
| upload | UploadController::finish() on publish |
| follow | FollowService::follow() |
| award | ArtworkAwardController::store() |
| favorite | ArtworkInteractionController::favorite() |
| comment | ArtworkCommentController::store() |

All via ActivityEvent::record($actorId, $type, $targetType, $targetId, $meta).

Feed filters

  • Global: all recent events, newest first, paginated 30/page
  • Following: WHERE actor_id IN (following_ids), i.e. only events from users you follow

The controller enriches each event batch with its target objects in a single query per target type (no N+1).
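
The "single query per target type" idea depends on first collecting the unique target IDs for each type. A minimal sketch of that grouping step follows; the function name groupTargetIds() is hypothetical, and the event arrays stand in for ActivityEvent rows.

```php
<?php
declare(strict_types=1);

/**
 * Groups unique target IDs by target type so that each type can be
 * loaded with one WHERE id IN (...) query instead of one query per event.
 *
 * @param  array<int, array{target_type: string, target_id: int}> $events
 * @return array<string, int[]> target_type => unique target IDs
 */
function groupTargetIds(array $events): array
{
    $grouped = [];

    foreach ($events as $event) {
        // Using the ID as a key removes duplicates for free.
        $grouped[$event['target_type']][$event['target_id']] = true;
    }

    return array_map('array_keys', $grouped);
}

$events = [
    ['type' => 'favorite', 'target_type' => 'artwork', 'target_id' => 5],
    ['type' => 'comment',  'target_type' => 'artwork', 'target_id' => 5],
    ['type' => 'follow',   'target_type' => 'user',    'target_id' => 9],
    ['type' => 'award',    'target_type' => 'artwork', 'target_id' => 8],
];

$idsByType = groupTargetIds($events);
```

With the IDs grouped this way, a page of 30 events needs at most two lookups (artworks and users) regardless of how many events it contains.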


10. Meilisearch Configuration

Configured in config/scout.php under meilisearch.index-settings.

Push settings to a running instance:

php artisan scout:sync-index-settings

Artworks index settings

Searchable attributes (ranked in order):

  1. title
  2. tags
  3. author_name
  4. description

Filterable attributes: tags, category, content_type, orientation, resolution, author_id, is_public, is_approved

Sortable attributes: created_at, downloads, likes, views, trending_score_24h, trending_score_7d, favorites_count, awards_received_count, downloads_count

toSearchableArray() — fields indexed per artwork

[
  'id', 'slug', 'title', 'description',
  'author_id', 'author_name',
  'category', 'content_type', 'tags',
  'resolution', 'orientation',
  'downloads', 'likes', 'views',
  'created_at', 'is_public', 'is_approved',
  'trending_score_24h', 'trending_score_7d',
  'favorites_count', 'awards_received_count', 'downloads_count',
  'awards' => { gold, silver, bronze, score },
]

11. Caching Strategy

| Data | Cache key | TTL | Driver |
|---|---|---|---|
| Homepage trending | homepage.trending.{limit} | 5 min | Redis/file |
| Homepage fresh | homepage.fresh.{limit} | 5 min | Redis/file |
| Homepage hero | homepage.hero | 5 min | Redis/file |
| Homepage tags | homepage.tags.{limit} | 5 min | Redis/file |
| User preferences | user.prefs.{user_id} | 5 min | Redis/file |
| Following feed | discover.following.{user_id}.p{page} | 1 min | Redis/file |
| Similar artworks | api.similar.{artwork_id} | 5 min | Redis/file |

Rules:

  • Personalized data (from_following, by_tags, by_categories) is not independently cached — it falls inside allForUser() which is called fresh per request.
  • Long-running cache busting: the trending command and reset command do not explicitly clear cache — the TTL is short enough that stale data self-expires within one trending cycle.
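
The reliance on TTL expiry instead of explicit invalidation can be demonstrated with a tiny in-memory cache. This is an illustrative stand-in, not the cache driver the application uses: the TtlCache class is hypothetical, and the clock is passed in explicitly so the behaviour is easy to follow.

```php
<?php
declare(strict_types=1);

/** Minimal in-memory cache with per-key expiry (illustration only). */
final class TtlCache
{
    /** @var array<string, array{expires: int, value: mixed}> */
    private array $items = [];

    /** Returns the cached value, or computes and stores it for $ttl seconds. */
    public function remember(string $key, int $ttl, callable $compute, int $now)
    {
        if (isset($this->items[$key]) && $this->items[$key]['expires'] > $now) {
            return $this->items[$key]['value'];
        }

        $value = $compute();
        $this->items[$key] = ['expires' => $now + $ttl, 'value' => $value];

        return $value;
    }
}

$cache = new TtlCache();
$calls = 0;
$load  = function () use (&$calls): string {
    $calls++;
    return "trending v{$calls}";
};

$key = 'homepage.trending.12'; // follows the homepage.trending.{limit} pattern

$a = $cache->remember($key, 300, $load, 1000); // miss: computes
$b = $cache->remember($key, 300, $load, 1200); // hit: still inside 5 min
$c = $cache->remember($key, 300, $load, 1300); // expired: recomputes
```

Stale data lives for at most one TTL, which is why a 5 minute TTL paired with a 30 minute recalculation cycle needs no explicit cache clearing.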

12. Scheduled Jobs

All registered in routes/console.php via Schedule::command().

| Time | Command | Purpose |
|---|---|---|
| Every 30 min | skinbase:recalculate-trending --period=24h | Update trending_score_24h |
| Every 30 min | skinbase:recalculate-trending --period=7d --skip-index | Update trending_score_7d (background) |
| 03:00 daily | uploads:cleanup | Remove stale draft uploads |
| 03:10 daily | analytics:aggregate-similar-artworks | Offline similarity metrics |
| 03:20 daily | analytics:aggregate-feed | Feed evaluation metrics |
| 03:30 daily | skinbase:reset-windowed-stats --period=24h | Zero views_24h, recompute downloads_24h |
| Monday 03:30 | skinbase:reset-windowed-stats --period=7d | Zero views_7d, recompute downloads_7d |

Reset runs at 03:30 so it fires after the other maintenance tasks (03:00 to 03:20). The next trending recalculation (every 30 min, including ~03:30 or ~04:00) picks up the freshly-zeroed windowed stats and writes accurate trending scores.


13. Testing

All tests live under tests/Feature/Discovery/.

| Test file | Coverage |
|---|---|
| ActivityEventRecordingTest.php | ActivityEvent::record(), all 5 types, actor relation, meta, route smoke tests for the activity feed |
| FollowingFeedTest.php | Auth redirect, empty state fallback, pagination, creator exclusion |
| HomepagePersonalizationTest.php | Guest vs auth homepage sections, preferences shape, 200 responses |
| SimilarArtworksApiTest.php | 404 cases, response shape, result count ≤ 12, creator exclusion |
| SignalTrackingTest.php | View endpoint (404s, first count, session dedup), download endpoint (404s, DB row, guest vs auth), route names |
| TrendingServiceTest.php | Zero artworks, skip outside window, skip private/unapproved; recalculate() tests skipped on SQLite (MySQL-only SQL) |
| WindowedStatsTest.php | incrementViews/Downloads update all 3 columns, reset command zeros views, recomputes downloads from log, window boundary correctness |

Run all discovery tests:

php artisan test tests/Feature/Discovery/

Run specific suite:

php artisan test tests/Feature/Discovery/SignalTrackingTest.php

SQLite vs MySQL note: Four tests in TrendingServiceTest are marked with ->skip() and the message "Requires MySQL: uses GREATEST() and TIMESTAMPDIFF()". Run them against a real MySQL instance in CI or staging to validate the bulk UPDATE formula.


14. AI Discovery v3

14.1 Overview

The v3 layer augments the existing recommendation engine with:

  • CLIP-derived embeddings and tags
  • BLIP captions
  • YOLO object detections
  • vector-gateway similarity search
  • hybrid feed reranking and section generation

Primary request paths:

  • GET /api/art/{id}/similar-ai
  • POST /api/search/image
  • POST /api/uploads/{id}/vision-suggest

Primary async jobs:

  • AutoTagArtworkJob
  • GenerateArtworkEmbeddingJob
  • SyncArtworkVectorIndexJob
  • BackfillArtworkVectorIndexJob

14.2 Core configuration

Vision gateway:

  • VISION_ENABLED
  • VISION_GATEWAY_URL
  • VISION_GATEWAY_TIMEOUT
  • VISION_GATEWAY_CONNECT_TIMEOUT

Vector gateway:

  • VISION_VECTOR_GATEWAY_ENABLED
  • VISION_VECTOR_GATEWAY_URL
  • VISION_VECTOR_GATEWAY_API_KEY
  • VISION_VECTOR_GATEWAY_COLLECTION
  • VISION_VECTOR_GATEWAY_UPSERT_ENDPOINT
  • VISION_VECTOR_GATEWAY_SEARCH_ENDPOINT

Hybrid feed:

  • DISCOVERY_V3_ENABLED
  • DISCOVERY_V3_CACHE_TTL_MINUTES
  • DISCOVERY_V3_VECTOR_SIMILARITY_WEIGHT
  • DISCOVERY_V3_VECTOR_BASE_SCORE
  • DISCOVERY_V3_MAX_SEED_ARTWORKS
  • DISCOVERY_V3_VECTOR_CANDIDATE_POOL

AI section sizing:

  • DISCOVERY_V3_SECTION_SIMILAR_STYLE_LIMIT
  • DISCOVERY_V3_SECTION_YOU_MAY_ALSO_LIKE_LIMIT
  • DISCOVERY_V3_SECTION_VISUALLY_RELATED_LIMIT

14.3 Behavior notes

  • Upload publish remains non-blocking for AI processing; derivatives can complete and the AI jobs are queued after the upload is finalized.
  • The synchronous vision-suggest endpoint is only for immediate upload-step prefill and does not replace the queued persistence path.
  • similar-ai and reverse image search return vector-gateway results only when the gateway is configured; otherwise they fail closed with explicit JSON reasons.
  • Discovery sections are now tunable from config rather than fixed in code, so production adjustments can be made without editing the services.
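
The "fail closed with explicit JSON reasons" behavior can be sketched as a guard that runs before any gateway call. Everything named here is hypothetical: the function similarAiResponse(), the config keys enabled and url, and the reason strings are illustrative and are not taken from the codebase.

```php
<?php
declare(strict_types=1);

/**
 * Returns a result envelope, or an explicit reason when the gateway is
 * not usable. Config keys and reason strings are hypothetical.
 */
function similarAiResponse(array $config, callable $search): array
{
    if (empty($config['enabled'])) {
        return ['ok' => false, 'reason' => 'vector_gateway_disabled', 'data' => []];
    }

    if (empty($config['url'])) {
        return ['ok' => false, 'reason' => 'vector_gateway_not_configured', 'data' => []];
    }

    // Only reached when the gateway is both enabled and configured.
    return ['ok' => true, 'data' => $search()];
}

$search = fn (): array => [['id' => 11], ['id' => 12]];

$disabled   = similarAiResponse(['enabled' => false, 'url' => ''], $search);
$missingUrl = similarAiResponse(['enabled' => true, 'url' => ''], $search);
$healthy    = similarAiResponse(['enabled' => true, 'url' => 'https://vectors.example.test'], $search);

$json = json_encode($disabled);
```

Returning a named reason rather than an empty result lets the frontend tell "no similar artworks exist" apart from "the AI layer is switched off".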

15. Operational Runbook

# Check last calculated timestamp
SELECT id, title, last_trending_calculated_at FROM artworks ORDER BY last_trending_calculated_at DESC LIMIT 5;

# Manually trigger recalculation
php artisan skinbase:recalculate-trending --period=all

# Re-push scores to Meilisearch
php artisan skinbase:recalculate-trending --period=7d

Windowed counters look wrong after a deploy

# Force a reset and recompute
php artisan skinbase:reset-windowed-stats --period=24h
php artisan skinbase:reset-windowed-stats --period=7d

# Then recalculate trending with fresh numbers
php artisan skinbase:recalculate-trending --period=all

Meilisearch out of sync with DB

# Re-push all artworks in the trending window
php artisan skinbase:recalculate-trending --period=all

# Or full re-index
php artisan scout:import "App\Models\Artwork"

Push updated index settings (after changing config/scout.php)

php artisan scout:sync-index-settings

Inspect trending scores and windowed stats

SELECT
  a.id,
  a.title,
  a.published_at,
  s.views,
  s.views_24h,
  s.views_7d,
  s.downloads,
  s.downloads_24h,
  s.downloads_7d,
  s.favorites,
  a.trending_score_24h,
  a.trending_score_7d,
  a.last_trending_calculated_at
FROM artworks a
LEFT JOIN artwork_stats s ON s.artwork_id = a.id
WHERE a.is_public = 1 AND a.is_approved = 1
ORDER BY a.trending_score_7d DESC
LIMIT 20;

Inspect the artwork_downloads log

-- Downloads in the last 24 hours per artwork
SELECT artwork_id, COUNT(*) as dl_24h
FROM artwork_downloads
WHERE created_at >= NOW() - INTERVAL 1 DAY
GROUP BY artwork_id
ORDER BY dl_24h DESC
LIMIT 20;