Wire admin studio SSR and search infrastructure

2026-05-01 11:46:06 +02:00
parent 257b0dbef6
commit 18cea8b0f0
329 changed files with 197465 additions and 2741 deletions

@@ -0,0 +1,473 @@
# MySQL Slow Query Optimization Plan
**Source:** `/var/log/mysql/slow.log` — 68,950 total queries, 151,576s total exec time
**Period:** 2026-04-04 → 2026-04-26 (22 days)
**Threshold:** 500ms
**Server:** server3 / database: `skinbase`
---
## Summary Stats
| Metric | Value |
|---|---|
| Total queries logged | 68,950 |
| Unique query fingerprints | 139 |
| Total execution time | 151,576s (~42h) |
| Average exec time | 2s |
| 95th percentile | 3s |
| Total rows examined | 15.39B |
| Total bytes sent | 40.79GB |
---
## Priority 1 — Critical (fix immediately)
### P1-A: Correlated subquery counting artworks per tag (Query 8)
**Total time:** 1,004s · **Calls:** 1,138 · **Rows examined/call:** ~240k
**Current query:**
```sql
SELECT tags.*,
(SELECT count(*) FROM artworks
INNER JOIN artwork_tag ON artworks.id = artwork_tag.artwork_id
WHERE tags.id = artwork_tag.tag_id AND artworks.deleted_at IS NULL)
AS artworks_count
FROM tags
ORDER BY artworks_count DESC
LIMIT 10
```
**Problem:** N+1 correlated subquery — one full `artworks JOIN artwork_tag` count per tag row.
**Fix options (pick one):**
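1. **Best (cached counter column):** Add `artworks_count INT UNSIGNED NOT NULL DEFAULT 0` to `tags` and keep it in sync on `artwork_tag` attach/detach and on artwork soft-delete/restore. Eloquent fires pivot events only when the relation uses a custom pivot model, so plain `attach()`/`detach()` calls need explicit increments. The query becomes `SELECT * FROM tags ORDER BY artworks_count DESC LIMIT 10`, which an index on `artworks_count` can answer without touching `artworks`.
2. **Quick (JOIN + GROUP BY):**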
```sql
SELECT t.*, COALESCE(cnt.c, 0) AS artworks_count
FROM tags t
LEFT JOIN (
SELECT at.tag_id, COUNT(*) AS c
FROM artwork_tag at
JOIN artworks a ON a.id = at.artwork_id AND a.deleted_at IS NULL
GROUP BY at.tag_id
) cnt ON cnt.tag_id = t.id
ORDER BY artworks_count DESC
LIMIT 10;
```
**Migration needed:** `php artisan make:migration add_artworks_count_to_tags`
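If option 1 is chosen, the new column starts at 0 for every tag, so it presumably needs a one-off backfill before the fast query returns correct results. A minimal sketch, reusing the join logic from option 2; the alias `atg` and running it as a single statement are assumptions, and on a large `tags` table it may be safer to batch by `id` range:

```sql
-- One-off backfill for tags.artworks_count (column from fix option 1).
-- Tags with no live artworks get 0 via the LEFT JOIN + COALESCE.
UPDATE tags t
LEFT JOIN (
    SELECT atg.tag_id, COUNT(*) AS c
    FROM artwork_tag atg
    JOIN artworks a ON a.id = atg.artwork_id AND a.deleted_at IS NULL
    GROUP BY atg.tag_id
) cnt ON cnt.tag_id = t.id
SET t.artworks_count = COALESCE(cnt.c, 0);
```

The same statement could also serve as a periodic reconciliation job if the incremental updates ever drift.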
---
### P1-B: Correlated subquery counting artworks per user (Query 33)
**Total time:** 85s · **Calls:** 131 · **Rows examined/call:** ~61k
**Current query:**
```sql
SELECT users.*,
(SELECT count(*) FROM artworks
WHERE users.id = artworks.user_id AND is_approved=1 AND is_public=1 ...)
AS artworks_count
FROM users
HAVING artworks_count > 0
ORDER BY artworks_count DESC
LIMIT 6
```
**Problem:** Same correlated N+1 pattern per user. `HAVING` on a subquery forces a full users scan.
**Fix:** Use the existing `user_statistics` table. Add `public_artworks_count INT DEFAULT 0` if not present, maintained by artwork publish/unpublish observer. Then:
```sql
SELECT users.*, us.public_artworks_count AS artworks_count
FROM users
JOIN user_statistics us ON us.user_id = users.id
WHERE us.public_artworks_count > 0 AND users.deleted_at IS NULL
ORDER BY us.public_artworks_count DESC
LIMIT 6;
```
---
### P1-C: Jobs table LIKE scan on JSON payload (Query 38)
**Total time:** 64s · **Calls:** 58 · **Rows examined/call:** ~55k
**Current query:**
```sql
SELECT count(*) FROM jobs
WHERE payload LIKE '%AutoTagArtworkJob%' AND payload LIKE '%69756%'
```
**Problem:** Full scan of the `jobs` table JSON payload column — no index possible on LIKE '%...%'.
**Fix:** Replace this deduplication check with a Redis key or a dedicated `job_dedup` table with a composite index on `(job_class, subject_id)`. Example:
```php
// Instead of scanning the jobs table, claim a dedup key atomically.
// Cache::add() only writes when the key is absent and returns false otherwise,
// which avoids the race between a separate has() and put().
if (! Cache::add("auto-tag-queued:{$artworkId}", true, now()->addHour())) {
    return;
}
AutoTagArtworkJob::dispatch($artwork);
```
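For the `job_dedup` table variant mentioned in the fix, a rough sketch could look like the following. The column types and the `queued_at` column are assumptions, not taken from the existing schema; `69756` is the artwork id from the logged query.

```sql
-- Hypothetical dedup table: the primary key doubles as the (job_class, subject_id) index.
CREATE TABLE job_dedup (
    job_class  VARCHAR(191)    NOT NULL,
    subject_id BIGINT UNSIGNED NOT NULL,
    queued_at  TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (job_class, subject_id)
);

-- Claim the slot before dispatching.
-- 1 affected row: not queued yet, dispatch the job. 0 affected rows: already queued, skip.
INSERT IGNORE INTO job_dedup (job_class, subject_id)
VALUES ('AutoTagArtworkJob', 69756);

-- Release the slot when the job finishes or fails permanently.
DELETE FROM job_dedup
WHERE job_class = 'AutoTagArtworkJob' AND subject_id = 69756;
```

Compared with the cache key, this survives a Redis flush but needs explicit cleanup, since there is no TTL.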
---
### P1-D: `SELECT DISTINCT artwork_id` from snapshots with no index (Query 12)
**Total time:** 923s · **Calls:** 280 · **Rows examined/call:** ~3.97M (max 8.3M!)
**Current query:**
```sql
SELECT DISTINCT artwork_id
FROM artwork_metric_snapshots_hourly
WHERE bucket_hour BETWEEN '...' AND '...'
```
**Problem:** No covering index on `(bucket_hour, artwork_id)`. Full or large range scan every time.
**Fix:** Add a compound index:
```sql
ALTER TABLE artwork_metric_snapshots_hourly
ADD INDEX idx_bucket_artwork (bucket_hour, artwork_id);
```
Migration: `php artisan make:migration add_index_bucket_artwork_to_metric_snapshots`
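Once the index exists, the plan can be checked with `EXPLAIN`. The timestamps below are placeholders, since the logged query elides its actual range:

```sql
-- Placeholder range; substitute the window the application really uses.
EXPLAIN
SELECT DISTINCT artwork_id
FROM artwork_metric_snapshots_hourly
WHERE bucket_hour BETWEEN '2026-04-25 00:00:00' AND '2026-04-26 00:00:00';
```

If the index is picked up, the `key` column should name `idx_bucket_artwork` and `Extra` should include `Using index`, meaning the query is answered from the index alone.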
---
### P1-E: Full-text LIKE searches on artworks title + description (Queries 50, 53)
**Total time:** ~62s combined · **Calls:** ~48 · Pattern: `WHERE title LIKE '%keyword%' OR description LIKE '%keyword%' OR ...` (20+ OR conditions)
**Problem:** Leading wildcard LIKE cannot use B-tree indexes. Full table scan every time.
**Fix:** Use **Meilisearch** (already in use for artwork search). Route AI-tag search queries through `ArtworkSearchService` instead of raw LIKE. For any fallback that must stay in MySQL:
```sql
-- Add FULLTEXT index:
ALTER TABLE artworks ADD FULLTEXT INDEX ft_artwork_text (title, description);
-- Then use MATCH..AGAINST instead of LIKE:
WHERE MATCH(title, description) AGAINST ('+moon +lunar' IN BOOLEAN MODE)
```
---
## Priority 2 — High Impact
### P2-A: Artwork aggregate stats queries — top 2 time consumers (Queries 1 & 2)
**Total time:** 83,766s + 34,068s = **117,834s** (78% of all slow query time)
**Calls:** 37,124 + 13,789 = ~51k · **Avg:** 2.2s · **Rows examined/call:** ~215-234k
These are the same heavy SELECT pattern loading per-artwork stats from multiple tables:
```sql
SELECT a.id, a.user_id, a.published_at, a.is_public, a.is_approved,
(a.thumb_ext IS NOT NULL AND a.thumb_ext != '') AS has_thumbnail,
COALESCE(ast.views, 0) AS views_all,
COALESCE(ast.downloads, 0) AS downloads_all,
COALESCE(ast.favorites, 0) AS favourites_all,
COALESCE(cc.cnt, 0) AS comments_count,
COALESCE(sc.cnt, 0) AS shares_count,
COALESCE(ast.views_7d, 0) AS views_7d,
...
FROM artworks a
LEFT JOIN artwork_stats ast ON ast.artwork_id = a.id
LEFT JOIN (SELECT artwork_id, COUNT(*) cnt FROM artwork_favourites WHERE created_at >= ...) fav7 ...
LEFT JOIN (SELECT artwork_id, COUNT(*) cnt FROM artwork_comments ...) cc ...
LEFT JOIN (SELECT artwork_id, COUNT(*) cnt FROM artwork_shares ...) sc ...
WHERE a.is_public = 1 AND a.is_approved = 1 AND a.deleted_at IS NULL
AND a.published_at <= NOW()
ORDER BY a.id ASC
LIMIT 500
```
**Problems:**
- Inline derived subqueries for `fav7`, `cc`, `sc` run per page — not cached.
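- `ORDER BY a.id` with these filters needs a composite index that starts with the equality columns `(deleted_at, is_public, is_approved)`. Note that `published_at <= NOW()` is a range predicate, so index columns placed after `published_at` cannot supply the `id` ordering; confirm with `EXPLAIN` that no filesort remains.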
- 51k calls/22 days = ~2,300 calls/day = every ~37 seconds, all day long. This is a **scheduled job or background process** churning through artworks.
**Fix — composite index (immediate):**
```sql
ALTER TABLE artworks
ADD INDEX idx_public_approved_published (deleted_at, is_public, is_approved, published_at, id);
```
**Fix — pre-aggregate counts (medium term):**
Check whether `artwork_stats` already stores `favorites_7d`, `comments_count`, and `shares_count`. If it does (or once those columns are added), let the `artwork_stats` maintenance jobs own all counts and remove the inline derived joins. The SELECT then needs only a single `LEFT JOIN artwork_stats`.
**Fix — reduce call frequency:**
If this is a scheduler-driven scan, batch it into chunks with exponential backoff and persist cursor position so it doesn't re-scan from scratch every run.
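The "persist cursor position" idea can be sketched as a keyset scan on `id`. This is an illustration under assumptions: `@last_seen_id` is a hypothetical variable standing in for whatever cursor store the job uses, and the select list is reduced to `a.id` for brevity.

```sql
-- Hypothetical cursor: the highest id processed by the previous batch (0 on the first run).
SET @last_seen_id = 0;

SELECT a.id
FROM artworks a
WHERE a.is_public = 1 AND a.is_approved = 1 AND a.deleted_at IS NULL
  AND a.published_at <= NOW()
  AND a.id > @last_seen_id   -- resume where the previous batch stopped
ORDER BY a.id ASC
LIMIT 500;
```

The job would store the last `id` of each batch and reset the cursor to 0 only after a full pass, so each run reads at most one batch beyond the previous position.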
---
### P2-B: User stats calculation — 28 second average (Query 3)
**Total time:** 8,847s · **Calls:** 320 · **Avg:** 28s · **Max:** 180s!
Complex query joining `users`, `user_xp_logs`, `user_followers`, `artwork_likes`, `artworks`, `artwork_metric_snapshots_hourly`.
**Fix:** This should **never run on demand**. Route it to:
1. A scheduled background job that pre-aggregates into `user_statistics` (runs every N minutes).
2. The controller/service reads from `user_statistics` only — single-row lookup by `user_id`.
Also ensure `user_statistics.user_id` is the primary key or has a unique index (for example `idx_user_id`), so the lookup touches a single row.
---
### P2-C: `artwork_metric_snapshots_hourly` heavy join query (Query 4)
**Total time:** 6,328s · **Calls:** 237 · **Avg:** 27s · **Max:** 52s
```sql
SELECT artworks.*, ...
FROM artworks
JOIN artwork_metric_snapshots_hourly amsh ON amsh.artwork_id = artworks.id
JOIN artwork_likes al ON ...
JOIN artwork_downloads ad ON ...
JOIN artwork_comments ac ON ...
WHERE amsh.bucket_hour BETWEEN ... AND ...
```
**Fix:**
1. Add index from P1-D: `idx_bucket_artwork (bucket_hour, artwork_id)`.
2. Pre-aggregate hourly snapshots into daily/weekly summary tables and query those instead.
3. Reduce time range of `BETWEEN` clause if querying recent data only.
---
### P2-D: rank_artwork_scores queries — ~10 variants (Queries 26, 27, 29, 37, 39, 41, 42, 43, 46)
**Total time:** ~750s combined · ~870 combined calls · **Avg:** ~750ms · **Rows examined/call:** ~140-150k
Pattern:
```sql
SELECT ras.artwork_id, a.user_id, ras.score_trending
FROM rank_artwork_scores ras
INNER JOIN artworks a ON a.id = ras.artwork_id AND a.is_public=1 AND a.is_approved=1 AND a.deleted_at IS NULL
WHERE ras.model_version = 'rank_v2'
ORDER BY ras.score_trending DESC
LIMIT 200
```
**Problem:** `WHERE model_version = 'rank_v2' ORDER BY score_X DESC` — no composite index covers both.
**Fix (composite indexes; MySQL has no partial indexes, and the `DESC` key parts are only honored on MySQL 8.0+):**
```sql
ALTER TABLE rank_artwork_scores
ADD INDEX idx_mv_trending (model_version, score_trending DESC),
ADD INDEX idx_mv_new_hot (model_version, score_new_hot DESC),
ADD INDEX idx_mv_best (model_version, score_best DESC),
ADD INDEX idx_mv_score (model_version, score_new_hot, score_trending, score_best);
```
**Fix — cache ranking results:**
These are pre-computed ranking scores. Cache the top-200 list per `(model_version, score_column)` for 5-15 minutes in Redis. The ranking job already runs on a schedule, so warm the cache at the end of each ranking job run.
---
### P2-E: `artworks` public count via full scan (Query 30)
**Total time:** 100s · **Calls:** 127 · **Avg:** 790ms · **Rows examined/call:** 97k
```sql
SELECT count(*) FROM artworks
WHERE deleted_at IS NULL AND is_approved=1 AND is_public=1
AND published_at IS NOT NULL AND published_at <= NOW()
```
**Fix — maintain a counter cache:**
```php
// In config or cache:
Cache::remember('artworks.public_count', 300, fn() => Artwork::public()->published()->count());
```
Or store the count in a `site_statistics` / `system_settings` table, updated by the publish observer.
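For the table-backed variant, a possible shape is a small key/value table refreshed by a scheduled job. The table layout and column names below are assumptions; only the `WHERE` clause comes from the logged query.

```sql
-- Hypothetical key/value stats table.
CREATE TABLE site_statistics (
    stat_key   VARCHAR(64)     NOT NULL PRIMARY KEY,
    stat_value BIGINT UNSIGNED NOT NULL,
    updated_at TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP
                               ON UPDATE CURRENT_TIMESTAMP
);

-- Scheduled refresh: the expensive count runs once per interval instead of per request.
INSERT INTO site_statistics (stat_key, stat_value)
SELECT 'artworks.public_count', COUNT(*)
FROM artworks
WHERE deleted_at IS NULL AND is_approved = 1 AND is_public = 1
  AND published_at IS NOT NULL AND published_at <= NOW()
ON DUPLICATE KEY UPDATE stat_value = VALUES(stat_value);

-- Request path: single primary-key lookup.
SELECT stat_value FROM site_statistics WHERE stat_key = 'artworks.public_count';
```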
---
### P2-F: artworks ORDER BY id scanner for publish pipeline (Queries 18, 22, 23, 40, 47)
**Total time:** ~650s combined · Many calls · Pattern: `SELECT * FROM artworks WHERE ... ORDER BY id ASC` or `ORDER BY trending_score_7d DESC`
**Problem:** `SELECT *` loads all columns including large `description` blobs. The publish/ranking pipeline only needs IDs.
**Fix:**
- Use `SELECT id` or `SELECT id, user_id` instead of `SELECT *`.
- Ensure `trending_score_7d` is indexed if using `ORDER BY trending_score_7d`.
- Add index from P2-A: `idx_public_approved_published`.
---
## Priority 3 — Medium Impact
### P3-A: Popular tags query with tag_interaction_daily_metrics (Query 6)
**Total time:** 2,234s · **Calls:** 2,349 · **Avg:** 951ms
Joining `artworks → artwork_tag → tags → tag_interaction_daily_metrics`. 246k rows examined/call.
**Fix:**
- Cache the result: popular tags change slowly, so cache for 5-15 minutes.
- Add index on `tag_interaction_daily_metrics (tag_id, metric_date)`.
- Precompute `tag_interaction_daily_metrics` aggregates into a `tag_trending_scores` table.
---
### P3-B: Browse/gallery category + tag joins (Query 7)
**Total time:** 1,758s · **Calls:** 2,344 · **Avg:** 750ms
`artworks + categories + artwork_category + artwork_tag` — 80M total rows examined.
**Fix:**
- Verify indexes: `artwork_category(category_id, artwork_id)`, `artwork_tag(tag_id, artwork_id)`.
- Pagination: ensure cursor/keyset pagination is used, not `OFFSET`.
- Cache browse results per category (already partially done in HomepageService).
---
### P3-C: artwork_metric_snapshots_hourly backup full scan (Query 15)
**Total time:** 646s · **Calls:** 93 · **Max:** 127s · **User:** `backuper`
`SELECT /*!40001 SQL_NO_CACHE */ * FROM artwork_metric_snapshots_hourly` — mysqldump reading full table.
**Fix (ops):**
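- Partition `artwork_metric_snapshots_hourly` by month on `bucket_hour` and dump only the active partition, for example with a `mysqldump --where` filter on `bucket_hour`. A plain full-table dump still reads every partition.
- Or: exclude this table from the hot backup and back it up separately during a low-traffic window (02:00-04:00 UTC).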
- Archive data older than 90 days to a cold table.
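Monthly partitioning would also make the 90-day archive step cheap, because an old month can be copied and dropped as a unit. The sketch below assumes `bucket_hour` is a `DATETIME` column (`RANGE COLUMNS` does not accept `TIMESTAMP`) and is part of every unique key on the table, which MySQL requires for partitioning. Partition names and the `_archive` table are hypothetical.

```sql
-- Assumption: bucket_hour is DATETIME and included in the primary key.
ALTER TABLE artwork_metric_snapshots_hourly
PARTITION BY RANGE COLUMNS (bucket_hour) (
    PARTITION p2026_01 VALUES LESS THAN ('2026-02-01'),  -- also holds anything older
    PARTITION p2026_02 VALUES LESS THAN ('2026-03-01'),
    PARTITION p2026_03 VALUES LESS THAN ('2026-04-01'),
    PARTITION p2026_04 VALUES LESS THAN ('2026-05-01'),
    PARTITION p_future VALUES LESS THAN (MAXVALUE)
);

-- Archive one old month into a hypothetical cold table with the same columns, then drop it.
INSERT INTO artwork_metric_snapshots_hourly_archive
SELECT * FROM artwork_metric_snapshots_hourly PARTITION (p2026_01);

ALTER TABLE artwork_metric_snapshots_hourly DROP PARTITION p2026_01;
```

Repartitioning rebuilds the table, so on a table this size it would likely need a maintenance window.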
---
### P3-D: `artworks LEFT JOIN artwork_stats` with OR condition (Query 32)
**Total time:** 86s · **Calls:** 110
```sql
WHERE (artworks.created_at >= '...' OR (s.ranking_score IS NOT NULL AND s.ranking_score > 0))
```
**Problem:** OR prevents index usage on `created_at`.
**Fix:** Rewrite as UNION:
```sql
SELECT id FROM artworks WHERE created_at >= '...' AND deleted_at IS NULL AND is_approved=1
UNION
SELECT a.id FROM artworks a JOIN artwork_stats s ON s.artwork_id = a.id
WHERE s.ranking_score > 0 AND a.deleted_at IS NULL AND a.is_approved=1
```
---
### P3-E: GROUP BY user_id from artworks (Query 31)
**Total time:** 92s · **Calls:** 110
```sql
SELECT a.user_id, COALESCE(us.followers_count, 0), COALESCE(us.favorites_received_count, 0)
FROM artworks a
LEFT JOIN user_statistics us ON us.user_id = a.user_id
WHERE a.is_public=1 AND a.is_approved=1 AND a.deleted_at IS NULL
GROUP BY a.user_id, us.followers_count, us.favorites_received_count
```
**Problem:** Full artworks scan + GROUP BY without a covering index.
**Fix:** Add index `(deleted_at, is_public, is_approved, user_id)` on artworks. Or pre-aggregate into `user_statistics` and query that directly without touching `artworks`.
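The pre-aggregated variant might reduce to the query below. It assumes the `public_artworks_count` column proposed in P1-B exists, is kept current, and uses the same definition of "public" as the filters above. Users without a `user_statistics` row would be omitted, whereas the original query returns them with zeros.

```sql
-- No artworks scan and no GROUP BY: one row per user already.
SELECT us.user_id, us.followers_count, us.favorites_received_count
FROM user_statistics us
WHERE us.public_artworks_count > 0;
```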
---
### P3-F: Forum posts moderation scan (Query 62)
**Total time:** 12s · **Calls:** 15
```sql
SELECT * FROM forum_posts
WHERE (moderation_checked = 0 OR last_ai_scan_at IS NULL OR updated_at > last_ai_scan_at)
AND id IS NOT NULL AND deleted_at IS NULL
ORDER BY id ASC LIMIT 50
```
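**Fix:** MySQL has no partial indexes (`ADD INDEX ... WHERE` is PostgreSQL syntax), so add a regular composite index. Because the `OR` also includes `updated_at > last_ai_scan_at`, a column-to-column comparison that no index can serve, the optimizer may still scan the table:
```sql
ALTER TABLE forum_posts
ADD INDEX idx_needs_moderation (moderation_checked, last_ai_scan_at, id);
```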
Or maintain a `moderation_queue` table with only pending IDs.
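A `moderation_queue` table along these lines would turn the scan into a short join. The column names are assumptions and `12345` is a placeholder post id:

```sql
-- Hypothetical queue holding only posts that still need a moderation pass.
CREATE TABLE moderation_queue (
    forum_post_id BIGINT UNSIGNED NOT NULL PRIMARY KEY,
    queued_at     TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Enqueue when a post is created or edited; IGNORE keeps this idempotent.
INSERT IGNORE INTO moderation_queue (forum_post_id) VALUES (12345);

-- Scanner: next 50 pending posts, in the same order as the original query.
SELECT fp.*
FROM moderation_queue mq
JOIN forum_posts fp ON fp.id = mq.forum_post_id
WHERE fp.deleted_at IS NULL
ORDER BY mq.forum_post_id ASC
LIMIT 50;

-- Dequeue once the post has been scanned.
DELETE FROM moderation_queue WHERE forum_post_id = 12345;
```

The queue stays small regardless of how large `forum_posts` grows, which is what removes the need for the three-way `OR`.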
---
## Recommended Index Migrations
Apply these in order (fastest wins first):
```bash
php artisan make:migration add_performance_indexes_batch1
```
```php
public function up(): void
{
    // P1-D: snapshot bucket+artwork lookup
    Schema::table('artwork_metric_snapshots_hourly', function (Blueprint $table) {
        $table->index(['bucket_hour', 'artwork_id'], 'idx_bucket_artwork');
    });

    Schema::table('artworks', function (Blueprint $table) {
        // P2-A + P2-F: public artwork scans & ORDER BY id
        // (same index name as the ALTER TABLE in P2-A)
        $table->index(
            ['deleted_at', 'is_public', 'is_approved', 'published_at', 'id'],
            'idx_public_approved_published'
        );
        // P3-E: GROUP BY user_id over public artworks
        $table->index(
            ['deleted_at', 'is_public', 'is_approved', 'user_id'],
            'idx_public_approved_user'
        );
    });

    // P2-D: rank_artwork_scores per model_version
    Schema::table('rank_artwork_scores', function (Blueprint $table) {
        $table->index(['model_version', 'score_trending'], 'idx_mv_trending');
        $table->index(['model_version', 'score_new_hot'], 'idx_mv_new_hot');
        $table->index(['model_version', 'score_best'], 'idx_mv_best');
        // Combined index over all three scores, as in the P2-D ALTER TABLE
        $table->index(
            ['model_version', 'score_new_hot', 'score_trending', 'score_best'],
            'idx_mv_score'
        );
    });

    // P3-A: tag daily metrics
    Schema::table('tag_interaction_daily_metrics', function (Blueprint $table) {
        $table->index(['tag_id', 'metric_date'], 'idx_tag_date');
    });

    // P1-A: tag artworks_count cached column
    Schema::table('tags', function (Blueprint $table) {
        $table->unsignedInteger('artworks_count')->default(0)->after('slug');
        $table->index('artworks_count', 'idx_artworks_count');
    });
}
```
---
## Caching Quick Wins (no schema change needed)
| Surface | Cache Key | TTL | Notes |
|---|---|---|---|
| Popular tags | `tags.popular.10` | 15 min | Currently: correlated subquery per request |
| Ranking top-200 lists | `rank.{model}.{score}.200` | 10 min | Warm at end of ranking job |
| Public artwork count | `artworks.public_count` | 5 min | Used in sitemaps, stats |
| User artworks_count | `user.{id}.artworks_count` | 5 min | Warm on publish/unpublish |
| Group leaderboard | `leaderboard.group.monthly.5` | 30 min | Already in leaderboards service |
---
## Implementation Roadmap
| Phase | Items | Effort | Expected gain |
|---|---|---|---|
| **Week 1** | P1-A correlated tag query, P1-C jobs LIKE, P1-D snapshot index, all P2-D rank indexes | Low-Med | ~15-20% reduction in slow queries |
| **Week 2** | P2-A artworks composite index, P1-B user artworks_count, caching quick wins table | Med | ~40-50% reduction; kills #1 and #2 slow query families |
| **Week 3** | P2-B user stats background job, P2-C snapshot pre-aggregation, P1-E fulltext index | High | ~65-75% reduction; kills 28s queries |
| **Month 2** | P2-F SELECT * → SELECT id, P3-C partition snapshots table, P3-D OR→UNION rewrite | High | Remaining tail |
---
## Monitoring After Changes
```bash
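# Rotate the slow query log so the next report only covers post-change traffic.
# FLUSH SLOW LOGS reopens the log file but does not truncate it, so move the old file first.
mv /var/log/mysql/slow.log /var/log/mysql/slow.log.before
mysql -e "FLUSH SLOW LOGS;"
# Re-run pt-query-digest after 1 week:
pt-query-digest /var/log/mysql/slow.log > /tmp/slow-report-after.txt
# Check query plan for top queries (run in the mysql client, not the shell):
# EXPLAIN SELECT ... \G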
```
Key metrics to watch:
- Total slow query count per day (target: -50% in week 2)
- `Rows_examined` for artwork queries (target: <10k instead of 234k)
- MySQL CPU usage during ranking job windows
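Between slow-log reports, the rows-examined metric can also be sampled live from `performance_schema`, assuming it is enabled on server3 (it is by default on recent MySQL versions). A sketch:

```sql
-- Top statement digests by total time for the skinbase schema.
-- SUM_TIMER_WAIT is in picoseconds, hence the division by 1e12.
SELECT
    DIGEST_TEXT,
    COUNT_STAR                                    AS calls,
    ROUND(SUM_TIMER_WAIT / 1e12, 1)               AS total_seconds,
    ROUND(SUM_ROWS_EXAMINED / COUNT_STAR)         AS rows_examined_per_call
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME = 'skinbase'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
```

Unlike the slow log, this includes statements under the 500ms threshold, so the numbers will not match the pt-query-digest report one to one.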