Daily Digest, 2026-05-12

Theme: Full-stack advances in industrial recommender systems, plus deeper work on both LLM scaling and geometric theory

Tags: semantic-id · industrial · parameter-scaling · pretrained-lm · recursive-depth

📊 Stats: 10 papers · 6 deep reads · 🏢 industry 3 · 🎓 academia 7 · llm 3 · discriminative-rec 3 · generative-rec 1 · other 3

Overview

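Today's digest covers 10 papers, 6 read in depth and 4 skimmed, spanning LLMs (3), discriminative and generative recommendation (4), and systems/retrieval (3); 3 come from industry and 7 from academia. Arena Physica's "Practical Scaling Laws" proposes an 8-parameter closed form L(N,D,T) that decomposes loss into undercapacity, undertraining, and overfitting terms, cuts extrapolation RMSE by 49% on average across 5 public LLM grids, and derives a cost-optimal allocation driven by the data-to-compute price ratio. Kuaishou's UxSID opens a "third paradigm" for ultra-long sequence modeling: it uses the target SID as a semantic routing key to index interest memories that are compressed offline and keyed by (UID, SID), giving O(1) online inference; in an ads A/B test on a platform with 400 million users it delivers +0.337% Revenue for only 0.16 ms of added latency. Baidu's LASAR is the first to port Coconut-style recursive hidden-state latent reasoning in full to decoder-only generative recommendation; combined with two-stage decoupling and REINFORCE-based adaptive reasoning depth, it is SOTA on nearly all results across three Amazon datasets while running about 20× faster than explicit CoT. Xiaohongshu's CCD-level thread orchestration delivers 1.4–3.7× throughput and 30–90% latency improvements for HNSW/IVF on chiplet-based multi-core CPUs. On the academic side, "Geometric Wall" applies Fisher-Rao information geometry to 844 Gemma Scope checkpoints and shows that layerwise differences in SAE reconstruction are determined by the manifold's intrinsic dimension and multi-scale curvature, with the geometric regression transferring between the 2B and 9B models at R²>0.92. Overall, the industrial thread closes the loop around Semantic ID, ultra-long sequences, and system-level throughput, while the theoretical thread moves "scaling" from empirical fitting toward interpretable bounds driven by geometry and price ratios; the intersection of latent reasoning and manifold-aware scaling is worth continued attention.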

Featured Papers

Practical Scaling Laws: Converting Compute into Performance in a Data-Constrained World · ⭐ 9/10

🎓 Academia · LLM

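Proposes an 8-parameter closed-form extension L(N,D,T)=E+(L₀−E)h/(1+h) that decomposes loss into undercapacity, undertraining, and overfitting terms and uses a saturating wrapper to bound it within [E, L₀]; achieves SOTA extrapolation across 4 architecture domains and 5 public LLM grids, and gives a closed-form cost-optimal allocation driven by the data-vs-compute price ratio.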
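
To make the saturating wrapper concrete, here is a minimal Python sketch of the outer form E+(L₀−E)h/(1+h) only. How h is assembled from N, D, and T is not stated in this digest, so the sketch takes h as a plain nonnegative input; the function name `saturating_loss` and the sample values are illustrative assumptions, not taken from the paper.

```python
def saturating_loss(h: float, E: float, L0: float) -> float:
    """Map a nonnegative excess term h into [E, L0) via E + (L0 - E) * h / (1 + h).

    h is treated as an opaque input here; its dependence on (N, D, T)
    is not reproduced in this sketch.
    """
    if h < 0:
        raise ValueError("h must be nonnegative")
    return E + (L0 - E) * h / (1.0 + h)


if __name__ == "__main__":
    # Hypothetical floor E and ceiling L0, chosen only for illustration.
    for h in (0.0, 0.5, 1.0, 10.0, 1e6):
        print(h, saturating_loss(h, E=2.0, L0=10.0))
```

With h = 0 the loss sits at the floor E, and as h grows the loss approaches L₀ without reaching it, which is what keeps extrapolated values inside the stated interval.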

UxSID · ⭐ 9/10

UxSID: Semantic-Aware User Interests Modeling for Ultra-Long Sequence

🏢 Kuaishou · Discriminative recommendation

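UxSID proposes a third paradigm for ULSM (ultra-long sequence modeling): it uses the target SID as a semantic routing key, indexes offline-compressed user interest memories by (UID, SID), and fetches them online in O(1). In a one-week A/B test on Kuaishou's ads platform with 400 million users, it delivers +0.337% Revenue while adding only 0.16 ms of latency.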
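
The (UID, SID)-keyed memory idea can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: SIDs are simplified to single integers, mean pooling stands in for whatever offline compression UxSID actually uses, and the names `build_memory` and `lookup` are hypothetical.

```python
from collections import defaultdict


def build_memory(histories, item_sid, item_emb):
    """Offline step: group each user's history by SID and compress each group.

    Mean pooling is a placeholder for the real compression, which this
    digest does not describe.
    """
    sums = {}
    counts = defaultdict(int)
    for uid, items in histories.items():
        for item in items:
            key = (uid, item_sid[item])
            emb = item_emb[item]
            if key not in sums:
                sums[key] = [0.0] * len(emb)
            sums[key] = [a + b for a, b in zip(sums[key], emb)]
            counts[key] += 1
    return {key: [v / counts[key] for v in vec] for key, vec in sums.items()}


def lookup(memory, uid, target_sid, dim):
    """Online step: one hash lookup keyed by the target item's SID."""
    return memory.get((uid, target_sid), [0.0] * dim)


if __name__ == "__main__":
    histories = {"u1": ["a", "b", "c"]}
    item_sid = {"a": 7, "b": 7, "c": 9}
    item_emb = {"a": [1.0, 0.0], "b": [3.0, 2.0], "c": [5.0, 5.0]}
    memory = build_memory(histories, item_sid, item_emb)
    print(lookup(memory, "u1", 7, dim=2))
```

The online cost is a single dictionary lookup regardless of how long the user's history is, which is the property the O(1) claim refers to; all sequence-length-dependent work happens in the offline step.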

LASAR · ⭐ 8/10

LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation

🏢 Baidu · Generative recommendation

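Ports Coconut-style latent reasoning with recursive hidden-state feedback, in full and for the first time, to mainstream decoder-only generative recommendation: two-stage decoupling (SID alignment first, then the latent loop) + per-step bidirectional KL alignment to explicit CoT segments + a Policy Head trained with REINFORCE to adapt the number of reasoning steps per sample. It is SOTA on nearly all results across three Amazon datasets and about 20× faster than generating explicit CoT.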
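
The control flow of a latent loop with per-sample adaptive depth can be sketched roughly as below. Everything here is an illustrative assumption: `toy_step` and `toy_halt_prob` are stand-ins for the decoder pass and the Policy Head, the halting rule is a fixed threshold rather than a sampled action, and none of the training machinery (KL alignment, REINFORCE) is shown.

```python
def latent_reasoning(h0, step_fn, halt_prob_fn, max_steps=8, threshold=0.5):
    """Feed the hidden state back into step_fn until the halting head fires
    or max_steps is reached. Returns the final state and the step count."""
    h = list(h0)
    steps = 0
    while steps < max_steps:
        if halt_prob_fn(h) >= threshold:
            break
        h = step_fn(h)
        steps += 1
    return h, steps


def toy_step(h):
    """Hypothetical stand-in for one latent pass: halve every component."""
    return [0.5 * x for x in h]


def toy_halt_prob(h):
    """Hypothetical stand-in for a policy head: more willing to stop
    when the state has small magnitude."""
    return 1.0 - min(1.0, max(abs(x) for x in h))


if __name__ == "__main__":
    print(latent_reasoning([0.8, -0.4], toy_step, toy_halt_prob))
    print(latent_reasoning([0.1], toy_step, toy_halt_prob))
```

In this toy setup the second input stops immediately while the first takes one latent step, which mirrors the idea that different samples get different reasoning depths without any tokens being generated in between.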

CCD-Level and Load-Aware Thread Orchestration for In-Memory Vector ANNS on Multi-Core CPUs · ⭐ 8/10

🏢 Xiaohongshu · Other

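Targets the bottleneck on chiplet-era multi-CCD CPUs where adding cores does not add throughput for vector ANNS, and proposes a unified thread orchestration framework that combines hot-cold balanced mapping, CCD-topology-aware task stealing, and snapshot remapping; on RedNote's production HNSW/IVF services it achieves 1.4–3.7× throughput and 30–90% improvements in P50/P999 latency.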
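
One plausible reading of CCD-topology-aware task stealing is "steal from a core on the same CCD first, and cross CCDs only when the local ones are empty". The sketch below implements just that victim-selection rule; the paper's actual policy is not described in this digest, and the name `pick_victim`, the integer core ids, and the tie-breaking rule are assumptions made for illustration.

```python
def pick_victim(thief, queue_len, core_ccd):
    """Return the core to steal from, preferring the thief's own CCD.

    queue_len maps core id -> pending task count; core_ccd maps core id -> CCD id.
    Returns None when no other core has pending work.
    """
    candidates = [c for c in queue_len if c != thief and queue_len[c] > 0]
    if not candidates:
        return None
    local = [c for c in candidates if core_ccd[c] == core_ccd[thief]]
    pool = local if local else candidates
    # Longest queue wins; ties go to the lowest core id.
    return max(pool, key=lambda c: (queue_len[c], -c))


if __name__ == "__main__":
    core_ccd = {0: 0, 1: 0, 2: 1, 3: 1}
    print(pick_victim(0, {0: 0, 1: 2, 2: 9, 3: 0}, core_ccd))
    print(pick_victim(0, {0: 0, 1: 0, 2: 9, 3: 4}, core_ccd))
```

Preferring a same-CCD victim even when a remote queue is longer trades some load balance for cache locality, since cores on one CCD typically share an L3 cache while cross-CCD accesses are slower.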

The Geometric Wall: Manifold Structure Predicts Layerwise Sparse Autoencoder Scaling Laws · ⭐ 7/10

🎓 Academia · LLM

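Uses pullback information geometry to attribute layerwise differences in SAE reconstruction error to the intrinsic dimension and multi-scale curvature of the activation manifold: fits a geometry-conditioned scaling law on 844 Gemma Scope checkpoints, with geometric regression coefficients transferring between 2B and 9B at R²>0.92, and argues that what SAEs run into is not a finite-compute ceiling but a geometric wall set by manifold geometry.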

All Papers

Model	Title	Category	Affiliation	Abstract score	Deep-read score
—	Practical Scaling Laws: Converting Compute into Performance in a Data-Constrained World	LLM	🎓 Academia	8	9
UxSID	UxSID: Semantic-Aware User Interests Modeling for Ultra-Long Sequence	Discriminative	🏢 Kuaishou	8	9
LASAR	LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation	Generative	🏢 Baidu	8	8
—	CCD-Level and Load-Aware Thread Orchestration for In-Memory Vector ANNS on Multi-Core CPUs	Other	🏢 Xiaohongshu	0	8
—	Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes	LLM	🎓 Academia	0	7
—	The Geometric Wall: Manifold Structure Predicts Layerwise Sparse Autoencoder Scaling Laws	LLM	🎓 Academia	0	7
—	A General Framework for Multimodal LLM-Based Multimedia Understanding in Large-Scale Recommendation Systems	Discriminative	🎓 Academia	6	—
CVA	Compressed Video Aggregator: Content-driven Module for Efficient Micro-Video Recommendation	Discriminative	🎓 Academia	6	—
NumColBERT	NumColBERT: Non-Intrusive Numeracy Injection for Late-Interaction Retrieval Models	Other	🎓 Academia	4	—
Reddit2Deezer	Reddit2Deezer: A Scalable Dataset for Real-World Grounded Conversational Music Recommendation	Other	🎓 Academia	4	—