1. Paper
2. Highlights
3. Method At A Glance
4. Repository Structure
5. Installation
6. Data
7. Quick Start
8. Reproducing Results
9. Configuration Notes
10. Experimental Highlights
11. Notes For Maintainers
12. Citation
13. Contact

Official PyTorch implementation for FuXi-beta: Towards a Lightweight and Fast Large-Scale Generative Recommendation Model.

1. Paper

Yufei Ye, Wei Guo, Hao Wang, Hong Zhu, Yuyang Ye, Yong Liu, Huifeng Guo, Ruiming Tang, Defu Lian, and Enhong Chen. FuXi-beta: Towards a Lightweight and Fast Large-Scale Generative Recommendation Model. arXiv preprint arXiv:2508.10615, 2025.

FuXi-beta studies efficiency bottlenecks in large-scale generative recommendation models and proposes a lightweight design for faster training and inference. The repository provides the PyTorch implementation and public MovieLens experiment configs.

2. Highlights

3. Method At A Glance

[Image: FuXi-beta method overview]

FuXi-beta analyzes bottlenecks from relative temporal attention bias and query-key attention-map computation, then replaces expensive operations with a lightweight token-mixing path.

4. Repository Structure

.
├── configs/                                      # MovieLens experiment configs
├── generative_recommenders/modeling/sequential/  # FuXi-beta and baseline encoders
├── generative_recommenders/trainer/              # Training pipeline
├── main.py                                       # Distributed training entry
├── preprocess_public_data.py                     # MovieLens preprocessing
├── requirements.txt
└── docs/                                         # GitHub Pages project page

The FuXi-beta model code is under generative_recommenders/modeling/sequential/fuxi_beta.py.

5. Installation

Install PyTorch following the official instructions for your CUDA environment, then install dependencies:

pip install -r requirements.txt

The original quick setup used:

pip3 install gin-config absl-py scikit-learn scipy matplotlib numpy apex hypothesis pandas fbgemm_gpu iopath
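
After installing, it can help to confirm that the packages are actually importable before launching a long run. The following is a minimal sketch, not part of the repository: it assumes the usual import names for the packages listed above (for example `gin` for gin-config, `absl` for absl-py, `sklearn` for scikit-learn), and the helper name `missing_modules` is hypothetical.

```python
from importlib.util import find_spec

# Top-level import names for the packages in the pip command above, plus torch.
# Some differ from their pip names: gin-config -> gin, absl-py -> absl,
# scikit-learn -> sklearn.
REQUIRED_MODULES = [
    "torch", "gin", "absl", "sklearn", "scipy", "matplotlib", "numpy",
    "apex", "hypothesis", "pandas", "fbgemm_gpu", "iopath",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

The check only locates each module and does not import it, so it will not catch CUDA or ABI mismatches in packages such as fbgemm_gpu or apex.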

6. Data

Prepare the public MovieLens data:

mkdir -p tmp/
python3 preprocess_public_data.py

7. Quick Start

Run FuXi-beta on MovieLens-1M:

CUDA_VISIBLE_DEVICES=0 python3 main.py \
  --gin_config_file=configs/ml-1m/fuxi-beta-sampled-softmax-n128-final.gin \
  --master_port=12345

Other configurations are included in configs/ml-1m/ and configs/ml-20m/.
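
To run several configurations in turn, the Quick Start command can be assembled programmatically. This is a small sketch under stated assumptions: it reuses only the `--gin_config_file` and `--master_port` flags and the `CUDA_VISIBLE_DEVICES` variable shown above, expects to be run from the repository root, and the helper names `build_launch` and `launch` are hypothetical rather than part of the codebase.

```python
import os
import subprocess


def build_launch(config_path, gpu_ids="0", master_port=12345):
    """Return (argv, env) mirroring the Quick Start command."""
    argv = [
        "python3", "main.py",
        f"--gin_config_file={config_path}",
        f"--master_port={master_port}",
    ]
    # Copy the current environment and restrict the visible GPUs.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_ids)
    return argv, env


def launch(config_path, gpu_ids="0", master_port=12345, dry_run=True):
    """Print the command; run it only when dry_run is False."""
    argv, env = build_launch(config_path, gpu_ids, master_port)
    print(f"CUDA_VISIBLE_DEVICES={gpu_ids} " + " ".join(argv))
    if not dry_run:
        subprocess.run(argv, env=env, check=True)
    return argv


if __name__ == "__main__":
    # Dry run: prints the command without starting training.
    launch("configs/ml-1m/fuxi-beta-sampled-softmax-n128-final.gin")
```

Passing `dry_run=False` starts training. Giving each concurrent run its own `master_port` avoids port collisions.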

8. Reproducing Results

A GPU with 24 GB or more of HBM should be sufficient for most public MovieLens settings.

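Training logs are written to exps/ by default. Launch TensorBoard on the run you want to inspect, adjusting the log directory to where your checkout lives. Both commands below use port 24001, so run one at a time or change the port for the second: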

tensorboard --logdir ~/generative-recommenders/exps/ml-1m-l200/ --port 24001 --bind_all
tensorboard --logdir ~/generative-recommenders/exps/ml-20m-l200/ --port 24001 --bind_all

9. Configuration Notes

10. Experimental Highlights

[Image: FuXi-beta public and industrial results]

[Image: FuXi-beta ablation and compatibility results]

The paper tables above show the accuracy-cost tradeoff of the lightweight design: FuXi-beta is compared against FuXi-alpha, HSTU, and SASRec variants, and then analyzed through attention and compatibility ablations.

FuXi-beta is positioned as a lightweight successor to heavier generative recommendation models. The code includes both FuXi-alpha and FuXi-beta components for direct architectural comparison.

| Finding | Paper evidence | Takeaway |
| --- | --- | --- |
| Industrial accuracy | On large-scale industrial datasets, FuXi-beta reports +27% to +47% NDCG@10 compared with FuXi-alpha. | The lightweight design is not only an efficiency change. |
| Public benchmark behavior | The paper reports performance comparable to prior state of the art on public datasets while significantly reducing training time. | FuXi-beta targets a better accuracy-cost balance. |
| Query-key attention ablation | Removing the query-key attention map improves MovieLens-1M NDCG@10 from 0.1871 to 0.1947 and reduces relative time from 1.000 to 0.951; on MovieLens-20M, NDCG@10 moves from 0.2097 to 0.2117 and relative time from 1.000 to 0.842. | The paper's efficiency claim is tied to a concrete architectural simplification. |
| Temporal attention ablation | Removing temporal attention lowers MovieLens-20M NDCG@10 from 0.2097 to 0.1863. | The lightweight model still depends on explicit temporal information. |
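
The ablation figures above are easier to compare as relative changes. The short sketch below does that arithmetic using only the numbers quoted in this section; the helper name `relative_change_pct` is introduced here for illustration and does not come from the repository.

```python
def relative_change_pct(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (label, value before the ablation, value after), as quoted in this README.
ABLATIONS = [
    ("ML-1M, no query-key map, NDCG@10", 0.1871, 0.1947),
    ("ML-1M, no query-key map, relative time", 1.000, 0.951),
    ("ML-20M, no query-key map, NDCG@10", 0.2097, 0.2117),
    ("ML-20M, no query-key map, relative time", 1.000, 0.842),
    ("ML-20M, no temporal attention, NDCG@10", 0.2097, 0.1863),
]

for label, before, after in ABLATIONS:
    print(f"{label}: {relative_change_pct(before, after):+.2f}%")
```

By this calculation, dropping the query-key attention map changes NDCG@10 by about +4.06% on MovieLens-1M and +0.95% on MovieLens-20M while cutting relative time by 4.9% and 15.8%. Dropping temporal attention lowers MovieLens-20M NDCG@10 by about 11.16%.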

Conclusion: FuXi-beta keeps the useful temporal structure from FuXi-alpha while removing expensive attention components that are less helpful for recommendation.

11. Notes For Maintainers

12. Citation

If you find FuXi-beta useful, please cite:

@article{ye2025fuxibeta,
  title={FuXi-beta: Towards a Lightweight and Fast Large-Scale Generative Recommendation Model},
  author={Ye, Yufei and Guo, Wei and Wang, Hao and Zhu, Hong and Ye, Yuyang and Liu, Yong and Guo, Huifeng and Tang, Ruiming and Lian, Defu and Chen, Enhong},
  journal={arXiv preprint arXiv:2508.10615},
  year={2025}
}

13. Contact