ICLR 2026
SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration
Unified framework for real-time VLA inference: model scheduling + token pruning
Ye Li, Yuan Meng, Zewen Sun, Kangye Ji, Chen Tang, Jiajun Fan,
Xinzhu Ma, Shu-Tao Xia, Zhi Wang, Wenwu Zhu
Tsinghua University & UIUC
TL;DR: VLA models are powerful but too slow for real-time robotics. SP-VLA jointly addresses temporal redundancy (via action-aware model scheduling) and spatial redundancy (via spatio-semantic token pruning), achieving significant speedup while maintaining or improving task performance.
Two Complementary Mechanisms
Action-Aware Model Scheduling
Categorizes VLA actions as deliberative (complex, requiring the full VLA) or intuitive (routine, handled by a lightweight action generator), and dynamically switches between the two models to reduce temporal redundancy, inspired by the human interplay of intuition and deliberation.
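The routing idea can be sketched in a few lines of pure Python. Everything below is a hypothetical stand-in: the paper's scheduler is action-aware and learned, not this simple variation heuristic, and `full_vla` / `light_generator` are placeholder callables.

```python
def is_deliberative(action_history, threshold=0.1):
    """Hypothetical proxy for the paper's action-aware criterion: large
    recent action variation suggests a complex, deliberative phase, while
    near-constant actions suggest a routine, intuitive phase."""
    if len(action_history) < 2:
        return True  # no history yet: conservatively use the full VLA
    diffs = [abs(cur - prev)
             for prev_a, cur_a in zip(action_history, action_history[1:])
             for prev, cur in zip(prev_a, cur_a)]
    return sum(diffs) / len(diffs) > threshold

def schedule_step(obs, action_history, full_vla, light_generator):
    """Route one control step to the full VLA model or the lightweight
    action generator, returning the action and which model produced it."""
    if is_deliberative(action_history):
        return full_vla(obs), "vla"
    return light_generator(obs, action_history), "light"
```

The speedup comes from the "light" branch: during routine motion segments the expensive VLA forward pass is skipped entirely, raising the effective control frequency.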
Spatio-Semantic Token Pruning
Scores each visual token by both spatial importance and semantic relevance, then prunes redundant tokens before VLA inference, reducing spatial redundancy in the visual input without discarding task-critical information.
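A minimal sketch of the pruning step, assuming each visual token already carries a spatial-importance score and a semantic-relevance score (e.g. a text-to-patch similarity). The blend weight `alpha` and the scoring function are hypothetical simplifications of the paper's spatio-semantic criterion.

```python
def prune_tokens(tokens, spatial_score, semantic_score,
                 keep_ratio=0.5, alpha=0.5):
    """Keep the tokens scoring highest on a blend of spatial importance
    and semantic relevance; drop the rest before VLA inference.
    Returns the kept tokens (original order preserved) and their indices."""
    blended = [alpha * s + (1 - alpha) * t
               for s, t in zip(spatial_score, semantic_score)]
    k = max(1, int(len(tokens) * keep_ratio))
    top = sorted(range(len(tokens)), key=lambda i: blended[i], reverse=True)[:k]
    keep = sorted(top)  # restore original token order for the sequence model
    return [tokens[i] for i in keep], keep
```

With `keep_ratio=0.5`, roughly half of the visual tokens reach the VLA backbone, which is where the spatial-redundancy savings come from; attention cost shrinks with the square of the kept sequence length.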
Results
1.5×
Lossless speedup
on LIBERO benchmark
2.4×
Speedup
on SimplerEnv
+6%
Avg performance gain
over baseline VLA
2.2×
Inference frequency
improvement (SimplerEnv)
Publication Journey
Oct 2025
Submitted to ICLR 2026
Submitted to The Fourteenth International Conference on Learning Representations (Submission #944).
Jan 2026
Accepted at ICLR 2026 (Poster)
Apr 2026
Presented at ICLR 2026 · Rio de Janeiro
Cite This Paper
BibTeX (OpenReview):
@inproceedings{li2026spvla,
title={{SP}-{VLA}: A Joint Model Scheduling and Token Pruning
Approach for {VLA} Model Acceleration},
author={Ye Li and Yuan Meng and Zewen Sun and Kangye Ji and
Chen Tang and Jiajun Fan and Xinzhu Ma and
Shu-Tao Xia and Zhi Wang and Wenwu Zhu},
booktitle={The Fourteenth International Conference on
Learning Representations},
year={2026},
url={https://openreview.net/forum?id=RwdGIIjPlC}
}