Embodied Continual Learning

Learning New Tasks via Reusable Skills: Skill-Compositional Experts for Embodied Continual Learning

SCE enables robots to continually acquire new tasks by decomposing demonstrations into reusable skills and composing them through dual execution-and-transition experts.

Shuaike Zhang¹, Shaokun Wang¹, Haoyu Tang², Jianlong Wu¹,³, Liqiang Nie¹,³

¹Harbin Institute of Technology, Shenzhen
²Shandong University
³Shenzhen Loop Area Institute

Paper | Code (coming soon)

Figure: Overview comparing embodied continual learning with reusable skill composition in SCE. Closed-loop control amplifies the impact of old-task feature drift, while SCE composes new tasks from a skill base.

Abstract

Embodied Continual Learning (ECL) aims to enable robots to continually acquire new manipulation tasks while retaining previously learned behaviors under closed-loop control. Compared with conventional continual learning, ECL suffers from more severe catastrophic forgetting: feature drift accumulated under closed-loop control propagates through sequential decision-making and degrades previously learned behaviors. A key challenge in ECL is structured skill reuse across continually evolving tasks, since existing methods primarily focus on learning skills without explicitly organizing those skills for coherent task execution.

To address this issue, we propose SCE, a Skill-Compositional Experts framework for ECL. SCE builds a skill base via Compositional Skill Grounding (CSG), which decomposes task demonstrations into reusable skills. Based on this, Dual Execution-and-Transition Experts (DETE) enable new task learning through skill composition, where one branch ensures skill execution and the other supports transitions between skills for coherent behavior. Experiments on LIBERO benchmarks and real-world manipulation tasks demonstrate that SCE consistently improves retention and overall task performance. Further feature drift analyses and ablation studies verify the effectiveness of our method. The code and real-world datasets will be made publicly available.

Method

CSG builds the skill base; DETE composes skills

Compositional Skill Grounding

CSG partitions each demonstration into temporally coherent segments using robot-state dynamics, then uses VLM-grounded induction to assign skill labels and update the skill base.
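The segmentation step can be illustrated with a minimal sketch. It assumes that "robot-state dynamics" can be approximated by two cues, gripper open/close toggles and the end-effector coming to rest; the source does not specify the actual criteria. The function name `segment_by_state_dynamics`, its parameters, and the thresholds are hypothetical, and the VLM-grounded labeling step is not shown.

```python
from typing import List, Sequence, Tuple


def segment_by_state_dynamics(
    gripper_open: Sequence[bool],
    ee_speed: Sequence[float],
    speed_eps: float = 0.01,   # hypothetical "at rest" threshold
    min_len: int = 3,          # hypothetical minimum segment length
) -> List[Tuple[int, int]]:
    """Split one demonstration into half-open [start, end) segments.

    A boundary is proposed at step t when the gripper state toggles
    between t-1 and t, or when the end-effector speed drops below
    speed_eps at t after being at or above it at t-1. A proposed
    boundary is skipped if it would create a segment shorter than min_len.
    """
    if len(gripper_open) != len(ee_speed):
        raise ValueError("state sequences must have the same length")
    n = len(gripper_open)
    if n == 0:
        return []

    boundaries = [0]
    for t in range(1, n):
        toggled = gripper_open[t] != gripper_open[t - 1]
        came_to_rest = ee_speed[t] < speed_eps <= ee_speed[t - 1]
        if (toggled or came_to_rest) and t - boundaries[-1] >= min_len:
            boundaries.append(t)

    # Merge a too-short trailing segment into the previous one.
    if len(boundaries) > 1 and n - boundaries[-1] < min_len:
        boundaries.pop()
    boundaries.append(n)
    return list(zip(boundaries[:-1], boundaries[1:]))


# Toy demonstration: open (reach), closed (carry), open (release).
gripper = [True] * 5 + [False] * 5 + [True] * 4
speed = [0.1] * 14
segments = segment_by_state_dynamics(gripper, speed)
print(segments)  # [(0, 5), (5, 10), (10, 14)]
```

In this sketch each resulting segment would then be passed to a labeling step that assigns a skill name and adds it to the skill base if it is new.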

Execution Expert Branch

DETE routes each input to a skill-specific expert through a skill distribution, giving each skill dedicated capacity and reducing interference with unrelated skills.
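As a rough illustration of routing through a skill distribution, the sketch below turns per-skill logits into probabilities with a softmax and either mixes the expert outputs by those probabilities or picks the most likely expert. This is an assumed, generic mixture-of-experts form, not the paper's implementation; `route_to_skill_experts`, the `hard` flag, and the toy experts are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Sequence[float]], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_to_skill_experts(
    x: Sequence[float],
    skill_logits: Sequence[float],
    experts: Sequence[Expert],
    hard: bool = False,
) -> Vector:
    """Combine skill-specific experts according to a skill distribution.

    hard=False: probability-weighted sum of all expert outputs.
    hard=True:  output of the single most probable expert.
    """
    if len(skill_logits) != len(experts):
        raise ValueError("need exactly one logit per expert")
    probs = softmax(skill_logits)
    if hard:
        best = max(range(len(probs)), key=probs.__getitem__)
        return experts[best](x)
    outputs = [expert(x) for expert in experts]
    dim = len(outputs[0])
    return [sum(p * out[d] for p, out in zip(probs, outputs)) for d in range(dim)]


# Two toy "experts" that ignore the input and emit fixed 2-D actions.
grasp_expert = lambda x: [1.0, 0.0]
place_expert = lambda x: [0.0, 1.0]

soft = route_to_skill_experts([0.0], [0.0, 0.0], [grasp_expert, place_expert])
hard = route_to_skill_experts([0.0], [2.0, 0.0], [grasp_expert, place_expert], hard=True)
print(soft, hard)
```

With hard routing only the selected expert contributes to the action, which is one simple way to picture how dedicated capacity limits interference between unrelated skills.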

Transition Expert Branch

A transition-aware router models cross-skill dependencies around skill boundaries, and adaptive fusion balances execution and transition outputs during closed-loop action generation.
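The fusion step can be pictured as a convex combination of the two branch outputs. The sketch below assumes a scalar coefficient obtained from a sigmoid over a gate logit; the source does not give the exact form, so `fuse_branches`, `gate_logit`, and the convention that a larger coefficient favors the transition branch are assumptions for illustration.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fuse_branches(
    exec_out: Sequence[float],
    trans_out: Sequence[float],
    gate_logit: float,
) -> List[float]:
    """Blend execution and transition outputs with one coefficient.

    alpha = sigmoid(gate_logit) is the weight on the transition branch,
    so alpha near 0 keeps the execution output and alpha near 1 keeps
    the transition output.
    """
    if len(exec_out) != len(trans_out):
        raise ValueError("branch outputs must have the same dimension")
    alpha = sigmoid(gate_logit)
    return [(1.0 - alpha) * e + alpha * t for e, t in zip(exec_out, trans_out)]


exec_action = [1.0, 0.0]
trans_action = [0.0, 1.0]
print(fuse_branches(exec_action, trans_action, gate_logit=-6.0))  # mostly execution
print(fuse_branches(exec_action, trans_action, gate_logit=0.0))   # equal blend
print(fuse_branches(exec_action, trans_action, gate_logit=6.0))   # mostly transition
```

In a learned system the gate logit would be predicted from the current observation at every control step, so the blend can change as the rollout approaches a skill boundary.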

Figure: Overall framework of SCE, with the CSG and DETE modules. Demonstrations are decomposed into reusable skills, and DETE augments a pretrained VLA action decoder for skill-specific execution and coherent cross-skill composition.

Results

Stronger retention and final success across simulation and real-world tasks

The reported metrics follow the paper protocol: FWT measures new-task performance, NBT measures forgetting on previously learned tasks, AUC summarizes performance across stages, and Final SR reports final-stage average success.
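The four metrics can be illustrated on a small success matrix. The sketch below uses simplified, LIBERO-style definitions, which is an assumption: the paper's exact protocol may differ, for example in how FWT is averaged over training checkpoints. The function name `continual_metrics`, the lower-triangular `success` layout, and the toy numbers are hypothetical and are not results from the paper.

```python
from statistics import mean
from typing import Dict, List


def continual_metrics(success: List[List[float]]) -> Dict[str, float]:
    """Compute simplified continual-learning metrics.

    success[i][j] is the success rate (%) on task j, evaluated after
    training stage i, for j <= i (a lower-triangular layout).
    """
    num_tasks = len(success)

    # FWT: performance on each task right after it is learned.
    fwt = mean(success[k][k] for k in range(num_tasks))

    # NBT: average drop on task k over all later stages (lower is better).
    nbt_terms = [
        mean(success[k][k] - success[t][k] for t in range(k + 1, num_tasks))
        for k in range(num_tasks - 1)
    ]
    nbt = mean(nbt_terms) if nbt_terms else 0.0

    # AUC: average performance on task k from its own stage onward.
    auc = mean(
        mean(success[t][k] for t in range(k, num_tasks))
        for k in range(num_tasks)
    )

    # Final SR: average success over all tasks after the last stage.
    final_sr = mean(success[-1])

    return {"FWT": fwt, "NBT": nbt, "AUC": auc, "Final SR": final_sr}


# Toy three-task stream (made-up numbers).
toy = [
    [80.0],
    [60.0, 90.0],
    [40.0, 70.0, 100.0],
]
print(continual_metrics(toy))
# {'FWT': 90.0, 'NBT': 25.0, 'AUC': 80.0, 'Final SR': 70.0}
```

Under these definitions a method can score a high FWT while still ending with a low Final SR if it forgets earlier tasks, which is the pattern the Sequential baseline shows in the tables.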

LIBERO-Goal: 83.4 Final SR, with 4.3 NBT and 82.5 AUC.

LIBERO-Long: 73.4 Final SR, with 0.5 NBT and 70.0 AUC.

Real World: 83.3 Final SR, with 0.0 NBT and 82.8 AUC.

Main results under embodied continual learning
	LIBERO-Goal				LIBERO-Long			
Method	FWT	NBT	AUC	Final SR	FWT	NBT	AUC	Final SR
Sequential	81.2	73.7	24.0	7.4	62.4	58.4	14.8	4.0
ER	75.2	17.6	60.6	45.0	74.4	22.8	56.2	51.8
Task-MoE	84.5	15.8	71.9	64.8	72.8	12.0	63.8	57.8
SCE	86.2	4.3	82.5	83.4	69.6	0.5	70.0	73.4

Figure: Representative snapshots of the three real-world manipulation tasks. The tasks are Box-to-Board, Board-to-Bowl, and Box-to-Bowl, which share behavioral elements such as object retrieval, transfer, placement, and lid opening.

Real-world continual learning results
Method	FWT	NBT	AUC	Final SR
Sequential	56.7	50.0	27.2	6.7
ER	53.3	26.7	36.7	20.0
Task-MoE	63.3	10.0	57.8	56.7
SCE	83.3	0.0	82.8	83.3

Figure: Stage-wise success rates of SCE, Task-MoE, ER, and Sequential on the three real-world tasks. In the legend, Ours denotes SCE, which maintains stronger performance than the baselines across the real-world continual learning stream.

Videos

Real-World Video Demos

The videos show successful SCE rollouts and representative failure cases from baseline methods. All videos are played at 2x speed.

SCE success demos

RW-1: Box-to-Board

SCE moves the block from the box to the wooden board.

RW-2: Board-to-Bowl

SCE opens the lid and transfers the block from the board into the bowl.

RW-3: Box-to-Bowl

SCE composes the learned manipulation skills to move the block from the box into the bowl.

Baseline failure cases

Task-MoE: RW-2 Failure

The policy hesitates between changing the gripper state and moving upward, so execution remains stuck around these two actions.

ER: RW-2 Failure

Directly replaying previous-task demonstrations can weaken adaptation to the new task, so the policy does not execute the task sufficiently.

Analysis

Why SCE Works

Figure: Adaptive balancing coefficient and transition expert contribution around skill transitions. Near each transition, the adaptive coefficient shifts weight from skill-specific execution toward the transition expert branch for coherent cross-skill behavior.