# Supervised Learning Beats Reinforcement: How 50 Samples Trained the Unitree G1's Arm

An experiment in humanoid robotics shows that simple supervised learning with just ~50 samples per action can achieve workable arm control – potentially eliminating the need for complex reinforcement learning for this class of task.
## The Cartesian Control Challenge
The Unitree G1 humanoid robot presents an interesting technical challenge: while it walks and balances effectively, its SDK exposes no direct way to control arm movement in Cartesian space – there is no built-in inverse kinematics. To place the hand at a desired position, developers must coordinate seven motor joints directly in joint space.
## Breaking Down the Solution
Rather than pursuing the traditional path of reinforcement learning, which would require complex simulation environments and reward engineering, we explored a simpler approach:
- Collect ~50 samples per action (up/down/left/right/forward/back)
- Record starting and ending joint positions for each movement
- Train a basic neural network to map commands to joint positions
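The data-collection steps above can be sketched as a simple encoding: each demonstration becomes a (features, target) pair, where the features are a one-hot action command plus the starting joint angles, and the target is the ending joint angles. The helper name and exact encoding here are illustrative assumptions, not the actual collection code.

```python
import numpy as np

# Hypothetical sketch of the data-collection format described above.
# The encoding (one-hot command + start joints -> end joints) is an
# assumption about how the ~50 samples per action might be stored.

ACTIONS = ["up", "down", "left", "right", "forward", "back"]
NUM_JOINTS = 7  # one arm: seven motor joints, per the article

def encode_sample(action, start_joints, end_joints):
    """Turn one demonstration into a (features, target) training pair."""
    one_hot = np.zeros(len(ACTIONS))
    one_hot[ACTIONS.index(action)] = 1.0
    x = np.concatenate([one_hot, np.asarray(start_joints, dtype=float)])
    y = np.asarray(end_joints, dtype=float)
    return x, y

# Example: one recorded "up" demonstration (angles are made up).
start = [0.0] * NUM_JOINTS
end = [0.1, -0.2, 0.0, 0.3, 0.0, 0.05, 0.0]
x, y = encode_sample("up", start, end)
print(x.shape, y.shape)  # (13,) (7,)
```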
## The Technical Implementation
The solution uses a surprisingly minimal architecture:
| Component | Details |
|---|---|
| Model Architecture | MLP with 2 hidden layers (32 neurons each) |
| Activation Function | ReLU |
| Optimizer | Adam |
| Loss Function | Mean Squared Error |
| Training Data | ~300 total samples (~50 per action × 6 actions) |
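A minimal NumPy sketch of the table above: an MLP with two 32-unit hidden layers, ReLU activations, MSE loss, and a hand-rolled Adam optimizer. Input/output sizes (13 and 7) follow the one-hot-command + joint-angle encoding; the learning rate, step count, and stand-in dataset are illustrative assumptions, not the authors' training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
IN, H, OUT = 13, 32, 7  # 6-way one-hot command + 7 joints -> 7 joints

# He-style initialisation for the three weight matrices and biases.
params = {
    "W1": rng.normal(0, np.sqrt(2 / IN), (IN, H)), "b1": np.zeros(H),
    "W2": rng.normal(0, np.sqrt(2 / H), (H, H)),   "b2": np.zeros(H),
    "W3": rng.normal(0, np.sqrt(2 / H), (H, OUT)), "b3": np.zeros(OUT),
}

def forward(X, p):
    h1 = np.maximum(0, X @ p["W1"] + p["b1"])   # ReLU
    h2 = np.maximum(0, h1 @ p["W2"] + p["b2"])  # ReLU
    return h1, h2, h2 @ p["W3"] + p["b3"]       # linear output

def grads(X, Y, p):
    h1, h2, out = forward(X, p)
    d_out = 2 * (out - Y) / out.size            # gradient of mean squared error
    d_h2 = (d_out @ p["W3"].T) * (h2 > 0)       # backprop through ReLU
    d_h1 = (d_h2 @ p["W2"].T) * (h1 > 0)
    return {"W3": h2.T @ d_out, "b3": d_out.sum(0),
            "W2": h1.T @ d_h2,  "b2": d_h2.sum(0),
            "W1": X.T @ d_h1,   "b1": d_h1.sum(0)}

# Stand-in dataset shaped like the real one: ~300 samples.
X = rng.normal(size=(300, IN))
Y = rng.normal(size=(300, OUT)) * 0.1

# Adam state and hyperparameters (defaults; assumptions here).
m = {k: np.zeros_like(p) for k, p in params.items()}
v = {k: np.zeros_like(p) for k, p in params.items()}
lr, beta1, beta2, eps = 1e-3, 0.9, 0.999, 1e-8

initial_loss = np.mean((forward(X, params)[2] - Y) ** 2)
for t in range(1, 501):
    g = grads(X, Y, params)
    for k in params:
        m[k] = beta1 * m[k] + (1 - beta1) * g[k]
        v[k] = beta2 * v[k] + (1 - beta2) * g[k] ** 2
        m_hat = m[k] / (1 - beta1 ** t)
        v_hat = v[k] / (1 - beta2 ** t)
        params[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)

loss = np.mean((forward(X, params)[2] - Y) ** 2)
print(f"MSE: {initial_loss:.4f} -> {loss:.4f}")
```

With only ~300 samples, a network this small is a sensible choice: a larger model would simply memorize the demonstrations.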
## Safety Considerations
A critical discovery during implementation was how violently fast the G1's motors can move. Its physical capabilities exceed what many assume – these aren't toy servos, but industrial-grade actuators capable of dangerous force.
### Required Safety Measures
- Implement gradual position changes rather than direct commands
- Maintain emergency stop functionality via remote control
- Test movements in damping mode before engaging full power
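The first safety measure above – gradual position changes rather than direct commands – can be sketched as a per-tick clamp on joint motion: instead of commanding the predicted target pose in one step, interpolate toward it with a cap on how far any joint may move per control cycle. The step limit and loop structure are illustrative assumptions, not G1 specifications.

```python
import numpy as np

# Cap on how much any joint may move per control tick, in radians.
# This value is an assumption for illustration, not a G1 spec.
MAX_STEP_RAD = 0.02

def ramp_toward(current, target, max_step=MAX_STEP_RAD):
    """Return the next joint command, moving at most max_step per joint."""
    current = np.asarray(current, dtype=float)
    target = np.asarray(target, dtype=float)
    delta = np.clip(target - current, -max_step, max_step)
    return current + delta

# Example: march from an all-zero pose toward a target pose over many
# control ticks instead of jumping there in one command.
q = np.zeros(7)
target = np.array([0.1, -0.2, 0.0, 0.3, 0.0, 0.05, 0.0])
for _ in range(20):
    q = ramp_toward(q, target)
print(np.allclose(q, target))  # True: converged after enough ticks
```

In a real control loop, each intermediate `q` would be sent to the motors at the loop rate, so the arm's speed is bounded by `max_step` times the tick frequency.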
## URDF Simulation Challenges
While attempting to validate the approach in simulation using URDF models, we encountered significant discrepancies between simulated and real-world behavior. This raises important questions about the reliability of simulation-based safety testing for humanoid robots.
## Future Applications
This approach opens the door for practical applications like object retrieval tasks. The simplified control scheme, while not perfect, provides sufficient precision for basic manipulation tasks – potentially enabling household robotics applications without requiring complex reinforcement learning infrastructure.