# Supervised Learning Beats Reinforcement: How 50 Samples Trained the Unitree G1's Arm

An experiment in humanoid robotics shows that simple supervised learning with just ~50 samples per action can achieve workable arm control – potentially eliminating the need for complex reinforcement learning for this class of task.
## The Cartesian Control Challenge
The Unitree G1 humanoid robot presents an interesting technical challenge: while it walks and balances effectively, its SDK exposes no direct way to control arm movement in Cartesian space – there is no built-in inverse kinematics. To place the hand at a desired position, developers must coordinate seven motor joints directly in joint space.
## Breaking Down the Solution
Rather than pursuing the traditional path of reinforcement learning, which would require complex simulation environments and reward engineering, we explored a simpler approach:
- Collect ~50 samples per action (up/down/left/right/forward/back)
- Record starting and ending joint positions for each movement
- Train a basic neural network to map commands to joint positions
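The data-collection steps above can be sketched as a simple encoding: each demonstration becomes a (features, target) pair, where the features are a one-hot action command plus the starting joint angles, and the target is the ending joint angles. The helper name and exact encoding here are illustrative assumptions, not the actual collection code.

```python
import numpy as np

# Hypothetical sketch of the data-collection format described above.
# The encoding (one-hot command + start joints -> end joints) is an
# assumption about how the ~50 samples per action might be stored.

ACTIONS = ["up", "down", "left", "right", "forward", "back"]
NUM_JOINTS = 7  # one arm: seven motor joints, per the article

def encode_sample(action, start_joints, end_joints):
    """Turn one demonstration into a (features, target) training pair."""
    one_hot = np.zeros(len(ACTIONS))
    one_hot[ACTIONS.index(action)] = 1.0
    x = np.concatenate([one_hot, np.asarray(start_joints, dtype=float)])
    y = np.asarray(end_joints, dtype=float)
    return x, y

# Example: one recorded "up" demonstration (angles are made up).
start = [0.0] * NUM_JOINTS
end = [0.1, -0.2, 0.0, 0.3, 0.0, 0.05, 0.0]
x, y = encode_sample("up", start, end)
print(x.shape, y.shape)  # (13,) (7,)
```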
## The Technical Implementation
The solution uses a surprisingly minimal architecture:
| Component | Details |
|---|---|
| Model Architecture | MLP with 2 hidden layers (32 neurons each) |
| Activation Function | ReLU |
| Optimizer | Adam |
| Loss Function | Mean Squared Error |
| Training Data | ~300 total samples (~50 per action × 6 actions) |
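A minimal NumPy sketch of the table above: an MLP with two 32-unit hidden layers, ReLU activations, MSE loss, and a hand-rolled Adam optimizer. Input/output sizes (13 and 7) follow the one-hot-command + joint-angle encoding; the learning rate, step count, and stand-in dataset are illustrative assumptions, not the authors' training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
IN, H, OUT = 13, 32, 7  # 6-way one-hot command + 7 joints -> 7 joints

# He-style initialisation for the three weight matrices and biases.
params = {
    "W1": rng.normal(0, np.sqrt(2 / IN), (IN, H)), "b1": np.zeros(H),
    "W2": rng.normal(0, np.sqrt(2 / H), (H, H)),   "b2": np.zeros(H),
    "W3": rng.normal(0, np.sqrt(2 / H), (H, OUT)), "b3": np.zeros(OUT),
}

def forward(X, p):
    h1 = np.maximum(0, X @ p["W1"] + p["b1"])   # ReLU
    h2 = np.maximum(0, h1 @ p["W2"] + p["b2"])  # ReLU
    return h1, h2, h2 @ p["W3"] + p["b3"]       # linear output

def grads(X, Y, p):
    h1, h2, out = forward(X, p)
    d_out = 2 * (out - Y) / out.size            # gradient of mean squared error
    d_h2 = (d_out @ p["W3"].T) * (h2 > 0)       # backprop through ReLU
    d_h1 = (d_h2 @ p["W2"].T) * (h1 > 0)
    return {"W3": h2.T @ d_out, "b3": d_out.sum(0),
            "W2": h1.T @ d_h2,  "b2": d_h2.sum(0),
            "W1": X.T @ d_h1,   "b1": d_h1.sum(0)}

# Stand-in dataset shaped like the real one: ~300 samples.
X = rng.normal(size=(300, IN))
Y = rng.normal(size=(300, OUT)) * 0.1

# Adam state and hyperparameters (defaults; assumptions here).
m = {k: np.zeros_like(p) for k, p in params.items()}
v = {k: np.zeros_like(p) for k, p in params.items()}
lr, beta1, beta2, eps = 1e-3, 0.9, 0.999, 1e-8

initial_loss = np.mean((forward(X, params)[2] - Y) ** 2)
for t in range(1, 501):
    g = grads(X, Y, params)
    for k in params:
        m[k] = beta1 * m[k] + (1 - beta1) * g[k]
        v[k] = beta2 * v[k] + (1 - beta2) * g[k] ** 2
        m_hat = m[k] / (1 - beta1 ** t)
        v_hat = v[k] / (1 - beta2 ** t)
        params[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)

loss = np.mean((forward(X, params)[2] - Y) ** 2)
print(f"MSE: {initial_loss:.4f} -> {loss:.4f}")
```

With only ~300 samples, a network this small is a sensible choice: a larger model would simply memorize the demonstrations.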
## Safety Considerations
A critical discovery during implementation was how violently fast the G1's motors can move. Its physical capabilities exceed what many assume – these aren't toy servos, but industrial-grade actuators capable of dangerous force.
### Required Safety Measures
- Implement gradual position changes rather than direct commands
- Maintain emergency stop functionality via remote control
- Test movements in damping mode before engaging full power
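The first safety measure above – gradual position changes rather than direct commands – can be sketched as a per-tick clamp on joint motion: instead of commanding the predicted target pose in one step, interpolate toward it with a cap on how far any joint may move per control cycle. The step limit and loop structure are illustrative assumptions, not G1 specifications.

```python
import numpy as np

# Cap on how much any joint may move per control tick, in radians.
# This value is an assumption for illustration, not a G1 spec.
MAX_STEP_RAD = 0.02

def ramp_toward(current, target, max_step=MAX_STEP_RAD):
    """Return the next joint command, moving at most max_step per joint."""
    current = np.asarray(current, dtype=float)
    target = np.asarray(target, dtype=float)
    delta = np.clip(target - current, -max_step, max_step)
    return current + delta

# Example: march from an all-zero pose toward a target pose over many
# control ticks instead of jumping there in one command.
q = np.zeros(7)
target = np.array([0.1, -0.2, 0.0, 0.3, 0.0, 0.05, 0.0])
for _ in range(20):
    q = ramp_toward(q, target)
print(np.allclose(q, target))  # True: converged after enough ticks
```

In a real control loop, each intermediate `q` would be sent to the motors at the loop rate, so the arm's speed is bounded by `max_step` times the tick frequency.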
## URDF Simulation Challenges
While attempting to validate the approach in simulation using URDF models, we encountered significant discrepancies between simulated and real-world behavior. This raises important questions about the reliability of simulation-based safety testing for humanoid robots.
## Future Applications
This approach opens the door for practical applications like object retrieval tasks. The simplified control scheme, while not perfect, provides sufficient precision for basic manipulation tasks – potentially enabling household robotics applications without requiring complex reinforcement learning infrastructure.