Dataset Creation from Trajectories
The wds_from_traj script (src/agoge/wds_from_traj.py) converts trajectory JSON files into training datasets in two output formats.
Output Formats
- WebDataset: Raw bytes in tar shards, optimized for distributed training.
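To make "raw bytes in tar shards" concrete, here is a minimal sketch of the general WebDataset layout using only the Python standard library. It is not the wds_from_traj implementation. The helper names (write_shard, read_shard), the sample key traj_000000, the shard file name, and the trajectory fields are all hypothetical and chosen for illustration. The only convention assumed is the usual WebDataset one: tar members that share a basename form one sample, and the extension names the field.

```python
import io
import json
import os
import tarfile
import tempfile


def write_shard(samples, shard_path):
    """Write (key, {extension: bytes}) samples into a single tar shard."""
    with tarfile.open(shard_path, "w") as tar:
        for key, fields in samples:
            for ext, payload in fields.items():
                # Members of one sample share the key and differ by extension.
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def read_shard(shard_path):
    """Group tar members back into samples keyed by their shared basename."""
    samples = {}
    with tarfile.open(shard_path, "r") as tar:
        for member in tar:
            key, ext = member.name.split(".", 1)
            samples.setdefault(key, {})[ext] = tar.extractfile(member).read()
    return samples


# Hypothetical trajectory content; the real JSON schema is not shown here.
trajectory = {"steps": [{"action": "click", "reward": 0.0},
                        {"action": "type", "reward": 1.0}]}
sample = ("traj_000000", {"json": json.dumps(trajectory).encode("utf-8")})

with tempfile.TemporaryDirectory() as tmp:
    shard_path = os.path.join(tmp, "shard-000000.tar")
    write_shard([sample], shard_path)
    restored = read_shard(shard_path)
```

Because each sample is stored as opaque bytes, a shard can be streamed sequentially and decoded only at training time, which is what makes the layout convenient for distributed workers that each read their own subset of shards.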
Basic Usage
Create WebDataset Shards
uv run src/agoge/wds_from_traj.py --help