
Dataset Creation from Trajectories

The wds_from_traj script converts trajectory JSON files into training datasets in two formats:

Output Formats

  • WebDataset: Raw bytes in tar shards, optimized for distributed training.
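The following minimal Python sketch illustrates the WebDataset layout. It uses only the standard library's tarfile and json modules and assumes each trajectory is a JSON-serializable dict stored unchanged as one `<key>.json` tar member. The helper name write_wds_shards, the shard-NNNNNN.tar naming, the nine-digit keys, and the samples_per_shard parameter are illustrative assumptions, not the documented behavior of wds_from_traj.

```python
import io
import json
import os
import tarfile
import tempfile


def write_wds_shards(trajectories, out_dir, samples_per_shard=1000):
    """Write trajectory dicts into WebDataset-style tar shards (hypothetical helper).

    Each trajectory becomes one sample: a single tar member named
    "<key>.json" whose payload is the raw UTF-8 JSON bytes. WebDataset
    groups members that share a key (the name up to the first dot) into
    one sample, so adding e.g. "<key>.txt" would extend the same sample.
    """
    os.makedirs(out_dir, exist_ok=True)
    shard_paths = []
    tar = None
    for idx, traj in enumerate(trajectories):
        # Start a new shard every `samples_per_shard` samples (assumed policy).
        if idx % samples_per_shard == 0:
            if tar is not None:
                tar.close()
            path = os.path.join(out_dir, f"shard-{len(shard_paths):06d}.tar")
            shard_paths.append(path)
            tar = tarfile.open(path, "w")
        key = f"{idx:09d}"  # zero-padded so members sort in write order
        payload = json.dumps(traj).encode("utf-8")  # stored as raw bytes
        info = tarfile.TarInfo(name=f"{key}.json")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    if tar is not None:
        tar.close()
    return shard_paths


if __name__ == "__main__":
    # Toy trajectories; the real trajectory JSON schema is not specified here.
    demo = [{"steps": [{"obs": i, "action": i + 1}]} for i in range(5)]
    with tempfile.TemporaryDirectory() as d:
        for p in write_wds_shards(demo, d, samples_per_shard=2):
            with tarfile.open(p) as t:
                print(os.path.basename(p), t.getnames())
```

Splitting by a fixed sample count keeps shards similar in size, so each worker in a distributed job can read its own subset of shards sequentially. The real script may size or name shards differently.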

Basic Usage

Create WebDataset Shards

uv run src/agoge/wds_from_traj.py --help
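After creating shards, you may want to confirm what they contain. The sketch below uses only Python's standard tarfile module (not the webdataset library) to iterate over a shard and group its members into samples by key, following the WebDataset convention that a sample's key is the member path up to the first dot in its file name. iter_samples is a hypothetical helper, not part of agoge or wds_from_traj. The location of real shards depends on the script's options; consult --help for those.

```python
import io
import tarfile


def iter_samples(shard):
    """Yield (key, {extension: bytes}) pairs from a WebDataset-style tar shard.

    `shard` may be a filesystem path or a binary file object. Members that
    share a key are merged into one sample; like WebDataset readers, this
    assumes a sample's members are stored next to each other in the tar.
    """
    opener = tarfile.open(fileobj=shard) if hasattr(shard, "read") else tarfile.open(shard)
    with opener as tar:
        current_key, sample = None, {}
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, etc.
            directory, slash, base = member.name.rpartition("/")
            stem, dot, ext = base.partition(".")
            if not dot:
                continue  # members without an extension are not sample fields
            key = directory + slash + stem
            if key != current_key and sample:
                yield current_key, sample  # previous sample is complete
                sample = {}
            current_key = key
            sample[ext] = tar.extractfile(member).read()
        if sample:
            yield current_key, sample


if __name__ == "__main__":
    # Build a small in-memory shard with two samples, then read it back.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as t:
        for name, data in [("000000000.json", b'{"a": 1}'),
                           ("000000000.txt", b"hi"),
                           ("000000001.json", b"{}")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            t.addfile(info, io.BytesIO(data))
    buf.seek(0)
    for key, fields in iter_samples(buf):
        print(key, sorted(fields))
```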