Agoge logging workflow

Agoge entrypoints run with a shared logging stack defined in agoge.log. Calling setup_logging(cfg.logging) installs two handlers on the process root logger: a console handler that emits human-readable lines to stdout, and a QueueHandler that forwards structured JSON payloads into the Ray queue named logger_queue_obj. Driver scripts (e.g. src/agoge/entrypoints/rl.py, eval.py, sft.py) start a queue_to_file task that drains the queue into ${cfg.logging.log_filename} as newline-delimited JSON. Every record includes timestamp, level, module name, filename, line number, process id, and, when Ray is initialized, job / node / worker identifiers so distributed runs can be audited after the fact.
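
The sketch below shows roughly what that handler setup amounts to. It is a simplified illustration, not the real helper in agoge.log: the function name, the formatter fields, and the way the queue is created and shared are assumptions.

    # Simplified illustration of the two-handler setup described above; the real
    # setup_logging in agoge.log may create and share logger_queue_obj differently.
    import logging
    import logging.handlers
    import sys

    from ray.util.queue import Queue  # Ray queue standing in for logger_queue_obj


    def setup_logging_sketch(level: str, queue: Queue) -> None:
        root = logging.getLogger()
        root.handlers.clear()   # drop stale handlers so workers/notebooks don't double-emit
        root.setLevel(level)

        # 1) human-readable lines on stdout
        console = logging.StreamHandler(sys.stdout)
        console.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s %(filename)s:%(lineno)d pid=%(process)d %(message)s"
        ))
        root.addHandler(console)

        # 2) raw records forwarded into the shared Ray queue for the JSONL writer
        root.addHandler(logging.handlers.QueueHandler(queue))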

Entrypoint wiring

The main Hydra entrypoints already handle all logging setup: each one calls setup_logging(cfg.logging), spins up the shared Ray queue, and launches queue_to_file to persist JSONL output to ${paths.nfs}. If you are building a new entrypoint, copy that pattern; otherwise, contributors should not need to invoke these helpers directly or mutate the process-wide logging configuration.
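
For reference, a hedged sketch of that pattern follows. The Hydra config_path/config_name values, the run() stand-in, the assumed queue_to_file signature, and the way the queue is handed around are all illustrative; the real entrypoints remain the source of truth.

    # Hedged sketch of an entrypoint's wiring. The import path, queue handling, and
    # queue_to_file signature are assumptions; see the real entrypoints for the exact shapes.
    import hydra
    import ray
    from omegaconf import DictConfig
    from ray.util.queue import Queue

    from agoge.log import setup_logging, queue_to_file  # assumed import path


    def run(cfg: DictConfig) -> None:
        """Stand-in for the real training / eval work."""


    @hydra.main(config_path="../../configs", config_name="rl", version_base=None)
    def main(cfg: DictConfig) -> None:
        ray.init()
        logger_queue_obj = Queue()      # shared Ray queue for log records
        setup_logging(cfg.logging)      # console handler + QueueHandler on the root logger
        writer = queue_to_file.remote(logger_queue_obj, cfg.logging.log_filename)

        try:
            run(cfg)
        finally:
            logger_queue_obj.put(None)  # sentinel so queue_to_file flushes and exits
            ray.get(writer)             # wait for the JSONL writer to finish


    if __name__ == "__main__":
        main()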

Contributor guidelines

  • Lean on the shared setup. Entrypoints call setup_logging(cfg.logging) for you; don’t add logging.basicConfig, bespoke handlers, or second queues in module code. The shared helpers clear stale handlers to avoid duplicate emits in Ray workers and notebooks.
  • Pick explicit logger names. Instantiate a module-level logger once (logger = logging.getLogger("agoge.runner")). Names should align with the Python package path so the hierarchy in configs/logging/default.yaml applies cleanly. __name__ is the module’s import path (e.g., agoge.runner.train_loop); you can use it if the module already lives under the agoge. namespace, but spell the string out when the natural name is shorter or when the code sits outside that tree (e.g., task-generator utilities).
  • Avoid silent exception handling. Catch only when you can add context or recover, log with logger.exception("context...") to preserve the traceback, and re-raise unless you have a well-documented alternative code path. Suppressing errors without emitting a structured log makes debugging distributed jobs nearly impossible; see the sketch after this list for the pattern.
  • Do not mutate global logging config ad hoc. Adjust per-logger levels with logging.getLogger("agoge.inference_manager").setLevel(logging.DEBUG) while debugging (and restore the previous level when you are done), but prefer committing config changes to configs/logging/*.yaml so overrides propagate through Ray’s worker_process_setup_hook.
  • Runtime defaults live in configs/logging/default.yaml. Override them via the CLI (uv run src/agoge/entrypoints/eval.py logging.level=DEBUG logging.log_filename=/tmp/agoge.jsonl) or by editing the config. Use the level_overrides mapping to bump individual components while chasing a bug, and drop the overrides once the investigation is over.
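
To make the exception-handling guideline concrete, here is a minimal sketch; load_manifest and the logger name are illustrative examples, not real Agoge APIs.

    # Illustrative only: load_manifest and the logger name are examples, not real Agoge APIs.
    import json
    import logging

    logger = logging.getLogger("agoge.runner")


    def load_manifest(path: str) -> dict:
        try:
            with open(path) as f:
                return json.load(f)
        except (OSError, json.JSONDecodeError):
            # Add context and keep the traceback in the structured log, then re-raise
            # so callers (and the driver) still see the failure.
            logger.exception("failed to load manifest from %s", path)
            raise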

JSONL logs land on the shared NFS path by default (${paths.nfs}/agoge.log), which resolves per run when a run_id is set. To inspect them mid-run, tail the file (tail -f ${paths.nfs}/agoge.log) or load them into a notebook or command-line filter that understands JSON Lines. Because each record contains the logger name, you can filter quickly (jq 'select(.logger | startswith("agoge.runner"))'). Remember to send the None sentinel through the queue (handled in the entrypoints) so the queue_to_file task flushes and exits cleanly when jobs finish.
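
For notebook-style inspection, a small JSON Lines filter along these lines works; the "logger" and "level" fields follow the record description above, while the message key and the file path are assumptions to adjust to the actual schema.

    # Small JSON Lines filter; "logger" and "level" follow the record description
    # above, while "message" and the path are assumptions -- adjust to the actual schema.
    import json


    def iter_records(path: str, logger_prefix: str = "agoge.runner", min_level: str = "WARNING"):
        rank = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}
        with open(path) as f:
            for line in f:
                rec = json.loads(line)
                if rec["logger"].startswith(logger_prefix) and rank[rec["level"]] >= rank[min_level]:
                    yield rec


    for rec in iter_records("/path/to/agoge.log"):
        print(rec["timestamp"], rec["logger"], rec.get("message"))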