Contributing to Agoge#
Getting started#
See the getting started guide in the documentation.
Quick navigation:
- Typical workflow - Start here for most contributions
- Specialized topics - Hydra configs, external sources
- Getting help
Contributing code#
Pull Request Process#
- Branch naming: <name/login>/<issue_number>_<issue_name> (automation coming soon), optional for small changes
- Make PRs descriptive: at least the top PR should have a description of what is happening and why, in short form and with a reference if this is described somewhere else (issue, etc.)
- Keep PRs small: rule of thumb <= 8 h of work or <= 300 LOC per PR
- Break down large work‑items: split into design PR (stubs, schemas, config skeleton) + implementation PR(s). Submit individual PRs for refactoring
- Review process: keep main stable; code area owners in /CODEOWNERS give the majority of the feedback related to structure and vision, additional experts dwell more on the contents of the feature, and everyone oversees coding conventions and tests
Keep PRs small#
Small PRs are easier to review, faster to merge, and less likely to introduce bugs. For reference, please read the change list guidelines quoted below.
Why Write Small CLs (change lists)? (source)
Small, simple CLs are:
- Reviewed more quickly. It’s easier for a reviewer to find five minutes several times to review small CLs than to set aside a 30 minute block to review one large CL.
- Reviewed more thoroughly. With large changes, reviewers and authors tend to get frustrated by large volumes of detailed commentary shifting back and forth—sometimes to the point where important points get missed or dropped.
- Less likely to introduce bugs. Since you’re making fewer changes, it’s easier for you and your reviewer to reason effectively about the impact of the CL and see if a bug has been introduced.
- Less wasted work if they are rejected. If you write a huge CL and then your reviewer says that the overall direction is wrong, you’ve wasted a lot of work.
- Easier to merge. Working on a large CL takes a long time, so you will have lots of conflicts when you merge, and you will have to merge frequently.
- Easier to design well. It’s a lot easier to polish the design and code health of a small change than it is to refine all the details of a large change.
- Less blocking on reviews. Sending self-contained portions of your overall change allows you to continue coding while you wait for your current CL in review.
- Simpler to roll back. A large CL will more likely touch files that get updated between the initial CL submission and a rollback CL, complicating the rollback (the intermediate CLs will probably need to be rolled back too).
Note that reviewers have discretion to reject your change outright for the sole reason of it being too large. Usually they will thank you for your contribution but request that you somehow make it into a series of smaller changes. It can be a lot of work to split up a change after you’ve already written it, or require lots of time arguing about why the reviewer should accept your large change. It’s easier to just write small CLs in the first place.
Break down large work‑items#
For big features, submit an initial design PR, for example with:
- Pydantic schemas
- Class definitions with docstrings
- Method signatures (no implementation)
This lets reviewers understand and discuss the design before you write hundreds of lines.
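For illustration, a design PR for a hypothetical data-loading feature might contain nothing but stubs like the following (all names here are invented for the example, not existing Agoge APIs):

```python
from pydantic import BaseModel


class DataSourceConfig(BaseModel):
    """Where and how to read a (hypothetical) dataset."""

    name: str
    path: str
    shuffle: bool = True


class DatasetLoader:
    """Loads and batches samples from a configured data source."""

    def __init__(self, config: DataSourceConfig) -> None:
        self.config = config

    def load(self) -> list[dict]:
        """Return all samples as dictionaries. Implementation follows in a later PR."""
        raise NotImplementedError
```

Reviewers can then debate naming, schema fields, and interface boundaries over a couple dozen lines instead of over the full implementation.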
A concrete workflow example is the stacking workflow, made convenient with git town. This is optional but endorsed.
Refactoring#
Changes that prepare a code area, preserve original behavior, or improve readability, compatibility, resource usage, etc. all fall under refactoring and should be submitted as a separate PR (see the reference guidelines above). It is mandatory for refactorings to pass tests.
Review process#
- Submitters check that relevant tests are implemented and passing before merging into main to keep it stable; feature branches can be in a temporarily broken state. (This will become easier for long-running/exhaustive testing once test pipelines are in place.)
- Code-area owners make sure their area is developing in ways they envision and communicate that vision with the team effectively via documentation (locally, readme files; globally, /docs). Throughout the PR process they supervise that these guidelines are adhered to.
- Experts (additional reviewers whose review has been requested) pay close attention to the contents of the feature, e.g. whether there is a difference between the agreed spec and the implementation.
- Everyone checks that the submitted changes exhibit good coding standards and follow guidelines where applicable.
- Merge requirements: Approval from code area owner + passing tests required before merge to main.
- Review timeline: First review within 1-2 days of review request, aim to merge within a week. If blocked longer, discuss in communication channels to identify blockers.
Tests#
See tests/readme.md for structure, conventions, and testing guidelines.
- Untested code will break. Add tests for the things you expect to keep working after someone else's PR is merged
- Prefer supporting your development with repeatable tests instead of manually doing regression testing after each change
- If you develop a complicated piece of code, it's worth writing some unit tests to make sure it does what you expect it to do (see the sketch after this list)
- Tip: The time cost of verifying that an area works, or of debugging it, grows quadratically with the number of steps and their complexity; asymptotically it follows the area of a triangle. Imagine #steps on the x-axis and complexity on the y-axis as a piece-wise linear function. This can be made linear with intermediate tests as checkpoints to fall back upon, since the cost of the previous steps becomes nearly zero (given fast tests).
- Tip: If you use the REPL paradigm for development and intermediate testing, you might as well do that work in a notebook and "record" your steps. These can be (a) repeated automatically and (b) transformed into test cases.
- Add integration tests where sensible, ensuring that the components you are developing work with the main interfaces that others use, e.g. entrypoints: sft, rl, eval
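As a minimal sketch of what such repeatable tests can look like, assuming pytest; the helper under test is hypothetical and defined inline so the example is self-contained:

```python
import pytest


# Hypothetical helper standing in for a real piece of code under test.
def normalize_scores(scores: list[float]) -> list[float]:
    if not scores:
        raise ValueError("scores must be non-empty")
    total = sum(scores)
    return [s / total for s in scores]


def test_normalize_scores_sums_to_one():
    # Unit test: pins down behavior we expect to keep working after other PRs merge.
    assert sum(normalize_scores([1.0, 1.0, 2.0])) == pytest.approx(1.0)


def test_normalize_scores_rejects_empty_input():
    # Repeatable check that replaces manual regression testing after each change.
    with pytest.raises(ValueError):
        normalize_scores([])
```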
Documentation#
Documentation guidelines#
- Everyone: Document APIs through docstrings in Google style (see the example after this list).
- Seeding new documentation: Imagine you are introducing the area to a professional who is a newcomer to it - what would be the most valuable thing to get them up to speed?
- Submitters: Evaluate if it makes sense to seed new documentation. If your changes touch an area with existing documentation, make sure to update it if the semantics or utility have changed.
- Reviewers: Double check changes are reflected in relevant documentation.
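For reference, a Google-style docstring looks like the following; the function itself is a made-up example, not part of the codebase:

```python
def clip_rewards(rewards: list[float], max_abs: float = 1.0) -> list[float]:
    """Clip each reward to the range [-max_abs, max_abs].

    Args:
        rewards: Raw reward values, one per step.
        max_abs: Maximum absolute value allowed after clipping. Must be positive.

    Returns:
        The clipped reward values, in input order.

    Raises:
        ValueError: If max_abs is not positive.
    """
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    return [max(-max_abs, min(max_abs, r)) for r in rewards]
```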
Documentation location#
- Project-wide: docs/**/*.md
- Code-area specific: readme.md files in relevant directories; use your best judgement, avoid scattering documentation throughout the codebase, and link with /docs.
  - Example: tests/readme.md having a symbolic link in docs/collected/tests_readme.md
- API documentation: docstrings in code
Code Style & Formatting#
See docs/getting-started.md for how to run:
- pre-commit hooks
- ruff linting rules
Generative tools#
- All generated contributions must be supervised and critically reviewed, modified, shaped and otherwise cared for as if you wrote them yourself.
- This is to ensure code quality and suitability in the context of the rest of the codebase. It also serves as a healthy bottleneck, i.e. you can only review and author so much code before losing track.
- All other guidelines apply just the same.
Commit Guidelines#
We do not have any; we currently squash commits on feature branches, so follow your own working preference. Just make sure to use a descriptive short-form -m message if you need to collaborate, and a second -m for additional commentary if you need more space.
Code area specifics#
Guidelines for specific code areas.
Additional areas will be documented as patterns emerge. For now, Hydra configs are the primary documented area.
Hydra configs#
Our entrypoints are sft, rl, and eval with corresponding files and configs:
| src/agoge/entrypoints/ | configs/ |
|---|---|
| sft.py | sft.yaml |
| rl.py | rl.yaml |
| eval.py | eval.yaml |
When adding new configs:
- Derive from an existing config using defaults:
- Override only what changes
- Don't duplicate config values
Example from configs/generate_tasks.yaml:
```yaml
# @package _global_
defaults:
  - eval # Inherit from eval.yaml
  - override environment: gui_env
  - _self_

# Only override what's different
task_generation_client:
  _target_: agoge.client.openai.OpenAIClient
  ...
```
Spend time getting this right. Hydra is for composable configs—use it properly.
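To sanity-check a derived config without running a full entrypoint, you can compose it programmatically. A sketch, assuming Hydra ≥ 1.2 and that the configs/ directory shown above is resolvable relative to the calling file:

```python
from hydra import compose, initialize
from omegaconf import OmegaConf

# Compose generate_tasks.yaml the same way an entrypoint would,
# to inspect exactly what the defaults list resolved to.
with initialize(version_base=None, config_path="configs"):
    cfg = compose(config_name="generate_tasks")

print(OmegaConf.to_yaml(cfg))
```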
External sources and licensing#
Licensing#
When required, or just as a good practice, add licenses/attribution in /ATTRIBUTIONS.md, including:
- The package/project/repository name
- URL source
- In short, the scope of the utility within the codebase
- The original license, or a link to a license file in /LICENSING/licensefile.txt
External sources#
We sometimes include external files/directories verbatim (ad verbum). We take a defensive strategy where these are placed at the leaves of our source tree and encapsulated in an explicitly "external" module. If we require modifications to their import structure, monkey patching, or duck typing, these should be controlled via the __init__.py of that module.
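A minimal sketch of such an encapsulating module, with a hypothetical path and illustrative names (the vendored utils module stands in for whatever is included):

```python
# src/agoge/external/aime_eval/__init__.py  (hypothetical layout)
# All adaptations of vendored code live here, so the vendored files stay
# byte-identical to upstream and remain traceable to the source commit.
from . import utils as _utils

_original_process_results = _utils.process_results


def process_results(doc, results):
    # Controlled adaptation at the module boundary: normalize inputs, then
    # delegate to the untouched upstream implementation.
    return _original_process_results(dict(doc), list(results))


# The monkey patch is applied in one visible place, not at scattered call sites.
_utils.process_results = process_results
```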
Information required for attribution#
- Source URL, e.g. GitHub permalink
- Author attribution, e.g. Name/Company
- Traceable timestamp, e.g. datetime/commit hash
- A hash is preferred; it can be just the first 6 characters of the latest commit that includes changes to the file/directory used
Example#
There is an evaluator for the AIME benchmark that includes a utils.py file hosted at EleutherAI/lm-evaluation-harness.
We do not wish to maintain the evaluator ourselves, as that may require refactoring, which in turn requires comprehensive tests to ensure complete replication of the original behavior (including bugs) to match numerically, and restarting this work whenever the original gets updated. Instead, we wish to rely on the original behavior and have a way to exactly trace any numerical results to the original code, even when used through our codebase in an external fashion.
Header in a file#
```python
# source: https://github.com/EleutherAI/lm-evaluation-harness/blob/c0fc717240032aec738c3199ae344887b5f34c23/lm_eval/tasks/aime/utils.py
# author: jannalulu (Janna)
# hash: 5ac7cd

import re
from typing import Dict, List


def process_results(doc: dict, results: List[str]) -> Dict[str, int]:
    retval = 0
    response = results[0]
    ...
```
Here the hash is that of the latest change to that file.
Header in a directory#
If including a whole directory, either do the previous for all files or just include a new file attributions.md in that directory:
- source: https://github.com/EleutherAI/lm-evaluation-harness/blob/c0fc717240032aec738c3199ae344887b5f34c23/lm_eval/tasks/aime/utils.py
- author: jannalulu (Janna)
- hash: c0fc71
Here the hash is that of the latest commit containing the version of the directory used.
Communication & Support#
- GitHub issues/PRs are preferred where suitable, to keep specific development history and discussion contained
- Slack: #agoge for anything, e.g. team-wide discussions, check-ins, etc.
Reporting Issues#
When filing issues:
- Bug reports: Include steps to reproduce, expected vs actual behavior, environment details
- Feature requests: Explain the problem you're solving and proposed solution
- Questions: Check existing issues/docs first, then ask in Slack or file an issue