Debugging with Ray

This is the most reliable way I've found to get ray debug working. It uses Ray's legacy debugger, so there may be a better approach.

For detailed information about Ray cluster setup and management, see Ray Cluster Setup.

Start by setting these .env values:

RAY_PDB_FORCE_TELNET=1
RAY_DEBUG=legacy
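A common stumbling block is the .env file not being loaded into the process that actually starts Ray. As a rough sanity check, the sketch below reports what the current process sees for the two variable names used above. The helper name `debug_env_report` is made up for this example and is not part of Ray.

```python
import os

# Variable names taken from the .env values above.
DEBUG_VARS = ("RAY_DEBUG", "RAY_PDB_FORCE_TELNET")


def debug_env_report(env=None):
    """Return {name: value} for the debug variables; value is None if unset."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in DEBUG_VARS}


if __name__ == "__main__":
    for name, value in debug_env_report().items():
        status = value if value is not None else "<not set>"
        print(f"{name}: {status}")
```

Running this in the same shell (or at the top of the script) that launches your Ray code should show both variables set. If one prints `<not set>`, the .env file is probably not being picked up.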

Then add a breakpoint() call to your code:

import ray

@ray.remote
class SimpleAgent(Agent):
    """
    Agent that just takes an observation and returns an action.
    """

    @ray_remote_logger
    async def act(self, observation: Chat, available_tool_schemas: list[dict]) -> tuple[list[Chat], ChatMessage]:
        self.history += observation
        # Execution pauses here and the worker opens a RemotePdb session.
        breakpoint()
        response = await self.generate_output(available_tool_schemas=available_tool_schemas)
        return ([self.history], response)

Then run the code. It will stop at your breakpoint and print a message like:

(SimpleAgent pid=356911) RemotePdb session open at localhost:40849, use 'ray debug' to connect...
(SimpleAgent pid=356911) RemotePdb accepted connection from ('127.0.0.1', 51532)
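The port changes from run to run, so if you are capturing job logs it can help to pull it out automatically. Here is a minimal sketch that assumes the log line keeps the exact "RemotePdb session open at host:port" wording shown above. The function name `parse_remote_pdb_line` is hypothetical.

```python
import re

# Matches the "session open" line format shown in the log excerpt above.
_SESSION_RE = re.compile(
    r"RemotePdb session open at (?P<host>[^\s:,]+):(?P<port>\d+)"
)


def parse_remote_pdb_line(line):
    """Return (host, port) from a RemotePdb 'session open' line, else None."""
    match = _SESSION_RE.search(line)
    if match is None:
        return None
    return match.group("host"), int(match.group("port"))


if __name__ == "__main__":
    sample = (
        "(SimpleAgent pid=356911) RemotePdb session open at "
        "localhost:40849, use 'ray debug' to connect..."
    )
    print(parse_remote_pdb_line(sample))
```

The host in this message is relative to the worker's node, which is why you still need to find that node and connect from it.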

To connect, you'll need to SSH into the node running the Ray worker that hit the breakpoint. To find that node, open the Ray dashboard.

When you launched the Ray cluster, the logs included the Ray dashboard URL, which is the same address you submit jobs to:

# Using the environment variable approach
export RAY_API_SERVER_ADDRESS='http://100.84.54.3:8265'
ray job submit --working-dir . -- python my_script.py

# OR directly specifying the address
ray job submit --address 'http://100.84.54.3:8265' --working-dir . -- python my_script.py

If you're on Tailscale, you can visit this address in your browser to view a cluster overview.

Then go to Actors -> the actor holding your breakpoint -> click the Node ID -> the node page, which shows the hostname of the node.
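If you would rather not click through the dashboard, the same lookup can probably be scripted with Ray's state API. This is a sketch, not something I have verified on every Ray version: it assumes `ray.util.state.list_actors` and `list_nodes` are available and return records with `pid`, `node_id`, and `node_ip` fields. It returns node IPs rather than hostnames, and the helper names are invented for this example.

```python
def _field(record, name):
    """Read a field from either a dict or an object with attributes."""
    if isinstance(record, dict):
        return record.get(name)
    return getattr(record, name, None)


def node_ips_for_pid(actors, nodes, pid):
    """Return sorted IPs of nodes hosting an actor whose worker has this pid.

    PIDs are only unique per node, so more than one IP may come back.
    """
    ip_by_node = {_field(n, "node_id"): _field(n, "node_ip") for n in nodes}
    ips = set()
    for actor in actors:
        if _field(actor, "pid") != pid:
            continue
        ip = ip_by_node.get(_field(actor, "node_id"))
        if ip is not None:
            ips.add(ip)
    return sorted(ips)


def lookup_with_ray(pid):
    """Query a running cluster; must be run where Ray can reach the cluster."""
    # Imported lazily so node_ips_for_pid works without Ray installed.
    from ray.util.state import list_actors, list_nodes

    # Note: list_actors returns a limited number of records by default.
    return node_ips_for_pid(list_actors(), list_nodes(), pid)
```

For the log line above you would call `lookup_with_ray(356911)`, using the pid printed next to the actor name, and then SSH to the returned IP.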

Once you've found the hostname, SSH into that node and connect to the port from the RemotePdb message:

telnet localhost 40849