The `FutureOutput` object was designed to loosely mimic Celery's `AsyncResult` and, in part, Python's `Future` object, providing an object-oriented handle to a future computational result. However, I'm noticing that mixing the transport layer into `FutureOutput` creates complications:
- When saving a `FutureOutput` to disk, I must serialize and then rehydrate a client object.
- Transport logic is split between the client and the `FutureOutput` rather than centralized on the `CCClient`.
- One often wants a simple API: pass a `task_id` to a `client.get()`, `client.cancel()`, or `client.status()` method. Originally one had to create a whole `FutureOutput` object to do this (just like in Celery), so I started supporting these more basic operations with `client.fetch_output`. A functional API (where one passes `task_ids` or their associated objects to a `client.method(...)` function) is easier for end users to reason about and separates actions (on the client) from data (stored in a `FutureOutput` or `TaskRef` object). This makes it cleaner to save tasks to disk and collect them later with a client: there is always a separation between the data (`task_ids`) and what we do with it (methods on the client).
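To make the data/action split concrete, here is a minimal sketch. Everything in it is hypothetical: `InMemoryClient` stands in for `CCClient`, a dict plays the role of the server, and the status strings are only illustrative. The point is that `TaskRef` carries no client and no transport logic, so it can be stored anywhere and acted on by any client later.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TaskRef:
    """Passive handle: plain data, no client and no transport logic."""

    task_id: str
    program: str


class InMemoryClient:
    """Hypothetical stand-in for CCClient; a dict plays the role of the server."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Dict[str, str]] = {}

    def _register(self, task_id: str, result: str) -> None:
        # Helper for this sketch only: fakes a finished task on the "server".
        self._tasks[task_id] = {"status": "SUCCESS", "result": result}

    # All actions live on the client and take a plain task_id.
    def status(self, task_id: str) -> str:
        return self._tasks[task_id]["status"]

    def get(self, task_id: str) -> str:
        return self._tasks[task_id]["result"]

    def cancel(self, task_id: str) -> None:
        self._tasks[task_id]["status"] = "REVOKED"
```

Usage would look like `client.status(ref.task_id)`: the handle supplies the ID and the client does the work.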
The reason we likely want a `FutureOutput` or `TaskRef` object, rather than just returning `task_id` strings, comes down to one core feature: handling 500-level (or 400-level) errors on the server that prevent the user from collecting a result. I think it's important that the client still construct a `ProgramOutput` object for the end user consisting of the `ProgramInput` and details about the failure. This is a MASSIVE simplification for end users, who would otherwise need to handle their own bookkeeping to connect `ProgramInput` objects with their `task_id` and then work out from the `HTTP` failure which input caused the problem. Additionally, if we decide to maintain any of the object-oriented API on `FutureOutput` objects, those methods can be added (or retained) as simple wrappers around `client.method()` calls. The wrappers would contain no real logic of their own, and the same behavior would remain accessible through the `client.method(task_id)` API.
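A rough sketch of that failure path, under stated assumptions: `FailureOutput` is a hypothetical stand-in for a failed `ProgramOutput`, `HTTPFailure` stands in for whatever exception the HTTP layer raises, and the input is a plain dict rather than a real `ProgramInput`. It only shows why the handle needs to carry the input alongside the `task_id`.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class TaskRef:
    """Hypothetical handle that remembers which input produced which task."""

    task_id: str
    program_input: Any  # stand-in for a qcio ProgramInput


@dataclass
class FailureOutput:
    """Stand-in for a failed ProgramOutput: the input plus what went wrong."""

    input_data: Any
    success: bool
    error: str


class HTTPFailure(Exception):
    """Stand-in for the error raised by the HTTP layer."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(f"{status_code}: {detail}")
        self.status_code = status_code
        self.detail = detail


def output_from_failure(ref: TaskRef, exc: HTTPFailure) -> FailureOutput:
    # The handle already holds the input, so the user needs no bookkeeping
    # of their own to learn which input caused the problem.
    return FailureOutput(
        input_data=ref.program_input,
        success=False,
        error=f"HTTP {exc.status_code} while collecting task {ref.task_id}: {exc.detail}",
    )
```

With bare `task_id` strings this reconstruction would not be possible on the client side, since the input would no longer be attached to anything.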
I'm also noticing that `concurrent.futures.as_completed(fs, timeout=None)` actually takes an iterable of `Future` objects, so it closely resembles what I'm proposing with `client.as_completed(task_ids)`.
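For reference, this is how the standard-library function is used: it is a module-level function that receives a collection of handles, not a method on each handle. The `slow_square` and `collect_squares` helpers are made up for the illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Iterable, List


def slow_square(n: int) -> int:
    # Larger inputs take slightly longer, so completion order can differ
    # from submission order.
    time.sleep(0.01 * n)
    return n * n


def collect_squares(values: Iterable[int]) -> List[int]:
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(slow_square, v) for v in values]
        # as_completed takes the iterable of futures and yields each one
        # as soon as it finishes.
        return [future.result() for future in as_completed(futures)]
```

A `client.as_completed(task_ids)` method would have the same shape: a collection of handles goes in, finished results come out as they become available.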
I think this design may:
- simplify the mixed logic of the library,
- give a simpler interface that users can more easily understand, and
- make cross-process results collection feel more natural.
Also, instead of `client.compute(..., return_future=True)`, refactor to `client.submit()`:
- `submit()` → returns a passive handle (`TaskRef`)
- `compute()` → blocks, returns `ProgramOutput | list[...]`
- `client.get()` / `client.as_completed()` → take handles or IDs (i.e., `TaskRef`, `str`, or `list[str | TaskRef]`)
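One way the client methods could accept either handles or IDs is to normalize the argument up front. This is only a sketch: `to_task_ids` and the one-field `TaskRef` here are hypothetical names, not existing library code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Union


@dataclass(frozen=True)
class TaskRef:
    """Hypothetical passive handle; only the ID matters for this sketch."""

    task_id: str


Handle = Union[str, TaskRef]


def to_task_ids(handles: Union[Handle, Iterable[Handle]]) -> List[str]:
    """Normalize a single handle/ID or an iterable of them to a list of IDs."""
    # A lone str must be wrapped first; otherwise it would be iterated
    # character by character.
    if isinstance(handles, (str, TaskRef)):
        handles = [handles]

    task_ids: List[str] = []
    for handle in handles:
        if isinstance(handle, TaskRef):
            task_ids.append(handle.task_id)
        elif isinstance(handle, str):
            task_ids.append(handle)
        else:
            raise TypeError(f"Expected str or TaskRef, got {type(handle).__name__}")
    return task_ids
```

Each client method would call this once and then work only with plain ID strings, which keeps the transport code unaware of handle types.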
Possibly:

```python
from typing import List, Optional

from pydantic import BaseModel
from qcio import Inputs


class TaskBatch(BaseModel):
    task_ids: List[str]
    inputs: List[Inputs]
    program: str
    submitted_at: Optional[str] = None  # ISO datetime if you want

    def save(self, path): ...

    @classmethod
    def open(cls, path): ...
```
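The `save`/`open` stubs could be filled in along these lines. To keep the sketch dependency-free it uses a standard-library dataclass named `TaskBatchSketch` (a made-up name) and plain dicts in place of qcio inputs. With the pydantic model itself, `model_dump_json()` and `model_validate_json()` (pydantic v2) would presumably do the same job.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List, Optional, Union


@dataclass
class TaskBatchSketch:
    """Dataclass stand-in for the proposed TaskBatch (assumption, not library code)."""

    task_ids: List[str]
    inputs: List[dict]  # stand-in for qcio inputs, already serialized to dicts
    program: str
    submitted_at: Optional[str] = None  # ISO datetime string, if recorded

    def save(self, path: Union[str, Path]) -> None:
        # Only data is written; no client object needs to be serialized.
        Path(path).write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def open(cls, path: Union[str, Path]) -> "TaskBatchSketch":
        return cls(**json.loads(Path(path).read_text()))
```

A different process could then call `TaskBatchSketch.open(path)` and pass `batch.task_ids` to whichever client it has on hand.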
See conversation on ChatGPT.