Lazy decompression via page fault handling is, I think, a fairly novel feature of libasdf. But there's no reason the mechanism behind it must be limited to decompression, even though that is a major use case.
Consider the asdf_ndarray_data function. It returns a pointer to the "raw" decompressed block data (possibly decompressed on the fly as the user accesses it). But if that data is, e.g., not in the native byte order, the user reading it has to handle the byte order conversion manually. Likewise, they need to convert it themselves if they want the data as a different type (e.g. doubles from data stored in the array as ints).
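To make that burden concrete, here is a minimal sketch of the conversion a caller currently has to write by hand, assuming the block holds big-endian 32-bit ints and the caller wants native doubles. The names be_int32_to_double, swap32 and host_is_little_endian are hypothetical helpers, not part of libasdf, and raw stands in for whatever pointer the caller obtained from the library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/*
 * Convert n big-endian int32 values starting at raw into native doubles.
 * Hypothetical helper: raw is assumed to be a pointer to raw block data;
 * no libasdf function is called here.
 */
static void be_int32_to_double(const void *raw, size_t n, double *out)
{
    const unsigned char *src = raw;
    for (size_t i = 0; i < n; i++) {
        uint32_t bits;
        /* memcpy avoids assuming that raw is suitably aligned. */
        memcpy(&bits, src + i * sizeof(bits), sizeof(bits));
        if (host_is_little_endian())
            bits = swap32(bits);
        /* Reinterpret as signed (two's complement assumed), then widen. */
        out[i] = (double)(int32_t)bits;
    }
}
```

Note that the output buffer has to hold all n converted values at once, which is exactly the cost a lazy conversion would avoid.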
The function asdf_ndarray_readall (which is itself a wrapper around asdf_ndarray_read_tile_ndim) does handle datatype conversions, but the conversion is performed in memory over the full array, which requires the full array to be resident in memory.
The exact same mechanism behind lazy decompression could just as easily perform these kinds of conversions as well. The user is simply handed a pointer to some memory; if they access it randomly, the page faults are intercepted, and as each page of data is brought in from the file, any data conversions can be applied to it at the same time.
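The idea can be sketched roughly as follows. This is not libasdf's implementation, which may well use a different fault-handling facility; it is a minimal POSIX illustration that assumes a single global region, a big-endian uint32 source already in memory (standing in for file data), and a same-size native uint32 result. All names (lazy_map, lazy_fault, lazy_faults) are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static unsigned char *lazy_base;         /* start of the lazily filled region */
static size_t lazy_len;                  /* region length, whole pages */
static size_t lazy_page;                 /* system page size */
static const unsigned char *lazy_src;    /* big-endian source bytes */
static size_t lazy_src_len;              /* source length in bytes */
static volatile sig_atomic_t lazy_faults; /* pages converted so far */

/* On first touch of a page: unprotect it, then convert just that page. */
static void lazy_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    unsigned char *addr = (unsigned char *)info->si_addr;
    if (addr < lazy_base || addr >= lazy_base + lazy_len) {
        signal(sig, SIG_DFL); /* not ours: let the default action run */
        return;
    }
    size_t off = ((size_t)(addr - lazy_base) / lazy_page) * lazy_page;
    unsigned char *page = lazy_base + off;
    /* mprotect is not formally async-signal-safe; works on Linux in practice. */
    if (mprotect(page, lazy_page, PROT_READ | PROT_WRITE) != 0) {
        signal(sig, SIG_DFL);
        return;
    }
    size_t avail = off < lazy_src_len ? lazy_src_len - off : 0;
    if (avail > lazy_page)
        avail = lazy_page;
    const unsigned char *s = lazy_src + off;
    uint32_t *dst = (uint32_t *)page;
    /* Assemble each native value from big-endian bytes (host-independent). */
    for (size_t i = 0; i + 4 <= avail; i += 4)
        dst[i / 4] = (uint32_t)s[i] << 24 | (uint32_t)s[i + 1] << 16 |
                     (uint32_t)s[i + 2] << 8 | (uint32_t)s[i + 3];
    lazy_faults++;
    /* Returning re-executes the faulting access, which now succeeds. */
}

/* Reserve an inaccessible region and install the fault handler. */
static const uint32_t *lazy_map(const unsigned char *src, size_t nbytes)
{
    lazy_page = (size_t)sysconf(_SC_PAGESIZE);
    lazy_len = (nbytes + lazy_page - 1) / lazy_page * lazy_page;
    void *p = mmap(NULL, lazy_len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    lazy_base = p;
    lazy_src = src;
    lazy_src_len = nbytes;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = lazy_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    /* Some systems report protection faults as SIGBUS instead of SIGSEGV. */
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0) {
        munmap(p, lazy_len);
        return NULL;
    }
    return (const uint32_t *)lazy_base;
}
```

A caller would just index the returned pointer; only the pages actually touched are ever converted. A real version would need per-array state instead of globals, thread safety, and handling for conversions that change the element size (e.g. ints to doubles), where destination pages no longer map one-to-one onto source offsets.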
This is not super high priority, though some operations could probably be made more efficient by it. If the user is already using asdf_ndarray_read_tile_ndim to take smallish cutouts, however, the benefit is less immediate.