VirtualiZarr/Kerchunk Parquet files store the byte-range information for the chunks of any of the supported file formats, and make those files look like Zarr stores. To use that information you need to be able to:
zarr-java already has what is needed to do step 2. For step 1 you can use either the Java version of DuckDB or the Java Parquet library directly. In fact, I have Java test code that successfully uses either approach. But:
Anyway, here is a sample session for some NetCDF4 files, where I used Python to create the Kerchunk Parquet file:
java --enable-native-access=ALL-UNNAMED \
-jar ~/kerchunk-zarr-reader/target/kerchunk-zarr-reader-1.0-SNAPSHOT.jar \
/Users/rmendels/kerchunk-test/VHN2015056_2015056_chla.parquet chla
Detected VirtualiZarr 2.x partitioned format
Materializing virtual store → /var/folders/46/jyz1mm5x5bvbf59b5g8f7f580000gn/T/zarr_virtual_5248494852501128730
6 variable(s): [altitude, chla, latitude, longitude, time, time_bnds]
altitude shape=[1] chunks=[1]
→ 0 chunk(s)
chla shape=[1, 1, 11985, 9338] chunks=[1, 1, 2997, 2335]
→ 16 chunk(s)
latitude shape=[11985] chunks=[11985]
→ 0 chunk(s)
longitude shape=[9338] chunks=[9338]
→ 0 chunk(s)
time shape=[1] chunks=[1]
→ 0 chunk(s)
time_bnds shape=[1, 2] chunks=[1, 2]
→ 1 chunk(s)
Materialization complete.
Detected Zarr v2 store — using direct chunk reader
Variable : chla
Shape : [1, 1, 11985, 9338]
Dtype : <f4
Chunks : [1, 1, 2997, 2335]
Compressor: none
Filters : zlib
DimSep : "."
=== Data summary: chla ===
Shape : [1, 1, 11985, 9338]
Size : 111915930 elements
First 8: -999.0000 -999.0000 -999.0000 -999.0000 -999.0000 -999.0000 -999.0000 -999.0000
Min=-999.0000 Max=392.4024 Mean=-868.0739 NaN=0
Temp store deleted.
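Each of the 16 chla references points at a byte range in the original NetCDF4 file, and the metadata printed above (Dtype `<f4`, Compressor none, Filters zlib) is enough to turn one range into numbers. The following is a minimal plain-Java sketch of that step, not how zarr-java or the reader above actually does it. It assumes the reference resolves to a local file path plus a byte offset and length, that zlib is the only filter (no shuffle), and that the inflated bytes are little-endian float32; a remote URL (for example on S3) would need an HTTP range request instead. The class and method names are hypothetical, and `main` builds a synthetic chunk so the sketch runs on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper: resolve one [path, offset, length] reference into float values. */
public class ByteRangeChunkReader {

    /** Read exactly {@code length} bytes starting at {@code offset} in a local file. */
    static byte[] readRange(String path, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            byte[] buf = new byte[length];
            raf.seek(offset);
            raf.readFully(buf); // throws EOFException if the file is too short
            return buf;
        }
    }

    /** Undo a zlib filter: the input is a zlib stream (header + deflate data + checksum). */
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater(); // nowrap=false: expects the zlib header
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[64 * 1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported zlib stream");
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Interpret raw bytes as little-endian 32-bit floats, i.e. the "<f4" dtype. */
    static float[] decodeF4LE(byte[] raw) {
        float[] values = new float[raw.length / 4];
        ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer().get(values);
        return values;
    }

    /** Read, inflate and decode one chunk reference. */
    static float[] readChunk(String path, long offset, int length)
            throws IOException, DataFormatException {
        return decodeF4LE(inflate(readRange(path, offset, length)));
    }

    public static void main(String[] args) throws Exception {
        // Synthetic chunk: three little-endian floats, zlib-compressed, stored after a 10-byte prefix.
        float[] src = {-999f, 1.5f, 0.25f};
        ByteBuffer le = ByteBuffer.allocate(src.length * 4).order(ByteOrder.LITTLE_ENDIAN);
        for (float f : src) le.putFloat(f);
        Deflater deflater = new Deflater(); // default output is a zlib stream
        deflater.setInput(le.array());
        deflater.finish();
        byte[] file = new byte[1024];
        int end = 10;
        while (!deflater.finished()) end += deflater.deflate(file, end, file.length - end);
        deflater.end();
        Path tmp = Files.createTempFile("chunk", ".bin");
        Files.write(tmp, Arrays.copyOf(file, end));
        System.out.println(Arrays.toString(readChunk(tmp.toString(), 10, end - 10))); // [-999.0, 1.5, 0.25]
        Files.delete(tmp);
    }
}
```

Keeping the three stages (range read, filter, dtype decode) as separate methods makes it easy to swap the range read for an HTTP client or to add other filters later.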
java --enable-native-access=ALL-UNNAMED \
-jar ~/kerchunk-zarr-reader/target/kerchunk-zarr-reader-1.0-SNAPSHOT.jar \
/Users/rmendels/kerchunk-test/VHN2015056_2015056_chla.parquet chla \
0,0,0,0 1,1,100,100
Detected VirtualiZarr 2.x partitioned format
Materializing virtual store → /var/folders/46/jyz1mm5x5bvbf59b5g8f7f580000gn/T/zarr_virtual_14639025837829272872
6 variable(s): [altitude, chla, latitude, longitude, time, time_bnds]
altitude shape=[1] chunks=[1]
→ 0 chunk(s)
chla shape=[1, 1, 11985, 9338] chunks=[1, 1, 2997, 2335]
→ 16 chunk(s)
latitude shape=[11985] chunks=[11985]
→ 0 chunk(s)
longitude shape=[9338] chunks=[9338]
→ 0 chunk(s)
time shape=[1] chunks=[1]
→ 0 chunk(s)
time_bnds shape=[1, 2] chunks=[1, 2]
→ 1 chunk(s)
Materialization complete.
Detected Zarr v2 store — using direct chunk reader
Variable : chla
Shape : [1, 1, 11985, 9338]
Dtype : <f4
Chunks : [1, 1, 2997, 2335]
Compressor: none
Filters : zlib
DimSep : "."
=== Data summary: chla ===
Shape : [1, 1, 100, 100]
Size : 10000 elements
First 8: -999.0000 -999.0000 -999.0000 -999.0000 -999.0000 -999.0000 -999.0000 -999.0000
Min=-999.0000 Max=-999.0000 Mean=-999.0000 NaN=0
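Both runs are consistent with plain regular-grid arithmetic: ceil(11985 / 2997) = 4 and ceil(9338 / 2335) = 4 give the 16 chla chunks, and the 100 by 100 window in the second run falls entirely inside chunk `0.0.0.0`, so in principle only one reference needs to be fetched. Below is a small sketch of that bookkeeping, assuming the two index arguments are a start offset and a count per dimension (with a zero start, an exclusive end would give the same result) and the "." separator shown as DimSep. It counts grid positions only; it does not say how many of them actually have references in the Parquet file (the log reports 0 for several small variables). Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of regular-grid chunk arithmetic for a Zarr v2 array. */
public class ChunkGrid {

    /** Chunks along each dimension: ceil(shape[d] / chunks[d]). */
    static long[] chunksPerDim(long[] shape, long[] chunks) {
        long[] n = new long[shape.length];
        for (int d = 0; d < shape.length; d++) {
            n[d] = (shape[d] + chunks[d] - 1) / chunks[d];
        }
        return n;
    }

    /** Total number of grid positions (the product of chunksPerDim). */
    static long totalChunks(long[] shape, long[] chunks) {
        long total = 1;
        for (long n : chunksPerDim(shape, chunks)) total *= n;
        return total;
    }

    /**
     * Keys of the chunks overlapped by the selection start[d] .. start[d] + count[d] - 1,
     * in C order, joined with the dimension separator (e.g. ".").
     */
    static List<String> touchedKeys(long[] start, long[] count, long[] chunks, String sep) {
        int rank = start.length;
        long[] first = new long[rank];
        long[] last = new long[rank];
        for (int d = 0; d < rank; d++) {
            if (count[d] <= 0) throw new IllegalArgumentException("empty selection in dim " + d);
            first[d] = start[d] / chunks[d];
            last[d] = (start[d] + count[d] - 1) / chunks[d];
        }
        List<String> keys = new ArrayList<>();
        long[] idx = first.clone();
        while (true) {
            StringBuilder sb = new StringBuilder();
            for (int d = 0; d < rank; d++) {
                if (d > 0) sb.append(sep);
                sb.append(idx[d]);
            }
            keys.add(sb.toString());
            // Odometer step: the last dimension varies fastest.
            int k = rank - 1;
            while (k >= 0 && idx[k] == last[k]) {
                idx[k] = first[k];
                k--;
            }
            if (k < 0) return keys;
            idx[k]++;
        }
    }

    public static void main(String[] args) {
        long[] shape = {1, 1, 11985, 9338};
        long[] chunks = {1, 1, 2997, 2335};
        System.out.println(Arrays.toString(chunksPerDim(shape, chunks))); // [1, 1, 4, 4]
        System.out.println(totalChunks(shape, chunks));                    // 16
        // Whole array: all 16 keys, from 0.0.0.0 to 0.0.3.3.
        List<String> all = touchedKeys(new long[4], shape, chunks, ".");
        System.out.println(all.size() + " " + all.get(0) + " " + all.get(all.size() - 1)); // 16 0.0.0.0 0.0.3.3
        // The session's subset (start 0,0,0,0, extent 1,1,100,100) sits inside a single chunk.
        System.out.println(touchedKeys(new long[4], new long[]{1, 1, 100, 100}, chunks, ".")); // [0.0.0.0]
    }
}
```

A window that straddles a chunk boundary, for example starting at row 2990 and column 2330 with a 20 by 20 extent, would instead touch `0.0.0.0`, `0.0.0.1`, `0.0.1.0` and `0.0.1.1`.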
If there is any way this capability could be added to zarr-java, it would be really neat, but I perfectly understand that time is limited, and it is not something I can do myself.