Add Support for Virtualizarr files saved as Kerchunk parquet files

VirtualiZarr/Kerchunk Parquet files store the byte-range information for files in any of the supported formats, and make those files look like Zarr stores. To use that information you need to be able to:

1.  read and interpret the parquet file
2.  make the byte-range requests.
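
To make step 2 concrete, here is a minimal sketch of a byte-range read in plain Java. It is not zarr-java's API and not the code from my test project. `ByteRangeSketch`, `ChunkRef`, and `readRange` are hypothetical names, and it assumes each reference row boils down to a path, an offset, and a size, and that the referenced file is local. A remote file would need an HTTP `Range` request or an object-store client instead.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ByteRangeSketch {

    /** Hypothetical holder for one reference row: where a chunk's bytes live. */
    record ChunkRef(String path, long offset, int size) {}

    /** Reads exactly ref.size() bytes starting at ref.offset() from a local file. */
    static byte[] readRange(ChunkRef ref) throws IOException {
        try (FileChannel ch = FileChannel.open(Path.of(ref.path()), StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(ref.size());
            long pos = ref.offset();
            // A positional read may return fewer bytes than requested, so loop.
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos);
                if (n < 0) {
                    throw new IOException("Unexpected end of file at position " + pos);
                }
                pos += n;
            }
            return buf.array();
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo on a throwaway file so the sketch runs without any real data.
        Path tmp = Files.createTempFile("byte-range-demo", ".bin");
        try {
            Files.write(tmp, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
            byte[] got = readRange(new ChunkRef(tmp.toString(), 4, 6));
            System.out.println(new String(got, StandardCharsets.US_ASCII)); // prints 456789
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The returned bytes are still the raw stored chunk, so any filters or compressor listed in the array metadata would have to be applied before the values can be used.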

zarr-java already has what is needed to do step 2. For step 1 you can use either the Java version of DuckDB or the Java Parquet library directly. In fact, I have test Java code that works with either. But:

1.  The code was generated by Claude.ai with a lot of back and forth. I am not a Java programmer, so I cannot judge the quality of the code, and I am not going to make a pull request. There are also no appropriate tests. It only shows that this is doable and in fact works.
2.  I would put the example code here, but the folder is quite large (over 100 MB). If there is another way to get the files to someone, I would gladly do so. I probably cannot zip it and email it, because my mailer doesn't accept zipped files that contain executable code.

Anyway, here is a sample session for some netCDF4 files, where I used Python to create the Kerchunk Parquet file:

```
java --enable-native-access=ALL-UNNAMED \
   -jar ~/kerchunk-zarr-reader/target/kerchunk-zarr-reader-1.0-SNAPSHOT.jar \
   /Users/rmendels/kerchunk-test/VHN2015056_2015056_chla.parquet chla
Detected VirtualiZarr 2.x partitioned format
Materializing virtual store → /var/folders/46/jyz1mm5x5bvbf59b5g8f7f580000gn/T/zarr_virtual_5248494852501128730
6 variable(s): [altitude, chla, latitude, longitude, time, time_bnds]
altitude              shape=[1]  chunks=[1]
  → 0 chunk(s)
chla                  shape=[1, 1, 11985, 9338]  chunks=[1, 1, 2997, 2335]
  → 16 chunk(s)
latitude              shape=[11985]  chunks=[11985]
  → 0 chunk(s)
longitude             shape=[9338]  chunks=[9338]
  → 0 chunk(s)
time                  shape=[1]  chunks=[1]
  → 0 chunk(s)
time_bnds             shape=[1, 2]  chunks=[1, 2]
  → 1 chunk(s)
Materialization complete.
Detected Zarr v2 store — using direct chunk reader
Variable  : chla
Shape     : [1, 1, 11985, 9338]
Dtype     : <f4
Chunks    : [1, 1, 2997, 2335]
Compressor: none
Filters   : zlib
DimSep    : "."

=== Data summary: chla ===
Shape : [1, 1, 11985, 9338]
Size  : 111915930 elements
First 8: -999.0000 -999.0000 -999.0000 -999.0000 -999.0000 -999.0000 -999.0000 -999.0000
Min=-999.0000  Max=392.4024  Mean=-868.0739  NaN=0
Temp store deleted.
(work) ➜  ~ java --enable-native-access=ALL-UNNAMED \
   -jar ~/kerchunk-zarr-reader/target/kerchunk-zarr-reader-1.0-SNAPSHOT.jar \
   /Users/rmendels/kerchunk-test/VHN2015056_2015056_chla.parquet chla \
   0,0,0,0  1,1,100,100
Detected VirtualiZarr 2.x partitioned format
Materializing virtual store → /var/folders/46/jyz1mm5x5bvbf59b5g8f7f580000gn/T/zarr_virtual_14639025837829272872
6 variable(s): [altitude, chla, latitude, longitude, time, time_bnds]
altitude              shape=[1]  chunks=[1]
  → 0 chunk(s)
chla                  shape=[1, 1, 11985, 9338]  chunks=[1, 1, 2997, 2335]
  → 16 chunk(s)
latitude              shape=[11985]  chunks=[11985]
  → 0 chunk(s)
longitude             shape=[9338]  chunks=[9338]
  → 0 chunk(s)
time                  shape=[1]  chunks=[1]
  → 0 chunk(s)
time_bnds             shape=[1, 2]  chunks=[1, 2]
  → 1 chunk(s)
Materialization complete.
Detected Zarr v2 store — using direct chunk reader
Variable  : chla
Shape     : [1, 1, 11985, 9338]
Dtype     : <f4
Chunks    : [1, 1, 2997, 2335]
Compressor: none
Filters   : zlib
DimSep    : "."

=== Data summary: chla ===
Shape : [1, 1, 100, 100]
Size  : 10000 elements
First 8: -999.0000 -999.0000 -999.0000 -999.0000 -999.0000 -999.0000 -999.0000 -999.0000
Min=-999.0000  Max=-999.0000  Mean=-999.0000  NaN=0

```
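
As a sanity check on the log, the numbers reported for `chla` follow from the shape and chunk sizes alone. The sketch below is only illustrative arithmetic, not the reader's actual code. `ChunkGridSketch` and its method names are hypothetical, and it assumes the usual Zarr v2 convention of a regular chunk grid with keys joined by the dimension separator (`.` in the log).

```java
public class ChunkGridSketch {

    /** Number of chunks in a regular grid: product of ceil(shape[i] / chunks[i]). */
    static long chunkCount(long[] shape, long[] chunks) {
        long total = 1;
        for (int i = 0; i < shape.length; i++) {
            total *= (shape[i] + chunks[i] - 1) / chunks[i]; // integer ceiling division
        }
        return total;
    }

    /** Total number of elements: product of the shape. */
    static long elementCount(long[] shape) {
        long total = 1;
        for (long dim : shape) {
            total *= dim;
        }
        return total;
    }

    /** Key of the chunk that holds the element at the given index, e.g. "0.0.3.3". */
    static String chunkKey(long[] index, long[] chunks, String separator) {
        StringBuilder key = new StringBuilder();
        for (int i = 0; i < index.length; i++) {
            if (i > 0) {
                key.append(separator);
            }
            key.append(index[i] / chunks[i]);
        }
        return key.toString();
    }

    public static void main(String[] args) {
        long[] shape = {1, 1, 11985, 9338};
        long[] chunks = {1, 1, 2997, 2335};
        System.out.println(chunkCount(shape, chunks));   // prints 16
        System.out.println(elementCount(shape));         // prints 111915930
        // Last element of the array falls in the last chunk of the 4 x 4 grid.
        System.out.println(chunkKey(new long[]{0, 0, 11984, 9337}, chunks, ".")); // prints 0.0.3.3
        // The far corner of the 100 x 100 subset in the second run stays inside the first chunk.
        System.out.println(chunkKey(new long[]{0, 0, 99, 99}, chunks, "."));      // prints 0.0.0.0
    }
}
```

This matches the 16 chunks and 111915930 elements in the log, and suggests that the second request only needs the byte range for chunk `0.0.0.0`.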

If there is any way this capability can be added to zarr-java, it would be really neat, but I perfectly understand that time is limited, and it is not something I can do myself.
