
Improve blosc efficiency #5

Open
eschnett wants to merge 1 commit into asdf-format:main from eschnett:eschnett/blosc-update

@eschnett


This is a continuation of braingram#1. This PR discusses two improvements to the current blosc interface:

  1. The compressor does not pass the data type size. Knowing the type size lets the shuffle filter regroup the data byte by byte (the first byte of every element together, then the second byte of every element, and so on). This exposes regularity, such as sign and exponent bytes that barely change between neighboring values, so the compression algorithm can compress better.
  2. The decompressor can write into a preallocated buffer instead of allocating its own output buffer. This avoids an extra allocation and copy, saving memory bandwidth, and should improve decompression speed slightly.
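To make both items concrete, here is a minimal, self-contained sketch of a byte-shuffle transform of the kind Blosc's shuffle filter performs. It is an illustrative re-implementation, not Blosc's code: real Blosc applies the filter within each internal block and is heavily optimized. The names `byte_shuffle` and `byte_unshuffle_into` are hypothetical. The inverse writes into a caller-provided buffer, which is the pattern item 2 refers to: the caller can hand over memory it already owns, such as the array's own storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Illustrative byte shuffle (not Blosc's implementation). The input is viewed
// as n elements of `typesize` bytes; output byte plane b holds byte b of every
// element, so bytes that tend to be similar end up next to each other.
std::vector<std::uint8_t> byte_shuffle(const std::vector<std::uint8_t>& src,
                                       std::size_t typesize) {
  assert(typesize > 0 && src.size() % typesize == 0);
  const std::size_t n = src.size() / typesize;
  std::vector<std::uint8_t> dst(src.size());
  for (std::size_t e = 0; e < n; ++e)
    for (std::size_t b = 0; b < typesize; ++b)
      dst[b * n + e] = src[e * typesize + b];
  return dst;
}

// Hypothetical inverse that writes into a caller-provided buffer instead of
// allocating its own, so no intermediate buffer or extra copy is needed.
void byte_unshuffle_into(const std::uint8_t* src, std::size_t nbytes,
                         std::size_t typesize, std::uint8_t* dst,
                         std::size_t dst_capacity) {
  assert(typesize > 0 && nbytes % typesize == 0);
  if (dst_capacity < nbytes)
    throw std::invalid_argument("destination buffer too small");
  const std::size_t n = nbytes / typesize;
  for (std::size_t e = 0; e < n; ++e)
    for (std::size_t b = 0; b < typesize; ++b)
      dst[e * typesize + b] = src[b * n + e];
}
```

For example, the bytes `{1, 2, 3, 4, 5, 6}` viewed as three 2-byte elements shuffle to `{1, 3, 5, 2, 4, 6}`. With `typesize` 1 there is only one byte plane and the transform is the identity, so describing float64 data as 1-byte elements leaves the shuffle nothing to regroup.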

I experimented with creating a large 3D float64 array (1000 x 1000 x 250 elements) and compressing it with the shuffle filter, using a type size of either 8 (matching the data) or 1:

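  // rho holds the ni x nj x nk = 1000 x 1000 x 250 float64 values; getidx(i, j, k)
  // maps a 3D index to a linear offset into rho (both defined elsewhere).
  for (int64_t i = 0; i < ni; ++i)
    for (int64_t j = 0; j < nj; ++j)
      for (int64_t k = 0; k < nk; ++k) {
        int64_t idx = getidx(i, j, k);
        rho.at(idx) = 1.0 / (1.1 * i + 1.2 * j + 1.3 * k + 1);
      }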

The resulting file sizes are:

  -rw-r--r--   1 eschnett staff 1993847361 Nov 16 11:31 large-new-shuffle-typesize-1.asdf
  -rw-r--r--   1 eschnett staff  395927299 Nov 16 11:29 large-new-shuffle-typesize-8.asdf

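In this case the file is about 5 times larger when using the wrong type size (1993847361 / 395927299 ≈ 5.04). Compared with the 2,000,000,000 bytes of raw data (1000 x 1000 x 250 x 8), type size 1 gives almost no compression (≈1.003x), whereas type size 8 gives about 5.05x.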
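Why type size 8 helps so much on this data can be checked by looking at byte planes, which are exactly what the shuffle filter groups together. The sketch below uses hypothetical helper names, assumes row-major indexing in place of `getidx`, and uses a small 20 x 20 x 20 grid instead of the 2 GB one. Every value of `1.0 / (1.1 * i + 1.2 * j + 1.3 * k + 1)` lies in (0, 1], and on the full grid the smallest is about 1/2622.4 ≈ 3.8e-4, which is above 2^-15; all positive doubles in [2^-15, 2) share the most significant byte 0x3F. With type size 8 that byte plane becomes one long constant run, while with type size 1 it stays interleaved every 8 bytes with the noisy low mantissa bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <set>
#include <vector>

// Fill an ni x nj x nk grid with the same function as the experiment.
// Row-major indexing stands in for the original getidx (an assumption).
std::vector<double> make_rho(std::int64_t ni, std::int64_t nj, std::int64_t nk) {
  assert(ni >= 0 && nj >= 0 && nk >= 0);
  std::vector<double> rho(static_cast<std::size_t>(ni * nj * nk));
  for (std::int64_t i = 0; i < ni; ++i)
    for (std::int64_t j = 0; j < nj; ++j)
      for (std::int64_t k = 0; k < nk; ++k) {
        const std::int64_t idx = (i * nj + j) * nk + k;
        rho.at(static_cast<std::size_t>(idx)) =
            1.0 / (1.1 * i + 1.2 * j + 1.3 * k + 1);
      }
  return rho;
}

// For each byte position b (0 = least significant, 7 = sign/exponent byte),
// count the distinct byte values across all elements. A plane with few
// distinct values compresses well once the shuffle makes it contiguous.
std::vector<std::size_t> distinct_per_byte_plane(const std::vector<double>& xs) {
  std::vector<std::set<std::uint8_t>> planes(8);
  for (double x : xs) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // reinterpret the IEEE 754 bits
    for (std::size_t b = 0; b < 8; ++b)
      planes[b].insert(static_cast<std::uint8_t>(bits >> (8 * b)));
  }
  std::vector<std::size_t> counts;
  for (const auto& p : planes) counts.push_back(p.size());
  return counts;
}
```

On the small grid, `distinct_per_byte_plane(make_rho(20, 20, 20))` reports a single distinct value in plane 7, while the low mantissa planes take many different values.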
