Create a Python package to store package data?

One way to simplify access to the data files would be to put them in a Python package published on PyPI and conda-forge. The package could include functions that open the files with `pandas`, `xarray`, or `h5py`, and PlasmaPy could then import those functions.
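To make the idea concrete, here is a minimal sketch of what a loader in such a package might look like. The import name `plasmapy_data` and the helper names `data_path` and `read_csv_rows` are placeholders I made up for illustration, not an existing API. The sketch uses only the standard library (`importlib.resources`, Python 3.9+); a real version could hand the path to `pandas`, `xarray`, or `h5py` instead of `csv`.

```python
import csv
from importlib.resources import as_file, files

# Hypothetical import name for the proposed data package (an assumption).
DATA_PACKAGE = "plasmapy_data"


def data_path(filename, package=DATA_PACKAGE):
    """Return a context manager that yields a filesystem path to a bundled file.

    ``as_file`` also works when the package is installed as a zip archive,
    in which case the file is extracted to a temporary location.
    """
    return as_file(files(package).joinpath(filename))


def read_csv_rows(filename, package=DATA_PACKAGE):
    """Read a bundled CSV file into a list of dicts, one per row."""
    with data_path(filename, package) as path:
        with open(path, newline="") as f:
            return list(csv.DictReader(f))
```

PlasmaPy would then call something like `read_csv_rows("some_table.csv")` (file name hypothetical) without needing to know where the file lives on disk.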

Instead of needing to download the data files separately, they could be acquired via `pip install plasmapy-data`, and then accessed by PlasmaPy.  We could potentially have `plasmapy-data` be a dependency of PlasmaPy. We could perhaps even allow installation without `plasmapy-data` via `pip install plasmapy[lite]` if the size of the data increases to ≳ 10 MB. 
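One caveat on the `[lite]` idea: extras can only add dependencies, not remove them, so the data package would probably have to be an optional dependency that some extra pulls in, rather than a required one that `[lite]` drops. Either way, PlasmaPy would need to handle the case where the data package is missing. A rough sketch of that check, assuming the hypothetical import name `plasmapy_data` and made-up helper names:

```python
import importlib.util


def data_package_available(name="plasmapy_data"):
    """Return True if the (hypothetical) data package can be imported."""
    return importlib.util.find_spec(name) is not None


def require_data_package(name="plasmapy_data"):
    """Raise a helpful ImportError if the data package is not installed."""
    if not data_package_available(name):
        # The install hint assumes the distribution is named plasmapy-data.
        raise ImportError(
            f"This feature needs the optional package {name!r}. "
            "Try: pip install plasmapy-data"
        )
```

Functions that need bundled data could call `require_data_package()` first, so users of a data-free install get a clear message instead of a bare `ModuleNotFoundError`.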

So far, the data files in this repository are small enough to be reasonably included in a Python package. For comparison, PlasmaPy wheels are ∼9 MB and source distributions are ∼14 MB.

The main disadvantage of creating a package is that we would have an additional package to maintain, though tools like `cruft` could simplify that. I don't expect the maintenance burden to be large compared to the main PlasmaPy repo. We would want to make the release process simpler than for the main PlasmaPy repo (e.g., by skipping changelogs).

We'd have to figure out what we'd want to do with data used in tests. If PlasmaPy moves to an `src` layout with a separate `tests` directory, then the test data could live in the `tests` directory.

An advantage of incorporating the data into a Python package is that it could be cached in GitHub Actions very straightforwardly.

I do not know if this is the best approach, so I'd also like to look into best practices and check with people in pyOpenSci about alternatives.

This will take quite a bit more discussion, so in the meantime we should proceed with https://github.com/PlasmaPy/PlasmaPy/pull/2570 (which we may need for especially large data sets).

@pheuer, @JaydenR2305 — I'm curious what your thoughts are on this!
