This repository was archived by the owner on May 13, 2026. It is now read-only.

Create a Python package to store package data? #3

@namurphy

One possibility to simplify access to data files would be to include them in a Python package that could be made available on PyPI and conda-forge. The package could include functionality to open files with pandas, xarray, or h5py, which could then be imported into PlasmaPy.
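
A minimal sketch of what such an accessor might look like, using only the standard library. The import name `plasmapy_data`, the helper `open_table`, and the file layout are all assumptions for illustration; nothing here exists yet. In practice the `csv` reader could be swapped for `pandas.read_csv`, `xarray.open_dataset`, or `h5py.File`, depending on the file type.

```python
import csv
from importlib.resources import files

# Hypothetical import name for the proposed package (an assumption).
DATA_PACKAGE = "plasmapy_data"


def open_table(filename, package=DATA_PACKAGE):
    """Read a CSV file bundled inside ``package`` into a list of dict rows.

    ``importlib.resources.files`` locates the installed package, so the
    caller never needs to know where pip or conda put the data on disk.
    """
    resource = files(package).joinpath(filename)
    with resource.open("r", encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))
```

PlasmaPy could then call something like `open_table("some_file.csv")` (file name hypothetical) without handling any download step itself.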

Instead of needing to download the data files separately, they could be acquired via pip install plasmapy-data, and then accessed by PlasmaPy. We could potentially have plasmapy-data be a dependency of PlasmaPy. We could perhaps even allow installation without plasmapy-data via pip install plasmapy[lite] if the size of the data increases to ≳ 10 MB.
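
If plasmapy-data ends up being optional, PlasmaPy would need to cope with it being absent. A rough sketch of a runtime check, again assuming the import name `plasmapy_data`; the helpers `data_package_available` and `require_data_package` are hypothetical names, not existing PlasmaPy functions.

```python
from importlib.util import find_spec


def data_package_available(name="plasmapy_data"):
    """Return True if a top-level package called ``name`` can be imported."""
    return find_spec(name) is not None


def require_data_package(name="plasmapy_data"):
    """Raise an informative ImportError if the data package is missing."""
    if not data_package_available(name):
        raise ImportError(
            f"The optional data package {name!r} is not installed; "
            "its data files would need to be obtained separately."
        )
```

Functionality that depends on the bundled data could call `require_data_package()` first, so that a lightweight install fails with a clear message instead of a bare `ModuleNotFoundError`.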

So far, the data files in this repository are small enough to be reasonably included in a Python package. For comparison, PlasmaPy wheels are ∼9 MB and source distributions are ∼14 MB.

The main disadvantage of creating a package is that we would have an additional package to maintain, but there are tools like cruft that could simplify package maintenance. I don't expect the amount of maintenance for this package to be very large compared to the main PlasmaPy repo, though. We would want to make the release process simpler than for the main PlasmaPy repo (i.e., by avoiding changelogs).

We'd have to figure out what we'd want to do with data used in tests. If PlasmaPy moves to an src layout with a separate tests directory, then the test data could live in the tests directory.

An advantage of incorporating the data into a Python package is that it could be cached in GitHub Actions very straightforwardly.

I do not know if this is the best approach, so I'd also like to look into best practices and check with people in pyOpenSci about alternatives.

This will take quite a bit more discussion, so in the meantime we should proceed with PlasmaPy/PlasmaPy#2570 (which we may need for especially large data sets).

@pheuer, @JaydenR2305 — I'm curious what your thoughts are on this!
