Welcome to intake_xarray’s documentation!
This package enables the set of data-loading methods from Xarray to be used within the Intake data access and cataloging system.
Quickstart
intake-xarray provides quick and easy access to n-dimensional data suitable for reading by xarray.
Installation
To use this plugin for intake, install with the following command:
conda install -c conda-forge intake-xarray
Usage
Inline use
After installation, the functions intake.open_netcdf, intake.open_rasterio, intake.open_zarr, intake.open_xarray_image, and intake.open_opendap become available. They can be used to open data files directly as xarray objects, as in the sketch below.
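For example, a minimal sketch (the file path is a placeholder):

import intake

# Lazily open a local NetCDF file; .to_dask() returns an xarray object backed
# by dask arrays, while .read() would load all the data into memory instead.
source = intake.open_netcdf("data/air.nc")  # placeholder path
ds = source.to_dask()
print(ds)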
Creating Catalog Entries
Catalog entries must specify driver: netcdf, driver: rasterio, driver: zarr, driver: xarray_image, or driver: opendap, as appropriate.
The zarr and image drivers allow access to remote data stores (s3 and gcs); settings relevant to those stores should be passed in using the parameter storage_options. A sketch of a catalog follows.
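A minimal sketch of a catalog file using these drivers (the source names, paths, and bucket are placeholders):

sources:
  air_temperature:
    description: Local NetCDF data
    driver: netcdf
    args:
      urlpath: '{{ CATALOG_DIR }}/data/air.nc'
      chunks: {}
  remote_zarr:
    description: Zarr store on S3
    driver: zarr
    args:
      urlpath: 's3://mybucket/data.zarr'
      storage_options:
        anon: true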
Choosing a Driver
While all the drivers in the intake-xarray plugin yield xarray objects, they do not all accept the same file formats.
netcdf/grib/tif
Supports any local or downloadable file that can be passed to xarray.open_mfdataset.
opendap
Supports OPeNDAP URLs, optionally with esgf, urs, or generic_http authentication. A sketch follows.
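A sketch of authenticated access (the URL is a placeholder; urs authentication assumes DAP_USER and DAP_PASSWORD are set in the environment):

import intake

source = intake.open_opendap(
    "https://example.com/thredds/dodsC/dataset",  # placeholder URL
    chunks={},
    auth="urs",         # credentials come from DAP_USER / DAP_PASSWORD
    engine="netcdf4",
)
ds = source.to_dask()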
zarr
Supports .zarr directories. See https://zarr.readthedocs.io/ for more information.
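A sketch (the bucket is a placeholder; anonymous S3 access is assumed):

import intake

source = intake.open_zarr(
    "s3://mybucket/data.zarr",          # placeholder bucket
    storage_options={"anon": True},     # assumed anonymous access
)
ds = source.to_dask()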
rasterio
Supports any file format supported by rasterio.open - most commonly GeoTIFFs.
Note: Consider installing rioxarray and using the netcdf driver with engine="rasterio", as sketched below.
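A sketch of that suggestion (assumes rioxarray is installed; the file path is a placeholder):

import intake

source = intake.open_netcdf(
    "data/scene.tif",                       # placeholder path
    xarray_kwargs={"engine": "rasterio"},   # engine provided by rioxarray
)
da = source.to_dask()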
xarray_image
Supports any file format that can be passed to skimage.io.imread, which includes all the common image formats (jpg, png, tif, …)
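A sketch (the glob is a placeholder):

import intake

# Each matching image is read with skimage.io.imread and the results are
# concatenated along a new dimension.
source = intake.open_xarray_image("images/*.png", chunks={})
da = source.to_dask()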
Caching
Remote files can be cached locally by fsspec (see https://filesystem-spec.readthedocs.io/en/latest/features.html#url-chaining). Note that opendap does not support caching, as the URL does not point to a downloadable file.
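For example, a sketch using fsspec's simplecache protocol (the URL is a placeholder, and the remote format must be readable by an installed xarray engine):

import intake

# "simplecache::" chains a local caching filesystem in front of the remote one,
# so the file is downloaded once and reused on subsequent reads.
source = intake.open_netcdf("simplecache::https://example.com/data/air.nc")
ds = source.read()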
API Reference
intake_xarray.netcdf.NetCDFSource: Open an xarray file.
intake_xarray.opendap.OpenDapSource: Open an OPeNDAP source.
intake_xarray.xzarr.ZarrSource: Open an xarray dataset.
intake_xarray.raster.RasterIOSource: Open an xarray dataset via rasterio.
intake_xarray.image.ImageSource: Open an xarray dataset from image files.
- class intake_xarray.netcdf.NetCDFSource(*args, **kwargs)
Open an xarray file.
- Parameters
- urlpath : str or List[str]
Path to source file. May include glob "*" characters, format pattern strings, or a list. Some examples:
{{ CATALOG_DIR }}/data/air.nc
{{ CATALOG_DIR }}/data/*.nc
{{ CATALOG_DIR }}/data/air_{year}.nc
- chunks : int or dict, optional
Chunks is used to load the new dataset into dask arrays. chunks={} loads the dataset with dask using a single chunk for all arrays.
- combine : {'by_coords', 'nested'}, optional
Which function is used to concatenate all the files when urlpath has a wildcard. It is recommended to set this argument explicitly in your catalogs, because the default has changed and will change again: it was "nested", is now the xarray.open_mfdataset default ("auto"), and is planned to change to "by_coords" in the near future.
- concat_dim : str, optional
Name of dimension along which to concatenate the files. Can be new or pre-existing if combine is "nested". Must be None or new if combine is "by_coords".
- path_as_pattern : bool or str, optional
Whether to treat the path as a pattern (i.e. data_{field}.nc) and create new coordinates in the output corresponding to pattern fields. If str, is treated as the pattern to match on. Default is True.
- xarray_kwargs : dict
Additional xarray kwargs for xr.open_dataset() or xr.open_mfdataset().
- storage_options : dict
If using a remote fs (whether caching locally or not), these are the kwargs to pass to that FS.
- Attributes
- cache
- cache_dirs
- cat
- classname
- description
- dtype
- entry
- gui: Source GUI, with parameter selection and plotting
- has_been_persisted
- hvplot: Returns a hvPlot object to provide a high-level plotting API.
- is_persisted
- path_as_pattern
- pattern
- plot: Returns a hvPlot object to provide a high-level plotting API.
- plots: List custom associated quick-plots
- shape
- urlpath
Methods
__call__(**kwargs): Create a new instance of this source with altered arguments
close(): Delete open file from memory
configure_new(**kwargs): Create a new instance of this source with altered arguments
describe(): Description from the entry spec
discover(): Open resource and populate the source attributes.
export(path, **kwargs): Save this data for sharing with other people
get(**kwargs): Create a new instance of this source with altered arguments
persist([ttl]): Save data from this source to local persistent storage
read(): Return a version of the xarray with all the data in memory
read_chunked(): Return xarray object (which will have chunks)
read_partition(i): Fetch one chunk of data at tuple index i
to_dask(): Return xarray object where variables are dask arrays
to_spark(): Provide an equivalent data object in Apache Spark
yaml(): Return YAML representation of this data-source
get_persisted
set_cache_dir
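A sketch of the pattern-field behaviour described under path_as_pattern (the file paths are placeholders):

import intake

# Files such as data/air_1980.nc and data/air_1981.nc match the {year} field,
# which is parsed from each file name and attached as a coordinate.
source = intake.open_netcdf(
    "data/air_{year}.nc",
    combine="nested",
    concat_dim="year",
)
ds = source.to_dask()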
- class intake_xarray.opendap.OpenDapSource(*args, **kwargs)
Open an OPeNDAP source.
- Parameters
- urlpath : str
Path to source file.
- chunks : None, int or dict
Chunks is used to load the new dataset into dask arrays. chunks={} loads the dataset with dask using a single chunk for all arrays.
- auth : None, "esgf", "urs" or "generic_http"
Method of authenticating to the OPeNDAP server. Choose from one of the following: None - [Default] anonymous access; 'esgf' - Earth System Grid Federation; 'urs' - NASA Earthdata Login, also known as URS; 'generic_http' - OPeNDAP servers which support plain HTTP authentication. Note that for the authenticated methods you will need to set your username and password using the environment variables DAP_USER and DAP_PASSWORD respectively.
- engine : str
Engine used for reading the OPeNDAP URL. Should be one of 'pydap' or 'netcdf4'.
- Attributes
- cache
- cache_dirs
- cat
- classname
- description
- dtype
- entry
- gui: Source GUI, with parameter selection and plotting
- has_been_persisted
- hvplot: Returns a hvPlot object to provide a high-level plotting API.
- is_persisted
- plot: Returns a hvPlot object to provide a high-level plotting API.
- plots: List custom associated quick-plots
- shape
Methods
__call__(**kwargs): Create a new instance of this source with altered arguments
close(): Delete open file from memory
configure_new(**kwargs): Create a new instance of this source with altered arguments
describe(): Description from the entry spec
discover(): Open resource and populate the source attributes.
export(path, **kwargs): Save this data for sharing with other people
get(**kwargs): Create a new instance of this source with altered arguments
persist([ttl]): Save data from this source to local persistent storage
read(): Return a version of the xarray with all the data in memory
read_chunked(): Return xarray object (which will have chunks)
read_partition(i): Fetch one chunk of data at tuple index i
to_dask(): Return xarray object where variables are dask arrays
to_spark(): Provide an equivalent data object in Apache Spark
yaml(): Return YAML representation of this data-source
get_persisted
set_cache_dir
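A minimal sketch of constructing the source directly (the URL is a placeholder):

from intake_xarray.opendap import OpenDapSource

source = OpenDapSource(
    urlpath="https://example.com/dodsC/dataset",  # placeholder URL
    chunks={},
    auth=None,         # anonymous access
    engine="pydap",
)
ds = source.to_dask()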
- class intake_xarray.xzarr.ZarrSource(*args, **kwargs)
Open an xarray dataset.
- Parameters
- urlpath : str
Path to source. This can be a local directory or a remote data service (i.e., with a protocol specifier like 's3://').
- storage_options : dict
Parameters passed to the backend file-system.
- kwargs
Further parameters are passed to xr.open_zarr.
- Attributes
- cache
- cache_dirs
- cat
- classname
- description
- dtype
- entry
- gui: Source GUI, with parameter selection and plotting
- has_been_persisted
- hvplot: Returns a hvPlot object to provide a high-level plotting API.
- is_persisted
- plot: Returns a hvPlot object to provide a high-level plotting API.
- plots: List custom associated quick-plots
- shape
Methods
__call__(**kwargs): Create a new instance of this source with altered arguments
close(): Delete open file from memory
configure_new(**kwargs): Create a new instance of this source with altered arguments
describe(): Description from the entry spec
discover(): Open resource and populate the source attributes.
export(path, **kwargs): Save this data for sharing with other people
get(**kwargs): Create a new instance of this source with altered arguments
persist([ttl]): Save data from this source to local persistent storage
read(): Return a version of the xarray with all the data in memory
read_chunked(): Return xarray object (which will have chunks)
read_partition(i): Fetch one chunk of data at tuple index i
to_dask(): Return xarray object where variables are dask arrays
to_spark(): Provide an equivalent data object in Apache Spark
yaml(): Return YAML representation of this data-source
get_persisted
set_cache_dir
- class intake_xarray.raster.RasterIOSource(*args, **kwargs)
Open an xarray dataset via rasterio.
This creates an xarray.DataArray, not a Dataset (i.e., there is exactly one variable).
See https://rasterio.readthedocs.io/en/latest/ for the file formats supported, particularly GeoTIFF, and http://xarray.pydata.org/en/stable/generated/xarray.open_rasterio.html#xarray.open_rasterio for possible extra arguments.
- Parameters
- urlpath : str or iterable, location of data
May be a local path, or a remote path if including a protocol specifier such as 's3://'. May include glob wildcards or format pattern strings. Must be a format supported by rasterio (normally GeoTIFF). Some examples:
{{ CATALOG_DIR }}data/RGB.tif
s3://data/*.tif
s3://data/landsat8_band{band}.tif
s3://data/{location}/landsat8_band{band}.tif
{{ CATALOG_DIR }}data/landsat8_{start_date:%Y%m%d}_band{band}.tif
- chunks : None, int or dict, optional
Chunks is used to load the new dataset into dask arrays. chunks={} loads the dataset with dask using a single chunk for all arrays. The default None loads numpy arrays.
- path_as_pattern : bool or str, optional
Whether to treat the path as a pattern (i.e. data_{field}.tif) and create new coordinates in the output corresponding to pattern fields. If str, is treated as the pattern to match on. Default is True.
- Attributes
- cache
- cache_dirs
- cat
- classname
- description
- dtype
- entry
- gui: Source GUI, with parameter selection and plotting
- has_been_persisted
- hvplot: Returns a hvPlot object to provide a high-level plotting API.
- is_persisted
- path_as_pattern
- pattern
- plot: Returns a hvPlot object to provide a high-level plotting API.
- plots: List custom associated quick-plots
- shape
- urlpath
Methods
__call__(**kwargs): Create a new instance of this source with altered arguments
close(): Delete open file from memory
configure_new(**kwargs): Create a new instance of this source with altered arguments
describe(): Description from the entry spec
discover(): Open resource and populate the source attributes.
export(path, **kwargs): Save this data for sharing with other people
get(**kwargs): Create a new instance of this source with altered arguments
persist([ttl]): Save data from this source to local persistent storage
read(): Return a version of the xarray with all the data in memory
read_chunked(): Return xarray object (which will have chunks)
read_partition(i): Fetch one chunk of data at tuple index i
to_dask(): Return xarray object where variables are dask arrays
to_spark(): Provide an equivalent data object in Apache Spark
yaml(): Return YAML representation of this data-source
get_persisted
set_cache_dir
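A sketch of the pattern-field behaviour (the paths are placeholders):

import intake

# The {band} field is parsed from each matching file name and becomes a
# coordinate on the resulting array.
source = intake.open_rasterio(
    "s3://data/landsat8_band{band}.tif",
    chunks={},
)
da = source.to_dask()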
- class intake_xarray.image.ImageSource(*args, **kwargs)
Open an xarray dataset from image files.
This creates an xarray.DataArray or an xarray.Dataset. See http://scikit-image.org/docs/dev/api/skimage.io.html#skimage.io.imread for the file formats supported.
NOTE: Although skimage.io.imread is used by default, any reader function which accepts a file object and outputs a numpy array can be used instead.
- Parameters
- urlpath : str or iterable, location of data
May be a local path, or a remote path if including a protocol specifier such as 's3://'. May include glob wildcards or format pattern strings. Must be a format supported by skimage.io.imread or the user-supplied imread. Some examples:
{{ CATALOG_DIR }}/data/RGB.tif
s3://data/*.jpeg
https://example.com/image.png
s3://data/Images/{{ landuse }}/{{ '%02d' % id }}.tif
- chunks : int or dict
Chunks is used to load the new dataset into dask arrays. chunks={} loads the dataset with dask using a single chunk for all arrays.
- path_as_pattern : bool or str, optional
Whether to treat the path as a pattern (i.e. data_{field}.tif) and create new coordinates in the output corresponding to pattern fields. If str, is treated as the pattern to match on. Default is True.
- concat_dim : str or iterable
Dimension over which to concatenate. If iterable, all fields must be part of the pattern.
- imread : function (optional)
Optionally provide a custom imread function. The function should expect a file object and produce a numpy array. Defaults to skimage.io.imread.
- preprocess : function (optional)
Optionally provide a custom function to preprocess the image. The function should expect a numpy array for a single image and return a numpy array.
- coerce_shape : iterable of len 2 (optional)
Optionally coerce the shape of the height and width of the image by setting coerce_shape to the desired shape.
- Attributes
- cache
- cache_dirs
- cat
- classname
- description
- dtype
- entry
- gui: Source GUI, with parameter selection and plotting
- has_been_persisted
- hvplot: Returns a hvPlot object to provide a high-level plotting API.
- is_persisted
- path_as_pattern
- pattern
- plot: Returns a hvPlot object to provide a high-level plotting API.
- plots: List custom associated quick-plots
- shape
- urlpath
Methods
__call__(**kwargs): Create a new instance of this source with altered arguments
close(): Delete open file from memory
configure_new(**kwargs): Create a new instance of this source with altered arguments
describe(): Description from the entry spec
discover(): Open resource and populate the source attributes.
export(path, **kwargs): Save this data for sharing with other people
get(**kwargs): Create a new instance of this source with altered arguments
persist([ttl]): Save data from this source to local persistent storage
read(): Return a version of the xarray with all the data in memory
read_chunked(): Return xarray object (which will have chunks)
read_partition(i): Fetch one chunk of data at tuple index i
to_dask(): Return xarray object where variables are dask arrays
to_spark(): Provide an equivalent data object in Apache Spark
yaml(): Return YAML representation of this data-source
get_persisted
set_cache_dir
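A sketch of supplying a custom preprocess function (the glob is a placeholder; normalize is an illustrative helper, not part of the library):

import intake

def normalize(arr):
    # Illustrative preprocessor: scale 8-bit pixel values into [0, 1].
    return arr / 255.0

source = intake.open_xarray_image(
    "photos/*.jpg",         # placeholder glob
    chunks={},
    preprocess=normalize,
)
da = source.to_dask()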
Contributing to intake-xarray
Contributions are highly welcomed and appreciated. Every little help counts, so do not hesitate!
Feature requests and feedback
Do you like intake-xarray? Share some love on Twitter or in your blog posts!
We’d also like to hear about your propositions and suggestions. Feel free to submit them as issues and:
Explain in detail how they should work.
Keep the scope as narrow as possible. This will make it easier to implement.
Report bugs
Report bugs for intake-xarray in the issue tracker.
If you are reporting a bug, please include:
Your operating system name and version.
Any details about your local setup that might be helpful in troubleshooting, specifically the Python interpreter version, installed libraries, and intake-xarray version.
Detailed steps to reproduce the bug.
If you can write a demonstration test that currently fails but should pass (xfail), that is a very useful commit to make as well, even if you cannot fix the bug itself.
Fix bugs
Look through the GitHub issues for bugs.
Talk to developers to find out how you can fix specific bugs.
Write documentation
intake-xarray could always use more documentation. What exactly is needed?
More complementary documentation. Have you perhaps found something unclear?
Docstrings. There can never be too many of them.
Blog posts, articles and such – they’re all very appreciated.
You can also edit documentation files directly in the GitHub web interface, without using a local copy. This can be convenient for small fixes.
Note
Build the documentation locally with the following commands:
$ conda env create -f docs/environment.yml
$ cd docs
$ make html
The built documentation should be available in docs/_build/.
Preparing Pull Requests
Fork the intake-xarray GitHub repository. It's fine to use intake-xarray as your fork repository name because it will live under your user.
Clone your fork locally using git and create a branch:
$ git clone git@github.com:YOUR_GITHUB_USERNAME/intake-xarray.git
$ cd intake-xarray
# now, to fix a bug or add a feature, create your own branch off "master":
$ git checkout -b your-bugfix-feature-branch-name master
Install the development version in a conda environment:
$ conda env create -f ci/environment-py39.yml
$ conda activate test_env
$ pip install -e .
Run all the tests
Running the tests is as simple as issuing this command:
$ pytest --verbose
This command runs the tests via the pytest tool.
Commit and push once your tests pass and you are happy with your change(s):
$ git commit -a -m "<commit message>"
$ git push -u
Finally, submit a pull request through the GitHub website using this data:
head-fork: YOUR_GITHUB_USERNAME/intake-xarray
compare: your-branch-name
base-fork: intake/intake-xarray
base: master
Release a new version
intake-xarray uses the pypipublish GitHub action to publish new versions on PyPI. Just create a new tag (git tag 0.4.1), push it with git push upstream --tags, then create a release by visiting https://github.com/intake/intake-xarray/releases/new. When the release is created, the version will automatically be uploaded to https://pypi.org/project/intake-xarray/.