API Reference

intake_xarray.netcdf.NetCDFSource(*args, ...)

Open an xarray file.

intake_xarray.opendap.OpenDapSource(*args, ...)

Open an OPeNDAP source.

intake_xarray.xzarr.ZarrSource(*args, **kwargs)

Open an xarray dataset.

intake_xarray.raster.RasterIOSource(*args, ...)

Open an xarray dataset via RasterIO.

intake_xarray.image.ImageSource(*args, **kwargs)

Open an xarray dataset from image files.

class intake_xarray.netcdf.NetCDFSource(*args, **kwargs)[source]

Open an xarray file.

Parameters
urlpath: str or List[str]

Path to the source file. May include glob “*” characters or format pattern strings, or be a list of paths. Some examples:

  • {{ CATALOG_DIR }}/data/air.nc

  • {{ CATALOG_DIR }}/data/*.nc

  • {{ CATALOG_DIR }}/data/air_{year}.nc

chunks: int or dict, optional

Chunks is used to load the new dataset into dask arrays. chunks={} loads the dataset with dask using a single chunk for all arrays.

combine: {‘by_coords’, ‘nested’}, optional

Which function is used to concatenate the files when urlpath contains a wildcard. It is recommended to set this argument explicitly in all your catalogs, because the default has changed and will change again: it was ‘nested’, it currently follows the default of xarray.open_mfdataset (‘auto’), and it is planned to change to ‘by_coords’ in the near future.

concat_dim: str, optional

Name of dimension along which to concatenate the files. Can be new or pre-existing if combine is “nested”. Must be None or new if combine is “by_coords”.

path_as_pattern: bool or str, optional

Whether to treat the path as a pattern (e.g., data_{field}.nc) and create new coordinates in the output corresponding to the pattern fields. If a str, it is treated as the pattern to match on. Default is True.

xarray_kwargs: dict

Additional xarray kwargs for xr.open_dataset() or xr.open_mfdataset().

storage_options: dict

If using a remote filesystem (whether caching locally or not), these are the kwargs to pass to that filesystem.

Attributes
cache
cache_dirs
cat
classname
description
dtype
entry
gui

Source GUI, with parameter selection and plotting

has_been_persisted
hvplot

Returns a hvPlot object to provide a high-level plotting API.

is_persisted
path_as_pattern
pattern
plot

Returns a hvPlot object to provide a high-level plotting API.

plots

List custom associated quick-plots

shape
urlpath

Methods

__call__(**kwargs)

Create a new instance of this source with altered arguments

close()

Delete open file from memory

configure_new(**kwargs)

Create a new instance of this source with altered arguments

describe()

Description from the entry spec

discover()

Open resource and populate the source attributes.

export(path, **kwargs)

Save this data for sharing with other people

get(**kwargs)

Create a new instance of this source with altered arguments

persist([ttl])

Save data from this source to local persistent storage

read()

Return a version of the xarray with all the data in memory

read_chunked()

Return xarray object (which will have chunks)

read_partition(i)

Fetch one chunk of data at tuple index i

to_dask()

Return xarray object where variables are dask arrays

to_spark()

Provide an equivalent data object in Apache Spark

yaml()

Return YAML representation of this data-source

get_persisted

set_cache_dir

class intake_xarray.opendap.OpenDapSource(*args, **kwargs)[source]

Open an OPeNDAP source.

Parameters
urlpath: str

Path to source file.

chunks: None, int or dict

Chunks is used to load the new dataset into dask arrays. chunks={} loads the dataset with dask using a single chunk for all arrays.

auth: None, “esgf”, “urs” or “generic_http”

Method of authenticating to the OPeNDAP server. Choose one of the following:

  • None - [default] anonymous access

  • ‘esgf’ - Earth System Grid Federation

  • ‘urs’ - NASA Earthdata Login, also known as URS

  • ‘generic_http’ - OPeNDAP servers which support plain HTTP authentication

For the authenticated options, set your username and password via the environment variables DAP_USER and DAP_PASSWORD.

engine: str

Engine used for reading OPeNDAP URL. Should be one of ‘pydap’ or ‘netcdf4’.

Attributes
cache
cache_dirs
cat
classname
description
dtype
entry
gui

Source GUI, with parameter selection and plotting

has_been_persisted
hvplot

Returns a hvPlot object to provide a high-level plotting API.

is_persisted
plot

Returns a hvPlot object to provide a high-level plotting API.

plots

List custom associated quick-plots

shape

Methods

__call__(**kwargs)

Create a new instance of this source with altered arguments

close()

Delete open file from memory

configure_new(**kwargs)

Create a new instance of this source with altered arguments

describe()

Description from the entry spec

discover()

Open resource and populate the source attributes.

export(path, **kwargs)

Save this data for sharing with other people

get(**kwargs)

Create a new instance of this source with altered arguments

persist([ttl])

Save data from this source to local persistent storage

read()

Return a version of the xarray with all the data in memory

read_chunked()

Return xarray object (which will have chunks)

read_partition(i)

Fetch one chunk of data at tuple index i

to_dask()

Return xarray object where variables are dask arrays

to_spark()

Provide an equivalent data object in Apache Spark

yaml()

Return YAML representation of this data-source

get_persisted

set_cache_dir

class intake_xarray.xzarr.ZarrSource(*args, **kwargs)[source]

Open an xarray dataset.

Parameters
urlpath: str

Path to the source. This can be a local directory or a remote data service (i.e., with a protocol specifier like ‘s3://’).

storage_options: dict

Parameters passed to the backend file-system

kwargs:

Further parameters are passed to xr.open_zarr

Attributes
cache
cache_dirs
cat
classname
description
dtype
entry
gui

Source GUI, with parameter selection and plotting

has_been_persisted
hvplot

Returns a hvPlot object to provide a high-level plotting API.

is_persisted
plot

Returns a hvPlot object to provide a high-level plotting API.

plots

List custom associated quick-plots

shape

Methods

__call__(**kwargs)

Create a new instance of this source with altered arguments

close()

Delete open file from memory

configure_new(**kwargs)

Create a new instance of this source with altered arguments

describe()

Description from the entry spec

discover()

Open resource and populate the source attributes.

export(path, **kwargs)

Save this data for sharing with other people

get(**kwargs)

Create a new instance of this source with altered arguments

persist([ttl])

Save data from this source to local persistent storage

read()

Return a version of the xarray with all the data in memory

read_chunked()

Return xarray object (which will have chunks)

read_partition(i)

Fetch one chunk of data at tuple index i

to_dask()

Return xarray object where variables are dask arrays

to_spark()

Provide an equivalent data object in Apache Spark

yaml()

Return YAML representation of this data-source

get_persisted

set_cache_dir

close()[source]

Delete open file from memory

class intake_xarray.raster.RasterIOSource(*args, **kwargs)[source]

Open an xarray dataset via RasterIO.

This creates an xarray.DataArray, not a Dataset (i.e., there is exactly one variable).

See https://rasterio.readthedocs.io/en/latest/ for the supported file formats, particularly GeoTIFF, and http://xarray.pydata.org/en/stable/generated/xarray.open_rasterio.html#xarray.open_rasterio for possible extra arguments.

Parameters
urlpath: str or iterable, location of data

May be a local path, or a remote path if a protocol specifier such as ‘s3://’ is included. May include glob wildcards or format pattern strings. Must be a format supported by rasterio (normally GeoTIFF). Some examples:

  • {{ CATALOG_DIR }}data/RGB.tif

  • s3://data/*.tif

  • s3://data/landsat8_band{band}.tif

  • s3://data/{location}/landsat8_band{band}.tif

  • {{ CATALOG_DIR }}data/landsat8_{start_date:%Y%m%d}_band{band}.tif

chunks: None or int or dict, optional

Chunks is used to load the new dataset into dask arrays. chunks={} loads the dataset with dask using a single chunk for all arrays. The default, None, loads numpy arrays.

path_as_pattern: bool or str, optional

Whether to treat the path as a pattern (e.g., data_{field}.tif) and create new coordinates in the output corresponding to the pattern fields. If a str, it is treated as the pattern to match on. Default is True.

Attributes
cache
cache_dirs
cat
classname
description
dtype
entry
gui

Source GUI, with parameter selection and plotting

has_been_persisted
hvplot

Returns a hvPlot object to provide a high-level plotting API.

is_persisted
path_as_pattern
pattern
plot

Returns a hvPlot object to provide a high-level plotting API.

plots

List custom associated quick-plots

shape
urlpath

Methods

__call__(**kwargs)

Create a new instance of this source with altered arguments

close()

Delete open file from memory

configure_new(**kwargs)

Create a new instance of this source with altered arguments

describe()

Description from the entry spec

discover()

Open resource and populate the source attributes.

export(path, **kwargs)

Save this data for sharing with other people

get(**kwargs)

Create a new instance of this source with altered arguments

persist([ttl])

Save data from this source to local persistent storage

read()

Return a version of the xarray with all the data in memory

read_chunked()

Return xarray object (which will have chunks)

read_partition(i)

Fetch one chunk of data at tuple index i

to_dask()

Return xarray object where variables are dask arrays

to_spark()

Provide an equivalent data object in Apache Spark

yaml()

Return YAML representation of this data-source

get_persisted

set_cache_dir

class intake_xarray.image.ImageSource(*args, **kwargs)[source]

Open an xarray dataset from image files.

This creates an xarray.DataArray or an xarray.Dataset. See http://scikit-image.org/docs/dev/api/skimage.io.html#skimage.io.imread for the file formats supported.

NOTE: Although skimage.io.imread is used by default, any reader function which accepts a file object and outputs a numpy array can be used instead.

Parameters
urlpath: str or iterable, location of data

May be a local path, or a remote path if a protocol specifier such as ‘s3://’ is included. May include glob wildcards or format pattern strings. Must be a format supported by skimage.io.imread or the user-supplied imread. Some examples:

  • {{ CATALOG_DIR }}/data/RGB.tif

  • s3://data/*.jpeg

  • https://example.com/image.png

  • s3://data/Images/{{ landuse }}/{{ '%02d' % id }}.tif

chunks: int or dict

Chunks is used to load the new dataset into dask arrays. chunks={} loads the dataset with dask using a single chunk for all arrays.

path_as_pattern: bool or str, optional

Whether to treat the path as a pattern (e.g., data_{field}.tif) and create new coordinates in the output corresponding to the pattern fields. If a str, it is treated as the pattern to match on. Default is True.

concat_dim: str or iterable

Dimension over which to concatenate. If iterable, all fields must be part of the pattern.

imread: function, optional

Optionally provide a custom imread function. The function should expect a file object and produce a numpy array. Defaults to skimage.io.imread.

preprocess: function, optional

Optionally provide a custom function to preprocess the image. The function should expect a numpy array for a single image and return a numpy array.

coerce_shape: iterable of length 2, optional

Optionally coerce the height and width of the image to the desired shape.

Attributes
cache
cache_dirs
cat
classname
description
dtype
entry
gui

Source GUI, with parameter selection and plotting

has_been_persisted
hvplot

Returns a hvPlot object to provide a high-level plotting API.

is_persisted
path_as_pattern
pattern
plot

Returns a hvPlot object to provide a high-level plotting API.

plots

List custom associated quick-plots

shape
urlpath

Methods

__call__(**kwargs)

Create a new instance of this source with altered arguments

close()

Delete open file from memory

configure_new(**kwargs)

Create a new instance of this source with altered arguments

describe()

Description from the entry spec

discover()

Open resource and populate the source attributes.

export(path, **kwargs)

Save this data for sharing with other people

get(**kwargs)

Create a new instance of this source with altered arguments

persist([ttl])

Save data from this source to local persistent storage

read()

Return a version of the xarray with all the data in memory

read_chunked()

Return xarray object (which will have chunks)

read_partition(i)

Fetch one chunk of data at tuple index i

to_dask()

Return xarray object where variables are dask arrays

to_spark()

Provide an equivalent data object in Apache Spark

yaml()

Return YAML representation of this data-source

get_persisted

set_cache_dir