Combining data after pipeline computation¶
A common use case of TokSearch is to aggregate the data in a results object for visualization, training a machine learning model, or performing statistical analysis. This notebook shows a few examples of how one might go about doing that.
Creating a simple pipeline¶
We'll start by creating a very simple pipeline. This pipeline fetches the poloidal flux on the (r, z) grid from MDSplus.
import numpy as np
import xarray as xr
from toksearch import MdsSignal, Pipeline
pipeline = Pipeline([165920, 165921, 173000])
psirz_sig = MdsSignal(
    r'\psirz',
    'efit01',
    dims=('r', 'z', 'times'),
    data_order=('times', 'r', 'z'),
)
pipeline.fetch_dataset('ds', {'psirz': psirz_sig})
Aside on ordering of dimensions¶
Note that in creating the MdsSignal object, we had to be careful to specify the dims keyword argument along with the data_order keyword argument. This is necessary because the order in which MDSplus stores the coordinates for a node's dimensions doesn't necessarily correspond to the ordering of the axes of the data being retrieved. In this case, MDSplus is set up such that dim_of(0) is the r coordinates, dim_of(1) is the z coordinates, and dim_of(2) is the times coordinates. However, the underlying NumPy ndarray has its axes ordered as ('times', 'r', 'z'), which is exactly what data_order describes.
Computing the data¶
Now we go ahead and compute the pipeline. Recall that the object returned from the compute_* family of methods is a list-like object that can be iterated over. So, we can extract a list of xarray Dataset objects. This list can be used subsequently as a basis for a few types of aggregations.
recs = pipeline.compute_serial()
datasets = [rec['ds'] for rec in recs]
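As a quick check of the dimension ordering discussed in the aside above, each retrieved DataArray should have its axes in the order given by data_order rather than the MDSplus dim_of order. This is a small sketch of such a check; the value shown in the comment is what we expect based on the data_order specified when the signal was created, not a captured output.
print(datasets[0]['psirz'].dims)  # expected: ('times', 'r', 'z'), per data_order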
Using xr.concat¶
One option is to create a new dataset that is concatenated along the shot dimension. Note that if, for example, the timebases are different (as they almost always will be), this methodology will leave you with some NaNs in the data.
combined_along_shot_dim = xr.concat(datasets, dim='shot')
print(combined_along_shot_dim)
<xarray.Dataset> Size: 16MB
Dimensions:  (times: 312, r: 65, z: 65, shot: 3)
Coordinates:
  * times    (times) float32 1kB 100.0 120.0 140.0 ... 6.36e+03 6.38e+03
  * r        (r) float32 260B 0.84 0.8666 0.8931 0.9197 ... 2.487 2.513 2.54
  * z        (z) float32 260B -1.6 -1.55 -1.5 -1.45 -1.4 ... 1.45 1.5 1.55 1.6
  * shot     (shot) int64 24B 165920 165921 173000
Data variables:
    psirz    (shot, times, r, z) float32 16MB -0.2949 -0.2961 -0.297 ... nan nan
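If the NaNs introduced by mismatched timebases are a problem, a couple of workarounds are worth knowing about. Both are sketches that rely only on standard xarray functionality (Dataset.interp and the join argument to xr.concat), not on anything TokSearch-specific.
# One way to reduce the NaNs: interpolate each per-shot dataset onto a
# shared timebase (here, the first shot's times) before concatenating.
# Times outside a given shot's range will still interpolate to NaN.
common_times = datasets[0]['times']
interped = [ds.interp(times=common_times) for ds in datasets]
combined_interped = xr.concat(interped, dim='shot')

# Alternatively, keep only coordinate values present in every dataset
# (this can discard most points if the timebases barely overlap).
combined_inner = xr.concat(datasets, dim='shot', join='inner')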
Similarly, we can concatenate along the times dimension:
combined_along_times_dim = xr.concat(datasets, dim='times')
print(combined_along_times_dim)
<xarray.Dataset> Size: 13MB
Dimensions:  (shot: 3, r: 65, z: 65, times: 758)
Coordinates:
  * shot     (shot) int64 24B 165920 165921 173000
  * r        (r) float32 260B 0.84 0.8666 0.8931 0.9197 ... 2.487 2.513 2.54
  * z        (z) float32 260B -1.6 -1.55 -1.5 -1.45 -1.4 ... 1.45 1.5 1.55 1.6
  * times    (times) float32 3kB 100.0 140.0 160.0 ... 5.4e+03 5.42e+03 5.44e+03
Data variables:
    psirz    (times, r, z) float32 13MB -0.2949 -0.2961 -0.297 ... 0.2966 0.2973
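With everything on a single times axis, reductions over that dimension are straightforward. As a simple illustration of the kind of statistical analysis mentioned at the top of this notebook (the quantity itself is just an assumed example), we can compute the time-averaged flux map over all three shots:
# Average over the concatenated times dimension; the result has dims (r, z).
mean_psirz = combined_along_times_dim['psirz'].mean(dim='times')
mean_psirz.shape  # expected: (65, 65)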
Converting to numpy ndarrays¶
It is often useful to manipulate the data in each dataset directly as NumPy ndarrays.
ndarrays = [ds['psirz'].values for ds in datasets]
ndarrays[0].shape
(303, 65, 65)
The list of ndarrays can then be, for example, concatenated along the time dimension:
big_array = np.concatenate(ndarrays, axis=0)
big_array.shape
(758, 65, 65)
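From here it's easy to massage the array for downstream use. For instance, as a hypothetical preprocessing step for the machine learning use case mentioned in the introduction (not part of TokSearch itself), each 65x65 flux map can be flattened into a feature vector:
# Reshape to (n_samples, n_features), with one row per time slice.
features = big_array.reshape(big_array.shape[0], -1)
features.shape  # expected: (758, 4225)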