Combining data after pipeline computation¶
A common use case of TokSearch is to aggregate the data in a results object for visualization, training a machine learning model, or performing statistical analysis. This notebook shows a few examples of how one might go about doing that.
Creating a simple pipeline¶
We'll start by creating a very simple pipeline. This pipeline fetches the poloidal flux on the (r, z) grid from MDSplus.
import numpy as np
import xarray as xr
from toksearch import MdsSignal, Pipeline
pipeline = Pipeline([165920, 165921, 173000])
psirz_sig = MdsSignal(
    r'\psirz',
    'efit01',
    dims=('r', 'z', 'times'),
    data_order=('times', 'r', 'z'),
)
pipeline.fetch_dataset('ds', {'psirz': psirz_sig})
Aside on ordering of dimensions¶
Note that in creating the MdsSignal object, we had to be careful to specify the dims keyword argument along with the data_order keyword argument. This is necessary because the order in which MDSplus stores the coordinates for a node's dimensions doesn't necessarily correspond to the ordering of the axes of the data being retrieved. In this case, MDSplus is set up such that dim_of(0) is the r coordinates, dim_of(1) is the z coordinates, and dim_of(2) is the times coordinates. However, the underlying NumPy ndarray has its axes ordered as ('times', 'r', 'z'), which is exactly what data_order describes.
Computing the data¶
Now we go ahead and compute the pipeline. Recall that the object returned from the compute_* family of methods is a list-like object that can be iterated over. So, we can extract a list of xarray Dataset objects. This list can be used subsequently as a basis for a few types of aggregations.
recs = pipeline.compute_serial()
datasets = [rec['ds'] for rec in recs]
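As a quick check of the dimension ordering discussed in the aside above, each retrieved DataArray should have its axes in the order given by data_order rather than the MDSplus dim_of order. This is a small sketch of such a check; the value shown in the comment is what we expect based on the data_order specified when the signal was created, not a captured output.
print(datasets[0]['psirz'].dims)  # expected: ('times', 'r', 'z'), per data_order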
Using xr.concat¶
One option is to create a new dataset that is concatenated along the shot dimension. Note that if, for example, the timebases are different (as they almost always will be), this methodology will leave you with some NaNs in the data.
combined_along_shot_dim = xr.concat(datasets, dim='shot')
print(combined_along_shot_dim)
<xarray.Dataset> Size: 16MB
Dimensions:  (times: 312, r: 65, z: 65, shot: 3)
Coordinates:
  * times    (times) float32 1kB 100.0 120.0 140.0 ... 6.36e+03 6.38e+03
  * r        (r) float32 260B 0.84 0.8666 0.8931 0.9197 ... 2.487 2.513 2.54
  * z        (z) float32 260B -1.6 -1.55 -1.5 -1.45 -1.4 ... 1.45 1.5 1.55 1.6
  * shot     (shot) int64 24B 165920 165921 173000
Data variables:
    psirz    (shot, times, r, z) float32 16MB -0.2949 -0.2961 -0.297 ... nan nan
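If the NaNs introduced by mismatched timebases are a problem, a couple of workarounds are worth knowing about. Both are sketches that rely only on standard xarray functionality (Dataset.interp and the join argument to xr.concat), not on anything TokSearch-specific.
# One way to reduce the NaNs: interpolate each per-shot dataset onto a
# shared timebase (here, the first shot's times) before concatenating.
# Times outside a given shot's range will still interpolate to NaN.
common_times = datasets[0]['times']
interped = [ds.interp(times=common_times) for ds in datasets]
combined_interped = xr.concat(interped, dim='shot')

# Alternatively, keep only coordinate values present in every dataset
# (this can discard most points if the timebases barely overlap).
combined_inner = xr.concat(datasets, dim='shot', join='inner')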
Similarly, we can concatenate along the times dimension:
combined_along_times_dim = xr.concat(datasets, dim='times')
print(combined_along_times_dim)
<xarray.Dataset> Size: 13MB
Dimensions:  (shot: 3, r: 65, z: 65, times: 758)
Coordinates:
  * shot     (shot) int64 24B 165920 165921 173000
  * r        (r) float32 260B 0.84 0.8666 0.8931 0.9197 ... 2.487 2.513 2.54
  * z        (z) float32 260B -1.6 -1.55 -1.5 -1.45 -1.4 ... 1.45 1.5 1.55 1.6
  * times    (times) float32 3kB 100.0 140.0 160.0 ... 5.4e+03 5.42e+03 5.44e+03
Data variables:
    psirz    (times, r, z) float32 13MB -0.2949 -0.2961 -0.297 ... 0.2966 0.2973
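With everything on a single times axis, reductions over that dimension are straightforward. As a simple illustration of the kind of statistical analysis mentioned at the top of this notebook (the quantity itself is just an assumed example), we can compute the time-averaged flux map over all three shots:
# Average over the concatenated times dimension; the result has dims (r, z).
mean_psirz = combined_along_times_dim['psirz'].mean(dim='times')
mean_psirz.shape  # expected: (65, 65)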
Converting to numpy ndarrays¶
It is often useful to manipulate the data in each dataset directly as NumPy ndarrays.
ndarrays = [ds['psirz'].values for ds in datasets]
ndarrays[0].shape
(303, 65, 65)
The list of ndarrays can then be, for example, concatenated along the time dimension:
big_array = np.concatenate(ndarrays, axis=0)
big_array.shape
(758, 65, 65)
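From here it's easy to massage the array for downstream use. For instance, as a hypothetical preprocessing step for the machine learning use case mentioned in the introduction (not part of TokSearch itself), each 65x65 flux map can be flattened into a feature vector:
# Reshape to (n_samples, n_features), with one row per time slice.
features = big_array.reshape(big_array.shape[0], -1)
features.shape  # expected: (758, 4225)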