Adding Data Sources: Basic Plugins
This guided section will show you how to add a rudimentary data loading plugin as a demonstration of how to extend PyARPES to allow you to work with your lab’s data.
This is the first of a two-part section on data loading and plugins in PyARPES. If your needs are more advanced, you can see the second page for more details.
Loading CSV Files into the PyARPES Format
For all analysis work, PyARPES assumes that the data to be manipulated is an xarray datatype, typically an xarray.Dataset or an xarray.DataArray.
Additionally, ARPES data must be labeled with enough standard
coordinates that we can convert to momentum. Let’s assume
that our data comes formatted in two files, one providing the data,
{name}.csv, and one the coordinates, {name}.coords.csv. A
standard data file might look like
Analyzer Spectrum
35, 41, 43, 112, 229, 433, 654, 584, 262, 105
89, 153, 207, 281, 529, 969, 1061, 602, 236, 101
295, 180, 249, 522, 833, 911, 856, 536, 236, 98
261, 226, 379, 509, 613, 787, 777, 522, 224, 94
271, 268, 338, 397, 568, 746, 703, 478, 217, 93
233, 204, 327, 464, 557, 691, 682, 477, 216, 93
185, 142, 203, 412, 681, 792, 732, 494, 223, 94
189, 130, 141, 206, 395, 740, 934, 615, 233, 96
146, 142, 151, 169, 238, 364, 531, 501, 267, 110
36, 40, 49, 89, 144, 214, 288, 256, 161, 96
and a standard coordinates file might hypothetically look like
energy angle
-0.425 0.221
-0.369 0.263
-0.313 0.305
-0.258 0.347
-0.202 0.389
-0.146 0.431
-0.090 0.472
-0.034 0.514
0.020 0.556
0.076 0.598
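Before writing a loader, it can help to confirm that a pair of files agrees in shape. The following is a minimal sketch using only the Python standard library. It writes shortened 3 x 3 versions of the two example files to a temporary directory and reads back their shapes. The helper name read_shapes and the file name example.csv are hypothetical and are not part of PyARPES.

```python
import csv
import tempfile
from pathlib import Path

# Shortened, illustrative versions of the two files shown above (3 x 3 instead of 10 x 10).
DATA_TEXT = """Analyzer Spectrum
35, 41, 43
89, 153, 207
295, 180, 249
"""
COORDS_TEXT = """energy angle
-0.425 0.221
-0.369 0.263
-0.313 0.305
"""


def read_shapes(data_path: Path, coords_path: Path):
    """Hypothetical helper: return ((n_rows, n_cols), dim_names, n_coordinate_rows)."""
    with open(data_path, newline="") as f:
        rows = list(csv.reader(f))[1:]  # drop the one-line header
    rows = [r for r in rows if r]  # ignore blank lines
    n_rows, n_cols = len(rows), len(rows[0])

    with open(coords_path) as f:
        dim_names = f.readline().split()  # first line holds the dimension names
        n_coords = sum(1 for line in f if line.strip())
    return (n_rows, n_cols), dim_names, n_coords


with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "example.csv"
    coords_path = Path(tmp) / "example.coords.csv"
    data_path.write_text(DATA_TEXT)
    coords_path.write_text(COORDS_TEXT)
    shape, dim_names, n_coords = read_shapes(data_path, coords_path)

print(shape, dim_names, n_coords)
```

Note that because the coordinates file stores one column per dimension, this layout only works when every axis has the same number of points, as in the square example above.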
Barebones, Function-based Approach
The simplest way to handle this task is just to write a data loading
function that we can use to load the CSVs. As a first pass, we can load
just the data file and turn it into an xarray.DataArray.
import xarray as xr
import numpy as np

def load_csv_datatype(path_to_file: str) -> xr.DataArray:
    loaded_data = np.loadtxt(path_to_file, delimiter=',', skiprows=1)  # skip the header line
    return xr.DataArray(loaded_data)
All we need to do now is attach the coordinates. Let’s modify the function to also load the columns from the other file:
import xarray as xr
import numpy as np
from pathlib import Path

def load_csv_datatype(path_to_file: str) -> xr.DataArray:
    loaded_data = np.loadtxt(path_to_file, delimiter=',', skiprows=1)  # skip the header line

    # the coordinates file sits next to the data file: {name}.csv -> {name}.coords.csv
    data_path = Path(path_to_file)
    coordinates_file = str(data_path.with_name(data_path.stem + '.coords.csv').absolute())

    # get the dimension names
    with open(coordinates_file) as f:
        dim_names = f.readline().split()

    raw_coordinates = np.loadtxt(coordinates_file, skiprows=1)

    return xr.DataArray(
        loaded_data,
        coords={d: raw_coordinates[:, i] for i, d in enumerate(dim_names)},
        dims=dim_names,
        # attrs={...} <- attributes here
    )
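As a quick check of this approach, the sketch below writes shortened 3 x 3 versions of the example files to a temporary directory, loads them, and selects one point by coordinate. The loader is repeated here (resolving the coordinates file alongside the data file) so the example runs on its own. The file name example.csv is only an illustration.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr


def load_csv_datatype(path_to_file: str) -> xr.DataArray:
    # Repeated from the tutorial so that this example is self-contained.
    data_path = Path(path_to_file)
    coords_path = data_path.with_name(data_path.stem + ".coords.csv")
    loaded_data = np.loadtxt(data_path, delimiter=",", skiprows=1)
    with open(coords_path) as f:
        dim_names = f.readline().split()
    raw_coordinates = np.loadtxt(coords_path, skiprows=1)
    return xr.DataArray(
        loaded_data,
        coords={d: raw_coordinates[:, i] for i, d in enumerate(dim_names)},
        dims=dim_names,
    )


with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "example.csv"
    data_path.write_text(
        "Analyzer Spectrum\n35, 41, 43\n89, 153, 207\n295, 180, 249\n"
    )
    (Path(tmp) / "example.coords.csv").write_text(
        "energy angle\n-0.425 0.221\n-0.369 0.263\n-0.313 0.305\n"
    )
    spectrum = load_csv_datatype(str(data_path))

print(spectrum.dims)  # rows are labeled by the first column name, columns by the second

# Select the point nearest to a given energy and angle.
point = spectrum.sel(energy=-0.369, angle=0.263, method="nearest")
print(point.item())
```

Rows of the data file are labeled by the first column of the coordinates file and columns by the second, so the order of names in the coordinates header determines the dimension order of the result.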
Writing the Plugin
You can use the above code for loading this data, with the caveat mentioned above: the data must carry enough standard coordinates to allow momentum conversion. Alternatively, we can integrate it into a plugin. A plugin registers the data loading code against a labeled “location” that identifies where the data comes from, and makes it easier to fill in missing values, ensure a standard representation, and modify behavior between similar but differing data formats.
To do this, we subclass arpes.endstations.SingleFileEndstation...
import xarray as xr
from arpes.endstations import SingleFileEndstation, add_endstation

class CSVDataEndstation(SingleFileEndstation):
    PRINCIPAL_NAME = 'csv'  # allows us to use this code to refer to data labeled with location="csv"
    _TOLERATED_EXTENSIONS = {'.csv',}  # allow only .csv files!

    def load_single_frame(self, frame_path: str = None, scan_desc: dict = None, **kwargs):
        data = load_csv_datatype(frame_path)
        return xr.Dataset({'spectrum': data})

# register it
add_endstation(CSVDataEndstation)
Now, you can load data with CSVDataEndstation.load_from_path, or
with CSVDataEndstation.load. Additionally, you can load using the
standard data loading function by passing location='csv'. Because
ours is the only one registered against the .csv file format, loading
data without the location keyword will use our new class by default.
The data loading plugins provide a number of features that make it simpler to write data loading code for ARPES, especially in normalizing coordinate units (mm for all distances, rad for all angular measures) and in ensuring that the coordinates necessary for momentum conversion are attached. If you want to learn more about writing data plugins, have a look at the in-depth description of how they work in the second part of this tutorial.
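To illustrate what normalizing angular units to radians means for a loaded array, here is a minimal sketch in plain xarray. It assumes a hypothetical data source that records its angle axis in degrees. The helper angle_degrees_to_radians is an illustration only and is not part of the PyARPES API, which handles this kind of normalization inside the plugin machinery.

```python
import numpy as np
import xarray as xr


def angle_degrees_to_radians(data: xr.DataArray, dim: str = "angle") -> xr.DataArray:
    """Hypothetical helper: return a copy whose `dim` coordinate is expressed in radians."""
    return data.assign_coords({dim: np.deg2rad(data[dim].values)})


# A small placeholder array whose angle axis is (by assumption) recorded in degrees.
spectrum = xr.DataArray(
    np.zeros((2, 3)),
    coords={"energy": [-0.1, 0.0], "angle": [-15.0, 0.0, 15.0]},
    dims=["energy", "angle"],
)

normalized = angle_degrees_to_radians(spectrum)
print(normalized["angle"].values)
```

The original array is left unchanged, since assign_coords returns a new object.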