ploteries.data_store.data_store

Module Attributes

Col_(text[, type_, is_literal, _selectable])

Produce a ColumnClause object.

Classes

DataStore(path[, read_only, max_queue_size])

Stores figure definitions, data series definitions and data series records.

Ref_(index)

Extension to SliceSequence that is used to define serializable references to data store content.

ploteries.data_store.data_store.Col_(text, type_=None, is_literal=False, _selectable=None)

Convenience alias to sqlalchemy.sql.column. See documentation for DataStore.get_data_handlers() for example usage.

class ploteries.data_store.data_store.Ref_(index)

Extension to SliceSequence that is used to define serializable references to data store content.

Parameters:

data, index – Same parameters as the dictionary-form input to DataStore.__getitem__().

classmethod call_multi(data_store: DataStore, *refs_: Ref_, _test_output=False)

Applies each Ref_ slice sequence in refs_ to the data store individually, while avoiding redundant queries that are repeated across refs_.
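The deduplication idea behind call_multi() can be sketched in isolation. This is a minimal, hypothetical illustration and not the ploteries implementation. It assumes each query can be reduced to a hashable key, and the names call_multi_sketch and run_query are made up. Each distinct query runs once, and repeated queries reuse the cached result.

```python
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def call_multi_sketch(run_query: Callable[[K], V], queries: Sequence[K]) -> List[V]:
    """Hypothetical sketch: run each distinct query once, reuse results for repeats."""
    cache: Dict[K, V] = {}
    results: List[V] = []
    for query in queries:
        if query not in cache:  # first occurrence: actually hit the data store
            cache[query] = run_query(query)
        results.append(cache[query])  # one result per input, in input order
    return results


# Usage: a fake query function that records how often it is called.
calls = []


def run_query(query):
    calls.append(query)
    return f"rows for {query}"


out = call_multi_sketch(run_query, ["loss", "acc", "loss"])
print(out)    # ['rows for loss', 'rows for acc', 'rows for loss']
print(calls)  # ['loss', 'acc']
```

The real method additionally applies each Ref_ slice sequence to its query result; only the caching step is shown here.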

class ploteries.data_store.data_store.DataStore(path, read_only=False, max_queue_size=1000)

Stores figure definitions, data series definitions and data series records, each in its own table. The data series records table in particular contains heterogeneous data records for all data series types. Exposes various facilities, including

  • an insert_data_record() method that supports asynchronous data record inserts;

  • a __getitem__() method that makes it possible to retrieve all records, or a subset thereof, from one or more data series joined on the time step, using familiar bracket syntax (e.g., data_store['data_series_0']); and

  • support for Ref_ objects embedded in figure definitions, which use __getitem__() to retrieve data from the data store when the figure is built.

Todo

Add an example below.

In [1]: print('Hello world')
Hello world
Parameters:
  • path – Database path.

  • read_only – Use database in read-only mode.

  • max_queue_size – Max size of insert cache queue.
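The three-table layout described above can be pictured with a small SQLite schema. This is a hypothetical sketch: the table names, column names, and type strings ("scalar", "raw_bytes") are assumptions for illustration, not the actual ploteries schema. It shows how a single records table can hold heterogeneous payloads for every series, with each record pointing back to its series definition.

```python
import sqlite3
import struct

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE figure_defs (id INTEGER PRIMARY KEY, name TEXT UNIQUE, definition TEXT);
CREATE TABLE data_defs (id INTEGER PRIMARY KEY, name TEXT UNIQUE, type TEXT);
CREATE TABLE data_records (
    id INTEGER PRIMARY KEY,
    data_def_id INTEGER REFERENCES data_defs(id),
    "index" INTEGER,
    writer_id INTEGER,
    data BLOB  -- payload encoded by the series' handler; format varies by series
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO data_defs (name, type) VALUES (?, ?)",
                 [("loss", "scalar"), ("image", "raw_bytes")])
ids = dict(conn.execute("SELECT name, id FROM data_defs"))  # {'loss': 1, 'image': 2}

# Records of different series share one table despite different payload formats.
records = [
    (ids["loss"], 0, 0, struct.pack("<d", 1.0)),  # 8-byte float
    (ids["loss"], 1, 0, struct.pack("<d", 0.5)),
    (ids["image"], 1, 0, bytes(16)),              # 16 raw bytes
]
conn.executemany(
    'INSERT INTO data_records (data_def_id, "index", writer_id, data) VALUES (?, ?, ?, ?)',
    records)

counts = conn.execute(
    "SELECT d.name, COUNT(*) FROM data_records r "
    "JOIN data_defs d ON r.data_def_id = d.id GROUP BY d.name ORDER BY d.name").fetchall()
print(counts)  # [('image', 1), ('loss', 2)]
```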

get_data_handlers(*column_constraints: BinaryExpression, connection=None)

Gets the data handlers satisfying the specified binary constraints. For example, after

`from ploteries.data_store.data_store import Col_`

the following calls are possible:

  • get_data_handlers() returns all handlers,

  • get_data_handlers(Col_('name')=='arr1') returns the data handler of name ‘arr1’,

  • get_data_handlers(data_store.data_defs_table.c.name=='arr1') returns the data handler of name ‘arr1’,

  • get_data_handlers(Col_('type')==UniformNDArrayDataHandler) returns all data handlers of that type. (NOT WORKING!)

Todo

Type constraints are not working (see last bullet above).
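Conceptually, an equality constraint such as Col_('name')=='arr1' narrows a query on the data definitions table, much like an SQL WHERE clause. The sketch below is a rough stdlib analogue under stated assumptions: it uses sqlite3 instead of SQLAlchemy, represents constraints as hypothetical (column, value) pairs, combines several constraints with AND, and uses an illustrative table layout and a made-up "scalar" type string.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical stand-in for the data definitions table; only these
# column names may appear in a constraint (guards the f-string below).
ALLOWED_COLUMNS = {"name", "type"}


def select_defs(conn: sqlite3.Connection, *constraints: Tuple[str, object]) -> List[tuple]:
    """Return rows satisfying ALL (column, value) equality constraints."""
    clauses, params = [], []
    for column, value in constraints:
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column {column!r}")
        clauses.append(f"{column} = ?")  # value is bound, never interpolated
        params.append(value)
    sql = "SELECT name, type FROM data_defs"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql + " ORDER BY name", params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_defs (name TEXT, type TEXT)")
conn.executemany("INSERT INTO data_defs VALUES (?, ?)",
                 [("arr1", "UniformNDArrayDataHandler"), ("loss", "scalar")])

print(select_defs(conn))                    # both rows (no constraints)
print(select_defs(conn, ("name", "arr1")))  # [('arr1', 'UniformNDArrayDataHandler')]
```

As with get_data_handlers(), calling the function without constraints returns every definition.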

get_figure_handlers(*column_constraints: BinaryExpression, connection=None)

Gets the figure handlers satisfying the specified binary constraints. See get_data_handlers() for an example.

insert_data_record(data_record: dict | List[dict])

Inserts one or more data records asynchronously. Inserts are automatically batched and written in a separate thread to improve disk-write efficiency (ThreadedInsertCache is used internally). Call flush() to ensure that all records inserted before the call have been written to the database.

Parameters:

data_record – Record dictionary or list of record dictionaries.
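The batching behaviour described above can be sketched with a bounded queue and one background worker thread. This is a minimal, hypothetical sketch and not the actual ThreadedInsertCache. The class name BatchingInsertCache, the write_batch callback, the batch_size parameter, and the record fields are all assumptions. flush() relies on queue.Queue.join(), which returns only after task_done() has been called for every queued record, and that happens only after the record's batch has been written. Error handling is omitted.

```python
import queue
import threading
from typing import Callable, List, Union


class BatchingInsertCache:
    """Hypothetical sketch of a threaded, batching insert cache."""

    def __init__(self, write_batch: Callable[[List[dict]], None],
                 max_queue_size: int = 1000, batch_size: int = 100):
        self._write_batch = write_batch
        self._batch_size = batch_size
        self._queue: "queue.Queue[dict]" = queue.Queue(maxsize=max_queue_size)
        threading.Thread(target=self._run, daemon=True).start()

    def insert(self, data_record: Union[dict, List[dict]]) -> None:
        records = data_record if isinstance(data_record, list) else [data_record]
        for record in records:
            self._queue.put(record)  # blocks while the queue is full

    def flush(self) -> None:
        # Returns once every record queued so far has been passed to write_batch.
        self._queue.join()

    def _run(self) -> None:
        while True:
            batch = [self._queue.get()]  # wait for at least one record
            while len(batch) < self._batch_size:  # drain what is already queued
                try:
                    batch.append(self._queue.get_nowait())
                except queue.Empty:
                    break
            try:
                self._write_batch(batch)  # e.g. one executemany() per batch
            finally:
                for _ in batch:
                    self._queue.task_done()


# Usage: collect written batches in a list instead of a database.
written: List[dict] = []
cache = BatchingInsertCache(written.extend, batch_size=10)
cache.insert({"index": 0, "writer_id": 0, "data": 0.0})
cache.insert([{"index": i, "writer_id": 0, "data": float(i)} for i in range(1, 25)])
cache.flush()
print(len(written))  # 25
```

A single worker keeps records in insertion order, and writing several records per call amortizes per-transaction disk overhead.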

__getitem__(idx: str | Tuple[str] | dict)

Load the data in a table or table join. This method can load all the data or a single record. Joins are carried out on the data records table fields index and writer_id.

The returned output is a dictionary in one of two formats: When data series names are provided as a tuple of strings, the format is

{'meta': numpy.ndarray(shape=num_records,
                       dtype=[('index', '<i4'), ('writer_id', '<i4')]),
 'series': {
     'series_name_1': {
         'created': numpy.ndarray(shape=num_records,
                                  dtype='datetime64[us]'),
         'data': (data handler dependent content of length num_records)},
     'series_name_2': ...}
 }

When a single data series name is provided as a string, the nested data and created fields are provided at the root level:

{'meta': numpy.ndarray(shape=num_records,
                       dtype=[('index', '<i4'), ('writer_id', '<i4')]),
 'created': numpy.ndarray(shape=num_records, dtype='datetime64[us]'),
 'data': (data handler dependent content of length num_records)
 }
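The join on index and writer_id, and the two output formats, can be approximated with plain SQLite. This is a hypothetical sketch, not the ploteries query. It assumes an inner join (records missing from any requested series are dropped), a denormalized series column, lists instead of NumPy arrays for 'meta' and 'data', and no 'created' field. load_series is a made-up name.

```python
import sqlite3
from typing import Sequence, Union


def load_series(conn: sqlite3.Connection, idx: Union[str, Sequence[str]]) -> dict:
    """Hypothetical sketch: inner-join series records on (index, writer_id)."""
    names = [idx] if isinstance(idx, str) else list(idx)
    cols = ['r0."index"', "r0.writer_id"] + [f"r{k}.data" for k in range(len(names))]
    # One self-join per extra series, matched on the (index, writer_id) pair.
    joins = "".join(
        f' JOIN data_records r{k} ON r{k}."index" = r0."index"'
        f" AND r{k}.writer_id = r0.writer_id AND r{k}.series = ?"
        for k in range(1, len(names)))
    sql = (f"SELECT {', '.join(cols)} FROM data_records r0{joins}"
           ' WHERE r0.series = ? ORDER BY r0."index", r0.writer_id')
    # Placeholders appear in the JOINs first, then in the WHERE clause.
    rows = conn.execute(sql, [*names[1:], names[0]]).fetchall()
    meta = [(row[0], row[1]) for row in rows]
    series = {name: {"data": [row[2 + k] for row in rows]} for k, name in enumerate(names)}
    if isinstance(idx, str):
        return {"meta": meta, **series[idx]}  # single name: fields at root level
    return {"meta": meta, "series": series}


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE data_records (series TEXT, "index" INTEGER, writer_id INTEGER, data REAL)')
conn.executemany("INSERT INTO data_records VALUES (?, ?, ?, ?)", [
    ("loss", 0, 0, 1.0), ("loss", 1, 0, 0.5), ("loss", 2, 0, 0.25),
    ("acc", 1, 0, 0.7), ("acc", 2, 0, 0.9),
])

joined = load_series(conn, ("loss", "acc"))
print(joined["meta"])                     # [(1, 0), (2, 0)]  (index 0 has no 'acc' record)
print(joined["series"]["acc"]["data"])    # [0.7, 0.9]
print(load_series(conn, "loss")["data"])  # [1.0, 0.5, 0.25]
```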
Parameters:

idx – Data series name, or tuple of data series names (to specify a join). Alternatively, a dictionary with the following fields:

  • ‘data’: Data series name string or list of names.

  • ‘index’: If set to ‘latest’, returns a record with the highest index (if several records share that index, one of those with the highest writer_id is taken). Otherwise, must be an index value. Ignored if None (the default).

  • ‘connection’: Connection object from a previously started context, if any. None by default.

  • ‘criterion’: List of extra criteria to apply to the data_records_table query. Empty list [] by default.
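The ‘latest’ selection rule (highest index, ties broken by highest writer_id) amounts to a lexicographic maximum over (index, writer_id). The sketch below shows that rule both in Python and as an equivalent SQL ordering. The record values and table layout are hypothetical, and this is not necessarily how ploteries phrases the query.

```python
import sqlite3

records = [  # (index, writer_id, data); hypothetical values
    (3, 0, "a"),
    (5, 0, "b"),
    (5, 2, "c"),
    (4, 7, "d"),
]

# In Python: highest index wins; ties go to the highest writer_id.
latest_py = max(records, key=lambda r: (r[0], r[1]))

# The same selection as an SQL query.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE data_records ("index" INTEGER, writer_id INTEGER, data TEXT)')
conn.executemany("INSERT INTO data_records VALUES (?, ?, ?)", records)
latest_sql = conn.execute(
    'SELECT "index", writer_id, data FROM data_records '
    'ORDER BY "index" DESC, writer_id DESC LIMIT 1').fetchone()

print(latest_py, latest_sql)  # (5, 2, 'c') (5, 2, 'c')
```

Note that record (4, 7, "d") loses despite its larger writer_id, because index is compared first.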