Advanced Usage¶
Schema definitions¶
To query data you need a schema, a python class that inherits from rframe.BaseSchema. Lets say we want to store a collection of versioned documents indexed by the experiment, detector and version
Database Queries¶
Once we have a schema, we can use it to query any of the supported data backends This allows for a consistent interface to data stored in any backend.
import pymongo
db = pymongo.MongoClient()['cmt2']['simple_dataframe']
# or
db = pd.read_csv("pandas_dataframe.csv")
docs = ExampleSchema.find(datasource=db, experiment=..., detector=..., version=...)
# or
doc = ExampleSchema.find_one(datasource=db, experiment=..., detector=..., version=...)
Inserting data¶
A schema can also help you insert new data into a data source, e.g. a mogodb collection.
import pymongo
db = pymongo.MongoClient()['cmt2']['simple_dataframe']
doc = ExampleSchema(experiment=..., detector=..., version=..., value=..., ...)
doc.save(db) # this will insert the document into the collection
The details of how to insert data into are handled by the framework so that you can use the same code independent of the data source.
The RemoteFrame¶
Alternatively we can use the RemoteFrame
class to access/store documents in any supported backend.
rf = ExampleSchema.rframe(db)
# or
rf = rframe.RemoteFrame(ExampleSchema, db)
Reading specific rows
Rows can be accessed by calling the dataframe with the rows index values, using pandas-like indexing df.loc[idx]
, df.at[idx, column]
, df[column].loc[idx]
or with the xarray style df.sel(index_name=idx)
method
# These methods will al return an identical pandas dataframe
df = rf.loc[experiment,detector, version]
df = rf.sel(experiment=experiment, detector=detector, version=version)
df = rf.loc[experiment,detector, version]
# Access a specific column to get a series back
df = rf['value'].loc[experiment,detector, version]
df = rf.value.loc[experiment,detector, version]
# pandas-style scalar lookup returns a scalar
value = rdf.at[(experiment,detector, version), 'value']
# or call the dataframe with the column as argyment and index values as keyword arguments
value = rf('value', experiment=experiment, detector=detector, version=version)
Slicing
You can also omit indices to get results back matching all values of the omitted index
df = rf.sel(version=version)
# or
df = rf.loc[experiment, detector, :]
# or
df = rf.loc[experiment]
# or pass a list a values you want to match on:
df = rf.sel(version=[0,1], experiment=experiment)
# Slicing is also supported
df = rf.sel(version=slice(2,10), detector=detector)
The interval index also supports passing a tuple/slice/begin,end keywords to query all intervals overlapping the given interval
df = rf.sel(version=version, time=(time1,time2))
df = rf.loc[version, time1:time2]
df = rf.get(version=version, begin=time1, end=time2)