
Handling Data

This page will introduce you to handling data in Dara applications.

DataVariable

You have previously been introduced to the concept of Variables, the core of interactivity in Dara. These variables work well for smaller application data, but DataVariables are better equipped to handle datasets, especially larger ones, while optimizing performance.

A DataVariable takes in data in the form of a pandas.DataFrame. It can also be instantiated without data, to be filled later through interaction with the app.

import pandas
from dara.core import DataVariable, CacheType

data = pandas.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})
my_first_data_variable = DataVariable(data, cache=CacheType.GLOBAL)

As with DerivedVariables, you can choose how the data is cached using the CacheType enumeration:

  • CacheType.GLOBAL - as you've seen previously, the data will be accessible by all users and can be provided upfront
  • CacheType.SESSION - the data will be accessible only for the current session
  • CacheType.USER - the data will be accessible only for the current user and is tracked across sessions/logins
caution

When loading your data directly in source code like the example above, the cache argument must be set to CacheType.GLOBAL, which makes the data accessible to all users. This is because when your code is first executed, there is no logged-in user or session yet. Any other setting in this context will trigger an error. Keep reading to see when other cache settings can be used.

Usage

A DataVariable can be used like any other variable, that is:

  • in a DerivedVariable
import pandas
from dara.core import DataVariable, DerivedVariable, CacheType

data = pandas.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})
my_first_data_variable = DataVariable(data, cache=CacheType.GLOBAL)

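# my_function is a placeholder resolver; it receives the DataFrame held by the DataVariable
def my_function(df: pandas.DataFrame):
    return len(df)

my_derived_variable = DerivedVariable(my_function, variables=[my_first_data_variable])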
  • in a py_component
import pandas
from bokeh.plotting import figure

from dara.core import py_component, DataVariable, CacheType, ConfigurationBuilder
from dara.components import Bokeh


@py_component
def plot_data(data: pandas.DataFrame):
    p = figure(title='My Plot')
    p.line(x=data['a'], y=data['b'])
    return Bokeh(p)

data = pandas.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})
my_first_data_variable = DataVariable(data, cache=CacheType.GLOBAL)

config = ConfigurationBuilder()
config.add_page('data page', plot_data(my_first_data_variable))
  • in an action
import pandas
from dara.core import DataVariable, CacheType, Variable, action

data = pandas.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})
my_first_data_variable = DataVariable(data, cache=CacheType.GLOBAL)

target = Variable()

@action
async def my_resolver_function(ctx: action.Ctx, data_var_value: pandas.DataFrame):
    # data_var_value here takes the value of my_first_data_variable
    ...
    await ctx.update(variable=target, value=data_var_value)

my_resolver_function(my_first_data_variable)

Updating

You can update the data stored in a DataVariable by:

  • using a dara.core.interactivity.actions action
import pandas
from dara.core import DataVariable, CacheType

my_session_data_variable = DataVariable(cache=CacheType.SESSION)
# Load a new DataFrame into the DataVariable - only visible to the current session since cache=CacheType.SESSION
my_session_data_variable.update(value=pandas.read_csv(...))

# Load a new DataFrame into the DataVariable - visible to all users since cache=CacheType.GLOBAL
my_first_data_variable.update(value=pandas.read_csv(...))
  • using dara-components.common.dropzone.UploadDropzone to upload a new dataset
from dara.core import DataVariable, CacheType
from dara.components import UploadDropzone

# data specific to the user
target = DataVariable(cache=CacheType.USER)

# On upload, target will be updated to store the DataFrame uploaded by the user
UploadDropzone(target=target)

Filtering

Another benefit of using a DataVariable for data is the ability to filter it. This will automatically be done by components designed to handle DataVariables, such as Table, which will only request a subset of data as needed.

import pandas

from dara.core import DataVariable, ConfigurationBuilder
from dara.core.interactivity.filtering import ClauseQuery, ValueQuery
from dara.components import Stack, Table

data = pandas.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})
variable = DataVariable(data)

# Filter rows where 'a' == 2
variable_filtered_1 = variable.filter(ValueQuery(column='a', value=2))

# Filter rows where 'a' == 2 or 'b' == 5
variable_filtered_2 = variable.filter(
    ClauseQuery(
        combinator='OR',
        clauses=[ValueQuery(column='a', value=2), ValueQuery(column='b', value=5)]
    )
)

...

filtered_tables = Stack(
    Table(data=variable_filtered_1, columns=['a', 'b']),
    Table(data=variable_filtered_2, columns=['a', 'b'])
)

config = ConfigurationBuilder()
config.add_page('Filtered Data', filtered_tables)

Even though the filtered variables can be used as if they were new variables, they are still tied to the original: whenever the original variable is updated, e.g. via an action, the filtered variables are updated too.
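To make the two queries above concrete, the following sketch reproduces the same row selections with plain pandas boolean masks. It only illustrates which rows the filters select, assuming the queries match on equality as the comments in the example state; it is not a description of how Dara applies filters internally.

```python
import pandas

data = pandas.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})

# Counterpart of ValueQuery(column='a', value=2): rows where 'a' == 2
filtered_1 = data[data['a'] == 2]

# Counterpart of the 'OR' ClauseQuery: rows where 'a' == 2 or 'b' == 5
filtered_2 = data[(data['a'] == 2) | (data['b'] == 5)]

print(filtered_1)
print(filtered_2)
```

With this sample data both selections contain the same single row (a=2, b=5), because both conditions hold for the second row only.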

DerivedDataVariable

Another way to handle data is to create a dara.core.interactivity.derived.data_variable.DerivedDataVariable that will produce a dataset on the fly. This is very similar to a DerivedVariable except it's designed to handle data.

The differences between DerivedVariable and DerivedDataVariable are the following:

  • The resolver function must return a DataFrame.
  • It doesn't support nested data or have the .get method, as neither makes sense when working with a DataFrame.
  • The cache option cannot be set to None.

Under the hood it works almost exactly like a DerivedVariable. Your resolver function will be executed whenever one of the dependent variables changes (or when you use .trigger()). Its result will then be passed through the same filtering logic as for a DataVariable.
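As a minimal sketch of the first requirement, the function below is the kind of resolver a DerivedDataVariable could wrap: it takes the current values of its inputs and returns a DataFrame. The name rows_above and the threshold input are hypothetical, and the function is called directly here so that it can be tried without a running app.

```python
import pandas


def rows_above(df: pandas.DataFrame, threshold: int) -> pandas.DataFrame:
    """Hypothetical resolver: keep rows whose 'a' value exceeds the threshold."""
    # Return a DataFrame, not a scalar or a list
    return df[df['a'] > threshold].reset_index(drop=True)


data = pandas.DataFrame(data={'a': [1, 2, 3], 'b': [4, 5, 6]})

# Direct call for illustration only; in an app the framework would call the
# resolver with the values of the variables it depends on.
result = rows_above(data, threshold=1)
print(result)
```

In an app, such a function would be passed to a DerivedDataVariable together with its input variables, following the same pattern as the DerivedVariable example earlier on this page.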

DerivedDataVariables can also be run in a separate process, like DerivedVariables. The same requirements apply: you must specify the location of the DerivedDataVariable resolvers in your app using config.task_module and set run_as_task=True when instantiating the DerivedDataVariable.

tip

When you want to perform any data processing within your app based on user inputs, a DerivedDataVariable is a great way to keep this logic outside of your py_components. See Best Practices: Light PyComponents for an example.

Additionally, if you are doing any data processing in your app that is not based on user inputs (for example static scaling or dropping of columns), your app will perform better if this processing is done beforehand and loaded into your app already processed.
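The second point can be sketched as a one-off preparation script that runs outside the app. The column names, the min-max scaling and the file name are made up for illustration; the idea is only that static steps run once, and the app then loads the already-processed file.

```python
import os
import tempfile

import pandas

# Hypothetical raw dataset; 'notes' is a column the app never displays
raw = pandas.DataFrame(
    data={'a': [1, 2, 3], 'b': [4, 5, 6], 'notes': ['x', 'y', 'z']}
)

# Static processing, done once ahead of time rather than on every app load
processed = raw.drop(columns=['notes'])
processed['a_scaled'] = (processed['a'] - processed['a'].min()) / (
    processed['a'].max() - processed['a'].min()
)

# Persist the result (a temporary directory keeps this sketch self-cleaning)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'processed.csv')
    processed.to_csv(path, index=False)

    # In the app, only this load step would remain
    loaded = pandas.read_csv(path)

print(loaded)
```

The loaded DataFrame could then be handed to a DataVariable as in the first example on this page, with no per-request processing left to do.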

Caching

Just like in DerivedVariables, the cache setting on data variables can also accept a policy object to determine how the data will be preserved and evicted. Learn more about the available cache policies and how to use them in the Cache Policies Documentation.

danger

For a DataVariable the size-based LRU cache does not make a difference, as there is always one copy of the data in memory per the cache scope (i.e. one global copy with cache_type='global', one copy per user or per session respectively with 'user' and 'session' types).

The TTL cache can still be used to evict data after a certain amount of time.
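The note above can be illustrated with a toy model. This is not Dara's implementation: the ScopedStore class, its key scheme and the explicit now argument are all invented for this sketch. It shows why a size limit has nothing to evict when each scope holds exactly one entry, while a time-to-live still does.

```python
import time
from typing import Any, Dict, Optional, Tuple


class ScopedStore:
    """Toy store holding at most one value per cache scope, with optional TTL."""

    def __init__(self, ttl_seconds: Optional[float] = None):
        self.ttl_seconds = ttl_seconds
        # scope key -> (value, time stored)
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def set(self, scope_key: str, value: Any, now: Optional[float] = None) -> None:
        # Writing to a scope replaces its single entry; nothing accumulates
        stored_at = time.monotonic() if now is None else now
        self._entries[scope_key] = (value, stored_at)

    def get(self, scope_key: str, now: Optional[float] = None) -> Any:
        entry = self._entries.get(scope_key)
        if entry is None:
            return None
        value, stored_at = entry
        current = time.monotonic() if now is None else now
        if self.ttl_seconds is not None and current - stored_at > self.ttl_seconds:
            # Expired: evict the entry and report a miss
            del self._entries[scope_key]
            return None
        return value


store = ScopedStore(ttl_seconds=60)
store.set('session:abc', 'first upload', now=0)
store.set('session:abc', 'second upload', now=10)  # replaces, does not add

print(store.get('session:abc', now=20))   # still within the TTL
print(store.get('session:abc', now=100))  # past the TTL, entry is evicted
```

Because the second write replaces the first, the store never holds more than one entry for the scope, so a least-recently-used size limit would never have anything to remove.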

Data Utilities

The data utilities provide extra functionality for interacting with data, built on top of the core DataVariable concepts. They contain a DataFactory class, which is a factory of variables, actions and methods for interacting with locally stored data.

To learn more, you can check out the documentation.

Next Steps

Throughout the user guide, you've been adding pages to your configuration to render the components you've built. All of these pages are configured into a layout along with a router so that you can navigate to and from pages with URLs and navigation bars. You will learn more about this in the next section.