CipherStash: Dataset configuration

A dataset is a collection of tables and fields that you want to encrypt. The configuration includes:

The types of indexes set on each column in the table
The mode for each index
The data type of each column
Settings for match indexes, e.g. tokenization settings
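As a rough illustration, the settings listed above can be pictured as a nested structure for a single field. The Python sketch below is a minimal model that assumes the key names from the configuration reference at the end of this page. The `describe_field` helper is hypothetical and not part of any CipherStash tool, and the index `version` value of 1 is an assumption.

```python
from typing import Any

# Hypothetical example: one encrypted field, using key names from the
# configuration reference table. Values mirror the table's example settings.
email_field: dict[str, Any] = {
    "name": "email",
    "in_place": False,
    "cast_type": "utf8-str",   # the data type stored in the column
    "mode": "plaintext",       # the encryption mode (a field-level key in the table)
    "indexes": [               # the indexes set on the column
        {
            "version": 1,      # assumption: the table gives no example version
            "kind": "match",
            # Match index settings, e.g. tokenization
            "tokenizer": {"kind": "ngram", "token_length": 3},
            "token_filters": [{"kind": "downcase"}],
            "k": 6,
            "m": 2048,
            "include_original": True,
        }
    ],
}


def describe_field(field: dict[str, Any]) -> str:
    """Summarise a field's data type, mode and index kinds (hypothetical helper)."""
    kinds = ", ".join(index["kind"] for index in field["indexes"])
    return f"{field['name']}: {field['cast_type']} ({field['mode']}), indexes: {kinds}"


print(describe_field(email_field))
```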

We suggest creating a separate dataset for each environment in which you handle sensitive data. This lets you update and test a dataset configuration without affecting other environments. When creating a dataset, give it a clear, unique description that identifies what it is used for.

Dataset management

Creating a dataset

To create a dataset, run the following command with the CipherStash CLI. If you don't have the CLI installed, follow the getting started guide.

stash datasets create my_dataset_name --description "Test application"
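Following the one-dataset-per-environment advice above, a script can create each environment's dataset in turn. The Python sketch below only wraps the `stash datasets create` command and `--description` flag shown above; the environment names, the `my_app_` naming scheme and both helper functions are hypothetical. By default it only prints the commands, and it runs them only when called with `dry_run=False`.

```python
import shlex
import subprocess


def create_dataset_command(name: str, description: str) -> list[str]:
    """Build the argv for `stash datasets create`, using only the flags shown above."""
    return ["stash", "datasets", "create", name, "--description", description]


def create_dataset(name: str, description: str, dry_run: bool = True) -> list[str]:
    """Print (dry run) or execute the create command; returns the argv either way."""
    argv = create_dataset_command(name, description)
    if dry_run:
        # shlex.join quotes arguments so the printed line can be pasted into a shell
        print(shlex.join(argv))
    else:
        # check=True raises CalledProcessError if the CLI exits with a non-zero status
        subprocess.run(argv, check=True)
    return argv


# Hypothetical naming scheme: one dataset per environment, each with a
# description that says which environment it belongs to.
for env in ("development", "staging", "production"):
    create_dataset(f"my_app_{env}", f"My App {env} dataset")
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems with descriptions that contain spaces.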

Uploading a dataset

Use the CipherStash CLI to upload a dataset configuration to your account. Note that you need to have created a client key before you can upload a dataset configuration.

stash datasets config upload --file dataset.yml --client-id $CS_CLIENT_ID --client-key $CS_CLIENT_KEY
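If the upload runs from a script (for example in CI), it helps to fail early when the client credentials are missing. The Python sketch below builds the same upload command from the `CS_CLIENT_ID` and `CS_CLIENT_KEY` environment variables used above; the helper names are hypothetical, and no flags beyond those in the command above are used.

```python
import os
import subprocess
from typing import Mapping, Optional


def upload_config_command(path: str, env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Build the argv for `stash datasets config upload` from CS_CLIENT_ID and CS_CLIENT_KEY."""
    env = os.environ if env is None else env
    missing = [var for var in ("CS_CLIENT_ID", "CS_CLIENT_KEY") if not env.get(var)]
    if missing:
        # A client key must exist before a configuration can be uploaded,
        # so stop before invoking the CLI with empty credentials.
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return [
        "stash", "datasets", "config", "upload",
        "--file", path,
        "--client-id", env["CS_CLIENT_ID"],
        "--client-key", env["CS_CLIENT_KEY"],
    ]


def upload_config(path: str = "dataset.yml") -> None:
    """Run the upload; check=True raises CalledProcessError if the CLI fails."""
    subprocess.run(upload_config_command(path), check=True)
```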

Setting the Workspace region

You may have multiple workspaces in different regions. To set the region of the workspace you are working in, pass the following option to the CLI:

--vitur-host https://us-east-1.aws.viturhosted.net/

Replace us-east-1 with your workspace's region, which you can find in the workspace overview in the CipherStash dashboard.
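When scripting against workspaces in several regions, the host URL can be derived from the region name. The Python sketch below assumes that every region follows the same URL pattern as the us-east-1 example above, and that `--vitur-host` can be appended after a command's other arguments; both are assumptions, and the helper names are hypothetical.

```python
def vitur_host(region: str) -> str:
    """Build the --vitur-host URL for a workspace region (hypothetical helper).

    Assumes every region uses the same pattern as the us-east-1 example.
    """
    region = region.strip().lower()
    if not region:
        raise ValueError("region must not be empty")
    return f"https://{region}.aws.viturhosted.net/"


def with_region(argv: list[str], region: str) -> list[str]:
    """Return a copy of a stash CLI argv with the --vitur-host option appended."""
    return [*argv, "--vitur-host", vitur_host(region)]


# Usage: target the us-east-1 workspace when creating a dataset.
cmd = with_region(
    ["stash", "datasets", "create", "my_dataset_name", "--description", "Test application"],
    "us-east-1",
)
```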

Configuration reference

Option	Description	Example Setting
tables	List of tables to encrypt	users
tables.path	Name of the table to encrypt	users
tables.fields	List of fields to encrypt	name, email
tables.fields.name	Name of the field to encrypt	name, email
tables.fields.in_place	Whether encrypted data is stored in the same column as plaintext	false
tables.fields.cast_type	Type of data stored in the column	utf8-str
tables.fields.mode	Encryption mode	plaintext
tables.fields.indexes	List of indexes to create for the field
tables.fields.indexes.version	Version of the index	1
tables.fields.indexes.kind	Type of index	match
tables.fields.indexes.tokenizer	Tokenizer used to tokenize the data	ngram
tables.fields.indexes.tokenizer.kind	Type of tokenizer	ngram
tables.fields.indexes.tokenizer.token_length	Length of the tokens generated by the tokenizer	3
tables.fields.indexes.token_filters	List of filters applied to the tokens	downcase, stop
tables.fields.indexes.token_filters.kind	Type of filter	downcase
tables.fields.indexes.k	Number of hash functions applied to each token, i.e. the maximum number of bits set in the match index's bloom filter per token	6
tables.fields.indexes.m	Size of the match index's bloom filter, in bits	2048
tables.fields.indexes.include_original	Whether the original (untokenized) value is also added to the index	true
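To see how the options in the table nest, the Python sketch below assembles a configuration for the example `users` table with `name` and `email` fields, runs a basic sanity check, and writes it to `dataset.yml` as JSON. JSON is valid YAML 1.2, so a YAML parser should read the file, but that the CLI accepts this flow style is an assumption; if it does not, convert the output to block-style YAML. The set of keys the check treats as required, the index `version` of 1, and all helper names are assumptions for illustration.

```python
import json
from typing import Any


def match_index(token_length: int = 3, k: int = 6, m: int = 2048) -> dict[str, Any]:
    """A match index using the example settings from the reference table."""
    return {
        "version": 1,  # assumption: the table's example version
        "kind": "match",
        "tokenizer": {"kind": "ngram", "token_length": token_length},
        "token_filters": [{"kind": "downcase"}],
        "k": k,
        "m": m,
        "include_original": True,
    }


def encrypted_field(name: str) -> dict[str, Any]:
    """A utf8-str field, stored separately from the plaintext, with one match index."""
    return {
        "name": name,
        "in_place": False,
        "cast_type": "utf8-str",
        "mode": "plaintext",
        "indexes": [match_index()],
    }


config: dict[str, Any] = {
    "tables": [
        {"path": "users", "fields": [encrypted_field("name"), encrypted_field("email")]}
    ]
}

# Assumption: keys this sketch treats as required on every field.
REQUIRED_FIELD_KEYS = {"name", "cast_type", "mode", "indexes"}


def check_config(cfg: dict[str, Any]) -> list[str]:
    """Return the problems found; an empty list means none were detected."""
    problems: list[str] = []
    for table in cfg.get("tables", []):
        if "path" not in table:
            problems.append("table without a path")
        for field in table.get("fields", []):
            missing = REQUIRED_FIELD_KEYS - set(field)
            if missing:
                problems.append(f"{table.get('path')}.{field.get('name')}: missing {sorted(missing)}")
    return problems


def write_config(cfg: dict[str, Any], path: str = "dataset.yml") -> None:
    """Validate the configuration, then write it as JSON-formatted YAML."""
    problems = check_config(cfg)
    if problems:
        raise ValueError("; ".join(problems))
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=2)
        f.write("\n")
```

The resulting file can then be uploaded with `stash datasets config upload --file dataset.yml` as described above.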