 ____        _           ____  _
/ ___|  ___ | |_   _____| __ )(_) ___
\___ \ / _ \| \ \ / / _ \  _ \| |/ _ \
 ___) | (_) | |\ V /  __/ |_) | | (_) |
|____/ \___/|_| \_/ \___|____/|_|\___/
                            For Ruby

Welcome to the SolveBio Ruby Tutorial!

First, open the SolveBio Ruby shell by typing “solvebio.rb”.

The SolveBio Ruby Shell is based on IRB. When you log in, it will automatically pick up your API key.

View this tutorial online: www.solvebio.com/docs/ruby-tutorial

Navigate the Library

List all available depositories:

Depository.all

Depositories are versioned containers of datasets. There are many versions of each depository, and each version may have one or more datasets.

To list all datasets from all depositories:

Dataset.all

Pass the “latest” option to get only the latest version of each dataset:

Dataset.all(:latest => true)

To retrieve a dataset by its full name (“ClinVar/3.0.0-2014-12-05/Variants”):

Dataset.retrieve('ClinVar/3.0.0-2014-12-05/Variants')

By leaving out the version (for example: “ClinVar/Variants”), you get a quick shortcut to the latest version of any dataset. Be careful here: you should always specify an exact version in your production code.
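
A full dataset name has three slash-separated parts (depository, version, dataset), while the shortcut form has only two. The following is a minimal sketch in plain Ruby, without calling the SolveBio API, of one way to check that a name pins an exact version before you use it in production code. The helper names parse_dataset_name and pinned? are hypothetical and are not part of the SolveBio client.

```ruby
# Hypothetical helpers (not part of the SolveBio client).
# Splits "Depository/Version/Dataset" or "Depository/Dataset" into parts.
def parse_dataset_name(full_name)
  parts = full_name.split('/')
  case parts.length
  when 3
    { :depository => parts[0], :version => parts[1], :dataset => parts[2] }
  when 2
    # No version given: this form resolves to the latest version.
    { :depository => parts[0], :version => nil, :dataset => parts[1] }
  else
    raise ArgumentError, "unexpected dataset name: #{full_name}"
  end
end

# True only when the name includes an explicit version.
def pinned?(full_name)
  !parse_dataset_name(full_name)[:version].nil?
end

pinned?('ClinVar/3.0.0-2014-12-05/Variants')  # => true
pinned?('ClinVar/Variants')                   # => false
```

A check like this could run before Dataset.retrieve in a script, so that an unpinned name is caught early.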

Query a Dataset

Every dataset in SolveBio can be queried the same way. You can build queries manually in the Ruby shell, or use our visual Workbench (www.solvebio.com/workbench).

In this example, we will query the latest Variants dataset from ClinVar.

dataset = Dataset.retrieve('ClinVar/Variants')
dataset.query

The “query” method returns a Ruby enumerable, so you can loop through all the results easily.

To examine a single result more closely, you may treat the query response as an array of hashes:

dataset.query[0]

You can also slice the result set like any other Ruby array:

dataset.query[0..100]
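
To make the indexing and slicing behavior concrete, here is a small sketch that uses a plain Ruby array of hashes as a stand-in for a query response. The records and their values are made up for illustration, and the real response may use different key types. Note that a Ruby range written with two dots is inclusive, so on a plain array [0..100] selects up to 101 elements.

```ruby
# Stand-in for a query response: a plain array of hashes.
# The field values here are made up for illustration only.
results = [
  { 'clinical_significance' => 'Pathogenic' },
  { 'clinical_significance' => 'Benign' },
  { 'clinical_significance' => 'Pathogenic' }
]

first = results[0]          # a single record (a Hash)
first_two = results[0..1]   # inclusive range: indexes 0 and 1
also_two = results[0...2]   # exclusive range: the same two records

# Any Enumerable method works for looping over the records.
significances = results.map { |record| record['clinical_significance'] }
```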

Filter a Dataset

To narrow down your query, you can filter on any field. For example, to get all variants in ClinVar that are Pathogenic, you would filter on the clinical_significance field for “Pathogenic”:

dataset.query.filter(:clinical_significance => 'Pathogenic')

By default, adding more filters will result in a boolean AND query (all filters must match):

dataset.query.filter(:clinical_significance => 'Pathogenic', :review_status => 'single')

Use the “Filter” class to do more advanced filtering. For example, combine a few filters using boolean OR:

filters = Filter.new(:clinical_significance => 'Pathogenic') | Filter.new(:clinical_significance => 'Benign')
dataset.query.filter(filters)
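
The difference between AND and OR filtering can be sketched with an in-memory stand-in. SketchFilter below is a hypothetical class written for this illustration; it is not the SolveBio Filter class and says nothing about how filters are evaluated on the server. The sample records are made up.

```ruby
# SketchFilter is a hypothetical in-memory stand-in, not the SolveBio Filter class.
class SketchFilter
  def initialize(conditions = {}, &predicate)
    # With a block, use it directly; otherwise every condition must match (AND).
    @predicate = predicate || lambda { |record|
      conditions.all? { |field, value| record[field] == value }
    }
  end

  def match?(record)
    @predicate.call(record)
  end

  # Boolean OR: either filter may match.
  def |(other)
    left = self
    SketchFilter.new { |record| left.match?(record) || other.match?(record) }
  end

  # Boolean AND: both filters must match.
  def &(other)
    left = self
    SketchFilter.new { |record| left.match?(record) && other.match?(record) }
  end
end

# Made-up records for illustration only.
records = [
  { :clinical_significance => 'Pathogenic', :review_status => 'single' },
  { :clinical_significance => 'Benign',     :review_status => 'single' },
  { :clinical_significance => 'Uncertain',  :review_status => 'multiple' }
]

either = SketchFilter.new(:clinical_significance => 'Pathogenic') |
         SketchFilter.new(:clinical_significance => 'Benign')
both = SketchFilter.new(:clinical_significance => 'Pathogenic', :review_status => 'single')

or_matches  = records.select { |r| either.match?(r) }  # first two records
and_matches = records.select { |r| both.match?(r) }    # first record only
```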

Genomic Datasets

Some SolveBio datasets allow querying by genome build. We call these “genomic datasets”. To find out if a dataset is genomic, and what genome builds are supported:

dataset.is_genomic
> true
dataset.genome_builds
> ['GRCh38', 'GRCh37']

By default, build 'GRCh37' is selected if it is available; otherwise, the most recent build is selected. To select a genome build manually when querying, pass it as a query parameter:

dataset.query(:genome_build => 'GRCh38')
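
The default rule described above can be written out as a short sketch. The helper default_genome_build is hypothetical, and it assumes the builds array is ordered most recent first, as in the example output above; the client may determine the most recent build differently.

```ruby
# Hypothetical helper mirroring the default described above.
# Assumption: the builds array is ordered most recent first.
def default_genome_build(builds)
  return nil if builds.empty?
  builds.include?('GRCh37') ? 'GRCh37' : builds.first
end

default_genome_build(['GRCh38', 'GRCh37'])  # => "GRCh37"
default_genome_build(['GRCh38'])            # => "GRCh38"
```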

On genomic datasets, you may query by position (single nucleotide) or by range:

dataset.query(:genome_build => 'GRCh37').position('chr1', 976629)
> ...
dataset.query(:genome_build => 'GRCh37').range('chr1', 976629, 1000000)
> ...

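Position and range queries return all results that overlap with the specified coordinate(s). To request exact matches only, pass true as the final argument. The examples below write it as exact=true, which in Ruby assigns a local variable named exact and passes its value: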

dataset.query(:genome_build => 'GRCh37').position('chr1', 883516, exact=true)
> ...
dataset.query(:genome_build => 'GRCh37').range('chr9', 136289550, 136289579, exact=true)
> ...
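
The difference between overlapping and exact matches can be sketched in plain Ruby. This is an illustration under stated assumptions, not the server's implementation: Feature is a hypothetical stand-in for a genomic record, coordinates are treated as inclusive, "exact" is assumed to mean that a record's start and stop equal the queried ones, and the second and third sample records are made up. A position query corresponds to the case where start and stop are the same.

```ruby
# Hypothetical stand-in for a genomic record; not a SolveBio class.
Feature = Struct.new(:chromosome, :start, :stop)

# Overlap: the feature shares at least one coordinate with [start, stop].
def overlaps?(feature, chromosome, start, stop)
  feature.chromosome == chromosome &&
    feature.start <= stop && feature.stop >= start
end

# Exact (assumed meaning): the feature's coordinates equal the query's.
def exact_match?(feature, chromosome, start, stop)
  feature.chromosome == chromosome &&
    feature.start == start && feature.stop == stop
end

features = [
  Feature.new('chr9', 136289550, 136289579),  # same span as the query
  Feature.new('chr9', 136289000, 136289560),  # made up: overlaps the query
  Feature.new('chr9', 136300000, 136300010)   # made up: outside the query
]

overlapping = features.select { |f| overlaps?(f, 'chr9', 136289550, 136289579) }
exact       = features.select { |f| exact_match?(f, 'chr9', 136289550, 136289579) }
```

Here overlapping holds the first two features, while exact holds only the first.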

Next Steps

To learn more about a dataset and its fields, use dataset.help(). For more information on queries and filters, see the API reference: www.solvebio.com/docs/api?ruby