API Reference

This page provides an overview of the Minus80 API.

Accession

class minus80.Accession(name, files=None, **kwargs)[source]

From Google: Definition (noun): a new item added to an existing collection of books, paintings, or artifacts.

An Accession is an item that exists in an experimental collection.

Most of the time an accession is interchangeable with a sample. However, the term sample can become confusing when an experiment has multiple samplings from the same sample, e.g. a timecourse or different tissues.
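
Example

A minimal sketch of creating an Accession (the name, file path, and metadata keywords below are illustrative):

>>> import minus80 as m80
>>> a = m80.Accession('Sample1', files=['/tmp/file1.fastq'], age=30, tissue='leaf')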

add_file(path, scheme='ssh', username=None, hostname=None)[source]

Add a file that is associated with the accession. This method will attempt to determine where the file is actually stored based on its path. Currently it supports three different protocols: local, ssh and s3. A local file looks something like: /tmp/file1.fastq.

Parameters:
  • path (string) – The path/URL to the file. The string is parsed for default information (e.g. scheme, username, hostname).
  • scheme (string (default: ssh)) – Specifies the scheme/protocol for accessing the file. Defaults to ssh; also supports s3.
  • username (string (default: None)) – Defines a username that is authorized to access hostname using the scheme. Defaults to None, in which case it will be determined by calling getpass.getuser().
  • hostname (string (default: None)) – Defines the hostname that the file is accessible through. Defaults to None, in which case the hostname will be determined automatically.
  • port (int (default: 22)) – Port to access the file through. Defaults to 22, which is for ssh.
  • NOTE – any keyword arguments passed in will override the values parsed out of the path.
Returns: None
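
Example

A sketch of adding files to an accession, continuing the example above (the paths, username, and hostname are illustrative; the path parsing described above is assumed):

>>> a = m80.Accession('Sample1')
>>> a.add_file('/tmp/file1.fastq')
>>> a.add_file('/data/file2.fastq', scheme='ssh', username='alice', hostname='example.org')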

add_files(paths, skip_test=False)[source]

Add multiple paths that are associated with an accession

Parameters:
  • paths (iterable of strings) – The paths to the files
  • skip_test (bool) – If True, the method will not test whether the file exists
Returns: None
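
Example

A sketch of adding several paths at once, continuing the example above (the paths are illustrative); skip_test=True skips the existence check:

>>> a.add_files(['/tmp/file1.fastq', '/tmp/file2.fastq'], skip_test=True)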

Cohort

class minus80.Cohort(name, parent=None)[source]

A Cohort is a named set of accessions. Once created, cohorts are persistent, as they are stored on disk by minus80.
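
Example

A minimal sketch of creating a named Cohort (the name is illustrative; import minus80 as m80 is assumed):

>>> c = m80.Cohort('my_experiment')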

add_accession(accession)[source]

Add an Accession to the database

add_accessions(accessions)[source]

Add multiple Accessions at once
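
Example

A sketch of adding accessions to the Cohort created above (names and metadata are illustrative):

>>> a1 = m80.Accession('S1', age=23)
>>> a2 = m80.Accession('S2', age=30)
>>> c.add_accessions([a1, a2])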

add_accessions_from_data_frame(df, name_col)[source]

Add accessions from a data frame. This assumes each row is an Accession and that the properties of the accession are stored in the columns.

Parameters:
  • df (pandas.DataFrame) – The pandas data frame containing one accession per row
  • name_col (string) – The column containing the accession names

Example

>>> df = pd.DataFrame(
...     [['S1', 23, 'O'],
...      ['S2', 30, 'O+']],
...     columns=['Name', 'Age', 'Type']
... )
>>> c.add_accessions_from_data_frame(df, 'Name')

Would yield two Accessions: S1 and S2 with Age and Type properties.

add_raw_file(path, scheme='ssh', username=None, hostname=None)[source]

Add a raw file to the Cohort

alias_column(colname, min_alias_length=3)[source]

Assign an accession column as aliases

assimilate_files(files, best_only=True)[source]

Take a list of files and assign them to Accessions

columns

Return a list of all the metadata columns stored for available Accessions

crawl_host(hostname='localhost', path='/', username=None, glob='*.fastq')[source]

Use SSH to crawl a host looking for raw files
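
Example

A sketch of crawling a remote host for FASTQ files, continuing the earlier examples (hostname, path, and username are illustrative):

>>> c.crawl_host(hostname='example.org', path='/data/sequencing', username='alice', glob='*.fastq')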

drop_aliases()[source]

Clear the aliases from the database

classmethod from_accessions(name, accessions)[source]

Create a Cohort from an iterable of Accessions.

Parameters:
  • name (str) – The name of the Cohort
  • accessions (iterable of Accessions) – The accessions that will be frozen in the cohort under the given name
Returns: A Cohort object
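
Example

A sketch of freezing a Cohort directly from Accession objects (the names are illustrative):

>>> accessions = [m80.Accession('S1'), m80.Accession('S2')]
>>> c = m80.Cohort.from_accessions('my_experiment', accessions)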

classmethod from_yaml(name, yaml_file)[source]

Create a Cohort from a YAML file. Note: this yaml file must be created from

Parameters:
  • name (str) – The name of the Cohort
  • yaml_file (pathlike) – The path to the YAML file that contains the Accessions
Returns: A Cohort object
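
Example

A sketch, assuming a YAML file in the format minus80 expects (the name and file path are illustrative):

>>> c = m80.Cohort.from_yaml('my_experiment', 'accessions.yaml')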

ignore_files(files)[source]

Ignore the given file paths

interactive_ignore_pattern(pattern, n=20)[source]

Start an interactive prompt to ignore patterns in file names (e.g. “test”)

names

Return a list of all available names and aliases

random_accession()[source]

Returns a random accession from the Cohort

Parameters: None
Returns: An Accession object
Return type: Accession

random_accessions(n=1, replace=False)[source]

Returns a list of random accessions from the Cohort, either with or without replacement.

Parameters:
  • n (int) – The number of random accessions to retrieve
  • replace (bool) – If False, sampling is done without replacement
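
Example

A sketch of sampling from a Cohort, continuing the earlier examples:

>>> sampled = c.random_accessions(n=5, replace=False)
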
search_accessions(name, include_scores=False, recurse=True)[source]

Performs a search of accession names

search_files(path)[source]

Perform a search of file names (paths)

Tools

minus80.Tools.available(dtype=None, name=None)[source]

Reports the available datasets Frozen in the minus80 database.

Parameters:
  • dtype (str) – Each dataset has a datatype associated with it. E.g.: Cohort. If no dtype is specified, all available dtypes will be returned.
  • name (str, default:'*') – The name of the dataset you want to check is available. The default value is the wildcard ‘*’ which will return all available datasets with the specified dtype.
Returns: If both dtype and name are specified, a bool is returned indicating whether the dataset is available. Otherwise a formatted table is printed and None is returned.

Return type: bool, None
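
Example

A sketch of checking what is available (the dataset name is illustrative; import minus80 as m80 is assumed):

>>> m80.Tools.available()                                       # prints a table of all available datasets
>>> m80.Tools.available(dtype='Cohort', name='my_experiment')   # returns True or False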

minus80.Tools.delete(dtype=None, name=None, force=False)[source]

Deletes files associated with Minus80 datasets.

Parameters:
  • name (str) – The name of the dataset you want to delete
  • dtype (str) – Each dataset has a datatype associated with it. E.g.: Cohort. If no dtype is specified, all available dtypes will be returned.
  • force (bool, default: False) – If False, the function will list off the files it wants to delete. If True, it will do what you tell it to do and just delete things (not recommended).
Returns: int – The number of files deleted

Warning: This is damaging. Deleted datasets cannot be (easily) recovered.
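
Example

A sketch of deleting a dataset (the name is illustrative); without force=True the files are only listed:

>>> m80.Tools.delete(dtype='Cohort', name='my_experiment')             # lists the files it would delete
>>> m80.Tools.delete(dtype='Cohort', name='my_experiment', force=True) # actually deletes them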

minus80.Tools.get_files(dtype=None, name=None, fullpath=False)[source]

List the files in the minus80 directory associated with a dtype and a name.

Parameters:
  • name (str, required) – The name of the dataset. Note: accepts glob arguments.
  • dtype (str, default=None) – The data type of the dataset. E.g.: Cohort. If None, a wildcard will be used and all dtypes with the given name will be returned.
  • fullpath (bool, default=False) – If True, full paths to files will be returned; if False, only filenames will be returned.

Note

This will only return top-level files, which sometimes will be directories.
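
Example

A sketch of listing the files behind a dataset (the name is illustrative):

>>> m80.Tools.get_files(dtype='Cohort', name='my_experiment', fullpath=True)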

CloudData

class minus80.CloudData[source]

Freezable

class minus80.Freezable(name, parent=None, basedir=None)[source]

Freezable is an abstract class. Things that inherit from Freezable can be loaded and unloaded from Minus80.

A freezable object is a persistent object that lives in a known directory, aimed at making expensive-to-build objects and databases loadable from new runtimes.

The main things that a Freezable object supplies are:
  • access to a sqlite database (relational records)
  • access to a bcolz database (columnar/table data)
  • access to a persistent key/val store
  • access to named temp files
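
Example

A minimal sketch of a class inheriting from Freezable; the exact requirements for subclasses are not covered here, so treat this as illustrative only:

>>> class MyDataset(m80.Freezable):
...     def __init__(self, name):
...         super().__init__(name)
...
>>> d = MyDataset('my_dataset')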

__init__(name, parent=None, basedir=None)[source]

Initialize the Freezable Object.

Parameters:
  • name (str) – The name of the frozen object.
  • parent (Freezable object or None) – The parent object
__weakref__

list of weak references to the object (if defined)

_add_child(child)[source]

Register a child dataset

_bcolz(tblname, df=None, m80name=None, m80type=None, blaze=False)[source]

This is the access point to the bcolz database

_bcolz_array(name, array=None, m80name=None, m80type=None)[source]

Routines to set/get arrays from the bcolz store

_bulk_transaction()[source]

This is a context manager that handles a bulk transaction, i.e. this context will handle the BEGIN, END, and appropriate ROLLBACKs.

Usage:

>>> with x._bulk_transaction() as cur:
...     cur.execute('INSERT INTO table XXX VALUES YYY')
_get_dbpath(extension, create=False)[source]

Get the path to database files

_sqlite()[source]

This is the access point to the sqlite database