API Reference
This page provides an overview of the Minus80 API.
Accession
class minus80.Accession(name, files=None, **kwargs)
From Google: Definition (noun): a new item added to an existing collection of books, paintings, or artifacts.
An Accession is an item that exists in an experimental collection.
Most of the time an accession is interchangeable with a sample. However, the term sample can become confusing when an experiment has multiple samplings from the same sample, e.g. a timecourse or different tissues.
add_file(path, scheme='ssh', username=None, hostname=None)
Add a file that is associated with the accession. This method will attempt to determine where the file is actually stored based on its path. Currently it supports three different protocols: local, ssh and s3. A local file will look something like: /tmp/file1.fastq.
Parameters: - path/URL (string) – The path/URL to the file. The string is parsed for default information (e.g.
- scheme (string (default: ssh)) – Specifies the scheme/protocol for accessing the file. Defaults to ssh, also supports s3
- username (string (default: None)) – Defines a username that is authorized to access hostname using protocol. Defaults to None in which case it will be determined by calling getpass.getuser().
- hostname (string (default: None)) – Defines the hostname that the file is accessible through. Defaults to None, where the hostname will be determined
- port (int (default: 22)) – Port to access the file through. Defaults to 22, which is for ssh.
- NOTE: any keyword arguments passed in will override the values parsed out of the path.
Returns: Return type:
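The parsing behaviour described above can be illustrated with the standard library. This is only a sketch of the idea, not the minus80 implementation: `parse_file_location` is a hypothetical helper, and it assumes `None` defaults so that explicitly passed keyword arguments can be told apart from parsed values.

```python
import getpass
from urllib.parse import urlparse


def parse_file_location(url, scheme=None, username=None, hostname=None, port=None):
    """Hypothetical helper: split a path/URL into access details.

    Values parsed from the string act as defaults; any keyword argument
    that is not None overrides the parsed value.
    """
    parsed = urlparse(url)
    info = {
        # A bare path such as /tmp/file1.fastq has no scheme: treat it as local.
        "scheme": parsed.scheme or "local",
        "username": parsed.username,
        "hostname": parsed.hostname,
        "port": parsed.port,
        "path": parsed.path,
    }
    overrides = {"scheme": scheme, "username": username,
                 "hostname": hostname, "port": port}
    for key, value in overrides.items():
        if value is not None:
            info[key] = value
    if info["scheme"] == "ssh":
        # Fall back to the current user and the standard ssh port.
        if info["username"] is None:
            info["username"] = getpass.getuser()
        if info["port"] is None:
            info["port"] = 22
    return info
```

For example, `parse_file_location("ssh://alice@example.org/data/s1.fastq", username="bob")` keeps the parsed hostname and path but reports `bob` as the username.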
Cohort
class minus80.Cohort(name, parent=None)
A Cohort is a named set of accessions. Once cohorts are created, they are persistent, as minus80 stores them on disk.
add_accessions_from_data_frame(df, name_col)
Add accessions from a data frame. This assumes each row is an Accession and that the properties of the accession are stored in the columns.
Parameters: - df (pandas.DataFrame) – The pandas data frame containing one accession per row
- name_col (string) – The column containing the accession names
Example
>>> import pandas as pd
>>> df = pd.DataFrame([['S1', 23, 'O'], ['S2', 30, 'O+']], columns=['Name', 'Age', 'Type'])
>>> x = m80.add_accessions_from_data_frame(df, 'Name')
Would yield two Accessions: S1 and S2 with Age and Type properties.
add_raw_file(path, scheme='ssh', username=None, hostname=None)
Add a raw file to the Cohort
columns
Return a list of all the available metadata stored for available Accessions
crawl_host(hostname='localhost', path='/', username=None, glob='*.fastq')
Use SSH to crawl a host looking for raw files
classmethod from_accessions(name, accessions)
Create a Cohort from an iterable of Accessions.
Parameters: - name (str) – The name of the Cohort
- accessions (iterable of Accessions) – The accessions that will be frozen in the cohort under the given name
Returns: Return type: A Cohort object
classmethod from_yaml(name, yaml_file)
Create a Cohort from a YAML file. Note: this yaml file must be created from
Parameters: - name (str) – The name of the Cohort
- yaml_file (pathlike) – The path to the YAML file that contains the Accessions
Returns: Return type: A Cohort object
interactive_ignore_pattern(pattern, n=20)
Start an interactive prompt to ignore patterns in file names (e.g. “test”)
names
Return a list of all available names and aliases
random_accession()
Returns a random accession from the Cohort
Parameters: None
Returns: An Accession object
Return type: Accession
random_accessions(n=1, replace=False)
Returns a list of random accessions from the Cohort, either with or without replacement.
Parameters:
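The difference between sampling with and without replacement can be shown with the standard `random` module. This is a minimal sketch of the concept only: `sample_accessions` is a hypothetical function operating on a plain list, and it is not how minus80 selects accessions internally.

```python
import random


def sample_accessions(accessions, n=1, replace=False, rng=None):
    """Hypothetical sketch: pick n items with or without replacement."""
    rng = rng or random
    pool = list(accessions)
    if replace:
        # With replacement: the same item may be drawn more than once,
        # and n may exceed the size of the pool.
        return rng.choices(pool, k=n)
    # Without replacement: every drawn item is distinct, and asking for
    # more items than the pool holds raises ValueError.
    return rng.sample(pool, n)
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible, which is useful in tests.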
Tools
minus80.Tools.available(dtype=None, name=None)
Reports the available datasets Frozen in the minus80 database.
Parameters: - dtype (str) – Each dataset has a datatype associated with it. E.g.: Cohort. If no dtype is specified, all available dtypes will be returned.
- name (str, default:'*') – The name of the dataset you want to check is available. The default value is the wildcard ‘*’ which will return all available datasets with the specified dtype.
Returns: If both dtype and name are specified, a bool is returned indicating if the dataset is available. Otherwise a formatted table is printed and None is returned.
Return type:
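The two return modes described above (a bool when both arguments are given, a printed table otherwise) can be sketched as follows. The `registry` list of `(dtype, name)` pairs and the function `available_sketch` are assumptions made for illustration; the real function reads the minus80 database instead.

```python
from fnmatch import fnmatchcase


def available_sketch(registry, dtype=None, name=None):
    """Hypothetical sketch; registry is an iterable of (dtype, name) pairs."""
    if dtype is not None and name is not None:
        # Both given: answer a yes/no question.
        return any(d == dtype and fnmatchcase(n, name) for d, n in registry)
    # Otherwise fall back to wildcards and print a table.
    dtype_pat = "*" if dtype is None else dtype
    name_pat = "*" if name is None else name
    print(f"{'dtype':<12}{'name':<12}")
    for d, n in sorted(registry):
        if fnmatchcase(d, dtype_pat) and fnmatchcase(n, name_pat):
            print(f"{d:<12}{n:<12}")
    return None
```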
minus80.Tools.delete(dtype=None, name=None, force=False)
Deletes files associated with Minus80 datasets.
Parameters: - name (str) – The name of the dataset you want to delete
- dtype (str) – Each dataset has a datatype associated with it. E.g.: Cohort. If no dtype is specified, all available dtypes will be returned.
- force (bool, default: False) – If False, the function will list off the files it wants to delete. If True, it will do what you tell it to do and just delete things (not recommended).
Returns: int – The number of files deleted
Warning
This is damaging. Deleted datasets cannot be (easily) recovered.
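The `force` flag follows a common dry-run pattern: list first, delete only on request. The sketch below shows that pattern on a temporary file. `delete_files` is a hypothetical helper, and returning 0 from a dry run is an assumption rather than documented minus80 behaviour.

```python
import tempfile
from pathlib import Path


def delete_files(paths, force=False):
    """Hypothetical sketch of a dry-run-by-default delete."""
    paths = [Path(p) for p in paths]
    if not force:
        # Dry run: only report what would be removed.
        for p in paths:
            print(f"would delete: {p}")
        return 0
    deleted = 0
    for p in paths:
        if p.is_file():
            p.unlink()
            deleted += 1
    return deleted


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "Cohort.demo.db"
    target.write_text("placeholder")
    dry_run_count = delete_files([target])              # lists only
    still_there = target.exists()
    forced_count = delete_files([target], force=True)   # really deletes
    gone = not target.exists()
```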
minus80.Tools.get_files(dtype=None, name=None, fullpath=False)
List the files in the minus80 directory associated with a dtype and a name.
Parameters: - name (str, required) – The name of the dataset. Note: accepts glob arguments.
- dtype (str, default=None) – The data type of the dataset. E.g.: Cohort. If None, a wildcard is used so that all dtypes with the given name are returned.
- fullpath (bool, default=False) – If True, full paths to files are returned; if False, only filenames are returned.
Note
This will only return top level files which sometimes will be directories.
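A rough sketch of glob-based, top-level listing is given below. The `dtype.name.extension` file naming and the helper `list_dataset_files` are assumptions made purely for illustration; the actual layout of the minus80 directory is not described on this page.

```python
import tempfile
from pathlib import Path


def list_dataset_files(basedir, name, dtype=None, fullpath=False):
    """Hypothetical sketch; assumes entries are named <dtype>.<name>.<ext>."""
    dtype_pat = "*" if dtype is None else dtype
    # Path.glob without "**" does not recurse, so only top-level entries
    # (files or directories) are matched.
    matches = sorted(Path(basedir).glob(f"{dtype_pat}.{name}.*"))
    return [str(p) if fullpath else p.name for p in matches]


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "Cohort.maize.db").write_text("")
    (base / "Cohort.maize.files").mkdir()   # a directory also counts
    (base / "Other.maize.db").write_text("")
    cohort_only = list_dataset_files(base, "maize", dtype="Cohort")
    any_dtype = list_dataset_files(base, "maize")
    full = list_dataset_files(base, "ma*", dtype="Cohort", fullpath=True)
```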
Freezable
class minus80.Freezable(name, parent=None, basedir=None)
Freezable is an abstract class. Things that inherit from Freezable can be loaded and unloaded from the Minus80.
A freezable object is a persistent object that lives in a known directory. It aims to make expensive-to-build objects and databases loadable from new runtimes.
The main things that a Freezable object supplies are:
- access to a sqlite database (relational records)
- access to a bcolz database (columnar/table data)
- access to a persistent key/val store
- access to named temp files
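The idea of an object that persists in a known directory and can be reloaded from a new runtime can be sketched with `sqlite3` alone. `FrozenStore` below is a hypothetical stand-in, not the Freezable class: the directory naming and the `globals` table are assumptions, and the bcolz and temp-file facilities are left out.

```python
import sqlite3
import tempfile
from pathlib import Path


class FrozenStore:
    """Hypothetical sketch of a persistent key/val store backed by sqlite."""

    def __init__(self, name, basedir):
        self.name = name
        # Assumed layout: one directory per object under basedir.
        self.dir = Path(basedir) / f"{type(self).__name__}.{name}"
        self.dir.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(str(self.dir / "db.sqlite"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS globals (key TEXT PRIMARY KEY, val TEXT)"
        )
        self.db.commit()

    def set(self, key, val):
        self.db.execute(
            "INSERT OR REPLACE INTO globals (key, val) VALUES (?, ?)", (key, val)
        )
        self.db.commit()

    def get(self, key, default=None):
        row = self.db.execute(
            "SELECT val FROM globals WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else default

    def close(self):
        self.db.close()


with tempfile.TemporaryDirectory() as tmp:
    first = FrozenStore("demo", tmp)
    first.set("build", "v1")
    first.close()
    # A second object with the same name finds the stored value again.
    second = FrozenStore("demo", tmp)
    reloaded = second.get("build")
    missing = second.get("absent", default="n/a")
    second.close()
```

Because the state lives on disk rather than in the Python object, constructing a second object with the same name and base directory is enough to recover it.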
__weakref__
list of weak references to the object (if defined)
_bcolz(tblname, df=None, m80name=None, m80type=None, blaze=False)
This is the access point to the bcolz database
_bcolz_array(name, array=None, m80name=None, m80type=None)
Routines to set/get arrays from the bcolz store