`pcutilities` Module

download_uniprot
download_uniprot_batch
download_cRAP
download_taxonomy
load_taxonomy
ncbi_check
db_stats
db_fetch_params

download_uniprot

proteoclade.pcutilities.download_uniprot(*targets, download_folder='fastas')

Download FASTA protein sequences from UniProt.

Parameters

targets (tuple) –
One or more (TaxonID, DBType) tuples, each containing an integer (or ‘all’) and a string.

TaxonID must be an NCBI-valid taxon identifier, or ‘all’ for all taxa (up to ~60 GB).

DBType must be one of: “s”, “sr”, “r”, “t”, “a” (SwissProt, SwissProt Reference, Reference, TrEMBL, or All, respectively).
download_folder (string) – Folder that will contain the downloaded FASTA files. Default: ‘fastas’ subdirectory.

Examples

>>> download_uniprot((9606, 's'), (10090, 's'))  # downloads human and mouse SwissProt entries
>>> download_uniprot(('all', 'a'))  # downloads every entry in UniProt

Notes

Downloads UniProt-derived FASTA file(s) with specified parameters. Naming convention: taxonid_StartingEntryCount.fasta
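
Because a mistyped DBType or TaxonID would only surface once a download is attempted, it can help to check the targets first. The following is a minimal sketch of such a check; `validate_targets` and `VALID_DB_TYPES` are hypothetical names introduced here, not part of proteoclade, and the check only covers the shape of each tuple, not whether a TaxonID actually exists at NCBI.

```python
# Database-type codes as listed in the parameter description above.
VALID_DB_TYPES = {"s", "sr", "r", "t", "a"}


def validate_targets(*targets):
    """Hypothetical pre-flight check; not part of proteoclade."""
    checked = []
    for target in targets:
        if len(target) != 2:
            raise ValueError(f"expected (TaxonID, DBType), got {target!r}")
        taxon_id, db_type = target
        # TaxonID is either an integer taxon identifier or the string 'all'.
        if taxon_id != "all" and not isinstance(taxon_id, int):
            raise ValueError(f"invalid TaxonID: {taxon_id!r}")
        if db_type not in VALID_DB_TYPES:
            raise ValueError(f"invalid DBType: {db_type!r}")
        checked.append((taxon_id, db_type))
    return checked


targets = validate_targets((9606, "s"), (10090, "s"))
print(targets)

# The checked tuples could then be passed on, for example:
# download_uniprot(*targets, download_folder="fastas")
```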

download_uniprot_batch

proteoclade.pcutilities.download_uniprot_batch(file, download_folder='fastas')

Download UniProt entries listed in a tab-delimited text file.

Parameters

file (string) –
Tab-delimited text file with two columns: TaxonID and UniProtMods.

See the uniprot_options constant for the available mods.
download_folder (string) – Subdirectory in which to store downloads (default ‘fastas’).

Notes

A more convenient way to download FASTA files for a large number of taxa.
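
The batch file itself can be generated rather than typed by hand. The sketch below writes a two-column, tab-delimited file with the standard `csv` module. It assumes one taxon per line with no header row, and that the second column takes the same codes as DBType in `download_uniprot`; neither detail is confirmed here, so check the uniprot_options constant before relying on it. `write_batch_file` is a hypothetical helper, not part of proteoclade.

```python
import csv
import os
import tempfile


def write_batch_file(path, rows):
    """Write (TaxonID, mod) pairs as a two-column, tab-delimited file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerows(rows)


# Assumed layout: one taxon per line, no header row.
rows = [(9606, "s"), (10090, "s"), (7955, "r")]

# A temporary directory keeps the example from touching the working directory.
batch_path = os.path.join(tempfile.mkdtemp(), "targets.txt")
write_batch_file(batch_path, rows)

# The file could then be handed to the batch downloader, for example:
# download_uniprot_batch(batch_path, download_folder="fastas")
```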

download_cRAP

proteoclade.pcutilities.download_cRAP(directory='fastas')

Downloads the common Repository of Adventitious Proteins (cRAP) sequence database.

Parameters: directory (string) – folder to download cRAP to (default ‘fastas’)

Notes

Supplies an edited fasta file from cRAP

download_taxonomy

proteoclade.pcutilities.download_taxonomy(directory='taxonomy_downloads')

Download taxonomy mappings from NCBI.

Parameters: directory (string) – Where to store temporary files downloaded from NCBI (default ‘taxonomy_downloads’)

Notes

Unzips the taxonomy files downloaded from NCBI, then calls _taxonomy_mapper to produce a PCTAXA file in the working directory. The PCTAXA file is a pickled dict of taxonomy mappings. The file name is date-stamped (Y-M-D) so you can tell when it was retrieved.

load_taxonomy

proteoclade.pcutilities.load_taxonomy(file)

Loads a PCTAXA file into memory.

Parameters

file (string) – A .pctaxa file created using the download_taxonomy function.

Returns

taxonomy dictionary – Maps each NCBI taxon ID to the taxonomy mappings for that organism:

dictionary[TaxID] = {'species': species, 'genus': genus, … }

Return type

dict

Example

>>> taxonomy = load_taxonomy('190101.pctaxa')
>>> taxonomy[9606].get('species')
'Homo sapiens'
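
Since the returned object is a plain dict of dicts, several ranks can be pulled out at once with ordinary dictionary access. The sketch below uses a small hand-written dictionary in place of a real PCTAXA file; `lookup_ranks` is a hypothetical helper, and the real mapping may expose more ranks than the three shown.

```python
# Toy stand-in for the dictionary returned by load_taxonomy; the real
# mapping covers all NCBI taxa and may include more ranks.
taxonomy = {
    9606: {"species": "Homo sapiens", "genus": "Homo", "family": "Hominidae"},
    10090: {"species": "Mus musculus", "genus": "Mus", "family": "Muridae"},
}


def lookup_ranks(taxonomy, taxon_id, ranks):
    """Return {rank: name} for one taxon; missing entries map to None."""
    entry = taxonomy.get(taxon_id, {})
    return {rank: entry.get(rank) for rank in ranks}


print(lookup_ranks(taxonomy, 9606, ("genus", "family")))
print(lookup_ranks(taxonomy, 12345, ("species",)))
```

Using `.get` at both levels means an unknown taxon ID or a missing rank yields None instead of raising KeyError.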

ncbi_check

proteoclade.pcutilities.ncbi_check(taxa)

Validates a list of taxa by making sure they are NCBI-valid ranks.

Parameters: taxa (tuple or list) – Taxonomic ranks to validate, e.g. ('order', 'family')
Returns: List of only valid taxonomic ranks.
Return type: list
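
The filtering behaviour described above can be illustrated with a small stand-in. This is a sketch only, not the library's implementation: `filter_valid_ranks` and `KNOWN_RANKS` are hypothetical names, and the set of ranks that `ncbi_check` actually accepts is not listed here, so the set below is an assumed subset of NCBI rank names.

```python
# Assumed subset of NCBI Taxonomy rank names; the set accepted by
# ncbi_check itself may differ.
KNOWN_RANKS = {
    "superkingdom", "kingdom", "phylum", "class",
    "order", "family", "genus", "species",
}


def filter_valid_ranks(taxa):
    """Hypothetical stand-in: keep only recognised ranks, preserving order."""
    return [rank for rank in taxa if rank in KNOWN_RANKS]


print(filter_valid_ranks(("order", "family", "tribe_x")))
```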

db_stats

proteoclade.pcutilities.db_stats(db)

Prints the parameters and statistics stored in a database.

Parameters: db (string) – Path to a .pcdb file created with create_pcdb.

db_fetch_params

proteoclade.pcutilities.db_fetch_params(db)

Retrieves the digest parameters specified when the database was created. Used internally by pcannotate.py, but may also be useful to call directly.

Parameters: db (string) – PCDB (SQLite db) to connect to
Returns: results – tuple of parameters (min_length, max_length, missed_cleavages, digest_rule, date_created). If parameters are not found, None is returned.
Return type: tuple or None
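
Because the result is either a five-element tuple or None, callers need to handle both cases. The sketch below names the fields in the order given above and formats them for display; `DigestParams` and `describe_params` are hypothetical names, and the example values are illustrative rather than read from a real database.

```python
from collections import namedtuple

# Field order follows the tuple described above.
DigestParams = namedtuple(
    "DigestParams",
    ["min_length", "max_length", "missed_cleavages", "digest_rule", "date_created"],
)


def describe_params(params):
    """Format a db_fetch_params-style result; accepts a 5-tuple or None."""
    if params is None:
        return "no digest parameters stored"
    p = DigestParams(*params)
    return (
        f"peptides {p.min_length}-{p.max_length} aa, "
        f"{p.missed_cleavages} missed cleavages, "
        f"rule {p.digest_rule!r}, created {p.date_created}"
    )


# Illustrative values only; not read from a real database.
example = (7, 55, 2, "trypsin/p", "2019-01-01")
print(describe_params(example))
print(describe_params(None))

# With a real database the input would come from, for example:
# describe_params(db_fetch_params("my_database.pcdb"))
```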
