pcutilities Module

download_uniprot

proteoclade.pcutilities.download_uniprot(*targets, download_folder='fastas')

Download FASTA protein sequences from UniProt

Parameters
  • targets (tuple) –

    One or more (TaxonID, DBtype) tuples, each pairing an integer (or ‘all’) with a string.

    TaxonID must be an NCBI-valid taxon identifier, or ‘all’ for all taxa (up to ~60GB).

    DBtype must be one of “s”, “sr”, “r”, “t”, or “a” (SwissProt, SwissProt Reference, Reference, TrEMBL, or All, respectively).

  • download_folder (string) – Folder that will contain the downloaded FASTA files. Default: ‘fastas’ subdirectory

Examples

>>> download_uniprot((9606, 's'), (10090, 's'))  # downloads human and mouse SwissProt entries
>>> download_uniprot(('all', 'a'))  # downloads every entry in UniProt

Notes

Downloads the UniProt-derived FASTA file(s) matching the specified parameters. Files are named taxonid_StartingEntryCount.fasta.
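The target format described in the parameters can be checked before starting a long download. The following is a minimal sketch: `validate_target` and `DB_TYPES` are hypothetical names used for illustration and are not part of proteoclade. The check only covers the shape of each tuple; it does not confirm that a TaxonID actually exists at NCBI.

```python
# Hypothetical pre-flight check; not part of proteoclade.
DB_TYPES = {"s", "sr", "r", "t", "a"}  # codes listed in the parameter description


def validate_target(target):
    """Return True if target looks like a (TaxonID, DBtype) tuple."""
    if not isinstance(target, tuple) or len(target) != 2:
        return False
    taxon_id, db_type = target
    # TaxonID is either a positive integer or the literal string 'all'.
    is_int = isinstance(taxon_id, int) and not isinstance(taxon_id, bool)
    taxon_ok = taxon_id == "all" or (is_int and taxon_id > 0)
    # DBtype must be one of the documented string codes.
    db_ok = isinstance(db_type, str) and db_type in DB_TYPES
    return taxon_ok and db_ok


targets = [(9606, "s"), (10090, "s"), ("all", "a")]
valid_targets = [t for t in targets if validate_target(t)]
```

Given the `*targets` signature, the checked tuples could then be passed as `download_uniprot(*valid_targets)`.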

download_uniprot_batch

proteoclade.pcutilities.download_uniprot_batch(file, download_folder='fastas')

Download UniProt entries listed in a tab-delimited text file

Parameters
  • file (string) –

    Tab-delimited text file with two columns: TaxonID and UniProtMods.

    See the uniprot_options constant for the available mods.

  • download_folder (string) – subdirectory to store downloads (default ‘fastas’)

Notes

An easier way to download large numbers of taxa.
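A batch file of this kind can be generated programmatically. The sketch below writes a two-column, tab-delimited file with the standard library. It assumes the second column takes the same database codes as download_uniprot and that no header row is expected; neither point is confirmed here, so check the uniprot_options constant before relying on it. The file name `taxa_batch.txt` is arbitrary.

```python
import csv
import os
import tempfile

# (TaxonID, UniProtMods) rows. The second-column values are ASSUMED to be the
# same database codes that download_uniprot accepts.
rows = [(9606, "s"), (10090, "s"), (10116, "r")]

# Write to a temporary directory so the sketch leaves the working directory untouched.
batch_path = os.path.join(tempfile.mkdtemp(), "taxa_batch.txt")
with open(batch_path, "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)

# Read the file back to confirm its layout.
with open(batch_path) as handle:
    batch_lines = handle.read().splitlines()
```

The resulting path could then be passed as the `file` argument of download_uniprot_batch.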

download_cRAP

proteoclade.pcutilities.download_cRAP(directory='fastas')

Downloads the common Repository of Adventitious Proteins (cRAP) sequence database.

Parameters

directory (string) – folder to download cRAP to (default ‘fastas’)

Notes

Supplies an edited FASTA file derived from cRAP.

download_taxonomy

proteoclade.pcutilities.download_taxonomy(directory='taxonomy_downloads')

Download taxonomy mappings from NCBI.

Parameters

directory (string) – Where to store temporary files downloaded from NCBI (default ‘taxonomy_downloads’)

Notes

Unzips the taxonomy files downloaded from NCBI, then calls _taxonomy_mapper to produce a PCTAXA file in the working directory. The PCTAXA file is a pickled dict of taxonomy mappings. The file name is date-stamped in year-month-day order so you can tell when it was retrieved.
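Since the PCTAXA file is described as a pickled dict, the storage idea can be illustrated with a toy mapping. This is only a sketch: the exact on-disk layout proteoclade writes is not specified beyond that description, and `toy_taxonomy` and `example.pctaxa` are made-up names. As with any pickle, only load files you created or trust.

```python
import os
import pickle
import tempfile

# Toy mapping in the shape described for load_taxonomy; real files are far larger.
toy_taxonomy = {
    9606: {"species": "Homo sapiens", "genus": "Homo"},
    10090: {"species": "Mus musculus", "genus": "Mus"},
}

# Serialize the dict to disk (binary mode is required for pickle).
toy_path = os.path.join(tempfile.mkdtemp(), "example.pctaxa")
with open(toy_path, "wb") as handle:
    pickle.dump(toy_taxonomy, handle)

# Read it back into memory.
with open(toy_path, "rb") as handle:
    restored = pickle.load(handle)
```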

load_taxonomy

proteoclade.pcutilities.load_taxonomy(file)

Loads PCTAXA file into memory.

Parameters

file (string) – A .pctaxa file created using the download_taxonomy function.

Returns

taxonomy dictionary – A dictionary containing the NCBI taxonomy mappings for each organism ID:

dictionary[TaxID] = {'species': species, 'genus': genus, … }

Return type

dict

Example

>>> taxonomy = load_taxonomy('190101.pctaxa')
>>> taxonomy[9606].get('species')
'Homo sapiens'
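A taxon ID or rank may be absent from the dictionary, so lookups benefit from a fallback. The sketch below uses a toy dictionary with the documented shape; `lookup_rank` is a hypothetical helper, not a proteoclade function.

```python
# Toy dictionary with the documented shape; a real one comes from load_taxonomy.
taxonomy = {
    9606: {"species": "Homo sapiens", "genus": "Homo"},
    10090: {"species": "Mus musculus", "genus": "Mus"},
}


def lookup_rank(taxonomy, tax_id, rank, default=None):
    """Return the name at `rank` for `tax_id`, or `default` if either is missing."""
    return taxonomy.get(tax_id, {}).get(rank, default)


human_genus = lookup_rank(taxonomy, 9606, "genus")
missing = lookup_rank(taxonomy, 1, "species", default="unknown")
```

Chaining `.get` calls avoids a KeyError when either the taxon ID or the rank is missing.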

ncbi_check

proteoclade.pcutilities.ncbi_check(taxa)

Validates a list of taxa by making sure they are NCBI-valid ranks.

Parameters

taxa (tuple or list) – Taxonomic ranks to validate, e.g. (‘order’, ‘family’)

Returns

List of only valid taxonomic ranks.

Return type

list
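The filtering behavior described here can be sketched as follows. `ASSUMED_RANKS` is an assumed subset of NCBI ranks chosen for illustration, and `filter_ranks` is a hypothetical stand-in; the ranks ncbi_check actually accepts, and how it treats letter case, may differ.

```python
# Assumed subset of NCBI taxonomy ranks, for illustration only.
ASSUMED_RANKS = (
    "superkingdom", "kingdom", "phylum", "class",
    "order", "family", "genus", "species",
)


def filter_ranks(taxa):
    """Keep only entries found in ASSUMED_RANKS, preserving input order."""
    return [rank for rank in taxa if rank in ASSUMED_RANKS]


# The misspelled 'famly' is dropped; the valid ranks are kept.
checked = filter_ranks(("order", "famly", "genus"))
```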

db_stats

proteoclade.pcutilities.db_stats(db)

Prints the parameters stored in a PCDB database.

Parameters

db (string) – .pcdb file created with create_pcdb.

db_fetch_params

proteoclade.pcutilities.db_fetch_params(db)

Used in pcannotate.py, but may also be useful to the user. Retrieves the digest parameters specified when the database was created.

Parameters

db (string) – PCDB (SQLite db) to connect to

Returns

results – Tuple of parameters (min_length, max_length, missed_cleavages, digest_rule, date_created), or None if the parameters are not found.

Return type

tuple or None
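Because the result is either a five-element tuple or None, callers need to handle both cases. One possible approach is to wrap the tuple in a named tuple so the fields can be read by name. `DigestParams` and `wrap_params` are hypothetical names, the field types are assumptions, and the example values are placeholders rather than output from a real database.

```python
from typing import NamedTuple, Optional


class DigestParams(NamedTuple):
    # Field order follows the documented return tuple; the types are assumed.
    min_length: int
    max_length: int
    missed_cleavages: int
    digest_rule: str
    date_created: str


def wrap_params(results) -> Optional[DigestParams]:
    """Convert the raw tuple into DigestParams, passing None through unchanged."""
    if results is None:
        return None
    return DigestParams(*results)


# Placeholder values standing in for a db_fetch_params result.
example = wrap_params((7, 55, 2, "trypsin/p", "2019-01-01"))
```

In practice the argument would be the value returned by `db_fetch_params(db)`.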