pcutilities
Module
download_uniprot
- proteoclade.pcutilities.download_uniprot(*targets, download_folder='fastas')
Download FASTA protein sequences from UniProt.
- Parameters
targets (tuple) –
One or more (TaxonID, DBType) tuples. TaxonID must be an NCBI-valid integer taxon identifier, or 'all' for all taxa (up to ~60 GB). DBType must be one of "s", "sr", "r", "t", or "a"
(SwissProt, SwissProt Reference, Reference, TrEMBL, or All, respectively).
download_folder (string) – Folder that will contain the downloaded FASTA files. Default: 'fastas' subdirectory.
Examples
>>> download_uniprot((9606, 's'), (10090, 's'))  # downloads human and mouse SwissProt entries
>>> download_uniprot(('all', 'a'))  # downloads every entry in UniProt
Notes
Downloads one or more UniProt-derived FASTA files with the specified parameters. Naming convention: TaxonID_StartingEntryCount.fasta.
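The rules for targets can be checked before any download starts. The sketch below uses a hypothetical helper, validate_targets, that is not part of proteoclade. It only enforces the documented shape of each tuple (an int or 'all', plus one of the five DBType codes) and does not check whether a TaxonID actually exists in NCBI.

```python
# Hypothetical helper, NOT part of proteoclade: it only mirrors the documented
# format of the *targets* argument of download_uniprot.
VALID_DB_TYPES = {"s", "sr", "r", "t", "a"}  # SwissProt, SwissProt Reference, Reference, TrEMBL, All


def validate_targets(*targets):
    """Return the targets unchanged if they follow the documented format,
    otherwise raise ValueError describing the first problem found."""
    for target in targets:
        if not (isinstance(target, tuple) and len(target) == 2):
            raise ValueError(f"{target!r} is not a (TaxonID, DBType) tuple")
        taxon_id, db_type = target
        # TaxonID is either an integer taxon identifier or the string 'all'.
        # Whether the integer is a real NCBI taxon is NOT checked here.
        if taxon_id != "all" and not isinstance(taxon_id, int):
            raise ValueError(f"TaxonID {taxon_id!r} must be an int or 'all'")
        if db_type not in VALID_DB_TYPES:
            raise ValueError(f"DBType {db_type!r} must be one of {sorted(VALID_DB_TYPES)}")
    return targets


print(validate_targets((9606, "s"), (10090, "s")))  # ((9606, 's'), (10090, 's'))
```

Targets that pass this check could then be unpacked into the call, e.g. download_uniprot(*validate_targets((9606, 's'), (10090, 's'))).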
download_uniprot_batch
- proteoclade.pcutilities.download_uniprot_batch(file, download_folder='fastas')
Download UniProt entries listed in a tab-delimited text file.
- Parameters
file (string) –
Path to a tab-delimited text file with two columns: TaxonID and UniProtMods.
See the uniprot_options constant for the available mods.
download_folder (string) – Subdirectory in which to store downloads (default 'fastas').
Notes
A more convenient way to download a large number of taxa than passing each target to download_uniprot individually.
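As a rough illustration of the expected file layout, the following sketch parses such a file into (TaxonID, mods) tuples with Python's csv module. read_batch_file is a hypothetical name, and several details are assumptions rather than documented behaviour: there is no header row, blank lines are ignored, numeric TaxonIDs are converted to int, and the mods column uses the same codes as the DBType of download_uniprot (check uniprot_options for the real values).

```python
import csv
import io


def read_batch_file(handle):
    """Parse a two-column, tab-delimited batch file into (TaxonID, mods) tuples.

    Assumptions (not confirmed by the proteoclade docs): no header row,
    blank lines are skipped, numeric TaxonIDs become int, 'all' stays a string.
    """
    targets = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        taxon_id, mods = row[0].strip(), row[1].strip()
        targets.append((taxon_id if taxon_id == "all" else int(taxon_id), mods))
    return targets


example = io.StringIO("9606\ts\n10090\ts\n\n")
print(read_batch_file(example))  # [(9606, 's'), (10090, 's')]
```

To read a real file, pass a handle opened with open(path, newline='') instead of the StringIO object.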
download_cRAP
- proteoclade.pcutilities.download_cRAP(directory='fastas')
Downloads the common Repository of Adventitious Proteins (cRAP) contaminant sequence database.
- Parameters
directory (string) – Folder to download cRAP into (default 'fastas').
Notes
Provides an edited FASTA file derived from cRAP.
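The downloaded cRAP file, like the UniProt downloads, is plain FASTA, so it can be inspected without proteoclade. This minimal reader is a generic sketch, not the parser proteoclade uses internally, and the demo records are made up.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream.

    The header is returned without the leading '>', and multi-line
    sequences are joined into a single string.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


# Made-up demo records, not real cRAP entries.
demo = io.StringIO(">demo_protein_1\nMKT\nAYI\n>demo_protein_2\nGGG\n")
for header, seq in read_fasta(demo):
    print(header, len(seq))  # demo_protein_1 6, then demo_protein_2 3
```

For real data, pass a file handle opened on the FASTA file that download_cRAP writes into directory; its exact file name is not documented here.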
download_taxonomy
- proteoclade.pcutilities.download_taxonomy(directory='taxonomy_downloads')
Download taxonomy mappings from NCBI.
- Parameters
directory (string) – Where to store temporary files downloaded from NCBI (default 'taxonomy_downloads').
Notes
Downloads and unzips the taxonomy files from NCBI, then calls _taxonomy_mapper to produce a PCTAXA file in the working directory. A PCTAXA file is a pickled dict of taxonomy mappings. The file is named with its retrieval date in year-month-day order (for example, 190101.pctaxa), so you can tell when it was retrieved.
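Because a PCTAXA file is a pickled dict named by date, the storage scheme can be sketched with pickle and datetime. Both helpers below are hypothetical, and the %y%m%d pattern is an assumption inferred from the 190101.pctaxa example in load_taxonomy rather than taken from proteoclade's source.

```python
import datetime
import os
import pickle
import tempfile


def save_pctaxa_like(mapping, directory, date=None):
    """Pickle *mapping* to '<YYMMDD>.pctaxa' inside *directory*; return the path.

    Hypothetical helper: the YYMMDD naming is inferred from the
    '190101.pctaxa' example, not from proteoclade's code.
    """
    date = date or datetime.date.today()
    path = os.path.join(directory, date.strftime("%y%m%d") + ".pctaxa")
    with open(path, "wb") as fh:
        pickle.dump(mapping, fh)
    return path


def load_pickled_taxonomy(path):
    """Unpickle a taxonomy dict. Only open files from sources you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


toy = {9606: {"species": "Homo sapiens", "genus": "Homo"}}
with tempfile.TemporaryDirectory() as tmp:
    path = save_pctaxa_like(toy, tmp, datetime.date(2019, 1, 1))
    print(os.path.basename(path))              # 190101.pctaxa
    print(load_pickled_taxonomy(path) == toy)  # True
```

As with any pickle, only load PCTAXA files from sources you trust, because unpickling can execute arbitrary code.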
load_taxonomy
- proteoclade.pcutilities.load_taxonomy(file)
Loads PCTAXA file into memory.
- Parameters
file (string) – A .pctaxa file created using the download_taxonomy function.
- Returns
taxonomy dictionary – Contains the NCBI taxonomy mapping for each organism ID (TaxID):
dictionary[TaxID] = {'species': species, 'genus': genus, … }
- Return type
dict
Example
>>> taxonomy = load_taxonomy('190101.pctaxa')
>>> taxonomy[9606].get('species')
'Homo sapiens'
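Since the returned object is an ordinary dict, chaining .get calls is a simple way to tolerate TaxIDs or ranks that are missing. The sketch below counts how many TaxIDs fall under each name at a chosen rank. count_at_rank is a hypothetical helper, and the three-entry dictionary is a hand-written stand-in for real load_taxonomy output, assuming rank keys are lowercase names such as 'order'.

```python
from collections import Counter


def count_at_rank(taxonomy, taxids, rank):
    """Count how many TaxIDs fall under each name at *rank* (e.g. 'order').

    TaxIDs missing from the dictionary, or lacking that rank, are counted
    under None instead of raising KeyError.
    """
    return Counter(taxonomy.get(t, {}).get(rank) for t in taxids)


# Hand-written stand-in for load_taxonomy output (real entries hold more ranks).
taxonomy = {
    9606: {"species": "Homo sapiens", "genus": "Homo", "order": "Primates"},
    10090: {"species": "Mus musculus", "genus": "Mus", "order": "Rodentia"},
    10116: {"species": "Rattus norvegicus", "genus": "Rattus", "order": "Rodentia"},
}
counts = count_at_rank(taxonomy, [9606, 10090, 10116, 999999999], "order")
print(counts["Rodentia"], counts["Primates"], counts[None])  # 2 1 1
```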
ncbi_check
- proteoclade.pcutilities.ncbi_check(taxa)
Validates a list of taxonomic ranks, keeping only those that are NCBI-valid.
- Parameters
taxa (tuple or list) – Taxonomic ranks to check, e.g. ('order', 'family').
- Returns
List containing only the valid taxonomic ranks.
- Return type
list
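Conceptually, this amounts to filtering the input against a set of allowed rank names. In the sketch below, ASSUMED_NCBI_RANKS is a hand-picked subset of NCBI rank names, not the list ncbi_check actually uses, and filter_ranks is a hypothetical stand-in that preserves the input order.

```python
# Assumed rank set: a hand-picked subset of NCBI Taxonomy rank names.
# The exact list accepted by ncbi_check is not documented here.
ASSUMED_NCBI_RANKS = {
    "superkingdom", "kingdom", "phylum", "class", "order",
    "family", "genus", "species", "subspecies",
}


def filter_ranks(taxa):
    """Return only the ranks from *taxa* that are in ASSUMED_NCBI_RANKS, in input order."""
    return [rank for rank in taxa if rank in ASSUMED_NCBI_RANKS]


print(filter_ranks(("order", "family", "clade_x")))  # ['order', 'family']
```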
db_stats
- proteoclade.pcutilities.db_stats(db)
Prints the parameters and statistics stored in a PCDB database.
- Parameters
db (string) – Path to a .pcdb file created with create_pcdb.
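Because a PCDB is an SQLite database (see db_fetch_params), its tables can also be inspected directly with Python's built-in sqlite3 module. The sketch below is not how db_stats works and assumes nothing about the PCDB schema: it discovers table names from sqlite_master and counts the rows in each. table_row_counts is a hypothetical helper.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in an SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names cannot break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

Note that sqlite3.connect creates a new, empty database if the path does not exist, so confirm the .pcdb path before inspecting it.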
db_fetch_params
- proteoclade.pcutilities.db_fetch_params(db)
Retrieves the digest parameters specified when the database was created. Used internally by pcannotate.py, but may also be useful to call directly.
- Parameters
db (string) – Path to the PCDB (SQLite database) to connect to.
- Returns
results – Tuple of parameters (min_length, max_length, missed_cleavages, digest_rule, date_created), or None if the parameters are not found.
- Return type
tuple or None
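The positional tuple returned here can be given named fields with collections.namedtuple, which makes downstream code easier to read. DigestParams and describe_params are hypothetical names; the field order follows the documented return value, and the values in the demo are illustrative only.

```python
from collections import namedtuple

# Field order follows the documented return value of db_fetch_params.
DigestParams = namedtuple(
    "DigestParams",
    ["min_length", "max_length", "missed_cleavages", "digest_rule", "date_created"],
)


def describe_params(result):
    """Convert a db_fetch_params result into a DigestParams, or return None if absent."""
    if result is None:
        return None
    return DigestParams(*result)


params = describe_params((7, 35, 2, "trypsin", "2019-01-01"))  # illustrative values
print(params.missed_cleavages)  # 2
print(describe_params(None))    # None
```

For a real database this would be describe_params(db_fetch_params(path)), with path pointing at your .pcdb file.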