pcdb Module
create_pcdb
- proteoclade.pcdb.create_pcdb(database_name, fasta_directory='fastas', min_length=7, max_length=55, missed_cleavages=2, m_excision='always', li_swap=True, rule='trypsin/p', temp_directory=None, worker_count=None, reverse=False)
Creates the PCDB file, which stores in silico digested peptides, genes, and organism information.
- Parameters
database_name (string) – Name of pcdb file; should be descriptive of what it contains
fasta_directory (string) – Directory of FASTAs to use as input (default ‘fastas’)
min_length (integer) – Minimum peptide amino acid count to include in database (default 7)
max_length (integer) – Maximum peptide amino acid count to include in database (default 55)
missed_cleavages (integer) – Number of times a protease is allowed to miss a cut site (default 2)
m_excision (string) – Whether N-terminal methionines are excised from proteins (default ‘always’). Options: ‘always’, ‘never’, ‘both’
li_swap (bool) – Whether peptides stored will have leucines converted to isoleucines (default True)
rule (string or tuple) –
Protease rule for cut sites (default ‘trypsin/p’).
if string: must be an enzyme option available in ProteoClade. See Appendix.
if tuple: must be a tuple of strings, ("regex_sites", "terminus"), e.g. (r"[RK]", "C"). Use a tuple for custom enzyme rules.
temp_directory (None or string) –
Directory for temporary database operations if space is a concern (default None)
if None: uses working directory
worker_count (None or integer) –
Number of worker processes to use. Only set this to experiment with performance (default None)
if None: chooses the number of processes automatically, up to a maximum of 6. More processes do not necessarily improve performance.
reverse (bool) – Whether to reverse protein sequences prior to digestion and storage (default False). Used for false discovery rate (FDR) mitigation.
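The digestion parameters above (rule, missed_cleavages, min_length, max_length) can be illustrated with a small standalone sketch. This is not ProteoClade's implementation: the function name digest and its handling of edge cases are assumptions made for illustration, and it only mirrors the documented ("regex_sites", "terminus") tuple form of rule.

```python
import re


def digest(sequence, regex_sites=r"[RK]", terminus="C",
           missed_cleavages=2, min_length=7, max_length=55):
    """Hypothetical in silico digestion helper (illustration only).

    regex_sites: regular expression matching cleavage residues.
    terminus:    "C" cuts after each match, "N" cuts before it.
    """
    # Collect cut positions, always including both ends of the protein.
    cuts = [0]
    for match in re.finditer(regex_sites, sequence):
        pos = match.end() if terminus == "C" else match.start()
        if 0 < pos < len(sequence) and pos != cuts[-1]:
            cuts.append(pos)
    cuts.append(len(sequence))

    peptides = set()
    last = len(cuts) - 1
    for i in range(last):
        # Joining k + 1 adjacent fragments corresponds to k missed cleavages.
        for j in range(i + 1, min(i + 1 + missed_cleavages, last) + 1):
            peptide = sequence[cuts[i]:cuts[j]]
            if min_length <= len(peptide) <= max_length:
                peptides.add(peptide)
    return peptides


if __name__ == "__main__":
    # Short toy sequence, so the length filter is relaxed to 1.
    print(sorted(digest("AAKGGRLL", missed_cleavages=1, min_length=1)))
    # ['AAK', 'AAKGGR', 'GGR', 'GGRLL', 'LL']
```

With the default length limits only peptides of 7 to 55 residues are kept, so the toy sequence above would yield just the fully undigested form. The leucine-to-isoleucine conversion described for li_swap would amount to peptide.replace("L", "I") on each result in this sketch.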
Examples
>>> create_pcdb("human_mouse_ref.pcdb")  # creates a trypsin PCDB file using default settings
>>> create_pcdb("bacteria_swissprot.pcdb", rule="asp-n")  # creates an AspN PCDB file
Notes
Creates a .pcdb SQLite file in the working directory.
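Because the output is an ordinary SQLite file, it can be opened with Python's standard sqlite3 module. The sketch below only lists the table names it finds; the PCDB schema is not described in this section, so no table or column names are assumed, and the helper name list_tables is hypothetical.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in an SQLite file (illustration only)."""
    # Note: sqlite3.connect() creates an empty file if db_path does not exist,
    # so check the path first when that matters.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [row[0] for row in rows]
```

For example, list_tables("human_mouse_ref.pcdb") would return whatever tables that file contains.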
merge_fastas
- proteoclade.pcdb.merge_fastas(merged_fasta_name, fasta_directory='fastas')
Merges the FASTA files in a directory into a single file. Used to prepare for a targeted database search, e.g. with MaxQuant or Mascot.
- Parameters
merged_fasta_name (string) – Name of .fasta file that will result from merging other fastas
fasta_directory (string) – Directory from which to read fastas (default ‘fastas’)
Notes
Creates a .fasta file containing every FASTA entry that was read.
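The merging step can be pictured as simple concatenation. The following is a minimal sketch, not the ProteoClade implementation: the function name merge_fasta_files, the set of accepted file extensions, and the returned entry count are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: which extensions count as FASTA input is not stated in the docs.
FASTA_SUFFIXES = {".fasta", ".fa", ".faa"}


def merge_fasta_files(merged_fasta_name, fasta_directory="fastas"):
    """Concatenate FASTA files from a directory; return the number of entries."""
    out_path = Path(merged_fasta_name)
    # Sort for a reproducible order and skip the output file itself.
    sources = sorted(
        p for p in Path(fasta_directory).iterdir()
        if p.is_file()
        and p.suffix.lower() in FASTA_SUFFIXES
        and p.resolve() != out_path.resolve()
    )
    entries = 0
    with out_path.open("w") as out:
        for source in sources:
            with source.open() as handle:
                for line in handle:
                    line = line.rstrip("\r\n")
                    if not line:
                        continue  # drop blank lines between records
                    if line.startswith(">"):
                        entries += 1  # each header line starts one entry
                    out.write(line + "\n")
    return entries
```

Writing each line with an explicit newline keeps a header from being glued onto the previous file's last sequence line when that file lacks a trailing newline.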