create_pcdb(database_name, fasta_directory='fastas', min_length=7, max_length=55, missed_cleavages=2, m_cleave=True, li_swap=True, rule='trypsin/p', temp_directory=None, worker_count=None, reverse=False)
Creates the PCDB file which stores in silico digested peptides, genes, and organism info.
- database_name (string) – Name of pcdb file; should be descriptive of what it contains
- fasta_directory (string) – Directory of FASTAs to use as input (default ‘fastas’)
- min_length (integer) – Minimum peptide amino acid count to include in database (default 7)
- max_length (integer) – Maximum peptide amino acid count to include in database (default 55)
- missed_cleavages (integer) – Maximum number of cut sites the protease is allowed to miss within a peptide (default 2)
- m_cleave (bool) – Whether or not N-terminal methionines are cleaved from proteins (default True)
- li_swap (bool) – Whether peptides stored will have leucines converted to isoleucines (default True)
- rule (string or tuple) –
Protease rule for cut sites (default ‘trypsin/p’).
If string: must be an enzyme option available in ProteoClade. See Appendix.
If tuple: must be a tuple of strings, ("regex_sites", "terminus"), e.g. (r"[RK]", "C"). Use a tuple for custom enzyme rules.
- temp_directory (None or string) –
Directory for temporary database operations if space is a concern (default None)
if None: uses working directory
- worker_count (None or integer) –
Number of worker processes to use. Only set this to experiment with performance (default None).
If None: the number of processes is chosen automatically, up to a maximum of 6. More processes do not necessarily improve performance.
- reverse (bool) – Whether to reverse protein sequences prior to digestion and storage. Used for FDR mitigation (default False)
>>> create_pcdb("human_mouse_ref.pcdb")  # creates a trypsin PCDB file using default settings
>>> create_pcdb("bacteria_swissprot.pcdb", rule="asp-n")  # creates an AspN PCDB file
Creates a .pcdb SQLite file in the working directory.
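To make the digestion parameters concrete, the following is a minimal sketch of how a ("regex_sites", "terminus") rule could be applied together with missed_cleavages, min_length, max_length, m_cleave, and li_swap. The function name digest_sketch and its logic are hypothetical illustrations, not ProteoClade's implementation; in particular, it assumes m_cleave simply strips a leading methionine and that "C" means cutting after a matched residue while "N" means cutting before it.

```python
import re


def digest_sketch(protein, rule=(r"[RK]", "C"), missed_cleavages=2,
                  min_length=7, max_length=55, m_cleave=True, li_swap=True):
    """Hypothetical in silico digestion; not ProteoClade's actual code."""
    sites_regex, terminus = rule

    # Assumption: m_cleave removes a leading methionine outright.
    if m_cleave and protein.startswith("M"):
        protein = protein[1:]

    # "C" cuts after the matched residue, "N" cuts before it.
    offset = 1 if terminus.upper() == "C" else 0
    cuts = sorted({m.start() + offset for m in re.finditer(sites_regex, protein)})

    # Fragment boundaries: sequence start, internal cut positions, sequence end.
    bounds = [0] + [c for c in cuts if 0 < c < len(protein)] + [len(protein)]

    peptides = set()
    for i in range(len(bounds) - 1):
        # Joining k + 1 adjacent fragments corresponds to k missed cleavages.
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptide = protein[bounds[i]:bounds[j]]
            if min_length <= len(peptide) <= max_length:
                # li_swap stores leucine (L) as isoleucine (I).
                peptides.add(peptide.replace("L", "I") if li_swap else peptide)
    return peptides


if __name__ == "__main__":
    # Tryptic-style rule with a short toy sequence and no length filter.
    print(sorted(digest_sketch("MKAAALRGGGK", missed_cleavages=1, min_length=1)))
```

Converting leucine to isoleucine in this sketch reflects that the two residues have the same mass, so a stored peptide matches either form.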
Merges the FASTA files in a directory into a single file. Used to prepare for a targeted database search, e.g. with MaxQuant or Mascot.
- merged_fasta_name (string) – Name of .fasta file that will result from merging other fastas
- fasta_directory (string) – Directory from which to read fastas (default ‘fastas’)
Creates a .fasta file containing every FASTA entry that was read.
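As a rough illustration of what merging involves, the sketch below concatenates every FASTA file in a directory using only the standard library. The name merge_fastas_sketch, the accepted file extensions, and the return value are assumptions made for this example; ProteoClade's own function may behave differently.

```python
from pathlib import Path


def merge_fastas_sketch(merged_fasta_name, fasta_directory="fastas"):
    """Hypothetical stand-in: concatenate all FASTA files in a directory."""
    out_path = Path(merged_fasta_name)

    # Assumption: FASTA inputs are recognised by these extensions.
    sources = sorted(
        p for p in Path(fasta_directory).iterdir()
        if p.is_file()
        and p.suffix.lower() in {".fasta", ".fa", ".faa"}
        and p.resolve() != out_path.resolve()  # never read the output file itself
    )

    with out_path.open("w") as out:
        for src in sources:
            text = src.read_text()
            if not text:
                continue
            # Ensure each file ends with a newline so headers never fuse
            # with the last sequence line of the previous file.
            out.write(text if text.endswith("\n") else text + "\n")

    return len(sources)
```

Calling merge_fastas_sketch("combined.fasta") would write the merged file to the working directory and return the number of input files found.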