pcdb Module


proteoclade.pcdb.create_pcdb(database_name, fasta_directory='fastas', min_length=7, max_length=55, missed_cleavages=2, m_cleave=True, li_swap=True, rule='trypsin/p', temp_directory=None, worker_count=None, reverse=False)

Creates the PCDB file which stores in silico digested peptides, genes, and organism info.

  • database_name (string) – Name of pcdb file; should be descriptive of what it contains
  • fasta_directory (string) – Directory of FASTAs to use as input (default ‘fastas’)
  • min_length (integer) – Minimum peptide amino acid count to include in database (default 7)
  • max_length (integer) – Maximum peptide amino acid count to include in database (default 55)
  • missed_cleavages (integer) – Maximum number of cut sites a protease may miss within a single peptide (default 2)
  • m_cleave (bool) – Whether N-terminal methionines are cleaved from proteins (default True)
  • li_swap (bool) – Whether leucines are converted to isoleucines in stored peptides (default True)
  • rule (string or tuple) –

    Protease rule for cutting sites (default ‘trypsin/p’).

    if string: must be an enzyme option available in ProteoClade. See Appendix.

    if tuple: must be a tuple of strings, ("regex_sites", "terminus"), e.g. (r"[RK]", "C"). Use a tuple for custom enzyme rules.

  • temp_directory (None or string) –

    Directory for temporary database operations if space is a concern (default None)

    if None: uses working directory

  • worker_count (None or integer) –

    Number of worker processes to use. Only set this to experiment with performance (default None)

    if None: chooses the number of processes automatically, up to a maximum of 6. More processes do not necessarily improve performance.

  • reverse (bool) – Whether to reverse protein sequences prior to digestion and storage; used for false discovery rate (FDR) mitigation (default False)
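
To make the digestion parameters concrete, the following is a minimal sketch of how a ("regex_sites", "terminus") rule could interact with missed_cleavages, min_length, max_length, and li_swap. It is not ProteoClade's implementation: the function name digest and its internals are assumptions made for illustration, and m_cleave and reverse are left out.

```python
import re


def digest(sequence, regex_sites="[RK]", terminus="C", missed_cleavages=2,
           min_length=7, max_length=55, li_swap=True):
    """Return the set of peptides from one protein under a simple regex rule.

    Hypothetical helper for illustration only; not part of ProteoClade.
    """
    # Collect every position at which the sequence may be split.
    cuts = {0, len(sequence)}
    for match in re.finditer(regex_sites, sequence):
        # "C" cuts after the matched residue(s); anything else cuts before.
        cuts.add(match.end() if terminus == "C" else match.start())
    cuts = sorted(cuts)

    peptides = set()
    for i, start in enumerate(cuts[:-1]):
        # Spanning k + 1 adjacent fragments corresponds to k missed cleavages.
        for end in cuts[i + 1 : i + 2 + missed_cleavages]:
            peptide = sequence[start:end]
            if min_length <= len(peptide) <= max_length:
                # Optionally store leucine as isoleucine (same mass).
                peptides.add(peptide.replace("L", "I") if li_swap else peptide)
    return peptides


# Example call: trypsin-like rule, at most one missed cleavage.
example_peptides = digest("MKWVTFISLLFLFSSAYSR", missed_cleavages=1)
```

In this sketch, an Asp-N-like rule would be written as regex_sites="D" with terminus="N", so that each cut falls before the matched residue.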


>>> create_pcdb("human_mouse_ref.pcdb")  # creates a trypsin PCDB file using default settings
>>> create_pcdb("bacteria_swissprot.pcdb", rule="asp-n")  # creates an AspN PCDB file


Creates a .pcdb SQLite file in the working directory.
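
Because the output is an SQLite file, it can be opened with Python's standard sqlite3 module. The sketch below only lists the table names it finds; the helper name list_tables is hypothetical, and nothing is assumed about which tables or columns a PCDB actually contains.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of the tables stored in an SQLite file.

    Hypothetical helper for illustration only; not part of ProteoClade.
    """
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

Calling list_tables("human_mouse_ref.pcdb") after the first example above would show which tables that file holds.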


proteoclade.pcdb.merge_fastas(merged_fasta_name, fasta_directory='fastas')

Merges the FASTA files in a directory into a single file. Used to prepare for a targeted database search, e.g. with MaxQuant or Mascot.

  • merged_fasta_name (string) – Name of the .fasta file produced by the merge
  • fasta_directory (string) – Directory from which to read fastas (default ‘fastas’)


Creates a .fasta file containing every entry read from the input FASTA files.
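
As a rough picture of what such a merge involves, here is a minimal standard-library sketch that concatenates the FASTA files in a directory. It is not the ProteoClade implementation: the function name merge_fasta_files and the set of accepted file extensions (.fasta, .fa) are assumptions.

```python
from pathlib import Path


def merge_fasta_files(merged_fasta_name, fasta_directory="fastas"):
    """Concatenate every FASTA file in a directory into one output file.

    Hypothetical helper for illustration only; not part of ProteoClade.
    """
    output_path = Path(merged_fasta_name)
    # Assumption: input files end in .fasta or .fa. The output file is
    # skipped in case it lives in the same directory as the inputs.
    inputs = sorted(
        path for path in Path(fasta_directory).iterdir()
        if path.suffix.lower() in {".fasta", ".fa"}
        and path.resolve() != output_path.resolve()
    )
    with output_path.open("w") as merged:
        for path in inputs:
            text = path.read_text()
            merged.write(text)
            # Make sure the next file's header starts on its own line.
            if text and not text.endswith("\n"):
                merged.write("\n")
    return output_path
```

The trailing-newline check matters because a file that ends without a newline would otherwise have its last sequence line fused with the next file's header.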