De novo antibody sequencing (dnAb sequencing) is an analytical approach that uses tandem mass spectrometry (MS/MS) to fragment peptides into ions and infers each amino acid residue from the mass difference between two adjacent fragment ions, thereby reading the amino acid sequence of the protein directly. The biggest advantage of this technology is that it can determine the sequence of known or unknown proteins without relying on any database. Samples are digested with multiple proteases, and the resulting peptides are detected by MS. The acquired mass spectra are interpreted by a de novo algorithm to obtain the corresponding peptide sequences, and the amino acid sequence of the protein is obtained from the N-terminus to the C-terminus by comparing and splicing overlapping peptides.
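The mass-difference idea can be illustrated with a minimal Python sketch. It assumes an idealized, complete ladder of singly charged ions of one type with no noise; the function name `residues_from_ladder` and the 0.01 Da tolerance are illustrative choices, not part of any particular de novo software. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da (standard values); Leu and Ile are identical.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_from_ladder(mz_values, tol=0.01):
    """Name the residue between each pair of neighbouring ions in a ladder.

    Assumes singly charged ions of a single type (for example b ions only).
    Residues that share a mass are reported together, e.g. "L/I".
    """
    ordered = sorted(mz_values)
    calls = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        matches = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tol]
        calls.append("/".join(matches) if matches else "?")
    return calls


# Hypothetical b-ion ladder b1..b4 of a peptide starting with G-A-S-(L or I)
ladder = [58.02874, 129.06585, 216.09788, 329.18194]
print(residues_from_ladder(ladder))  # ['A', 'S', 'L/I']
```

The last call shows why mass differences alone cannot separate leucine from isoleucine: both match the same 113.084 Da step.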
The MS-based dnAb sequencing workflow is as follows: first, the antibody to be sequenced is digested with multiple enzymes to ensure full coverage of the light and heavy chain sequences; the resulting peptides are then analyzed by MS; finally, the complete antibody sequence is assembled through unbiased, accurate sequence identification and reverse verification of the sequence against the MS data.
At present, a dnAb sequencing platform can be built by coupling liquid chromatography to an Orbitrap Fusion high-resolution mass spectrometer, which provides the sensitivity needed to identify low-abundance peptides. A fragmentation scheme that combines HCD with ETD is used to distinguish the isomers leucine and isoleucine.
Steps
1. Sample preparation and enzyme digestion
- Multi-enzyme digestion strategy: digest the antibody sample with several proteases (such as Trypsin, Chymotrypsin, Glu-C, Lys-C, etc.) so that the resulting peptides are abundant and highly overlapping, with the aim of covering 100% of the protein sequence.
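Why several proteases help can be seen in a small in-silico digest. The sketch below is a simplification under stated assumptions: each enzyme cleaves C-terminal to a fixed set of residues, digestion is complete with no missed cleavages, and only peptides within an assumed length window are counted as detectable. The cleavage table, the length window and the function names are illustrative, and real enzyme specificities are more nuanced (for example, the proline rule for Trypsin is ignored here).

```python
# Simplified cleavage rules: cut C-terminal to these residues (assumption).
CLEAVE_AFTER = {
    "Trypsin": "KR",
    "Lys-C": "K",
    "Glu-C": "E",
    "Chymotrypsin": "FWY",
}


def digest(sequence, enzyme):
    """Return (start_index, peptide) pairs for a complete digest."""
    sites = CLEAVE_AFTER[enzyme]
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in sites:
            peptides.append((start, sequence[start:i + 1]))
            start = i + 1
    if start < len(sequence):
        peptides.append((start, sequence[start:]))
    return peptides


def coverage_depth(sequence, enzymes, min_len=5, max_len=30):
    """Count, per residue, how many enzymes yield a detectable peptide over it."""
    depth = [0] * len(sequence)
    for enzyme in enzymes:
        for start, peptide in digest(sequence, enzyme):
            if min_len <= len(peptide) <= max_len:
                for pos in range(start, start + len(peptide)):
                    depth[pos] += 1
    return depth


# Toy sequence: Trypsin alone leaves the first two residues uncovered,
# while adding Glu-C covers them and overlaps the tryptic peptide.
print(coverage_depth("AKGER", ["Trypsin"], min_len=3))           # [0, 0, 1, 1, 1]
print(coverage_depth("AKGER", ["Trypsin", "Glu-C"], min_len=3))  # [1, 1, 2, 2, 1]
```

Positions with depth 0 would be invisible to a single-enzyme experiment, and positions with depth 2 or more are where peptides from different digests overlap and can later be spliced.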
Peptide fragmentation in MS-based de novo sequencing (de Graaf SC et al., 2022).
2. Mass spectrometry data acquisition
- Ionization and fragmentation: several fragmentation techniques are used, such as higher-energy collisional dissociation (HCD), which mainly generates b/y ions, and electron transfer dissociation (ETD), which mainly generates c/z ions.
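The relationship between the four ion series can be made concrete with a short calculation. This is a sketch for singly charged, unmodified peptides using standard monoisotopic masses; the z series is computed as the radical z ion (often written z• or z+1) that is typically observed in ETD, and the function name `fragment_ions` is illustrative.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276     # mass of the charge-carrying proton
WATER = 18.010565     # H2O
AMMONIA = 17.026549   # NH3
HYDROGEN = 1.007825   # H atom


def fragment_ions(peptide):
    """Return singly charged m/z values for b, c, y and z-radical ions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    ions = {}
    for i in range(1, n):
        b = sum(masses[:i]) + PROTON          # N-terminal fragment
        y = sum(masses[i:]) + WATER + PROTON  # C-terminal fragment
        ions[f"b{i}"] = b
        ions[f"c{i}"] = b + AMMONIA               # c = b + NH3
        ions[f"y{n - i}"] = y
        ions[f"z{n - i}"] = y - AMMONIA + HYDROGEN  # z-radical = y - NH3 + H
    return ions


for label, mz in sorted(fragment_ions("GA").items()):
    print(label, round(mz, 4))
```

Because the b/y and c/z series are offset from each other by fixed amounts, observing the same cleavage site in both HCD and ETD spectra gives two independent readings of the same residue boundary.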
3. Peptide splicing and sequence derivation
- Peptide screening: from the peptides produced by multi-enzyme digestion, select those with higher kurtosis for analysis.
- Sequence derivation: deduce each peptide sequence from the b/y ions generated by HCD and the c/z ions generated by ETD.
- Overlapping peptide analysis: splice the complete sequence from overlapping peptides to ensure the continuity and accuracy of the sequence.
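The splicing step can be sketched as a greedy overlap merge. This toy version assumes error-free peptide sequences and exact suffix/prefix matches; real assemblers must also tolerate sequencing errors and Leu/Ile ambiguity. The function names and the minimum overlap of 3 residues are illustrative assumptions.

```python
def overlap(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(peptides, min_overlap=3):
    """Greedily merge the pair with the longest overlap until none remains."""
    contigs = list(dict.fromkeys(peptides))  # drop exact duplicates, keep order
    while True:
        # Discard peptides fully contained in another contig.
        contigs = [p for p in contigs
                   if not any(p != q and p in q for q in contigs)]
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Three hypothetical peptides from different digests of the same region
print(assemble(["LVQPGGSLR", "EVQLVESGG", "ESGGGLVQ"]))
# ['EVQLVESGGGLVQPGGSLR']
```

If the function returns more than one contig, the peptides do not overlap sufficiently somewhere, which is the situation the multi-enzyme digestion is meant to avoid.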
4. The distinction between leucine and isoleucine
- Leu and Ile have the same molecular mass, so they are difficult to distinguish by conventional MS. Combining ETD and HCD generates w ions, and the two residues can be distinguished by the mass difference of these w ions: the Leu side chain loses 43 Da, whereas the Ile side chain loses 29 Da. Within the CDRs of an antibody, accurately distinguishing Leu from Ile is very important for maintaining antigen binding affinity and specificity.
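A minimal sketch of the w-ion test follows. It assumes a singly charged radical z ion whose N-terminal residue is the Leu/Ile in question, and uses the monoisotopic masses of the isopropyl radical (C3H7, 43.0548 Da) and the ethyl radical (C2H5, 29.0391 Da) for the 43 Da and 29 Da losses mentioned above. The function name, peak list and tolerance are hypothetical.

```python
ISOPROPYL_LOSS = 43.0548  # C3H7 radical lost from a Leu side chain
ETHYL_LOSS = 29.0391      # C2H5 radical lost from an Ile side chain


def call_leu_ile(z_radical_mz, observed_peaks, tol=0.01):
    """Classify a residue as 'L', 'I' or 'ambiguous' from diagnostic w ions.

    z_radical_mz: m/z of the singly charged z-radical ion that starts at the
    residue in question; observed_peaks: m/z values found in the spectrum.
    """
    def has_peak(target):
        return any(abs(peak - target) <= tol for peak in observed_peaks)

    leu_evidence = has_peak(z_radical_mz - ISOPROPYL_LOSS)
    ile_evidence = has_peak(z_radical_mz - ETHYL_LOSS)
    if leu_evidence and not ile_evidence:
        return "L"
    if ile_evidence and not leu_evidence:
        return "I"
    return "ambiguous"  # neither or both diagnostic peaks were found


# Hypothetical spectra: a peak 43.0548 below the z ion points to Leu,
# a peak 29.0391 below it points to Ile.
print(call_leu_ile(500.0, [456.9452, 300.1]))  # L
print(call_leu_ile(500.0, [470.9609]))         # I
```

Returning "ambiguous" when neither or both peaks are present keeps the call conservative, so unresolved positions can be flagged rather than guessed.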
5. Molecular weight verification and sequence confirmation
- Molecular weight analysis: verify the accuracy of the sequence by measuring the molecular weight of the antibody in its intact, reduced and deglycosylated states.
- Modification identification: detect common post-translational modifications (such as oxidation, deamidation, glycosylation, etc.) to ensure the integrity of the sequence.
- Database reconstruction: build a new database from the dnAb sequencing results, search the original MS data against it to determine coverage, and further confirm how well the sequence matches the data.
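The two checks above, comparing a theoretical mass with a measured one and computing how much of the assembled sequence is supported by peptides, can be sketched as follows. This is a simplified illustration: it uses standard average residue masses for an unmodified linear chain and ignores disulfide bonds, glycans and other modifications, and all function names are illustrative.

```python
# Average residue masses in Da (standard values), suited to intact-mass checks.
AVERAGE_MASS = {
    "G": 57.0513, "A": 71.0779, "S": 87.0773, "P": 97.1152,
    "V": 99.1311, "T": 101.1039, "C": 103.1429, "L": 113.1576,
    "I": 113.1576, "N": 114.1026, "D": 115.0874, "Q": 128.1292,
    "K": 128.1723, "E": 129.1140, "M": 131.1961, "H": 137.1393,
    "F": 147.1739, "R": 156.1857, "Y": 163.1733, "W": 186.2099,
}
WATER_AVERAGE = 18.0153


def average_mass(sequence):
    """Average mass of an unmodified linear chain (residues plus one water)."""
    return sum(AVERAGE_MASS[aa] for aa in sequence) + WATER_AVERAGE


def mass_error_ppm(theoretical, observed):
    """Relative deviation of the observed mass in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def sequence_coverage(sequence, peptides):
    """Fraction of residues covered by at least one exactly matching peptide."""
    if not sequence:
        return 0.0
    covered = [False] * len(sequence)
    for peptide in peptides:
        start = sequence.find(peptide)
        while start != -1:
            for pos in range(start, start + len(peptide)):
                covered[pos] = True
            start = sequence.find(peptide, start + 1)
    return sum(covered) / len(sequence)


chain = "EVQLVESGG"  # hypothetical fragment of an assembled chain
print(round(average_mass(chain), 2))
print(round(sequence_coverage(chain, ["EVQL", "ESGG"]), 3))  # 0.889
```

A large mass error or a coverage gap points to the region of the assembled sequence that should be re-examined.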
6. Final sequence assembly and annotation
- Sequence assembly: splice the deduced peptide sequences into complete VH and VL sequences.
- Functional annotation: mark the complementarity-determining regions (CDRs) of the variable domains and the constant region (Fc), and analyze the functional characteristics of the antibody (such as antigen binding site, affinity, etc.).
- Results output: generate the final antibody sequence report, including sequence information, modification sites and functional annotations.
Application
Development of mAbs
DnAb sequencing can help to develop new monoclonal antibodies, especially against complex or new targets (such as tumors, viruses, bacteria, etc.). By sequencing the antibody genes in immune cells (such as B cells or plasma cells), researchers can quickly obtain the complete sequence of the target antibody, which provides a basis for the development of antibody drugs.
Most antibody discovery depends on B cell sequencing, but the correspondence between B cells and circulating IgG antibodies is limited, so some key antibodies may be missed. Circulating IgG is the final product of the humoral immune response, not the product of the B cells themselves: most IgG antibodies are produced by plasma cells in the bone marrow, so B cell sequencing alone may not cover all antibody sequences. Although de novo sequencing of polyclonal antibody proteins faces great challenges, it offers the potential for a comprehensive understanding of the humoral immune response and can make up for this deficiency of B cell sequencing. In one study, researchers extracted antibodies from blood samples and analyzed them by IgSeq (immunoglobulin sequencing) and de novo protein sequencing. The peptides were analyzed by mass spectrometry, and ambiguities in the protein sequence were resolved with EThcD, which confirmed the antibody structure. RNA was extracted from PBMC samples and sequenced by NGS to analyze the antibody gene repertoire, and the immune responses of different patients were characterized by immunological assays. IgSeq and MS data were integrated to optimize antibody design and improve antigen binding. The binding of the antibodies to SARS-CoV-2 antigen was verified by ELISA and SPR, their protective effect was confirmed by neutralization assays, and candidate antibodies were screened by CDR analysis and antigen specificity testing. By combining proteomics with de novo protein sequencing, the study found six new antibodies whose affinity was similar to that of natural polyclonal antibodies.
In this workflow, antibodies from blood samples were digested into peptides, their sequences were determined by MS, and the results were compared with immunoglobulin gene databases to analyze the VH and VL regions and their roles in antigen recognition. Although the sample size was limited, the results show that the antibodies derived from IgSeq and those derived from de novo protein sequencing differ in neutralization activity, while both have measurable affinity. Verification of affinity and neutralization activity further confirmed the accuracy of the antibody sequences (Le Bihan T et al., 2024).
Functional VRC01-class antibodies can be found in B cell transcripts. They target the CD4 binding site on gp120 of HIV-1, have broad neutralizing activity, and appear in multiple donors. The researchers first attempted to identify VRC01-class antibodies using a pedigree ranking method but failed, probably because these antibodies were infrequent in donor C38. To further confirm the VRC01 antibodies, the authors used dnAb sequencing technology to study serum and PBMC from HIV-1 infection. The VRC01 heavy chain was selected and paired with 13 candidate light chains. Six of these 13 light chains formed complete antibodies with the VRC01 heavy chain, and the light chain named gVRC-L1dC38 weakly neutralized two HIV-1 isolates. Next, the researchers continued screening with gVRC-H3dC38, the most effective C38 heavy chain from the cross-donor analysis, and gVRC-L1dC38, the only effective light chain from the previous round. Eight new heavy/light chain pairings were found, and these antibodies showed broad neutralization, especially the gVRC-H3dC38/gVRC-L1dC38 pairing. Compared with pairings that used the VRC01 heavy chain, pairings of C38 heavy and light chains neutralized more broadly. gVRC-L1dC38 remained the most effective C38 light chain, and the C38 heavy chains paired with it showed significant neutralization activity. After two rounds of screening, the gVRC-H3dC38/gVRC-L1dC38 combination of heavy and light chain showed the best neutralization (Zhu J et al., 2013).
Antibody diversity analysis
The diversity of antibodies produced by immune system can be comprehensively analyzed by dnAb sequencing, which helps to understand the differences of immune responses of different individuals or species.
Human leukocyte antigen (HLA), immunoglobulin (IG), T cell receptor (TCR) and killer cell immunoglobulin-like receptor (KIR) gene regions contain many highly similar genes interspersed with pseudogenes and repetitive elements, which makes variation in these regions hard to analyze. Although the HLA region has been widely studied and linked to many diseases, the complexity of the IG, TCR and KIR regions still makes comprehensive analysis difficult in large-scale disease association studies. To reveal the complexity of these immune genes, the authors used different sequencing platforms and technologies to generate an accurate de novo genome assembly of a healthy volunteer, HV31, from multiple data sources, paying special attention to immune system regions. The researchers collected data from several complementary platforms, including PacBio Sequel II, MGI short-read and long-read sequencing, and Bionano Saphyr optical mapping. The data were generated from CD14+ monocytes isolated from the peripheral blood mononuclear cells (PBMC) of HV31, in order to minimize the influence of cell-specific events such as V(D)J recombination in T cells and somatic hypermutation and so ensure the accuracy of the assembly. The assembly of most regions of the HV31 genome was consistent with the validation data, especially in regions with consistent copy numbers (such as some repetitive regions). However, in highly repetitive immune-related regions such as IGK, IGL and HLA, validation revealed large discrepancies, which may indicate assembly errors or structural variation. HV31 shows significant structural differences between its haplotypes and relative to the GRCh38 reference sequence, especially in core immune system genes.
By analyzing these differences in detail, the researchers recovered many large variants that current methods fail to call accurately. Four gaps in the GRCh38 reference sequence, in the immunoglobulin κ and T cell receptor γ regions, were also analyzed and successfully filled in the assembly (Zhang JY et al., 2021).
Creation and optimization of antibody library
In antibody screening and optimization, dnAb sequencing can help to analyze the antibody sequence in natural antibody library and help to construct antibody library for specific targets. It can also help to detect and modify the affinity and specificity of existing antibodies.
Antibodies take part in the immune response by binding specifically to a large number of antigens, and the natural antibody library (repertoire) is formed by somatic recombination of V (variable), D (diversity) and J (joining) germline gene segments. Immunosequencing has become the main method for studying antibody repertoires and immune responses. Most existing immunogenomics studies rely on population-level germline genes rather than individual-specific germline genes, which may lead to errors and omissions, especially for non-European populations. In addition, germline gene variation is difficult to identify accurately, especially when somatic hypermutation (SHM) must be distinguished from undiscovered alleles. The incompleteness of the IMGT database limits clonal analysis of antibodies, especially in non-model species such as camels or sharks, and the lack of a complete germline gene database limits the application of antibody sequencing tools. The authors developed IgScout, a new algorithm for de novo inference of D genes, which also finds tandem CDR3s and sheds light on V(D)J recombination. With IgScout, the researchers reconstructed 20 of 25 D genes and identified 4 new allelic variants, expanding the number of known D gene variants. Although IgScout focuses mainly on D gene reconstruction, it also provides important insights into the reconstruction of V and J genes. IgScout also revealed the existence of tandem CDR3s. In the healthy data set, 1900 tandem CDR3 sequences contained 1081 different D insertions, ranging from 0 to 153 nucleotides in length. The two longest insertions (I1 and I2) are formed by genes D9 and D10 and differ by a single nucleotide. The recombination event involving I2 shows RSS skipping, which joins D9, I2 and D10 in tandem and forms an ultra-long CDR3.
The study shows that ultra-long CDR3s are formed by RSS skipping in D genes, which challenges the traditional view of non-overlapping genes and reveals nested D genes. Immunosequencing may miss these long CDR3s, but long paired reads (such as 300 nt) can capture them. Most CDR3s in the allergy data set were of the IgM type, while about 60% in the HIV data set were of the IgG type, indicating that these CDR3s may play an important role in the immune response (Safonova Y et al., 2019).
Production of customized antibody
When there is a demand for antibodies against certain pathogens or diseases, dnAb sequencing can help to quickly obtain antibody sequences and synthesize them to meet the needs of research or treatment.
Cytochrome c (cyt c) is a small heme protein located in mitochondria that has important cellular functions such as electron transfer. Its conformation can change reversibly in response to oxidation, phosphorylation or other post-translational modifications (PTMs), which may affect its function, for example the release of cyt c from mitochondria into the cytoplasm during apoptosis. A monoclonal antibody (mAb 1D3) that recognized conformational changes in cytochrome c existed previously, but the hybridoma cells that produced it have been lost. To "revive" the antibody, the researchers determined the amino acid sequences of its heavy and light chains by MS and expressed them recombinantly in eukaryotic cells. The recombinant antibody (R1D3) showed antigen binding characteristics similar to those of the original antibody. The resurrected R1D3 was used to identify and study the conformations of oxidized cytochrome c, especially the formation of alternative conformations in cells. It can detect conformational changes of cytochrome c caused by oxidation, aging or other stress conditions, and helps reveal how these changes relate to cell function, especially the translocation of cytochrome c to the nucleus. DnAb sequencing therefore not only helps to restore the function of the original antibody, but also extends its use to recognizing new protein forms (Tomasina F et al., 2022).
Vaccine development
In vaccine research, dnAb sequencing can help identify and design effective antibodies against specific pathogens.
MUC1 is a transmembrane mucin expressed on the surface of epithelial cells, where it forms a barrier that resists invasion by pathogens. The glycosylation pattern of MUC1 differs between normal tissues and cancer cells. In cancer cells MUC1 is usually aberrantly glycosylated and over-expressed, which makes it a potential target for cancer immunotherapy. Monoclonal antibody 139H2 is an anti-human breast cancer antibody that has been applied to the diagnosis and treatment of cancers that overexpress MUC1. The sequence of the full-length 139H2 IgG antibody was obtained by MS-based dnAb sequencing, and the pairing of its heavy chain (IGHV1-53) and light chain (IGKV8-30) was confirmed. It was found that the CDR coverage of the antibody was 10 to 100aa, which indicated that the sequencing accuracy was high. In addition, the heavy and light chains carry some somatic hypermutations, which are mainly concentrated in the framework regions and the flanks of CDRH2. After codon optimization, the complete antibody domains were cloned into expression vectors containing the mouse IgG1 heavy chain (with an 8xHis tag) and κ light chain backbones, and purified 139H2 IgG mAb was obtained. The binding of the antibody to MUC1 was then evaluated. The antibody recognized the colon cancer cell line HT29-MTX, which expresses high levels of MUC1, and the binding signal almost disappeared in MUC1 knockout cells, indicating high specificity. A comparison with the Fab fragment showed that the affinity of the recombinant 139H2 antibody was slightly higher than that of the monovalent Fab, indicating that the recombinant IgG binds more strongly in its bivalent form (Peng W et al., 2024).
References
- Le Bihan T, Nunez de Villavicencio Diaz T, Reitzel C, Lange V, Park M, Beadle E, Wu L, Jovic M, Dubois RM, Couzens AL, Duan J, Han X, Liu Q, Ma B. "De novo protein sequencing of antibodies for identification of neutralizing antibodies in human plasma post SARS-CoV-2 vaccination." Nat Commun. 2024;15(1):8790. doi: 10.1038/s41467-024-53105-8
- Zhang JY, Roberts H, Flores DSC, Cutler AJ, Brown AC, Whalley JP, Mielczarek O, Buck D, Lockstone H, Xella B, Oliver K, Corton C, Betteridge E, Bashford-Rogers R, Knight JC, Todd JA, Band G. "Using de novo assembly to identify structural variation of eight complex immune system gene regions." PLoS Comput Biol. 2021;17(8):e1009254. doi: 10.1371/journal.pcbi.1009254
- Safonova Y, Pevzner PA. "De novo Inference of Diversity Genes and Analysis of Non-canonical V(DD)J Recombination in Immunoglobulins." Front Immunol. 2019;10:987. doi: 10.3389/fimmu.2019.00987
- Zhu J, Wu X, Zhang B, McKee K, O'Dell S, Soto C, Zhou T, Casazza JP; NISC Comparative Sequencing Program; Mullikin JC, Kwong PD, Mascola JR, Shapiro L. "De novo identification of VRC01 class HIV-1-neutralizing antibodies by next-generation sequencing of B-cell transcripts." Proc Natl Acad Sci U S A. 2013;110(43):E4088-97. doi: 10.1073/pnas.1306262110
- Tomasina F, Martínez J, Zeida A, Chiribao ML, Demicheli V, Correa A, Quijano C, Castro L, Carnahan RH, Vinson P, Goff M, Cooper T, McDonald WH, Castellana N, Hannibal L, Morse PT, Wan J, Hüttemann M, Jemmerson R, Piacenza L, Radi R. "De novo sequencing and construction of a unique antibody for the recognition of alternative conformations of cytochrome c in cells." Proc Natl Acad Sci U S A. 2022;119(47):e2213432119. doi: 10.1073/pnas.2213432119
- Peng W, Giesbers KC, Šiborová M, Beugelink JW, Pronker MF, Schulte D, Hilkens J, Janssen BJ, Strijbis K, Snijder J. "Reverse-engineering the anti-MUC1 antibody 139H2 by mass spectrometry-based de novo sequencing." Life Sci Alliance. 2024;7(6):e202302366. doi: 10.26508/lsa.202302366