Antibody sequence analysis is a core technology in modern immunology and biomedicine. It aims to reveal the structure and function of antibodies and their interactions with antigens. An antibody is composed of heavy chains and light chains, and its variable regions bind antigen. Through high-throughput sequencing, alignment and annotation, and computational modeling, this analysis helps researchers understand antibody diversity, affinity and binding characteristics. Its applications include vaccine design, therapeutic antibody development, affinity optimization, research on immune escape mechanisms and exploration of antibody diversity, which together have advanced disease diagnosis, vaccine development and immunotherapy. The following are the key aspects of antibody sequence analysis:
Antibody structure
- Antibodies consist of two heavy chains and two light chains, and each chain consists of a variable region and a constant region. The variable region is the key region through which an antibody recognizes antigen, while the constant region determines the antibody's effector function.
V(D)J rearrangement plays a key role in the mammalian immune system: by rearranging gene segments it produces diverse T cell receptors (TCR) and B cell receptors (BCR) that can respond to a wide range of antigens. Wu et al. used high-throughput sequencing (HTS) and single-cell RNA sequencing (scRNA-seq) to analyze rearrangement data from TCRβ CDR3 repertoires of different species (such as humans and mice), and examined V and J gene usage frequencies and how these frequencies relate to RSS quality and gene distance. V (or J) genes located distal or proximal to the D gene rearranged significantly more often than V (or J) genes in intermediate positions. This pattern was common across species, including primates (humans and rhesus monkeys), rodents (BALB/c, C57BL/6 and Kunming mice), Artiodactyla (buffalo) and the bat Rhinolophus affinis. The quality of the recombination signal sequence (RSS) of V and J genes directly affects their rearrangement frequency; in particular, when the RIC score of a V-RSS is lower than -45, the rearrangement frequency of that V gene is significantly reduced. V and J genes far from the D gene tend to have good RSS quality and recombination accessibility, which makes them more likely to be selected during rearrangement. Over the course of mammalian evolution, the D-J distance has remained significantly shorter than the V-D distance, and J-RSS quality is significantly better than V-RSS quality. This may be one of the mechanisms that maintains the rule that "D-to-J precedes V-to-DJ", ensuring that the D and J genes rearrange first and thereby establishing an ordered rearrangement of the T cell receptor. Together, the RSS quality, genomic location and relative distance of the V, D and J genes determine rearrangement efficiency (Wu Y et al., 2024).
Analysis of VDJ rearrangement efficiency in several animals (Wu Y et al., 2024).
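The relationship between V gene usage and RSS quality described above can be explored with a few lines of code. The following is a minimal sketch, not the pipeline used by Wu et al.: the clonotype records, gene names and RIC scores are made-up illustrative values, and only the -45 threshold comes from the text.

```python
from collections import Counter

# Hypothetical rearrangement records as (V gene, J gene); illustrative only.
rearrangements = [
    ("TRBV1", "TRBJ1-1"),
    ("TRBV1", "TRBJ2-7"),
    ("TRBV5", "TRBJ2-7"),
    ("TRBV30", "TRBJ2-7"),
    ("TRBV30", "TRBJ1-1"),
    ("TRBV30", "TRBJ2-1"),
]

# Hypothetical RIC scores for each V-RSS (invented for this example).
v_rss_ric = {"TRBV1": -30.2, "TRBV5": -52.8, "TRBV30": -25.1}

RIC_THRESHOLD = -45.0  # V-RSS threshold quoted in the text


def gene_usage(records, index):
    """Return the usage frequency of the gene at tuple position `index`."""
    counts = Counter(record[index] for record in records)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


v_usage = gene_usage(rearrangements, 0)
j_usage = gene_usage(rearrangements, 1)

# V genes whose RSS falls below the threshold are expected to be used less often.
low_quality_v = sorted(g for g, score in v_rss_ric.items() if score < RIC_THRESHOLD)

for gene in sorted(v_usage):
    flag = "low RSS quality" if gene in low_quality_v else "ok"
    print(f"{gene}: usage={v_usage[gene]:.2f}, RIC={v_rss_ric[gene]}, {flag}")
```

With real data, the records would come from an annotated repertoire and the RIC scores from an RSS scoring tool; both inputs here are placeholders.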
- VH and VL: Within the antibody variable region, the VH and VL sequences determine the specificity of the antibody, allowing it to recognize and bind different antigens.
The BCR consists of an Igh heavy chain paired with an Igκ or Igλ light chain. The V, D and J gene segments of the heavy-chain and light-chain loci generate a functional BCR through a complex rearrangement process. During B cell development, D-J rearrangement of the Igh locus occurs first, followed by V-to-DJ rearrangement, and then V-J rearrangement of the light-chain genes, finally forming a functional BCR. Rearrangement at the Igh and Igκ loci differs significantly. Although V, D and J gene segments are known to rearrange at different frequencies, the mechanism behind this unequal V gene usage is not fully understood. Recent studies show that newly described enhancers within the V gene region play an important role in regulating V gene recombination frequency. At the Igκ locus, three enhancers located near the Jκ and Cκ genes regulate Igκ rearrangement, and their deletion reduces rearrangement and lowers B cell numbers. In contrast, the newly discovered enhancers in the V region mainly shape the diversity of the antibody repertoire by regulating V gene rearrangement, and have little effect on total B cell numbers or the overall level of rearrangement (Barajas-Mora EM et al., 2023).
- CDR (Complementarity Determining Regions): CDRs are the hypervariable parts of the antibody variable region and the key regions of interaction between antibody and antigen. Each variable domain contains three CDRs, namely CDR1, CDR2 and CDR3. CDR3 is particularly critical and usually carries most of the information for recognizing the antigen.
Antibodies against bee hyaluronidase designed on the basis of the CDR regions (Adolf-Bryfogle J et al., 2018).
Antibodies interact with protein surfaces through aromatic side chains and polar residues in their CDRs. Specific contacts made by aromatic residues at the interface are considered the main determinant of the specificity and affinity of antibody-protein binding: when an antibody recognizes a protein, aromatic CDR residues form stereospecific contacts with backbone atoms and side-chain carbons of the protein. Polar CDR residues interact with the protein through direct or water-mediated hydrogen bonds. Water-mediated hydrogen bonds are more common than direct ones, but because their donors and acceptors can exchange with water molecules, this type of contact is less specific and contributes less to the selection of epitope position. These polar interactions nevertheless help antibodies recognize their cognate protein antigens. In the S88 data set, the degree of hydrogen-bond optimization played a more important role than geometric complementarity. Comparison with the randomly generated S880 data set showed that interface polar interactions are better optimized in natural antibody-protein complexes, which effectively compensates for the desolvation of the antibody-protein interface. These optimizations not only strengthen the antibody-protein interaction but also maintain chemical complementarity, especially in contacts involving polar and aromatic amino acids. Although the diversity of protein sequences and surface shapes is almost unlimited, natural antibodies can recognize almost all protein antigens despite their limited sequence and structural variation. This universality arises because the specific interactions supported by aromatic and polar amino acids in the CDRs can be formed across different protein antigen surfaces, yielding effective antibody-protein complexes (Peng HP et al., 2022).
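To make the role of aromatic and polar CDR residues concrete, the sketch below extracts CDR1 to CDR3 from a sequence that has already been numbered and reports the fraction of aromatic and polar residues in each. It assumes IMGT numbering (CDR1 at positions 27-38, CDR2 at 56-65, CDR3 at 105-117); the numbered fragment, the choice of F/W/Y as aromatic and S/T/N/Q as polar, and the function names are illustrative assumptions rather than the method of Peng et al.

```python
# CDR boundaries under the IMGT numbering scheme (inclusive positions).
IMGT_CDR_RANGES = {"CDR1": (27, 38), "CDR2": (56, 65), "CDR3": (105, 117)}

# Residue classes used here are a simplifying assumption for illustration.
AROMATIC = set("FWY")
POLAR = set("STNQ")


def extract_cdrs(numbered):
    """Collect CDR residues from (IMGT position, residue) pairs given in order."""
    cdrs = {name: "" for name in IMGT_CDR_RANGES}
    for position, residue in numbered:
        if residue == "-":  # skip alignment gaps
            continue
        for name, (start, end) in IMGT_CDR_RANGES.items():
            if start <= position <= end:
                cdrs[name] += residue
    return cdrs


def fraction(sequence, residue_class):
    """Fraction of residues in `sequence` that belong to `residue_class`."""
    if not sequence:
        return 0.0
    return sum(aa in residue_class for aa in sequence) / len(sequence)


# Hypothetical, heavily truncated numbered heavy-chain fragment.
numbered_fragment = [
    (26, "S"), (27, "G"), (28, "Y"), (29, "T"), (30, "F"), (39, "M"),
    (56, "I"), (57, "N"), (58, "P"),
    (104, "C"), (105, "A"), (106, "R"), (107, "W"), (108, "Y"), (117, "Y"), (118, "W"),
]

cdrs = extract_cdrs(numbered_fragment)
for name, seq in cdrs.items():
    print(name, seq,
          f"aromatic={fraction(seq, AROMATIC):.2f}",
          f"polar={fraction(seq, POLAR):.2f}")
```

In practice the position labels would come from a numbering tool, and other CDR definitions (for example Kabat or Chothia) would use different boundaries.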
Diversity of antibody sequences
Antibody diversity originates from the V(D)J rearrangement process, which generates many distinct antibody variants. By analyzing antibody sequences, researchers can trace the sources of this diversity and understand the mechanisms of the immune response. Analysis of antibody libraries (such as phage display libraries) likewise screens large numbers of antibody sequences to find antibodies with high specificity and affinity.
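One common way to put a number on repertoire diversity is to count distinct clonotypes and compute the Shannon entropy of their frequencies. The sketch below is only an illustration of that idea: the CDR3 strings are invented, and defining a clonotype by its CDR3 amino acid sequence alone is a simplifying assumption.

```python
import math
from collections import Counter


def richness(clonotypes):
    """Number of distinct clonotypes observed."""
    return len(set(clonotypes))


def shannon_diversity(clonotypes):
    """Shannon entropy (natural log) of the clonotype frequency distribution."""
    counts = Counter(clonotypes)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical CDR3 sequences; repeated entries represent clonal expansion.
even_repertoire = ["CARDYW", "CARGYW", "CAKDYW", "CTRDYW"]
skewed_repertoire = ["CARDYW", "CARDYW", "CARDYW", "CARGYW"]

print(richness(even_repertoire), round(shannon_diversity(even_repertoire), 3))
print(richness(skewed_repertoire), round(shannon_diversity(skewed_repertoire), 3))
```

An evenly distributed repertoire gives the maximum entropy for its richness, while a repertoire dominated by one expanded clone gives a lower value.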
Mycobacterium tuberculosis (Mtb) is the pathogen that causes tuberculosis, and it acts synergistically with other infectious agents (such as HIV). Studies have shown that the complexity of Mtb infection makes tuberculosis difficult to diagnose and treat. One of Mtb's adaptive mechanisms in the host is its ability to escape immune attack by host macrophages. Several known virulence factors, such as KatG, SodA, GroES and PstS1, are closely related to the survival and pathogenicity of Mtb in the host. To support the development of possible therapeutic antibodies, researchers determined the IgV region sequences of antibodies against these virulence factors (SodA, KatG, GroES and PstS1). Rapid amplification of cDNA ends (RACE-PCR) was used to amplify the CDR and framework regions of the IgV regions and, combined with NGS, to systematically identify all potential IgV sequences from the hybridomas PhoS1/PstS1NRC-2410, SodANRC-13810, KatGNRC-49680 and GroESNRC-2410. The amplified IgV sequences were analyzed, and aberrant IgV chains were identified and eliminated. The IgV regions comprise a heavy chain (IgVH) and a light chain (IgVL), and their antigen specificity is determined by the complementarity determining regions (CDR). The sequences were then verified by cloning, Sanger sequencing and bioinformatics analysis. The recombinant IgV fragments bound their corresponding Mtb antigens (Mtb-SodA, Mtb-KatG, Mtb-GroES and Mtb-PstS1), demonstrating that the recombinant antibodies had the same antigen-binding activity as the antibodies secreted by the parent hybridomas (Foreman HC et al., 2021).
Antibody sequence alignment
Sequence alignment is one of the most commonly used techniques in antibody sequence analysis. By comparing different antibody sequences, researchers can reveal the evolutionary relationships among antibody families and the conservation and variability of functional regions. (1) Full-length sequence alignment: compares the similarity of two or more antibody sequences over their entire length. (2) Local alignment: usually used to compare CDRs or other specific functional regions of antibodies. The alignment results can help researchers identify the regions through which an antibody recognizes its antigen.
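Local alignment, as used for comparing CDRs, can be illustrated with a small Smith-Waterman implementation. This is a teaching sketch under simple assumptions: it uses a flat match/mismatch score and a linear gap penalty instead of a substitution matrix such as BLOSUM62, and the two input sequences are invented.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned a, aligned b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; cell values never drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    aligned_a, aligned_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


# Hypothetical sequences sharing a short CDR3-like motif.
best_score, top, bottom = smith_waterman("CARDYW", "GGARDYTT")
print(best_score)
print(top)
print(bottom)
```

A full-length (global) alignment would differ mainly in not resetting negative scores to zero and in tracing back from the last cell instead of the highest-scoring one.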
The Yvis database provides a large body of structural data for antibody research. It is updated weekly and covers antibody and antigen information from different organisms. The database contains 3423 antibody structures from 22 antibody-producing organisms, involving 155 different protein antigens, 184 hapten structures, 22 nucleic acid structures and 106 carbohydrate structures. It also provides detailed organism information and supports searching and filtering by antibody-producing organism, antigen type, antigen-producing organism and gene. As an example, Yvis was used to analyze HIV antibodies and examine how different antibodies contact gp120. The analysis showed that neutralizing antibodies often have a long CDRH3, and that some conserved positions (such as 8, 22, 119 and 121) are closely associated with antigen interaction. Antibody heavy-chain sequences were also extracted from transcript sequences of HIV-infected individuals, with the scope of the analysis restricted by a strain filter. This revealed amino acid changes at specific positions (such as 36, 66, 92, 93 and 95), reflecting the variation and evolution of the antibodies. HIV neutralizing antibodies derived from specific alleles (such as VH1-2*02) were then selected for comparison. Some of the amino acid changes in neutralizing antibodies, especially in CDR2 and framework region 3, may be closely related to their antigen-binding characteristics (Carvalho MB et al., 2019).
The BCR is an important component of the immune system that recognizes antigens and evolves toward higher antibody affinity. BCR repertoire analysis helps researchers understand the diversity, evolution and antigen specificity of BCRs, making it an important tool for studying infectious diseases, allergy, vaccination, autoimmune diseases and more. Although many tools exist for BCR sequencing analysis, current multiple sequence alignment (MSA) methods remain a bottleneck: with large BCR data sets they face long computation times, high memory consumption and highly diverse data. Zong et al. proposed Abalign, an MSA tool designed specifically for BCRs. It uses a reference MSA based on 3D structure, built from 1800 antibody structures in the SAbDab database, and aligns sequences using antibody numbering information rather than a traditional guide tree, which greatly improves alignment efficiency and quality. Abalign achieved SP and TC scores of 0.955 and 0.653 for heavy chains, and 0.976 and 0.889 for light chains. Its performance is comparable to that of MUSCLE and ClustalO, and better on some metrics, and it is significantly better than MAFFT. Abalign accurately aligns highly conserved motifs while keeping clear boundaries between framework regions (FR) and complementarity determining regions (CDR). Its ultra-fast MSA can process large volumes of BCR sequencing data without relying on high-performance computing clusters. Because Abalign is consistent with established antibody numbering schemes, its results connect seamlessly with existing BCR analysis methods, which makes it easier for users to integrate data from different sources (Zong F et al., 2023).
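The idea of aligning by numbering rather than by a guide tree can be shown in a few lines: once every residue carries a position label, residues with the same label go into the same column and missing positions become gaps. The sketch below is not Abalign's implementation. The position labels, the `(number, insertion code)` tuple format and the simple sort order are assumptions made for illustration; real numbering schemes have their own rules for ordering insertions.

```python
def align_by_numbering(numbered_seqs):
    """Build an MSA by placing residues into columns keyed by position label.

    `numbered_seqs` maps a sequence name to a list of
    ((number, insertion_code), residue) pairs.
    """
    # The union of all position labels defines the alignment columns.
    columns = sorted({pos for seq in numbered_seqs.values() for pos, _ in seq})
    rows = {}
    for name, seq in numbered_seqs.items():
        lookup = dict(seq)
        rows[name] = "".join(lookup.get(pos, "-") for pos in columns)
    return columns, rows


# Hypothetical numbered fragments; ab1 has an insertion at position 100A.
numbered = {
    "ab1": [((98, ""), "A"), ((99, ""), "R"), ((100, ""), "D"),
            ((100, "A"), "Y"), ((101, ""), "W")],
    "ab2": [((98, ""), "A"), ((99, ""), "K"), ((100, ""), "G"),
            ((101, ""), "W")],
}

columns, msa = align_by_numbering(numbered)
for name, row in msa.items():
    print(name, row)
```

Because each sequence is placed independently, the cost grows roughly linearly with the number of sequences, which is the intuition behind avoiding pairwise comparisons and guide trees for large repertoires.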
Affinity Maturation
Affinity maturation is the process by which antibodies gradually increase their affinity for an antigen during an immune response, thereby improving the effectiveness of that response. By analyzing the mutation spectrum of antibodies and the selection pressure acting on them, researchers can identify which mutations contribute to enhanced affinity.
A method called "library pruning" can effectively improve antibody affinity without a complicated post-immunization selection process. By pruning the B cell population, mAb infusion can increase the proportion of high-affinity antibodies, especially among CD4bs-specific B cells, and removes competition from low-affinity B cells, thereby sharpening the immune response. Although mAb infusion suppressed affinity maturation overall, a small number of B cells were still able to compete and reach high affinity under a well-designed immunization strategy. These high-affinity B cells also carried more mutations, which further increased their affinity and selection efficiency. The study also showed that mAb infusion may reduce antigen binding by CD4bs-specific B cells and push low-affinity cells out of the response, allowing high-affinity cells to dominate. Infusion of high-affinity mAb reduced somatic hypermutation (SHM) and affinity in most CD4bs-specific B cells, which suggests that the infused antibody affects the germinal center response by restricting the recruitment and selection of most competing B cells. In some specific B cell subsets, however, SHM and affinity increased significantly, indicating that for these subsets antibody infusion promotes affinity maturation by raising selection pressure. High-throughput sequencing showed that CD4bs-specific plasma cells had higher SHM after antibody infusion, and phylogenetic tree topology indicated that these plasma cells diverged more rapidly. This further supports the conclusion that antibody infusion can promote rapid affinity maturation in some B cells (Thomas P et al., 2024).
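Somatic hypermutation is usually quantified by comparing an observed sequence with its inferred germline. The sketch below computes the nucleotide mutation frequency and counts codons with replacement versus silent changes, a crude proxy for selection pressure. It assumes the two sequences are already aligned, in frame and of equal length without gaps; the example sequences are invented, and counting per codon rather than per mutation is a simplification.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def shm_summary(germline, observed):
    """Summarize mutations between two aligned, in-frame nucleotide sequences."""
    if len(germline) != len(observed) or len(germline) % 3 != 0:
        raise ValueError("sequences must be in frame and of equal length")

    nt_mutations = sum(g != o for g, o in zip(germline, observed))
    replacement = silent = 0
    for i in range(0, len(germline), 3):
        g_codon, o_codon = germline[i:i + 3], observed[i:i + 3]
        if g_codon == o_codon:
            continue
        if CODON_TABLE[g_codon] == CODON_TABLE[o_codon]:
            silent += 1       # amino acid unchanged
        else:
            replacement += 1  # amino acid changed

    return {
        "shm_frequency": nt_mutations / len(germline),
        "replacement": replacement,
        "silent": silent,
    }


# Hypothetical germline (Ala-Ser-Tyr) and mutated sequence (Ala-Asn-Tyr).
summary = shm_summary("GCTAGCTAC", "GCCAACTAC")
print(summary)
```

Real analyses typically compute such counts separately for framework regions and CDRs, since an excess of replacement mutations in the CDRs is often read as a sign of antigen-driven selection.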
Prediction and design of antibody sequence
With the development of bioinformatics, a growing number of tools can help researchers predict the structure and function of antibodies and their interactions with antigens. For example: (1) Antibody structure prediction: the three-dimensional structure of an antibody is predicted computationally from its amino acid sequence, which helps in understanding its binding characteristics. (2) Antibody optimization: the affinity, specificity and stability of antibodies are optimized through engineering methods such as humanization and affinity maturation.
With the development of high-throughput sequencing, large numbers of antibody sequences have become easier to obtain. However, because structural information is lacking, developing antibody-based diagnostics and therapies still takes considerable time and resources. Computational methods, especially deep learning and natural language processing approaches that predict antibody structure and function from sequence, can therefore significantly improve the efficiency of antibody development. Deep learning methods such as AlphaFold2 and IgFold have shown potential in antibody structure prediction. Jing et al. proposed a bio-inspired antibody language model (BALM). BALM integrates antibody positional information into its position embedding and uses an adaptive masking strategy to capture biological characteristics accurately. The model has 150 million parameters and was trained on 336 million unlabeled antibody sequences from the OAS data set; using the transformer self-attention mechanism together with its antibody-specific position encoding, it captures the biological characteristics of antibody sequences and can infer their binding function. BALMFold, an end-to-end atomic structure prediction algorithm built on the pre-trained BALM model, predicts 3D structure from a single antibody sequence. To overcome the limited homology of antibodies and the scarcity of structural templates, BALMFold incorporates antibody structural data from SAbDab and predicts structures efficiently through the cooperation of its BAformer module and structure module. By learning mutation trajectories in antibody sequences, BALM reveals the process of antibody affinity maturation and can predict many antibody properties, such as antigen-binding ability, binding sites, immune redundancy and affinity maturation. Through multi-task learning it provides antibody function predictions that support antibody development and optimization. Compared with other models, BALM performed well in predicting antibody specificity and affinity, captured the diversity and redundancy of antibodies effectively, and surpassed the accuracy of other models in affinity prediction (Jing H et al., 2024).
Antibody analysis tools
Antibody analysis tools are widely used in antibody design, optimization, prediction and functional evaluation. The following are some common antibody analysis tools:
- IgBLAST: aligns antibody sequences against germline immunoglobulin gene segments (V, D and J) to help users identify the germline gene origin and structure of antibodies.
- AbYsis: It provides an antibody sequence database and a variety of analytical tools, which can help users predict the affinity, structure and binding ability of antibodies to antigens.
- Rosetta Antibody Design: uses the Rosetta software to predict and design antibody structures, and supports antibody optimization and affinity improvement.
- PIGS (Prediction of ImmunoGlobulin Structure): a web server for predicting the three-dimensional structure of antibody variable domains from their sequences.
- Docking Tools (such as HADDOCK, ClusPro): used to simulate molecular docking between antibodies and antigens, and to predict affinity and binding sites by calculating antibody-antigen interaction.
- SAbPred: an online tool for predicting the three-dimensional structure of antibodies and their binding affinity to antigens.
- DeepAb: A tool based on deep learning, which is used to predict the structure of antibody sequence and evaluate the antigen binding ability.
- Antibody Studio: provides the analysis function of antibody sequence, structure and affinity, and supports antibody design and optimization.
- AbaTools: focuses on the prediction of antibody epitopes and the analysis of antigen-antibody binding sites, and supports the identification and design of functional epitopes.
- BepiPred: A special tool for predicting B cell epitopes, which can predict the immunogenic region of antigens.
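Many of these tools exchange results as tab-separated tables. IgBLAST, for example, can write AIRR-format rearrangement output (`-outfmt 19`), which uses columns such as `sequence_id`, `productive`, `v_call`, `j_call` and `junction_aa`. The sketch below shows one way such a table might be summarized with the Python standard library; the three records are invented, and the `summarize` helper and its choice to keep only the first gene call are illustrative assumptions.

```python
import csv
import io
from collections import Counter

# Hypothetical AIRR-style rearrangement table (values invented for illustration).
airr_tsv = (
    "sequence_id\tproductive\tv_call\tj_call\tjunction_aa\n"
    "seq1\tT\tIGHV1-2*02\tIGHJ4*02\tCARDYW\n"
    "seq2\tF\tIGHV3-23*01\tIGHJ6*02\tCAK*YW\n"
    "seq3\tT\tIGHV1-2*02,IGHV1-2*04\tIGHJ4*02\tCARGYW\n"
)


def summarize(handle):
    """Return productive records and their V gene usage (alleles collapsed)."""
    reader = csv.DictReader(handle, delimiter="\t")
    productive = [row for row in reader if row["productive"] == "T"]
    # Ambiguous calls are comma-separated; keep the first and drop the allele.
    v_counts = Counter(
        row["v_call"].split(",")[0].split("*")[0] for row in productive
    )
    return productive, v_counts


productive_rows, v_gene_counts = summarize(io.StringIO(airr_tsv))
print(len(productive_rows), dict(v_gene_counts))
```

For a real file, the `io.StringIO` object would be replaced by an open file handle, and further columns from the same table could be summarized in the same way.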
References
- Wu Y, Wu F, Ma Q, Li J, Ma L, Zhou H, Gong Y, Yao X. "HTS and scRNA-seq revealed that the location and RSS quality of the mammalian TRBV and TRBJ genes impact biased rearrangement." BMC Genomics. 2024;25(1):1010. doi: 10.1186/s12864-024-10887-x
- Barajas-Mora EM, Feeney AJ. "Enhancers within the Ig V Gene Region Orchestrate Chromatin Topology and Regulate V Gene Rearrangement Frequency to Shape the B Cell Receptor Repertoire Specificities." J Immunol. 2023;211(11):1613-1622. doi: 10.4049/jimmunol.2300261
- Peng HP, Hsu HJ, Yu CM, Hung FH, Tung CP, Huang YC, Chen CY, Tsai PH, Yang AS. "Antibody CDR amino acids underlying the functionality of antibody repertoires in recognizing diverse protein antigens." Sci Rep. 2022;12(1):12555. doi: 10.1038/s41598-022-16841-9
- Foreman HC, Frank A, Stedman TT. "Determination of variable region sequences from hybridoma immunoglobulins that target Mycobacterium tuberculosis virulence factors." PLoS One. 2021;16(8):e0256079. doi: 10.1371/journal.pone.0256079
- Carvalho MB, Molina F, Felicori LF. "Yvis: antibody high-density alignment visualization and analysis platform with an integrated database." Nucleic Acids Res. 2019;47(W1):W490-W495. doi: 10.1093/nar/gkz387
- Zong F, Long C, Hu W, Chen S, Dai W, Xiao ZX, Cao Y. "Abalign: a comprehensive multiple sequence alignment platform for B-cell receptor immune repertoires." Nucleic Acids Res. 2023;51(W1):W17-W24. doi: 10.1093/nar/gkad400
- Thomas P, Rees-Spear C, Griffith S, Muir L, Touizer E, Andrabi R, Priest R, Percival-Alwyn J, Hayward D, Buxton A, Traylen W, Chain B, Wattam T, Nandin IS, McCoy LE. "High affinity mAb infusion can enhance maximum affinity maturation during HIV Env immunization." iScience. 2024;27(4):109495. doi: 10.1016/j.isci.2024.109495
- Jing H, Gao Z, Xu S, Shen T, Peng Z, He S, You T, Ye S, Lin W, Sun S. "Accurate prediction of antibody function and structure using bio-inspired antibody language model." Brief Bioinform. 2024;25(4):bbae245. doi: 10.1093/bib/bbae245