Molecular recognition feature

Molecular recognition features (MoRFs) are short (10-70 residues) intrinsically disordered regions in proteins that undergo a disorder-to-order transition upon binding to their partners. MoRFs are implicated in protein-protein interactions, which serve as the initial step in molecular recognition. They are disordered before binding but adopt a defined 3D structure once bound to their partners.^[1]^[2] Because MoRF regions resemble disordered proteins while retaining some characteristics of ordered proteins,^[2] they can be classified as existing in an extended semi-disordered state.^[3]

Categorization

MoRFs can be divided into four categories according to the structure they adopt once bound to their partners.^[2]

The categories are:

α-MoRFs (when they form alpha helices)
β-MoRFs (when they form beta strands)
irregular-MoRFs (when they do not form any regular secondary structure)
complex-MoRFs (a combination of the above categories)
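
The four categories above can be illustrated with a minimal sketch that labels a MoRF from the secondary structure string of its bound state. The function name classify_morf, the DSSP-style letter groupings, and the 0.25 presence threshold are assumptions made for this illustration; they are not a published classification criterion.

```python
# Illustrative sketch only. The DSSP letter groupings are conventional, but the
# 0.25 "presence" threshold is a hypothetical choice, not a published criterion.
HELIX_CODES = set("GHI")   # 3-10, alpha and pi helix in DSSP notation
STRAND_CODES = set("EB")   # extended strand and isolated beta bridge


def classify_morf(bound_ss: str, min_fraction: float = 0.25) -> str:
    """Assign a MoRF category from the secondary structure of its bound state.

    bound_ss holds one DSSP-style letter per MoRF residue, taken from the
    structure of the complex (the unbound MoRF is disordered).
    """
    if not bound_ss:
        raise ValueError("secondary-structure string must not be empty")

    n = len(bound_ss)
    helix_fraction = sum(c in HELIX_CODES for c in bound_ss) / n
    strand_fraction = sum(c in STRAND_CODES for c in bound_ss) / n

    has_helix = helix_fraction >= min_fraction
    has_strand = strand_fraction >= min_fraction

    if has_helix and has_strand:
        return "complex-MoRF"      # more than one structure type is present
    if has_helix:
        return "α-MoRF"
    if has_strand:
        return "β-MoRF"
    return "irregular-MoRF"        # neither helix nor strand reaches the threshold
```

For example, under these assumptions classify_morf("HHHHHHHHCC") returns "α-MoRF", while classify_morf("HHHHCCEEEE") returns "complex-MoRF" because helix and strand each cover 40% of the residues.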

MoRF predictors

Determining protein structures experimentally is a very time-consuming and expensive process. Therefore, recent years have seen a focus on computational methods for predicting protein structure and structural characteristics. Some aspects of protein structure, such as secondary structure and intrinsic disorder, have benefited greatly from applications of deep learning on an abundance of annotated data. However, computational prediction of MoRF regions remains a challenging task due to the limited availability of annotated data and the rarity of the MoRF class itself.^[4] Most current methods have been trained and benchmarked on the sets released by the authors of MoRFPred^[5] in 2012, as well as another set released by the authors of MoRFchibi^[6]^[7]^[8] based on experimentally annotated MoRF data. The table below details some methods available as of 2019 for MoRF prediction (related problems are also touched upon).^[9]


Predictor	Year Published	Predicts for	Methodology	Uses MSA
ANCHOR^[10]	2009	Protein Binding Regions	Amino acid propensity and energy estimation analysis.	N
ANCHOR2^[11]	2018	Protein Binding Regions	Amino acid propensity and energy estimation analysis.	N
DISOPRED3^[12]	2015	Protein Intrinsic Disorder and Protein Binding Sites	Multistage component prediction (utilizing neural network, Support Vector Machine, and K-nearest neighbour models) for protein disorder prediction. Also uses an additional Support Vector Machine to interpolate binding regions from the disorder predictions.	Y
DisoRDPbind^[13]	2015	RNA, DNA, and Protein Binding Regions	Multiple logistic regression models based on predicted disorder, amino acid properties, and sequence composition. The result is aligned with transferred annotations from a functionally-annotated database.	N
fMoRFPred^[4]	2016	MoRFs	Faster version of MoRFPred without the use of multiple sequence alignments.	N
MoRFchibi SYSTEM^[6]^[7]^[8]	2015	MoRFs	Hierarchy of different in-house MoRF prediction models: MoRFchibi: Uses Bayes' rule to combine the outcomes of two Support Vector Machine modules based on amino acid composition (sigmoid kernel) and sequence similarity (RBF kernel). MoRFchibi_light: Uses Bayes' rule to combine MoRFchibi and disorder prediction hierarchically. MoRFchibi_web: Uses Bayes' rule to combine MoRFchibi, disorder prediction and PSSM (MSA) hierarchically.	N/Y
MoRFPred^[5]	2012	MoRFs	Support Vector Machine based on predicted sequence characteristics and alignment of input sequence to known MoRF database.	Y
MoRFPred-Plus^[14]	2018	MoRFs	Combined predictions from two Support Vector Machines, predicting for both MoRF regions and MoRF residues.	Y
OPAL^[15]	2018	MoRFs	Support Vector Machine based on physicochemical properties and predicted structural attributes of protein residues.	Y
OPAL+^[16]	2019	MoRFs	Ensemble of Support Vector Machines trained individually for length-specific MoRF regions. Also incorporates other predictors as a metapredictor.	Y
SPINE-D^[17]^[18]	2012	Protein Intrinsic Disorder and Semi-Disorder	Neural network for predicting both long and short disordered regions. Semi-disorder can be linearly interpolated from its predicted disorder probabilities (0.4<=P(D)<=0.7).	Y
SPOT-Disorder^[19]	2017	Protein Intrinsic Disorder and Semi-Disorder	Bidirectional Long Short-Term Memory network for predicting intrinsic disorder. Semi-disordered regions can be linearly interpolated from its predicted disorder probabilities (0.28<=P(D)<=0.69).	Y
SPOT-MoRF^[20]	2019	MoRFs	Transfer learning from the large disorder prediction tool SPOT-Disorder2^[21] (which itself utilizes an ensemble of Bidirectional Long Short-Term Memory networks and Inception ResNets).	Y
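
The semi-disorder ranges quoted in the table for SPINE-D (0.4<=P(D)<=0.7) and SPOT-Disorder (0.28<=P(D)<=0.69) can be applied to a per-residue disorder profile with a few lines of code. The sketch below is illustrative only: the function name semi_disordered_regions, the min_length parameter, and the example probabilities are assumptions, and neither tool is called; the input is simply a list of already predicted disorder probabilities.

```python
from typing import List, Sequence, Tuple

# Probability ranges taken from the table above; the dictionary itself is
# just a convenience for this sketch.
SEMI_DISORDER_RANGES = {
    "SPINE-D": (0.4, 0.7),
    "SPOT-Disorder": (0.28, 0.69),
}


def semi_disordered_regions(
    probs: Sequence[float],
    low: float,
    high: float,
    min_length: int = 1,   # hypothetical filter, not part of either tool
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) residue index ranges with low <= P(D) <= high."""
    regions = []
    start = None
    for i, p in enumerate(probs):
        inside = low <= p <= high
        if inside and start is None:
            start = i                      # a new candidate region opens here
        elif not inside and start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None                   # the region has closed
    # Close a region that runs to the end of the sequence.
    if start is not None and len(probs) - start >= min_length:
        regions.append((start, len(probs)))
    return regions


# Made-up disorder probabilities for a seven-residue stretch.
example = [0.1, 0.5, 0.6, 0.9, 0.3, 0.45, 0.2]
low, high = SEMI_DISORDER_RANGES["SPINE-D"]
spine_d_regions = semi_disordered_regions(example, low, high)
```

With the made-up profile above, the SPINE-D range yields the regions (1, 3) and (5, 6), whereas the wider SPOT-Disorder range yields (1, 3) and (4, 6), since the residue with P(D) = 0.3 then also counts as semi-disordered.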