Kolmogorov-Arnold Networks

Kolmogorov–Arnold Networks (KANs) are a type of artificial neural network architecture inspired by the Kolmogorov–Arnold representation theorem, also known as the superposition theorem. Unlike traditional multilayer perceptrons (MLPs), which combine learnable scalar weights on edges with fixed activation functions at nodes, KANs replace each weight with a learnable univariate function, often represented using splines.^[1]^[2]^[3]

History

KANs were proposed by Liu et al. (2024)^[4] as a generalization of the Kolmogorov–Arnold representation theorem (KART) to networks with additional neurons and layers, with the aim of outperforming MLPs in small-scale AI and scientific tasks. Before KANs, numerous studies had explored KART’s connections to neural networks or used it as a basis for designing new network architectures.

In the 1980s and 1990s, early research applied KART to neural network design. Hecht-Nielsen (1987)^[6], Kůrková (1992)^[5], and Nees (1994)^[7] established theoretical foundations for multilayer networks based on KART. Sprecher (1996, 1997)^[9]^[10] developed numerical methods for constructing network layers, while Nakamura et al. (1993)^[11] designed activation functions with guaranteed approximation accuracy. Igelnik and Parikh (2003)^[8] later introduced the Kolmogorov Spline Network, which uses cubic splines to model complex functions. These efforts bridged KART’s theoretical potential with practical neural network implementation.

KART has also been applied in other computational and theoretical domains. Coppejans (2004)^[12] developed nonparametric regression estimators using B-splines, Bryant (2008)^[13] applied KART to high-dimensional image tasks, and Liu (2015)^[14] explored theoretical applications in optimal transport and image encryption. More recently, Polar and Poluektov (2021)^[15] used Urysohn operators for efficient KART construction, and Fakhoury et al. (2022)^[16] introduced ExSpliNet, which integrates KART with probabilistic trees and multivariate B-splines for enhanced function approximation.

Architecture

KANs are based on the Kolmogorov–Arnold representation theorem, which was linked to the 13th Hilbert problem.^[17]^[18]^[19]

For $x=(x_{1},x_{2},\dots ,x_{n})\in [0,1]^{n}$, a continuous multivariate function $f(x)$ can be represented as:

$$f(x)=f(x_{1},\dots ,x_{n})=\sum _{q=1}^{2n+1}\Phi _{q}\left(\sum _{p=1}^{n}\varphi _{q,p}(x_{p})\right)\tag{1}$$

This formulation contains two nested summations: an outer and an inner sum. The outer sum $\sum _{q=1}^{2n+1}$ aggregates $2n+1$ terms, each involving a function $\Phi _{q}:\mathbb {R} \to \mathbb {R}$. The inner sum $\sum _{p=1}^{n}$ computes $n$ terms for each $q$, where each term $\varphi _{q,p}:[0,1]\to \mathbb {R}$ is a continuous function of the single variable $x_{p}$. The inner continuous functions $\varphi _{q,p}$ are universal, that is, independent of $f$, while the outer functions $\Phi _{q}$ depend on the specific function $f$ being represented. The representation (1) holds for all multivariate functions $f$. If $f$ is continuous, then the outer functions $\Phi _{q}$ are continuous; if $f$ is discontinuous, then the corresponding $\Phi _{q}$ are generally discontinuous, while the inner functions $\varphi _{q,p}$ remain the same universal functions.^[19]
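As a worked instance of (1), take $n=2$. The outer sum then has $2n+1=5$ terms and each inner sum has two, so the representation uses five outer functions $\Phi _{q}$ and ten inner functions $\varphi _{q,p}$:

```latex
f(x_{1},x_{2})=\sum_{q=1}^{5}\Phi_{q}\bigl(\varphi_{q,1}(x_{1})+\varphi_{q,2}(x_{2})\bigr)
```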

Liu et al.^[1] proposed the name KAN. A general KAN with $L$ layers maps an input $x$ to the output:

$$\mathrm {KAN} (x)=(\Phi ^{L-1}\circ \Phi ^{L-2}\circ \cdots \circ \Phi ^{1}\circ \Phi ^{0})x\tag{3}$$

Here, $\Phi ^{l}$ is the function matrix of the $l$-th KAN layer or a set of pre-activations.

Let $i$ index the neurons of the $l$-th layer and $j$ the neurons of the $(l+1)$-th layer. The activation function $\varphi _{j,i}^{l}$ connects neuron $(l,i)$ to neuron $(l+1,j)$:

$$\varphi _{j,i}^{l},\quad l=0,\dots ,L-1,\;i=1,\dots ,n_{l},\;j=1,\dots ,n_{l+1}\tag{4}$$

where $n_{l}$ is the number of nodes in the $l$-th layer.

Thus, the function matrix $\Phi ^{l}$ can be represented as an $n_{l+1}\times n_{l}$ matrix of activations. It acts on the layer input $x^{l}$ by applying each function to the corresponding input component and summing along each row, so that $x_{j}^{l+1}=\sum _{i=1}^{n_{l}}\varphi _{j,i}^{l}(x_{i}^{l})$:

$$x^{l+1}=\underbrace {\begin{pmatrix}\varphi _{1,1}^{l}(\cdot )&\varphi _{1,2}^{l}(\cdot )&\cdots &\varphi _{1,n_{l}}^{l}(\cdot )\\\varphi _{2,1}^{l}(\cdot )&\varphi _{2,2}^{l}(\cdot )&\cdots &\varphi _{2,n_{l}}^{l}(\cdot )\\\vdots &\vdots &\ddots &\vdots \\\varphi _{n_{l+1},1}^{l}(\cdot )&\varphi _{n_{l+1},2}^{l}(\cdot )&\cdots &\varphi _{n_{l+1},n_{l}}^{l}(\cdot )\end{pmatrix}} _{\Phi ^{l}}x^{l}$$
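The layer matrix and the composition in (3) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the activations are fixed, hand-picked functions standing in for learned splines, and the names `kan_layer`, `kan`, `phi0`, and `phi1` are hypothetical rather than taken from any KAN library.

```python
import math

def kan_layer(phi, x):
    """Apply one KAN layer: out[j] = sum_i phi[j][i](x[i]).

    `phi` is a list of rows; row j holds the n_l univariate functions
    feeding output neuron j, so `phi` has shape n_{l+1} x n_l.
    """
    return [sum(f(xi) for f, xi in zip(row, x)) for row in phi]

def kan(layers, x):
    """Compose the layers Phi^0, ..., Phi^{L-1} in order, as in (3)."""
    for phi in layers:
        x = kan_layer(phi, x)
    return x

# Assumption: fixed functions used in place of trained spline activations.
phi0 = [
    [math.sin, math.cos],          # row j = 1
    [lambda t: t * t, math.tanh],  # row j = 2
    [abs, math.exp],               # row j = 3
]                                  # shape 3 x 2: n_0 = 2, n_1 = 3
phi1 = [[math.tanh, lambda t: 2 * t, math.sin]]  # shape 1 x 3: n_2 = 1

y = kan([phi0, phi1], [0.5, -1.0])  # list with one output value
```

Note that the nodes themselves only sum their inputs; all nonlinearity sits on the edges, which is the structural difference from an MLP layer.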

Functions used in KAN

The choice of functional basis strongly influences the performance of KANs. Common function families include:

B-splines: Provide locality, smoothness, and interpretability; they are the most widely used in current implementations.^[3]

Radial basis functions (RBFs): Capture localized features in data and are effective in approximating functions with non-linear or clustered structures.^[3]^[20]

Chebyshev polynomials: Offer efficient approximation with minimized error in the maximum norm, making them useful for stable function representation.^[3]^[21]

Rational functions: Useful for approximating functions with singularities or sharp variations, as they can model asymptotic behavior better than polynomials.^[3]^[22]

Fourier series: Capture periodic patterns effectively and are particularly useful in domains such as physics-informed machine learning.^[3]^[23]

Wavelet functions (DoG, Mexican hat, Morlet, Shannon): Used for feature extraction as they can capture both high-frequency and low-frequency data components.^[3]^[24]^[25]

Piecewise linear functions: Provide efficient approximation for multivariate functions in KANs.^[26]^[27]
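In each of the families above, an edge function is typically written as a weighted sum of basis functions, $\varphi (x)=\sum _{k}c_{k}b_{k}(x)$, with the coefficients $c_{k}$ learned. The following plain-Python sketch evaluates three of the listed bases; the function names, the Gaussian form of the RBF, and the example coefficients are illustrative assumptions, not the API of any particular KAN implementation.

```python
import math

def chebyshev_basis(x, degree):
    """Return T_0(x)..T_degree(x) via T_{k+1} = 2x T_k - T_{k-1}.

    Inputs are assumed to be scaled to [-1, 1] beforehand.
    """
    vals = [1.0, x]
    for _ in range(2, degree + 1):
        vals.append(2.0 * x * vals[-1] - vals[-2])
    return vals[:degree + 1]

def gaussian_rbf_basis(x, centers, width):
    """Gaussian bumps exp(-((x - c) / width)^2), one per center."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]

def fourier_basis(x, n_freq):
    """Return [cos(x), sin(x), ..., cos(n_freq x), sin(n_freq x)]."""
    out = []
    for k in range(1, n_freq + 1):
        out += [math.cos(k * x), math.sin(k * x)]
    return out

def edge_function(basis_values, coeffs):
    """Evaluate phi(x) = sum_k c_k * b_k(x) from precomputed basis values."""
    return sum(c * b for c, b in zip(coeffs, basis_values))

# Example: phi(x) = 1 * T_0(x) + 2 * T_1(x) evaluated at x = 0.5.
value = edge_function(chebyshev_basis(0.5, 3), [1.0, 2.0, 0.0, 0.0])
```

Swapping the basis changes only how `basis_values` is computed; the layer structure stays the same.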

Usage

KANs are usually employed as drop-in replacements for MLP layers in modern neural architectures such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and Transformers. Although KANs are general-purpose, researchers have developed and applied them to a variety of tasks:

Scientific machine learning (SciML): Function fitting,^[1] solving partial differential equations (PDEs),^[1]^[28]^[29] and discovering physical and mathematical laws.^[2]

Continual learning: Because spline adjustments are local, KANs can better preserve previously learned information during incremental updates, which helps mitigate catastrophic forgetting.^[2]^[30]

Graph neural networks: Extensions such as Kolmogorov–Arnold Graph Neural Networks (KA-GNNs) integrate KAN modules into message-passing architectures, showing improvements in molecular property prediction tasks.^[3]^[31]^[32]
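The locality argument behind the continual-learning item can be illustrated with a toy edge function. The sketch below assumes a piecewise-linear basis on a uniform grid and made-up coefficients; it shows only that changing one coefficient alters the function near the corresponding knot and nowhere else, not a continual-learning result. The names `hat` and `phi` are hypothetical.

```python
def hat(x, center, width):
    """Piecewise-linear bump: 1 at `center`, 0 outside (center - width, center + width)."""
    return max(0.0, 1.0 - abs(x - center) / width)

def phi(x, coeffs, knots, width):
    """Piecewise-linear edge function: sum_k coeffs[k] * hat(x, knots[k], width)."""
    return sum(c * hat(x, k, width) for c, k in zip(coeffs, knots))

knots = [0.0, 0.25, 0.5, 0.75, 1.0]  # uniform grid on [0, 1]
width = 0.25                          # knot spacing
old = [0.0, 1.0, 0.5, -1.0, 2.0]      # made-up coefficients
new = list(old)
new[3] = 3.0                          # update only the coefficient at knot 0.75

# Sample points where the two functions differ.
xs = [i / 20 for i in range(21)]
changed = [x for x in xs if phi(x, old, knots, width) != phi(x, new, knots, width)]
```

Every point in `changed` lies strictly between the neighbouring knots 0.5 and 1.0, so values fitted on the rest of the interval are left untouched by the update.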

Drawbacks of KAN

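KANs can be computationally intensive and can require a large number of parameters, because each connection carries a learnable univariate function, typically parameterized by several spline or polynomial coefficients, rather than a single scalar weight.^[33]^[34]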
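A rough per-layer parameter count makes the overhead concrete. The sketch below assumes each KAN edge is a B-spline with `grid_size` intervals and order `spline_order`, giving `grid_size + spline_order` coefficients per edge. Real implementations may add further per-edge parameters, and the layer sizes used here are arbitrary.

```python
def mlp_layer_params(n_in, n_out):
    """Dense layer: one scalar weight per edge plus one bias per output."""
    return n_in * n_out + n_out

def kan_layer_params(n_in, n_out, grid_size, spline_order):
    """KAN layer, counting spline coefficients only.

    Assumption: each edge function is a B-spline with `grid_size`
    intervals and order `spline_order`, hence grid_size + spline_order
    coefficients per edge.
    """
    return n_in * n_out * (grid_size + spline_order)

mlp_count = mlp_layer_params(64, 64)                                # 4160
kan_count = kan_layer_params(64, 64, grid_size=5, spline_order=3)   # 32768
ratio = kan_count / mlp_count                                       # roughly 7.9
```

A per-layer comparison at equal width is only part of the picture, since a KAN may be configured with fewer or narrower layers than the MLP it replaces.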

References

[1] Liu, Ziming; Tegmark, Max (2024). "KAN: Kolmogorov–Arnold Networks". arXiv:2404.19756 [cs.LG].
[2] Liu, Ziming; Ma, Pingchuan; Wang, Yilun; Matusik, Wojciech; Tegmark, Max (2024). "KAN 2.0: Kolmogorov–Arnold Networks Meet Science". arXiv:2408.10205 [cs.LG].
[3] Somvanshi, S.; Javed, S. A.; Islam, M. M.; Pandit, D.; Das, S. (2024). "A Survey on Kolmogorov-Arnold Network". ACM Computing Surveys.
[4] Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljačić, M., ... & Tegmark, M. (2024). KAN: Kolmogorov-Arnold networks. arXiv preprint arXiv:2404.19756.
[5] Kůrková, V. (1992). Kolmogorov's theorem and multilayer neural networks. Neural Networks, 5(3), 501-506.
[6] Hecht-Nielsen, R. (1987, June). Kolmogorov’s mapping neural network existence theorem. In Proceedings of the International Conference on Neural Networks (Vol. 3, pp. 11-14). New York, NY, USA: IEEE Press.
[7] Nees, M. (1994). Approximative versions of Kolmogorov's superposition theorem, proved constructively. Journal of Computational and Applied Mathematics, 54(2), 239-250.
[8] Igelnik, B., & Parikh, N. (2003). Kolmogorov's spline network. IEEE Transactions on Neural Networks, 14(4), 725-733.
[9] Sprecher, D. A. (1996). A numerical implementation of Kolmogorov's superpositions. Neural Networks, 9(5), 765-772.
[10] Sprecher, D. A. (1997). A numerical implementation of Kolmogorov's superpositions II. Neural Networks, 10(3), 447-457.
[11] Nakamura, M., Mines, R., & Kreinovich, V. (1993). Guaranteed intervals for Kolmogorov’s theorem (and their possible relation to neural networks). Interval Computations, 3, 183-199.
[12] Coppejans, M. (2004). On Kolmogorov's representation of functions of several variables by functions of one variable. Journal of Econometrics, 123(1), 1-31.
[13] Bryant, D. W. (2008). Analysis of Kolmogorov's superposition theorem and its implementation in applications with low and high dimensional data. University of Central Florida.
[14] Liu, X. (2015). Kolmogorov superposition theorem and its applications (Doctoral dissertation, Imperial College London).
[15] Polar, A., & Poluektov, M. (2021). A deep machine learning algorithm for construction of the Kolmogorov–Arnold representation. Engineering Applications of Artificial Intelligence, 99, 104137.
[16] Fakhoury, D., Fakhoury, E., & Speleers, H. (2022). ExSpliNet: An interpretable and expressive spline-based neural network. Neural Networks, 152, 332-346.
[17] Kolmogorov, A. N. (1963). "On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition". Translations of the American Mathematical Society. 2 (28): 55–59.
[18] Schmidt-Hieber, Johannes (2021). "The Kolmogorov–Arnold representation theorem revisited". Neural Networks. 137: 119–126. doi:10.1016/j.neunet.2021.01.020. PMID 33592434.
[19] Ismayilova, Aysu; Ismailov, Vugar (August 2024). "On the Kolmogorov Neural Networks". Neural Networks. 176 (Article 106333). arXiv:2311.00049. doi:10.1016/j.neunet.2024.106333. PMID 38688072.
[20] Ta, H. T. (2024). "BSRBF-KAN: a combination of B-splines and radial basis functions in Kolmogorov-Arnold networks". Proceedings of the International Symposium on Information and Communication Technology. Singapore: Springer Nature Singapore. pp. 3–15.
[21] Guo, Chunyu; Sun, Lucheng; Li, Shilong; Yuan, Zelong; Wang, Chao (2025). "Physics-informed Kolmogorov–Arnold network with Chebyshev polynomials for fluid mechanics". Physics of Fluids. 37 (9) 095120. doi:10.1063/5.0284999.
[22] Aghaei, Amirmojtaba A. (2024). "RKAN: Rational Kolmogorov-Arnold Networks". arXiv:2406.14495 [cs.LG].
[23] Liang, J.; Mu, L.; Fang, C. (2025). "Topology Identification of Distribution Network Based on Fourier Kolmogorov–Arnold Networks". IEEJ Transactions on Electrical and Electronic Engineering. 20 (10): 1579–1588. doi:10.1002/tee.70031.
[24] Song, Y.; Zhang, H.; Man, J.; Jin, X.; Li, Q. (2025). "AWKNet: A Lightweight Neural Network for Motor Imagery Electroencephalogram Classification Based on Adaptive Wavelet Transform Kolmogorov–Arnold Networks". IEEE Transactions on Consumer Electronics. 71 (1): 1. doi:10.1109/TCE.2025.3540970.
[25] Bozorgasl, Z., & Chen, H. (2024). Wav-KAN: Wavelet Kolmogorov-Arnold networks. arXiv preprint arXiv:2405.12832.
[26] Polar, A.; Poluektov, M. (2021-03-01). "A deep machine learning algorithm for construction of the Kolmogorov–Arnold representation". Engineering Applications of Artificial Intelligence. 99 104137. doi:10.1016/j.engappai.2020.104137. ISSN 0952-1976.
[27] Poluektov, Michael; Polar, Andrew (2025-07-11). "Construction of the Kolmogorov-Arnold networks using the Newton-Kaczmarz method". Machine Learning. 114 (8): 185. doi:10.1007/s10994-025-06800-6. ISSN 1573-0565.
[28] Zhang, Z.; Wang, Q.; Zhang, Y.; Shen, T.; Zhang, W. (2025). "Physics-informed neural networks with hybrid Kolmogorov–Arnold network and augmented Lagrangian function for solving partial differential equations". Scientific Reports. 15 (1): 10523. doi:10.1038/s41598-025-81853-2. PMID 40148388.
[29] Yeo, S.; Nguyen, P. A.; Le, A. N.; Mishra, S. (2024). "KAN-PDEs: A Novel Approach to Solving Partial Differential Equations Using Kolmogorov-Arnold Networks—Enhanced Accuracy and Efficiency". Proceedings of the International Conference on Electrical and Electronics Engineering. Singapore: Springer Nature Singapore. pp. 43–62.
[30] Hu, Yusong; Liang, Zichen; Yang, Fei; Hou, Qibin; Liu, Xialei; Cheng, Ming-Ming (2025). "KAC: Kolmogorov-Arnold Classifier for Continual Learning". Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 15297–15307.
[31] Li, Longlong; Zhang, Yipeng; Wang, Guanghui; Xia, Kelin (2025). "Kolmogorov–Arnold graph neural networks for molecular property prediction". Nature Machine Intelligence. 7 (8): 1346–1354. doi:10.1038/s42256-025-01087-7.
[32] Yang, Zhen; Mao, Ling; Ye, Liang; Ma, Yuan; Song, Zihan; Chen, Zhe (2025). "AKGNN: When Adaptive Graph Neural Network Meets Kolmogorov-Arnold Network for Industrial Soft Sensors". IEEE Transactions on Instrumentation and Measurement. doi:10.1109/TIM.2025.3512345.
[33] Le, T. X. H., Tran, T. D., Pham, H. L., Le, V. T. D., Vu, T. H., Nguyen, V. T., & Nakashima, Y. (2024, November). Exploring the limitations of Kolmogorov-Arnold networks in classification: Insights to software training and hardware implementation. In 2024 Twelfth International Symposium on Computing and Networking Workshops (CANDARW) (pp. 110–116). IEEE.
[34] Ta, H. T., Thai, D. Q., Tran, A., Sidorov, G., & Gelbukh, A. (2025). PRKAN: Parameter-reduced Kolmogorov-Arnold Networks. arXiv preprint arXiv:2501.07032.
