Example 1: Function Fitting
===========================

In this example, we cover how to leverage grid refinement to maximize KANs' ability to fit functions.

Initialize the model and create the dataset

.. code:: ipython3

    from kan import *

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(device)

    # initialize KAN with G=3
    model = KAN(width=[2,1,1], grid=3, k=3, seed=1, device=device)

    # create dataset
    f = lambda x: torch.exp(torch.sin(torch.pi*x[:,[0]]) + x[:,[1]]**2)
    dataset = create_dataset(f, n_var=2, device=device)


.. parsed-literal::

    cuda
    checkpoint directory created: ./model
    saving model version 0.0


Train KAN (grid=3)

.. code:: ipython3

    model.fit(dataset, opt="LBFGS", steps=20);


.. parsed-literal::

    | train_loss: 4.16e-02 | test_loss: 4.35e-02 | reg: 9.79e+00 | : 100%|█| 20/20 [00:03<00:00, 6.03it


.. parsed-literal::

    saving model version 0.1


The loss plateaus. We want a more fine-grained KAN!

.. code:: ipython3

    # initialize a more fine-grained KAN with G=10
    model = model.refine(10)


.. parsed-literal::

    saving model version 0.2


Train KAN (grid=10)

.. code:: ipython3

    model.fit(dataset, opt="LBFGS", steps=20);


.. parsed-literal::

    | train_loss: 6.96e-03 | test_loss: 6.10e-03 | reg: 9.75e+00 | : 100%|█| 20/20 [00:02<00:00, 7.32it


.. parsed-literal::

    saving model version 0.3


The loss becomes lower. This is good! Now we can even refine the grid iteratively.

.. code:: ipython3

    grids = np.array([3,10,20,50,100])

    train_losses = []
    test_losses = []
    steps = 200
    k = 3

    for i in range(grids.shape[0]):
        if i == 0:
            model = KAN(width=[2,1,1], grid=grids[i], k=k, seed=1, device=device)
        if i != 0:
            model = model.refine(grids[i])
        results = model.fit(dataset, opt="LBFGS", steps=steps)
        train_losses += results['train_loss']
        test_losses += results['test_loss']


.. parsed-literal::

    checkpoint directory created: ./model
    saving model version 0.0


.. parsed-literal::

    | train_loss: 1.46e-02 | test_loss: 1.53e-02 | reg: 8.83e+00 | : 100%|█| 200/200 [00:10<00:00, 19.67


.. parsed-literal::

    saving model version 0.1
    saving model version 0.2


.. parsed-literal::

    | train_loss: 2.84e-04 | test_loss: 3.29e-04 | reg: 8.84e+00 | : 100%|█| 200/200 [00:15<00:00, 13.09


.. parsed-literal::

    saving model version 0.3
    saving model version 0.4


.. parsed-literal::

    | train_loss: 4.21e-05 | test_loss: 4.04e-05 | reg: 8.84e+00 | : 100%|█| 200/200 [00:09<00:00, 21.22


.. parsed-literal::

    saving model version 0.5
    saving model version 0.6


.. parsed-literal::

    | train_loss: 1.02e-05 | test_loss: 1.24e-05 | reg: 8.84e+00 | : 100%|█| 200/200 [00:10<00:00, 18.76


.. parsed-literal::

    saving model version 0.7
    saving model version 0.8


.. parsed-literal::

    | train_loss: 1.64e-04 | test_loss: 1.74e-03 | reg: 8.86e+00 | : 100%|█| 200/200 [00:17<00:00, 11.72


.. parsed-literal::

    saving model version 0.9


The training dynamics of the losses display a staircase structure (the loss drops suddenly after each grid refinement).

.. code:: ipython3

    plt.plot(train_losses)
    plt.plot(test_losses)
    plt.legend(['train', 'test'])
    plt.ylabel('RMSE')
    plt.xlabel('step')
    plt.yscale('log')



.. image:: Example_1_function_fitting_files/Example_1_function_fitting_12_0.png
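The staircase is easier to see when the refinement points are marked explicitly. Below is a minimal sketch (not part of the original notebook) that reuses grids, steps, train_losses, and test_losses from the loop above and adds a vertical line where each refinement happens:

.. code:: ipython3

    # replot the losses and mark where each grid refinement happened
    plt.plot(train_losses)
    plt.plot(test_losses)
    for i in range(1, len(grids)):
        plt.axvline(i * steps, color='gray', ls=':')  # boundary between grid sizes
    plt.legend(['train', 'test'])
    plt.ylabel('RMSE')
    plt.xlabel('step')
    plt.yscale('log')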
Neural scaling laws (for some reason, this got worse than in pykan 0.0; we're still investigating the reason, probably due to the updates of curve2coef).

.. code:: ipython3

    n_params = 3 * grids
    train_vs_G = train_losses[(steps-1)::steps]
    test_vs_G = test_losses[(steps-1)::steps]

    plt.plot(n_params, train_vs_G, marker="o")
    plt.plot(n_params, test_vs_G, marker="o")
    plt.plot(n_params, 100*n_params**(-4.), ls="--", color="black")
    plt.xscale('log')
    plt.yscale('log')
    plt.legend(['train', 'test', r'$N^{-4}$'])
    plt.xlabel('number of params')
    plt.ylabel('RMSE')


.. parsed-literal::

    Text(0, 0.5, 'RMSE')


.. image:: Example_1_function_fitting_files/Example_1_function_fitting_14_1.png
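To quantify how closely the measured curve follows the theoretical scaling, one can estimate the exponent alpha in RMSE ~ N^(-alpha) from the data. Below is a minimal sketch (an illustrative addition, not from the original notebook) that reuses n_params and test_vs_G from the cell above:

.. code:: ipython3

    import numpy as np

    # log-log linear fit: the slope is the estimated scaling exponent
    log_N = np.log(np.array(n_params, dtype=float))
    log_rmse = np.log(np.array(test_vs_G, dtype=float))
    slope, intercept = np.polyfit(log_N, log_rmse, 1)
    print(f"fitted exponent: {slope:.2f} (theory predicts -4)")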