kan package
Submodules
kan.KAN module
- class kan.KAN.KAN(*args: Any, **kwargs: Any)
Bases:
Module
KAN class
Attributes:
- biases: a list of nn.Linear()
biases are added on nodes (in principle, biases can be absorbed into activation functions. However, we still have them for better optimization)
- act_fun: a list of KANLayer
KANLayers
- depth: int
depth of KAN
- width: list
number of neurons in each layer. e.g., [2,5,5,3] means 2D inputs, 3D outputs, with 2 layers of 5 hidden neurons.
- grid: int
the number of grid intervals
- k: int
the order of piecewise polynomial
- base_fun: fun
residual function b(x). Each activation is phi(x) = scale_base * b(x) + scale_sp * spline(x) (see the sketch after this attribute list).
- symbolic_fun: a list of Symbolic_KANLayer
Symbolic_KANLayers
- symbolic_enabled: bool
If False, the symbolic front is not computed (to save time). Default: True.
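For orientation, a minimal sketch of the per-edge activation described under base_fun (illustrative only; the actual implementation in KANLayer is vectorized over all edges; `phi` is a hypothetical helper):
>>> import torch
>>> def phi(x, spline, b=torch.nn.SiLU(), scale_base=1.0, scale_sp=1.0):
...     # one edge's activation: scaled residual term plus scaled spline term
...     return scale_base * b(x) + scale_sp * spline(x)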
Methods:
- __init__():
initialize a KAN
- initialize_from_another_model():
initialize a KAN from another KAN (with the same shape, but potentially different grids)
- update_grid_from_samples():
update spline grids based on samples
- initialize_grid_from_another_model():
initialize KAN grids from another KAN
- forward():
forward
- set_mode():
set the mode of an activation function: ‘n’ for numeric, ‘s’ for symbolic, ‘ns’ for combined (note they are visualized differently in plot(). ‘n’ as black, ‘s’ as red, ‘ns’ as purple).
- fix_symbolic():
fix an activation function to be symbolic
- suggest_symbolic():
suggest symbolic candidates for a numeric spline-based activation function
- lock():
lock activation functions to share parameters
- unlock():
unlock locked activations
- get_range():
get the input and output ranges of an activation function
- plot():
plot the diagram of KAN
- train():
train KAN
- prune():
prune KAN
- remove_edge():
remove a specific edge of KAN
- remove_node():
remove a specific node of KAN
- auto_symbolic():
automatically fit all splines to be symbolic functions
- symbolic_formula():
obtain the symbolic formula of the KAN network
- __init__(width=None, grid=3, k=3, noise_scale=0.1, noise_scale_base=0.1, base_fun=torch.nn.SiLU, symbolic_enabled=True, bias_trainable=True, grid_eps=1.0, grid_range=[-1, 1], sp_trainable=True, sb_trainable=True, device='cpu', seed=0)
initialize a KAN model
Args:
- width : list of int
\([n_0, n_1, .., n_{L-1}]\) specifies the number of neurons in each layer (including inputs/outputs)
- grid : int
number of grid intervals. Default: 3.
- k : int
order of piecewise polynomial. Default: 3.
- noise_scale : float
initial injected noise to spline. Default: 0.1.
- noise_scale_base : float
initial injected noise to the residual (base) function. Default: 0.1.
- base_fun : fun
the residual function b(x). Default: torch.nn.SiLU().
- symbolic_enabled : bool
compute or skip symbolic computations (for efficiency). Default: True.
- bias_trainable : bool
whether bias parameters are updated. Default: True.
- grid_eps : float
When grid_eps = 0, the grid is uniform; when grid_eps = 1, the grid is partitioned using percentiles of samples. 0 < grid_eps < 1 interpolates between the two extremes (see the sketch after the Example below). Default: 1.0.
- grid_range : list/np.array of shape (2,)
the range of grids. Default: [-1,1].
- sp_trainable : bool
If True, scale_sp is trainable. Default: True.
- sb_trainable : bool
If True, scale_base is trainable. Default: True.
- device : str
device
- seed : int
random seed
Returns:
self
Example
>>> model = KAN(width=[2,5,1], grid=5, k=3)
>>> (model.act_fun[0].in_dim, model.act_fun[0].out_dim), (model.act_fun[1].in_dim, model.act_fun[1].out_dim)
((2, 5), (5, 1))
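The grid_eps interpolation can be pictured with the following conceptual sketch (not the library's exact update code; `blend_grid` is a hypothetical helper following the convention stated above):
>>> import torch
>>> def blend_grid(samples, num_intervals, grid_eps):
...     # uniform grid over the sample range
...     uniform = torch.linspace(samples.min().item(), samples.max().item(), steps=num_intervals+1)
...     # adaptive grid from sample percentiles
...     adaptive = torch.quantile(samples, torch.linspace(0, 1, steps=num_intervals+1))
...     # grid_eps = 0 -> uniform, grid_eps = 1 -> percentile-based
...     return (1 - grid_eps) * uniform + grid_eps * adaptive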
- auto_symbolic(a_range=(-10, 10), b_range=(-10, 10), lib=None, verbose=1)
automatic symbolic regression: use the top-1 suggestion from suggest_symbolic to replace each spline with a symbolic activation
Args:
- a_range : tuple
sweeping range of a
- b_range : tuple
sweeping range of b
- lib : None or list of str
the symbolic library
- verbose : int
verbosity
Returns:
None (print suggested symbolic formulas)
Example 1
>>> # default library
>>> from utils import create_dataset
>>> model = KAN(width=[2,5,1], grid=5, k=3, noise_scale=0.1, seed=0)
>>> f = lambda x: torch.exp(torch.sin(torch.pi*x[:,[0]]) + x[:,[1]]**2)
>>> dataset = create_dataset(f, n_var=2)
>>> model.train(dataset, opt='LBFGS', steps=50, lamb=0.01);
>>> model = model.prune()
>>> model(dataset['train_input'])
>>> model.auto_symbolic()
fixing (0,0,0) with sin, r2=0.9994837045669556
fixing (0,1,0) with cosh, r2=0.9978033900260925
fixing (1,0,0) with arctan, r2=0.9997088313102722
Example 2
>>> # customized library
>>> from utils import create_dataset
>>> model = KAN(width=[2,5,1], grid=5, k=3, noise_scale=0.1, seed=0)
>>> f = lambda x: torch.exp(torch.sin(torch.pi*x[:,[0]]) + x[:,[1]]**2)
>>> dataset = create_dataset(f, n_var=2)
>>> model.train(dataset, opt='LBFGS', steps=50, lamb=0.01);
>>> model = model.prune()
>>> model(dataset['train_input'])
>>> model.auto_symbolic(lib=['exp','sin','x^2'])
fixing (0,0,0) with sin, r2=0.999411404132843
fixing (0,1,0) with x^2, r2=0.9962921738624573
fixing (1,0,0) with exp, r2=0.9980258941650391
- clear_ckpts(folder='./model_ckpt')
clear all checkpoints
Args:
- folder : str
the folder that stores checkpoints
Returns:
None
- fix_symbolic(l, i, j, fun_name, fit_params_bool=True, a_range=(-10, 10), b_range=(-10, 10), verbose=True, random=False)
set (l,i,j) activation to be symbolic (specified by fun_name)
Args:
- l : int
layer index
- i : int
input neuron index
- j : int
output neuron index
- fun_name : str
function name
- fit_params_bool : bool
obtain affine parameters through fitting (True) or set default values (False)
- a_range : tuple
sweeping range of a
- b_range : tuple
sweeping range of b
- verbose : bool
If True, more information is printed.
- random : bool
initialize affine parameters randomly or as [1,0,1,0]
Returns:
None or r2 (coefficient of determination)
Example 1
>>> # when fit_params_bool = False
>>> model = KAN(width=[2,5,1], grid=5, k=3)
>>> model.fix_symbolic(0,1,3,'sin',fit_params_bool=False)
>>> print(model.act_fun[0].mask.reshape(2,5))
>>> print(model.symbolic_fun[0].mask.reshape(2,5))
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 0., 1., 1.]])
tensor([[0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0.]])
Example 2
>>> # when fit_params_bool = True
>>> model = KAN(width=[2,5,1], grid=5, k=3, noise_scale=1.)
>>> x = torch.normal(0,1,size=(100,2))
>>> model(x)  # obtain activations (otherwise model does not have attribute acts)
>>> model.fix_symbolic(0,1,3,'sin',fit_params_bool=True)
>>> print(model.act_fun[0].mask.reshape(2,5))
>>> print(model.symbolic_fun[0].mask.reshape(2,5))
r2 is 0.8131332993507385
r2 is not very high, please double check if you are choosing the correct symbolic function.
tensor([[1., 1., 1., 1., 1.],
        [1., 1., 0., 1., 1.]])
tensor([[0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0.]])
- forward(x)
KAN forward
Args:
- x : 2D torch.float
inputs, shape (batch, input dimension)
Returns:
- y : 2D torch.float
outputs, shape (batch, output dimension)
Example
>>> model = KAN(width=[2,5,3], grid=5, k=3)
>>> x = torch.normal(0,1,size=(100,2))
>>> model(x).shape
torch.Size([100, 3])
- get_range(l, i, j, verbose=True)
Get the input range and output range of the (l,i,j) activation
Args:
- l : int
layer index
- i : int
input neuron index
- j : int
output neuron index
Returns:
- x_min : float
minimum of input
- x_max : float
maximum of input
- y_min : float
minimum of output
- y_max : float
maximum of output
Example
>>> model = KAN(width=[2,3,1], grid=5, k=3, noise_scale=1.)
>>> x = torch.normal(0,1,size=(100,2))
>>> model(x)  # do a forward pass to obtain model.acts
>>> model.get_range(0,0,0)
x range: [-2.13 , 2.75 ]
y range: [-0.50 , 1.83 ]
(tensor(-2.1288), tensor(2.7498), tensor(-0.5042), tensor(1.8275))
- initialize_from_another_model(another_model, x)
initialize from a parent model. The parent has the same width as the current model but may have different grids.
Args:
- another_model : KAN
the parent model used to initialize the current model
- x : 2D torch.float
inputs, shape (batch, input dimension)
Returns:
self : KAN
Example
>>> model_coarse = KAN(width=[2,5,1], grid=5, k=3)
>>> model_fine = KAN(width=[2,5,1], grid=10, k=3)
>>> print(model_fine.act_fun[0].coef[0][0].data)
>>> x = torch.normal(0,1,size=(100,2))
>>> model_fine.initialize_from_another_model(model_coarse, x);
>>> print(model_fine.act_fun[0].coef[0][0].data)
tensor(-0.0030)
tensor(0.0506)
- initialize_grid_from_another_model(model, x)
initialize grid from a parent model
Args:
- model : KAN
parent model
- x : 2D torch.float
inputs, shape (batch, input dimension)
Returns:
None
Example
>>> model_parent = KAN(width=[1,1], grid=5, k=3)
>>> model_parent.act_fun[0].grid.data = torch.linspace(-2,2,steps=6)[None,:]
>>> x = torch.linspace(-2,2,steps=1001)[:,None]
>>> model = KAN(width=[1,1], grid=5, k=3)
>>> print(model.act_fun[0].grid.data)
>>> model = model.initialize_from_another_model(model_parent, x)
>>> print(model.act_fun[0].grid.data)
tensor([[-1.0000, -0.6000, -0.2000, 0.2000, 0.6000, 1.0000]])
tensor([[-2.0000, -1.2000, -0.4000, 0.4000, 1.2000, 2.0000]])
- load_ckpt(name, folder='./model_ckpt')
load a checkpoint to the current model
Args:
- name: str
the name of the checkpoint to be loaded
- folder : str
the folder that stores checkpoints
Returns:
None
- lock(l, ids)
lock ids in the l-th layer to be the same function
Args:
- l : int
layer index
- ids : 2D list
\([[i_1,j_1],[i_2,j_2],...]\) sets \((l,i_1,j_1), (l,i_2,j_2), ...\) to be the same function
Returns:
None
Example
>>> model = KAN(width=[2,3,1], grid=5, k=3, noise_scale=1.)
>>> print(model.act_fun[0].weight_sharing.reshape(3,2))
>>> model.lock(0,[[1,0],[1,1]])
>>> print(model.act_fun[0].weight_sharing.reshape(3,2))
tensor([[0, 1],
        [2, 3],
        [4, 5]])
tensor([[0, 1],
        [2, 1],
        [4, 5]])
- plot(folder='./figures', beta=3, mask=False, mode='supervised', scale=0.5, tick=False, sample=False, in_vars=None, out_vars=None, title=None)
plot KAN
Args:
- folder : str
the folder to store pngs
- beta : float
positive number controlling the transparency of each activation; transparency = tanh(beta * l1)
- mask : bool
If True, plot with mask (need to run prune() first to obtain mask). If False (default), plot all activation functions.
- mode : str
"supervised" or "unsupervised". If "supervised", l1 is measured by absolute value (not subtracting mean); if "unsupervised", l1 is measured by standard deviation (subtracting mean).
- scale : float
control the size of the diagram
- in_vars : None or list of str
the name(s) of input variables
- out_vars : None or list of str
the name(s) of output variables
- title : None or str
title
Returns:
Figure
Example
>>> # see more interactive examples in demos
>>> model = KAN(width=[2,3,1], grid=3, k=3, noise_scale=1.0)
>>> x = torch.normal(0,1,size=(100,2))
>>> model(x)  # do a forward pass to obtain model.acts
>>> model.plot()
- prune(threshold=0.01, mode='auto', active_neurons_id=None)
prune KAN at the node level. If all of a node's incoming connections, or all of its outgoing connections, are small, the node will be pruned away.
Args:
- threshold : float
the threshold used to determine whether a node is small enough
- mode : str
"auto" or "manual". If "auto", the threshold will be used to automatically prune away nodes. If "manual", active_neurons_id is needed to specify which neurons are kept (others are thrown away).
- active_neurons_id : list of id lists
For example, [[0,1],[0,2,3]] means keeping the 0/1 neurons in the 1st hidden layer and the 0/2/3 neurons in the 2nd hidden layer. Pruning input and output neurons is not supported yet.
Returns:
- model2 : KAN
pruned model
Example
>>> # for more interactive examples, please see demos
>>> from utils import create_dataset
>>> model = KAN(width=[2,5,1], grid=5, k=3, noise_scale=0.1, seed=0)
>>> f = lambda x: torch.exp(torch.sin(torch.pi*x[:,[0]]) + x[:,[1]]**2)
>>> dataset = create_dataset(f, n_var=2)
>>> model.train(dataset, opt='LBFGS', steps=50, lamb=0.01);
>>> model.prune()
>>> model.plot(mask=True)
- remove_edge(l, i, j)
remove activation phi(l,i,j) (set its mask to zero)
Args:
- l : int
layer index
- i : int
input neuron index
- j : int
output neuron index
Returns:
None
- remove_node(l, i)
remove neuron (l,i) (set the masks of all incoming and outgoing activation functions to zero)
Args:
- l : int
layer index
- i : int
neuron index
Returns:
None
- save_ckpt(name, folder='./model_ckpt')
save the current model as checkpoint
Args:
- name: str
the name of the checkpoint to be saved
- folder : str
the folder that stores checkpoints
Returns:
None
- set_mode(l, i, j, mode, mask_n=None)
set (l,i,j) activation to have mode
Args:
- l : int
layer index
- i : int
input neuron index
- j : int
output neuron index
- mode : str
'n' (numeric) or 's' (symbolic) or 'ns' (combined)
- mask_n : None or float
magnitude of the numeric front
Returns:
None
- suggest_symbolic(l, i, j, a_range=(-10, 10), b_range=(-10, 10), lib=None, topk=5, verbose=True)
suggest the symbolic candidates of phi(l,i,j)
Args:
- l : int
layer index
- i : int
input neuron index
- j : int
output neuron index
- a_range : tuple
sweeping range of a
- b_range : tuple
sweeping range of b
- lib : None or list of str
library of symbolic bases. If lib = None, the global default library will be used.
- topk : int
display the top k symbolic functions (ranked by r2)
- verbose : bool
If True, more information will be printed.
Returns:
None
Example
>>> model = KAN(width=[2,5,1], grid=5, k=3, noise_scale=0.1, seed=0)
>>> f = lambda x: torch.exp(torch.sin(torch.pi*x[:,[0]]) + x[:,[1]]**2)
>>> dataset = create_dataset(f, n_var=2)
>>> model.train(dataset, opt='LBFGS', steps=50, lamb=0.01);
>>> model = model.prune()
>>> model(dataset['train_input'])
>>> model.suggest_symbolic(0,0,0)
function , r2
sin , 0.9994412064552307
gaussian , 0.9196369051933289
tanh , 0.8608126044273376
sigmoid , 0.8578218817710876
arctan , 0.842217743396759
- symbolic_formula(floating_digit=2, var=None, normalizer=None, simplify=False, output_normalizer=None)
obtain the symbolic formula
Args:
- floating_digit : int
the number of digits to display
- var : list of str
the names of variables (if not provided, defaults to ['x_1', 'x_2', ...])
- normalizer : [mean array (floats), variance array (floats)]
the normalization applied to inputs
- simplify : bool
If True, simplify the equation at each step (usually quite slow), so False by default.
- output_normalizer : [mean array (floats), variance array (floats)]
the normalization applied to outputs
Returns:
symbolic formula : sympy function
Example
>>> model = KAN(width=[2,5,1], grid=5, k=3, noise_scale=0.1, seed=0, grid_eps=0.02)
>>> f = lambda x: torch.exp(torch.sin(torch.pi*x[:,[0]]) + x[:,[1]]**2)
>>> dataset = create_dataset(f, n_var=2)
>>> model.train(dataset, opt='LBFGS', steps=50, lamb=0.01);
>>> model = model.prune()
>>> model(dataset['train_input'])
>>> model.auto_symbolic(lib=['exp','sin','x^2'])
>>> model.train(dataset, opt='LBFGS', steps=50, lamb=0.00, update_grid=False);
>>> model.symbolic_formula()
- train(dataset, opt='LBFGS', steps=100, log=1, lamb=0.0, lamb_l1=1.0, lamb_entropy=2.0, lamb_coef=0.0, lamb_coefdiff=0.0, update_grid=True, grid_update_num=10, loss_fn=None, lr=1.0, stop_grid_update_step=50, batch=-1, small_mag_threshold=1e-16, small_reg_factor=1.0, metrics=None, sglr_avoid=False, save_fig=False, in_vars=None, out_vars=None, beta=3, save_fig_freq=1, img_folder='./video', device='cpu')
training
Args:
- dataset : dic
contains dataset['train_input'], dataset['train_label'], dataset['test_input'], dataset['test_label']
- opt : str
"LBFGS" or "Adam"
- steps : int
training steps
- log : int
logging frequency
- lamb : float
overall penalty strength
- lamb_l1 : float
l1 penalty strength
- lamb_entropy : float
entropy penalty strength
- lamb_coef : float
coefficient magnitude penalty strength
- lamb_coefdiff : float
penalty strength on the difference of nearby coefficients (smoothness)
- update_grid : bool
If True, update grid regularly before stop_grid_update_step
- grid_update_num : int
the number of grid updates before stop_grid_update_step
- stop_grid_update_step : int
no grid updates after this training step
- batch : int
batch size; if -1, use the full batch.
- small_mag_threshold : float
threshold to determine large or small numbers (may want to apply larger penalty to smaller numbers)
- small_reg_factor : float
penalty strength applied to small factors relative to large factors
- device : str
device
- save_fig_freq : int
save a figure every save_fig_freq steps
Returns:
- results : dic
results['train_loss']: 1D array of training losses (RMSE); results['test_loss']: 1D array of test losses (RMSE); results['reg']: 1D array of regularization
Example
>>> # for interactive examples, please see demos
>>> from utils import create_dataset
>>> model = KAN(width=[2,5,1], grid=5, k=3, noise_scale=0.1, seed=0)
>>> f = lambda x: torch.exp(torch.sin(torch.pi*x[:,[0]]) + x[:,[1]]**2)
>>> dataset = create_dataset(f, n_var=2)
>>> model.train(dataset, opt='LBFGS', steps=50, lamb=0.01);
>>> model.plot()
- unfix_symbolic(l, i, j)
unfix the (l,i,j) activation function.
- unfix_symbolic_all()
unfix all activation functions.
- unlock(l, ids)
unlock ids in the l-th layer (that were previously locked to be the same function)
Args:
- l : int
layer index
- ids : 2D list
[[i_1,j_1],[i_2,j_2],...] sets (l,i_1,j_1), (l,i_2,j_2), ... to be unlocked
Example:
>>> model = KAN(width=[2,3,1], grid=5, k=3, noise_scale=1.)
>>> model.lock(0,[[1,0],[1,1]])
>>> print(model.act_fun[0].weight_sharing.reshape(3,2))
>>> model.unlock(0,[[1,0],[1,1]])
>>> print(model.act_fun[0].weight_sharing.reshape(3,2))
tensor([[0, 1],
        [2, 1],
        [4, 5]])
tensor([[0, 1],
        [2, 3],
        [4, 5]])
- update_grid_from_samples(x)
update grid from samples
Args:
- x : 2D torch.float
inputs, shape (batch, input dimension)
Returns:
None
Example
>>> model = KAN(width=[2,5,1], grid=5, k=3)
>>> print(model.act_fun[0].grid[0].data)
>>> x = torch.rand(100,2)*5
>>> model.update_grid_from_samples(x)
>>> print(model.act_fun[0].grid[0].data)
tensor([-1.0000, -0.6000, -0.2000, 0.2000, 0.6000, 1.0000])
tensor([0.0128, 1.0064, 2.0000, 2.9937, 3.9873, 4.9809])
kan.KANLayer module
- class kan.KANLayer.KANLayer(*args: Any, **kwargs: Any)
Bases:
Module
KANLayer class
Attributes:
- in_dim: int
input dimension
- out_dim: int
output dimension
- size: int
the number of splines = input dimension * output dimension
- k: int
the piecewise polynomial order of splines
- grid: 2D torch.float
grid points
- noises: 2D torch.float
injected noises to splines at initialization (to break degeneracy)
- coef: 2D torch.tensor
coefficients of B-spline bases
- scale_base: 1D torch.float
magnitude of the residual function b(x)
- scale_sp: 1D torch.float
magnitude of the spline function spline(x)
- base_fun: fun
residual function b(x)
- mask: 1D torch.float
mask of spline functions. setting some element of the mask to zero means setting the corresponding activation to zero function.
- grid_eps: float in [0,1]
a hyperparameter used in update_grid_from_samples. When grid_eps = 0, the grid is uniform; when grid_eps = 1, the grid is partitioned using percentiles of samples. 0 < grid_eps < 1 interpolates between the two extremes.
- weight_sharing: 1D tensor int
allow spline activations to share parameters
- lock_counter: int
counts how many activation functions are locked (weight sharing)
- lock_id: 1D torch.int
the id of activation functions that are locked
- device: str
device
Methods:
- __init__():
initialize a KANLayer
- forward():
forward
- update_grid_from_samples():
update grids based on samples’ incoming activations
- initialize_grid_from_parent():
initialize grids from another model
- get_subset():
get subset of the KANLayer (used for pruning)
- lock():
lock several activation functions to share parameters
- unlock():
unlock already locked activation functions
- __init__(in_dim=3, out_dim=2, num=5, k=3, noise_scale=0.1, scale_base=1.0, scale_sp=1.0, base_fun=torch.nn.SiLU, grid_eps=0.02, grid_range=[-1, 1], sp_trainable=True, sb_trainable=True, device='cpu')
initialize a KANLayer
Args:
- in_dim : int
input dimension. Default: 3.
- out_dim : int
output dimension. Default: 2.
- num : int
the number of grid intervals = G. Default: 5.
- k : int
the order of piecewise polynomial. Default: 3.
- noise_scale : float
the scale of noise injected at initialization. Default: 0.1.
- scale_base : float
the scale of the residual function b(x). Default: 1.0.
- scale_sp : float
the scale of the spline function spline(x). Default: 1.0.
- base_fun : function
residual function b(x). Default: torch.nn.SiLU().
- grid_eps : float
When grid_eps = 0, the grid is uniform; when grid_eps = 1, the grid is partitioned using percentiles of samples. 0 < grid_eps < 1 interpolates between the two extremes. Default: 0.02.
- grid_range : list/np.array of shape (2,)
the range of grids. Default: [-1,1].
- sp_trainable : bool
If True, scale_sp is trainable. Default: True.
- sb_trainable : bool
If True, scale_base is trainable. Default: True.
- device : str
device
Returns:
self
Example
>>> model = KANLayer(in_dim=3, out_dim=5)
>>> (model.in_dim, model.out_dim)
(3, 5)
- forward(x)
KANLayer forward given input x
Args:
- x : 2D torch.float
inputs, shape (number of samples, input dimension)
Returns:
- y : 2D torch.float
outputs, shape (number of samples, output dimension)
- preacts : 3D torch.float
x fanned out into activations, shape (number of samples, output dimension, input dimension)
- postacts : 3D torch.float
the outputs of activation functions with preacts as inputs
- postspline : 3D torch.float
the outputs of spline functions with preacts as inputs
Example
>>> model = KANLayer(in_dim=3, out_dim=5)
>>> x = torch.normal(0,1,size=(100,3))
>>> y, preacts, postacts, postspline = model(x)
>>> y.shape, preacts.shape, postacts.shape, postspline.shape
(torch.Size([100, 5]), torch.Size([100, 5, 3]), torch.Size([100, 5, 3]), torch.Size([100, 5, 3]))
- get_subset(in_id, out_id)
get a smaller KANLayer from a larger KANLayer (used for pruning)
Args:
- in_id : list
ids of selected input neurons
- out_id : list
ids of selected output neurons
Returns:
spb : KANLayer
Example
>>> kanlayer_large = KANLayer(in_dim=10, out_dim=10, num=5, k=3)
>>> kanlayer_small = kanlayer_large.get_subset([0,9],[1,2,3])
>>> kanlayer_small.in_dim, kanlayer_small.out_dim
(2, 3)
- initialize_grid_from_parent(parent, x)
update grid from a parent KANLayer & samples
Args:
- parent : KANLayer
a parent KANLayer (whose grid is usually coarser than the current model)
- x : 2D torch.float
inputs, shape (number of samples, input dimension)
Returns:
None
Example
>>> batch = 100
>>> parent_model = KANLayer(in_dim=1, out_dim=1, num=5, k=3)
>>> print(parent_model.grid.data)
>>> model = KANLayer(in_dim=1, out_dim=1, num=10, k=3)
>>> x = torch.normal(0,1,size=(batch, 1))
>>> model.initialize_grid_from_parent(parent_model, x)
>>> print(model.grid.data)
tensor([[-1.0000, -0.6000, -0.2000, 0.2000, 0.6000, 1.0000]])
tensor([[-1.0000, -0.8000, -0.6000, -0.4000, -0.2000, 0.0000, 0.2000, 0.4000, 0.6000, 0.8000, 1.0000]])
- lock(ids)
lock activation functions to share parameters based on ids
Args:
- ids : list
list of ids of activation functions
Returns:
None
Example
>>> model = KANLayer(in_dim=3, out_dim=3, num=5, k=3)
>>> print(model.weight_sharing.reshape(3,3))
>>> model.lock([[0,0],[1,2],[2,1]])  # set (0,0),(1,2),(2,1) functions to be the same
>>> print(model.weight_sharing.reshape(3,3))
tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
tensor([[0, 1, 2],
        [3, 4, 0],
        [6, 0, 8]])
- unlock(ids)
unlock activation functions
Args:
- ids : list
list of ids of activation functions
Returns:
None
Example
>>> model = KANLayer(in_dim=3, out_dim=3, num=5, k=3)
>>> model.lock([[0,0],[1,2],[2,1]])  # set (0,0),(1,2),(2,1) functions to be the same
>>> print(model.weight_sharing.reshape(3,3))
>>> model.unlock([[0,0],[1,2],[2,1]])  # unlock the locked functions
>>> print(model.weight_sharing.reshape(3,3))
tensor([[0, 1, 2],
        [3, 4, 0],
        [6, 0, 8]])
tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
- update_grid_from_samples(x)
update grid from samples
Args:
- x : 2D torch.float
inputs, shape (number of samples, input dimension)
Returns:
None
Example
>>> model = KANLayer(in_dim=1, out_dim=1, num=5, k=3)
>>> print(model.grid.data)
>>> x = torch.linspace(-3,3,steps=100)[:,None]
>>> model.update_grid_from_samples(x)
>>> print(model.grid.data)
tensor([[-1.0000, -0.6000, -0.2000, 0.2000, 0.6000, 1.0000]])
tensor([[-3.0002, -1.7882, -0.5763, 0.6357, 1.8476, 3.0002]])
kan.LBFGS module
- class kan.LBFGS.LBFGS(*args: Any, **kwargs: Any)
Bases:
Optimizer
Implements the L-BFGS algorithm.
Heavily inspired by minFunc.
Warning
This optimizer doesn’t support per-parameter options and parameter groups (there can be only one).
Warning
Right now all parameters have to be on a single device. This will be improved in the future.
Note
This is a very memory intensive optimizer (it requires an additional param_bytes * (history_size + 1) bytes). If it doesn't fit in memory, try reducing the history size, or use a different algorithm.
Args:
- lr (float): learning rate (default: 1)
- max_iter (int): maximal number of iterations per optimization step (default: 20)
- max_eval (int): maximal number of function evaluations per optimization step (default: max_iter * 1.25)
- tolerance_grad (float): termination tolerance on first-order optimality (default: 1e-7)
- tolerance_change (float): termination tolerance on function value/parameter changes (default: 1e-9)
- history_size (int): update history size (default: 100)
- line_search_fn (str): either 'strong_wolfe' or None (default: None)
- __init__(params, lr=1, max_iter=20, max_eval=None, tolerance_grad=1e-07, tolerance_change=1e-09, tolerance_ys=1e-32, history_size=100, line_search_fn=None)
- step(closure)
Perform a single optimization step.
Args:
- closure (Callable): A closure that reevaluates the model and returns the loss.
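Example (a minimal sketch of the standard PyTorch closure pattern; `model` and `dataset` are assumed to be defined as in the KAN.train example above):
>>> optimizer = LBFGS(model.parameters(), lr=1.0, history_size=10, line_search_fn='strong_wolfe')
>>> def closure():
...     # L-BFGS may call this several times per step to re-evaluate the loss
...     optimizer.zero_grad()
...     loss = torch.mean((model(dataset['train_input']) - dataset['train_label'])**2)
...     loss.backward()
...     return loss
>>> optimizer.step(closure)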
kan.Symbolic_KANLayer module
- class kan.Symbolic_KANLayer.Symbolic_KANLayer(*args: Any, **kwargs: Any)
Bases:
Module
Symbolic_KANLayer class
Attributes:
- in_dim: int
input dimension
- out_dim: int
output dimension
- funs: 2D array of torch functions (or lambda functions)
symbolic functions (torch)
- funs_name: 2D array of str
names of symbolic functions
- funs_sympy: 2D array of sympy functions (or lambda functions)
symbolic functions (sympy)
- affine: 3D array of floats
affine transformations of inputs and outputs
Methods:
- __init__():
initialize a Symbolic_KANLayer
- forward():
forward
- get_subset():
get subset of the KANLayer (used for pruning)
- fix_symbolic():
fix an activation function to be symbolic
- __init__(in_dim=3, out_dim=2, device='cpu')
initialize a Symbolic_KANLayer (activation functions are initialized to be identity functions)
Args:
- in_dim : int
input dimension
- out_dim : int
output dimension
- device : str
device
Returns:
self
Example
>>> sb = Symbolic_KANLayer(in_dim=3, out_dim=3)
>>> len(sb.funs), len(sb.funs[0])
(3, 3)
- fix_symbolic(i, j, fun_name, x=None, y=None, random=False, a_range=(-10, 10), b_range=(-10, 10), verbose=True)
fix an activation function to be symbolic
Args:
- i : int
the id of the input neuron
- j : int
the id of the output neuron
- fun_name : str
the name of the symbolic function
- x : 1D array
preactivations
- y : 1D array
postactivations
- random : bool
initialize affine parameters randomly or as [1,0,1,0]
- a_range : tuple
sweeping range of a
- b_range : tuple
sweeping range of b
- verbose : bool
print more information if True
Returns:
r2 (coefficient of determination)
Example 1
>>> # when x & y are not provided. Affine parameters are set to a = 1, b = 0, c = 1, d = 0
>>> sb = Symbolic_KANLayer(in_dim=3, out_dim=2)
>>> sb.fix_symbolic(2,1,'sin')
>>> print(sb.funs_name)
>>> print(sb.affine)
[['', '', ''], ['', '', 'sin']]
Parameter containing:
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [1., 0., 1., 0.]], requires_grad=True)
Example 2
>>> # when x & y are provided, fit_params() is called to find the best fit coefficients
>>> sb = Symbolic_KANLayer(in_dim=3, out_dim=2)
>>> batch = 100
>>> x = torch.linspace(-1,1,steps=batch)
>>> noises = torch.normal(0,1,(batch,)) * 0.02
>>> y = 5.0*torch.sin(3.0*x + 2.0) + 0.7 + noises
>>> sb.fix_symbolic(2,1,'sin',x,y)
>>> print(sb.funs_name)
>>> print(sb.affine[1,2,:].data)
r2 is 0.9999701976776123
[['', '', ''], ['', '', 'sin']]
tensor([2.9981, 1.9997, 5.0039, 0.6978])
- forward(x)
Args:
- x : 2D array
inputs, shape (batch, input dimension)
Returns:
- y : 2D array
outputs, shape (batch, output dimension)
- postacts : 3D array
activations after activation functions but before summing on nodes
Example
>>> sb = Symbolic_KANLayer(in_dim=3, out_dim=5)
>>> x = torch.normal(0,1,size=(100,3))
>>> y, postacts = sb(x)
>>> y.shape, postacts.shape
(torch.Size([100, 5]), torch.Size([100, 5, 3]))
- get_subset(in_id, out_id)
get a smaller Symbolic_KANLayer from a larger Symbolic_KANLayer (used for pruning)
Args:
- in_id : list
ids of selected input neurons
- out_id : list
ids of selected output neurons
Returns:
spb : Symbolic_KANLayer
Example
>>> sb_large = Symbolic_KANLayer(in_dim=10, out_dim=10)
>>> sb_small = sb_large.get_subset([0,9],[1,2,3])
>>> sb_small.in_dim, sb_small.out_dim
(2, 3)
kan.spline module
- kan.spline.B_batch(x, grid, k=0, extend=True, device='cpu')
evaluate x on B-spline bases
Args:
- x : 2D torch.tensor
inputs, shape (number of splines, number of samples)
- grid : 2D torch.tensor
grids, shape (number of splines, number of grid points)
- k : int
the piecewise polynomial order of splines.
- extend : bool
If True, k points are extended on both ends. If False, no extension (zero boundary condition). Default: True
- device : str
device
Returns:
- spline values : 3D torch.tensor
shape (number of splines, number of B-spline bases (coefficients), number of samples). The number of B-spline bases = number of grid points + k - 1.
Example
>>> num_spline = 5
>>> num_sample = 100
>>> num_grid_interval = 10
>>> k = 3
>>> x = torch.normal(0,1,size=(num_spline, num_sample))
>>> grids = torch.einsum('i,j->ij', torch.ones(num_spline,), torch.linspace(-1,1,steps=num_grid_interval+1))
>>> B_batch(x, grids, k=k).shape
torch.Size([5, 13, 100])
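For intuition, B-spline bases follow the Cox-de Boor recursion; a scalar (non-batched) sketch, not the library's vectorized code (`B` is a hypothetical helper):
>>> def B(x, grid, i, k):
...     # i-th B-spline basis of order k on knot vector `grid` (assumes strictly increasing knots)
...     if k == 0:
...         return 1.0 if grid[i] <= x < grid[i+1] else 0.0
...     left = (x - grid[i]) / (grid[i+k] - grid[i]) * B(x, grid, i, k-1)
...     right = (grid[i+k+1] - x) / (grid[i+k+1] - grid[i+1]) * B(x, grid, i+1, k-1)
...     return left + right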
- kan.spline.coef2curve(x_eval, grid, coef, k, device='cpu')
convert B-spline coefficients to B-spline curves: evaluate x on B-spline curves (summing up B_batch results over the B-spline bases).
Args:
- x_eval : 2D torch.tensor
shape (number of splines, number of samples)
- grid : 2D torch.tensor
shape (number of splines, number of grid points)
- coef : 2D torch.tensor
shape (number of splines, number of coef params). number of coef params = number of grid intervals + k
- k : int
the piecewise polynomial order of splines.
- device : str
device
Returns:
- y_eval : 2D torch.tensor
shape (number of splines, number of samples)
Example
>>> num_spline = 5
>>> num_sample = 100
>>> num_grid_interval = 10
>>> k = 3
>>> x_eval = torch.normal(0,1,size=(num_spline, num_sample))
>>> grids = torch.einsum('i,j->ij', torch.ones(num_spline,), torch.linspace(-1,1,steps=num_grid_interval+1))
>>> coef = torch.normal(0,1,size=(num_spline, num_grid_interval+k))
>>> coef2curve(x_eval, grids, coef, k=k).shape
torch.Size([5, 100])
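Conceptually, the curve evaluation is a weighted sum of the basis values produced by B_batch; a sketch of the reduction, continuing the example above (the library uses an equivalent einsum):
>>> # y[spline, sample] = sum_i coef[spline, i] * B_i(x[spline, sample])
>>> bases = B_batch(x_eval, grids, k=k)               # (num_spline, num_bases, num_sample)
>>> y_eval = torch.einsum('ij,ijk->ik', coef, bases)  # (num_spline, num_sample)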
- kan.spline.curve2coef(x_eval, y_eval, grid, k, device='cpu')
convert B-spline curves to B-spline coefficients using least squares.
Args:
- x_eval : 2D torch.tensor
shape (number of splines, number of samples)
- y_eval : 2D torch.tensor
shape (number of splines, number of samples)
- grid : 2D torch.tensor
shape (number of splines, number of grid points)
- k : int
the piecewise polynomial order of splines.
- device : str
device
Returns:
- coef : 2D torch.tensor
shape (number of splines, number of coef params). number of coef params = number of grid intervals + k
Example
>>> num_spline = 5
>>> num_sample = 100
>>> num_grid_interval = 10
>>> k = 3
>>> x_eval = torch.normal(0,1,size=(num_spline, num_sample))
>>> y_eval = torch.normal(0,1,size=(num_spline, num_sample))
>>> grids = torch.einsum('i,j->ij', torch.ones(num_spline,), torch.linspace(-1,1,steps=num_grid_interval+1))
>>> curve2coef(x_eval, y_eval, grids, k=k).shape
torch.Size([5, 13])
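The least-squares step can be sketched with torch.linalg.lstsq, continuing the example above (illustrative; the library may solve the system differently):
>>> # per spline, solve min_c ||A c - y||^2 where A holds the B-spline basis values
>>> bases = B_batch(x_eval, grids, k=k)   # (num_spline, num_bases, num_sample)
>>> A = bases.permute(0, 2, 1)            # (num_spline, num_sample, num_bases)
>>> coef = torch.linalg.lstsq(A, y_eval.unsqueeze(-1)).solution.squeeze(-1)
>>> coef.shape
torch.Size([5, 13])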
kan.utils module
- kan.utils.add_symbolic(name, fun)
add a symbolic function to the library
Args:
- name : str
name of the function
- fun : fun
torch function or lambda function
Returns:
None
Example
>>> print(SYMBOLIC_LIB['Bessel'])
KeyError: 'Bessel'
>>> add_symbolic('Bessel', torch.special.bessel_j0)
>>> print(SYMBOLIC_LIB['Bessel'])
(<built-in function special_bessel_j0>, Bessel)
- kan.utils.create_dataset(f, n_var=2, ranges=[-1, 1], train_num=1000, test_num=1000, normalize_input=False, normalize_label=False, device='cpu', seed=0)
create dataset
Args:
- f : function
the symbolic formula used to create the synthetic dataset
- n_var : int
the number of input variables. Default: 2.
- ranges : list or np.array; shape (2,) or (n_var, 2)
the range of input variables. Default: [-1,1].
- train_num : int
the number of training samples. Default: 1000.
- test_num : int
the number of test samples. Default: 1000.
- normalize_input : bool
If True, apply normalization to inputs. Default: False.
- normalize_label : bool
If True, apply normalization to labels. Default: False.
- device : str
device. Default: 'cpu'.
- seed : int
random seed. Default: 0.
Returns:
- dataset : dic
Train/test inputs/labels are dataset['train_input'], dataset['train_label'], dataset['test_input'], dataset['test_label']
Example
>>> f = lambda x: torch.exp(torch.sin(torch.pi*x[:,[0]]) + x[:,[1]]**2)
>>> dataset = create_dataset(f, n_var=2, train_num=100)
>>> dataset['train_input'].shape
torch.Size([100, 2])
- kan.utils.fit_params(x, y, fun, a_range=(-10, 10), b_range=(-10, 10), grid_number=101, iteration=3, verbose=True, device='cpu')
fit a, b, c, d such that
\[|y - (c f(ax+b) + d)|^2\]
is minimized. Both x and y are 1D arrays. Sweep a and b, then find the best fitted model.
Args:
- x : 1D array
x values
- y : 1D array
y values
- fun : function
symbolic function
- a_range : tuple
sweeping range of a
- b_range : tuple
sweeping range of b
- grid_number : int
number of steps along a and b
- iteration : int
number of zooming-in rounds
- verbose : bool
print extra information if True
- device : str
device
Returns:
- a_best : float
best fitted a
- b_best : float
best fitted b
- c_best : float
best fitted c
- d_best : float
best fitted d
- r2_best : float
best r2 (coefficient of determination)
Example
>>> num = 100
>>> x = torch.linspace(-1,1,steps=num)
>>> noises = torch.normal(0,1,(num,)) * 0.02
>>> y = 5.0*torch.sin(3.0*x + 2.0) + 0.7 + noises
>>> fit_params(x, y, torch.sin)
r2 is 0.9999727010726929
(tensor([2.9982, 1.9996, 5.0053, 0.7011]), tensor(1.0000))
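A conceptual sketch of the sweep: for each candidate (a, b), the optimal (c, d) follow in closed form from linear regression of y on f(ax+b), and the search repeats on grids that zoom in around the best (a, b) for `iteration` rounds. Illustrative only (`sweep_once` is a hypothetical helper, not the library's code):
>>> def sweep_once(x, y, fun, a_grid, b_grid):
...     best_a, best_b, best_r2 = None, None, -float('inf')
...     for a in a_grid:
...         for b in b_grid:
...             z = fun(a * x + b)
...             # closed-form linear regression y ~ c*z + d
...             c = torch.sum((z - z.mean()) * (y - y.mean())) / torch.sum((z - z.mean())**2)
...             d = y.mean() - c * z.mean()
...             r2 = 1 - torch.sum((y - (c * z + d))**2) / torch.sum((y - y.mean())**2)
...             if r2 > best_r2:
...                 best_a, best_b, best_r2 = a, b, r2
...     return best_a, best_b, best_r2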