Step Law
Loss Heatmap
N (Model Size)
D (Training Token Size)
Calculate
Results
Optimal Token-Wise Batch Size: -
Learning Rate: -
Model Type
MoE
Dense
N (Model Size)
Na (Activated Parameters)
D (Training Token Size)
Show
Please select parameters and click "Show"