CANN/mat-chem-sim-pred: IPDT Batch Rollout Score Benchmark
# PidIpdtBatchRolloutScore Benchmark Report

mat-chem-sim-pred targets the industrial domain and focuses on two core scenarios, computational simulation and prediction. It builds a domain computing layer for the process industry that is driven jointly by mechanistic models and data, advancing the application of AI for Science in materials chemistry. Project address: https://gitcode.com/cann/mat-chem-sim-pred

This document records the measured CPU/NPU behavior of `PidIpdtBatchRolloutScore`.

## Environment

- NPU host: `node202`
- Device: `Ascend910B3`, device id `0`
- CANN: `/usr/local/Ascend/ascend-toolkit/latest`
- CPU baseline: benchmark program multi-thread mode
- Build: `-DCMAKE_BUILD_TYPE=Release -DSOC_VERSION=Ascend910B3 -DRUN_MODE=npu`

## Method

The `benchmark_pid_ipdt_batch_rollout_score_aclnn` program builds an in-process multi-thread CPU reference (`ComputeRange`, using the same integrator recurrence `y[k+1] = y[k] + b*u[k-delay]`), runs the NPU operator on the same inputs, and reports `max_abs_err`, `max_quality_rel_err`, and `best_idx_diff_count`. The pass conditions are:

- `npu_zero_score_count = 0`;
- per-candidate scores match the CPU reference to float32 precision;
- any `best_idx` differences are near-ties (the chosen candidate's metric rel-err stays small), matching the behavior of the verified FOPDT operator.

## Correctness

The IPDT kernel differs from the verified FOPDT kernel only in the state recurrence (the `a*y` decay term is dropped). The candidate-axis SIMD width does not change the numerics (each tile is independent), so the accuracy profile matches FOPDT: NPU output equals the CPU reference within float32 rounding.

Measured on `node202` / `Ascend910B3`, `B=128`, `sim_steps=1024`, `candidate_tile=C`, `npu_zero_score_count=0`:

| candidates | max_abs_err | max_quality_rel_err | best_idx_diff_count |
|---|---|---|---|
| 1024 | 2.4e-4 | 1.5e-6 | 0 |
| 4096 | 1.0 | 1.69e-3 | 1 |
| 16384 | 1.5e-3 | 3.3e-5 | 1 |

The `max_abs_err = 1` at 4096 candidates comes from the discrete settling-time metric: for a single near-tie loop, the response crosses the settle band one sample later on the NPU than on the CPU (`dt = 1`, so the absolute difference is 1). The corresponding metric rel-err stays below 2e-3 (1.69e-3 in the table).
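The one-sample shift described above can be illustrated with a minimal sketch. The settling-time definition, the band value, and the trajectories here are hypothetical stand-ins chosen for illustration; the operator's actual metric is not shown in this report. The point is only that a rounding-level difference at a sample sitting right on the band edge moves a discrete settling time by exactly one step.

```python
def settling_index(err, band):
    """Return the first sample index from which |err| stays within `band`."""
    last_outside = -1
    for k, e in enumerate(err):
        if abs(e) > band:
            last_outside = k
    return last_outside + 1


band = 0.02  # hypothetical settle band
dt = 1.0     # sample period, matching the dt = 1 case in the text

# Two error trajectories that agree everywhere except for a ~2e-7
# difference at sample 3, which sits right on the band edge.
cpu_err = [0.30, 0.10, 0.04, 0.0199999, 0.010, 0.005]
npu_err = [0.30, 0.10, 0.04, 0.0200001, 0.010, 0.005]

cpu_settle = settling_index(cpu_err, band) * dt
npu_settle = settling_index(npu_err, band) * dt
abs_diff = abs(npu_settle - cpu_settle)

print(cpu_settle, npu_settle, abs_diff)  # prints: 3.0 4.0 1.0
```

Because the metric is quantized to whole samples, its absolute error is either 0 or a multiple of `dt`, which is why an error of exactly 1 can appear alongside otherwise float32-level agreement.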
The reference FOPDT operator shows the same behavior at this candidate count of 4096 (`max_abs_err = 1`, `max_quality_rel_err = 4.5e-3`, `best_idx_diff_count = 1`), so IPDT is within the accepted baseline.

## Measured timing

`node202` / `Ascend910B3`, `B=128`, `sim_steps=1024`, `candidate_tile=C`, CPU 64-thread parallel reference.

| candidates | CPU parallel (ms) | NPU kernel (ms) | NPU kernel vs CPU |
|---|---|---|---|
| 1024 | 32.5 | 7.45 | 4.36x |
| 4096 | 122.1 | 24.7 | 4.95x |
| 16384 | 426.6 | 93.8 | 4.55x |

Against a 192-thread CPU reference the speedup is 3.8-4.0x (the wider CPU pool narrows the gap).

## Notes

The kernel reuses the FOPDT wide-lane (`kLane=768`) and fused inner-loop optimizations unchanged. The only algorithmic difference is the integrator recurrence, which removes one vector multiply per timestep.
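As a rough illustration of the recurrence difference, the scalar sketch below puts the two updates side by side. The IPDT form `y[k+1] = y[k] + b*u[k-delay]` is taken from this report. The FOPDT form `y[k+1] = a*y[k] + b*u[k-delay]` is inferred from the statement that the `a*y` decay term is dropped. The function names, the zero initial state, and the zero input before the delay has elapsed are assumptions for illustration; this is not the vectorized kernel code.

```python
def simulate_ipdt(u, b, delay):
    """Integrator recurrence: y[k+1] = y[k] + b*u[k-delay]."""
    y = [0.0]  # assumed zero initial state
    for k in range(len(u)):
        # Assumption: the delayed input is zero until `delay` samples have passed.
        u_delayed = u[k - delay] if k >= delay else 0.0
        y.append(y[k] + b * u_delayed)
    return y


def simulate_fopdt(u, a, b, delay):
    """First-order recurrence: y[k+1] = a*y[k] + b*u[k-delay]."""
    y = [0.0]  # assumed zero initial state
    for k in range(len(u)):
        u_delayed = u[k - delay] if k >= delay else 0.0
        # The extra multiply a*y[k] is the term the IPDT update omits.
        y.append(a * y[k] + b * u_delayed)
    return y


u = [1.0] * 5  # unit step input
ipdt = simulate_ipdt(u, b=0.5, delay=2)
fopdt = simulate_fopdt(u, a=0.5, b=0.5, delay=2)

print(ipdt)   # prints: [0.0, 0.0, 0.0, 0.5, 1.0, 1.5]
print(fopdt)  # prints: [0.0, 0.0, 0.0, 0.5, 0.75, 0.875]
```

Setting `a = 1.0` in the FOPDT form reproduces the IPDT trajectory, which is consistent with the two kernels sharing everything except this one multiply per timestep.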