path: root/Microbenchmarks
Age | Commit message | Author | Lines
12 days | STREAM: Re-enable -flto (HEAD, main) | Birte Kristina Friesel | -1/+1
    SDK 2024.1.0 shipped a compiler bug. With 2024.2.0, it is working again.
12 days | CPU-DPU microbenchmark: use default number of threads per pool | Birte Kristina Friesel | -6/+2
2024-08-23 | Make STREAM Microbenchmark work with SDK 2024.1.0 | Birte Kristina Friesel | -1/+1
    For whatever reason, -flto causes a "Heap Full" error even in an otherwise empty program:

    * thread #1, name = 'DPUthread0', stop reason = fault 1 (Heap Full)
      * frame #0: 0x80000350 dpu_code`mem_alloc_nolock(size=64) at alloc.c:32:5
        frame #1: 0x80000170 dpu_code`main_kernel1 [inlined] mem_alloc(size=64) at alloc.c:52:21
        frame #2: 0x80000168 dpu_code`main_kernel1 at add.c:73
        frame #3: 0x80000308 dpu_code`main at add.c:42:12
        frame #4: 0x80000050 dpu_code`__bootstrap at crt0.c:36:5

    I do not know why this is the case.
2024-07-26 | CPU-DPU: specify SDK version | Birte Kristina Friesel | -1/+5
2024-07-24 | CPU-DPU: correctly log numa_node_rank | Birte Kristina Friesel | -9/+12
2024-07-24 | CPU-DPU dimes24-transfer: decrease input size; do not pin memory for >20 ranks | Birte Kristina Friesel | -6/+8
2024-07-17 | CPU-DPU dimes24-hetsim-transfer: decrease number of repetitions | Birte Kristina Friesel | -3/+3
2024-07-15 | dimes24-hetsim-transfer.sh: set -e within function only | Birte Kristina Friesel | -2/+1
2024-07-15 | dimes24-hetsim-alloc: fix parallel invocation and set -e | Birte Kristina Friesel | -4/+3
2024-07-11 | alloc and transfer microbenchmarks: switch to GNU parallel | Birte Kristina Friesel | -75/+72
2024-06-07 | add DIMES'24 HetSim benchmark scripts | Birte Kristina Friesel | -0/+88
2024-06-07 | CPU-DPU: use numa_alloc to ensure correct data placement | Birte Kristina Friesel | -3/+13
2024-06-07 | CPU-DPU: NUMA support | Birte Kristina Friesel | -3/+139
2024-06-07 | CPU-DPU: Remove unused C2 array | Birte Kristina Friesel | -3/+1
2024-06-07 | Allow benchmarking with #ranks instead of #dpus | Birte Kristina Friesel | -4/+11
2024-05-13 | Print more info, set locale | Marcel Köppen | -1/+4
2024-05-13 | Flush stdout after printing iteration results | Marcel Köppen | -0/+1
2024-05-13 | Test different parameter set | Marcel Köppen | -3/+3
2024-05-13 | Parameterize iteration count and timeout | Marcel Köppen | -1/+4
2024-05-13 | Treat unroll like the other parameters | Marcel Köppen | -7/+6
2024-05-13 | Print hostname and compiler versions | Marcel Köppen | -1/+7
2024-05-13 | Make the linker happy by switching HOST_SOURCES and HOST_FLAGS | Marcel Köppen | -1/+1
2024-05-13 | Add depenceny on bin directory | Marcel Köppen | -2/+2
2024-05-13 | past derf has been naughty and not committed changes in time | Birte Kristina Friesel | -10/+12
2024-03-12 | CPU-DPU: support large data transfers on >1000 DPUs (uint32 is a bit small) | Birte Kristina Friesel | -21/+27
2024-03-04 | STREAM: support SERIAL (default) and PUSH (new) transfers | Birte Kristina Friesel | -9/+46
2024-03-04 | CPU-DPU | Birte Kristina Friesel | -88/+53
2024-03-04 | STREAM: Include date in output file name | Birte Kristina Friesel | -2/+2
2024-02-29 | STREAM: adjust run-rank BL range | Birte Kristina Friesel | -3/+3
2024-02-26 | CPU-DPU: Add n_elements_per_dpu | Birte Kristina Friesel | -3/+3
2024-02-26 | CPU-DPU microbenchmarks: explicitly log number of instructions | Birte Kristina Friesel | -8/+12
2024-02-23 | STREAM: shell scripting is hard, let's go shopping | Birte Kristina Friesel | -0/+1
2024-02-23 | STREAM: no alloc/load overhead for now | Birte Kristina Friesel | -2/+2
2024-02-22 | CPU-DPU: overwrite old logs; skip one-element transfers | Birte Kristina Friesel | -8/+8
2024-02-22 | STREAM: nits | Birte Kristina Friesel | -37/+3
2024-02-22 | CPU-DPU microbenchmark: switch to nanoseconds | Birte Kristina Friesel | -73/+126
2024-02-22 | STREAM: Add idle and stress versions of run-rank | Birte Kristina Friesel | -6/+24
2024-02-22 | STREAM run-rank.sh: -v | Birte Kristina Friesel | -10/+10
2024-02-22 | STREAM: Use nano- rather than microsecond precision internally | Birte Kristina Friesel | -41/+77
2024-02-21 | STREAM output: Add n_elements_per_dpu | Birte Kristina Friesel | -2/+2
2023-12-15 | ... oops | Birte Kristina Friesel | -3/+3
2023-12-15 | WRAM copy: report latency and throughput | Birte Kristina Friesel | -20/+33
2023-12-15 | MRAM-Latency: correctly calculate and label throughput | Birte Kristina Friesel | -5/+5
2023-12-15 | nit | Birte Kristina Friesel | -1/+2
2023-12-15 | MRAM latency benchmark: report DMA latency and bandwidth; merge r/w modes | Birte Kristina Friesel | -84/+81
2023-12-12 | WRAM WiP | Birte Kristina Friesel | -41/+78
2023-12-11 | CPU-DPU run scrips: kill stress and mpstat | Birte Kristina Friesel | -2/+12
2023-12-11 | Microbenchmarks/CPU-DPU: Extend logfiles with CPU overhead of SDK | Birte Kristina Friesel | -0/+126
2023-12-08 | CPU-DPU alloc and transfer microbenchmarks: -search space +cpu load | Birte Kristina Friesel | -149/+140
2023-11-29 | CPU-DPU: include number of ranks in configuration space | Birte Kristina Friesel | -2/+4