Age | Commit message | Author | Lines | |
---|---|---|---|---|
6 days | STREAM: Re-enable -flto | Birte Kristina Friesel | -1/+1 | |
| | SDK 2024.1.0 shipped a compiler bug. With 2024.2.0, it is working again. | | | |
6 days | CPU-DPU microbenchmark: use default number of threads per pool | Birte Kristina Friesel | -6/+2 | |
2024-08-23 | Make STREAM Microbenchmark work with SDK 2024.1.0 | Birte Kristina Friesel | -1/+1 | |
| | For whatever reason, -flto causes a "Heap Full" error even in an otherwise empty program:<br>``* thread #1, name = 'DPUthread0', stop reason = fault 1 (Heap Full)``<br>``* frame #0: 0x80000350 dpu_code`mem_alloc_nolock(size=64) at alloc.c:32:5``<br>``frame #1: 0x80000170 dpu_code`main_kernel1 [inlined] mem_alloc(size=64) at alloc.c:52:21``<br>``frame #2: 0x80000168 dpu_code`main_kernel1 at add.c:73``<br>``frame #3: 0x80000308 dpu_code`main at add.c:42:12``<br>``frame #4: 0x80000050 dpu_code`__bootstrap at crt0.c:36:5``<br>I do not know why this is the case. | | | |
2024-07-26 | CPU-DPU: specify SDK version | Birte Kristina Friesel | -1/+5 | |
2024-07-24 | CPU-DPU: correctly log numa_node_rank | Birte Kristina Friesel | -9/+12 | |
2024-07-24 | CPU-DPU dimes24-transfer: decrease input size; do not pin memory for >20 ranks | Birte Kristina Friesel | -6/+8 | |
2024-07-17 | CPU-DPU dimes24-hetsim-transfer: decrease number of repetitions | Birte Kristina Friesel | -3/+3 | |
2024-07-15 | dimes24-hetsim-transfer.sh: set -e within function only | Birte Kristina Friesel | -2/+1 | |
2024-07-15 | dimes24-hetsim-alloc: fix parallel invocation and set -e | Birte Kristina Friesel | -4/+3 | |
2024-07-11 | alloc and transfer microbenchmarks: switch to GNU parallel | Birte Kristina Friesel | -75/+72 | |
2024-06-07 | add DIMES'24 HetSim benchmark scripts | Birte Kristina Friesel | -0/+88 | |
2024-06-07 | CPU-DPU: use numa_alloc to ensure correct data placement | Birte Kristina Friesel | -3/+13 | |
2024-06-07 | CPU-DPU: NUMA support | Birte Kristina Friesel | -3/+139 | |
2024-06-07 | CPU-DPU: Remove unused C2 array | Birte Kristina Friesel | -3/+1 | |
2024-06-07 | Allow benchmarking with #ranks instead of #dpus | Birte Kristina Friesel | -4/+11 | |
2024-05-13 | Print more info, set locale | Marcel Köppen | -1/+4 | |
2024-05-13 | Flush stdout after printing iteration results | Marcel Köppen | -0/+1 | |
2024-05-13 | Test different parameter set | Marcel Köppen | -3/+3 | |
2024-05-13 | Parameterize iteration count and timeout | Marcel Köppen | -1/+4 | |
2024-05-13 | Treat unroll like the other parameters | Marcel Köppen | -7/+6 | |
2024-05-13 | Print hostname and compiler versions | Marcel Köppen | -1/+7 | |
2024-05-13 | Make the linker happy by switching HOST_SOURCES and HOST_FLAGS | Marcel Köppen | -1/+1 | |
2024-05-13 | Add dependency on bin directory | Marcel Köppen | -2/+2 | |
2024-05-13 | past derf has been naughty and not committed changes in time | Birte Kristina Friesel | -10/+12 | |
2024-03-12 | CPU-DPU: support large data transfers on >1000 DPUs (uint32 is a bit small) | Birte Kristina Friesel | -21/+27 | |
2024-03-04 | STREAM: support SERIAL (default) and PUSH (new) transfers | Birte Kristina Friesel | -9/+46 | |
2024-03-04 | CPU-DPU | Birte Kristina Friesel | -88/+53 | |
2024-03-04 | STREAM: Include date in output file name | Birte Kristina Friesel | -2/+2 | |
2024-02-29 | STREAM: adjust run-rank BL range | Birte Kristina Friesel | -3/+3 | |
2024-02-26 | CPU-DPU: Add n_elements_per_dpu | Birte Kristina Friesel | -3/+3 | |
2024-02-26 | CPU-DPU microbenchmarks: explicitly log number of instructions | Birte Kristina Friesel | -8/+12 | |
2024-02-23 | STREAM: shell scripting is hard, let's go shopping | Birte Kristina Friesel | -0/+1 | |
2024-02-23 | STREAM: no alloc/load overhead for now | Birte Kristina Friesel | -2/+2 | |
2024-02-22 | CPU-DPU: overwrite old logs; skip one-element transfers | Birte Kristina Friesel | -8/+8 | |
2024-02-22 | STREAM: nits | Birte Kristina Friesel | -37/+3 | |
2024-02-22 | CPU-DPU microbenchmark: switch to nanoseconds | Birte Kristina Friesel | -73/+126 | |
2024-02-22 | STREAM: Add idle and stress versions of run-rank | Birte Kristina Friesel | -6/+24 | |
2024-02-22 | STREAM run-rank.sh: -v | Birte Kristina Friesel | -10/+10 | |
2024-02-22 | STREAM: Use nano- rather than microsecond precision internally | Birte Kristina Friesel | -41/+77 | |
2024-02-21 | STREAM output: Add n_elements_per_dpu | Birte Kristina Friesel | -2/+2 | |
2023-12-15 | ... oops | Birte Kristina Friesel | -3/+3 | |
2023-12-15 | WRAM copy: report latency and throughput | Birte Kristina Friesel | -20/+33 | |
2023-12-15 | MRAM-Latency: correctly calculate and label throughput | Birte Kristina Friesel | -5/+5 | |
2023-12-15 | nit | Birte Kristina Friesel | -1/+2 | |
2023-12-15 | MRAM latency benchmark: report DMA latency and bandwidth; merge r/w modes | Birte Kristina Friesel | -84/+81 | |
2023-12-12 | WRAM WiP | Birte Kristina Friesel | -41/+78 | |
2023-12-11 | CPU-DPU run scripts: kill stress and mpstat | Birte Kristina Friesel | -2/+12 | |
2023-12-11 | Microbenchmarks/CPU-DPU: Extend logfiles with CPU overhead of SDK | Birte Kristina Friesel | -0/+126 | |
2023-12-08 | CPU-DPU alloc and transfer microbenchmarks: -search space +cpu load | Birte Kristina Friesel | -149/+140 | |
2023-11-29 | CPU-DPU: include number of ranks in configuration space | Birte Kristina Friesel | -2/+4 | |