<JamesMunns[m]>
I'm surprised CPU designers don't just have a memcpy instruction at the hardware level these days
<Kin-o-matix[m]>
there's a peripheral that does this
<JamesMunns[m]>
yeah, most chips can do mem2mem copies with DMA
<Kin-o-matix[m]>
dma may as well be called memcpy engine
<JamesMunns[m]>
yeah, but you need to worry about stride there too 😃
<Kin-o-matix[m]>
* memcpy engine/peripheral
<JamesMunns[m]>
if you do a 1-byte aligned dma transfer, it's 4x the bus ops that a 4-byte aligned transfer would be
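To make the 4x figure concrete, here is a rough sketch of the arithmetic. It assumes a simplified model in which every DMA beat costs exactly one bus operation and ignores bursts, FIFOs, and arbitration. `bus_ops` is a hypothetical helper for illustration, not part of any HAL.

```rust
/// Bus operations needed to move `len` bytes when each beat carries
/// `width` bytes. Assumed model: one bus operation per beat, no bursts.
fn bus_ops(len: usize, width: usize) -> usize {
    len.div_ceil(width)
}

fn main() {
    let len = 64;
    let byte_wide = bus_ops(len, 1); // 1-byte beats
    let word_wide = bus_ops(len, 4); // 4-byte beats
    println!("byte-wide: {byte_wide} ops, word-wide: {word_wide} ops");
    // Under this model the byte-wide transfer needs four times the operations.
    assert_eq!(byte_wide, 4 * word_wide);
}
```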
<Kin-o-matix[m]>
very true
<Kin-o-matix[m]>
the key to this is to never have unaligned things and instead throw exceptions when it happens: fix the janky software by having the hardware give the middle finger to this sort of thing
<Kin-o-matix[m]>
bob widlar style
<JamesMunns[m]>
I mean, [u8; 64] is not guaranteed to be aligned
<JamesMunns[m]>
it is only 1-byte aligned
<JamesMunns[m]>
it's not necessarily janky software, just the rules as agreed at a lang level.
<JamesMunns[m]>
or if I want to copy from the back 13 bytes of a slice: that's a memcpy
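A minimal sketch of both points, assuming a 64-byte buffer like the one mentioned above. `tail_13` is a hypothetical helper written for this example. Whether the copy ends up as a `memcpy` call or gets inlined depends on the target and optimization settings.

```rust
use core::mem::align_of;

/// Hypothetical helper: copy the last 13 bytes of `src` into a fresh array.
/// Returns `None` if `src` is shorter than 13 bytes.
fn tail_13(src: &[u8]) -> Option<[u8; 13]> {
    let start = src.len().checked_sub(13)?;
    let mut out = [0u8; 13];
    // A plain byte copy; the compiler may lower this to a memcpy call.
    out.copy_from_slice(&src[start..]);
    Some(out)
}

fn main() {
    // A byte array only promises 1-byte alignment, regardless of its length.
    assert_eq!(align_of::<[u8; 64]>(), 1);

    let buf: [u8; 64] = core::array::from_fn(|i| i as u8);
    let tail = tail_13(&buf).unwrap();
    // The tail starts at offset 64 - 13 = 51, which is not a multiple of 4,
    // so the source is misaligned whenever `buf` itself is word-aligned.
    assert_eq!(tail[0], 51);
    assert_eq!(tail[12], 63);
}
```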
<Kin-o-matix[m]>
i feel like rust needs the equivalent of picolibc/newlib for core, a size-focused implementation of this stuff
<Kin-o-matix[m]>
it just doesn't seem like size is really considered
<KevinPFleming[m]>
<JamesMunns[m]> "I'm surprised CPU designers don..." <- Most regular CPUs do, but apparently not small Cortex
<KevinPFleming[m]>
<Kin-o-matix[m]> "is there a reason the compiler..." <- An open coded loop in my application is about 70% smaller
<TomB[m]>
Yeah, the memcpy I see when building with Zephyr is like 28 bytes
<TomB[m]>
So, that seems massive and surprising
<KevinPFleming[m]>
still true with Rust 1.88; if I build using the open-coded loop the .text section is more than 1K smaller than if I use the `copy_within` method on slices
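For reference, a sketch of the two variants being compared. The actual loop in the application is not shown in the chat, so the function names, the `u8` element type, and the "shift down" direction are assumptions. The forward loop is only correct for overlapping ranges when `dst <= src`, whereas `copy_within` has memmove semantics and handles either direction. The size difference quoted above is as reported and is not reproduced by this snippet.

```rust
/// Open-coded variant (assumed shape): move `len` bytes from `src` to `dst`
/// inside the same buffer. Only valid for overlapping ranges if `dst <= src`.
fn shift_down_loop(buf: &mut [u8], src: usize, dst: usize, len: usize) {
    for i in 0..len {
        buf[dst + i] = buf[src + i];
    }
}

/// Library variant: `copy_within` handles overlap in either direction.
fn shift_down_copy_within(buf: &mut [u8], src: usize, dst: usize, len: usize) {
    buf.copy_within(src..src + len, dst);
}

fn main() {
    let mut a = [0u8, 1, 2, 3, 4, 5, 6, 7];
    let mut b = a;
    shift_down_loop(&mut a, 3, 0, 4);
    shift_down_copy_within(&mut b, 3, 0, 4);
    // Both variants produce the same buffer contents.
    assert_eq!(a, [3, 4, 5, 6, 4, 5, 6, 7]);
    assert_eq!(a, b);
}
```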
Alex[m] has joined #rust-embedded
<Alex[m]>
<JamesMunns[m]> "See Dion's `optimize-for-size..." <- Isn’t it now `optimize(size/speed/none)`? That's how I remember it.