Wanda[cis] changed the topic of #prjunnamed to: FPGA toolchain project · rule #0 of prjunnamed: no one should ever burn out building software · https://prjunnamed.org · https://codeberg.org/prjunnamed/prjunnamed · logs: https://libera.catirclogs.org/prjunnamed
ldcd has quit [Ping timeout: 264 seconds]
cr1901_ has joined #prjunnamed
cr1901 has quit [Read error: Connection reset by peer]
ldcd has joined #prjunnamed
ldcd has quit [Remote host closed the connection]
ldcd has joined #prjunnamed
ldcd has quit [Ping timeout: 265 seconds]
ldcd has joined #prjunnamed
ldcd has quit [Ping timeout: 255 seconds]
ldcd has joined #prjunnamed
ldcd has quit [Ping timeout: 276 seconds]
juri_ has quit [Ping timeout: 246 seconds]
juri_ has joined #prjunnamed
megng has quit [Ping timeout: 244 seconds]
mwk has quit [Ping timeout: 268 seconds]
mwk has joined #prjunnamed
megng has joined #prjunnamed
<Wanda[cis]> so in current news
<Wanda[cis]> I've been working on defining a formal model of IR semantics that we could verify transformations against
<Wanda[cis]> specifically, one that can represent things like async signals and clock domain crossings, and what is or isn't allowed when those are involved
<Wanda[cis]> I am currently sitting on a reasonably promising model that would even be viable to simulate, and I'm about to write it down fully and try to verify it with some model checking software
<Wanda[cis]> the first core idea is that if you want to model things like setuphold violations (so you can be sure a CDC is valid) without simulating actual timings, you need to have a concept of events occuring "simultanously enough" to cause timing violations or not
<Wanda[cis]> but also events that occur actually simultanously are not a problem, because timing analysis can deal with those easily
<Wanda[cis]> so I'm introducing a concept of a two-phase commit for clock edges
<Wanda[cis]> you can start a clock edge, or any amount of clock edges simuitanously, and all relevant flops will all reiliably capture the previous state; some timesteps later you can finish a clock edge, and all these flops will produce the stable value at their outputs
<Wanda[cis]> but if you start another clock edge (on a different clock net) in a timestep between these two events, attempting to latch the output of the first clock domain will give you an "unstable" state, modeling a setuphold violation
<Wanda[cis]> so you'd be able to model async clock domains as two independent processes with random timings scheduling "start" and "finish" events on clocks
<Wanda[cis]> to actually realize that, and be able to model flops driving clock signals of other flops etc, I use a 6-valued logic of 01XRFU: 0, 1, undef but stable (won't cause timing violations, can be put through freeze), rising, falling, unstable (result of timing violation, will fuck up any circuitry not explicitly made of async-safe primitives)
<Wanda[cis]> now, all this seems like something that someone would've explored before in academia; would anyone happen to recognize this as something familiar?
<Wanda[cis]> tagging jix because it seems like something up your alley
<whitequark> okay, that sounds interesting and valuable for cxxrtl
<Wanda[cis]> oh yeah I want us to have a simulator that actually evaluates all that
<Wanda[cis]> that and a lowering to SAT so you can actually verify things
<Wanda[cis]> there's a bit of a problem in that there's some nondeterminism involved that I see no way to exorcise: sooner or later you end up needing a freeze cell, which has the semantics of freeze X = nondeterministic() ? 1 : 0; freeze a = a for all remaining values
<Wanda[cis]> it's easy enough to model for FV (you just grab a free variable), but in sim you end up needing rand()
<mei[m]> so, when does freeze re-randomize?
<Wanda[cis]> I believe the answer is "every timestep"
<mei[m]> can the value of the freeze change in the middle of a clock cycle, if some other clock domain happened to cause an extra timestep?
<Wanda[cis]> ie. a set of guaranteed-simultanous events caused ultimately by the same clock edge will observe the same value
<Wanda[cis]> but two observations at different timesteps (from unrelated clock domains) could observe mismatched values
<Wanda[cis]> this is a compromise that doesn't really reflect any kind of physical reality, it's just simple enough to model and I don't think we'd have any actual use of a stronger model for optimization purposes
<Wanda[cis]> one could imagine a more refined model, where a freeze only changes its value when it's source clock domain ticks (which would be observable via a simple 1:2 synchronous clock domain crossing), but that means you have to somehow model the concept of a "source clock domain" for any signal, which we currently don't have in the model, and besides it's not even a uniquely-defined concept
<Wanda[cis]> another alternative would be freeze with a clock input, which would also cause no end of problems
whitequark has quit [Ping timeout: 252 seconds]
whitequark has joined #prjunnamed
<ignaloidas> in theory you could have clock ticks that don't commit in this model? that might be a bit annoying for proving properties
<Wanda[cis]> I'm not sure I understand the question?
<Wanda[cis]> you mean, like an unbounded number of timesteps without advancing a clock? well yeah
<ignaloidas> not just advancing, unbounded number of timesteps of the clock being in the rising/falling
<Wanda[cis]> mmm, yes
<ignaloidas> if clock just didn't tick, then other domains can get values from it just fine, but if it's always "in the middle of" ticking it's just poisoning everything relating to it
<Wanda[cis]> hmmm.
<Wanda[cis]> okay, there is a problem here
<Wanda[cis]> but it's a subtle one
<Wanda[cis]> so, the poisoning behavior itself is fine: we do want to model a setuphold violation after all, and that is its exact behavior
<Wanda[cis]> the idea would be that you'd have a special cell type, a synchronizer FF pair, which does not get poisoned
<Wanda[cis]> such a thing forces you to wait two cycles of the target clock, but always gives you a clean 0 or 1 (though perhaps nondeterministically)
<Wanda[cis]> (could give you an X in principle, but I don't think there's any drawbacks to just stuffing an implicit freeze on them)
<Wanda[cis]> the subtle problem is: if the source domain is in the middle of a tick that would change an output from 0 to 1, and multiple ticks of the target domain happen in between the start and finish timesteps of the source, it may be the case that the first sample nondeterministically comes up as 1, and the second as 0, which is a situation that would never occur in real hardware and would break many common CDC primitives
<Wanda[cis]> I see two solutions to this problem
<Wanda[cis]> the first solution is... well, there's an important observation: if we assume nothing at all about the clock inputs (ie. they are completely adversarial), there's already plenty of reasonable assumptions a circuit could make that'd be broken; for example, you could reasonably construct a CDC primitive that assumes clock A is at least twice the speed of clock B
<Wanda[cis]> following that observation, we can just declare that it's the responsibility of the top simulation / FV driver to provide "reasonable" stimulus. for FV you'd model that as a bunch of assumptions (in the assume sense) of how much clock A can advance without clock B advancing in turn. such a thing could prevent this pathological situation.
<Wanda[cis]> as a sidenote, this is also the kind of shim that'd be needed if you want to prove any sort of liveness properties, so that the tool doesn't give you a counterexample where the clocks simply never tick at all
<Wanda[cis]> (I believe this is already a common problem with yosys in multiclock mode)
<Wanda[cis]> the second solution (somewhat of the "obvious rule patch" variety) is to define the semantics of the special synchronizer FF such that when there's an R on its input, it can only flip its nondeterministic output from 0 to 1 and not the other way around until it observes a different input.
<Wanda[cis]> I believe solution 2 is easy and obvious enough to just make it a part of the model; however, I also believe that any realistic sim or FV harness with multiple clocks will need elements of solution 1 anyway for other reasons
<ignaloidas> I'll point out that solution 1 would mean that the compiler needs to know those assumptions to have formal verification of transformations that deal with multiple clocks (whether it makes sense to have such transformations I'm not sure, but I think it would be a weird limitation)
<Wanda[cis]> we do! it's called timing constraints
<Wanda[cis]> I'm not sure if it's in any way useful for synthesis transformations, but we do in fact already have a way (and reason) to have this information in the compiler
<ignaloidas> right, but not at all times/levels? IIRC there is some consideration about the ability to optimize logic without a specific target, that could then be then synthethized for some specific target later
<Wanda[cis]> well, sure
<Wanda[cis]> but I don't really see your point here
<Wanda[cis]> if you don't provide any timing constraits, compiler will operate making no assumptions, and thus avoid any sort of optimizations that would make use of them; likewise, you wouldn't be allowed to rely on such in your FV
<Wanda[cis]> let's also consider another viewpoint for this issue
<Wanda[cis]> what would very long rising state mean?
<Wanda[cis]> at the physical level, this would mean a clock signal with very slow slew rate
<Wanda[cis]> physical flops have constraints on slew rates; if you do a slow enough rising edge, you violate that constraint, and all your signal values are forfeit
<Wanda[cis]> I don't think there's any realistic target where you could fit several clock edges of a fast clock into the rise time of a slow clock without violating either that constraint or pulsewidth constraint
<Wanda[cis]> it's likely also impossible to get such a signal into the FPGA in the first place; any IOB input buffer will either schmitt the waveform into a fast enough single clean edge, or have its own slew constraint violated and oscillate wildly, making it unsuitable for a clock in the first place
<Wanda[cis]> now, matching the physical reality to the circuit semantics under the abstract model is the job of the timing analyzer. the timing analyzer is within its rights to error out when given an infeasible task, for any number of reasons (the usual one being "can't find a way to get rid of negative slack"). further, compiler is under no obligation to give you a working circuit if your timing constraints don't hold.
<Wanda[cis]> so, the compiler can assume there are no unbounded rising edges: if there were, you'd either be in violation of a timing constraint, or the timing constraints specified a slew rate that is impossible to consume on the target, in which case the timing analyzer will reject the circuit.
<Wanda[cis]> (in practice such a timing constraint would likely not be specified explicitly, but be considered implicit as part of the common-sense "the specified input clock is a suitable clock at all" assumption.
ldcd has joined #prjunnamed
ldcd has quit [Ping timeout: 256 seconds]
<mupuf> Wanda[cis]: thanks for the status update!
whitequark has quit [Remote host closed the connection]
whitequark has joined #prjunnamed
soulpheonix has quit [Remote host closed the connection]
<ignaloidas> so I'm thinking about stuff that crosses the boundries - say you figure out that the input to the synchronizer FF is constant, so you can eliminate the synchronizer. To prove that this is valid you want to show that there's no difference in behavior within some number of timesteps, and you'd normally want to choose that number based on the assumptions you can make about the clocks
<ignaloidas> but if compiler doesn't have a bound, the assumptions should be pretty loose, even if there is one, which means that the number of timesteps for this would be very large
<ignaloidas> maybe for such case it's not truly necessary to run through many timesteps, but I could imagine for stuff like maybe moving some logic from one clock to another to reduce FF count it could make sense
ldcd has joined #prjunnamed
ldcd has quit [Ping timeout: 255 seconds]
ldcd has joined #prjunnamed
ldcd has quit [Ping timeout: 264 seconds]
ldcd has joined #prjunnamed
ldcd has quit [Ping timeout: 244 seconds]
soulpheonix has joined #prjunnamed
whitequark has quit [Ping timeout: 255 seconds]
whitequark has joined #prjunnamed
<ignaloidas> also, with regards to realistic targets - quick one that came to mind is a modern SoC with a built-in RTC clock - that will end up with some really low slewrate clocks in the RTC for power savings, together with stuff in GHz region for the CPU cores and stuff
<whitequark> why would the slewrate be low?
soulpheonix has quit [Remote host closed the connection]
soulpheonix has joined #prjunnamed