<azonenberg> Hey mwk i was wondering this, maybe you'd know given all your digging into low level FPGA uarch stuff...
<azonenberg> why do shift register LUTs always seem to have one less input than regular LUTs
<azonenberg> like a xilinx LUT6 turns into a SRL with 2^5 entries and a 5-bit address
<azonenberg> an efinix LUT4 turns into a SRL with 2^3 entries and a 3-bit address
Wanda[cis] has joined #prjcombine
<Wanda[cis]> excellent question
<azonenberg> I guess the one exception i know of is spartan-3 which managed to do 2^4 entry LUTRAM / SRLs with LUT4s
<Wanda[cis]> it's not always, though: pre-LUT6 Xilinx actually had SRL16 in LUT4
<Wanda[cis]> ... right, that
<Wanda[cis]> anyway
<Wanda[cis]> I do not know
<Wanda[cis]> my best guess is that they use dual-phase clock
<Wanda[cis]> and use the other half of the LUT as staging
<Wanda[cis]> if you want to find out, I'd suggest doing a patent search
<Wanda[cis]> I'm almost certain this is the kind of clever stuff that Xilinx would file a patent on
<azonenberg> interesting idea, I havent actually looked into CLB implementations before
<Wanda[cis]> (and if you find out, please do let me know)
<azonenberg> I'm idly curious but not at the point i would want to spend significant amounts of research time on it as of now
<Wanda[cis]> fun fact: I actually wanted to search for this exact thing like half an hour ago, but I'm currently preoccupied
<azonenberg> lol
<azonenberg> yeah i was just reading the efinix trion/titanium primitives guide and noticed SRL8s
<Wanda[cis]> (I looked at Renesas FPGA datasheet and they do the exact same thing)
<azonenberg> I have a trion devkit out for delivery now
<Wanda[cis]> I'd expect that to be like one well-aimed search query FWIW
<azonenberg> I might dig into it later on if i remember
<azonenberg> anyway, semi related, I still want to find time to clean up my ku+ GTY info and upstream it
<azonenberg> i've been too busy to work on that
<azonenberg> it's nowhere near all inclusive but there is enough info (or will be after i tweak stuff) to get a lot of things working
<azonenberg> still a bunch of "unknown, always 16'hblah"
<Wanda[cis]> ... you may get a rather large conflict
<azonenberg> and a few that change with no clear pattern, i know the values for various common configs but havent figured out details
<azonenberg> oh?
<azonenberg> you re'd the bitstream bits for it?
<Wanda[cis]> I've migrated the entire docs from sphinx to mdbook in the meantime
<Wanda[cis]> and to markdown
<azonenberg> lol
<azonenberg> well i prefer markdown to rst so thats fine by me
<Wanda[cis]> also I've actually started the doc skeleton for ultrascale
<azonenberg> reformatting my stuff will be easy enough
<azonenberg> I just want to avoid duplication of work and get my info up there
<Wanda[cis]> oh hey I was right
<Wanda[cis]> https://patents.google.com/patent/US7202697B1/en?q=(shift+register+LUT6)&assignee=Xilinx&before=priority:20100101&oq=(shift+register+LUT6)+inassignee:+Xilinx+before:+2010
<azonenberg> anyway, as of now i've re'd enough to have 10Gbase-R and a custom 5 Gbps 8b10b based protocol working using raw GTYs and no wizard
<azonenberg> using the QPLL
<azonenberg> for the CPLL I need to finish reading this one blog article by some random guy in israel
<azonenberg> who describes a silicon bug that you need an RTL workaround on to make it lock
<Wanda[cis]> this pretty much confirms the whole "using two memory cells as the two latches of the flop" part
<Wanda[cis]> though I thiiink this may be a followup to some earlier patent?
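The "two memory cells as the two latches of the flop" idea explains the halved depth: if each shift stage consumes a master cell and a slave cell, a LUT with k inputs (2^k cells) yields only 2^(k-1) stages and a (k-1)-bit read address. A minimal behavioral sketch of that reading, in Python; the class name `SrlFromLut`, the cell pairing, and the two-phase update order are illustrative assumptions, not the circuit from the patent:

```python
class SrlFromLut:
    """Behavioral model of a shift register built from LUT memory cells.

    Assumption: each shift stage uses two cells, one acting as the master
    latch and one as the slave latch, so a k-input LUT (2**k cells) gives
    2**(k-1) stages. This is a sketch of the idea, not a netlist.
    """

    def __init__(self, lut_inputs):
        if lut_inputs < 1:
            raise ValueError("need at least one LUT input")
        self.depth = 2 ** (lut_inputs - 1)   # half the cells are staging
        self.master = [0] * self.depth
        self.slave = [0] * self.depth

    def clock(self, din):
        # Phase 1: each master samples the previous stage's slave;
        # stage 0 samples the data input.
        self.master = [din] + self.slave[:-1]
        # Phase 2: each slave copies its own master.
        self.slave = list(self.master)

    def read(self, addr):
        # The (k-1)-bit address selects one stage's slave cell.
        return self.slave[addr]


if __name__ == "__main__":
    for k in (4, 6):
        print(f"LUT{k} -> SRL{SrlFromLut(k).depth}")
```

With these assumptions a LUT6 gives a 32-deep SRL and a LUT4 gives an 8-deep one, matching the Xilinx and Efinix numbers mentioned earlier in the log.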
<azonenberg> tl;dr you wait for it to say it's locked, measure the actual frequency against a known frequency reference (BUFG clock)
<azonenberg> if it's locked, good
<azonenberg> if it's off (i guess a multiple or divisor of the real clock because the vco went multiphase or something?) you reset it, poke some undocumented registers in various ways, and try again
<azonenberg> and eventually it comes up
<azonenberg> QPLL doesnt have this issue and just works
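The CPLL bring-up described above is a check-and-retry loop: wait for the lock flag, verify the real frequency against a known reference, and on a mismatch reset, apply the workaround, and try again. A rough Python sketch of that control flow follows. Every name in it (`bring_up_cpll`, the four callbacks, `tolerance_ppm`, `max_attempts`) is hypothetical; the actual register writes are undocumented and are not reproduced here, and a real implementation would live in RTL rather than software.

```python
def bring_up_cpll(wait_for_lock, measure_hz, reset, apply_workaround,
                  expected_hz, tolerance_ppm=1000, max_attempts=10):
    """Retry until the PLL reports lock AND runs at the expected frequency.

    All four callbacks are placeholders supplied by the caller:
      wait_for_lock()      -> bool, True once the lock flag is asserted
      measure_hz()         -> measured frequency, counted against a known
                              reference clock
      reset()              -> reset the PLL
      apply_workaround(n)  -> whatever register pokes attempt n calls for
    Returns the attempt number that succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        if wait_for_lock():
            measured = measure_hz()
            error_ppm = abs(measured - expected_hz) / expected_hz * 1e6
            if error_ppm <= tolerance_ppm:
                return attempt          # locked at the right frequency
        # Lock flag missing or frequency wrong: reset, poke, try again.
        reset()
        apply_workaround(attempt)
    raise RuntimeError(f"no valid lock after {max_attempts} attempts")
```

The lock flag alone is not trusted; the frequency comparison is what decides whether an attempt counts.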
<Wanda[cis]> lovely
<azonenberg> but anyway, i want to extract the relevant bits and algorithm from his post and mention it in my GTY docs then link to his post for all the dirty details
<Wanda[cis]> anyway, no, I haven't done any proper (functional) reversing of the transceivers and I have no plans to
<azonenberg> yeah i just think it makes sense for the info to be in the same overall set of docs
<Wanda[cis]> I will at some point reverse the bitstream mapping and that's it
<azonenberg> since it's kinda important for an end to end foss workflow
<Wanda[cis]> mhm
<Wanda[cis]> I mean
<azonenberg> anyway i'll have a look at updating my stuff to your new doc format in the next couple days if i have a chance
<Wanda[cis]> there's actually a difference here
<azonenberg> ?
<Wanda[cis]> the bitstream and raw routing stuff is not of interest to the end user
<Wanda[cis]> the functional description is
<azonenberg> True
<azonenberg> But like, 95% of the primitives are documented by xilinx and it doesnt seem to make a whole lot of sense to have a separate project just for the other 5%
<Wanda[cis]> oh yes
<azonenberg> it makes sense to me to lump it in with bitstream stuff under the "stuff xilinx doesnt talk about" bit
<Wanda[cis]> I just want to do a bit more restructuring to have it on a separate subpage
<azonenberg> because realistically what we are going to end up with long term is a foss IP block that wraps the primitives and hides all the magic undefined constants
<azonenberg> behind some sv attributes for things like 'data rate in gbps' and 'channel insertion loss in db'
<azonenberg> and then the only folks who will have to read the details are those who are poking drp registers for runtime data rate changes etc
<azonenberg> which overlaps quite strongly with bitstream stuff
<azonenberg> Wanda[cis]: also interesting looking at the patent
<azonenberg> i've never actually looked at circuit level structures of modern FPGAs only the coolrunners
<azonenberg> which were very straightforward, basically 6T SRAM cells with one or both of the inverter loop outputs wired directly to inputs of other combinatorial logic blocks
<azonenberg> (in some cases even on poly lol)
<Wanda[cis]> yeah I haven't looked at them much, but there's a lot of the low-level structure that obviously shows up in the bitstream or even functional behavior
<Wanda[cis]> I expect the modern CLB to be very tightly optimized by hand
<azonenberg> i mean the coolrunner is too
<Wanda[cis]> (there's also a bunch of stuff at the bitstream level that only makes sense if you read the right xilinx patent to understand the low-level structure; I did a lot of that when looking at the I/O blocks and the DCMs)
<azonenberg> its not like parallel rows of standard cells
<azonenberg> they have like 3 different 6t bitcell structures depending on where you are
<azonenberg> this is very clearly not machine layout lol
<Wanda[cis]> btw, fun fact: the Virtex 6 and Virtex 7 CLB are completely functionally identical, and have the exact same bitstream fields; however, the locations of some bitstream bits have been moved a bit between the two, probably as part of manual re-layout for the new process
<azonenberg> interesting. i knew xc6v and xc7v had strong lineage but i didnt realize they were bit for bit identical
<azonenberg> so xc7v is basically just a die shrink of xc6v as far as the actual logic fabric goes?
<Wanda[cis]> as far as the CLB goes, yes
<azonenberg> with only the io and hard ip being significantly changed?
<Wanda[cis]> the DSP and BRAM, mostly so, though there are new undocumented knobs present (at least in BRAM, not sure about the DSP)
<Wanda[cis]> however, they have modified the interconnect structure
<Wanda[cis]> Virtex 6 has single/double/quad and 16-length long wires; Virtex 7 changes some quad lines to hex lines, and has 12-length and 18-length long wires
<Wanda[cis]> also, interconnect geometry; Virtex 6 has a column structure of CLB-INT-CLB-INT-CLB-INT; Virtex 7 has CLB-INT-INT-CLB-CLB-INT-INT-CLB, and they make use of it to share the last mile of clock routing between the pair of adjacent INT columns
<Wanda[cis]> (which then Ultrascale changed further to CLB-INT-CLB-CLB-INT-CLB structure, except when you look closely at the INT tile you kinda see two sub-tiles for the left and right sides)
<azonenberg> interesting
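The three column structures quoted above can be written as repeating units, which makes it easy to see where two INT columns sit side by side (the pairs that Virtex 7 is said to use for sharing the last mile of clock routing). A small Python sketch; the repeating units in `UNITS` are inferred from the patterns in the log, and the helper names are made up for illustration:

```python
# Repeating column units inferred from the patterns quoted in the log.
UNITS = {
    "virtex6": ["CLB", "INT"],
    "virtex7": ["CLB", "INT", "INT", "CLB"],
    "ultrascale": ["CLB", "INT", "CLB"],
}


def columns(unit, repeats):
    """Tile a repeating unit to get a row of column types."""
    return unit * repeats


def adjacent_int_pairs(cols):
    """Indices i where columns i and i+1 are both INT."""
    return [i for i in range(len(cols) - 1)
            if cols[i] == "INT" and cols[i + 1] == "INT"]


if __name__ == "__main__":
    for family, unit in UNITS.items():
        cols = columns(unit, 2)
        print(family, "-".join(cols), adjacent_int_pairs(cols))
```

Only the Virtex 7 unit produces adjacent INT pairs. The Ultrascale entry treats INT as a single column, so the two sub-tiles mentioned in the log are not modeled.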