Mesa Hostmot2 Xenomai4 OOB networking support #4199
|
BTW: I know, rootless Xenomai is still a to-do. I will do that after #4132 is in; otherwise it will conflict. |
|
@pcw-mesa What do you think about this PR? Do we want two drivers, hm2_eth.c and hm2_eth_net.c? Wouldn't it be better to implement OOB in hm2_eth.c? @hdiethelm: I can help with testing on a 7i96s. |
This way, I can cleanly separate the two Ethernet implementations and not break the existing one by accident. Also, hm2_eth.c is already quite big and I did not want to blow it up even more. But you are right: having one module and selecting the backend with, for example, rt_eth_type=evl,posix,..., similar to board_ip=ip[,ip…], would be an alternative. It would even allow using only one Xenomai-capable network card if you have a second, less important board connected to a non-Xenomai-capable NIC. I will look into it.
Thanks! I have also a 7i96s, this PR is already tested both with posix and evl networking. But a second test is always good to have, not all configs behave the same. |
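To make the per-board idea a bit more concrete, here is a minimal sketch of how a comma-separated backend list could be paired with the board_ip list, with missing entries falling back to posix. It is written in Python for brevity, not in the driver's C. The parameter name rt_eth_type is only the working name from the comment above, and parse_backends and the IP addresses in the usage line are hypothetical.

```python
VALID_BACKENDS = ("posix", "evl")


def parse_backends(board_ip, rt_eth_type=""):
    """Pair each board IP with a network backend name.

    Hypothetical helper: entries missing from rt_eth_type default to
    "posix", so existing configurations keep their current behavior.
    """
    ips = [ip.strip() for ip in board_ip.split(",") if ip.strip()]
    names = [n.strip() for n in rt_eth_type.split(",")] if rt_eth_type else []
    if len(names) > len(ips):
        raise ValueError("more backends given than boards")

    result = []
    for index, ip in enumerate(ips):
        # An empty or missing entry means "use the default backend".
        name = names[index] if index < len(names) and names[index] else "posix"
        if name not in VALID_BACKENDS:
            raise ValueError(f"unknown backend {name!r} for board {ip}")
        result.append((ip, name))
    return result
```

Calling `parse_backends("10.10.10.10,192.168.1.121", "evl")` would put the first board on the evl backend and leave the second one on posix.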
This variant will be easier for switching "posix and evl". If you want evl to be used normally, you will need to add it to PNCconf. For most Mesa card users, installing a LinuxCNC distribution is the best they can do. |
|
ChatGPT is helping me with this PR. Before I start preparing the environment, I have a few questions:
|
|
You will find most information here: For PREEMPT_RT, I use Debian Trixie with the stock preempt kernel. The Intel I350 uses the igb driver. For e1000e, you need quite an old card. |
f2f5ab0 to 32b5122
|
So, I went back to having one component with two network backends:
Xenomai4 EVL mode: |
32b5122 to 56d4a15
|
I'm currently stuck on getting the EVL kernel working: The "board_rtnet" parameter looks good. Do you think it would be possible to add: board_rtnet=evl - this configuration doesn't make sense when: board_rtnet=posix - does this configuration make sense when: The less the user has to configure, the better. |
I uploaded a non-rt variant and the matching config. See: hdiethelm/xenomai4-linuxcnc#1 Thanks for the good description.
However, I have to check if LinuxCNC is running in Xenomai4 mode when you configure |
I respect your opinion, but I think LinuxCNC should be user-friendly. Would it be possible for PNCconf to check whether Linux is running in Xenomai4 mode and whether the network card supports OOB networking? For example, with a push button next to an "EVL mode" check button. I know it's too early to ask this. EVL needs to be tested first. But I'm interested. |
I believe if you invest the effort to get Xenomai4 running, setting That said, if we discover that Xenomai4 really performs better than PREEMPT_RT for many users, then we can still invest some time in:
For now, it's mostly a toy project for me to learn Xenomai4. At least on my setup, it doesn't perform much better than PREEMPT_RT, but this old PC runs nearly perfectly anyway; a max latency of 20 us including Ethernet is absolutely unnecessary for a Mesa card but still fun... ;-) It would be nice to get a report from you on whether it works well on your setup. Nevertheless, having this part merged to master as well makes it testable for many people. I will update the hm2_eth doc in this PR after testing; I don't like to change it many times based on feedback about the implementation. |
|
I think options like OS detection belong more in the configuration utilities or maybe startup scripts. I am really curious what Ethernet performance is possible with Xenomai/OOB networking, as trying to get higher |
Agreed.
I have to check tomorrow, but as far as I remember, I have a loop time of 50 us avg / 70 us max with a well-tuned PC and an Intel I350. BTW: For a long time, the HAL time measurements were in cycles. This PR changed them all to ns: #4082 |
|
Thanks already for testing. I will look into it in detail tomorrow. |
|
|
So, what I can tell so far:
The reason for all the effort above: Mesa sends a packet and waits for the response. Because of the waiting, it is much more efficient latency-wise to run everything, including the Ethernet task and interrupt, on one core. Otherwise, the data has to be copied from one core to main memory and back to the other. At least in my setup, doing this reduced the jitter by a lot. I use the eth-rt-affinity at startup in I have wanted to create a doc and maybe even a tool for this from my tuning experience for some time, but a few other projects got in the way... ;-) @pcw-mesa My timing info: |
|
@zz912 I don't like it that much, but it seems that this is the only way I can run a long-duration init function in the real-time context. I have a chicken-and-egg problem:
|
408596c to 1aade8f
Looks like you use |
You are right. Thanks. I work with ChatGPT and it (he) changed initf => addf. |
ChatGPT lies all the time... :-D That's why I normally don't use AI stuff except for generating samples where I afterwards check the real doc and then integrate it by hand. |
|
On the other hand, without ChatGPT I wouldn't have a chance to test your PR. I use ChatGPT as a teacher to explain what you do, what you need, and how some things work. I am a mechanical designer. Programming is only my hobby. |
Yes, for this it's not too bad. Now it works, no |
|
Actual terminal: |
|
Nice, now there is no error any more from hm2_modbus, so one issue solved. :-) Can you try to set the kernel command line suggested in #4199 (comment) and then rerun your timing tests just for Xenomai? If you have issues, you can also just ask me. |
Sure, all fine.
There is also hardware-irq-coalesce-tx-usecs, at least for my card. Set this to 0 as well. You can check the settings with: |
It is not possible for me: |
|
Hi Hannes, Thanks for sharing the eth-rt-affinity script. I have a question about the intended long-term usage. Is the script meant to be run manually before starting LinuxCNC, or do you recommend integrating it into the system startup (for example via systemd or /etc/network/interfaces)? I'm trying to understand what you consider the preferred way for a permanent LinuxCNC installation. Do you consider all parts of the script (rx-usecs, queue count, IRQ affinity and realtime priority) equally important, or are some of them just optional optimizations? Here is my Wireshark capture after turning coalescing off: |
That depends on the network card. I wrote the script for myself, to use it in /etc/network/interfaces: #4199 (comment) All settings in the script improved it, especially the IRQ affinity. It works in my setup for a Realtek card and the I350. Some of the settings are not available on all cards, but if they are not available, that is fine. I might create a PR one day, but not now, too many open projects... ;-) |
Seems to be better now. But can you also post a screenshot from the Mesa PC test? Wireshark was just used once to see if the timing from Wireshark matches the Mesa PC test. It does, so all fine there. I think a proper tuning tool that sets the kernel parameters and also configures the Ethernet card correctly would help a lot. Even though I have used Linux for ages, it took me some time to set it all up and, of course, I also missed the coalesce parameter in the manual... :-D Took me some time to figure that out. |
That doesn't really answer the question. A few bytes more or less should not change a lot. I found it in the manual; it looks like the communication happens in the background and should not block: I see the sporadic 112-byte packets in Wireshark; I guess these are the additional Modbus packets.
You mean servo-thread.tmax / servo-thread.time? It's not so bad; if it's over 1000000, it fails. So you have to measure it to be below that, right? Additionally, checking read.time/tmax and write.time/tmax helps to see whether the issue is there or somewhere else. |
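The check described here can be sketched in a few lines of Python. This assumes the values are in ns (as they are since #4082) and that 1000000 is the servo period of the setup being discussed. The helper names are made up for illustration, and the numbers are passed in by hand instead of being read from HAL.

```python
def thread_load(tmax_ns, period_ns=1_000_000):
    """Return the fraction of the thread period used by the worst-case run."""
    if period_ns <= 0:
        raise ValueError("period_ns must be positive")
    return tmax_ns / period_ns


def summarize(measurements, period_ns=1_000_000):
    """Format one status line per measured tmax value.

    measurements maps a name (for example "servo-thread.tmax") to the
    measured maximum execution time in ns.
    """
    lines = []
    for name, tmax_ns in measurements.items():
        load = thread_load(tmax_ns, period_ns)
        # Anything at or above the period means the thread overran.
        status = "OK" if tmax_ns < period_ns else "OVERRUN"
        lines.append(f"{name}: {tmax_ns} ns = {load:.0%} of period [{status}]")
    return lines
```

Feeding it the servo-thread value together with the read and write values side by side makes it easier to see which function eats the time budget.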
|
Hmm, I have to check myself. I rebased to master; I don't remember if I tested it afterwards. I would have to test it tomorrow. Can you test without smp_affinity? Note: Interrupt numbers can change from boot to boot. If you take my script, this should work. If not, you have to do it manually if your PC has dynamic interrupt numbers. Took me by surprise too... ;-) |
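Since the interrupt numbers can move between boots, one way to avoid hard-coding them is to look them up by interface name in /proc/interrupts. The following is only a sketch of that lookup in Python, not the eth-rt-affinity script itself. The function name is hypothetical, and the matching rule (the name alone, or the name followed by a dash and a queue suffix) is an assumption that fits the usual naming of Ethernet drivers but may not cover every card.

```python
def irqs_for_interface(interrupts_text, ifname):
    """Return the IRQ numbers whose handler names belong to ifname.

    interrupts_text is the content of /proc/interrupts. A handler matches
    if it is exactly ifname or starts with ifname plus "-", which covers
    per-queue names such as "<ifname>-TxRx-0".
    """
    irqs = []
    for line in interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        # Skip the CPU header and non-numeric rows such as NMI or LOC.
        if not sep or not head.strip().isdigit():
            continue
        # Shared interrupts list several handlers separated by commas.
        tokens = rest.replace(",", " ").split()
        if any(tok == ifname or tok.startswith(ifname + "-") for tok in tokens):
            irqs.append(int(head))
    return irqs
```

On a real system this would be called as `irqs_for_interface(open("/proc/interrupts").read(), "enp1s0")`, where enp1s0 stands in for whatever the interface is actually called.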
"PC has dynamic interrupt numbers" - ChatGPT warned me too. I did not reboot the PC. "Can you test without smp_affinity?" - how? ChatGPT helped me with the git history. This is for you:
|
|
Ah ChatGPT... ;-) No smp_affinity: |
|
The original value of /proc/irq/26/smp_affinity was 3. So I ran a test with echo 3 | sudo tee /proc/irq/26/smp_affinity and it did not help. ChatGPT wanted cat /proc/irq/26/smp_affinity before the echo. :-) |
|
ChatGPT has no clue. The only changes coming in from master were not related to this PR. I tested it again anyway, no difference. Probably you did something wrong while tuning Ethernet. Or it was just a random glitch. While tuning: always reboot after setting random parameters. If you set something, it might stick until you reboot. Then set what worked again and test again. Only set one parameter at a time, test, then the next parameter. Question to ChatGPT: Why echo 3? ;-) Did you try my eth-rt-affinity script and my recommendations for the kernel command line? At least for me, it reduced the jitter by a lot, but it might be hardware dependent. |
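On the "why echo 3" question: smp_affinity is a hexadecimal CPU bitmask, so 3 (binary 11) means CPU 0 and CPU 1 are both allowed, which is the opposite of pinning the interrupt to a single core. A small Python sketch of the conversion in both directions, with hypothetical helper names:

```python
def cpus_from_mask(mask_text):
    """Decode an smp_affinity style hex mask into a list of CPU numbers.

    Large systems print the mask as comma-separated 32-bit groups, so the
    commas are dropped before parsing.
    """
    value = int(mask_text.strip().replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def mask_for_cpus(cpus):
    """Encode a list of CPU numbers as a hex mask string."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    return format(value, "x")
```

With these, `cpus_from_mask("3")` gives CPUs 0 and 1, while pinning to CPU 3 alone would need the mask returned by `mask_for_cpus([3])`.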
ChatGPT studied your script before we ran it and explained to me how it works. I ran cat /proc/irq/26/smp_affinity before we changed anything. The result was 3. Then I made: I made tests with: |
|
Here, it was 48%: #4199 (comment) So something must have changed. I don't think it's the rebase. But who knows; at least for me, it was no issue, and looking at the changes, it is all fine. You can try this branch if you want, I restored the old version there: https://github.com/hdiethelm/linuxcnc-fork/tree/hm2_eth_oob_v5_before_rebase_2 |
|
Do you have something with modbus? We should have similar conditions. |
Sadly no. But you can disable modbus for your setup, so it is also similar, a few messages before, I sent you something that should work, just without spindle. As it looks to me, you changed something at the same time while pulling the new branch and that generated the issue you are having now. There was no change in modbus or OOB Networking, only some change in other code from other people, not really connected to latency. |
|
I have to take a break from testing or my wife will divorce me. So I apologize for not responding right away. Maybe I'll get back to it on Monday. |
|
I need a break too, all fine... ;-) |
|
Did you notice above? hm2_modbus.0: error: hm2_pktuart_queue_read_data() returned an error: -1 |
|
Yes, makes no sense to me. Did you get that only once after setting the Ethernet affinity? Did you reboot and it went away? |
|
Would you be willing to buy this: https://a.aliexpress.com/_EHftnL8 I would buy the same for testing together. |
|
The device itself is cheap. But I would have to connect it to my CNC, flash a new firmware and revert it afterwards. I think my VFD even supports Modbus, but at the time I figured just adding a few wires and controlling it via analog is easier. Now I did not find the original firmware which was on the Mesa card when I bought it. All available files have different pin info. Best would be to just buy a new Mesa card for testing so I don't brick my machine, but that's not cheap. In the end, the chances are high that it works well on my machine and it's a different issue... Can you reproduce this Ideally as zip, so the thread doesn't explode. |
|
"Now I did not find the original firmware which was on the mesa card when I bought it. All available files have different pin info." Unless you have custom firmware, it should be in the distributed firmware images. Also, you can use mesaflash's --backup-flash and --restore-flash commands. |
This is my pin file: I did not find anything in http://www.mesanet.com/software/parallel/7i96s.zip that fully matches my pin file. I got my card from EUSurplus. Is there any git repo for the firmware / changelog / release notes? I just discovered that the file I downloaded today has changed compared to the old one. I could of course try a new firmware, rewire stuff, then restore the backup and hope nothing breaks, but for now it's too much effort. |
|
That is the standard (and default) 7i96s_d.bin firmware. The only difference is that the pin file in the distribution was made with an older version. There would be no wiring/HAL/INI file changes to use firmware that supports MODBUS. |
Thanks! Is there any versioning in the firmware? It's hard to see if it is older otherwise... ;-) |
|
There are versions in the individual modules but no overall version. |
Hmm, I don't see a version change if I diff these two .pin files. But the clock increased / _EncA0... is gone. Anyway, I don't need encoders, so fine. @zz912 Can you check the kernel messages after a test? I think there is a new issue after updating from 6.12.85 to 6.12.90. I only see it with my ping tool, not with LinuxCNC, but it might be different with your setup. I am just working on a bug report to the Xenomai4 maintainer... I can create a 6.12.85 kernel for you without preempt-rt for future testing. If you have something like this, can you post it? |
|
The MPG encoders are always there with INM modules; it's just that older versions of mesaflash don't report them. Clock speed changes don't affect operation in any significant way. |

This PR adds Xenomai4 EVL out-of-band networking support for Mesa Hostmot2.
Out-of-band networking is basically a fast path inside the Xenomai real-time kernel that enables networking without involving the normal kernel.
Because some users might want to use Xenomai4 with the standard kernel networking, I decided to create a new component called hm2_eth_evl while hm2_eth behaves exactly like before.
Common code is left in hm2_eth.c and network-specific code is moved to hm2_eth_net.c and hm2_eth_net_evl.c.
The linker is used to link hm2_eth_evl and hm2_eth with two different network implementations and the same common code.
To select the mode, you can use board_rtnet=posix or board_rtnet=evl, where the default is posix, so it is a non-breaking change.
With board_rtnet=evl, initf hm2_eth.realtime-init servo-thread needs to be added to the .ini file to initialize the real-time network inside the real-time context.
A few changes in the existing code were performed:
Things still open:
fetch_hwaddr to hm2_fetch_hwaddr for example to avoid conflicts?
Loading hm2_eth_evl and hm2_eth at the same time would probably create a runtime linker issue which generates undefined behavior.
It's a bit hard to review due to moved code. git diff master:src/hal/drivers/mesa-hostmot2/hm2_eth.c hm2_eth_oob_v5:src/hal/drivers/mesa-hostmot2/hm2_eth_net_posix.c helps.
These two PRs need to be merged first:
#4217
#4218