ESP-Now BATMAN scaling work

A while back I built a little test rig to help with checking how my mesh network scales.

A test setup of two or three nodes simply doesn't show up the issues you get when there are twenty or more all talking.

Now that I've reworked my code into an easily used library I'm inching towards releasing it, and that means lots of testing to be sure it's not a complete dud.

The test rig allows me to have twenty-two nodes on my desk, eight WeMos D1 mini on USB and fourteen ESP-01 in the rig. Programming all these is quite time consuming so I tweaked the code on the USB connected nodes and when I felt it had reached an interesting point rolled it out to the ESP-01s.

When I first built the rig I was getting a lot of 'collisions', where my code detected another packet arriving while I was still working on the last one. The detection works by setting and un-setting a flag to indicate 'node is currently doing something else'. I quickly found I'd made a simple coding error, not always un-setting the flag, and fixing this made all these 'collisions' go away.
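As a rough sketch of how that kind of bug can be avoided in C++, a small scope guard can clear the flag on every path out of the handler. This is not the library's actual code: BusyGuard, nodeBusy, collisions and handlePacket are all names invented for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical flag meaning 'node is currently doing something else'.
static volatile bool nodeBusy = false;
// Hypothetical count of packets that arrived while the node was busy.
static uint32_t collisions = 0;

// Sets the flag when created and always clears it when destroyed,
// so no return path can forget to un-set it.
class BusyGuard {
public:
  BusyGuard() { nodeBusy = true; }
  ~BusyGuard() { nodeBusy = false; }
  BusyGuard(const BusyGuard&) = delete;
  BusyGuard& operator=(const BusyGuard&) = delete;
};

// Returns true if the packet was processed, false if it was dropped.
// Note the check-then-set is not atomic; this is only an illustration.
bool handlePacket(const uint8_t* data, size_t len) {
  if (nodeBusy) {            // still working on the previous packet
    ++collisions;
    return false;
  }
  BusyGuard guard;           // flag is set here
  if (data == nullptr || len == 0) {
    return false;            // early return: the guard still clears the flag
  }
  // ... process the packet here ...
  return true;
}                            // flag is cleared here on the normal path too
```

The point of the guard is that adding a new early return later can't reintroduce the stuck flag.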

What I did see however was a LOT of failures to send or forward packets once the mesh got to about 20 nodes. ESP-Now includes an acknowledgement so you can tell if a packet has been received, but complete failure to send just doesn't happen at a low node count.

ESP-Now does not use broadcasts when you send packets to all a node's neighbours/peers, it iterates through them. There's no documentation on how this is done, so I'll need to get a WiFi sniffer out to work out the exact behaviour.

Regardless this means the number of packets in the air goes up with the square of the number of neighbours when neighbours forward packets to all their neighbours, even though each one only forwards it once. This makes it believable there would be the odd in-air collision, and this is reflected in some missing ACKs, but I was seeing far too many failures to send.
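To put rough numbers on that, here is a back-of-envelope calculation. It assumes every node can hear every other node, and that each neighbour forwards a flooded packet once to all of its own peers as separate unicasts. It ignores ACKs and retries, and the function names are made up for the example.

```cpp
#include <cstdint>

// Frames on air for one flooded packet when "send to all peers" is really
// one unicast per peer (assumed fully connected mesh).
constexpr uint32_t unicastFloodFrames(uint32_t nodes) {
  // The originator sends to (nodes - 1) neighbours, then each of those
  // neighbours forwards once to all (nodes - 1) of its own peers.
  return nodes == 0 ? 0 : (nodes - 1) + (nodes - 1) * (nodes - 1);
}

// The same flood if each node could send a single true broadcast frame.
constexpr uint32_t broadcastFloodFrames(uint32_t nodes) {
  return nodes;
}
```

Under those assumptions the 22 nodes on the desk come to 462 frames for a single flooded packet, against 22 if real broadcasts were used.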

I started coding in my own CSMA/CA-style re-transmission algorithm but had no luck reducing the failures, and there's no documentation about what a complete failure to send means.

Many hours of fiddling around got me nowhere until it dawned on me this was only happening with the smallest packets; larger ones were sent reliably even if they weren't always received. As the code for sending them is identical to the other packet types, in a frustrated random guess I increased the packet size and it cured the problem.

I can only assume the ESP-Now library does its own CSMA/CA-style re-transmission and this breaks down with small packet sizes. I'll have to change my code so it pads packets out to some minimum size. The minimum appears to be an ESP-Now payload of about 30 bytes: a payload of 18 bytes fails consistently once you've got 20 nodes.
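A minimal sketch of the padding idea, assuming a 30 byte floor taken from the observation above and zero bytes as filler. padPayload and the constant names are invented for the example. The 250 byte ceiling is ESP-Now's maximum payload.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed minimum, from the "about 30 bytes" observation; not a documented limit.
constexpr size_t kMinPayload = 30;
// Maximum ESP-Now payload per packet.
constexpr size_t kMaxPayload = 250;

// Copies len bytes of src into out and zero-pads up to kMinPayload.
// Returns the length to send, or 0 if the input or buffer is unusable.
size_t padPayload(const uint8_t* src, size_t len, uint8_t* out, size_t outSize) {
  if (src == nullptr || out == nullptr || len > kMaxPayload) {
    return 0;
  }
  const size_t sendLen = (len < kMinPayload) ? kMinPayload : len;
  if (outSize < sendLen) {
    return 0;                                 // caller's buffer is too small
  }
  std::memcpy(out, src, len);                 // real data first
  std::memset(out + len, 0, sendLen - len);   // filler up to the minimum
  return sendLen;
}
```

The receiver can't tell filler from data on its own, so the real length would need to travel in the packet header.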

With the boxed outdoor nodes I can add another ten to the test. Once I've done the code changes I'll add them in and see how things behave. This gets me very close to my target mesh size.

After this I need to check the routing algorithm again, as it seems to break down when there are lots of valid choices, flapping badly. Likewise the time sync protocol has got messy and is doing a poor job of syncing above about eight nodes. I have tried to make it discriminate and pick the sync packet that's traversed the fewest hops but clearly this is not working.
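For illustration only, one way the fewest-hops rule could be expressed is below. It assumes each sync packet carries a round number set by the time source and a hop count incremented on every forward. Those fields, and the names SyncPacket and SyncSelector, are assumptions for the sketch rather than the protocol as it stands.

```cpp
#include <cstdint>

// Hypothetical fields a time sync packet might carry.
struct SyncPacket {
  uint32_t round;    // sync round number set by the time source
  uint8_t  hops;     // incremented by each node that forwards the packet
  uint32_t timeMs;   // mesh time being distributed
};

class SyncSelector {
public:
  // Returns true if this packet should replace the current time reference.
  // Round number wrap-around is not handled in this sketch.
  bool accept(const SyncPacket& p) {
    const bool newerRound = !haveSync_ || p.round > round_;
    const bool fewerHops = haveSync_ && p.round == round_ && p.hops < hops_;
    if (!newerRound && !fewerHops) {
      return false;            // stale round, or no better than what we have
    }
    haveSync_ = true;
    round_ = p.round;
    hops_ = p.hops;
    timeMs_ = p.timeMs;
    return true;
  }
  uint8_t hops() const { return hops_; }
  uint32_t timeMs() const { return timeMs_; }

private:
  bool haveSync_ = false;
  uint32_t round_ = 0;
  uint8_t hops_ = 0;
  uint32_t timeMs_ = 0;
};
```

Within one round only a strictly shorter path displaces the current reference, so equal-hop copies arriving from different neighbours don't cause repeated re-syncs.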

This is all quite time consuming to test but I can feel progress being made and I want to ensure when I release the library it stands up to scrutiny.

Yesterday was a good day

I've been fiddling with my B.A.T.M.A.N.-inspired mesh network code over the last few weeks and finally reached the point where I couldn't put off turning it into an Arduino library any more.

Sometime I'll learn from my past mistakes and start from the position of writing a library, but this project was not that time.

Now I'm in a position where an Arduino sketch needs just four lines of code to be a functional mesh network node that routes traffic.

  • Import the library e.g. #include <EspNowMesh.h>
  • Declare an object e.g. EspNowMesh mesh;
  • In setup() start the mesh e.g. mesh.initEspNow();
  • In loop() keep it ticking over e.g. mesh.meshHousekeeping();

As my goal is to make this simple usable code other people can work with I'm happy with this. My last mesh network code was too byzantine to be usable, even once wrapped up as a library.

I still need to refactor things to match common Arduino style conventions.

For example "initEspNow()" should really be "begin()" and so on. Also the functions I have for putting data into packets and retrieving it are very much single purpose kludgey things rather than something I'd want to offer for real use.

Maybe I'll also change the name of the class completely, I'm not sure I like EspNowMesh as a name.

I've written Arduino libraries before but this was harder work. The big battle was around callback functions, which are sprinkled through the ESP8266 WiFi and ESP-Now libraries my library relies on.

Using class member functions as callbacks, or even worse, using class member functions as callbacks for C libraries is a real roadblock if your C++ skills are beginner level. When it's all a monolithic sketch this stuff 'just works'. Make it a class in a library and it's suddenly very broken. This is why I'm prone to just writing a flat Arduino sketch as a proof of concept and worrying about making it re-usable later.
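One common workaround, which may or may not be the one used in this library, is a static member function acting as a trampoline, plus a static pointer back to the object. In the sketch below fake_register_recv_cb and fake_deliver are stand-ins for a C library with a receive callback shaped like ESP-Now's. They, and the MeshNode class, are invented for the example.

```cpp
#include <cstdint>

// --- Stand-in for a C library (hypothetical, for illustration only) ---
typedef void (*recv_cb_t)(uint8_t* mac, uint8_t* data, uint8_t len);
static recv_cb_t registeredCb = nullptr;
void fake_register_recv_cb(recv_cb_t cb) { registeredCb = cb; }
// Simulates the library calling back when a packet arrives.
void fake_deliver(uint8_t* mac, uint8_t* data, uint8_t len) {
  if (registeredCb != nullptr) registeredCb(mac, data, len);
}

// --- The class that wants its member function called ---
class MeshNode {
public:
  void begin() {
    instance_ = this;    // remember which object the trampoline should call
    fake_register_recv_cb(&MeshNode::onReceiveStatic);
  }
  uint32_t packetsSeen() const { return packetsSeen_; }
  uint8_t lastLength() const { return lastLen_; }

private:
  // A static member has no hidden 'this', so it fits the plain C signature.
  static void onReceiveStatic(uint8_t* mac, uint8_t* data, uint8_t len) {
    if (instance_ != nullptr) instance_->onReceive(mac, data, len);
  }
  // The real handler, with full access to the object's state.
  void onReceive(uint8_t* mac, uint8_t* data, uint8_t len) {
    (void)mac;
    (void)data;
    ++packetsSeen_;
    lastLen_ = len;
  }
  static MeshNode* instance_;
  uint32_t packetsSeen_ = 0;
  uint8_t lastLen_ = 0;
};

MeshNode* MeshNode::instance_ = nullptr;
```

The obvious limitation is that only one object can receive callbacks at a time, which is usually acceptable for something like a single mesh object per sketch.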

I may do a couple of blog topics with the workarounds I did as other people may find them useful. Simple solutions did not jump off the page of a Google search.

Wowstick

Recently I treated myself to a Xiaomi Wowstick 1F+ as Banggood were doing big discounts around Black Friday.

These are a bit of a silly thing, a powered 'precision' screwdriver that's like a big fat pen.

Mostly they're very cute and come packaged like some kind of Apple product, all in separate little white boxes. They're the kind of thing that makes a cool gift for somebody who tinkers and even come with a carrying case that's like something from a sci-fi movie.

Notably the set I bought comes with many many high quality precision bits, perfect for taking modern consumer electronics to bits. I just used this to have the internal cover off my smartphone so I could replace the failing battery and it's already saved me from having to go and hunt a suitably small set of tools down.

There are a few variants available. I thought I was getting the one with a charging base but didn't realise this model was different, so check before you order.

Squeezing a quart into a pint pot

The first mainstream appearance of the ESP8266 was the ESP-01 board and it's a breadboard unfriendly horror.

It has eight pins and only two of those are nominally GPIO, but even these are compromised as they have to be pulled high to enable normal bootup. You pull GPIO0 low to program the board.

I've got twenty ESP-01s I want to use. Mostly because I've already got them but also because, barring some weird ESP8285 packages, they're about as small as these things get.

For my application I need a GPS module, Infrared receiver, pushbutton, status LED and piezo buzzer and I've achieved this with a little care over pin choice.

First, with the GPS module the ESP only needs to receive data. In principle you can send commands to the module but I don't need to and that saves the TX pin, GPIO1. It's not often you see this referred to but it's just a variant on the usual Arduino Serial configuration...

Serial.begin(9600, SERIAL_8N1, SERIAL_RX_ONLY);

Luckily the IR receiver's output idles high and only goes low when it sees an IR signal, so it can be pretty safely connected to GPIO0. In the unlikely event you're unlucky enough to receive an IR pulse at the very moment you power the board on you'll notice, as the board has a startup sequence involving the buzzer and LED.

For the pushbutton I've combined it with the LED on GPIO2. The GPIO has a 10K pullup resistor, the button connects the GPIO to ground and I check for a low pin state. This is a very conventional setup for a button.

To double this up with the LED, it's connected in parallel with its cathode (normally connected to ground) connected to the GPIO and the anode to Vcc. If you push the switch then it connects the cathode to ground and the LED lights. However if you reconfigure the GPIO as an output and drive it low the LED also lights.

You can't use the button while the LED is lit but some simple logic swapping it from an input to output as needed works around this.
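The input/output swapping can be sketched as a small class. To keep it buildable off the board the pin is abstracted, with FakePin as a host-side stand-in. On a real ESP-01 its three methods would wrap pinMode(), digitalWrite() and digitalRead() for GPIO2. All the names here are invented for the example.

```cpp
// Host-side stand-in for a GPIO with an external pullup and a button to ground.
struct FakePin {
  bool isOutput = false;     // true while the pin is driven low as an output
  bool buttonDown = false;   // true while the button shorts the pin to ground
  void makeInput() { isOutput = false; }
  void driveLow() { isOutput = true; }
  // The pin reads low if either the output or the button pulls it down.
  bool isLow() const { return isOutput || buttonDown; }
};

template <typename Pin>
class LedButton {
public:
  explicit LedButton(Pin& pin) : pin_(pin) { pin_.makeInput(); }

  // Output driven low: the LED's cathode is grounded and it lights.
  void ledOn() {
    lit_ = true;
    pin_.driveLow();
  }
  // Back to an input so the pullup wins. The pin is never driven high,
  // as a button press would then short the output to ground.
  void ledOff() {
    lit_ = false;
    pin_.makeInput();
  }
  bool ledIsOn() const { return lit_; }

  // Only meaningful while the LED is off and the pin is an input.
  bool buttonPressed() const { return !lit_ && pin_.isLow(); }

private:
  Pin& pin_;
  bool lit_ = false;
};
```

While the LED is lit buttonPressed() simply reports false, matching the limitation above, so any code that cares about the button has to turn the LED off first.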

Which leaves the piezo buzzer, and with the TX pin (GPIO1) saved earlier there are now enough free GPIOs to connect everything.

There's a lot of understandable negativity around ESP-01s in the community and I'm not sure I'd choose an ESP-01 afresh but you can use them in more than single use applications with some care.