I did this test back in February, but can now finally publish the results! This little SBC is
definitely going to be a hit in the ISP industry. See more information about it here.

PC Engines develops and sells small single board computers for networking
to a worldwide customer base. This article discusses a new/unreleased product which PC Engines has
developed, which has specific significance in the network operator community: an SBC which comes
with three RJ45/UTP based network ports, and one SFP optical port.

Due to the use of Intel i210-IS on the SFP port and i211-AT on the three copper ports, and due to
it having no moving parts (fans, hard disks, etc), this SBC is an excellent choice for network
appliances such as out-of-band or serial consoles in a datacenter, or routers in a small business
or home office.

Detailed findings

The APU series boards typically ship with 2GB or 4GB of DRAM,
2, 3 or 4 Intel i211-AT network interfaces, and a four core AMD GX-412TC (running at 1GHz). This
review is about an APU6 unit, which comes with 4GB of DRAM (my preproduction unit came with 2GB,
but that will be fixed in the production version), 3x i211-AT for the RJ45 network interfaces,
and one i210-IS with an SFP cage.

One other significant difference is visible – the trusty rusty DB9 connector that exposes the first
serial RS232 port is replaced with a modern CP2104 (USB vendor 10c4:ea60) from Silicon Labs which
exposes the serial port as TTL/serial on a micro USB connector rather than RS232, neat!

Transceiver Compatibility

Optics

The small form-factor pluggable (SFP) is a compact, hot-pluggable network interface module used for
both telecommunication and data communications applications. An SFP interface on networking hardware
is a modular slot for a media-specific transceiver in order to connect a fiber-optic cable or
sometimes a copper cable. Such a slot is typically called a cage.

The SFP port accepts most/any optics brand and configuration (copper, regular 850nm/1310nm/1550nm
based, BiDi as commonly used in FTTH deployments, CWDM for use behind an OADM). I tried 6 different
vendors and types, and all modules worked, regardless of vendor or brand.

The Details column in the table below points, for each module, to the output of an optical
diagnostics tool (using the SFF-8472 standard for SFP/SFP+ management).
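To give an idea of what such a diagnostics tool reads, here is a minimal sketch that decodes the real-time values from a dump of the SFF-8472 diagnostics page (I2C address A2h). It is not the tool I used: the function name `decode_dom` is made up for illustration, it assumes an internally calibrated module, and the input in the usage note is synthetic rather than taken from one of the modules in this review.

```python
import math
import struct

# Byte offset of the real-time diagnostics in the SFF-8472 A2h page.
# Assumption: the module is internally calibrated, so raw values can be
# scaled directly without applying calibration constants.
DOM_OFFSET = 96


def decode_dom(a2_page: bytes) -> dict:
    """Decode temperature, Vcc, TX bias and TX/RX power from an A2h dump."""
    if len(a2_page) < DOM_OFFSET + 10:
        raise ValueError("need at least 106 bytes of the A2h page")
    # Big-endian: one signed 16-bit temperature, then four unsigned 16-bit values.
    temp, vcc, bias, tx, rx = struct.unpack_from(">hHHHH", a2_page, DOM_OFFSET)

    def to_dbm(raw: int) -> float:
        # Optical power is reported in units of 0.1 uW; convert to mW, then dBm.
        milliwatt = raw / 10000.0
        return 10 * math.log10(milliwatt) if milliwatt > 0 else float("-inf")

    return {
        "temperature_c": temp / 256.0,    # LSB is 1/256 degree Celsius
        "vcc_v": vcc / 10000.0,           # LSB is 100 uV
        "tx_bias_ma": bias * 2 / 1000.0,  # LSB is 2 uA
        "tx_power_dbm": to_dbm(tx),
        "rx_power_dbm": to_dbm(rx),
    }
```

Feeding it a 256 byte page with raw values `0x1E80, 33000, 3000, 10000, 5000` packed at offset 96 yields 30.5°C, 3.3V, 6.0mA bias, 0dBm transmit and roughly -3dBm receive power. A module without DOM (like the generic one in the table) simply has no such page to read.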

Each module provided link and passed traffic. The loadtest below was done with the BiDi optics
in one interface and a boring RJ45 copper cable in another. It’s going to be fantastic to be able
to use these APU6s in a datacenter setting as remote / out-of-band serial devices, especially
nowadays, as UTP is becoming scarce and everybody has fiber infrastructure in their racks.

| Vendor | Type | Description | Details |
| --- | --- | --- | --- |
| Finisar | FTLF8519P2BNL-RB | 850nm duplex | sfp0.txt |
| Generic | Unknown (no DOM) | 850nm duplex | sfp1.txt |
| Cisco | GLC-LH-SMD | 1310nm duplex | sfp2.txt |
| Cisco | SFP-GE-BX-D | 1490nm Bidirectional (FTTH CPE) | sfp3.txt |
| Cisco | SFP-GE-BX-U | 1310nm Bidirectional (FTTH COR) | sfp3.txt |
| Cisco | BT-OC24-20A | 1550nm OC24 SDH | sfp4.txt |
| Finisar | FTRJ1319P1BTL-C7 | 1310nm 20km (w/ 6dB attenuator) | sfp5.txt |

Network Loadtest

The choice of Intel i210/i211 network controller on this board allows operators to use Intel’s
DPDK with relatively high performance, compared to regular (kernel) based routing. I loadtested
Linux (Ubuntu 20.04), OpenBSD (6.8), and two lesser known but way cooler DPDK open source
appliances called Danos (ref) and VPP (ref).

It is specifically worth calling out that while Linux and OpenBSD struggled, both DPDK appliances
had absolutely no problems filling a bidirectional gigabit stream of “regular internet traffic”
(referred to as imix), and came close to line rate with “64b UDP packets” (64 byte packets). The
line rate of gigabit ethernet with 64 byte packets is 1.48Mpps in one direction, and my loadtests
stressed both directions simultaneously.
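The 1.48Mpps figure follows from the fact that every ethernet frame occupies 20 extra bytes on the wire (7 bytes preamble, 1 byte start-of-frame delimiter and 12 bytes inter-frame gap). A quick sketch of that arithmetic, with a helper name (`line_rate_pps`) I made up for this example:

```python
# Per-frame overhead on the wire: preamble (7) + start-of-frame delimiter (1)
# + inter-frame gap (12), in bytes.
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Maximum packets/sec in one direction for a given ethernet frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    for size in (64, 1518):
        pps = line_rate_pps(1e9, size)
        print(f"{size:>5} byte frames: {pps:,.0f} pps per direction")
```

For 64 byte frames on a gigabit link this works out to about 1.488 million packets/sec per direction, so a bidirectional loadtest asks the device to forward roughly twice that.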

Methodology

For the loadtests, I used Cisco’s T-Rex (ref) in stateless mode,
with a custom Python controller that ramps traffic up and down from the loadtester to the device
under test (DUT). It sends traffic out of port0 to the DUT and expects that traffic to be
presented back from the DUT on port1, and vice versa (out from port1 -> DUT -> back
in on port0). The loadtester first sends a few seconds of warmup, to ensure the DUT is
passing traffic and to offer the ability to inspect the traffic before the actual rampup. Then
the loadtester ramps up linearly from zero to 100% of line rate (in our case, line rate is
one gigabit in both directions), and finally it holds the traffic at full line rate for a certain
duration. If at any time the loadtester fails to see the traffic it’s emitting return on its
second port, it flags the DUT as saturated, and the rate at that point is noted as the maximum
bits/second and/or packets/second.
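The warmup / rampup / hold sequence described above can be sketched as a simple function of time. This is not the actual controller code: the function names are made up, the default durations mirror the defaults in the usage text below, and the 0.1% loss tolerance in `is_saturated` is an assumption of mine purely for illustration.

```python
from typing import Optional, Tuple


def offered_load(t: float, warmup_duration: float = 30, rampup_target: float = 100.0,
                 rampup_duration: float = 600, hold_duration: float = 30
                 ) -> Tuple[str, Optional[float]]:
    """Return (phase, percent of line rate) at t seconds into the loadtest.

    During warmup a fixed low rate is sent instead of a percentage,
    which is signalled here by returning None.
    """
    if t < warmup_duration:
        return ("warmup", None)
    elapsed = t - warmup_duration
    if elapsed < rampup_duration:
        # Linear ramp from zero to the target percentage.
        return ("rampup", rampup_target * elapsed / rampup_duration)
    if elapsed < rampup_duration + hold_duration:
        return ("hold", rampup_target)
    return ("done", 0.0)


def is_saturated(tx_pps: float, rx_pps: float, tolerance: float = 0.001) -> bool:
    """Flag the DUT as saturated when less traffic returns than was sent.

    The tolerance (fraction of allowed loss) is a hypothetical parameter.
    """
    return rx_pps < tx_pps * (1.0 - tolerance)
```

With the defaults, `offered_load(330)` is halfway up the ramp at 50% of line rate, and the first moment `is_saturated()` returns true is the point at which the DUT’s maximum would be recorded.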

usage: trex-loadtest.bin [-h] [-s SERVER] [-p PROFILE_FILE] [-o OUTPUT_FILE] [-wm WARMUP_MULT]
                         [-wd WARMUP_DURATION] [-rt RAMPUP_TARGET]
                         [-rd RAMPUP_DURATION] [-hd HOLD_DURATION]

T-Rex Stateless Loadtester -- pim@ipng.nl

optional arguments:
  -h, --help            show this help message and exit
  -s SERVER, --server SERVER
                        Remote trex address (default: 127.0.0.1)
  -p PROFILE_FILE, --profile PROFILE_FILE
                        STL profile file to replay (default: imix.py)
  -o OUTPUT_FILE, --output OUTPUT_FILE
                        File to write results into, use "-" for stdout (default: -)
  -wm WARMUP_MULT, --warmup_mult WARMUP_MULT
                        During warmup, send this "mult" (default: 1kpps)
  -wd WARMUP_DURATION, --warmup_duration WARMUP_DURATION
                        Duration of warmup, in seconds (default: 30)
  -rt RAMPUP_TARGET, --rampup_target RAMPUP_TARGET
                        Target percentage of line rate to ramp up to (default: 100)
  -rd RAMPUP_DURATION, --rampup_duration RAMPUP_DURATION
                        Time to take to ramp up to target percentage of line rate, in seconds (default: 600)
  -hd HOLD_DURATION, --hold_duration HOLD_DURATION
                        Time to hold the loadtest at target percentage, in seconds (default: 30)
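For readers who want to build a similar controller, a command line like the one above could be defined with Python’s `argparse` roughly as follows. This is a reconstruction from the help text, not the original source: the argument types (integers for the durations and the target) are my assumption, and `build_parser` is just a name picked for this sketch.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same flags and defaults as the usage text."""
    p = argparse.ArgumentParser(
        prog="trex-loadtest.bin",
        description="T-Rex Stateless Loadtester",
        # Appends "(default: ...)" to every help string.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    p.add_argument("-s", "--server", default="127.0.0.1",
                   help="Remote trex address")
    p.add_argument("-p", "--profile", dest="profile_file", default="imix.py",
                   help="STL profile file to replay")
    p.add_argument("-o", "--output", dest="output_file", default="-",
                   help='File to write results into, use "-" for stdout')
    p.add_argument("-wm", "--warmup_mult", default="1kpps",
                   help='During warmup, send this "mult"')
    p.add_argument("-wd", "--warmup_duration", type=int, default=30,
                   help="Duration of warmup, in seconds")
    p.add_argument("-rt", "--rampup_target", type=int, default=100,
                   help="Target percentage of line rate to ramp up to")
    p.add_argument("-rd", "--rampup_duration", type=int, default=600,
                   help="Time to take to ramp up to target percentage of line rate, in seconds")
    p.add_argument("-hd", "--hold_duration", type=int, default=30,
                   help="Time to hold the loadtest at target percentage, in seconds")
    return p


# Example: a shorter ramp with a higher warmup rate, other flags at their defaults.
args = build_parser().parse_args(["-rd", "60", "-wm", "10kpps"])
```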

It’s worth pointing out that almost all systems are pps-bound, not bps-bound. A typical rant
I have is that network vendors are imprecise when they specify their throughput: with “up to
40Gbit” they more often than not mean “under carefully crafted conditions”. One such condition
is utilizing jumboframes (9216 bytes rather than the “usual” 1500 byte MTU found on ethernet),
which is easier on the router than a typical internet mixture (closer to 1100 bytes), and much
easier yet than if the router is asked to forward 64 byte packets, for instance in a DDoS attack.
Others are testing in only one direction, and using exactly one source/destination IP
address/port, which is a little bit easier to do than to look up a destination in a forwarding
table containing 1M destinations. For context, a current internet backbone router carries ~845K
IPv4 destinations and ~105K IPv6 destinations.

Results

For more information on the methodology and the scripts that drew these graphs, take a look
at my buddy Michal’s GitHub Page, which, given
time, will probably turn into its own subsection of this website (I can only imagine the value
of a corpus of loadtests of popular equipment in the consumer arena).

Caveats

The unit was shipped to me free of charge by PC Engines for the purposes of load- and systems
integration testing. Other than that, this is not a paid endorsement and the views in this review
are my own.

Open Questions

SFP I2C

Considering the target audience, I wonder if there is a possibility to break out the I2C pins from
the SFP cage into a header on the board, so that users can connect them through to the CPU’s I2C
controller (or bitbang directly on GPIO pins), and use the APU6 as an SFP flasher. I think that
would come in incredibly handy in a datacenter setting.

CPU bound

The DPDK based router implementations are CPU bound, and could benefit from a little bit more power.
I am duly impressed by the throughput seen in terms of packets/sec/watt, but considering a typical
router has a (forwarding) dataplane and also needs a (configuration) controlplane, we are short
about 30% CPU cycles. If a controlplane (like Bird or FRR (ref)) is dedicated
one core, that leaves us three cores for forwarding, with which we obtain roughly 154% of linerate.
Line rate in both directions is 200%, so we’d need 200/154 ≈ 1.30 times the forwarding capacity
to get there. That said, the APU6 has absolutely no problems saturating a gigabit in both
directions under normal (==imix) circumstances.
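The back-of-the-envelope calculation behind the “about 30%” can be written down as follows. The helper name is made up, and the estimate of how many forwarding cores would be needed assumes throughput scales linearly with core count, which I did not measure.

```python
import math
from typing import Tuple


def forwarding_headroom(achieved_pct: float, target_pct: float = 200.0,
                        cores: int = 3) -> Tuple[float, float, int]:
    """Return (capacity factor needed, percent per core, cores needed).

    achieved_pct: forwarding throughput with `cores` cores, where 200%
                  means line rate in both directions.
    Assumption: throughput scales linearly with the number of cores.
    """
    factor = target_pct / achieved_pct
    per_core = achieved_pct / cores
    cores_needed = math.ceil(target_pct / per_core)
    return factor, per_core, cores_needed


factor, per_core, cores_needed = forwarding_headroom(154.0)
print(f"need {factor:.2f}x capacity, {per_core:.1f}% per core, "
      f"{cores_needed} forwarding cores")
```

In other words, under that linear scaling assumption a fourth forwarding core would be enough, but then there is no core left for the controlplane.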

Appendix 1 – Terminology

| Term | Description |
| --- | --- |
| OADM | optical add drop multiplexer – a device used in wavelength-division multiplexing systems for multiplexing and routing different channels of light into or out of a single mode fiber (SMF) |
| ONT | optical network terminal – the ONT converts fiber-optic light signals to copper based electric signals, usually Ethernet. |
| OTO | optical telecommunication outlet – the OTO is a fiber optic outlet that allows easy termination of cables in an office and home environment. Installed OTOs are referred to by their OTO-ID. |
| CARP | common address redundancy protocol – its purpose is to allow multiple hosts on the same network segment to share an IP address. CARP is a secure, free alternative to the Virtual Router Redundancy Protocol (VRRP) and the Hot Standby Router Protocol (HSRP). |
| SIT | simple internet transition – its purpose is to interconnect isolated IPv6 networks, located in the global IPv4 Internet, via tunnels. |
| STB | set top box – a device that enables a television set to become a user interface to the Internet and also enables a television set to receive and decode digital television (DTV) broadcasts. |
| GRE | generic routing encapsulation – a tunneling protocol developed by Cisco Systems that can encapsulate a wide variety of network layer protocols inside virtual point-to-point links over an Internet Protocol network. |
| L2VPN | layer2 virtual private network – a service that emulates a switched Ethernet (V)LAN across a pseudo-wire (typically an IP tunnel) |
| DHCP | dynamic host configuration protocol – an IPv4 network protocol that enables a server to automatically assign an IP address to a computer from a defined range of numbers. |
| DHCP6-PD | dynamic host configuration protocol: prefix delegation – an IPv6 network protocol that enables a server to automatically assign network prefixes to a customer from a defined range of numbers. |
| NDP NS/NA | neighbor discovery protocol: neighbor solicitation / advertisement – an IPv6 specific protocol to discover and judge reachability of other nodes on a shared link. |
| NDP RS/RA | neighbor discovery protocol: router solicitation / advertisement – an IPv6 specific protocol to discover and install local address and gateway information. |
| SBC | single board computer – a computer with all peripherals and components directly attached to the board. |