Automated Internet Speedtests for Distributed Networks

by Stuart Vassey and Colleen Hutson

The Problem

Have you ever been curious if you’re getting the Internet speeds you pay for? Do users complain that their internet connection is slow? Do popular browser and app-based speed tests really provide accurate measurements? We wanted to answer each of these questions.

With restaurant-based technology growing rapidly at Chick-fil-A, the distributed networks that feed our systems have become critical.

Availability monitoring is possible with a ping-based approach, but that doesn’t tell the whole story. Network devices provide additional SNMP metrics with glimpses into packet loss, latency, and real-time throughput, but this data doesn’t confirm maximum available bandwidth. In this post, we’ll describe how we monitor circuit capacity at over 2,000 locations daily.

Solution Requirements

As we set out to find a solution, there were some requirements:

  • No browser-based tests, which might fail to simulate real-world scenarios
  • No third-party software installs due to security risk
  • Minimal impact to business operations
  • Dynamic execution parameters for slow and fast circuits
  • Capable of saturating 100+ Mbps fiber circuits
  • Capable of running at 2000+ sites simultaneously
  • Results must be viewable in a central location
  • Keep costs low by using existing tools and infrastructure

The Chick-fil-A Solution

Websites like speedof.me provide decent testing capabilities, but we wanted an automated solution that didn’t rely on a browser-based test. Software-based options are available, but we weren’t comfortable installing third-party software in our restaurant ecosystem. Instead, we built our own custom solution using a combination of AWS S3, a Content Delivery Network (CDN), PowerShell, Windows Management Instrumentation (WMI), Windows scheduled tasks, and LogicMonitor.

Rather than deploy additional testing infrastructure, we rely on our existing CDN and AWS S3 services to provide robust download and upload capabilities. Files ranging from 1.5MB up to 100MB are available to download through local POPs near each of our 2000+ restaurant locations. Once the files are downloaded, they’re uploaded back to AWS S3. We schedule a job to clear the uploaded files daily.

Each restaurant has a Windows machine capable of running scheduled tasks, so we developed a PowerShell script that runs nightly with minimal risk of impacting our business operations. For 15 seconds, the script downloads and saves as much content from the CDN as possible, and then it uploads the same files back to AWS S3 for another 15 seconds. The timed results of these requests are averaged to produce estimated download and upload speeds. We found that it’s important to use appropriate file sizes to produce accurate results: big files downloaded on a DSL circuit might fail completely and bog down the network, while small files on a fiber network can lead to inaccurate results. We dynamically increase the size of our download and upload files based on the network circuit’s capacity.
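
The full script at the end of this post uses a fixed list of file sizes; the dynamic scaling described above can be layered on top by choosing that list from a previous run’s result. Below is a minimal sketch of that idea, assuming a hypothetical $lastDownloadMbps value (for example, read back from the prior night’s WMI result); the thresholds and lists are illustrative, not our production values, and if the list length varies the loop bound in the script would use $downloadPaths.Count instead of a fixed 10.

# Hypothetical sketch only: pick the file-size list from the last measured download speed (Mbps)
$lastDownloadMbps = 25   # illustrative value; e.g. read from the previous night's result
If ($lastDownloadMbps -lt 10) {
    # DSL-class circuit: keep files small so the test finishes and does not swamp the link
    $downloadPaths = "1_5mb", "2_5mb", "5mb", "10mb"
} ElseIf ($lastDownloadMbps -lt 50) {
    $downloadPaths = "5mb", "10mb", "20mb", "25mb", "40mb"
} Else {
    # Fiber-class circuit: large files are needed to saturate the link within 15 seconds
    $downloadPaths = "20mb", "25mb", "40mb", "50mb", "75mb", "100mb"
}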

Once the Internet speeds have been calculated, we store the results and test status in a custom WMI namespace, which we query using an infrastructure monitoring platform, LogicMonitor.
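
For reference, the published values can also be spot-checked by hand on the restaurant machine with a standard WMI query. This is a minimal sketch that assumes the same placeholder namespace and class names used in the script at the end of this post:

# Read back the published speedtest results (placeholders match the script below)
$ns    = "root\cimv2\<CUSTOM WMI NAMESPACE>"
$class = "<CUSTOM WMI CLASS>"
Get-WmiObject -Namespace $ns -Class $class | Select-Object Name, Download, Upload, SpeedTestFail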

From LogicMonitor, we’re able to:

  • Identify stores with the fastest/slowest circuits
  • Show long term chain-wide trends from circuit upgrades
  • Estimate circuit utilization by dividing interface throughput by circuit speed

Challenges We Faced

  • PowerShell Overhead - At first, we were seeing unexpectedly slow speeds, which we attributed to slow hardware. Eventually we figured out that PowerShell needed to be optimized to run our I/O-intensive script efficiently. Working with Microsoft Support, we identified the visual progress bar as a major drag, even with the script running as a background task. Setting $progressPreference = 'silentlyContinue' resolved this issue and increased throughput tenfold at some stores.
  • Dynamic Test Scaling - A one-size-fits-all test wasn’t good at maximizing fast networks without bogging down slower circuits. We built some logic into our script to dynamically scale file sizes to provide a more flexible testing method.
  • Recording Results – By storing test results to a custom WMI namespace, we can easily query for the data using our monitoring platform. We hadn’t worked with custom WMI namespaces before, so this took some trial and error. You can see our approach in the full script at the end of this post.
  • Upload Site - Not many places want to receive 19GB of throw-away uploads daily. We chose AWS S3 and set up a task to regularly clear these files; a minimal sketch of such a cleanup job follows this list.
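
We haven’t published the exact cleanup mechanism, so the following is only a sketch of one way to do it: a small scheduled job that deletes the speedtest uploads with the AWS CLI (an S3 lifecycle expiration rule would work just as well). The bucket name is a placeholder, and the sketch assumes the AWS CLI is installed and credentialed on the machine running it.

# Hypothetical cleanup job: delete the throw-away speedtest uploads from the bucket
# <BUCKET NAME> is a placeholder; the uploaded keys end in "mb.txt" per the script below
$bucket = "<BUCKET NAME>"
aws s3 rm "s3://$bucket/" --recursive --exclude "*" --include "*mb.txt"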

Results & Conclusion

With an automated speedtest solution, we’ve created a quick and easy way to understand our restaurants’ real-world network performance. Here are a few ways we’ve used this data:

  • Support staff no longer have to remote connect to stores to run browser-based speedtests.
  • When users report slow Internet speeds, support staff can compare user experiences with historical results.
  • We can prioritize circuit upgrades for locations with slow connections.
  • We have independent and unbiased speed ratings.

In the future, we will likely migrate the speedtest task to our IoT/Edge infrastructure with some minor modifications.

Hopefully our guide helps you design your own automated speedtest solution that provides new insights for your business!

# Authors: Colleen Hutson, Geoffrey Cole, and Stuart Vassey
# -------------------------------------------------------------------------------------------------
# Purpose:
# - Script is designed to test the down/up rate of a particular device
# Process:
# - Download speed is determined by downloading files increasing in size from CDN
# - Upload speed is determined by uploading files increasing in size to an S3 bucket
# - Final up/down speed rates are saved to custom WMI paths
# -------------------------------------------------------------------------------------------------
# ------------------------------------------------ #
# Script Variables
# ------------------------------------------------ #
# Calculated download speed
$_avgDownload = 0
# Calculated upload speed
$_avgUpload = 0
# CDN address
$_cdnAddress = <CDN ADDRESS>
# S3 Bucket URL
$s3EndPoint = <S3 BUCKET ENDPOINT>
# Local file path to save downloads to
$_localDownloadPath = <LOCAL PATH TO SAVE DOWNLOADS TO>
# Track up/download failed (1=failed 0=success)
$_spFailureTrack = 0
# WMI Namespace
$_wmiNamespace = <CUSTOM WMI NAMESPACE>
# WMI Class
$_wmiClass = <CUSTOM WMI CLASS>
# Silence the up/download progress bar so as not to throttle the up/download process
$progressPreference = 'silentlyContinue'
# ------------------------------------------------ #
# First calculate the average download speed
# Download is calculated first because the files that are downloaded will be uploaded to calculate upload speed
# For Loop Logic Explanation:
# - 10 files of known size located at the CDN endpoint
# - Download, in increasing file size order, all 10 files or as many as possible in 15 seconds
# ------------------------------------------------ #
# File path endings
$downloadPaths = "1_5mb", "2_5mb", "5mb", "10mb", "20mb", "25mb", "40mb", "50mb", "75mb", "100mb"
# Creates the file path for downloaded objects if it does not exist.
If (-not(Test-Path -path $_localDownloadPath)) {
New-Item $_localDownloadPath -type directory -ErrorAction SilentlyContinue | Out-Null
}
# Create lists to store download times and downloaded file sizes
[System.Collections.ArrayList]$downloadTimes = @()
[System.Collections.ArrayList]$downloadSize = @()
try {
$index = 0
# While less than 15 seconds have elapsed and files are available, download a file of greater size than the preceding one
For ($startTime = Get-Date; ((Get-Date)-$startTime).TotalSeconds -le 15 -and $index -lt 10; $index++) {
# File download start time
$downStart = Get-Date
# Download file
Invoke-WebRequest $_cdnAddress/$($downloadPaths[$index]) -OutFile "$_localDownloadPath\$($downloadPaths[$index])"
# Calculate time to download file and save to list
$downloadTimes.Add(((Get-Date)-$downStart).TotalSeconds)
# Calculate size of file downloaded and save to list
$downloadSize.Add((Get-Item -Path $_localDownloadPath\$($downloadPaths[$index])).Length/1024/1024)
}
# Calculate average download speed
$downloadSum = 0
for ($i=0; $i -lt $downloadTimes.Count; $i++) {
$downloadSum = $downloadSum + ($downloadSize[$i] / $downloadTimes[$i] * 8)
}
$_avgDownload = $downloadSum / $downloadTimes.Count
}
catch {
$_spFailureTrack = 1
}
# ------------------------------------------------ #
# Second calculate the average upload speed
# For Loop Logic Explanation:
# - Use files that were downloaded for calculating the average download speed
# - Upload, in increasing file size order, all files that were downloaded or as many as possible in 15 seconds
# ------------------------------------------------ #
try {
# Get files from SpeedTest folder and sort from smallest to largest file
$uploadPaths = Get-ChildItem $_localDownloadPath\*mb | Sort-Object -Property Length
# Create lists to store upload times and uploaded file sizes
[System.Collections.ArrayList]$uploadSize = @()
[System.Collections.ArrayList]$uploadTimes = @()

$index = 0
# While less than 15 seconds have elapsed and files are available, upload a file of greater size than the preceding one
for ($startTime = Get-Date; ((Get-Date)-$startTime).TotalSeconds -le 15 -and $index -lt $uploadPaths.Count; $index++) {
# Start time for upload
$upStart = Get-Date
# Create S3 URL and upload file
$s3URL = "$s3EndPoint/$env:COMPUTERNAME/$($uploadPaths[$index].Name).txt"
Invoke-RestMethod -Uri $s3URL -Method Put -InFile $uploadPaths[$index].FullName
# Calculate time to upload file and save to list
$uploadTimes.Add(((Get-Date)-$upStart).TotalSeconds)
$uploadSize.Add($uploadPaths[$index].Length/1024/1024)
}

# Calculate average upload speed
$uploadSum = 0
for ($i=0; $i -lt $uploadTimes.Count; $i++) {
$uploadSum = $uploadSum + ($uploadSize[$i] / $uploadTimes[$i] * 8)
}
$_avgUpload = $uploadSum / $uploadTimes.Count
}
catch {
$_spFailureTrack = 1
}
# ------------------------------------------------ #
# Third save the up/down speeds to custom WMI Objects
# ------------------------------------------------ #
# Check if wmi namespace exists and create if not
If (Get-WmiObject -Namespace "root\cimv2" -Class "__NAMESPACE" | Where-Object { $_.Name -eq $_wmiNamespace }) {
WRITE-HOST "The root\cimv2\$_wmiNamespace WMI namespace exists."
} Else {
$wmi = [wmiclass]"root\cimv2:__Namespace"
$newNamespace = $wmi.createInstance()
$newNamespace.name = $_wmiNamespace
$newNamespace.put()
}
# WMI Class
If (Get-WmiObject -List -Namespace "root\cimv2\$_wmiNamespace" | Where-Object {$_.Name -eq $_wmiClass})
{
# The class exists, Let's clean it up.
$GetExistingInstances = Get-WmiObject -Namespace "root\cimv2\$_wmiNamespace" -Class $_wmiClass
If ($GetExistingInstances -ne $Null)
{
Remove-WMIObject -Namespace "root\cimv2\$_wmiNamespace" -Class $_wmiClass
}
}
# Publish WMI Properties
$_subClass = New-Object System.Management.ManagementClass ("root\cimv2\$_wmiNamespace", [String]::Empty, $null);
$_subClass["__CLASS"] = $_wmiClass;
$_subClass.Qualifiers.Add("Static", $true)
$_subClass.Properties.Add("Name", [System.Management.CimType]::String, $false)
$_subClass.Properties["Name"].Qualifiers.Add("Key", $true) #A key qualifier must be defined to execute 'Put' command.
$_subClass.Properties.Add("Upload", [System.Management.CimType]::Real64, $false)
$_subClass.Properties.Add("Download", [System.Management.CimType]::Real64, $false)
$_subClass.Properties.Add("SpeedTestFail", [System.Management.CimType]::Real64, $false)
$_subClass.Put()
$keyvalue = "Bandwidth"
$Filter = 'Name= "' + $keyvalue + '"'
$Inst = Get-WmiObject -Class $_wmiClass -Filter $Filter -Namespace root\cimv2\$_wmiNamespace
$Inst | Remove-WMIObject
$WMIURL = 'root\cimv2\'+$_wmiNamespace+':'+$_wmiClass
$PushDataToWMI = ([wmiclass]$WMIURL).CreateInstance()
$PushDataToWMI.Name = $keyvalue
$PushDataToWMI.Upload = $_avgUpload
$PushDataToWMI.Download = $_avgDownload
$PushDataToWMI.SpeedTestFail = $_spFailureTrack
$PushDataToWMI.Put()
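
Finally, the script is meant to run nightly as a Windows scheduled task. The snippet below is a minimal sketch of how such a task could be registered; the script path, task name, and 2 AM run time are assumptions rather than our production values (the ScheduledTasks cmdlets ship with Windows 8 / Server 2012 and later).

# Hypothetical registration of the speedtest as a nightly scheduled task (illustrative values)
$action  = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-NoProfile -ExecutionPolicy Bypass -File C:\Scripts\SpeedTest.ps1"
$trigger = New-ScheduledTaskTrigger -Daily -At 2am
Register-ScheduledTask -TaskName "NightlySpeedTest" -Action $action -Trigger $trigger -User "SYSTEM" -RunLevel Highest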

Exclave: Hardware Testing in Mass Production, Made Easier

Reputable factories will test 100% of every product shipped. For example, the computer or phone you’re using to read this has had a plug inserted in every connector, along with dozens of internal and external tests run to confirm everything from the correct operation of the CPU to the proper function of the buttons.


A test station at a motherboard factory (2x speed). Every port and connector gets tested.

Even highly automated processes can yield defective units: entropy happens, and constant vigilance is required to guard against it. Even a very stable manufacturing process with a raw defect rate of around 1% is considered unacceptable by any reputable brand. This is one of the elephants in the digital fabrication room – just because a tool is digital doesn’t mean it will fabricate things perfectly with a push of the button. Every tool needs maintenance, and more often than not a skilled operator is required to inspect the final product and polish over rough edges.

To better grasp the magnitude of the factory test problem, consider the software that’s loaded on your computer. How did it get in there? Devices come out of the silicon foundry mostly blank. They typically don’t even have the innate knowledge to traverse a filesystem, much less connect to the Internet to download an update. Yet everyone has had the experience of waiting for an update to download and install. Factories must orchestrate a much more time-consuming and complicated process to bootstrap every device made, in order for you to enjoy the privilege of connecting to the Internet to download updates.

One might think, “surely, there must be a standardized way for handling this”.

Shockingly, there isn’t.

How Not To Test a Product

Unfortunately, first-time product makers often make the assumption that either products don’t require 100% testing (because the boards are assembled by robots, and robots don’t make mistakes, right?), or there is some otherwise standardized way to handle the initial firmware upload. Once upon a time, I was called upon to intervene on a factory test for an Arduino-derivative product, where the original test specification was literally “plug the device into the USB port of [your] laptop, and type in this AVRDUDE command to load code, and then type in another AVRDUDE command to set the fuses, and then use a multimeter to check the voltages on these two test points”. The test documentation was literally two photographs of the laptop screen and a paragraph of text. The product’s designer argued to the factory that this was sufficient because it’s really quick and reliable: he does it in under two minutes, how could any competent factory that handles products with AVR chips not have heard of AVRDUDE, and besides he encountered no defects in the half dozen prototypes he produced by hand. This is in addition to an over-arching attitude of “whatever, I’m the smart guy who comes up with the ideas, just get your minimum-wage Chinese laborers to stop messing them up”.

The reality is that asking someone to manually run commands from a shell and read a meter for hours on end while expecting zero defects is neither humane nor practical. Furthermore, assuming the ability and judgment to run command line scripts isn’t realistic; testing is time-consuming, and thus often the least-skilled, lowest wage laborers are employed for the process. Ironically, there is no correlation between the skills required to assemble a computer, and the skills required to operate a computer. Thus, in order for the factory to meet the product designer’s expectation of low labor cost with simultaneously high quality, it’s up to the product designer to come up with an automated, fool-proof test jig.

Introducing the Test Jig: The Product Behind the Product

“Test jig” is a generic term for any tool designed to assist with production testing. However, there is a basic format for a test jig chassis, and demand for test jig chassis is so high in places like Shenzhen that entire cottage industries have sprung up to support the demand. Most circuit board test jigs look a bit like this:


Above: NeTV2 circuit board test jig

And the short video below highlights the spring-loaded pogo pins of the test jig, along with how a circuit board is inserted into a test jig and clamped in place for testing.


Above: Inserting an NeTV2 PCB into its test jig.

As you can see in the video, the circuit board is placed into a precision-milled platter that moves along spring-loaded rails, allowing the board to engage with pogo-pin style test points underneath. As test points consume precious space on the circuit board, the overall mechanical accuracy of the system has to be better than +/-1mm once all tolerances are considered over thousands of cycles of wear and tear, in order to keep the test points a reasonable size (under 2mm in diameter).

The specific test jig shown above measures 12 separate DC voltages, performs a basic JTAG ID code check on the FPGA, loads firmware, and tests the on-board DRAM, all in under 20 seconds. It’s the preliminary “fast test” of the NeTV2 product, meant to screen out gross solder faults; it provides an estimated coverage of about 80% of the solder joints on the PCB. The remaining 20% of the solder joints belong principally to connectors, which require a much more labor-intensive manual test to check.

Here’s a look inside the test jig:

If it looks complicated, that’s because it is. Test jig complexity is correlated with product complexity, which is why I like to say the test jig is the “product behind the product”. In some cases, a product designer may spend even more time designing a test jig than they spend designing the product itself. There’s a very large space of problems to consider when implementing a test jig, ranging from test coverage to operator fatigue, and of course throughput and reliability.

Here’s a list of the basic issues to consider when designing a test jig:

  • Coverage: How to test every single feature?
  • UX: Who is interpreting your test data? How to internationalize the UI by using symbols and colors instead of text, and how to minimize operator fatigue?
  • Automation: What’s the quickest way to set up and tear down tests? How to avoid relying on human judgment?
  • Audit & traceability: How do you enforce testing standards? How to incorporate logging and coupons to facilitate material traceability?
  • Updates: What do you do when the tester needs a patch or update? How do you keep the test program in lock-step with the current firmware release?
  • Responsibility: Who is responsible for product quality? How do you create a natural incentive to design-for-test from the very first product sketch?
  • Code Structure: How do you maintain the tester’s code base? It’s tempting to think that test jig code should be write-once, since it’s going into a single device with a limited user base. However, the reality of production is rarely so simple, and it pays to structure your code base so that it’s self-checking, modular, reconfigurable, and reliable.

Each of these bullet points is an aspect of test jig design that I have learned from the school of hard knocks.

Read on, and avoid my mistakes.

Coverage

Ideally, a tester should cover 100% of the features of a product. But what, exactly, constitutes a feature? I once designed a product called the Chumby One, and I also designed its test procedure. I tried my best to cover all of its features, but I missed one: the power button. It seemed simple enough – just a switch, what could go wrong? It turns out that over the course of production, the tolerance between the mechanical switch pusher and the electrical switch mechanism had drifted to the point where pushing on the cap would not contact the electrical switch itself, leading to a cohort of returns from that production lot.

Even the simplest of mechanisms is a feature that needs to be tested.

Since that experience, I’ve adopted an “inside/outside” methodology to derive the test feature list. First, I look “inside” the product, going through the schematic and picking key features for testing. The priority is to check for solder faults as quickly as possible, based on the assumption that the constituent components are 100% pre-tested and reliable. Then, I look at the product from the “outside”, as a consumer might approach it. First, I look at the marketing brochure and see what was promised: “world class WiFi performance” demands a different level of test from “product has WiFi”. Then, I try to imagine all the ways a customer might interact with the product – such as pressing the power button – and add those points to the test list. This means every connector needs to have something stuffed in it, every switch must be pressed, and every indicator light must be checked.


Red arrow calls out the mechanical switch pusher that drifted out of tolerance with the corresponding electrical switch

UX

Test jig UX can have a large impact on test throughput and reliability; test operators are human, and like all humans are susceptible to fatigue and boredom. A startup I worked with once told me a story of how a simple UX change drastically improved test throughput. They had a test that would take 10 minutes on average to run, so in order to achieve a net throughput of around 1 minute per unit, they provided the factory 10 testers. Significantly, the test run-time would vary by several minutes from unit to unit. Unfortunately, the only indicator of test state was a single light that could either flash or change color. Furthermore, the lighting pattern of units that failed testing bore a resemblance to units that were still running the test, so even when the operator noticed a unit that finished testing, they would often overlook failed units, assuming they were still running the test. As a result, the actual throughput achieved on their first production run was about one unit every 5 minutes — driving up labor costs dramatically.

Once they refactored the UX to include an audible chime that would play when the test was finished, aggregate test cycle time dropped to a bit over a minute – much closer to the original estimate.

Thus, while one might think UX is just for users, I’ve found it pays to make wireframes and mock-ups for the tester itself, and to spend some developer cycles to create an operator-friendly test program. In some ways, tester UX design is more challenging than the product UX: ideally, you’re creating a UX with icons that are internationally recognizable, using little or no text, so operators anywhere in the world can just sit down and use it with no special training. Furthermore, you’re trying to create user engagement with something as banal as a test – something that’s literally as boring as watching paint dry. I’ve even gone so far as putting a mini-game in the middle of a long test sequence to keep operators attentive. The mini-game was of course directly relevant to testing certain hardware sensors, but it was surprisingly effective because the operators would race each other on the mini-game to see who could finish the fastest, boosting throughput and increasing worker happiness.

At the end of the day, factories are powered by humans, and it pays to employ a human-first design process when crafting test programs.

Automation

Human operators are prone to error. The more a test can be automated, the more reliable it can be, and in the long run automation will save money. I once visited a large mobile phone maker’s factory, and witnessed a gymnasium-sized room full of test stations replaced by a pair of fully robotic test stations. Instead of hundreds of operators plugging cables in and checking aspects like screen and camera quality, a delicate ballet of robotic actuators would plug connectors into every port in a fraction of a second, and every feature of the phone from the camera to the GPS is tested in a couple of minutes. The test stations apparently cost about a million dollars to develop, but the empty cavern of idle test jigs sitting next to it was clear testament to the labor cost savings of such a high degree of automation.

At the smaller scales more typical of startups, automation can happen but it needs to be judiciously applied. Every robotic actuator takes time and money to develop, and they are also prone to wear-out and eventual failure. For the Chibitronics Chibi Chip product, there’s a single mechanical switch on the board, and we developed a simple servo mechanism to actuate the plunger. However, despite using a series-elastic spring and a foam pad to avoid over-stressing the servo motor, over time, we’ve found the motor still fails, and operators have disconnected it in favor of manually pushing the button at the right time.


The Chibi Chip test jig


Detail view of the reset switch servo

Indicator lights can also be tricky to test because the lighting conditions in a factory can be highly variable. Sometimes the floor is flooded by sunlight; other times, it’s lit by dim fluorescent lamps or LED bulbs, each with distinct noise signatures. A simple photodetector will be unreliable unless you can perfectly shield the device under test (DUT) from stray light sources. However, if the product’s LEDs can be modulated (with a PWM waveform, for example), the modulation can be detected through an AC-coupled photodetector. This system tends to be more reliable as the AC coupling rejects sunlight, and the modulation frequency can be chosen to be distinct from other stray light noise sources in the factory.

In general, the gold standard for test automation is to put the DUT into a jig, press a button, wait, and then a red or green light indicates if the device passes or fails. For simple products, this should be achievable, but reasonable exceptions should be made depending upon the resources available in a startup to implement tests versus the potential frequency and impact of a particular feature escaping the test process. For example, in the case of NeTV2, the functionality of indicator LEDs and the fan are visually inspected by the operator; but in my judgment, all the components involved have generous tolerances and are less likely to be assembled incorrectly, and there are other points downstream of the PCB test during the assembly process where the LEDs and fan operation will be checked yet again, further reducing the likelihood of these features escaping the test process.

Audit and Traceability

Here’s a typical failure scenario at a factory: one operator is running two testers in parallel. The lunch bell rings, and the operator gets up and leaves without noting the status of the test (if you’ve been doing the same thing over and over for the past four hours and running on an empty belly, you’d do the same thing too). After lunch, the operator sits down again, and has to recall whether the units in front of her have been tested or not. As a result of this arbitrary judgment call, sometimes units that didn’t pass test, or weren’t even tested at all, slip into the tested product bins after a shift change.

This is one of the many reasons why it pays to incorporate some sort of audit and traceability program into the tester and product itself. The exact nature of the program will depend greatly upon the exact nature of the product and amount of developer resources available, but a simple example is structuring the test program so that a serial number isn’t generated for the product until all the tests pass – thus, the serial number is a kind of “coupon” to prove the unit has passed test. In the operator-returning-from-lunch scenario, she just has to check for the presence of a serial number to determine the testing state of a particular unit.


The Chibi Chip uses Bitmarks as coupons to indicate when units have passed test. The Bitmarks also help prevent warranty fraud and deter cloning.

Sometimes I also burn a log of the test into the product itself. It’s important to make the log a circular buffer that can store more than one test run, because oftentimes products that fail test the first time must be retested several times as they are reworked and repaired. This way, if a product is returned by a user, I can query the log and see a fairly complete history of the product’s rework experience in the factory. This is incredibly helpful in debugging factory process issues and holding the factory accountable for marginal practices such as re-testing a device multiple times without repairing it, with the hope that they get lucky and get a “pass” out of the tester due to random environmental fluctuations.

Ideally, these logs are sent up to the cloud or a server directly, but that will depend heavily upon the reliability of the Internet connectivity at your facility. Internet is notoriously unreliable in China, especially to servers not located on the mainland, and so sometimes a small startup with limited resources has to make compromises about the extent and nature of audit and traceability achievable on the factory floor.

Updates

Consumer electronic products are increasingly just software wrapped in a plastic shell. While the hardware itself must stabilize months before production, the software in a product continues to evolve, especially in Internet-connected products that support over-the-air updates. Sometimes patches to a product’s firmware can profoundly alter low-level APIs, breaking the factory test program. For example, I had a product once where the audio drivers went through a major upgrade, going from OSS to ALSA. This changed the way the microphone subsystem was accessed, causing the microphone test to fail in production. Thus user firmware updates can also necessitate a tester program update.

If a test jig was engineered as a stand-alone box that requires logging into a terminal to upgrade, every time the software team pushes an update, guess what – you’re hopping on a plane to the factory to log in to the test jig and upgrade it. This is not a sustainable upgrade plan for products that have complex, constantly evolving internal firmware; thus, as the test jig designer, it’s well-advised to build a secure remote upgrade process into the test jig itself.


That’s me about 12 years ago on a factory floor at 2AM debugging a testjig update gone wrong, bringing production to a screeching halt. Don’t be like me; you can do better!

In addition to a remote upgrade mechanism, you’re going to need a way to validate the test jig update without having to bring down a production line. In order to help with this, I always keep a physical copy of the production test jig in my office, so I can validate testjig updates from the comfort of my office before pushing them to the production floor. I try my best to keep the local jig an exact copy of what’s on the line; this may involve taking snapshots of the firmware image or swapping out OS drives between development and production versions, or deliberately breaking features that have somehow failed on the production jigs. This process is inspired by the engineers at JPL and NASA who keep an exact copy of Mars-based rovers on Earth, so they can thoroughly test an update before pushing it to the rover on Mars. While this discipline can be inconvenient and incurs the cost of an extra test jig, it’s inevitably cheaper than having to book a last minute flight to your factory to fix things because of an update gone wrong.

As for the upgrade mechanism itself, how fancy and secure you want to get has virtually no limit; I’ve done everything from manual swaps of USB thumb drives that contain the tester configuration data to a private VPN via a dedicated 3G-to-wifi gateway deployed at the factory site. The nature of the product (e.g. does it contain security keys, how often is the product firmware updated) and the funding level of your organization will heavily influence the architecture of the upgrade process.

Responsibility

Given how much effort it takes to build a good test jig, it’s tempting to free up precious developer resources by simply outsourcing the test jig to a third party. I’ve almost never found this to be a good idea. First of all, nobody but the developer knows what skeletons are hidden in a product’s closet. There’s what’s written in the spec, but then there is how faithfully the spec was implemented. Of course, in an ideal world, all specs were perfectly met, but only the developer has a true sense of how spot-on the implementation ended up. This drives the second point, which is avoiding the blame game. By throwing tests over the fence to a third party, if a test isn’t easy to implement or is generating false results, it’s easy to get into a finger-pointing exercise over who is at fault: the developer for not meeting the specs, or the test developer for not being creative enough to implement the test without necessitating design changes.

However, when the developer knows they are ultimately on the hook for the test jig, from day one the developer thinks about design for test. Where will the test points go? How do we make internal state easily visible? What bring-up sequence gives us the most test coverage in the shortest amount of time? By making the developer responsible for the test jig, the test program comes together as the product matures. Bring-up scripts used to validate the product are quickly converted to factory tests, and overall the product achieves a higher standard of testability while saving the money and resources that would otherwise be spent trying to coordinate between two parties with conflicting self-interests.

Code Structure

It’s tempting to think about a test jig as a pile of write-once code that doesn’t need to be maintainable. For simple products, one can definitely get away with this mentality. However, I’ve been bitten more than once by fragile code bases inside production testers. The most typical scenario where things break is when I have to change the order of tests, in order to prioritize testing problematic features first. It doesn’t make sense to test a dozen high-yielding features before running a test on a feature with a known yield issue. That just wastes operator time, and runs up the cost of production.

It’s also hard to predict before production what the most frequent mode of failure would be – after all, any failures you could have anticipated would already be designed out! So, quite often in the middle of an early production run, I’m challenged with having to change the order of tests in a complex sequence of tests to optimize operator time and improve production throughput.

Tests almost always have dependencies – you have to power on the board before you can flash the firmware; you need firmware before you can connect to wifi; you need credentials to connect to wifi; you have to clean up the test credentials before shipping the product. However, if the process that cleans up the test credentials is also responsible for cleaning up any other temporary tester files (for example, a flag that also sets Bluetooth into test mode), moving the wifi test sequence earlier could result in tester configuration files being left on the customer image, potentially leading to unexpected behaviors (such as Bluetooth still being in test mode in the shipping product!).

Thus, it’s helpful to have some infrastructure for tests that keeps each test modular while enforcing dependencies. Although one could write this code every single time from scratch, we encounter this problem so regularly that Sean ‘Xobs’ Cross set out to create a testjig management system to solve this problem “once and for all”. The result is a project he calls Exclave, with the idea being that Exclave – like an actual geographical exclave – is a tiny bit of territory that you can retain control of inside a foreign factory.

Introducing Exclave

Exclave is a scaffold designed to give structure to an otherwise amorphous blob of test code, while minimizing the amount of overhead required of the product designer to achieve this structure. The basic features of Exclave are as follows:

  • Code Re-use. During product bring-up, designers write simple scripts to validate each feature individually. Exclave attempts to re-use these scripts by making no assumption about the language used to write them. Python, C, Bash, Node.js, Rust – all are welcome, so long as they run on a command line and can return an exit code.
  • Automated dependency resolution. Each test routine is associated with a “.test” descriptor which describes the dependencies and timeout for a given script, which are then automatically resolved by Exclave.
  • Scenario management. Test descriptors are strung together into scenarios, which can be selected dynamically based on the real-time requirements of the factory.
  • Triggers. Typically a test is started by pressing a button, but Exclave’s flexible triggering system also allows tests to start on other cues, such as hot-plug events.
  • Multiple UI targets. Test jig UI can range from a red/green light to a serial console device to a full graphical interface running on a monitor. Exclave has a system for interpreting test results and driving multiple UI sinks. This allows for fast product debugging by attaching a GUI (via an HDMI monitor or laptop) while maintaining compatibility with cost-efficient LED indicators favored for production scale-up.


Above: Exclave helps migrate lab-bench validation code to production-grade factory tests.

To get a little flavor on what Exclave looks like in practice, let’s look at a couple of the tests implemented in the NeTV2 production test flow. First, the production test is split into two repositories: the test descriptors, and the graphical UI. Note that by housing all the tests in github, we also solve the tester upgrade problem by providing the factory with a set git repo management scripts mapped to double-clickable desktop icons.

These repositories are installed on a Raspberry Pi contained within the test jig, and Exclave is started on boot as a systemd service. The service runs a simple script that fires up Exclave in a target directory which contains a “.jig” file. The “netv2.jig” file specifies the default scenario, among other things.

Here’s an example of what a quick test scenario looks like:

This scenario runs a variety of scripts in different languages that: turn on the device (bash/C), check voltages (C), check the ID code of the FPGA (bash/openOCD), load a test bitstream (bash/openOCD), check that the REPL shell can start on the FPGA (Expect/TCL), and then run a RAM test (Expect/TCL) before shutting the board down (bash/C). Many of these scripts were copied directly from code used during board bring-up and system validation.

A basic operation that’s surprisingly tricky to do right is checking for terminal interaction (REPL shell) via serial port. Writing a C or bash script that does this correctly and gracefully handles all error cases is hard, but fortunately someone already solved this problem with the “Expect” TCL extension. Here’s what the REPL shell test descriptor looks like in Exclave:

As you can see, this points to a couple other tests as dependencies, sets a time-out, and also designates the location of the Expect script.

And this is what the Expect script looks like:

This one is a bit more specialized to the NeTV2, but basically, it looks for the NeTV2 tester firmware shell prompt, which is “TESTER_NX8D>”; the system will attempt to recover this prompt by sending a carriage-return sequence once every two seconds and searching for this special string in return. If it receives the string “BIOS” instead, this indicates that the NeTV2 failed to boot and escaped into the ROM BIOS, probably due to a RAM error; at which point, the Expect script prints a bunch of JSON, which is automatically passed up to the UI layer by Exclave to create a human-readable error message.

Which brings us to the interface layer. The NeTV2 jig has two options for UI: a set of LEDs, or an HDMI monitor. In an ideal world, the total amount of information an operator needs to know about a board is if it passed or failed – a green or red LED. Multiple instances of the test jig are needed when a product enters high volume production (thousands of units per day), so the cost of each test jig becomes a factor during production scale-up. LEDs are orders of magnitude cheaper than an HDMI monitor, and in general a test jig will cost less than an HDMI monitor. So LEDs instead of an HDMI monitor for UI can dramatically slash the cost to scale up production. On the other hand, a pair of LEDs does not give enough information to diagnose what’s gone wrong with a bad board. In a volume production scenario, one would typically collect the (hopefully small) fraction of failed boards and bring them to a secondary station where a more skilled technician debugs them. Exclave allows the same jig used in production to be placed at the debug station, but with an HDMI monitor attached to provide valuable detailed error reports.

With Exclave, both UI are integrated seamlessly using “.interface” files. Below is an example of the .interface file that starts up the http daemon to enable JSON debugging via an HDMI monitor.

In a nutshell, Exclave contains an event reporting system, which logs events in a fashion similar to Linux kernel messages. Events are tagged with metadata, such as severity, and the events are broadcast to interface handlers that further refine them for the respective UI element. In the case of the LEDs, it just listens for “START” [a scenario], “FAIL” [a test], and “FINISH” [a scenario] events, and ignores everything else. In the case of the HDMI interface, a browser configured to run in kiosk mode is pointed to the correct localhost webpage, and a jquery-based HTML document handles the dynamic generation of the UI based upon detailed messages from Exclave. Below is a screenshot of what the UI looks like in action.

The UI is deliberately brutalist in design, using color to highlight only the most important messages, and also includes audible alerts so that operators can zone out while the test runs.

As you can see, the NeTV2 production tester tests everything – from the LEDs to the Ethernet, to features that perhaps few people will ever use, such as the SD card slot and every single GPIO pin. Thanks to Exclave, I was able to get this complex set of tests up and running in under a month: the first code commit was made on Oct 13, 2018, and by Nov 7, I was largely just tweaking tests for performance, and to reflect operational realities discovered on the factory floor.

Also, for the hardware-curious, I did design a custom “hat” for the Raspberry Pi to add several ADC channels and various connectors to facilitate testing. You can check out the source for the tester hat at the Alphamax github repo. I had six of these boards built; five of them have found their way into various parts of the NeTV2 production flow, and if I still have one spare after production is stabilized, I’m planning on installing a replica of a tester at HAX in Shenzhen. That way, those curious to find out more about Exclave can walk up to the tester, log into it, and poke around (assuming HAX agrees to this).

Let’s Stop Re-Inventing the Test Jig!

The unspoken secret of hardware is that behind every product, there’s a robust test jig making sure that every unit shipped to end customers meets quality standards. Hardware startups that don’t anticipate the importance and difficulty of creating such a tester often encounter acute (and sometimes fatal) growing pains. Anytime I build more than a few copies of a piece of hardware, I know I’m going to need a test jig – even for bespoke, short-run products like a conference badge.

After spending months of agony re-inventing the wheel every time we shipped a product, Xobs decided to create Exclave. It’s still a work in progress, but by now it’s been used as the production test infrastructure for several volume products, including the Chibi Chip, Chibi Scope, Tomu, The Phage Blinky Badge, and now NeTV2 (those are all links to the actual Exclave test scripts for each of the respective products — open source ftw!). I feel Exclave has come along far enough that it’s time to invite more users to join the Exclave community and give it a try. The code is located on github and is 100% open source, and it’s written in Rust entirely by Xobs. It’s my hope that Exclave can mature into a tool and a community that will save countless Makers and small hardware startups the teething pains of re-inventing the test jig.


Production-proven testjigs that run Exclave. Clockwise from top-right: NeTV2, Chibi Chip, Chibi Scope, Tomu, and The Phage Blinky Badge. The badge tester has even survived a couple of weeks exposed to the harsh elements of the desert as a DIY firmware updating station!

Edge Computing at Chick-fil-A

by Brian Chambers, Caleb Hurd, Sean Drucker, Alex Crane, Morgan McEntire, Jamie Roberts, and Laura Jauch (the Chick-fil-A IOT/Edge team)

In a recent post, we shared about how we do bare metal clustering for Kubernetes on-the-fly at the Edge in our restaurants. One of the most common (and best) questions we got was “but why?”. In this post, we will answer that question. If you are interested in more or prefer video to text, you can also check out Brian and Caleb’s presentation at QConNY 2018.

(Including Kubernetes. Look for our new container orchestration tool, Kowbernetes, coming soon)

Why K8s in a restaurant?

Why does a restaurant company like Chick-fil-A want to deploy Kubernetes at the Edge? What are the goals? That, we can answer.

  • Low-latency, internet-independent applications that can reliably run our business
  • High availability for these applications
  • Platform that enables rapid innovation and that allows delivery of business functionality to production as quickly as possible
  • Horizontal scale — infrastructure and application development teams

Capacity Pressures

Chick-fil-A is a very successful company in large part because of our fantastic food and customer service — both the result of the great people that operate and work in our restaurants. They are unmatched in the industry. The result is very busy restaurants. If you have been to a Chick-fil-A restaurant, this is not news to you. If you have not been, it’s time to go!

A busy Chick-fil-A in Manhattan w/ lines out the door. Photo credit Wall Street Journal

While we are very grateful for our successes, they have created a lot of capacity challenges for us as a business. To put it in perspective, we do more sales in six days a week than our competitors do in seven (we are closed on Sunday). Many of our restaurants are doing greater than three times the volume that they were initially designed for. These are great problems to have, but they create some extremely difficult challenges related to scale.

At Chick-fil-A, we believe the solutions to many of these capacity problems are technology solutions.

A new generation of workloads

One of our approaches to solving these problems is to invest in a smarter restaurant.

Our goal: simplify the restaurant experience for Owner/Operators and their teams and optimize for great, hot, tasty, food served quickly and with a personal touch, all while increasing the capacity of our existing footprint.

Our hypothesis: By making smarter kitchen equipment we can collect more data. By applying data to our restaurant, we can build more intelligent systems. By building more intelligent systems, we can better scale our business.

As a simple example, imagine a forecasting model that attempts to predict how many Waffle Fries (or replace with your favorite Chick-fil-A product) should be cooked over every minute of the day. The forecast is created by an analytics process running in the cloud that uses transaction-level sales data from many restaurants. This forecast can most certainly be produced with a little work. Unfortunately, it is not accurate enough to actually drive food production. Sales in Chick-fil-A restaurants are prone to many traffic spikes and are significantly affected by local events (traffic, sports, weather, etc.).

However, if we were to collect data from our point-of-sale system’s keystrokes in real-time to understand current demand, add data from the fryers about work in progress inventory, and then micro-adjust the initial forecast in-restaurant, we would be able to get a much more accurate picture of what we should cook at any given moment. This data can then be used to give a much more intelligent display to a restaurant team member that is responsible for cooking fries (for example), or perhaps to drive cooking automation in the future.

Goals like this led us to develop an Internet of Things (IOT) platform for our restaurants. To successfully scale our business we need the ability to 1) collect data and 2) use it to drive automation in the restaurant.

Rather than select many different vendor solutions that were silo’d and disconnected, and therefore difficult to integrate, we wanted an open ecosystem that we own. Our open approach allows internal developers and external partners to develop connected “things” or applications for the Edge and to leverage our platform for common needs (security, connectivity, etc.).

Put another way, the IOT/Edge team owns a small subset of the overall application base that is deployed on the edge, and is effectively a runtime platform team for the in-restaurant compute platform.

Architecture Overview

Here is a look at the architecture we developed (and are using today) to meet these goals.

Simple view of our current architecture, w/ K8s changes highlighted

Cloud Control Plane

Today, we run our higher order control plane and core IoT services in the cloud. This includes services like:

  • Authorization Server — device identity management
  • Data ingest / routing — collect data and send it where it needs to go
  • Operational time series data — logs, monitoring, alerting, tracing
  • Deployment management — self-service deployment for our application teams using GitOps (kudos to our friends at Weaveworks for sharing the term). We plan to share our work and code soon.

These services also run in Kubernetes so that we have a common paradigm across all of our team’s deployments.

Edge

“Edge Computing” is the idea of putting compute resources close to the action to attain a higher level of availability and/or lower level of latency.

We think of our Edge Computing environment as a “micro private cloud”. By this, we mean that we provide developers with a series of helpful services and a place to deploy their applications on our infrastructure.

Does this make us the largest “cloud provider” in the world? You can be the judge of that.

In all seriousness, this approach gives us a unique kind of scale. Rather than a few large K8s clusters with tens-to-hundreds-of-thousands of containers, at full scale we will have more than 2000 clusters with tens of containers per cluster. And that number grows significantly as we open many new restaurants every year.

Our edge workloads include:

  • platform services such as an authentication token service, pub/sub messaging (MQTT), log collection/exfiltration (FluentBit), monitoring (Prometheus), etc.
  • applications that interact with “things” in the restaurant
  • simple microservices that serve HTTP requests
  • machine learning models that synthesize cloud-developed forecasts with real-time events from MQTT to make decisions at the edge and drive automation

Our physical edge environment obtains its connectivity from multiple in-restaurant switches (and two routers), so we operate on a very highly available LAN.

Today, nearly all of the data that is collected at the Edge is ephemeral and only needs to exist for a short time before being exfiltrated to the cloud. This could change in the future, but it has kept our early-stage deployments very simple and made them much easier to manage.

Since we have a highly available Kubernetes cluster with data replicated across all nodes, we can ensure that we can retain any data that is collected until a time when the internet is again available for data exfiltration. We can also aggregate, compress, or drop data dynamically at the Edge based on business need.

Given all of this, we still use the cloud first for our services whenever possible. Edge is the fallback deployment target when high-availability, low-latency, in-restaurant applications are a must.

Following in the footsteps of giants

We run our Edge infrastructure on commodity hardware that costs us, ballpark, $1000/restaurant. We wanted to keep our hardware investment low-cost and horizontally scalable. We have not found a way to make it dynamic/elastic yet, but maybe one day (order on Amazon w/ free 30-second shipping perhaps?).

With our architecture, we can add additional compute capacity as-needed by adding additional devices. We may upsize our hardware footprint in the future, but what we have makes sense for our workloads today. Google also got started scaling workloads on commodity hardware (written in 2003, where has the time gone?)… so we’re in good company!

We hope our approach to Edge infrastructure encourages creative thinking for anyone that is working with scarce resources (physical space, budget, or otherwise). I think that’s everyone.

Everything else

Finally, we have our connectivity layer and the IOT things. Many of our devices are fully-powered (no batteries) but still constrained on chipsets/processing power. Wi-Fi is the default method for connectivity. We have not committed to a low-power protocol yet, but are interested in LoRa and the Wi-Fi HaLow (802.11ah) project.

For things/physical devices, our team also provides developers with an SDK for executing on-boarding flows (all OAuth based) and for accessing services within the Edge environment such as MQTT. Brian talked about this more extensively at QConNY 2017 (note that we were using Docker Swarm vs. Kubernetes at the time).

Containers at the Edge

Why would you want to run containers at the Edge? For the same reason you would run them in the cloud (or anywhere else).

Dependency management becomes easier. Testing is easier. The developer experience is easier and better. Teams can move much faster and more autonomously, especially when reasonable points of abstraction (such as k8s namespaces) and resource limits (CPU/RAM) are applied.

Another key design goal was reliability for our critical applications, meaning no single points-of-failure in the architecture. There are many ways this type of goal can be achieved, but we decided having > 1 physical hosts at the Edge and a container-based orchestration layer was the best approach for the types of workloads we have and for the skillset of our teams.

We initially called our strategy “Redundant Restaurant Compute”, which really speaks to the goal behind our approach. Later, we transitioned to calling it “Edge Computing”.

The ability to orchestrate services effectively and to preserve a desired number of replicas quickly and consistently attracted us to Kubernetes.

Why Kubernetes?

At first, we planned to use Docker Swarm for container orchestration, but made the move to Kubernetes in early 2018 because we believe its core capabilities and surrounding ecosystem are (by far) the strongest. We are excited about taking advantage of some of the developments around Kubernetes like Istio and OpenFaaS.

Most importantly, we firmly believe…

Ideas, in and of themselves, have no value.

Code, in and of itself, has no value.

The only thing that has value is code that is running in production. Only then have you created value for users and had the opportunity to impact your business.

Therefore, at Chick-fil-A, we want to optimize for taking great ideas, turning them into code, and getting that code running in production as fast as possible.

Research says that using the latest and greatest technology (like Kubernetes) has no correlation to a team or organization’s success. None. The ability to turn ideas into code and get code into production rapidly does.

If you can accomplish this goal and meet your requirements with VMware, a single machine, some other clustering tool or orchestration layer, or just the cloud, we would not try to convince you to switch to Kubernetes and follow our lead. The simplest possible solution is usually the best solution.

At Chick-fil-A, we believe that the best way for us to achieve these goals is to embrace containerization, and to do so with Kubernetes. At the end of the day, it’s all there to help you “Eat More Chicken”.

Next

What are our next steps?

In the next 18-24 months, we expect to increase the number of smart/connected devices in our restaurants to more than 100,000 and complete our chain-wide rollout of Kubernetes to the Edge. The number of use cases for “brain applications” that control the restaurant from the edge continues to grow, and we look forward to serving them and sharing more with the community about what we learn.

We will also take a close look at the Google GKE On-Prem service that was just announced, since it might save us from building and managing clusters in the future. What we have done to date is a sunk cost, and if we can refocus resources on differentiating work rather than on infrastructure, you can be certain we will.

If we failed to answer questions you have about what we’re doing, don’t hesitate to reach out on LinkedIn. If we can help you with any challenges with Kubernetes at the Edge, please let us know. At Chick-fil-A, it is always our pleasure to share, and (kudos to Caleb for this) “our pleasure to server you”.


New US Tariffs are Anti-Maker and Will Encourage Offshoring


The new 25% tariffs announced by the USTR, set to go into effect on July 6th, are decidedly anti-Maker and ironically pro-offshoring. I’ve examined the tariff lists (List 1 and List 2), and they tax the import of basic components, tools, and sub-assemblies while giving fully assembled goods a free pass. The USTR’s press release is careful to mention that the tariffs “do not include goods commonly purchased by American consumers such as cellular telephones or televisions.”

Think about it – big companies with the resources to organize thousands of overseas workers making TVs and cell phones will have their outsourced supply chains protected, but small companies that still assemble valuable goods from basic parts inside the US are about to see significant cost increases. Worse yet, educators, already forced to work with shoestring budgets, are going to return from their summer recess to find that basic parts, tools, and components for use in the classroom are now significantly more expensive.


Above: The Adafruit MetroX Classic Kit is representative of a typical electronics education kit. Items marked with an “X” in the above image are potentially impacted by the new USTR tariffs.

New Tariffs Reward Offshoring, Encourage IP Flight

Some of the most compelling jobs to bring back to the US are the so-called “last screw” system integration operations. These often involve the complex and precise process of integrating simple sub-assemblies into high-value goods such as 3D printers or cell phones. Quality control and IP protection are paramount. I often advise startups to consider putting their system integration operations in the US because difficult-to-protect intellectual property, such as firmware, never has to be exported if the firmware upload operation happens in the US. The ability to leverage China for low-value subassemblies opens more headroom to create high-value jobs in the US, improving the overall competitiveness of American companies.

Unfortunately, the structure of the new tariffs is exactly the opposite of what you would expect if the goal were to bring those jobs back to the US. Stiff new taxes on simple components, sub-assemblies, and tools like soldering irons, contrasted against a lack of taxation on finished goods, push business owners to send these “last screw” operations overseas. Basically, with these new tariffs, the more value-add sent outside the borders of the US, the more profitable a business will be. Not even concerns over IP security could overcome a 25% increase in base costs and keep operations in the US.

It seems the intention of the new tariff structure was to minimize the immediate pain that voters would feel in the upcoming mid-terms by waiving taxes on finished goods. Unfortunately, the reality is it gives small businesses that were once considering setting up shop in the US a solid reason to look off-shore, while rewarding large corporations for heavy investments in overseas operations.

New Tariffs Hurt Educators and Makers

Learning how to blink a light is the de-facto introduction to electronics. This project is often done with the help of a circuit board, such as a Microbit or Chibi Chip, and a type of light known as an LED. Unfortunately, both of those items – simple circuit boards and LEDs – are about to get 25% more expensive with the new tariffs, along with other Maker and educator staples such as capacitors, resistors, soldering irons, and oscilloscopes. The impact of this cost hike will be felt throughout the industry, but most sharply by educators, especially those serving under-funded school districts.


Above: Learning to blink a light is the de-facto introduction to electronics, and it typically involves a circuit board and an LED, like those pictured above.

Somewhere on the Pacific Ocean right now floats a container of goods for ed-tech startup Chibitronics. The goods are slated primarily for educators and Makers that are stocking up for the fall semester. It will arrive in the US the second week of July, and will likely be greeted by a heavy import tax. I know this because I’m directly involved in the startup’s operations. Chibitronics’ core mission is to serve the educator market, and as part of that we routinely offered deep discounts on bulk products for educators and school systems. Now, thanks to the new tariffs on the basic components that educators rely upon to teach electronics, we are less able to fulfill our mission.

A 25% jump in base costs forces us to choose between immediate price increases and cutting the salaries of our American employees who support the educators. These new tariffs are a tax on America’s future: they deprive some of the most vulnerable groups of access to technology education, making future American workers less competitive on the global stage.


Above: Educator-oriented learning kits like the Chibitronics “Love to Code” are slated for price increases this fall due to the new tariffs.

Protectionism is Bad for Technological Leadership

Recently, I was sent photos by Hernandi Krammes of a network card that was manufactured in Brazil around 1992. One of the most striking features of the card was how retro it looked – straight out of the 80’s, a full decade behind its time. This is a result of Brazil’s policy of protectionist tariffs on the import of high-tech components. While stiff tariffs on the import of microchips drove investment in local chip companies, trade barriers meant the local companies didn’t have to be as competitive. With less incentive to re-invest or upgrade, local technology fell behind the curve, leading ultimately to anachronisms like the Brazilian Ethernet card pictured below.


Above: this Brazilian network card from 1992 features design techniques from the early 80’s. It is large and clunky compared to contemporaneous cards.

Significantly, it’s not that the Brazilian engineers were any less clever than their Western counterparts: they displayed considerable ingenuity getting a network card to work at all using primarily domestically-produced components. The tragedy is instead of using their brainpower to create industry-leading technology, most of their effort went into playing catch-up with the rest of the world. By the time protectionist policies were repealed in Brazil, the local industry was too far behind to effectively compete on a global scale.

Should the US follow Brazil’s protectionist stance on trade, it’s conceivable that some day I might be remarking on the quaintness of American network cards compared to their more advanced Chinese or European counterparts. Trade barriers don’t make a country more competitive – in fact, quite the opposite. In a competition of ideas, you want to start with the best tech available anywhere; otherwise, you’re still jogging to the starting line while the competition has already finished their first lap.

Stand Up and Be Heard

There is a sliver of good news in all of this for American Makers. The list of commodities targeted in the trade war is not yet complete. The “List 2” items – which include all manner of microchips, motors, and plastics (such as 3D printer PLA filament and acrylic sheets for laser cutting) that are building blocks for small businesses and Makers – have yet to be ratified. The USTR website indicates that in the coming weeks it will disclose a process for public review and comment. Once this process is made transparent, whether you are a small business owner or the parent of a child with technical aspirations, I encourage you to share your stories and concerns about how you will be negatively impacted by these additional tariffs.

Some of the List 2 items still under review include:

9030.31.00 Multimeters for measuring or checking electrical voltage, current, resistance or power, without a recording device
8541.10.00 Diodes, other than photosensitive or light-emitting diodes
8541.40.60 Diodes for semiconductor devices, other than light-emitting diodes, nesoi
8542.31.00 Electronic integrated circuits: processors and controllers
8542.32.00 Electronic integrated circuits: memories
8542.33.00 Electronic integrated circuits: amplifiers
8542.39.00 Electronic integrated circuits: other
8542.90.00 Parts of electronic integrated circuits and microassemblies
8501.10.20 Electric motors of an output of under 18.65 W, synchronous, valued not over $4 each
8501.10.60 Electric motors of an output of 18.65 W or more but not exceeding 37.5 W
8501.31.40 DC motors, nesoi, of an output exceeding 74.6 W but not exceeding 735 W
8544.49.10 Insulated electric conductors of a kind used for telecommunications, for a voltage not exceeding 80 V, not fitted with connectors
8544.49.20 Insulated electric conductors nesoi, for a voltage not exceeding 80 V, not fitted with connectors
3920.59.80 Plates, sheets, film, etc, noncellular, not reinforced, laminated, combined, of other acrylic polymers, nesoi
3916.90.30 Monofilament nesoi, of plastics, excluding ethylene, vinyl chloride and acrylic polymers

Here’s some of the “List 1” items that are set to become 25% more expensive to import from China, come July 6th:

Staples used by every Maker or electronics educator:

8515.11.00 Electric soldering irons and guns
8506.50.00 Lithium primary cells and primary batteries
8506.60.00 Air-zinc primary cells and primary batteries
9030.20.05 Oscilloscopes and oscillographs, specially designed for telecommunications
9030.33.34 Resistance measuring instruments
9030.33.38 Other instruments and apparatus, nesoi, for measuring or checking electrical voltage, current, resistance or power, without a recording device
9030.39.01 Instruments and apparatus, nesoi, for measuring or checking

Circuit assemblies (like Microbit, Chibi Chip, Arduino):

8543.90.68 Printed circuit assemblies of electrical machines and apparatus, having individual functions, nesoi
9030.90.68 Printed circuit assemblies, NESOI

Basic electronic components:

8532.21.00 Tantalum fixed capacitors
8532.22.00 Aluminum electrolytic fixed capacitors
8532.23.00 Ceramic dielectric fixed capacitors, single layer
8532.24.00 Ceramic dielectric fixed capacitors, multilayer
8532.25.00 Dielectric fixed capacitors of paper or plastics
8532.29.00 Fixed electrical capacitors, nesoi
8532.30.00 Variable or adjustable (pre-set) electrical capacitors
8532.90.00 Parts of electrical capacitors, fixed, variable or adjustable (pre-set)
8533.10.00 Electrical fixed carbon resistors, composition or film types
8533.21.00 Electrical fixed resistors, other than composition or film type carbon resistors, for a power handling capacity not exceeding 20 W
8533.29.00 Electrical fixed resistors, other than composition or film type carbon resistors, for a power handling capacity exceeding 20 W
8533.31.00 Electrical wirewound variable resistors, including rheostats and potentiometers, for a power handling capacity not exceeding 20 W
8533.40.40 Metal oxide resistors
8533.40.80 Electrical variable resistors, other than wirewound, including rheostats and potentiometers
8533.90.80 Other parts of electrical resistors, including rheostats and potentiometers, nesoi
8541.21.00 Transistors, other than photosensitive transistors, with a dissipation rating of less than 1 W
8541.29.00 Transistors, other than photosensitive transistors, with a dissipation rating of 1 W or more
8541.30.00 Thyristors, diacs and triacs, other than photosensitive devices
8541.40.20 Light-emitting diodes (LED’s)
8541.40.70 Photosensitive transistors
8541.40.80 Photosensitive semiconductor devices nesoi, optical coupled isolators
8541.40.95 Photosensitive semiconductor devices nesoi, other
8541.50.00 Semiconductor devices other than photosensitive semiconductor devices, nesoi
8541.60.00 Mounted piezoelectric crystals
8541.90.00 Parts of diodes, transistors, similar semiconductor devices, photosensitive semiconductor devices, LED’s and mounted piezoelectric crystals
8504.90.75 Printed circuit assemblies of electrical transformers, static converters and inductors, nesoi
8504.90.96 Parts (other than printed circuit assemblies) of electrical transformers, static converters and inductors
8536.50.90 Switches nesoi, for switching or making connections to or in electrical circuits, for a voltage not exceeding 1,000 V
8536.69.40 Connectors: coaxial, cylindrical multicontact, rack and panel, printed circuit, ribbon or flat cable, for a voltage not exceeding 1,000 V
8544.49.30 Insulated electric conductors nesoi, of copper, for a voltage not exceeding 1,000 V, not fitted with connectors
8544.49.90 Insulated electric conductors nesoi, not of copper, for a voltage not exceeding 1,000 V, not fitted with connectors
8544.60.20 Insulated electric conductors nesoi, for a voltage exceeding 1,000 V, fitted with connectors
8544.60.40 Insulated electric conductors nesoi, of copper, for a voltage exceeding 1,000 V, not fitted with connectors

Parts to fix your phone if it breaks:

8537.10.80 Touch screens without display capabilities for incorporation in apparatus having a display
9033.00.30 Touch screens without display capabilities for incorporation in apparatus having a display
9013.80.70 Liquid crystal and other optical flat panel displays other than for articles of heading 8528, nesoi
9033.00.20 LEDs for backlighting of LCDs
8504.90.65 Printed circuit assemblies of the goods of subheading 8504.40 or 8504.50 for telecommunication apparatus

Power supplies:

9032.89.60 Automatic regulating or controlling instruments and apparatus, nesoi
9032.90.21 Parts and accessories of automatic voltage and voltage-current regulators designed for use in a 6, 12, or 24 V system, nesoi
9032.90.41 Parts and accessories of automatic voltage and voltage-current regulators, not designed for use in a 6, 12, or 24 V system, nesoi
9032.90.61 Parts and accessories for automatic regulating or controlling instruments and apparatus, nesoi
8504.90.41 Parts of power supplies (other than printed circuit assemblies) for automatic data processing machines or units thereof of heading 8471
3 public comments
jepler
218 days ago
bunnie lays it out for you
Earth, Sol system, Western spiral arm
philipstorry
218 days ago
Trade wars are never a good idea unless you have a monopoly.
Bunnie lays out the long-term implications for America's economy very effectively here.
It can't be the intended result that the big multinationals will potentially benefit by removing small competition. It can't be the intended result that this restricts access to educational opportunities.
Sadly, I doubt that those behind the trade war care about anything other than how tough they think they look...
London, United Kingdom
brennen
218 days ago
This is pretty fucking relevant both to my interests and to my continued gainful employment.
Boulder, CO

How The New York Times Uses Software To Recognize Members of Congress

Illustration by Greg Kletsel

Even if you’ve covered Congress for The New York Times for a decade, it can be hard to recognize which member you’ve just spoken with. There are 535 members, and with special elections every few months, members cycle in and out relatively frequently. So when former Congressional Correspondent Jennifer Steinhauer tweeted “Shazam, but for House members faces” in early 2017, The Times’s Interactive News team jumped on the idea.

Our first thought was: Nope, it’s too hard! Computer vision and face recognition are legitimately difficult computer science problems. Even a prototype would involve training a model on the faces of every member of Congress, and just getting the photographs to train with would be an undertaking.

But we did some Googling and found the Amazon Rekognition API. This service has a “RecognizeCelebrities” endpoint that happens to include every member of Congress as well as several members of the Executive branch. Problem solved! Now we’re just talking about knitting together a few APIs.

As we began working on this application, we understood that there are numerous, valid privacy concerns about using face recognition technology. And Interactive News, a programming team embedded in the newsroom, abides by all aspects of The Times’s ethical journalism handbook including how we gather information about the people we cover. In this case, we decided that the use of Rekognition’s celebrity endpoint meant we would only recognize congress people and other “celebrities” in Rekognition’s database.

By the end of the summer, Interactive News interns Gautam Hathi and Sherman Hewitt had built a prototype based on some conversations with me and my colleague Rachel Shorey. To use the prototype, a congressional reporter could snap a picture of a congress member, text it to our app, and get back an annotated version of the photograph identifying any members and giving a confidence score.

However, we discovered a new round of difficulties. Rekognition incorrectly identified some members of Congress as similar-looking celebrities — like one particularly funny instance where it confused Bill Nelson with Bill Paxton. Additionally, our hit rate on photographs was very low because the halls of the Capitol are poorly lit and the photographs we took for testing were consistently marred by shadow and blur. Bad connectivity in the basement of the Capitol made sending and receiving an MMS slow and error-prone during our testing. And, of course, there were few places in the Capitol where we could really get the photographs we needed without committing a foul.

Gautam and Sherman got around the “wrong celebrity” problem with a novel approach: A hardcoded list of Congresspeople and their celebrity doppelgangers. We grew more confident taking photographs and only sent the ones where members were better lit.
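To give a feel for how little glue this takes, here is a minimal sketch of the recognition step using boto3’s Rekognition client, with a look-alike remap in the spirit of Gautam and Sherman’s hardcoded fix. The remap table, threshold, file name, and region are illustrative assumptions, not the values used by Who The Hill.

```python
# Sketch of the recognition step: call Rekognition's celebrity endpoint and
# remap known look-alikes back to the member of Congress they get confused with.
import boto3

# Hypothetical doppelganger table in the spirit of the hardcoded fix described above.
DOPPELGANGERS = {
    "Bill Paxton": "Bill Nelson",
}

rekognition = boto3.client("rekognition", region_name="us-east-1")

def identify_members(image_path, min_confidence=85.0):
    """Return (name, confidence) pairs for celebrities found in a local photo."""
    with open(image_path, "rb") as f:
        response = rekognition.recognize_celebrities(Image={"Bytes": f.read()})

    matches = []
    for face in response.get("CelebrityFaces", []):
        if face["MatchConfidence"] < min_confidence:
            continue  # skip low-confidence hits, e.g. blurry or badly lit faces
        name = DOPPELGANGERS.get(face["Name"], face["Name"])
        matches.append((name, face["MatchConfidence"]))
    return matches

print(identify_members("hallway-photo.jpg"))
```

The open-sourced Who The Hill code layers the MMS plumbing and the command-line interface on top of a recognition step along these lines.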

A text-based interface is easiest for reporters to use, so while texting is slow, it’s superior to a web service in the low-bandwidth environment of the Capitol.

In addition to confirming the identity of a member, Who The Hill has helped The Times tell some stories we couldn’t have reported otherwise. Most recently, Rachel Shorey identified members of Congress at an event hosted by a SuperPAC by trawling through images posted on social media and running them through the tool.

If you’re interested in running your own version, the code for Who The Hill is open sourced under the Apache 2.0 license. The latest version includes a command-line interface in case you’d like to use the power of Amazon’s Rekognition to dig through a collection of photographs on your local machine without sending a pile of MMS messages.

Our service is far from infallible. But Times reporters like Thomas Kaplan love having a backup for when they can’t get a moment with a member to confirm their identity. “Of course,” says Kaplan, “the most reliable way to figure out a member’s identity is the old-fashioned way: Just ask them.”


