
Three Panel Soul - Superheroics


New comic!

Today's News:

Can We Build Trustable Hardware?


Why Open Hardware on Its Own Doesn’t Solve the Trust Problem

A few years ago, Sean ‘xobs’ Cross and I built an open-source laptop, Novena, from the circuit boards up, and shared our designs with the world. I’m a strong proponent of open hardware, because sharing knowledge is sharing power. One thing we didn’t anticipate was how much the press wanted to frame our open hardware adventure as a more trustable computer. If anything, the process of building Novena made me acutely aware of how little we could trust anything. As we vetted each part for openness and documentation, it became clear that you can’t boot any modern computer without several closed-source firmware blobs running between power-on and the first instruction of your code. Critics on the Internet suggested we should have built our own CPU and SSD if we really wanted to make something we could trust.

I chewed on that suggestion quite a bit. I used to be in the chip business, so the idea of building an open-source SoC from the ground up wasn’t so crazy. However, the more I thought about it, the more I realized that this, too, was short-sighted. In the process of making chips, I’ve also edited masks for chips; chips are surprisingly malleable, even post tape-out. I’ve also spent a decade wrangling supply chains, dealing with fakes, shoddy workmanship, undisclosed part substitutions – there are so many opportunities and motivations to swap out “good” chips for “bad” ones. Even if a factory could push out a perfectly vetted computer, you’ve got couriers, customs officials, and warehouse workers who can tamper with the machine before it reaches the user. Finally, with today’s highly integrated e-commerce systems, injecting malicious hardware into the supply chain can be as easy as buying a product, tampering with it, packaging it into its original box and returning it to the seller so that it can be passed on to an unsuspecting victim.

If you want to learn more about tampering with hardware, check out my presentation at 2019.

Based on these experiences, I’ve concluded that open hardware is precisely as trustworthy as closed hardware. Which is to say, I have no inherent reason to trust either at all. While open hardware has the opportunity to empower users to innovate and embody a more correct and transparent design intent than closed hardware, at the end of the day any hardware of sufficient complexity is not practical to verify, whether open or closed. Even if we published the complete mask set for a modern billion-transistor CPU, this “source code” is meaningless without a practical method to verify an equivalence between the mask set and the chip in your possession down to a near-atomic level without simultaneously destroying the CPU.

So why, then, is it that we feel we can trust open source software more than closed source software? After all, the Linux kernel is pushing over 25 million lines of code, and its list of contributors includes corporations not typically associated with words like “privacy” or “trust”.

The key, it turns out, is that software has a mechanism for the near-perfect transfer of trust, allowing users to delegate the hard task of auditing programs to experts, and having that effort be translated to the user’s own copy of the program with mathematical precision. Thanks to this, we don’t have to worry about the “supply chain” for our programs; we don’t have to trust the cloud to trust our software.

Software developers manage source code using tools such as Git (above, cloud on left), which use Merkle trees to track changes. These hash trees link code to their development history, making it difficult to surreptitiously insert malicious code after it has been reviewed. Builds are then hashed and signed (above, key in the middle-top), and projects that support reproducible builds enable any third-party auditor to download, build, and confirm (above, green check marks) that the program a user is downloading matches the intent of the developers.

There’s a lot going on in the previous paragraph, but the key take-away is that the trust transfer mechanism in software relies on a thing called a “hash”. If you already know what a hash is, you can skip the next paragraph; otherwise read on.

A hash turns an arbitrarily large file into a much shorter set of symbols: for example, the file on the left is turned into “🐱🐭🐼🐻” (cat-mouse-panda-bear). These symbols have two important properties: even the tiniest change in the original file leads to an enormous change in the shorter set of symbols; and knowledge of the shorter set of symbols tells you virtually nothing about the original file. It’s the first property that really matters for the transfer of trust: basically, a hash is a quick and reliable way to identify small changes in large sets of data. As an example, the file on the right has one digit changed — can you find it? — but the hash has dramatically changed into “🍑🐍🍕🍪” (peach-snake-pizza-cookie).
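To see the first property with a real hash function rather than emoji, here is a minimal sketch using SHA-256 from Ruby’s standard digest library. The input strings and the helper names sha256_hex and differing_positions are illustrative assumptions, not the files from the figure.

```ruby
require "digest"

# Hypothetical helper: SHA-256 of a string, as 64 hexadecimal characters.
def sha256_hex(text)
  Digest::SHA256.hexdigest(text)
end

# Count the positions at which two equal-length digests differ.
def differing_positions(a, b)
  a.chars.zip(b.chars).count { |x, y| x != y }
end

original = "3.14159265358979323846"
tampered = "3.14159265358979323847" # only the final digit differs

before = sha256_hex(original)
after  = sha256_hex(tampered)

puts before
puts after
puts "#{differing_positions(before, after)} of #{before.length} hex characters differ"
```

A one-digit change in the input should leave almost none of the 64 output characters in place, which is what makes a short digest a practical stand-in for a large file.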

Because computer source code is also just a string of 1’s and 0’s, we can use hash functions on it, too. This allows us to quickly spot changes in code bases. When multiple developers work together, every contribution gets hashed with the previous contribution’s hashes, creating a tree of hashes. Any attempt to rewrite a contribution after it’s been committed to the tree is going to change the hash of everything from that point forward.
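The chaining idea can be sketched in a few lines. This is a deliberately simplified linear chain, not Git’s actual object format; the chain_hashes helper and the contribution strings are assumptions made up for illustration.

```ruby
require "digest"

# Hash each contribution together with the hash of the one before it,
# returning one digest per contribution.
def chain_hashes(contributions)
  previous = ""
  contributions.map do |content|
    previous = Digest::SHA256.hexdigest(previous + content)
  end
end

reviewed  = ["add README", "add parser", "fix bug in parser"]
rewritten = ["add README", "add parser (with backdoor)", "fix bug in parser"]

good = chain_hashes(reviewed)
bad  = chain_hashes(rewritten)

good.zip(bad).each_with_index do |(a, b), i|
  puts "contribution #{i}: #{a == b ? 'same' : 'DIFFERENT'}"
end
# contribution 0: same
# contribution 1: DIFFERENT
# contribution 2: DIFFERENT
```

The third contribution is byte-for-byte identical in both histories, yet its hash differs, because it inherits the altered hash of the contribution before it.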

This is why we don’t have to review every one of the 25+ million lines of source inside the Linux kernel individually – we can trust a team of experts to review the code and sleep well knowing that their knowledge and expertise can be transferred into the exact copy of the program running on our very own computers, thanks to the power of hashing.

Because hashes are easy to compute, programs can be verified right before they are run. This is known as closing the “Time-of-Check vs Time-of-Use” (TOCTOU) gap. The smaller the gap between when the program is checked versus when it is run, the less opportunity there is for malicious actors to tamper with the code.
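As a rough sketch of what a small check-to-use gap looks like in practice, the helper below reads a file once, verifies those exact bytes, and hands the same bytes back to the caller, so nothing is re-read from disk between the check and the use. The load_verified helper, the DigestMismatch error, and the sample file contents are all hypothetical.

```ruby
require "digest"
require "tempfile"

# Hypothetical error raised when the bytes do not match the expected digest.
class DigestMismatch < StandardError; end

# Read the file once, check the digest of exactly those bytes, and return
# them. The caller uses the returned bytes instead of reopening the file.
def load_verified(path, expected_sha256)
  bytes = File.binread(path)
  actual = Digest::SHA256.hexdigest(bytes)
  raise DigestMismatch, "unexpected digest for #{path}" unless actual == expected_sha256
  bytes
end

Tempfile.create("program") do |file|
  file.write("puts 'hello'\n")
  file.flush
  expected = Digest::SHA256.hexdigest("puts 'hello'\n")

  puts load_verified(file.path, expected).bytesize # file as reviewed

  File.binwrite(file.path, "puts 'evil'\n")        # tampering after review
  begin
    load_verified(file.path, expected)
  rescue DigestMismatch => e
    puts "refused: #{e.message}"
  end
end
```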

Now consider the analogous picture for open source in the context of hardware, shown above. If it looks complicated, that’s because it is: there are a lot of hands that touch your hardware before it gets to you!

Git can ensure that the original design files haven’t been tampered with, and openness can help ensure that a “best effort” has been made to build and test a device that is trustworthy. However, there are still numerous actors in the supply chain that can tamper with the hardware, and there is no “hardware hash function” that enables us to draw an equivalence between the intent of the developer, and the exact instance of hardware in any user’s possession. The best we can do to check a modern silicon chip is to destructively digest and delayer it for inspection in a SEM, or employ a building-sized microscope to perform ptychographic imaging.

It’s like the Heisenberg Uncertainty Principle, but for hardware: you can’t simultaneously be sure of a computer’s construction without disturbing its function. In other words, for hardware the time of check is decoupled from the time of use, creating opportunities for tampering by malicious actors.

Of course, we rely entirely upon hardware to faithfully compute the hashes and signatures necessary for the perfect transfer of trust in software. Tamper with the hardware, and all of a sudden all these clever maths are for naught: a malicious piece of hardware could forge the results of a hash computation, thus allowing bad code to appear identical to good code.

Three Principles for Building Trustable Hardware

So where does this leave us? Do we throw up our hands in despair? Is there any solution to the hardware verification problem?

I’ve pondered this problem for many years, and distilled my thoughts into three core principles:

1. Complexity is the enemy of verification. Without tools like hashes, Merkle trees and digital signatures to transfer trust between developers and users, we are left in a situation where we are reduced to relying on our own two eyes to assess the correct construction of our hardware. Using tools and apps to automate verification merely shifts the trust problem, as one can only trust the result of a verification tool if the tool itself can be verified. Thus, there is an exponential spiral in the cost and difficulty to verify a piece of hardware the further we drift from relying on our innate human senses. Ideally, the hardware is either trivially verifiable by a non-technical user, or with the technical help of a “trustable” acquaintance, e.g. someone within two degrees of separation in the social network.

2. Verify entire systems, not just components. Verifying the CPU does little good when the keyboard and display contain backdoors. Thus, our perimeter of verification must extend from the point of user interface all the way down to the silicon that carries out the secret computations. While open source secure chip efforts such as Keystone and OpenTitan are laudable and valuable elements of a trustable hardware ecosystem, they are ultimately insufficient by themselves for protecting a user’s private matters.

3. Empower end-users to verify and seal their hardware. Delegating verification and key generation to a central authority leaves users exposed to a wide range of supply chain attacks. Therefore, end users require sufficient documentation to verify that their hardware is correctly constructed. Once verified and provisioned with keys, the hardware also needs to be sealed, so that users do not need to conduct an exhaustive re-verification every time the device happens to leave their immediate person. In general, the better the seal, the longer the device may be left unattended without risk of secret material being physically extracted.

Unfortunately, the first and second principles conspire against everything we have come to expect of electronics and computers today. Since their inception, computer makers have been in an arms race to pack more features and more complexity into ever smaller packages. As a result, it is practically impossible to verify modern hardware, whether open or closed source. Instead, if trustworthiness is the top priority, one must pick a limited set of functions, and design the minimum viable verifiable product around that.

The Simplicity of Betrusted

In order to ground the conversation in something concrete, we (Sean ‘xobs’ Cross, Tom Marble, and I) have started a project called “Betrusted” that aims to translate these principles into a practically verifiable, and thus trustable, device. In line with the first principle, we simplify the device by limiting its function to secure text and voice chat, second-factor authentication, and the storage of digital currency.

This means Betrusted can’t browse the web; it has no “app store”; it won’t hail rides for you; and it can’t help you navigate a city. However, it will be able to keep your private conversations private, give you a solid second factor for authentication, and perhaps provide a safe spot to store digital currency.

In line with the second principle, we have curated a set of peripherals for Betrusted that extend the perimeter of trust to the user’s eyes and fingertips. This sets Betrusted apart from open source chip-only secure enclave projects.

Verifiable I/O

For example, the input surface for Betrusted is a physical keyboard. Physical keyboards have the benefit of being made of nothing but switches and wires, and are thus trivial to verify.

Betrusted’s keyboard is designed to be pulled out and inspected by simply holding it up to a light, and we support different languages by allowing users to change out the keyboard membrane.

The output surface for Betrusted is a black and white LCD with a high pixel density of 200ppi, approaching the performance of ePaper or print media, and is likely sufficient for most text chat, authentication, and banking applications. This display’s on-glass circuits are entirely constructed of transistors large enough to be 100% inspected using a bright light and a USB microscope. Below is an example of what one region of the display looks like through such a microscope at 50x magnification.

The meta-point about the simplicity of this display’s construction is that there are few places to hide effective back doors. This display is more trustable not just because we can observe every transistor; more importantly, we probably don’t have to, as there just aren’t enough transistors available to mount an attack.

Contrast this to the more sophisticated color displays, which rely on a fleck of silicon with millions of transistors implementing a frame buffer and command interface, and this controller chip is closed-source. Even if such a chip were open, verification would require a destructive method involving delayering and a SEM. Thus, the inspectability and simplicity of the LCD used in Betrusted is fairly unique in the world of displays.

Verifiable CPU

The CPU is, of course, the most problematic piece. I’ve put some thought into methods for the non-destructive inspection of chips. While it may be possible, I estimate it would cost tens of millions of dollars and a couple years to execute a proof of concept system. Unfortunately, funding such an effort would entail chasing venture capital, which would probably lead to a solution that’s closed-source. While this may be an opportunity to get rich selling services and licensing patented technology to governments and corporations, I am concerned that it may not effectively empower everyday people.

The TL;DR is that the near-term compromise solution is to use an FPGA. We rely on logic placement randomization to mitigate the threat of fixed silicon backdoors, and we rely on bitstream introspection to facilitate trust transfer from designers to user. If you don’t care about the technical details, skip to the next section.

The FPGA we plan to use for Betrusted’s CPU is the Spartan-7 FPGA from Xilinx’s “7-Series”, because its -1L model bests the Lattice ECP5 FPGA by a factor of 2-4x in power consumption. This is the difference between an “all-day” battery life for the Betrusted device, versus a “dead by noon” scenario. The downside of this approach is that the Spartan-7 FPGA is a closed source piece of silicon that currently relies on a proprietary compiler. However, there have been some compelling developments that help mitigate the threat of malicious implants or modifications within the silicon or FPGA toolchain. These are:

• The Symbiflow project is developing a F/OSS toolchain for 7-Series FPGA development, which may eventually eliminate any dependence upon opaque vendor toolchains to compile code for the devices.
• Prjxray is documenting the bitstream format for 7-Series FPGAs. The results of this work-in-progress indicate that even if we can’t understand exactly what every bit does, we can at least detect novel features being activated. That is, the activation of a previously undisclosed back door or feature of the FPGA would not go unnoticed.
• The placement of logic within an FPGA can be trivially randomized by incorporating a random seed in the source code. This means it is not practically useful for an adversary to backdoor a few logic cells within an FPGA. A broadly effective silicon-level attack on an FPGA would lead to gross size changes in the silicon die that can be readily quantified non-destructively through X-rays. The efficacy of this mitigation is analogous to ASLR: it’s not bulletproof, but it’s cheap to execute with a significant payout in complicating potential attacks.

The ability to inspect compiled bitstreams in particular brings the CPU problem back to a software-like situation, where we can effectively transfer elements of trust from designers to the hardware level using mathematical tools. Thus, while detailed verification of an FPGA’s construction at the transistor-level is impractical (but still probably easier than a general-purpose CPU due to its regular structure), the combination of the FPGA’s non-determinism in logic and routing placement, new tools that will enable bitstream inspection, and the prospect of 100% F/OSS solutions to compile designs significantly raises the bar for trust transfer and verification of an FPGA-based CPU.

Above: a highlighted signal within an FPGA design tool, illustrating the notion that design intent can be correlated to hardware blocks within an FPGA.

One may argue that in fact, FPGAs may be the gold standard for verifiable and trustworthy hardware until a viable non-destructive method is developed for the verification of custom silicon. After all, even if the mask-level design for a chip is open sourced, how is one to divine that the chip in their possession faithfully implements every design feature?

The system described so far touches upon the first principle of simplicity, and the second principle of UI-to-silicon verification. It turns out that the 7-Series FPGA may also be able to meet the third principle, user-sealing of devices after inspection and acceptance.

Sealing Secrets within Betrusted

Transparency is great for verification, but users also need to be able to seal the hardware to protect their secrets. In an ideal work flow, users would:

1. Receive a Betrusted device

2. Confirm its correct construction through a combination of visual inspection and FPGA bitstream randomization and introspection, and

3. Provision their Betrusted device with secret keys and seal it.

Ideally, the keys are generated entirely within the Betrusted device itself, and once sealed it should be “difficult” for an adversary with direct physical possession of the device to extract or tamper with these keys.

We believe key generation and self-sealing should be achievable with a 7-series Xilinx device. This is made possible in part by leveraging the bitstream encryption features built into the FPGA hardware by Xilinx. At the time of writing, we are fairly close to understanding enough of the encryption formats and fuse burning mechanisms to provide a fully self-hosted, F/OSS solution for key generation and sealing.

As for how good the seal is, the answer is a bit technical. The TL;DR is that it should not be possible for someone to borrow a Betrusted device for a few hours and extract the keys, and any attempt to do so should leave the hardware permanently altered in obvious ways. The more nuanced answer is that the 7-series devices from Xilinx are quite popular, and have received extensive scrutiny over their lifetime by the broader security community. The best known attack against the 256-bit CBC AES + SHA-256 HMAC used in these devices leverages hardware side channels to leak information between AES rounds. This attack requires unfettered access to the hardware and about 24 hours to collect data from 1.6 million chosen ciphertexts. While improvement is desirable, keep in mind that a decap-and-image operation to extract keys via physical inspection using a FIB takes around the same amount of time to execute. In other words, the absolute limit on how much one can protect secrets within hardware is probably driven more by physical tamper resistance measures than strictly cryptographic measures.

Furthermore, now that the principle of the side-channel attack has been disclosed, we can apply simple mitigations to frustrate this attack, such as gluing shut or removing the external configuration and debug interfaces necessary to present chosen ciphertexts to the FPGA. Users can also opt to use volatile SRAM-based encryption keys, which are immediately lost upon interruption of battery power, making attempts to remove the FPGA or modify the circuit board significantly riskier. This of course comes at the expense of accidental loss of the key should backup power be interrupted.

At the very least, with a 7-series device, a user will be well-aware that their device has been physically compromised, which is a good start; and in a limiting sense, all you can ever hope for from a tamper-protection standpoint.

You can learn more about the Betrusted project at our GitHub page. We think of Betrusted as more of a “hardware/software distro”, rather than as a product per se. We expect that it will be forked to fit the various specific needs and user scenarios of our diverse digital ecosystem. Whether or not we make completed Betrusted reference devices for sale will depend upon the feedback of the community; we’ve received widely varying opinions on the real demand for a device like this.

Trusting Betrusted vs Using Betrusted

I personally regard Betrusted as more of an evolution toward — rather than an end to — the quest for verifiable, trustworthy hardware. I’ve struggled for years to distill the reasons why openness is insufficient to solve trust problems in hardware into a succinct set of principles. I’m also sure these principles will continue to evolve as we develop a better and more sophisticated understanding of the use cases, their threat models, and the tools available to address them.

My personal motivation for Betrusted was to have private conversations with my non-technical friends. So, another huge hurdle in all of this will of course be user acceptance: would you ever care enough to take the time to verify your hardware? Verifying hardware takes effort, iPhones are just so convenient, Apple has a pretty compelling privacy pitch…and “anyways, good people like me have nothing to hide…right?” Perhaps our quixotic attempt to build a truly verifiable, trustworthy communications device may be received by everyday users as nothing more than a quirky curio.

Even so, I hope that by at least starting the conversation about the problem and spelling it out in concrete terms, we’re laying the framework for others to move the goal posts toward a safer, more private, and more trustworthy digital future.

The Betrusted team would like to extend a special thanks to the NLnet foundation for sponsoring our efforts.


Separation of positional and keyword arguments in Ruby 3.0


This article explains the planned incompatibility of keyword arguments in Ruby 3.0


In Ruby 3.0, positional arguments and keyword arguments will be separated. Ruby 2.7 will warn for behaviors that will change in Ruby 3.0. If you see the following warnings, you need to update your code:

  • Using the last argument as keyword parameters is deprecated, or
  • Passing the keyword argument as the last hash parameter is deprecated, or
  • Splitting the last argument into positional and keyword parameters is deprecated

In most cases, you can avoid the incompatibility by adding the double splat operator. It explicitly specifies passing keyword arguments instead of a Hash object. Likewise, you may add braces {} to explicitly pass a Hash object, instead of keyword arguments. Read the section “Typical cases” below for more details.

In Ruby 3, a method delegating all arguments must explicitly delegate keyword arguments in addition to positional arguments. If you want to keep the delegation behavior found in Ruby 2.7 and earlier, use ruby2_keywords. See the “Handling argument delegation” section below for more details.

Typical cases

Here is the most typical case. You can use the double splat operator (**) to pass keywords instead of a Hash.

# This method accepts only a keyword argument
def foo(k: 1)
  p k
end

h = { k: 42 }

# This method call passes a positional Hash argument
# In Ruby 2.7: The Hash is automatically converted to a keyword argument
# In Ruby 3.0: This call raises an ArgumentError
foo(h)
  # => demo.rb:11: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
  #    demo.rb:2: warning: The called method `foo' is defined here
  #    42

# If you want to keep the behavior in Ruby 3.0, use double splat
foo(**h) #=> 42

Here is another case. You can use braces ({}) to pass a Hash instead of keywords explicitly.

# This method accepts one positional argument and a keyword rest argument
def bar(h, **kwargs)
  p h
end

# This call passes only a keyword argument and no positional arguments
# In Ruby 2.7: The keyword is converted to a positional Hash argument
# In Ruby 3.0: This call raises an ArgumentError
bar(k: 42)
  # => demo2.rb:9: warning: Passing the keyword argument as the last hash parameter is deprecated
  #    demo2.rb:2: warning: The called method `bar' is defined here
  #    {:k=>42}

# If you want to keep the behavior in Ruby 3.0, write braces to make it an
# explicit Hash
bar({ k: 42 }) # => {:k=>42}

What is deprecated?

In Ruby 2, keyword arguments can be treated as the last positional Hash argument and a last positional Hash argument can be treated as keyword arguments.

The automatic conversion is sometimes too complex and troublesome, as described in the final section, so it is deprecated in Ruby 2.7 and will be removed in Ruby 3. In other words, keyword arguments will be completely separated from positional ones in Ruby 3. So when you want to pass keyword arguments, you should always use foo(k: expr) or foo(**expr). If you want to accept keyword arguments, in principle you should always use def foo(k: default) or def foo(k:) or def foo(**kwargs).
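As a small sketch of the recommended forms, the three definition styles and the two call styles look like this. The method names with_default, required and rest are made up for illustration; the calls should behave the same on Ruby 2.7 and 3.0, since none of them relies on automatic conversion.

```ruby
# Keyword with a default value
def with_default(k: 1)
  k
end

# Required keyword
def required(k:)
  k
end

# Keyword rest argument: collects any keywords into a Hash
def rest(**kwargs)
  kwargs
end

opts = { k: 42 }

p with_default          # => 1
p with_default(k: 2)    # => 2
p with_default(**opts)  # => 42  (a Hash passed explicitly as keywords)
p required(k: 3)        # => 3
p rest(k: 4, j: 5).keys # => [:k, :j]
```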

Note that Ruby 3.0 doesn’t behave differently when calling a method which doesn’t accept keyword arguments with keyword arguments. For instance, the following case is not going to be deprecated and will keep working in Ruby 3.0. The keyword arguments are still treated as a positional Hash argument.

def foo(kwargs = {})
  kwargs
end

foo(k: 1) #=> {:k=>1}

This is because this style is used very frequently, and there is no ambiguity in how the argument should be treated. Prohibiting this conversion would result in additional incompatibility for little benefit.

However, this style is not recommended in new code, unless you are often passing a Hash as a positional argument, and are also using keyword arguments. Otherwise, use double splat:

def foo(**kwargs)
  kwargs
end

foo(k: 1) #=> {:k=>1}

Will my code break on Ruby 2.7?

A short answer is “maybe not”.

The changes in Ruby 2.7 are designed as a migration path towards 3.0. While in principle, Ruby 2.7 only warns against behaviors that will change in Ruby 3, it includes some incompatible changes we consider to be minor. See the “Other minor changes” section for details.

Except for the warnings and minor changes, Ruby 2.7 attempts to keep the compatibility with Ruby 2.6. So, your code will probably work on Ruby 2.7, though it may emit warnings. And by running it on Ruby 2.7, you can check if your code is ready for Ruby 3.0.

If you want to disable the deprecation warnings, please use a command-line argument -W:no-deprecated or add Warning[:deprecated] = false to your code.
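If you would rather silence the warnings only around a specific piece of code, one possible approach is a small wrapper that flips the setting and restores it afterwards. The without_deprecation_warnings helper is hypothetical, and the respond_to? guard is there on the assumption that the code may also be loaded on Ruby 2.6 or earlier, where Warning[] does not exist.

```ruby
# Hypothetical helper: run the block with deprecation warnings disabled,
# then restore whatever setting was in effect before.
def without_deprecation_warnings
  # Warning[] is only available on Ruby 2.7 and later
  return yield unless Warning.respond_to?(:[])

  previous = Warning[:deprecated]
  Warning[:deprecated] = false
  begin
    yield
  ensure
    Warning[:deprecated] = previous
  end
end

result = without_deprecation_warnings do
  # code that still relies on deprecated behavior would go here
  :done
end
p result # => :done
```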

Handling argument delegation

Ruby 2.6 or prior

In Ruby 2, you can write a delegation method by accepting a *rest argument and a &block argument, and passing the two to the target method. In this behavior, the keyword arguments are also implicitly handled by the automatic conversion between positional and keyword arguments.

def foo(*args, &block)
  target(*args, &block)
end

Ruby 3

You need to explicitly delegate keyword arguments.

def foo(*args, **kwargs, &block)
  target(*args, **kwargs, &block)
end

Alternatively, if you do not need compatibility with Ruby 2.6 or prior and you don’t alter any arguments, you can use the new delegation syntax (...) that is introduced in Ruby 2.7.

def foo(...)
  target(...)
end
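To check that the two Ruby 3 styles forward the same things, here is a minimal sketch that requires Ruby 2.7 or later because of the (...) syntax. The names target, explicit and forward_all are illustrative only.

```ruby
# A target that reports exactly what it received.
def target(*args, **kwargs, &block)
  [args, kwargs, block ? block.call : nil]
end

# Explicit delegation of positional, keyword and block arguments
def explicit(*args, **kwargs, &block)
  target(*args, **kwargs, &block)
end

# Argument forwarding syntax (Ruby 2.7+): forwards everything, block included
def forward_all(...)
  target(...)
end

a = explicit(1, k: 2) { :blk }
b = forward_all(1, k: 2) { :blk }

p a == b # => true
p a[0]   # => [1]
p a[2]   # => :blk
```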

Ruby 2.7

In short: use Module#ruby2_keywords and delegate *args, &block.

ruby2_keywords def foo(*args, &block)
  target(*args, &block)
end

ruby2_keywords accepts keyword arguments as the last Hash argument, and passes it as keyword arguments when calling the other method.

In fact, Ruby 2.7 allows the new style of delegation in many cases. However, there is a known corner case. See the next section.

A compatible delegation that works on Ruby 2.6, 2.7 and Ruby 3

In short: use Module#ruby2_keywords again.

ruby2_keywords def foo(*args, &block)
  target(*args, &block)
end

Unfortunately, we need to use the old-style delegation (i.e., no **kwargs) because Ruby 2.6 or prior does not handle the new delegation style correctly. This is one of the reasons for the keyword argument separation; the details are described in the final section. And ruby2_keywords allows you to run the old style even in Ruby 2.7 and 3.0. As there is no ruby2_keywords defined in 2.6 or prior, please use the ruby2_keywords gem or define it yourself:

def ruby2_keywords(*)
end if RUBY_VERSION < "2.7"
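Putting the pieces together, a portable delegating class might look like the sketch below. The Greeter and GreeterProxy classes are invented for illustration; the fallback definition is the one shown above, repeated so the example is self-contained.

```ruby
# No-op fallback for Ruby 2.6 and earlier, where ruby2_keywords is undefined
def ruby2_keywords(*)
end if RUBY_VERSION < "2.7"

class Greeter
  def greet(name, greeting: "Hello")
    "#{greeting}, #{name}"
  end
end

class GreeterProxy
  def initialize(target)
    @target = target
  end

  # Old-style delegation; ruby2_keywords marks the trailing Hash so that
  # it is passed on as keywords when splatted into the target call.
  ruby2_keywords def greet(*args, &block)
    @target.greet(*args, &block)
  end
end

proxy = GreeterProxy.new(Greeter.new)
puts proxy.greet("Ruby")                 # Hello, Ruby
puts proxy.greet("Ruby", greeting: "Hi") # Hi, Ruby
```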

If your code doesn’t have to run on Ruby 2.6 or older, you may try the new style in Ruby 2.7. In almost all cases, it works. Note that, however, there are unfortunate corner cases as follows:

def target(*args)
  p args
end

def foo(*args, **kwargs, &block)
  target(*args, **kwargs, &block)
end

foo({})       #=> Ruby 2.7: []   ({} is dropped)
foo({}, **{}) #=> Ruby 2.7: [{}] (You can pass {} by explicitly passing "no" keywords)

An empty Hash argument is automatically converted and absorbed into **kwargs, and the delegation call removes the empty keyword hash, so no argument is passed to target. As far as we know, this is the only corner case.

As noted in the last line, you can work around this issue by using **{}.

If you are really worried about portability, use ruby2_keywords. (Keep in mind that Ruby 2.6 and earlier themselves have tons of corner cases in keyword arguments. :-)

ruby2_keywords might be removed in the future after Ruby 2.6 reaches end-of-life. At that point, we recommend explicitly delegating keyword arguments (see the Ruby 3 code above).

Other minor changes

There are three minor changes about keyword arguments in Ruby 2.7.

1. Non-Symbol keys are allowed in keyword arguments

In Ruby 2.6 or before, only Symbol keys were allowed in keyword arguments. In Ruby 2.7, keyword arguments can use non-Symbol keys.

def foo(**kwargs)
  p kwargs
end
foo("key" => 42)
  #=> Ruby 2.6 or before: ArgumentError: wrong number of arguments
  #=> Ruby 2.7 or later: {"key"=>42}

If a method accepts both optional and keyword arguments, the Hash object that has both Symbol keys and non-Symbol keys was split in two in Ruby 2.6. In Ruby 2.7, both are accepted as keywords because non-Symbol keys are allowed.

def bar(x=1, **kwargs)
  p [x, kwargs]
end

bar("key" => 42, :sym => 43)
  #=> Ruby 2.6: [{"key"=>42}, {:sym=>43}]
  #=> Ruby 2.7: [1, {"key"=>42, :sym=>43}]

# Use braces to keep the behavior
bar({"key" => 42}, :sym => 43)
  #=> Ruby 2.6 and 2.7: [{"key"=>42}, {:sym=>43}]

Ruby 2.7 still splits hashes with a warning if passing a Hash or keyword arguments with both Symbol and non-Symbol keys to a method that accepts explicit keywords but no keyword rest argument (**kwargs). This behavior will be removed in Ruby 3, and an ArgumentError will be raised.

def bar(x=1, sym: nil)
  p [x, sym]
end

bar("key" => 42, :sym => 43)
# Ruby 2.6 and 2.7: => [{"key"=>42}, 43]
# Ruby 2.7: warning: Splitting the last argument into positional and keyword parameters is deprecated
#           warning: The called method `bar' is defined here
# Ruby 3.0: ArgumentError

2. Double splat with an empty hash (**{}) passes no arguments

In Ruby 2.6 or before, passing **empty_hash passes an empty Hash as a positional argument. In Ruby 2.7 or later, it passes no arguments.

def foo(*args)
  args
end

empty_hash = {}
foo(**empty_hash)
  #=> Ruby 2.6 or before: [{}]
  #=> Ruby 2.7 or later: []

Note that foo(**{}) passes nothing in both Ruby 2.6 and 2.7. In Ruby 2.6 and before, **{} is removed by the parser, and in Ruby 2.7 and above, it is treated the same as **empty_hash, allowing for an easy way to pass no keyword arguments to a method.

In Ruby 2.7, when calling a method with an insufficient number of required positional arguments, foo(**empty_hash) passes an empty hash with a warning emitted, for compatibility with Ruby 2.6. This behavior will be removed in 3.0.

def foo(x)
  p x
end

empty_hash = {}
foo(**empty_hash)
  #=> Ruby 2.6 or before: {}
  #=> Ruby 2.7: warning: Passing the keyword argument as the last hash parameter is deprecated
  #             warning: The called method `foo' is defined here
  #=> Ruby 3.0: ArgumentError: wrong number of arguments

3. The no-keyword-arguments syntax (**nil) is introduced

You can use **nil in a method definition to explicitly mark that the method accepts no keyword arguments. Calling such a method with keyword arguments will result in an ArgumentError. (This is actually a new feature, not an incompatibility.)

def foo(*args, **nil)
end

foo(k: 1)
  #=> Ruby 2.7 or later: no keywords accepted (ArgumentError)

This is useful to make it explicit that the method does not accept keyword arguments. Otherwise, the keywords are absorbed into the rest argument in the above example. If you later extend such a method to accept keyword arguments, existing calls may become incompatible, as follows:

# If a method accepts rest argument and no `**nil`
def foo(*args)
  p args
end

# Passed keywords are converted to a Hash object (even in Ruby 3.0)
foo(k: 1) #=> [{:k=>1}]

# If the method is extended to accept a keyword
def foo(*args, mode: false)
  p args
end

# The existing call may break
foo(k: 1) #=> ArgumentError: unknown keyword k
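
The difference between the two definitions can also be checked at runtime. The sketch below assumes Ruby 2.7 or later and uses two hypothetical method names, no_keywords and rest_only. It relies on Method#parameters, which reports **nil as a [:nokey] entry.

```ruby
# Hypothetical method names, chosen only for this sketch.
def no_keywords(*args, **nil)
  args
end

def rest_only(*args)
  args
end

# Method#parameters reports `**nil` as a [:nokey] entry (Ruby 2.7+).
p method(:no_keywords).parameters # prints [[:rest, :args], [:nokey]]
p method(:rest_only).parameters   # prints [[:rest, :args]]

# Keywords are rejected up front instead of being absorbed into args.
begin
  no_keywords(k: 1)
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

Because the call already fails today, adding real keyword parameters to such a method later cannot silently change what existing callers get.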

Why we’re deprecating the automatic conversion

The automatic conversion initially appeared to be a good idea, and worked well in many cases. However, it had too many corner cases, and we have received many bug reports about the behavior.

Automatic conversion does not work well when a method accepts optional positional arguments and keyword arguments. Some people expect the last Hash object to be treated as a positional argument, and others expect it to be converted to keyword arguments.

Here is one of the most confusing cases:

def foo(x, **kwargs)
  p [x, kwargs]
end

def bar(x=1, **kwargs)
  p [x, kwargs]
end

foo({}) #=> [{}, {}]
bar({}) #=> [1, {}]

bar({}, **{}) #=> expected: [{}, {}], actual: [1, {}]

In Ruby 2, foo({}) passes an empty hash as a normal argument (i.e., {} is assigned to x), while bar({}) passes a keyword argument (i.e., {} is assigned to kwargs). So any_method({}) is very ambiguous.

You might try bar({}, **{}) to pass the empty hash to x explicitly. Surprisingly, it does not work as you might expect; it still prints [1, {}] in Ruby 2.6. This is because **{} is ignored by the parser in Ruby 2.6, and the first argument {} is automatically converted to keywords (**kwargs). In this case, you need to call bar({}, {}), which is very weird.

The same issues also apply to methods that accept rest and keyword arguments. This makes explicit delegation of keyword arguments not work.

def target(*args)
  p args
end

def foo(*args, **kwargs, &block)
  target(*args, **kwargs, &block)
end

foo() #=> Ruby 2.6 or before: [{}]
      #=> Ruby 2.7 or later:  []

foo() passes no arguments, but target receives an empty hash argument in Ruby 2.6. This is because the method foo delegates keywords (**kwargs) explicitly. When foo() is called, args is an empty Array, kwargs is an empty Hash, and block is nil. And then target(*args, **kwargs, &block) passes an empty Hash as an argument because **kwargs is automatically converted to a positional Hash argument.
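
For comparison, here is a small sketch of the same delegation pattern as it should behave on Ruby 2.7 or later. The names receiver and delegator are hypothetical stand-ins for target and foo above, and the methods return their arguments instead of printing them so the results can be compared directly.

```ruby
# Hypothetical names for this sketch: `receiver` stands in for `target`
# above and `delegator` for `foo`.
def receiver(*args)
  args
end

def delegator(*args, **kwargs, &block)
  receiver(*args, **kwargs, &block)
end

# With no arguments, the empty **kwargs is passed on as nothing at all.
p delegator() == [] # prints true on Ruby 2.7 or later

# Real keywords still reach a rest-only method as one trailing Hash.
p delegator(1, k: 2) == [1, { k: 2 }] # prints true
```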

The automatic conversion not only confuses people but also makes the method less extensible. See [Feature #14183] for more details about the reasons for the change in behavior, and why certain implementation choices were made.


This article was kindly reviewed (or even co-authored) by Jeremy Evans and Benoit Daloze.


  • Updated 2019-12-25: In 2.7.0-rc2, the warning message was slightly changed, and an API to suppress the warnings was added.

Posted by mame on 12 Dec 2019


That packet looks familiar, and that one, and that one...


Sometimes I tell stories about service outages from the point of view of it having already happened, and provide this kind of "omniscient narrator" perspective as it goes along. I can tell you to look out for things, since they'll come in handy later, and so on.

Other times, I try to present them in much the same way that they happened, complete with the whole "fog of war" thing that goes with being in the moment. This is where you can't see obvious things that are right in front of you because they seem too ridiculous to contemplate.

This is one of the latter, based on multiple events I've lived through. Yes, multiple: it's happened at several distinct companies that I've worked for, contracted for, heard about in stories from friends, or read about somewhere over the years.

So, let's dive in.

You're at work at the office. It's a normal business day. Lots of other people are also at the office, doing their regular jobs. Much of this work involves poking at vaguely Unix-flavored boxes which are not physically on the premises. They're somewhere relatively far away, and so you have to telnet, ssh, or whatever to get to them. The point is, you're having to cross at least one "WAN" link to reach these things.

Things are going okay. But then it seems like everything is getting slower and slower. Interactive connections are lagging: keystrokes don't echo back quickly any more. Web browsing and other "batch" network stuff is dragging: the "throbber" in the browser is busy for far too long for ordinary activities. The chat system gets slower and slower.

You manage to hear from other people via the chat system ever so briefly. Some of them are seeing it as well. Others aren't sure, but at least you know it's not just you.

Over the next minute or so, the whole thing grinds to a halt. EVERYTHING is now dead. Nothing's working.

Depending on what decade it is, you whip out a modem and dial back in to the corporate network remote access pool, jump on the guest wireless network, or start tethering through your cell phone and get back in touch with whoever managed to stay online. You find others who have done the same thing, and it's clear there's a very big problem, but only where you are. The rest of the organization's offices are fine, but everything where you are is toast.

Someone immediately thinks it's the link to the Internet and picks up the phone to the ISP to yell at them, because it has to be them. It's the network. It's always the network. Particularly when there's someone else to yell at, right? That happens in the background.

Maybe an hour into this, someone unrelated to the problem but who knows their way around a network makes a curious observation: they have a hard-wired connection, and they are seeing a LOT of crazy traffic going past their host. It's way more multicast and broadcast traffic than they are used to. (Apparently, they have done this before, and have their own internal metric for what "normal" is, which is particularly valuable now. Otherwise, how would they know?) Hooray for tcpdump!

Some of the broadcast traffic identifies its origin. More than just a MAC address or a source IP address, some of it (depending on the decade again) is NetBIOS over TCP/IP broadcasts, or mDNS multicasts, or whatever, and they helpfully have the names of the hosts embedded. These are workstations, and they include some aspect of the human user's unixname, so you can go "oh, that's so and so".

Did that person just unleash a hell torrent on the network? Better go check. They're right around the corner on the same floor, so why not? You get there, and they're very friendly, and no, they aren't doing anything funny. Their host DOES have the IP address and MAC address corresponding to some of the packets flying past the other box over and over, but they swear up and down they didn't do anything.

You ask nicely if they'll disable their network interface(s) temporarily just for a while to attempt isolating things, and they agree. They even do you one better and power the whole machine off and will leave it off until you give the all-clear. They even unplug the network cable(s) to thwart any Wake-on-LAN magic. There's no way it's going to cause you any trouble now.

You walk back around the corner to the first person's machine where they spotted all of the bcast/mcast traffic, expecting it to have subsided. It hasn't. Indeed, it's still running full tilt, and then you notice in the spew that the host you just disabled is still in the packet traces. This box has its NICs disabled, is unplugged from the hard-wired network AND is powered off... and yet there are Ethernet frames going by which claim to be from it.

At this point, people who have lived through this before probably know what happened. They might not know how, or why it was able to happen, but somewhere deep down inside, they have this sinking feeling that someone did something terrible to the network and this is how it's manifesting.

The story diverges from here based on which version of it I've either lived through or heard about, but it goes along similar lines in any case.

Someone introduced a loop to the network. The most likely case is that a person showed up in the morning and went crawling under their desk to plug something in, like a power cord. Maybe it "came unplugged" overnight or over the weekend, and they're fixing it.

Then, while they're down there, they see this random Ethernet patch cable just kind of hanging out on the floor. It looks like it should be plugged in to one of the ports down there, but it's not. They plug it in to "do that person a solid" so they don't have to waste time also dealing with unplugged stuff.

Of course, as it turned out, the loose cable end was one that was ON TOP of a desk, and had fallen down. It was the end that went into a computer, and did not need to be plugged "back in"... for the other end of that same cable was already plugged into a port!

Taking that cable and jamming it into another port introduced a loop.

Again, here, a bunch of people are jumping up and down, looking for the comment form, or popping back to the Hacker News tab to say something. Patience. We'll get there.

Radia Perlman figured this out a long time ago. It's called spanning-tree protocol, and it's how your network can detect and defend against such clownery as someone looping it back onto itself.

But, again, the story branches. One time, it didn't exist yet in the networking equipment at the company. Another time, it did, but the people running the network had no idea what it was, or why they'd use it. A third time, they knew about it and decided to "not use it" since "nobody here is that stupid". Yet another time, they had just turned it off days earlier because "it was breaking stuff, and things were fine without it".

For the benefit of those who haven't lived through it yet, here's what happens. Let's take a simple loop where you've managed to plug two ports on the same switch into each other. We'll say that ports 15 and 16 are looped back.

Someone, somewhere, sends out a broadcast, multicast, or possibly even a unicast packet (for a destination not yet mapped to a port). The switch (or hub, if you're "back in the day"...) takes that packet and floods it out to all ports except the one where it came from. Maybe it came from its uplink (port 24), and so it sends the packet down ports 1-23, skipping 24.

The packet goes out port 15, makes that neat little hairpin turn, and arrives back at the switch on port 16 at nearly the same time. Assuming it's a broadcast, multicast, or a still-unknown unicast address, we're right back where we started: the switch has to take THIS packet, then, and flood it out to all of its ports less the source port. It gets sent to 1-15 and 17-24, skipping 16. Hold that thought right there for now.

Back up in time a few nanoseconds. That packet ALSO went out port 16, made the hairpin turn, passing itself going the other direction (possibly on the same wires, if in full-duplex mode), and showed up back at the switch on port 15. At that point, the switch proceeded to flood it out to 1-14 and 16-24.

So you see, every time the packet leaves 15, it arrives at 16, and every time it leaves 16, it arrives at 15. If this was a one-way situation, you'd "merely" have a train of packets flying around forever, never dying, because there's no such thing as a TTL at this layer of your network. But, since it's happening in both directions, you actually get double the fun.

None of these packets stop being forwarded, and new ones show up, so before long, your switch is doing nothing but spraying those frames everywhere just as fast as it can. Eventually you starve out all other traffic, and everything probably grinds to a halt.
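
The walkthrough above can be turned into a toy model. This is only a sketch under loud assumptions: one 24-port switch, ports 15 and 16 cabled together, one new broadcast arriving on the uplink per "tick", and no timing, MAC learning, or link-speed limits. All names are made up for the example. In this model each broadcast leaves two frames circulating forever (one per direction), so the load grows with every new broadcast. Passing blocked: [16] imitates spanning tree taking one end of the loop out of forwarding.

```ruby
# Toy model of one 24-port switch; all names are made up for the sketch.
PORTS = (1..24).to_a.freeze
LOOPED = { 15 => 16, 16 => 15 }.freeze # a frame leaving one end re-enters at the other
UPLINK = 24

# Each tick, one new broadcast arrives on the uplink and every frame that
# reached the switch is flooded to all usable ports except its ingress port.
# `blocked` lists ports taken out of forwarding, as spanning tree would do.
def simulate_flooding(ticks, blocked: [])
  arriving = [] # ingress port of each frame reaching the switch this tick
  ticks.times.map do
    arriving << UPLINK
    transmitted = 0
    returning = []
    arriving.each do |ingress|
      (PORTS - [ingress] - blocked).each do |egress|
        transmitted += 1
        far_end = LOOPED[egress]
        # A copy sent into the loop comes straight back, unless the far
        # end of the cable is a blocked port that discards it.
        returning << far_end if far_end && !blocked.include?(far_end)
      end
    end
    arriving = returning
    { transmitted: transmitted, looping: returning.size }
  end
end

storm = simulate_flooding(10)
p storm.first # one broadcast in: 23 copies out, 2 of them coming back
p storm.last  # ten broadcasts in: 437 copies out, 20 frames circulating

calm = simulate_flooding(10, blocked: [16])
p calm.last   # 22 copies out per tick, nothing circulating
```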

Eventually, the organization learns about things like switches, STP, 802.1X, and not having massive broadcast domains, and that crazy chapter of their history ends. However, just down the street, yet another company is waiting to add it to their own history.

So, here's a challenge for anyone who's managed to get this far: pick a nice quiet time outside of business hours, declare a maintenance window, then go deliberately loop back some Ethernet ports. Look in offices, conference rooms, and on the backs of things like VOIP phones. Be creative. See if your network actually survives it.

Or, you know, don't. If you don't test it, someone else will do it for you... eventually. They won't wait for a quiet time and won't declare a maintenance window, and they sure won't know to unplug it when things go sideways, but that's life, right?

Oh, finally, don't forget to go back to that nice person and tell them that they are okay to plug their machine back in, power it up, and re-enable the network. Then also tell them they didn't cause the whole business to tank for multiple hours, because they might be sweating bullets that somehow the whole thing will land on their head and poison their next performance review.

People deserve to know when they didn't cause a problem, particularly when they think they did! Don't forget that.


Fresh pcaps, free for the asking


One thing network people hate is having to get access to a machine (or groups of them) in order to run tools like tcpdump. They want a packet trace from somewhere, and having to obtain not just ssh access but sufficient powers to open a raw socket and start sniffing (typically root-level) is a pain. It should be unsurprising that someone eventually wrote a tool which solved the problem for them.

How did it work? Well, the machines in question (all of them) were already running some "agent" software which logged interesting things about the network activity of each host. It had root powers on the machines, and took requests from the network. Adding the packet trace capability was just a matter of extending what was already there.

The code was written, it was attached to the service, and it was shipped with little fanfare. It just kind of hung around after that point, doing its job while presumably reducing some of the more annoying angles for the humans.

Then, one Friday afternoon, rather late, someone decided it was time to post one of those feel-good announcements about what they had accomplished. This was quite a while after it had shipped, but maybe performance reviews were coming up? Nobody's really sure why it was announced at that point, or why it was late on a Friday, but that's what happened.

A few minutes later, one of those people who looks at everything and thinks "is that safe, or could someone hurt themselves or others with it" saw the post and the wheels started turning. Instead of going and digging in to the code (which would require pulling out the laptop and getting back online), they stayed on their little mobile app and replied to the post, instead. They asked some questions: who has access to this, and how is that enforced, exactly?

The hope was to get an answer back like "the network team, specifically anyone in LDAP group X" for the first part, and "we use a (tech) (tech) in the (tech) (tech) part of (tech) to verify their identity". However, unlike what happens with Star Trek scripts, the "tech" items here would actually be appropriate and solid for the task at hand. They'd look even smarter and people would feel good about the whole thing: a tool that saves time and doesn't open more holes!

The answer which came back, to put it charitably, fell short.

"We went and looked... turns out... everyone."

They had added the method to the service but hadn't added any authentication to determine who was making the request, or authorization to make sure they had the right to make it. As a result, anyone could switch it on and get tasty, tasty packet traces from any box in the fleet.

All you needed to do was send a properly-formatted network packet, and it would do the work for you. Want to see what's happening on a sensitive system that handles money, or HR data, or really, anything else run on these machines? Turn on a packet trace and hope they "forgot" to encrypt their traffic in flight.
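
To make the two missing checks concrete, here is a minimal sketch of what such a request handler might do before starting a trace. Everything in it is hypothetical: the token table stands in for real credential verification, the Set stands in for a directory group lookup, and start_packet_trace is an invented name, not the actual service's method.

```ruby
require "set"

# Everything below is hypothetical: the token table stands in for real
# credential verification and the Set for a directory group lookup.
CALLERS_BY_TOKEN = { "token-alice" => "alice", "token-bob" => "bob" }.freeze
PACKET_TRACE_GROUP = Set["alice"].freeze

class AccessDenied < StandardError; end

def start_packet_trace(token:, host:)
  # Authentication: work out who is making the request.
  caller_name = CALLERS_BY_TOKEN[token]
  raise AccessDenied, "unknown caller" if caller_name.nil?

  # Authorization: check that this caller may capture packets.
  unless PACKET_TRACE_GROUP.include?(caller_name)
    raise AccessDenied, "#{caller_name} may not capture packets"
  end

  "trace started on #{host} for #{caller_name}"
end

puts start_packet_trace(token: "token-alice", host: "db01")
begin
  start_packet_trace(token: "token-bob", host: "db01")
rescue AccessDenied => e
  puts "refused: #{e.message}"
end
```

The point of the sketch is only the ordering: identity first, permission second, and the privileged work last.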

As you probably guessed, this was unacceptable, and so, that Friday evening, the team which produced the service, the security responders, and that engineer who tends to look for the holes in things didn't get to knock off work for the weekend. Instead they had to figure out how to turn it off (even if that meant breaking things which relied on it), get that pushed out everywhere ASAP, and then see if they could leave it like that until Monday, or if they had to actually solve for the identity and permissions stuff right then and there.

This also meant people had to go digging to (try to) see if it had already been exploited. Not everyone waits for the announcement. Some people just go around thinking "hmm, what's already listening on the network and has interesting powers and curious-sounding RPC methods?", and those people would be way ahead of you.

Finally, there's the human angle to this. Obviously, a SEV was opened for this. The "user impact" field was written to be something like this by the engineer who discovered it:

Any host running (tech) since (commit hash) which was released (month) (year) would have provided packet dumps to anyone who asked politely via (tech).

Someone (who had worked on the service) couldn't leave it alone. They had to edit it. They added this:

It is perceived to have security risks associated with such data export.

In other words, "someone annoying thought this is a big deal and that's why we're working late tonight and into the weekend".

Side note: the code change which introduced this had a comment from one person saying "you know this might be a bad idea, right?"... but it was committed and shipped anyway. Oh, random commenter, if only they had listened to you.


Engineers without borders, silos, and vendor walls


I've seen a bit of variety in how different companies handle engineering problems. Some are big. Others are small. Some build things, and others don't. For some places, the tech is the product, but at others, it's a distraction at best. This is just to say that no one solution is expected to solve every problem, but it's still possible to find things that generally work better than others.

Let's talk about communication and teamwork. If you think those are generally good to have in your organization, you might be prone to make decisions to maintain what you already have and perhaps grow more. Keep that in mind as I describe some scenarios.

Scenario number one is a company which has a product which then depends on a bunch of different services. The product isn't important - it could involve cat pictures, influencing elections, or delivering pizzas. We'll say they have hard dependencies on 20 different services which must all be up and running for the business to function. Maybe one service is a database, another one is a cache, a third one is a group of web servers, a fourth tracks "upvotes", "likes" or "stars", and so on.

In this organization, the 20 services are owned by 20 teams in a 1:1 mapping. Team A owns service A, team B owns service B, and so on down the line. Some of these teams are quite good at what they do, and rarely cause trouble for the business. We'll even say that most of them are pretty solid, and generally don't break things.

There are, however, three services which have significant challenges. They fall over a lot. They stop working. They take down the whole operation. It doesn't matter how reliable the other 17 parts are, since one of these "terrible three" will show up and wreck the whole thing.

Fortunately, this company is enlightened, and has embraced a philosophy where people are able to talk about these things openly, and reflect on the problems and not the people involved. More to the point, the best engineers from other parts of the company are able, willing, and allowed to "wander around" into the troubled parts to help out. This kind of knowledge transfer happens, and pretty soon the roughest spots have been smoothed out, and now everything's a lot better. The cat pictures keep going out, the elections keep being hacked, and the pizzas show up on time. Life's pretty good.

Now let's jump into scenario number two. It's another company that has similar hard dependencies. They also have 20 services which must be up and running to make their business "go". Those 20 services are also run by 20 separate teams, as before, with one team owning one service. Most of them are also pretty good, and only a few of them have issues.

Unfortunately, that's the end of the similarities. Company number two has this "siloing" thing going on. If you're not familiar with the term, consider yourself lucky. It's when a company isolates one thing (a system, department, etc) from another. Well, this company has a LOT of siloing happening. People are very possessive of their services, and their team's "turf". Outsiders are to be suspected, since they probably have some nefarious intent in mind. The fact that all of these people nominally work for the same company with the same supposed goal in mind doesn't matter.

The corporate firewalls are up, they're numerous, and they're pretty tough. If a good person is on one side of the wall, and there's an interesting problem on the other that they could fix, the wall is going to prevent it. There will not be any cross-pollination between the teams. The best you might see is a few individuals moving around between teams in a shared silo, but you'll never see anything more than that.

The organization has been set up with defenses against this. People who try to poke their nose in, even if they have the best possible goals in mind, are the enemy. They must be blocked or reported. They definitely must be thwarted. They must not see the weaknesses in the system because it will be used to attack somehow and take over part of the turf!

Imagine running a company this way, where you can get these protected realms where the employees are "untouchable" and also somehow unable to deliver on the same requirements as everyone else. Worse still, you find yourself unable to send in helpers since the immune system will flare up and attack them in every way imaginable.

In such an environment, I'd expect it to become obvious by way of the general sense of malaise, acceptance of the status quo, and a lot of shrugs - "it's always been like that". "They're protected". "Those requirements don't apply to them". Maybe you've heard some of these before.

I'd hope that most people would try to avoid this with their companies, and part of that would involve breaking down the walls between groups which aren't mixing it up enough. The whole notion of engineers moving around and sharing their talents with whoever needs it has to be supported and rewarded.

Now I'm going to throw another complication into the mix with scenario three. This is a third company which amazingly also relies on 20 different services staying up and available the whole time for their business to work.

But, unlike company #1 and company #2, this one has a completely different approach to services. By default, they go out to Hacker News, find the shiniest thing in that space, and then pull out their checkbooks and throw money at whatever products they find on this romp through the web. They end up with 20 different services, sure, but they are coming from 20 different vendors.

Maybe most of the vendors are reasonably competent and so you never notice problems with them. They take your money and you get some level of service in return. It's good enough and so you probably forget about them. This is akin to the 17 or so good ones at the other two companies.

There's always a catch, though. Three of these services are garbage, just like at the other two companies. Their vendors are just not good at doing whatever they are purportedly selling. They're consuming lots of customer money and are delivering fatal errors, timeouts, or worse.

When this kind of thing happens at company 1, there's always the option of retasking some of the better engineers to do a field trip to the troubled teams to bail them out. If it happens at company 2, the silo walls prevent it, but those silo walls can in theory be torn down, and then they can turn into company 1.

But company 3? That's a different kind of trouble. All of their services are on the other side of a vendor relationship. Maybe some of those vendors are best-in-class, and are amazing -- it happens! But, some of those vendors definitely are not. Do you think company 3 can convince the awesome vendors to send engineers over to the terrible vendors to fix things up? Yeah, I don't think so. That's not going to happen. It's the ultimate silo, and you're never going to break it.

Would you ever willingly build a company like #2? I hope not. But, if you build a company like #3, you've basically done it anyway: you get all of the walls and none of the opportunities to tear them down later.

There are plenty of other gotchas which come with each of these three scenarios, but I'll leave it there for now. It boils down to this: the proliferation of interconnected vendors leads to a loss of control, and few things are more depressing than that.
