
Serializing Data Within Large Integers


Ethereum and Dapps use bitwise operations to serialize and de-serialize multiple integers within a single 256 bit token ID

Sean Lawless
Shifting bits in a large binary integer to reveal smaller integers (Image courtesy of Gerd Altmann from Pixabay)

Often data-size optimization is overlooked, or readability is preferred over data size. However, when paying for data storage on a blockchain, size does matter. Ethereum smart contract integers occupy 256 bits of space, which is typically larger than needed. A 256 bit integer can represent eight (8) different 32 bit integer values or 32 different 8 bit values. With gas prices dependent upon data size, it is important to utilize all bits of the integers you are paying to store on the blockchain.

Visualizing multiple smaller integers within a single 256-bit integer

To be clear, the additional complexity of serializing smaller integers within a larger integer should be taken on with care. Only logically connected data should be grouped together in this way. During the development of the world’s first decentralized software application store, the nonprofit ImmutableSoft found this path to be full of unexpected obstacles. This is their story of why, where, and how this technique was applied.

ERC-721 token IDs are 256 bit integers that must be unique (known as non-fungible in blockchain speak). The Immutable Ecosystem decentralized application (Dapp) created by ImmutableSoft defines their software license activation as an ERC-721 token. To represent a software license, it must store the entity (organization, individual, etc.) and product identifiers. Additionally, activation limitations such as the expiration, version, and languages must be immutably recorded on the blockchain for each activation.

At first it did not appear possible to represent all this information with only 256 bits. And this would be fine since it is easy to extend our ERC-721 token to store additional data for each individual token. However, additional storage increases gas costs and moves immutable data outside the ERC-721 standard. We felt it was important that our tokens represent the entire software activation accurately within the standard token ID so that the token would be understandable even within other ERC-721 exchanges. With plans to draft an Ethereum Improvement Proposal (EIP) for Activations finalized, careful design was prudent.

To start with, we first examined all of our activation variables and defined a minimum integer size for each. For example, it was reasonable to assume that 32 bits were enough to identify all registered entities of the Immutable Ecosystem. The same for the product identifier. Even the most prolific reseller would find a 32 bit integer sufficient to identify all of their products. The expiration time could similarly be represented with a 32 bit integer using the common seconds-from-epoch representation (the C standard library time() function).

Representing a version took some soul searching. Our compromise was to use four (4) different 16 bit integers to represent the version. In string dot notation, this is four dot-separated digits, with each digit represented as a 16 bit unsigned integer (max value 65,535). The entire version field totaled 64 bits. So far 160 bits of the token ID have been utilized with 96 bits still available for use.
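As a rough illustration of the version field described above, here is a minimal JavaScript sketch that packs four 16 bit digits into one 64 bit value and unpacks them again. It uses JavaScript's native BigInt rather than any particular library, the function names are hypothetical, and placing the first digit in the most significant 16 bits is an assumption, not something the layout in this article confirms.

```javascript
// Sketch only: pack a four-digit version into a single 64 bit value.
// Assumption: the first digit occupies the most significant 16 bits.
const DIGIT_BITS = 16n;
const DIGIT_MASK = 0xFFFFn;

function packVersion(digits) {
  if (digits.length !== 4) {
    throw new RangeError('expected four version digits');
  }
  let packed = 0n;
  for (const digit of digits) {
    if (!Number.isInteger(digit) || digit < 0 || digit > 0xFFFF) {
      throw new RangeError('digit out of 16 bit range: ' + digit);
    }
    // Make room for the next digit, then Or it into the low 16 bits
    packed = (packed << DIGIT_BITS) | BigInt(digit);
  }
  return packed;
}

function unpackVersion(packed) {
  const digits = [];
  for (let i = 3; i >= 0; i--) {
    // Shift the wanted digit down to the low bits, then mask it
    const shifted = packed >> (BigInt(i) * DIGIT_BITS);
    digits.push(Number(shifted & DIGIT_MASK));
  }
  return digits;
}
```

With this ordering, packVersion([1, 2, 3, 4]) yields 0x0001000200030004n, and unpackVersion() of that value returns [1, 2, 3, 4].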

For flexibility and future upgradability, it was clear we needed a flags field to identify what type and features the activation token represents. Based on which flag bits were set (bit value one), the data layout of the 256 bit token ID could change. We did this initially so our ERC-721 token could represent features within the application, beyond or instead of an activation of the executable. For example, unique game items can be easily represented as activations within the ecosystem. Whether game items are purchasable or found in-game only, online games can move their items to the blockchain using the Immutable Ecosystem with no blockchain experience. The Feature flag is defined to identify an application feature. But for the remainder of this discussion, we focus on the Limitation and Expiration flags.

To represent the language limitations, it was important that each language be represented with its own bit so that multiple languages could be supported within a single software license activation. After deciding to use 64 bits, each bit to represent a different language, we found ourselves utilizing all of the 256 bits of the token ID. Here is what the bit layout now looked like in our smart contracts. The most significant 128 bits are supported by all Software License Activation tokens, while the least significant 128 bits are specific to the defined Flags. In this case, the Limitation flag defines the least significant bits (LSBs) to hold the languages and version limitations.
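To make the one-bit-per-language idea concrete, here is a small JavaScript sketch using native BigInt. The language names and their bit positions are purely hypothetical (the article does not list which bit maps to which language), as are the helper names.

```javascript
// Sketch only: hypothetical bit positions within the 64 bit Languages field.
const LANGUAGE_BIT = { english: 0n, spanish: 1n, japanese: 2n };

// Build a mask with one bit set for each supported language
function languagesToMask(names) {
  let mask = 0n;
  for (const name of names) {
    const bit = LANGUAGE_BIT[name];
    if (bit === undefined) {
      throw new RangeError('unknown language: ' + name);
    }
    mask |= 1n << bit;
  }
  return mask;
}

// Test whether the bit for a single language is set in the mask
function supportsLanguage(mask, name) {
  const bit = LANGUAGE_BIT[name];
  if (bit === undefined) {
    return false;
  }
  return ((mask >> bit) & 1n) === 1n;
}
```

Because every language owns exactly one bit, any combination of up to 64 languages fits in the field, and checking one language never disturbs the others.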

First attempt to define the structure of the Activation Token’s 256 bit token ID

We generally applauded our success and reveled in our ingenuity — prematurely, it turned out.

While auditing and testing, we discovered that this solution was not enforcing the uniqueness (non-fungibility) requirement for ERC-721 tokens. The tokens in our design would be unique for common tests and use cases, but many (unknown) corner cases existed where the generated token ID could be identical to another. For example, if two people were to purchase an activation offer at the same time, then the expiration would be identical, resulting in a transaction failure to mint the new token — a blockchain failure that could cost the software creator revenue. It was also possible for a software creator to specify an offer with no expiration. It was this use case where the problem became very obvious and was detected. After the first purchase with this offer, subsequent purchases would fail at the mint stage as the token ID was not unique. Only the expiration time was providing the tokens with uniqueness.

Back to the drawing board. We reviewed all the data fields to identify how we could introduce uniqueness to each activation. Push came to shove, and the Flags size was reduced to 16 bits to make room for a unique ID. Using 16 bits for a product-specific incrementing counter (nonce) ensured uniqueness. With each new token increasing this value, it can also be used to aid recordkeeping and report generation.
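The effect of the counter can be sketched off-chain in JavaScript with native BigInt. This is an illustration under stated assumptions, not the contract's code: the entity and product offsets (224 and 192) are the ones defined later in this article, while the position of the unique ID (UNIQUE_ID_OFFSET), the in-memory Map, and the function name are hypothetical.

```javascript
// Sketch only: a per-product incrementing counter keeps token IDs unique.
const ENTITY_ID_OFFSET = 224n;
const PRODUCT_ID_OFFSET = 192n;
const UNIQUE_ID_OFFSET = 176n; // hypothetical placement of the 16 bit counter
const MAX_UNIQUE_ID = 0xFFFFn; // largest value a 16 bit counter can hold

// Hypothetical in-memory stand-in for contract storage:
// "entity:product" -> next counter value
const nextUniqueId = new Map();

function mintTokenId(entityId, productId) {
  const key = entityId + ':' + productId;
  const uniqueId = nextUniqueId.has(key) ? nextUniqueId.get(key) : 0n;
  if (uniqueId > MAX_UNIQUE_ID) {
    throw new RangeError('16 bit counter exhausted for ' + key);
  }
  nextUniqueId.set(key, uniqueId + 1n);
  // Shift each field to its position and Or them together
  return (BigInt(entityId) << ENTITY_ID_OFFSET) |
         (BigInt(productId) << PRODUCT_ID_OFFSET) |
         (uniqueId << UNIQUE_ID_OFFSET);
}
```

Two calls with the same entity and product now differ in the counter bits even when every other field is identical. The trade-off of a 16 bit counter is a ceiling of 65,536 values per product.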

The last piece of our puzzle was the platform type flags for the activation. Does the software license activation support Windows OS, Mac OS, Linux, or some combination of the above? We were so close to fitting everything, we gave this effort one more try, reviewing the Languages and Version fields. With global aspirations, we understood we could, but did not want to, limit the number of possible languages. And the Version appeared impregnable, but was it?

Already optimized and used to identify product releases, the Version field could not change. However, it was argued that the last digit of the Version should never be defined as a limitation of an activation, the reason being that all software should allow new versions to account for bug fixes. As a software distributor, you never want to box yourself into a corner where you must relicense your customer(s) in order to provide a bug fix. So four digits of Version were needed for a release, but the last digit of the Version was now up for grabs in the activation!

But was 16 bits enough to hold all the Platform bits? It turns out that, as currently defined, there were 15 platforms — with one bit left for the future. This was close, too close. With the Flags field still available for upgrades, we became more comfortable with this tight fit and any pivot going forward.

Second revision of the structure of the Activation Token’s 256 bit token ID

Once the structure of the token ID was finalized, we defined constants within the smart contract (Solidity) so that we could extract, or de-serialize, the individual integers from within the whole. A bitwise And with the Mask, followed by a right shift by the Offset, isolates an individual integer. To visualize, put your fingers on the lines of the field you want above (the And operation) and then shift the number to the right until it is flush with the end (right-shift by the Offset). Here are the defined Offsets and Masks that correspond to the structure in the image above.

// Offset and mask of entity and product identifiers
// Each mask is 32 one-bits shifted up to the field's offset
uint256 constant EntityIdOffset = 224;
uint256 constant EntityIdMask = (0xFFFFFFFF << EntityIdOffset);
uint256 constant ProductIdOffset = 192;
uint256 constant ProductIdMask = (0xFFFFFFFF << ProductIdOffset);
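For readers who want to experiment with the mask-then-shift order outside of a contract, the same two fields can be mirrored in JavaScript using native BigInt. This is a sketch for experimentation, not the contract code; the function names are hypothetical, and only the 224 and 192 offsets come from the constants above.

```javascript
// Sketch only: And with a full-width mask first, then shift right by the offset.
const ENTITY_ID_OFFSET = 224n;
const ENTITY_ID_MASK = 0xFFFFFFFFn << ENTITY_ID_OFFSET;
const PRODUCT_ID_OFFSET = 192n;
const PRODUCT_ID_MASK = 0xFFFFFFFFn << PRODUCT_ID_OFFSET;

function entityIdOf(tokenId) {
  // The mask clears every bit outside the entity field
  return (tokenId & ENTITY_ID_MASK) >> ENTITY_ID_OFFSET;
}

function productIdOf(tokenId) {
  // The mask clears every bit outside the product field
  return (tokenId & PRODUCT_ID_MASK) >> PRODUCT_ID_OFFSET;
}
```

For a token ID built as (7n << 224n) | (42n << 192n), entityIdOf() returns 7n and productIdOf() returns 42n, regardless of what the lower 192 bits contain.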

To use the Masks and Offsets above, first perform the bitwise And operation to apply the Mask and then shift the result to the right by the Offset number of bits. Here is an example to extract the expiration from the activation token ID.

// token_id is the value of the activation token id

Once the smart contract was representing all data fields within the token ID, it became necessary for our decentralized application (Dapp) to read (de-serialize) and write (serialize) the individual values to and from the single 256 bit token ID. JavaScript Numbers, like the native integer types of most programming languages, cannot hold 256 bit integers or integer constants, so to make things easier, we will perform the bitwise operations in the reverse order of the Solidity example above.

Performing the Shift Right operation before the And operation allows the use of a small local mask defined by the size of the integer to de-serialize. The 32 bit integer constant (0xFFFFFFFF) or the 16 bit integer constant (0xFFFF) is used to Mask an integer of that size after shifting. By Shifting Right a number of bits equal to the Offset and then applying the local Mask with the And operation, each smaller integer from within the larger one can be isolated and assigned to a separate variable for use by your program.
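The shift-then-mask order can be sketched as a single generic helper. The version below uses JavaScript's native BigInt instead of a library, and the helper name extractField and its parameters are hypothetical. It assumes the token ID arrives as a "0x"-prefixed hex string and that the field is at most 32 bits wide, so the result fits safely in an ordinary Number.

```javascript
// Sketch only: shift right by the offset first, then apply a small local mask.
function extractField(tokenIdHex, offset, widthBits) {
  if (widthBits < 1 || widthBits > 32) {
    throw new RangeError('this sketch handles fields of 1 to 32 bits');
  }
  const value = BigInt(tokenIdHex); // parses a "0x..." hex string
  const localMask = (1n << BigInt(widthBits)) - 1n; // e.g. 0xFFFFFFFF for 32 bits
  return Number((value >> BigInt(offset)) & localMask);
}
```

For example, with a 64-digit hex token ID whose first eight digits are 00000007, extractField(tokenIdHex, 224, 32) returns 7. The mask never needs to be wider than the field itself, which is what makes this order convenient in JavaScript.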

For our Dapp we used Node.js and the big-integer package. To install with npm:

npm install big-integer

And to include in your code:

var bigInt = require("big-integer");

To create a big integer for use by your JavaScript code the bigInt() constructor must be used. Smart contract calls return a large 256 bit integer as a hexadecimal string. To begin to de-serialize this large integer, pass the hex string directly to the bigInt() constructor to create a 256 bit integer of the token ID in JavaScript. Then use the shiftRight() operation, in conjunction with the and() operation, to de-serialize the smaller integers from within the larger 256 bit integer. Each individual integer encoded in the larger 256 bit integer can now be parsed (de-serialized) and presented to the user or otherwise used by the Dapp.

// First create the bigInt() of the token ID
// tokenId is the response from the blockchain
var value = bigInt(tokenId);

Since a JavaScript Number cannot hold a 64 bit integer constant exactly, the Mask for the Languages and Version field required that we first create a bigInt() to use as the bitwise Mask. With this mask in place, we complete the de-serialization of the 256 bit activation token ID.

// JavaScript has no 64 bit constant support, so create
// a bigInt to hold our 64 bit mask of all ones
var all32bits = bigInt('0xFFFFFFFF');
var all64bits = all32bits.or(all32bits.shiftLeft(32));
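For comparison, the same 64 bit mask can be built with JavaScript's native BigInt, which is a swapped-in alternative to the big-integer package used in this article. The sketch below also shows why a plain Number constant is not an option. The helper extractWideField is hypothetical, and its offset argument is whatever your own layout defines.

```javascript
// Sketch only: a 64 bit mask of all ones using native BigInt.
const all32bits = 0xFFFFFFFFn;
const all64bits = all32bits | (all32bits << 32n);

// An equivalent way to build the same mask
const sameMask = (1n << 64n) - 1n;

// A plain Number cannot hold this value exactly: the literal below
// is rounded to 2^64 when parsed as a double
const roundedNumber = 0xFFFFFFFFFFFFFFFF;

// Hypothetical helper: a 64 bit field stays a BigInt after extraction,
// since it may not fit in a Number
function extractWideField(value, offset) {
  return (value >> BigInt(offset)) & all64bits;
}
```

Both constructions produce sixteen hexadecimal F digits, while the Number literal is no longer a safe integer.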

To serialize the smaller integers back into the larger 256 bit integer, we do the reverse bitwise operations. Using bitwise Left Shift operations, we create new bigInt() variables of each smaller integer but with the bits shifted to their final position within a 256 bit integer value. Once all the individual integer values are converted to big integers and in their correct bit locations, we perform the bitwise Or operation to put them all together. Here is how we did it.

 // entityID, productID, preventResale, duration and
// limitation from user input (not shown)
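The shift-left-then-Or steps described above can be sketched with native BigInt as follows. This is a simplified illustration, not the Dapp's actual code: it covers only three fields, the helper names are hypothetical, the entity and product offsets are the ones defined earlier in this article, and EXPIRATION_OFFSET is an assumed position chosen for the example.

```javascript
// Sketch only: move each small integer to its position, then Or them together.
const ENTITY_ID_OFFSET = 224;
const PRODUCT_ID_OFFSET = 192;
const EXPIRATION_OFFSET = 128; // hypothetical position, for illustration

function placeField(value, offset, widthBits) {
  const big = BigInt(value);
  // Reject values that would spill into a neighboring field
  if (big < 0n || big >= (1n << BigInt(widthBits))) {
    throw new RangeError('value does not fit in ' + widthBits + ' bits');
  }
  return big << BigInt(offset);
}

function serializeTokenId(fields) {
  const tokenId =
    placeField(fields.entityId, ENTITY_ID_OFFSET, 32) |
    placeField(fields.productId, PRODUCT_ID_OFFSET, 32) |
    placeField(fields.expiration, EXPIRATION_OFFSET, 32);
  // Return a fixed-width hex string of 64 digits (256 bits)
  return '0x' + tokenId.toString(16).padStart(64, '0');
}
```

The range check in placeField() matters because the Or operation cannot undo an oversized value: a 33 bit entity ID would silently corrupt whatever field sits above it.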

Well, there you have it, the blueprint for how to serialize and de-serialize multiple smaller integers into and out of a single large 256 bit integer. I hope this walkthrough helps illuminate a path for others with similar problems. With some preparation, it is not that difficult to encode and decode multiple integers within a single 256 bit integer using serialization. Cheers!

Source: https://medium.com/better-programming/serializing-data-within-large-integers-433684c8e7cd?source=rss——-8—————–cryptocurrency
