Using structures for network and file formats

One of the more common uses of structure data types is to encapsulate formatted binary data, such as the records stored in file formats or sent in network messages. Strictly speaking, the C language does not make enough guarantees about how structures are laid out at the binary level for this to be portable across different architectures, as persistent files and network communication must be. In practice, though, with an adequate understanding of endianness, alignment, and padding, structures make a very convenient tool for the job.
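One way to make the layout assumptions explicit is to have the compiler check them. The following is a minimal sketch, assuming a C11 compiler; the names `struct record`, `struct padded`, and `padded_size` are hypothetical and exist only for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* Hypothetical record: three four-byte fields, for illustration only. */
struct record {
    uint32_t a;
    uint32_t b;
    uint32_t c;
};

/* Compilation fails if the compiler lays the fields out differently
   from the 12-byte on-disk format we are assuming. */
static_assert(sizeof(struct record) == 12, "struct record must be 12 bytes");
static_assert(offsetof(struct record, a) == 0, "a must be at offset 0");
static_assert(offsetof(struct record, b) == 4, "b must be at offset 4");
static_assert(offsetof(struct record, c) == 8, "c must be at offset 8");

/* Hypothetical counterexample: a one-byte field followed by a four-byte
   field. Most compilers insert padding after tag to align value. */
struct padded {
    uint8_t tag;
    uint32_t value;
};

/* Reports how many bytes the compiler actually uses for struct padded. */
size_t padded_size(void)
{
    return sizeof(struct padded);
}
```

On many common platforms `padded_size()` reports 8 rather than 5, which is exactly the kind of padding that would silently break a binary format.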

For example, let us consider a file format that consists of a list of points in three-dimensional space. Each coordinate will be represented as an unsigned four-byte integer. Since each point is specified by 3 coordinates, each point will occupy 12 bytes, and the whole file, as a list of \(n\) such points, will be \(12\cdot n\) bytes long.

Given that format, here is a pair of programs: one scatters a given number of random points and produces the file representing them, and the other reads a file in that format and determines which pair of points in it are closest. They use the htonl and ntohl functions from <arpa/inet.h> to convert between ‘host’ and ‘network’ byte order, where network order is big-endian. If you only ever use this file format on one architecture, the conversion is not strictly necessary, and if your architecture is already big-endian, the functions perform no byte swapping.
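To see what network order means for the bytes of one point, here is a small sketch. The helper `encode_point` is hypothetical and not part of either program; it only illustrates how htonl arranges each coordinate in memory.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store x, y, z in network (big-endian) order
   in a 12-byte buffer, the same bytes the file would hold for one point. */
void encode_point(uint32_t x, uint32_t y, uint32_t z, unsigned char out[12])
{
    uint32_t words[3];

    words[0] = htonl(x);
    words[1] = htonl(y);
    words[2] = htonl(z);

    /* Copy the raw bytes of the three converted words into the buffer. */
    memcpy(out, words, sizeof(words));
}
```

For the point (1, 2, 100), the buffer holds the bytes 00 00 00 01 00 00 00 02 00 00 00 64 in hexadecimal, regardless of the host's own byte order.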

The program that scatters the points makes no actual use of structures. It writes the file as a sequence of the appropriate number of coordinates, three per point.

#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char **argv)
{
    FILE *f;
    const char *filename;
    int i;
    int n;

    if (argc != 3) {
        fprintf(stderr, "usage: %s FILENAME N\n", argv[0]);
        return 1;
    }

    srand(time(NULL));
    filename = argv[1];
    n = atoi(argv[2]);
    f = fopen(filename, "wb");  /* binary mode: the file holds raw bytes, not text */
    if (!f) {
        fprintf(stderr, "unable to write file %s\n", filename);
        return 1;
    }

    for (i = 0; i < 3 * n; i++) {
        uint32_t ord = rand() % 101;
        ord = htonl(ord);
        fwrite(&ord, sizeof(ord), 1, f);
    }

    fclose(f);
}

You might generate a few files with this program and inspect them with xxd to get a feeling for the format. As written, the program generates coordinates from 0 to 100 inclusive, but you could change that to distribute the points differently.

The main trick on display in these programs is how the closest-points program reads in the data. It uses fseek, ftell, and rewind to determine how long the file is: it jumps to the end, asks for its position, then goes back to the beginning. If that length is an exact multiple of the size of a point, an array is dynamically allocated and the whole file is read into it at once. Since the raw data is in network byte order, a loop then goes over every coordinate and does any necessary byte swapping. At that point, the data has been loaded into the array of structures.

#include <arpa/inet.h>
#include <inttypes.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct point {
    uint32_t x;
    uint32_t y;
    uint32_t z;
};

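double distance(struct point a, struct point b)
{
    /* Work in doubles so that unsigned subtraction and squaring
       cannot wrap around for large coordinates. */
    double dx = (double)a.x - (double)b.x;
    double dy = (double)a.y - (double)b.y;
    double dz = (double)a.z - (double)b.z;
    return sqrt(dx * dx + dy * dy + dz * dz);
}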

int main(int argc, char **argv)
{
    FILE *f;
    const char *filename;
    double close;
    int i;
    int j;
    int n;
    int p1;
    int p2;
    long bytes;
    struct point *points;

    if (argc != 2) {
        fprintf(stderr, "usage: %s FILENAME\n", argv[0]);
        return 1;
    }

    filename = argv[1];
    f = fopen(filename, "rb");  /* binary mode: the file holds raw bytes, not text */
    if (!f) {
        fprintf(stderr, "unable to read file %s\n", filename);
        return 1;
    }

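    /* Seek to the end to learn the file length. */
    if (fseek(f, 0, SEEK_END) != 0 || (bytes = ftell(f)) < 0) {
        fprintf(stderr, "unable to determine length of %s\n", filename);
        fclose(f);
        return 1;
    }
    if (bytes % sizeof(points[0]) != 0) {
        fprintf(stderr, "file is not a valid length\n");
        fclose(f);
        return 1;
    }
    n = bytes / sizeof(points[0]);
    rewind(f);

    /* Read the whole file directly into an array of structures. */
    points = malloc(bytes);
    if (!points || fread(points, sizeof(points[0]), n, f) != (size_t)n) {
        fprintf(stderr, "unable to load %s\n", filename);
        free(points);
        fclose(f);
        return 1;
    }
    fclose(f);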

    for (i = 0; i < n; i++) {
        points[i].x = ntohl(points[i].x);
        points[i].y = ntohl(points[i].y);
        points[i].z = ntohl(points[i].z);
    }

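    /* With fewer than two points there is no pair, and p1 and p2
       would be used uninitialized below. */
    if (n < 2) {
        fprintf(stderr, "need at least two points\n");
        free(points);
        return 1;
    }

    close = HUGE_VAL;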
    for (i = 0; i < n; i++) {
        for (j = i + 1; j < n; j++) {
            double d = distance(points[i], points[j]);
            if (d < close) {
                close = d;
                p1 = i;
                p2 = j;
            }
        }
    }

    printf("(%"PRIu32", %"PRIu32", %"PRIu32") ", points[p1].x, points[p1].y, points[p1].z);
    printf("(%"PRIu32", %"PRIu32", %"PRIu32") ", points[p2].x, points[p2].y, points[p2].z);
    printf("%f\n", close);
}
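Reading the file straight into an array of structures relies on struct point having no padding and on ntohl fixing the byte order afterwards. Where those assumptions are uncomfortable, a more conservative alternative is to decode the raw bytes one at a time. The following is a sketch of that approach, not what the program above does; `load_be32` and `decode_points` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

struct point {
    uint32_t x;
    uint32_t y;
    uint32_t z;
};

/* Hypothetical helper: assemble one big-endian 32-bit value from four
   bytes. Works the same on any host byte order. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: decode count points from a raw buffer holding
   12 * count bytes in the file format described above. */
void decode_points(const unsigned char *raw, size_t count, struct point *out)
{
    size_t i;

    for (i = 0; i < count; i++) {
        const unsigned char *p = raw + 12 * i;  /* start of point i */
        out[i].x = load_be32(p);
        out[i].y = load_be32(p + 4);
        out[i].z = load_be32(p + 8);
    }
}
```

The trade-off is an extra copy from the raw buffer into the array, in exchange for not depending on how the compiler lays out the structure.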