Java Apache Arrow: Read and Write to a ListVector

Jeremy Bohrer
Dec 19, 2020

In this tutorial we go over how to write to and read from a ListVector using the Java implementation of Apache Arrow.

In my free time I have been reading about and coding with the Java implementation of Apache Arrow. Based on the project's docs and posts I found online, it is a great tool for columnar data and in-memory processing of that data. The documentation on reading and writing basic vector types is decent, but I found that it did not cover some of the more complex types such as the ListVector, so I wrote up this guide to go over some of its intricacies.

The first step in using any Arrow vector is to create one.
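
As a minimal sketch of that step, assuming the arrow-vector artifact and an arrow-memory implementation (for example arrow-memory-netty) are on the classpath, creating an empty ListVector could look roughly like this. The class name, method name, and the vector name "vector" are only illustrative.

```java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.complex.ListVector;

public class CreateListVector {

    // Creates an empty ListVector and returns its value count.
    // Both the allocator and the vector hold off-heap memory, so they are
    // opened in try-with-resources and released when the block exits.
    static int createAndCount() {
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
             ListVector listVector = ListVector.empty("vector", allocator)) {
            return listVector.getValueCount();
        }
    }

    public static void main(String[] args) {
        System.out.println("value count: " + createAndCount());
    }
}
```

Depending on your JDK and Arrow versions, you may also need to start the JVM with `--add-opens=java.base/java.nio=ALL-UNNAMED` for Arrow's memory module to work.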

Initiate ListVector

Once you have a list vector, you can use a UnionListWriter to write to it. In my basic example, I wanted to store an array of arrays like this: [[0, 0, 0, 0, 0], [0, 1, 2, 3, 4], [0, 2, 4, 6, 8], …, [0, 9, 18, 27, 36]]

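To create a new list in the vector, call writer.setPosition(i) to pick the index and writer.startList() to open the list. To add data to the list, call writer.writeInt(), or the write method for whatever data type you are storing. To end the list, make sure to call writer.endList(). Once every list is written, set the value count on the vector so that readers know how many lists it holds.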
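
Here is a rough sketch of that write loop for the 10 x 5 example above, under the same classpath assumptions as before. The class and method names are made up for illustration. I set the writer position before opening each list so the index is explicit.

```java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.complex.ListVector;
import org.apache.arrow.vector.complex.impl.UnionListWriter;

public class WriteListVector {

    // Writes 10 lists of 5 ints each; list i holds i * j for j = 0..4.
    static void write(ListVector listVector) {
        UnionListWriter writer = listVector.getWriter();
        for (int i = 0; i < 10; i++) {
            writer.setPosition(i);      // index of the list being written
            writer.startList();         // open list i
            for (int j = 0; j < 5; j++) {
                writer.writeInt(i * j); // append one element to list i
            }
            writer.endList();           // close list i
        }
        // Record how many lists the vector holds.
        listVector.setValueCount(10);
    }

    // Builds the vector, writes to it, and returns the number of lists stored.
    static int writeAndCount() {
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
             ListVector listVector = ListVector.empty("vector", allocator)) {
            write(listVector);
            return listVector.getValueCount();
        }
    }

    public static void main(String[] args) {
        System.out.println("lists written: " + writeAndCount());
    }
}
```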

Write to ListVector

You can read from the ListVector either through its get API or through the UnionListReader. In both cases you iterate up to the value count that was set while writing to the vector. If you use the get API, you access your data by calling getObject() on the vector for each index and then iterating over the list it returns.
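
A sketch of the get API approach might look like the following. It repeats a compact version of the write loop so that it runs on its own, and the class and method names are again just illustrative. The cast on getObject() is there because, as far as I can tell, its declared return type has differed between Arrow versions.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.complex.ListVector;
import org.apache.arrow.vector.complex.impl.UnionListWriter;

public class ReadListVectorGet {

    // Same data as the write example: list i holds i * j for j = 0..4.
    static void fill(ListVector listVector) {
        UnionListWriter writer = listVector.getWriter();
        for (int i = 0; i < 10; i++) {
            writer.setPosition(i);
            writer.startList();
            for (int j = 0; j < 5; j++) {
                writer.writeInt(i * j);
            }
            writer.endList();
        }
        listVector.setValueCount(10);
    }

    // Copies every list out of the vector using getObject().
    static List<List<Integer>> readAll(ListVector listVector) {
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < listVector.getValueCount(); i++) {
            // Each entry comes back as a java.util.List of boxed values.
            List<?> elements = (List<?>) listVector.getObject(i);
            List<Integer> row = new ArrayList<>();
            for (Object element : elements) {
                row.add((Integer) element);
            }
            result.add(row);
        }
        return result;
    }

    // Builds, fills, and reads the vector; the result lives on the Java heap,
    // so it is safe to return after the vector is closed.
    static List<List<Integer>> demo() {
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
             ListVector listVector = ListVector.empty("vector", allocator)) {
            fill(listVector);
            return readAll(listVector);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```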

Read from ListVector using get API

If you use the reader, you access your data by setting the reader position and then iterating over the reader until there are no more elements. The position corresponds to the position you set while writing to the vector.
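
The reader-based version could be sketched like this, with the same caveats as the earlier examples: the names are illustrative, and the write loop is repeated so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.complex.ListVector;
import org.apache.arrow.vector.complex.impl.UnionListReader;
import org.apache.arrow.vector.complex.impl.UnionListWriter;
import org.apache.arrow.vector.complex.reader.FieldReader;

public class ReadListVectorReader {

    // Same data as the write example: list i holds i * j for j = 0..4.
    static void fill(ListVector listVector) {
        UnionListWriter writer = listVector.getWriter();
        for (int i = 0; i < 10; i++) {
            writer.setPosition(i);
            writer.startList();
            for (int j = 0; j < 5; j++) {
                writer.writeInt(i * j);
            }
            writer.endList();
        }
        listVector.setValueCount(10);
    }

    // Copies every list out of the vector using a UnionListReader.
    static List<List<Integer>> readAll(ListVector listVector) {
        List<List<Integer>> result = new ArrayList<>();
        UnionListReader reader = listVector.getReader();
        for (int i = 0; i < listVector.getValueCount(); i++) {
            reader.setPosition(i);          // point the reader at list i
            List<Integer> row = new ArrayList<>();
            while (reader.next()) {         // advance to the next element, if any
                FieldReader element = reader.reader();
                if (element.isSet()) {      // skip null elements
                    row.add(element.readInteger());
                }
            }
            result.add(row);
        }
        return result;
    }

    // Builds, fills, and reads the vector; the result is plain heap data.
    static List<List<Integer>> demo() {
        try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
             ListVector listVector = ListVector.empty("vector", allocator)) {
            fill(listVector);
            return readAll(listVector);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```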

Read from ListVector using reader

Well there you have it, we can read and write to an Arrow ListVector using Java. I have some mixed feelings after using this format since I think the API could be cleaned up a bit to make it easier to read and write to vectors. But I guess that’s a project for later. A full example of the code discussed in this tutorial can be found here.
