How to export a PostgreSQL table as CSV with Node.js Streams

Hi all! Today, in the “How to make a BI user happy” chapter, we will make it possible to download ‘big data’ without endangering our own server.

To do this we are going to use our favorite database, PostgreSQL, and Node.js, with the help of two packages: pg and pg-copy-streams.

The main problem with downloading or processing all this data is its size. To avoid loading the whole table into memory at once, which could get pretty big, we use Node.js streams.

Node.js streams have tons of benefits: among others, they have a low memory footprint, they are consumed and processed in buffered chunks, and they do not block the thread.

Now let’s start with the interesting part.

First of all, imagine that we have a table with millions of rows, called super_big_table, and that our user wants to filter and download it. The fastest way to get output out of PostgreSQL is the COPY statement, but it has one problem: COPY statements do not accept parameters. One solution is to create a temporary table and insert only the desired data.

CREATE TEMPORARY TABLE temp_csv_table AS
  SELECT
    t.id, t.value
  FROM
    super_big_table t
  WHERE
    ${customFilters}
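One detail worth keeping in mind: a temporary table only exists for the session that created it, so the CREATE must run on the same pg client that will run the COPY afterwards. As a rough sketch (the helper name and the minId filter are hypothetical), the creation could look like this from Node, passing filter values as query parameters instead of interpolating them into the SQL:

// Hypothetical helper: builds the temporary table on the same client
// that will later run the COPY. The CREATE accepts query parameters
// even though COPY itself cannot, so no string interpolation is needed.
async function createFilteredTempTable (client, minId) {
  await client.query(
    `CREATE TEMPORARY TABLE temp_csv_table AS
       SELECT t.id, t.value
       FROM super_big_table t
       WHERE t.id >= $1`,
    [minId]
  )
}

// e.g. await createFilteredTempTable(client, 1000)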

Then we just need to execute the COPY query through pg-copy-streams. The code will look something like this:

const copyTo = require('pg-copy-streams').to
const pg = require('pg')
const client = new pg.Client()

await client.connect()
const q = `COPY temp_csv_table TO STDOUT WITH CSV DELIMITER ';'`
const dataStream = client.query(copyTo(q))

dataStream.on('error', async function (err) {
  // Here we can control stream errors
  await client.end()
})
dataStream.on('end', async function () {
  await client.end()
})
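Before plugging this into an HTTP response, a quick way to sanity-check the export is to pipe the stream into a local file (the path here is just an example):

const fs = require('fs')

// Dump the COPY output to a local file to verify the export works.
const fileStream = fs.createWriteStream('/tmp/super_big_table.csv')
dataStream.pipe(fileStream)
fileStream.on('finish', () => console.log('CSV export written'))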

Now imagine that you need to add some headers to this stream. We can use a Transform stream: with it we could modify all the data or, as in this case, just add a row at the beginning.

We can create our own Transform stream by extending Transform with our own class.

const {Transform} = require('stream')

class PrefixedStream extends Transform {
  constructor (prefixRow) {
    super()
    this.prefixRow = prefixRow
    this.isFirstChunk = true
  }

  _transform (chunk, encoding, callback) {
    if (this.isFirstChunk) {
      // Emit the header row once, before the first chunk of data
      this.isFirstChunk = false
      this.push(this.prefixRow)
      this.push('\n')
    }
    this.push(chunk)
    callback()
  }
}

Then we just pipe the two streams together, joining the header columns with the same ';' delimiter used in the COPY:

const csvHeaders = ['Big table id', 'My value'].join(';')
const csvWithHeadersStream = new PrefixedStream(csvHeaders)
dataStream.pipe(csvWithHeadersStream)

Finally, we just need to pipe this stream once more, this time into the response.

csvWithHeadersStream.pipe(res)
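Putting the pieces together on the server, a route handler might look roughly like this. This is only a sketch with error handling kept minimal: it assumes Express, a hypothetical /super_big_table/csv route, a hypothetical minId filter, and the PrefixedStream class defined above.

const express = require('express')
const pg = require('pg')
const copyTo = require('pg-copy-streams').to

const app = express()

app.get('/super_big_table/csv', async (req, res) => {
  // A dedicated client, so the temporary table and the COPY share one session.
  const client = new pg.Client()
  await client.connect()

  // Hypothetical filter taken from the query string, passed as a parameter.
  await client.query(
    `CREATE TEMPORARY TABLE temp_csv_table AS
       SELECT t.id, t.value FROM super_big_table t WHERE t.id >= $1`,
    [req.query.minId || 0]
  )

  // Tell the browser it is receiving a CSV attachment.
  res.setHeader('Content-Type', 'text/csv; charset=utf-8')
  res.setHeader('Content-Disposition', 'attachment; filename="super_big_table_filtered.csv"')

  const dataStream = client.query(copyTo(`COPY temp_csv_table TO STDOUT WITH CSV DELIMITER ';'`))
  // PrefixedStream is the Transform class defined earlier in the article.
  const csvWithHeadersStream = new PrefixedStream(['Big table id', 'My value'].join(';'))

  dataStream.on('error', async (err) => {
    res.destroy(err)
    await client.end()
  })
  dataStream.on('end', async () => {
    await client.end()
  })

  dataStream.pipe(csvWithHeadersStream).pipe(res)
})

app.listen(3000)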

To take advantage of all this, we need to let the user actually download the file.

We will need a little bit of code on the client side, thanks to file-saver.

const {saveAs} = require('file-saver')

const blob = new Blob([response.data], {type: 'application/octet-stream'})
saveAs(blob, 'super_big_table_filtered.csv')
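The `response` object used above has to come from somewhere. For example, with axios (an assumption on my part, the article doesn't name an HTTP client), the whole client-side flow could be sketched like this, with a hypothetical endpoint and filter:

const axios = require('axios')
const {saveAs} = require('file-saver')

async function downloadFilteredCsv () {
  // responseType 'blob' keeps the browser from trying to parse the CSV body.
  const response = await axios.get('/super_big_table/csv', {
    params: {minId: 1000},   // hypothetical filter
    responseType: 'blob'
  })
  const blob = new Blob([response.data], {type: 'application/octet-stream'})
  saveAs(blob, 'super_big_table_filtered.csv')
}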

As we have just seen, it is super easy to download custom data as CSV and make BI customers happy!

Now you aren't afraid of streams anymore. Enjoy their power!

Jose Luis Pillado “Fofi” – Lead Software Engineer