
What is data brokering in Node.js?

Understand data brokering and different approaches in Node.js.

By Mario Kandut

Posted March 11, 2021

Updated March 01, 2022

5 min read

node javascript

This article is based on Node v16.14.0.

With Node.js, data can be shared between services without tightly coupling applications together or rewriting them, for example when you want to use a legacy API in a modern frontend application. Node.js is also a good choice for processing large amounts of data.

What is data brokering?


Data brokering saves us from having to rewrite the underlying systems or make them directly aware of each other. With Node.js, data can be brokered in several ways, including:

A proxy API (back end for front end)
An ETL pipeline (Extract, Transform, Load)
Message queues (like RabbitMQ)

But why do we need data brokering in the first place? Technology moves fast, but the systems inside an application often can't change, or don't adapt as quickly. Besides the ever-moving technology landscape, business needs also change. When business needs change, the requirements change with them, and we can't always know what the requirements will be in the future. Often data is locked away (legacy API, third-party system) or siloed off, which makes it hard to access from a new frontend application. As needs shift, the relationships between systems have to adapt as well.

This introduces a new challenge: How do we share data across our applications, without tightly coupling our applications together or rewriting them?

This challenge can be solved with data brokering. Data brokering software sits between applications and facilitates the transfer of data. This "glue" (a piece of software) can focus entirely on talking to the specific systems it is concerned with, and nothing more.

This approach is called data brokering, or moving data between disparate systems. The goal of data brokering is to expose data from one system to another, without those systems having to know anything about each other. A common restriction with legacy APIs and third-party services is that the data source can't be controlled or modified. Separating concerns between systems and keeping coupling low is generally a useful approach, and data brokering helps us create decoupled applications and services.

Why Node.js for data brokering?

The Node.js ecosystem benefits from npm packages for interacting with different data sources; packages are available for nearly any data source. This speeds up development by reducing the amount of code that needs to be written, which shortens the time to market.

The event-driven design of Node.js also makes it a good choice for applications that need to sit and wait for interaction. While it waits on I/O, a Node.js process uses few resources, and it can handle a large number of concurrent connections on a single thread. This event-driven model also works well in a serverless context.
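To make the single-thread claim concrete, here is a minimal sketch of three simulated requests waiting at the same time. The names `wait`, `handleRequest`, and `handleAll` are hypothetical, and the 20 ms timer only stands in for a slow database or API call.

```javascript
// Resolves after `ms` milliseconds; stands in for slow I/O (assumption for this sketch).
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Handles one simulated request and records when it starts and finishes.
async function handleRequest(id, log) {
  log.push(`start ${id}`);
  await wait(20); // the thread is free to start other requests while this one waits
  log.push(`done ${id}`);
  return id;
}

// Starts every request right away and waits for all of them together.
async function handleAll(ids) {
  const log = [];
  await Promise.all(ids.map((id) => handleRequest(id, log)));
  return log;
}

handleAll([1, 2, 3]).then((log) => console.log(log));
```

All three requests log their start before any of them logs that it is done, because no request blocks the thread while it waits.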

Node.js also has a gentle learning curve, since many developers already have some experience with JavaScript.

Examples of data brokering

Data brokering with Node.js can be done in several ways, including:

A proxy API (back end for front end)
An ETL pipeline (Extract, Transform, Load)
Message queues (like RabbitMQ)

Proxy API

A proxy API is a thin, lightweight API server that translates requests and responses between another API and an API consumer. It acts as middleware, creating an abstraction layer over another service.

The proxy API serves as a friendly interface for a consumer. Here, a consumer is any client or application which wants to request data, and the underlying API service is the source of the data that the proxy API communicates with to fulfill the request. A proxy API connects a known service with any number of known or unknown consumer clients. It mediates between a service and its consumers.

A proxy API translates requests from the consumer into requests that the underlying service can understand. It restructures data received from underlying services into the format expected by consumers. This makes it easy to expose a stable and easy-to-work-with API.
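The translation described above could look roughly like the following sketch, built on the Node.js core `http` module. The legacy field names (`USR_ID`, `USR_NM`, `ACTIVE_FLG`), the `/users/:id` route, and the injected `fetchLegacyUser` function are all assumptions made for illustration, not part of any real service.

```javascript
const http = require('http');

// Restructures a (hypothetical) legacy record into the shape consumers expect.
function toConsumerShape(legacyUser) {
  return {
    id: Number(legacyUser.USR_ID),
    name: legacyUser.USR_NM,
    active: legacyUser.ACTIVE_FLG === 'Y',
  };
}

// Builds the proxy server. `fetchLegacyUser(id)` is injected and must return a
// promise for the legacy record, so the proxy stays unaware of how it is fetched.
function createProxy(fetchLegacyUser) {
  return http.createServer(async (req, res) => {
    const match = /^\/users\/(\d+)$/.exec(req.url);
    if (req.method !== 'GET' || !match) {
      res.writeHead(404, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'Not found' }));
      return;
    }
    try {
      const legacyUser = await fetchLegacyUser(match[1]);
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(toConsumerShape(legacyUser)));
    } catch {
      // The underlying service failed; report it without leaking details.
      res.writeHead(502, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'Upstream service failed' }));
    }
  });
}

// Usage (not started here): createProxy(myFetchFunction).listen(3000);
```

Consumers only ever see the `id`, `name`, and `active` fields, so the legacy format can stay as it is while the proxy exposes a stable interface.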

ETL (Extract, Transform, Load) pipeline

The ETL pipeline is another approach to data brokering. ETL stands for extract, transform, and load. This is a common approach in moving data from one location to another, while transforming the structure of the data before it is loaded from its source into its destination.

ETL is a process with three separate steps and is often called a pipeline, because data moves through these three steps.

Extract the data from its source, wherever it is (DB, API, ...).
Transform or process the data in some way. This could be restructuring, renaming, removing invalid or unnecessary data, adding new values, or any other type of data processing.
Load the data into its final destination (DB, flat file, ...).

ETL pipelines are typically run as batch jobs.
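The three steps can be sketched as three small functions chained into one batch run. This is only an illustration: the in-memory arrays stand in for a real source and destination, and the field names (`first`, `last`, `email`) are hypothetical.

```javascript
// Extract: read all raw rows from the source (an array stands in for a DB or API).
async function extract(source) {
  return source.slice();
}

// Transform: drop rows without a usable email, then rename and normalise fields.
function transform(rows) {
  return rows
    .filter((row) => typeof row.email === 'string' && row.email.includes('@'))
    .map((row) => ({
      email: row.email.trim().toLowerCase(),
      fullName: `${row.first} ${row.last}`,
    }));
}

// Load: write the transformed rows to the destination and report how many were written.
async function load(rows, destination) {
  destination.push(...rows);
  return rows.length;
}

// Runs the whole batch: extract, then transform, then load.
async function runPipeline(source, destination) {
  const raw = await extract(source);
  const clean = transform(raw);
  return load(clean, destination);
}
```

Keeping each step in its own function makes it possible to swap the source or the destination without touching the transformation logic.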

The major differences between a proxy API and an ETL pipeline are that both ends of the ETL pipeline are known, and that ETL pipelines run as a batch. All the data is moved at once, as opposed to the proxy API approach, where an interface for consuming data is exposed. With a proxy API, you don't necessarily know which clients are going to consume the data, and you let the consumer make requests as needed.

Use Case for ETL: Analytics Data

Aggregate data to use for analytics
Extract the raw data from the database.
Clean, validate, and aggregate the data in the transform stage.
Load the transformed data into the destination.
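As an illustration of the transform stage in this use case, the sketch below validates raw page-view rows and aggregates them per page. The row shape (`page`, `ms`) and the function name `aggregatePageViews` are assumptions chosen for the example.

```javascript
// Aggregates raw page-view rows into one summary row per page.
// Hypothetical input row: { page: '/pricing', ms: 1200 } (one view and its load time).
function aggregatePageViews(rows) {
  const byPage = new Map();
  for (const row of rows) {
    // Validate: skip rows without a page name or with a non-numeric load time.
    if (typeof row.page !== 'string' || !Number.isFinite(row.ms)) continue;
    const stats = byPage.get(row.page) || { views: 0, totalMs: 0 };
    stats.views += 1;
    stats.totalMs += row.ms;
    byPage.set(row.page, stats);
  }
  // Shape the result for the analytics destination.
  return [...byPage].map(([page, { views, totalMs }]) => ({
    page,
    views,
    avgMs: totalMs / views,
  }));
}
```

The output rows are what the load step would then write into the analytics store.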

An ETL pipeline is a practical choice for migrating large amounts of data, like converting hundreds of gigabytes of data stored in flat files into a new format, or computing new data based on those hundreds of gigabytes.
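For files of that size, loading everything into memory is usually not an option, so one possible approach is to convert the data line by line with Node.js streams. The sketch below assumes a simple two-column CSV (`id,name`) without quoted commas, and the file names in the usage comment are hypothetical.

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Converts one CSV line ("id,name") into a JSON string.
function csvLineToJson(line) {
  const [id, name] = line.split(',');
  return JSON.stringify({ id: Number(id), name });
}

// Creates a Transform stream that turns CSV lines into newline-delimited JSON.
function createLineConverter() {
  const decoder = new StringDecoder('utf8'); // handles multi-byte characters split across chunks
  let leftover = '';
  return new Transform({
    transform(chunk, _encoding, callback) {
      const lines = (leftover + decoder.write(chunk)).split('\n');
      leftover = lines.pop(); // the last piece may be an incomplete line
      for (const line of lines) {
        if (line.trim() !== '') this.push(csvLineToJson(line) + '\n');
      }
      callback();
    },
    flush(callback) {
      // Emit whatever remains when the input ends without a trailing newline.
      leftover += decoder.end();
      if (leftover.trim() !== '') this.push(csvLineToJson(leftover) + '\n');
      callback();
    },
  });
}

// Usage with hypothetical file names:
// const fs = require('fs');
// const { pipeline } = require('stream');
// pipeline(fs.createReadStream('input.csv'), createLineConverter(),
//   fs.createWriteStream('output.ndjson'), (err) => { if (err) console.error(err); });
```

Only one chunk is held in memory at a time, which is what keeps memory use roughly constant regardless of file size.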

Message Queue

A message queue stores the messages (data) sent to it in sequential order until a consumer is ready to retrieve them from the queue. A message can be any piece of data.

Message queues work with producers and consumers:

Producers add messages to the queue.
Consumers take messages off the queue (one at a time or multiple at once).

The producer can send messages to the queue without waiting to hear whether a message was processed. The message stays in the queue until a consumer is ready to receive it. This decoupling of sending from processing makes message queues an asynchronous way to communicate.
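The producer and consumer roles can be illustrated with a small in-memory queue. This is only a sketch of the idea: the class `InMemoryQueue` is hypothetical, and unlike a real broker such as RabbitMQ it keeps messages in process memory, so they do not survive a restart.

```javascript
// A minimal first-in, first-out queue held in memory (illustration only).
class InMemoryQueue {
  constructor() {
    this.messages = [];
  }

  // Producer side: add a message and return immediately, without waiting for processing.
  enqueue(message) {
    this.messages.push(message);
  }

  // Consumer side: take up to `max` of the oldest messages off the queue.
  dequeue(max = 1) {
    return this.messages.splice(0, max);
  }

  // Number of messages still waiting to be processed.
  get size() {
    return this.messages.length;
  }
}

const queue = new InMemoryQueue();
queue.enqueue({ type: 'page_view', path: '/' });
queue.enqueue({ type: 'page_view', path: '/pricing' });
console.log(queue.dequeue()); // the oldest message comes out first
```

The producer never calls the consumer directly; both only know about the queue.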

Message queues are resilient and scalable. If at any point messages come in faster than the consumers can handle them, the messages are not lost, provided the queue is persistent and has capacity left. They sit in the queue in the order they were received until either more consumers are spun up to handle the increased load, or a consumer becomes available to deal with the message. Hence, message queues are also fault tolerant to some degree. A common example for a message queue is delivering webhooks.
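The scaling behaviour can be sketched by letting several consumers drain the same backlog. A plain array stands in for the queue here, and `startConsumer`, `drain`, and the `handle` callback are hypothetical names for this example, not the API of any broker.

```javascript
// One consumer: keeps taking the oldest message until the backlog is empty.
async function startConsumer(name, queue, handle) {
  let processed = 0;
  while (queue.length > 0) {
    const message = queue.shift(); // each message is taken by exactly one consumer
    await handle(message, name);
    processed += 1;
  }
  return processed;
}

// Spins up `consumerCount` consumers on the same backlog and
// resolves with the total number of messages processed.
async function drain(queue, consumerCount, handle) {
  const consumers = [];
  for (let i = 1; i <= consumerCount; i += 1) {
    consumers.push(startConsumer(`consumer-${i}`, queue, handle));
  }
  const counts = await Promise.all(consumers);
  return counts.reduce((sum, count) => sum + count, 0);
}
```

Adding consumers shortens the time needed to clear the backlog, while every message is still handled exactly once.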

Message queues are great when dealing with high volumes of real-time events, such as processing payments or tracking page views. In any scenario where two systems need to communicate and persistence, resiliency, or batching is highly important, a message queue could be the right solution.

TL;DR

Data brokering helps us connect different parts of applications, while keeping them from directly relying on one another (loosely coupled).
Data brokering with Node.js can take several forms, including a proxy API, an ETL pipeline, and a message queue.
A proxy API sits between an underlying API and the consumer requesting data. The underlying API is known, but the consumer doesn't have to be known ahead of time.
An ETL pipeline takes data from one source, processes it, and then loads it into its final destination. Both ends of an ETL pipeline should be known: you know how to access the source of the data, and you know where it is going to end up.
A message queue allows multiple systems to communicate asynchronously, by sending messages to a persistent queue to then be processed whenever a consumer is ready. A queue doesn't need to know anything about the producer adding messages to the queue, or the consumer processing messages from the queue.

Thanks for reading and if you have any questions, use the comment function or send me a message @mariokandut.

If you want to know more about Node, have a look at these Node Tutorials.

References (and Big thanks):

Node.js, HeyNode, RabbitMQ, OsioLabs
