Understand data brokering and different approaches in Node.js.
By Mario Kandut
This article is based on Node v16.14.0.
With Node.js, data can be shared between services without tightly coupling applications together or rewriting them, for example when you want to use a legacy API in a modern frontend application. Node.js is also a good choice for processing large amounts of data.
Data brokering saves us from having to rewrite the underlying systems or make them directly aware of each other. Data can be processed with Node.js in many ways, including, but not limited to, a proxy API, an ETL pipeline, and a message queue.
But why do we need data brokering in the first place? Technology moves fast, but the systems inside applications can't or don't change and adapt as rapidly. Besides the ever-moving technology, business needs also change. When business needs change, the requirements change too, and we can't always know what the requirements will be in the future. Often data is locked away (legacy API, third-party system) or siloed off, which makes it hard to access from a new frontend application. Eventually the business's needs change, and relationships between systems need to adapt.
This introduces a new challenge: How do we share data across our applications, without tightly coupling our applications together or rewriting them?
This challenge can be solved with data brokering. The data brokering software sits between applications and facilitates the transfer of data. This "glue" (a piece of software) can be focused entirely on talking to the specific systems it is concerned with, and nothing more.
This approach of moving data between disparate systems is called data brokering. The goal of data brokering is to expose data from one system to another without those systems having to know anything about each other. A common restriction with legacy APIs and third-party services is that the data source can't be controlled or modified. Separating concerns between systems and keeping coupling low is generally a useful approach. Data brokering helps us create decoupled applications and services.
The Node.js ecosystem benefits from npm packages for interacting with different data sources. There are packages available for nearly any data source. This speeds up development by reducing the overall amount of code that needs to be written, and it shortens the time-to-market cycle inherent in building software.
The event-driven design of Node.js also makes it a good choice for applications that need to sit and wait for interaction. Node.js consumes comparatively few resources while waiting and can handle a large number of concurrent connections on a single thread. This event-driven model also works well in a serverless context.
Data brokering with Node.js can be done in many ways, including, but not limited to, the three approaches covered here: a proxy API, an ETL pipeline, and a message queue.
A proxy API is a thin, lightweight API server that translates requests and responses between another API and an API consumer. It creates an abstraction layer over another service (middleware).
The proxy API serves as a friendly interface for a consumer. Here, a consumer is any client or application which wants to request data, and the underlying API service is the source of the data that the proxy API communicates with to fulfill the request. A proxy API connects a known service with any number of known or unknown consumer clients. It mediates between a service and its consumers.
A proxy API translates requests from the consumer into requests that the underlying service can understand. It restructures data received from underlying services into the format expected by consumers. This makes it easy to expose a stable and easy-to-work-with API.
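The translation in both directions could be sketched as follows. This is only a minimal sketch, not a definitive implementation: the legacy field names (`USR_ID`, `USR_NM`, `CRT_DT`), the `/users/:id` route, and the injected `fetchLegacyUser` function are hypothetical and stand in for whatever the underlying service actually expects and returns.

```javascript
const http = require('http');

// Translate a consumer request path into a query the (hypothetical)
// legacy service understands. Returns null for unknown routes.
function toLegacyQuery(url) {
  const match = /^\/users\/(\d+)$/.exec(url);
  return match ? { USR_ID: match[1] } : null;
}

// Restructure a legacy record into the format consumers expect.
// Assumed legacy shape: { USR_ID: '42', USR_NM: 'Ada Lovelace', CRT_DT: '20200131' }
function toConsumerFormat(legacyUser) {
  const d = legacyUser.CRT_DT; // assumed to be 'YYYYMMDD'
  return {
    id: Number(legacyUser.USR_ID),
    name: legacyUser.USR_NM,
    createdAt: `${d.slice(0, 4)}-${d.slice(4, 6)}-${d.slice(6, 8)}`,
  };
}

// fetchLegacyUser is injected: an async function that talks to the
// underlying service and resolves with a legacy record.
function createProxyServer(fetchLegacyUser) {
  return http.createServer(async (req, res) => {
    const query = toLegacyQuery(req.url);
    if (req.method !== 'GET' || query === null) {
      res.writeHead(404, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'Not found' }));
      return;
    }
    try {
      const legacyUser = await fetchLegacyUser(query);
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(toConsumerFormat(legacyUser)));
    } catch (err) {
      // The underlying service failed; report a gateway error to the consumer.
      res.writeHead(502, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ error: 'Upstream service failed' }));
    }
  });
}
```

Calling `createProxyServer(fetchLegacyUser).listen(3000)` would start the proxy. Because the consumer only ever sees `id`, `name`, and `createdAt`, the legacy service could be replaced later without changing the consumer-facing interface.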
The ETL pipeline is another approach to data brokering. ETL stands for extract, transform, and load. This is a common approach for moving data from one location to another, while transforming the structure of the data after it is extracted from its source and before it is loaded into its destination.
ETL is a process with three separate steps and is often called a pipeline, because data moves through these three steps.
ETL pipelines are typically run as batch jobs.
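The three steps can be sketched as three small functions chained together. This is a simplified sketch under assumptions: the CSV layout (`page,views`), the aggregation, and the in-memory array used as destination are hypothetical. A real pipeline would read from and write to actual data stores, and would likely use streams for large inputs.

```javascript
// Extract: parse raw CSV text from the source into records.
// Assumes a simple CSV without quoted fields or embedded commas.
function extract(csvText) {
  const [headerLine, ...rows] = csvText.trim().split('\n');
  const headers = headerLine.split(',');
  return rows.map((row) => {
    const values = row.split(',');
    return Object.fromEntries(headers.map((h, i) => [h, values[i]]));
  });
}

// Transform: reshape the data, here by summing the views per page.
function transform(records) {
  const totals = new Map();
  for (const { page, views } of records) {
    totals.set(page, (totals.get(page) || 0) + Number(views));
  }
  return [...totals].map(([page, views]) => ({ page, views }));
}

// Load: write the transformed rows into the destination.
// The destination is a plain array here, standing in for a database.
function load(rows, destination) {
  for (const row of rows) {
    destination.push(row);
  }
  return rows.length;
}

// Run the whole batch in one go and return the number of rows loaded.
function runPipeline(csvText, destination) {
  return load(transform(extract(csvText)), destination);
}
```

Keeping each step as a separate function means the source format or the destination can change without touching the other two steps.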
The major differences between a proxy API and an ETL pipeline are that both ends of the ETL pipeline are known, and that ETL pipelines are run as a batch. So all the data is moved at once, as opposed to the proxy API approach, where an interface for consuming data is exposed. With a proxy API, you don't necessarily know which clients are going to consume the data, and you let the consumer make requests as needed.
Use Case for ETL: Analytics Data
An ETL pipeline is a practical choice for migrating large amounts of data, like converting hundreds of gigabytes of data stored in flat files into a new format, or computing new data based on those hundreds of gigabytes.
A message queue stores the messages (data) sent to it in sequential order until a consumer is ready to retrieve them from the queue. A message can be any piece of data.
Message queues work with producers and consumers: a producer sends messages to the queue, and a consumer retrieves messages from the queue and processes them.
The producer can send messages to the queue and not worry about hearing back whether the message was processed or not. The message will be in the queue, waiting to be processed when the consumer is ready to receive it. This decoupling of sending and processing makes message queues a form of asynchronous communication.
Message queues are highly resilient and scalable. If at any point messages are coming in faster than the consumers can handle, none of the messages are lost. They will sit in the queue in the order they were received until either more consumers are spun up to handle the increased load, or a consumer becomes available to deal with the message. Hence, message queues are also fault tolerant to some degree. A common example of a message queue use case is delivering webhooks.
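The producer and consumer relationship can be illustrated with a tiny in-memory queue. This is only a sketch to show the ordering and the decoupling: the `MessageQueue` class and the `consume` helper are hypothetical names, and nothing here is persisted, so messages would not survive a process restart. A production setup would use a dedicated message broker instead.

```javascript
// A minimal first-in, first-out queue kept in memory.
class MessageQueue {
  constructor() {
    this.messages = [];
  }

  // Producer side: add a message and return immediately,
  // without waiting for any consumer.
  send(message) {
    this.messages.push(message);
  }

  // Consumer side: take the oldest message, or undefined if the queue is empty.
  receive() {
    return this.messages.shift();
  }

  get size() {
    return this.messages.length;
  }
}

// A consumer that processes at most `limit` messages with `handler`
// and returns how many messages it processed.
function consume(queue, handler, limit = Infinity) {
  let processed = 0;
  while (processed < limit && queue.size > 0) {
    handler(queue.receive());
    processed += 1;
  }
  return processed;
}
```

If a producer sends three messages and a slow consumer only handles two, the third message stays in the queue until a consumer picks it up later, in the order it was received.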
Message queues are great when dealing with high volumes of real-time events, like processing payments or tracking page views. In any scenario where two systems need to communicate and persistence, resiliency, or batching is highly important, a message queue could be the right solution.
Thanks for reading and if you have any questions, use the comment function or send me a message @mariokandut.