In many domains, the transfer of static and batch files is an important part of systems integration, and a large number of applications produce and expect data in the form of files rather than through network-based APIs. In this article, we shall see how Zato makes multi-protocol integrations of this kind possible in a way that is secure, scalable and easy to extend in Python.
File transfer is often found in scenarios such as:
There are many more scenarios but what they all have in common is that in each such data flow there may be a single producer and multiple recipients of information, each of them preferring to use different technologies.
For instance, a single file saved to a Windows share may go to a REST endpoint, an IBM MQ or AMQP queue, an AWS S3 bucket and to e-mail recipients, each of which may be an independent application or business entity and each may use a different data format, e.g. what is CSV initially may need to be transformed to JSON, XML and PDF, depending on the receiver.
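To make the idea of one CSV input feeding several data formats concrete, here is a minimal sketch using only the Python standard library. It is not part of Zato: the column names in `FIELDS`, the sample rows and the helper names are assumptions made for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical column names - the real ones depend on the producer's CSV layout
FIELDS = ('order_no', 'customer_id', 'order_amount')

def parse_csv(text):
    """ Parses header-less CSV text into a list of dicts, skipping empty lines.
    """
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(FIELDS, row)) for row in reader if row]

def to_json(orders):
    """ Serializes the orders for recipients that expect JSON.
    """
    return json.dumps(orders)

def to_xml(orders):
    """ Serializes the same orders for recipients that expect XML.
    """
    root = ET.Element('orders')
    for order in orders:
        elem = ET.SubElement(root, 'order')
        for name, value in order.items():
            ET.SubElement(elem, name).text = value
    return ET.tostring(root, encoding='unicode')

# Made-up sample input, one order per line
sample = 'A-1,C-100,25.50\nA-2,C-200,99.00\n'

orders = parse_csv(sample)
as_json = to_json(orders)
as_xml = to_xml(orders)
```

The input is parsed once and each recipient gets its own serializer, so adding another format does not affect the existing ones.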
Let's see how Zato helps implement processes of this kind.
The core Zato concept in file integrations is that of file transfer channels - reusable definitions pointing to producers and consumers of data along with accompanying details, such as a schedule, file patterns to use, what encoding the data is in and more.
Zato will connect to the data source automatically and handle each new or modified file according to the channel's configuration, i.e. by parsing and distributing it to data recipients, allowing one to apply any transformations or enrichment as required, and optionally deleting the data file from the source location afterwards.
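Zato takes care of detecting new or modified files on its own. Purely as an illustration of the general idea, and not of how Zato implements it, the sketch below compares two snapshots of a local directory. The function names and the choice of modification time plus size as the change marker are assumptions.

```python
import os
import tempfile

def snapshot(directory):
    """ Maps each file name in the directory to its (mtime_ns, size) pair.
    """
    result = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                stat = entry.stat()
                result[entry.name] = (stat.st_mtime_ns, stat.st_size)
    return result

def changed_files(previous, current):
    """ Returns names that are new or whose metadata differs from the previous snapshot.
    """
    return sorted(name for name, meta in current.items() if previous.get(name) != meta)

def demo():
    """ Creates a file and then appends to it, reporting what changed after each step.
    """
    with tempfile.TemporaryDirectory() as directory:
        before = snapshot(directory)
        path = os.path.join(directory, 'orders.csv')

        with open(path, 'w') as f:
            f.write('A-1,C-100,25.50\n')
        after_create = snapshot(directory)

        with open(path, 'a') as f:
            f.write('A-2,C-200,99.00\n')
        after_update = snapshot(directory)

        return changed_files(before, after_create), changed_files(after_create, after_update)
```

Calling `demo()` reports the file once as new and once as modified. With a file transfer channel, none of this needs to be written by hand.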
In the Zato Dashboard, a file transfer channel may be defined as below.
Let's consider the following sample definition. In this particular case:
The data source is an SFTP connection definition called CRM
New or changed files are processed according to a schedule configured for a job called Transfer Invoices Job
Multiple input directories and glob file patterns can be used
Among the data recipients are REST outgoing connections - any file picked up from the source location will be sent to a REST endpoint directly as-is, without any programming needed
A file can be sent to [a service](/en/docs/3.3/intro/esb-soa.html) - the service may transform or enrich the file as needed, e.g. it may change its data format, invoke more APIs to obtain additional data, combine it all and send the result to one or more recipients
A file can also be delivered to a publish/subscribe topic - again, with multiple recipients and data transformations taking place before it is delivered to its final destination(s)
Note the flexibility and reusability - the same data source, schedule and recipients can be configured to take part in multiple file integration processes without influencing each other.
In fact, the same services and endpoints may independently participate in both file and online API integrations.
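As for the input directories and glob file patterns mentioned in the sample definition, the sketch below shows how such a combination can select files. The directories, patterns and the `should_pick_up` helper are hypothetical, and Zato's own matching rules may differ in details.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical values, for illustration only
directories = ['/data/invoices', '/data/orders']
patterns = ['*.csv', 'invoice-*.txt']

def should_pick_up(full_path):
    """ Returns True if the file is directly in one of the configured directories
    and its name matches at least one of the glob patterns.
    """
    path = PurePosixPath(full_path)
    in_directory = str(path.parent) in directories
    name_matches = any(fnmatchcase(path.name, pattern) for pattern in patterns)
    return in_directory and name_matches
```

For example, `should_pick_up('/data/orders/new.csv')` is true, whereas a `readme.md` in the same directory, or a CSV file in a directory that is not configured, is ignored.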
To showcase how file transfer works in practice, let's implement an example process:
We need to configure a few Zato objects for the new file transfer to use. Check below for details of how to do it:
Simply fill out a form such as the ones below, click OK and you are done. No server restarts are needed.
Because so many new integrations are based around REST, recipients of this kind are treated specially by file transfer channels in that there is no programming required for files to be transferred to REST endpoints.
When you create a file transfer channel, simply assign one or more REST endpoints to it and Zato will send all the files to them as soon as they are picked up from the data source.
As to other recipient types, we will use the Python-based service below.
Let's quickly implement a service that will transform and send the files to AMQP and SMTP recipients.
The service will receive its input in an attribute called self.request.raw_request - we will transform it to the required data formats and send it to both AMQP and SMTP.
# -*- coding: utf-8 -*-

# stdlib
import io

# For Excel files
from openpyxl import Workbook
from openpyxl.writer.excel import save_workbook

# Zato
from zato.common.api import SMTPMessage
from zato.common.json_ import dumps
from zato.server.service import Service

# ##############################################################################

class HandleNewOrders(Service):
    name = 'api.file-transfer.handle-new-orders'

    def handle(self):

        # All of our input data, including metadata assigned by Zato
        request = self.request.raw_request

        # Full path, including directory and file name
        full_path = request['full_path']

        # Channel the file was received through
        channel_name = request['channel_name']

        # Data received, already parsed by our channel
        data = request['data']

        # Log info that we are ready
        self.logger.info('Processing file `%s` from `%s`', full_path, channel_name)

        # Send to the CRM using AMQP first ..
        self.send_orders_to_crm(data)

        # .. and send an email now.
        self.notify_new_orders(data)

        # We are done!

# ##############################################################################

    def send_orders_to_crm(self, data):
        """ Converts input orders to JSON and sends them to the CRM using AMQP.
        """

        # A list of orders to be sent as JSON
        orders = []

        # Process each input CSV row ..
        for row in data:

            # .. extract information from each line ..
            order_no = row[0]
            customer_id = row[1]
            order_amount = row[2]

            # .. create a dict representing each order ..
            order = {
                'order_no': order_no,
                'customer_id': customer_id,
                'order_amount': order_amount,
            }

            # .. append the dict to already existing ones.
            orders.append(order)

        # Convert the list of orders into a JSON string
        json_data = dumps(orders)

        # Details of the AMQP connection to use ..
        conn_name = 'My AMQP Connection'
        exchange = '/orders.new'
        routing_key = 'zato'

        # .. and now, we can send the already converted message using AMQP.
        self.outgoing.amqp.send(json_data, conn_name, exchange, routing_key)

# ##############################################################################

    def notify_new_orders(self, data):
        """ Converts input orders to an Excel file and e-mails it as an attachment.
        """

        # Create a new workbook ..
        workbook = Workbook()

        # .. create a new worksheet for orders ..
        sheet = workbook.create_sheet('New orders')

        # .. iterate over all CSV orders ..
        for row_idx, row in enumerate(data, 1):

            # .. now, iterate over all columns ..
            for column_idx, value in enumerate(row, 1):

                # .. create a new cell in the sheet ..
                sheet.cell(column=column_idx, row=row_idx, value=value)

        # .. our in-memory buffer for the Excel file ..
        buffer = io.BytesIO()

        # .. save the workbook to RAM ..
        save_workbook(workbook, buffer)

        # .. rewind to the beginning ..
        buffer.seek(0)

        # .. get the Excel file ..
        excel_data = buffer.read()

        # .. do not forget to close the buffer now that we do not need it.
        buffer.close()

        # We have the Excel file so we can send it as an e-mail attachment now.

        # In practice, this would be kept in a config file,
        # Redis, SQL or another config source.
        email_to = 'hello@example.com'
        email_cc = ['my.cc.1@example.com', 'my.cc.2@example.com']
        email_from = 'zato@example.com'
        email_body = '<h1>New orders</h1><br/>New orders are in the attachment.'
        file_name = 'new_orders.xlsx'

        # Construct a new message ..
        message = SMTPMessage()

        # .. the body of the message is HTML ..
        message.is_html = True

        # .. assign details ..
        message.to = email_to
        message.cc = email_cc
        message.from_ = email_from
        message.body = email_body

        # .. attach the file ..
        message.attach(file_name, excel_data)

        # .. obtain a connection to the SMTP server ..
        conn = self.email.smtp.get('My SMTP Connection').conn

        # .. and finally, send the message across.
        conn.send(message)

# ##############################################################################
Now, we can create a new file transfer channel. After clicking OK, it will start to work.
If there is anything that needs changing, click Edit in the Dashboard and enter new information, e.g. new directories to look for files in.
If anything changes in any of the pieces of configuration that the channel uses, such as AMQP, REST or anything else, the channel will reconfigure automatically, without a need for a manual update.
Here we are: using Zato, we have just created a multi-protocol file transfer hub, integrating SFTP, AMQP and e-mail.
New protocols and recipients can be added at any time without disrupting existing file integrations - we can easily extend the service to transform and push the input files to SQL, Cassandra and Redis, for instance.
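As a rough idea of what pushing the same rows to SQL could look like, here is a minimal sketch that uses the standard library's sqlite3 module as a stand-in for a real database. The table layout and the `save_orders` helper are assumptions, and a real service would use an SQL connection configured in Zato rather than an in-memory database.

```python
import sqlite3

def save_orders(conn, rows):
    """ Stores three-column order rows, replacing any order that already exists.
    """
    conn.execute(
        'CREATE TABLE IF NOT EXISTS orders ('
        'order_no TEXT PRIMARY KEY, customer_id TEXT, order_amount TEXT)'
    )
    conn.executemany('INSERT OR REPLACE INTO orders VALUES (?, ?, ?)', rows)
    conn.commit()

# An in-memory database, used here only so that the sketch is self-contained
conn = sqlite3.connect(':memory:')

# Made-up rows in the same shape as the ones the service iterates over
rows = [['A-1', 'C-100', '25.50'], ['A-2', 'C-200', '99.00']]

save_orders(conn, rows)
count = conn.execute('SELECT COUNT(*) FROM orders').fetchone()[0]
```

Because the order number is the primary key and the statement is INSERT OR REPLACE, processing the same file twice does not duplicate the orders.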
And if new data sources are needed, just create new channels, start them in addition to the existing ones and they will begin to work immediately. It is just that easy!