In today’s interconnected world, integrating diverse data sources is crucial for any organisation aiming to stay competitive. However, not all systems come with APIs or other easy ways to extract data. This is where using your inbox as an integration protocol becomes a powerful tool. With email as the conduit, you can extract and manage data even from systems that lack direct integration capabilities. And with large language models (LLMs), the unstructured data in those emails can be organised and put to use, making it easier than ever to bring into your broader data strategy.
We’re iviva. Our platform lets you integrate all your bits together and get everything into one platform. If you can see it on your screen, we can retrieve it for you. In this post we look at how to deal with data sources that are not easy to reach.
Summary
Not everything has an API or machine-readable way to extract information.
However, as Zawinski’s law suggests (“Every program attempts to expand until it can read mail”), everything can do email.
We’re going to explain a simple but powerful approach to integrating systems that lack an easy integration layer: using your inbox as the way to get data out.
It’s easy today because LLMs can organise unstructured inputs into usable data.
Let’s dive in!
You can easily get data out of your building system.
BACnet, Modbus, OPC, MQTT – these are all standard, well-defined protocols and we all know how to deal with them.
You can easily get data out of most of your cloud hosted software as well.
They all have APIs.
They are reasonably documented for some definition of ‘reasonably’.
They are probably just REST APIs. You may have some Graph-based APIs and sometimes if you dust off the cobwebs, you can see SOAP and other ancient protocols of yore.
That makes it easy to build a connector.
However, you also deal with a lot of data that has neither a protocol for exchanging information nor any kind of usable RESTful API.
Examples?
Data trapped in walled gardens
Ironically, many consumer technology services that can technically provide an API don’t. They might have one but only publish it for businesses who will throw large amounts of money at them. Or they need to drive engagement and an API doesn’t let them do it.
If you want to track your spend on Uber rides and food, you can’t. There’s no API unless you are some kind of business entity that needs the information badly enough to pay for it.
LinkedIn doesn’t give you an API to your job postings so you can’t have a centralised workflow for recruitment.
Your flight tickets aren’t available through an API, and you don’t even want to try asking your airline for one.
But there are other kinds of data as well.
Data trapped in old technology
Fun fact: most buildings don’t have smart meters for water. They can collect energy data easily, but water data? Not always.
How do you get your water consumption data into a dashboard or ESG report?
Yes – that’s right – you manually type it into an Excel sheet.
What about waste generation?
Yes – you could install a sophisticated scale to weigh your dumpster, but few people actually have one (and, to a first approximation, ‘few’ is very close to ‘0’).
What do they all have in common? They can all send email!
Everyone can do email
Your waste company probably sends you a monthly report.
Your water bill may arrive by email as well.
Uber receipts are emailed to you as are receipts and notifications on pretty much everything else you do. Flight tickets. Loyalty programs. Job postings.
Pretty much everything.
Despite so many products trying to kill email, it can never go away – it’s the lowest common denominator in tech. Everyone can do email.
So if you have data in your inbox, you should be able to connect to it and get it out. Right?
A few hurdles
There are a few roadblocks on this path.
- Is it safe?
- Everyone sends emails in a different way and format
Let’s tackle these.
Is it safe?
In general, you probably don’t want any program having unfettered access to all your email – especially if the whole point of the program is to take data out and send it somewhere else.
There’s an easy solution for this – forwarding rules.
Set up a simple forwarding rule.
Apply it only to the emails you need to process: for example, all emails about Cathay Pacific flights can be forwarded to a new address. The integration then only ever sees what you chose to forward, never the rest of your inbox.
You can create forwarding rules directly in Gmail or Outlook.
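Gmail and Outlook configure forwarding rules through their own settings screens, so the snippet below is not their API. It is a minimal Python sketch of the matching logic such a rule applies, assuming a rule that checks the sender’s domain and a few subject keywords. The `ForwardingRule` class, the `cathaypacific.com` domain and the keywords are illustrative assumptions. The same check can also run on the receiving side, so the integration rejects anything forwarded by mistake.

```python
from dataclasses import dataclass, field
from email.message import EmailMessage
from email.utils import parseaddr


@dataclass
class ForwardingRule:
    """A narrow allow-list: only mail that matches is let through (hypothetical)."""

    sender_domains: set[str]  # lowercase domains, e.g. {"cathaypacific.com"}
    subject_keywords: set[str] = field(default_factory=set)  # lowercase keywords

    def matches(self, msg: EmailMessage) -> bool:
        # parseaddr splits "Name <user@host>" into ("Name", "user@host").
        _, address = parseaddr(str(msg.get("From", "")))
        domain = address.rpartition("@")[2].lower()
        # Accept the domain itself or any of its subdomains.
        if not any(domain == d or domain.endswith("." + d) for d in self.sender_domains):
            return False
        # With no keywords configured, any mail from an allowed domain matches.
        if not self.subject_keywords:
            return True
        subject = str(msg.get("Subject", "")).lower()
        return any(keyword in subject for keyword in self.subject_keywords)


# Hypothetical rule: only Cathay Pacific booking and itinerary emails.
flight_rule = ForwardingRule(
    sender_domains={"cathaypacific.com"},
    subject_keywords={"booking", "itinerary", "e-ticket"},
)
```

Keeping the rule this narrow is the point: the integration only ever receives mail from the handful of senders you opted in.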
Everyone sends these things out in different ways.
This has historically been a problem.
Every system you want to work with structures its emails differently.
Some send data embedded in the body. Some send a PDF. Some send a screenshot of a document as an image. Some clever, enterprising systems take that screenshot, embed it in a PDF, password-protect it and send it. Bless them all.
Previously you had two options to deal with this:
- Write a separate parser/connector for every type of email you receive. This was time-consuming and brittle: when some intern at the sending company added a bit of bold text or extra formatting, the email would look slightly different and break your system.
- Be someone like Google, with a team of data scientists and a cluster of compute to run AI models that extract the information automatically.
But today we live in a world where large transformer-based language models exist, and they have changed everything.
You can now instruct an LLM with a single prompt to extract useful information for you.
And with the right tooling, you can route images, PDFs and email text to an LLM and have it extract exactly what you need, in the format you need it.
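The prompt is where the per-sender parser used to live. Here is a minimal Python sketch, assuming a hypothetical `complete(prompt) -> str` function that wraps whichever LLM client you use (we deliberately don’t reproduce any vendor SDK), and an illustrative water-bill schema. The fields, the prompt wording and the `extract_fields` name are our assumptions. The important part is asking for JSON in a fixed shape and validating the reply before trusting it.

```python
import json
from typing import Callable

# Illustrative target schema for a water bill; adjust the fields to your data.
WATER_BILL_FIELDS = {
    "account_number": str,
    "period_start": str,  # ISO 8601 date, e.g. "2024-03-01"
    "period_end": str,
    "consumption_kl": float,  # kilolitres
}

PROMPT_TEMPLATE = """Extract the following fields from the email below.
Reply with a single JSON object and nothing else.
Fields: {fields}
Write dates as YYYY-MM-DD. Use null for any field you cannot find.

EMAIL:
{email_text}"""


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def extract_fields(email_text: str, complete: Callable[[str], str]) -> dict:
    """Ask an LLM for structured fields, then check the reply against the schema.

    `complete` is a hypothetical stand-in for your LLM client: prompt in, text out.
    """
    prompt = PROMPT_TEMPLATE.format(
        fields=", ".join(WATER_BILL_FIELDS), email_text=email_text
    )
    data = json.loads(complete(prompt))  # raises ValueError on non-JSON replies
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    result = {}
    for name, expected_type in WATER_BILL_FIELDS.items():
        value = data.get(name)
        if value is None:
            result[name] = None  # missing or reported as null by the model
        elif expected_type is float and _is_number(value):
            result[name] = float(value)  # accept 412 as well as 412.0
        elif isinstance(value, expected_type):
            result[name] = value
        else:
            raise ValueError(f"{name}: expected {expected_type.__name__}, got {value!r}")
    return result
```

Some models wrap their JSON in Markdown fences or add commentary; if yours does, strip that before `json.loads`, or use a structured-output option if your provider offers one. Rejecting a malformed reply outright, rather than guessing, keeps bad values out of your dashboard.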
From there, it’s easy to route that data into any platform for storage.
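Routing is usually a plain HTTP call. Below is a minimal sketch using Python’s standard `urllib`, assuming a hypothetical ingestion endpoint that accepts JSON. The URL, the bearer-token header, the payload shape and both function names are placeholders, not any particular platform’s API.

```python
import json
import urllib.request


def build_ingest_request(record: dict, source: str, url: str, token: str) -> urllib.request.Request:
    """Wrap an extracted record in a JSON POST to a (hypothetical) ingestion endpoint."""
    body = json.dumps({"source": source, "record": record}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping the request builder separate from the network call makes it easy to check the payload without a live endpoint.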
How iviva does it
iviva’s integration component – Lucy – can do this out of the box.
You can set up a workflow that has an inbox and email address attached to it.
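Lucy takes care of the mailbox side for you. To give a feel for what that first step involves, here is a minimal Python sketch (not Lucy’s implementation) that polls a dedicated inbox over IMAP using only the standard library. The host, credentials and the `handle` callback are placeholders you would supply; `handle` is where extraction and routing would plug in.

```python
import email
import imaplib
from email import policy
from email.message import EmailMessage
from typing import Callable

Handler = Callable[[EmailMessage], None]


def process_unseen(conn: imaplib.IMAP4, handle: Handler) -> int:
    """Pass every unseen message in the selected mailbox to `handle`.

    Fetching the full message (RFC822) sets the \\Seen flag on the server,
    so the next poll will not pick the same message up again.
    Returns the number of messages handled.
    """
    typ, data = conn.search(None, "UNSEEN")
    if typ != "OK":
        raise RuntimeError(f"IMAP search failed: {typ}")
    handled = 0
    for num in data[0].split():  # data[0] is a space-separated list of ids
        typ, msg_data = conn.fetch(num, "(RFC822)")
        if typ != "OK":
            continue  # skip it; it can be retried on the next poll
        raw = msg_data[0][1]  # first item is (response info, raw message bytes)
        handle(email.message_from_bytes(raw, policy=policy.default))
        handled += 1
    return handled


def poll_inbox(host: str, user: str, password: str, handle: Handler) -> int:
    """Connect over TLS, process unseen mail in INBOX, then log out."""
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select("INBOX")
        return process_unseen(conn, handle)
```

Because fetching marks a message as seen, a failure inside `handle` means that message will not be retried. If you need stronger guarantees, fetch with `BODY.PEEK[]` instead and set the `\Seen` flag yourself after processing succeeds.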
The workflow can read the incoming email, extract the text and attachments, send them to a PDF converter if required, and then route them to the GPT-4 or Anthropic connector to extract exactly what you need.
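The “extract text and attachments” step doesn’t need anything exotic. Here is a minimal Python sketch using the standard `email` package (again an illustration, not Lucy’s implementation). It separates the body from PDF and image attachments so each can go to the right next step, such as a PDF converter or an image-capable model. The `split_email` name and the three buckets (text, PDFs, images) are our assumptions about what downstream steps need.

```python
from dataclasses import dataclass, field
from email import policy
from email.parser import BytesParser


@dataclass
class ExtractedParts:
    body_text: str = ""
    pdfs: list[tuple[str, bytes]] = field(default_factory=list)    # (filename, data)
    images: list[tuple[str, bytes]] = field(default_factory=list)  # (filename, data)


def split_email(raw: bytes) -> ExtractedParts:
    """Separate the readable body from PDF and image attachments."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    parts = ExtractedParts()

    # Prefer plain text; fall back to HTML if that is all the sender provides.
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is not None:
        parts.body_text = body.get_content()

    # iter_attachments() yields the non-body parts of a multipart message.
    for attachment in msg.iter_attachments():
        content_type = attachment.get_content_type()
        filename = attachment.get_filename() or "unnamed"
        data = attachment.get_payload(decode=True) or b""
        if content_type == "application/pdf":
            parts.pdfs.append((filename, data))
        elif content_type.startswith("image/"):
            parts.images.append((filename, data))
    return parts
```

This only looks at top-level attachments; an email that nests another email as an attachment would need a recursive walk.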
You can build a working solution in less than an hour.
It can even work with those annoying password-protected PDFs.
This means virtually anything you get in your inbox can now show up on your dashboard and analytics!