Apache Spark™ has transformed the way we handle big data. This open-source, distributed computation framework has a huge number of applications and, alongside its core data-processing engine, it includes libraries for SQL, graph computation, machine learning and much more. By processing data in memory, it is faster than its predecessor, MapReduce, and is capable of managing petabytes of data across physical or virtual servers. We asked Senior Software Engineer Paul Campbell to share his experience of Spark and the challenges it can solve for organisations around the globe.
Paul, can you start by telling us about your role at Neueda?
Of course. I’ve worked at Neueda for over three years. I started in a development role and I now work as a Senior Software Engineer. My role is varied, but it includes analysing large volumes of data – namely our client’s communications, which are reviewed for conduct – then enriching this data for processing by their modelling team and compliance analysts.
Why have we seen such a big move towards Spark?
The simple answer is that the global volume of data has exploded. Traditional data-processing methods can’t cope – they are no longer viable for providing the type of support companies require. Every big player in the market is now moving towards this new type of data handling.
What is a modern problem that Spark can solve?
Spark is excellent for analysing real-time streams of data. For example, Amazon uses Spark to analyse every single click you make on their site and then leverages this information to personalise their prices and recommendations. This is Spark Streaming at work, and you see frameworks like it in practice every day across sites like Facebook and Amazon – even if you don’t know it!
What Spark projects are you involved in right now?
I am part of a team of 10 developers working in this area and we’re involved in multiple projects. One project involved helping a global tier 1 investment bank that was using a traditional application architecture to process and analyse a large volume of daily communications for conduct and compliance, and needed better performance. The volume and processing time meant that, even on large-scale, specialised hardware, processing backlogs would still form and, in many cases, these were difficult to clear.
By rearchitecting the system to use Spark, our client was freed from the confines of a single server. Not only could the processing be spread concurrently across multiple servers, but the number and size of servers could also be scaled dynamically to meet peaks in demand.
This has allowed the client to achieve their goal of analysing more communications from more sources in a shorter time. The solution has proven to be scalable and adaptable to client requirements and business needs.
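For readers who want a feel for the idea Paul describes, here is a minimal sketch in plain Python of splitting a workload into partitions, processing them concurrently and merging the results. It is not Spark's API and not the client's system: a thread pool stands in for a cluster of servers, and the sample messages, the flagged terms and every function name are hypothetical, chosen only for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data standing in for a day's communications.
MESSAGES = [
    "please review the trade before close",
    "lunch at noon",
    "guarantee the trade will profit",
    "see attached report",
    "we can guarantee returns",
    "meeting moved to friday",
]

# Hypothetical watch-list; real conduct rules would be far richer.
FLAGGED_TERMS = {"guarantee", "trade"}


def split_into_partitions(items, num_partitions):
    """Divide items into roughly equal slices, one per worker."""
    return [items[i::num_partitions] for i in range(num_partitions)]


def count_flagged_terms(partition):
    """Map step: count flagged terms within a single partition."""
    counts = Counter()
    for message in partition:
        for word in message.lower().split():
            if word in FLAGGED_TERMS:
                counts[word] += 1
    return counts


def analyse(messages, num_workers):
    """Process partitions concurrently, then merge the partial results."""
    partitions = split_into_partitions(messages, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_counts = list(pool.map(count_flagged_terms, partitions))
    total = Counter()
    for counts in partial_counts:
        total.update(counts)  # reduce step: combine per-partition counts
    return dict(total)
```

Calling `analyse(MESSAGES, 1)` and `analyse(MESSAGES, 3)` gives the same counts. Because each partition is processed independently, the number of workers can change without changing the answer, and that independence is what lets a system of this shape scale out when demand peaks.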
What is it like to work at Neueda?
It’s a great team here and the perfect environment to get hands-on experience. It’s the place to be for new tech in Belfast! Now is a great time to join us and I would recommend it to anyone looking to grow (or start) their digital career.