Job Overview
- Date Posted: June 15, 2025
- Location:
- Expiration Date: June 22, 2025
- Experience: 5 Years
- Gender: Both
- Qualification: Bachelor's Degree
- Career Level: Mid-Level / Officer
Job Description
JOB PURPOSE STATEMENT
The Data Engineering team is responsible for documenting data models, architecting distributed systems,
creating reliable data pipelines, combining data sources, architecting data stores, and collaborating with
the data science teams to build the right solutions for them.
They do this using open-source big data platforms such as Apache NiFi, Kafka, Hadoop, Apache
Spark, Apache Hive, HBase, and Druid, together with the Java programming language, picking the
right tool for each purpose.
The growth of every product relies heavily on data, such as for scoring and for studying product
behavior that may be used for improvement. It is the role of the data engineer to build fast,
horizontally scalable architectures using modern tools rather than the traditional Business
Intelligence systems as we know them.
KEY RESPONSIBILITIES & PERCENTAGE (%) TIME SPENT
- Documenting Data Models: The role will be responsible for documenting the entire journey that data elements take end-to-end, from the data sources to all the data stores, including all the transformations in between, and keeping those documents up to date with every change. (10%)
- Architecting Distributed Systems: Modern data engineering platforms are distributed systems. The data engineer designs the right architecture for each solution, utilizing best-of-breed open-source tools in the big data ecosystem, because no single tool does everything; the tools are specialized and are made lean and fit for purpose. The architecture should be one that can process any data, any time, anywhere, for any workload. (10%)
- Combining Data Sources: Pulling data from different sources, which may be structured, semi-structured, or unstructured, using tools such as Apache NiFi, and taking the data through a journey that produces a final state useful to the data consumers. These sources can include REST, JDBC, Twitter, JMS, images, PDF, and MS Word; the data is placed into a staging environment such as Kafka topics for onward processing. (10%)
- Developing Data Pipelines: Creating data pipelines that transform data using tools such as Apache Spark and the Java programming language. The pipelines may apply processing such as machine learning, aggregation, iterative computation, and so on. (40%)
- Architecting Data Stores: Designing and creating data stores using big data platforms such as Hadoop and NoSQL databases such as HBase. (15%)
- Data Query and Analysis: Utilizing tools such as Apache Hive to analyze data in the data stores and generate business insights. (10%)
- Team Leadership: Providing team leadership to the data engineers. (5%)
QUALIFICATION AND EXPERIENCE REQUIREMENTS
- A Bachelor’s degree in Computer Science
- Minimum 5 years' experience developing object-oriented applications using the Java programming language
- Certification in and experience implementing best-practice frameworks (e.g. ITIL, PRINCE2) preferred
- Minimum 5 years’ experience working with relational databases
- Minimum 5 years’ experience working with the Linux operating system
- Experience with open-source big data platforms and tools (Hadoop, Kafka, Apache NiFi, Apache Spark, Apache Hive, NoSQL databases) and ODI
- Experience working with data warehouses
- Experience with DevOps, Agile working, and CI/CD
- Familiarity with complex systems integrations using SOA tools (Oracle WebLogic/ESB/SOA)
- Familiarity with industry-standard formats and protocols (JMS, SOAP, XML/XPath/XQuery, REST, and JSON) and data sources
- Excellent analytical, problem-solving, and reporting skills
- A good knowledge of the systems and processes within the Financial Services industry