Job Details

Big Data Developer

By Independent Recruiter on June 28, 2019
Job Type: Full Time
Job Category: Information Technology
Contract Type: Permanent

Sandton, Gauteng, South Africa

Job Description

We are recruiting for an excellent opportunity and seeking a Big Data Developer with extensive experience in Java, Python and Hadoop.

Skills

Personal Attributes and Skills:
• SQL – Advanced
• Python – Advanced
• Java – Advanced
• Scala – Advanced
• Hive SQL – Advanced
• MS Excel – Advanced
• Data Mining – Advanced
• Spark – Advanced
• H2O – Advanced
• Applying Expertise and Technology
• Delivering Results and Meeting Customer Expectations
• Creating and Innovating
• Analysing
• Coping with Pressure and Setbacks
• Relating and Networking
• Adapting and Responding to Change
• Learning and Researching

Education and Experience:
• Systems Development Life Cycle (SDLC) – Intermediate
• 3 – 5 years of experience in Java, Python and Linux scripting
• 3+ years of experience in Hadoop
• Data Security and Protection Policies – Intermediate
• Big Data using Hortonworks (Hadoop) – Advanced
• Data Warehouse principles and practices – Intermediate
• Kimball Methodology – Intermediate
• ETL development – Intermediate
• Matric – Essential
• National Diploma in IT (BTech) – Essential
• Bachelor of Science (Information Systems, Computer Science, Mathematics) – Advantageous

Job Requirements

• Develop and implement machine learning models and solutions
• Design and implement ETL methodologies and technologies, and their integration with big data
• Conduct root cause analysis on production issues
• Take technical leadership of the entire information management process for both structured and unstructured data
• Provide ongoing support and enhancement to the ETL system
• Configure the Hadoop infrastructure and environment for optimal performance
• Work with statistical and actuarial analysts to build models
• Produce relevant technical documentation and specifications
• Estimate time and resource requirements for business requirements
• Integrate big data solutions with existing reporting and analytical solutions
• Develop User Defined Functions (UDFs) in Pig and Hive using Java
• Assist statistical and actuarial analysts in developing distributed data science applications in Spark and H2O
• Set up and maintain Python, R and Scala environments for use within Spark; install and configure relevant packages (H2O, LIME, Keras, TensorFlow, pandas, scikit-learn, NumPy, etc.)
• Troubleshoot and fix failures in the HDP and HDF environments (Spark, Zeppelin, Hive, Druid, Superset, HDFS, YARN, ZooKeeper, Kafka, NiFi, SAM, Ranger, etc.)
• Set up, configure, maintain and support Sparkling Water (H2O) clusters
• Configure the HDP and HDF environments for optimal performance
• Perform Linux administration and DevOps; provide final-level support for all HDP and HDF components
• Build and support multi-tenant data science environments
• Research the latest technologies and assess the viability of integrating them into the big data and data science environments

Job Duties

• The Data Science Unit (DSU) – Business Intelligence is part of the company. We are responsible for building all information assets and delivering information to the business through various reporting channels, from executive level down to the call-centre agent. The information assets we build are not limited to report delivery; they also include integrated information assets that support the data scientists, risk analysts and data analytics teams.
• Interpret requirements provided by business and produce effective Big Data solutions
• Gain programming exposure through code transformations and integration of the big data solution with existing systems
• Develop information solutions from a variety of sources for both structured and unstructured data
• Take technical ownership of Big Data solutions for structured and unstructured data