Data Engineer, Python/Spark (PySpark) – Leading Insurance Firm

Contract
Blockgram
Philadelphia, PA, US

Blockgram is actively recruiting a Data Engineer: Python / Spark (PySpark) for a 6-month minimum contract (with potential for extension and a strong possibility of conversion to full-time). The ideal candidate has a strong background in computer programming, statistics, and data science and is eager to tackle problems involving large, complex datasets using the latest Python, R, and/or PySpark tooling. You are a self-starter who takes ownership of your projects and delivers high-quality, data-driven analytics solutions, and you are adept at solving diverse business problems using a variety of tools, strategies, algorithms, and programming languages.

The chosen candidate can work from either the Philadelphia, PA or the Whitehouse, NJ office!

Responsibilities
  • Apply data engineering skills within and beyond the evolving information ecosystem to support discovery, analytics, and data management
  • Work with the data science team to deploy machine learning models
  • Apply data wrangling techniques to convert data from one “raw” form into another, including data visualization, data aggregation, training statistical models, etc. (see the sketch after this list)
  • Work with various relational and non-relational data sources, targeting Azure-based SQL Data Warehouse and Cosmos DB repositories
  • Clean, unify, and organize messy, complex data sets for easy access and analysis
  • Create different levels of abstractions of data depending on analytics needs
  • Perform hands-on data preparation activities using the Azure technology stack
  • Implement discovery solutions for high-speed data ingestion
  • Work closely with the Data Science team to perform complex analytics and data preparation tasks
  • Work with the Sr. Data Engineers on the team to develop APIs
  • Source data from multiple applications, then profile, cleanse, and conform it to create master datasets for analytics use
  • Utilize state-of-the-art methods for data mining, especially for unstructured data
  • Experience with Complex Data Parsing (Big Data Parser) and Natural Language Processing (NLP) Transforms on Azure a plus
  • Design solutions for managing highly complex business rules within the Azure ecosystem
  • Performance-tune data loads
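
For illustration only: the data wrangling and aggregation responsibilities above might look something like the following minimal PySpark sketch. The file path and column names (claim_id, claim_amount, state) are hypothetical placeholders, not details from this role.

    # Minimal PySpark sketch: clean a raw extract, then build an
    # aggregated abstraction for analytics. Path and columns are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("claims-wrangling").getOrCreate()

    # Ingest a "raw" source (CSV here; JDBC, Parquet, etc. work similarly)
    raw = spark.read.csv("claims_raw.csv", header=True, inferSchema=True)

    # Clean and unify: drop rows missing the key, normalize a text column
    clean = (
        raw.dropna(subset=["claim_id"])
           .withColumn("state", F.upper(F.trim(F.col("state"))))
    )

    # Aggregate to a higher-level abstraction for the data science team
    by_state = clean.groupBy("state").agg(
        F.count("claim_id").alias("claim_count"),
        F.avg("claim_amount").alias("avg_claim_amount"),
    )

    by_state.show()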

Skills Required
  • Mid- to advanced-level knowledge of Python and Spark is an absolute must
  • Knowledge of Azure and the Hadoop 2.0 ecosystem (HDFS, MapReduce, Hive, Pig, Sqoop, Mahout, Spark, etc.) a must
  • Experience with Web Scraping frameworks (Scrapy, Beautiful Soup, or similar)
  • Extensive experience working with Data APIs (RESTful endpoints and/or SOAP; see the sketch after this list)
  • Significant programming experience (with the above technologies as well as Java, R, and Python on Linux) a must
  • Knowledge of a commercial Hadoop distribution (Hortonworks, Cloudera, MapR, etc.) a must
  • Excellent working knowledge of relational databases (MySQL, Oracle, etc.)
  • Experience with Complex Data Parsing (Big Data Parser) a must; should have worked with XML, JSON, and other custom complex data formats
  • Natural Language Processing (NLP) skills, with experience in Apache Solr and Python, a plus
  • Knowledge of High-Speed Data Ingestion, Real-Time Data Collection, and Streaming is a plus
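
For illustration only: the Data API item above typically translates into ingestion code along the lines of this minimal Python sketch. The endpoint URL and field names are hypothetical, not part of this posting.

    # Minimal sketch: pull records from a RESTful endpoint and flatten
    # the JSON payload. The URL and keys are hypothetical placeholders.
    import requests

    resp = requests.get(
        "https://api.example.com/v1/policies",
        params={"limit": 100},
        timeout=30,
    )
    resp.raise_for_status()  # fail fast on HTTP errors

    records = resp.json().get("results", [])
    rows = [
        {"policy_id": r.get("id"), "premium": r.get("premium")}
        for r in records
    ]
    print(f"Fetched {len(rows)} rows")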

This will be a 6-month minimum contract with competitive pay. We also have a very generous referral program in place, so if you know someone qualified who may be interested, please pass this along. Thank you!