Sr. Data Engineer / Lead Data Engineer
Location: Boston, MA
Job Type:
IQ Workforce is a leading recruiting firm for the global analytics & data science community.
Our client partner, CVS Health, is at the forefront of digital transformation in healthcare. “Pharmacy Personalization” is a major initiative to transform the customer experience within their retail pharmacies. The goal is to help patients maintain adherence with their medications – one of the factors that are most likely to affect their health outcomes.
CVS Health’s Data Engineering team is helping lead this personalization effort and has multiple openings at the Sr. Data Engineer / Lead Data Engineer level.
The Data Engineering team will lead the effort to transform healthcare and improve outcomes for patients by leveraging advanced machine learning capabilities across our 10,000 retail pharmacies. You will be at the forefront of developing an unprecedented level of personalized pharmacy services for millions of customers every day, supporting them on their path to better health by helping them fill and stay on their medications.
Using the latest in big data technology, you will be charged with:
Making analytics faster, more insightful, and more efficient by architecting, building, and maintaining a next-generation Big Data Machine Learning framework
Designing a highly scalable and extensible Big Data platform which enables industrializing collection, storage, modeling, and analysis of massive data sets from heterogeneous channels
Bringing a DevOps mindset to enable big data and batch/real-time analytical solutions that leverage emerging technologies
Rapidly developing prototypes and proofs of concept for the selected solutions, and implementing complex big data projects
Applying an analytic mindset to collecting, parsing, managing, and automating data feedback loops in support of business innovation
Developing and releasing ML pipelines into a production environment using Spark and Databricks (primary languages: Scala/Python and SQL), enabling the integration of ML pipelines, and refining the processes and tools alongside the existing CI/CD framework and processes for the Personalization Engine environment
Qualifications:
Hands-on experience with “big data” platforms including Hadoop (preferably Azure or AWS) and Spark, as well as experience with traditional RDBMS (e.g., Teradata, Oracle)
Proficiency in “big data” technologies including Spark, Airflow, Kafka, HBase, Pig, NoSQL databases, etc.
Proficiency in the following programming languages and tools: PySpark, Python, shell scripting, SQL (preferably Teradata and PL/SQL syntax), and Hive, Pig, Java, or Scala
Ability to design and build a framework to orchestrate data pipelines and Machine Learning models
Proficiency with tools to automate CI/CD pipelines (e.g., Jenkins, Git, Control-M)
Ability to design and implement end-to-end solutions using Machine Learning, Optimization, and other advanced technologies, and to own live deployments
Experience with frameworks for either Machine Learning or NLP (e.g., scikit-learn, spaCy, PyTorch, Spark NLP)
Ability to translate analytical problems into structured programs (in PySpark or Scala)