We are looking for a Senior Data Engineer to join one of our e-commerce clients in their Dumbo office in Brooklyn.
The members of the Data Engineering department build infrastructure for collecting, storing, and analyzing huge sets of data in batch and streaming pipelines. They also use this infrastructure to create and support high-quality datasets. The work of this team powers the rest of the company and enables new product development, machine learning and personalization, marketing campaigns, and financial analysis.
- Build high-performance systems that are maintainable and easy to understand by selecting and integrating the best current technologies.
- Develop and monitor the company's batch and streaming environments, improving and fixing them over time.
- Write ETL code and advise other teams on how to improve theirs.
- Build many APIs and libraries in Java, Scala, or Python.
- Ensure the quality and consistent availability of core business data.
- Work with and improve code you did not originally write.
Experience & Skills Required
- This role requires experience working with or building large-scale data processing platforms and collaborating with the teams that use them. Experience building applications is required; experience with one of the major cloud providers is a bonus but not required. The team writes primarily in Java, Scala, and SQL, and uses tools such as Hadoop, Kafka, Airflow, and Avro/Thrift, along with comparable GCP tools such as Dataproc, Dataflow, and BigQuery (BQ).
- You are generous with your time and experience, and can mentor other engineers.
- You can take on unconstrained problems and know when to seek help.
- You understand the advantages and limitations of distributed systems
- Ability to use or maintain batch data processing environments like Hadoop or Dataproc, and stream processing systems like Kafka Streams, Spark, or Dataflow
- Experience writing and scheduling ETL pipelines
- Experience writing SQL queries for exploration and analysis
- Ability to integrate data from multiple sources