A-Z Data Engineering



Airflow — programmable, DAG-based job scheduler; a very popular Apache project
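To make "DAG-based" concrete, here is a minimal sketch of dependency-ordered scheduling using only Python's standard library. It is not Airflow's API; the task names and the `run_order` helper are hypothetical and only illustrate how a scheduler can derive a valid run order from a dependency graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks: each key maps to the set of tasks it depends on.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

def run_order(graph):
    """Return one execution order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())

order = run_order(dag)
print(order)
```

A real scheduler adds retries, schedules, and parallel execution on top of this ordering step.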


BigQuery — Google’s serverless data warehouse that competes with Amazon Redshift and Azure Synapse Analytics (formerly Azure SQL Data Warehouse)


Cassandra — distributed NoSQL database built on a wide-column data model (not to be confused with columnar storage)


Databricks — a web-based platform for working with Spark and more


ETL — extract from source, transform, load into destination
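The three ETL steps can be sketched end to end with the standard library. Everything here is an assumption made for illustration: the inline CSV stands in for a source system, the `payments` table is a hypothetical destination, and an in-memory SQLite database stands in for a warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; row 2 has a missing amount.
SOURCE_CSV = "id,amount\n1,10.5\n2,\n3,7.25\n"

def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no amount and cast fields to proper types."""
    return [(int(r["id"]), float(r["amount"])) for r in rows if r["amount"]]

def load(records, conn):
    """Load: write the cleaned records into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (id INTEGER PRIMARY KEY, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
row_count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
total = conn.execute("SELECT SUM(amount) FROM payments").fetchone()[0]
print(row_count, total)
```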


Flink — distributed processing engine for data streams


Glue — large-scale, serverless ETL and data pipeline service from AWS


Hadoop — big data processing framework comprising MapReduce, YARN, and HDFS


InfluxDB — very popular time-series database


JSON — the de facto data interchange format over the internet
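A short sketch of what "interchange" means in practice: one side serializes a record to JSON text, the other parses it back. The `event` record and its fields are hypothetical.

```python
import json

# Hypothetical record as it might exist in a producer application.
event = {"user_id": 42, "tags": ["a", "b"], "active": True, "score": None}

# Serialize to text for transport; sort_keys makes the output deterministic.
payload = json.dumps(event, sort_keys=True)

# The consumer parses the text back into native data structures.
restored = json.loads(payload)
print(payload)
```

Note how Python's `True` and `None` become JSON's `true` and `null` on the wire and are mapped back on the other side.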


Kafka — distributed event streaming platform, originally developed at LinkedIn and now an Apache project


Looker — browser-based BI tool, acquired by Google


MongoDB — very popular document-oriented NoSQL database, source-available under the SSPL


NoSQL — family of database technologies that use non-relational data models (key-value, document, wide-column, graph)


Oozie — DAG-based workflow scheduler for Hadoop jobs


PostgreSQL — programmers’ favorite open-source database


Query Engine — the piece of software that executes queries against a dataset


Redshift — AWS’s managed, petabyte-scale data warehousing solution, among the most widely used


SQL — the language that data speaks


Terraform — the de facto infrastructure-as-code tool, by HashiCorp


Unstructured Data — data without a schema or a pre-defined structure


View — virtual (non-materialized) database object defined by a SQL query
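A small sketch of why a view is "virtual": it stores the query, not the rows, so it reflects changes to the underlying table each time it is read. The `orders` table and `open_orders` view are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, "open"), (2, "shipped")]
)

# The view saves only the SELECT statement; no rows are copied.
conn.execute(
    "CREATE VIEW open_orders AS SELECT id FROM orders WHERE status = 'open'"
)

before = conn.execute("SELECT id FROM open_orders ORDER BY id").fetchall()

# A new row in the base table shows up in the view without refreshing anything.
conn.execute("INSERT INTO orders VALUES (3, 'open')")
after = conn.execute("SELECT id FROM open_orders ORDER BY id").fetchall()
print(before, after)
```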


Wrangling — cleaning the data, making it ready for analysis
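As a rough illustration of wrangling, the sketch below trims whitespace, normalizes case, converts types, and drops empty and duplicate records. The sample rows and the `wrangle` helper are hypothetical; real pipelines typically do this with a dataframe library.

```python
# Hypothetical messy input: stray whitespace, bad values, a duplicate, a blank name.
raw = [
    {"name": "  Alice ", "age": "34"},
    {"name": "BOB", "age": "n/a"},
    {"name": "  Alice ", "age": "34"},
    {"name": "", "age": "29"},
]

def wrangle(rows):
    """Return cleaned, de-duplicated records in their original order."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()  # normalize whitespace and case
        if not name:
            continue  # drop records with no usable name
        # Non-numeric ages become None instead of raising an error.
        age = int(row["age"]) if row["age"].isdigit() else None
        key = (name, age)
        if key in seen:
            continue  # drop exact duplicates after normalization
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned

cleaned = wrangle(raw)
print(cleaned)
```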


Xplenty — data integration platform that extracts data from various cloud apps and moves it to a destination


YARN — resource manager for the Hadoop ecosystem (used by MapReduce and Spark)


ZooKeeper — centralized service for configuration management and distributed coordination


Do you know them all? What would you change?

