Data Engineering (A-Z)
Airflow — programmable, DAG-based job scheduler; a very popular Apache project
BigQuery — Google’s serverless data warehouse that competes with Redshift and Azure DW
Cassandra — distributed NoSQL database popular for its wide-column data model
Databricks — a web-based platform for working with Spark and more
ETL — extract from source, transform, load into destination
Flink — distributed processing engine for data streams
Glue — large-scale serverless ETL and data pipeline service from AWS
Hadoop — big data processing framework comprising MapReduce, YARN, and HDFS
InfluxDB — very popular time-series database
JSON — the de facto data interchange format on the internet
Kafka — distributed event streaming platform, originally built at LinkedIn and now an Apache project
Looker — browser-based BI tool, now part of Google Cloud
MongoDB — very popular document-oriented NoSQL database
NoSQL — group of database technologies that go beyond the relational model (often read as “not only SQL”)
Oozie — DAG-based workflow scheduler for Hadoop jobs
PostgreSQL — programmers’ favorite open-source database
Query Engine — the piece of software that executes queries against a dataset
Redshift — AWS’s popular managed, petabyte-scale data warehousing solution
SQL — the language that data speaks
Terraform — the de facto Infrastructure-as-code product by HashiCorp
Unstructured Data — data without a schema or a pre-defined structure
View — non-persisted database object defined by a SQL query
Wrangling — cleaning the data, making it ready for analysis
Xplenty — data integration platform that extracts data from various cloud apps and moves it to a destination
YARN — resource manager for the Hadoop ecosystem (used for MapReduce and Spark)
ZooKeeper — centralized configuration management and coordination service
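
A few of these entries (ETL, JSON, SQL, View, Wrangling) fit together in one tiny pipeline. The following is only a rough sketch, assuming a made-up JSON payload and an in-memory SQLite database standing in for a real source and warehouse; the table name `payments`, the view name `large_payments`, and all function names are hypothetical.

```python
import json
import sqlite3

# Hypothetical raw input: the kind of JSON payload an API might return.
RAW_JSON = """
[
  {"id": 1, "name": " alice ", "amount": "10.5"},
  {"id": 2, "name": "bob", "amount": "4"},
  {"id": 3, "name": null, "amount": "7.25"}
]
"""


def extract(payload):
    """Extract: parse the JSON text into a list of Python dicts."""
    return json.loads(payload)


def transform(records):
    """Transform (wrangle): drop rows with no name, tidy names, cast amounts."""
    cleaned = []
    for rec in records:
        if not rec.get("name"):
            continue  # skip rows with a missing or empty name
        cleaned.append(
            (rec["id"], rec["name"].strip().title(), float(rec["amount"]))
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    # A view stores only the query; rows are computed when it is read.
    conn.execute(
        "CREATE VIEW IF NOT EXISTS large_payments AS "
        "SELECT name, amount FROM payments WHERE amount > 5"
    )
    conn.commit()


def run_pipeline():
    """Run extract, transform, load, then query the view."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_JSON)), conn)
    return conn.execute("SELECT name, amount FROM large_payments").fetchall()


if __name__ == "__main__":
    print(run_pipeline())
```

In a real setup the extract step would read from an API or a file, the load step would target a warehouse such as BigQuery or Redshift, and a scheduler such as Airflow would run the steps in order.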
Do you know them all? What would you change?
