Hr Airflow

The Hr Airflow project is an Airflow application that schedules the data collection pipelines for Netethic. It mainly schedules the collection of social media profiles, including profile information, posts, and the lists of followers and followings for a given profile ID. It also collects data from workspace-based platforms such as Microsoft and Google. The solution also includes a recovery system that manages the collection of new data (delta data) produced since the latest run.

Features

  • Retrieves pending jobs for each social media platform
  • Retrieves pending jobs for domains per task
  • Manages the data collection process
  • Updates the job state and execution date for the recovery system
  • Checks whether any files have not been pushed to MinIO
  • Checks for blocked jobs and resets their state

Technology and Tools:

  • Docker
  • Airflow
  • MongoDB

GitLab Branches

Develop:

  • Used for the development environment
  • The docker compose file to use is dev.yml

Master:

  • Used to deploy the project to the pre-production environment
  • The docker compose file to use is docker-compose.yml

Usage

If you want to create your own branch, you can use develop as the reference branch.

1. Clone the project

$ git clone https://gitlab.kaisens.fr/kaisensdata/apps/4inshield/back/airflow

$ cd airflow

$ git checkout $branch

2. Before starting the project

In the project root folder, run the following commands:

$ mkdir -p airflow/logs

$ chmod -R 777 airflow/logs

3. Add the .env file:

  • Create a copy of .env_sample and name it .env
  • The file must be located in the project root directory
  • Make sure to update the settings in .env for Airflow, the MongoDB connection, and the connection to dauthenticator.
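The copy step above can be sketched as a small shell helper. The function name init_env is hypothetical and not part of the project; it only assumes that .env_sample sits in the directory passed as its argument, and it refuses to overwrite an existing .env so that edited settings are not lost.

```shell
# Hypothetical helper (not part of the project): create .env from .env_sample.
# $1 is the project root directory; defaults to the current directory.
init_env() {
  dir="${1:-.}"
  if [ -f "$dir/.env" ]; then
    # Keep an existing .env so local settings are never overwritten
    echo ".env already exists, leaving it untouched"
  elif [ -f "$dir/.env_sample" ]; then
    cp "$dir/.env_sample" "$dir/.env"
    echo "created .env from .env_sample"
  else
    echo "missing .env_sample in $dir" >&2
    return 1
  fi
}

# Usage, from the project root:
#   init_env .
```

After running it, the values in .env still have to be edited by hand as described above.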

4. Log in to the Kaisens Docker registry:

To build the project, make sure you have access to the Kaisens Docker registry.

$ docker login $registry_server -u $username -p $password
Request the registry address and login credentials if you do not have them.
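Passing the password with -p leaves it visible in the shell history and the process list. As an alternative sketch, docker login can read the password from standard input through its --password-stdin option. The helper name registry_login is hypothetical; it reuses the same $registry_server, $username and $password variables as the command above.

```shell
# Hypothetical helper (not part of the project): log in without putting the
# password on the command line. Expects $registry_server, $username and
# $password to be set, as in the docker login command above.
registry_login() {
  # --password-stdin makes docker read the password from standard input
  printf '%s' "$password" | docker login "$registry_server" -u "$username" --password-stdin
}

# Usage, after setting the three variables:
#   registry_login
```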

5. Launch the app:

In the project root directory, run the following command:

$ docker-compose -f <name_docker_compose_file.yml> up -d --build
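The compose file to pass to -f depends on the branch, as listed in the branches section: dev.yml for develop and docker-compose.yml for master. A minimal sketch of that mapping follows. The function name compose_file_for_branch is hypothetical, and rejecting any other branch is an assumption, since this README does not say which file custom branches should use.

```shell
# Hypothetical helper (not part of the project): print the compose file
# documented for a branch.
compose_file_for_branch() {
  case "$1" in
    develop) echo "dev.yml" ;;
    master)  echo "docker-compose.yml" ;;
    *)
      # Assumption: no compose file is documented for other branches
      echo "no compose file documented for branch '$1'" >&2
      return 1
      ;;
  esac
}

# Usage, from the project root:
#   docker-compose -f "$(compose_file_for_branch develop)" up -d --build
```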

License:

This software is supplied to you by Kaisens Data. Any person who copies or redistributes this software outside Kaisens Data, or attempts to do so, could be sued for intellectual property theft and violation of corporate rules.