Retrieve and add domains
Retrieve Domains DAG
The retrieve_domains_dag is responsible for retrieving the domains associated with a specific school or group of schools from the Webserver and storing them in the Crawlserver MongoDB database.
Parameters
The DAG accepts two parameters:
- school_identifier (int): School ID for domain retrieval
- group_school_identifier (int): Group school ID for domain retrieval

Tasks
The DAG has three PythonOperator tasks:

1. validate_params
- This task ensures that the input parameters are correctly set before domain retrieval proceeds. It checks that either school_identifier or group_school_identifier is provided, but not both.
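The check described above can be sketched as a plain Python function. This is a minimal illustration and not the DAG's actual code: the function signature, the returned label, and the error message are assumptions. Only the two parameter names come from this page.

```python
from typing import Optional


def validate_params(
    school_identifier: Optional[int] = None,
    group_school_identifier: Optional[int] = None,
) -> str:
    """Return which identifier was supplied, or raise if the input is invalid.

    Hypothetical sketch: exactly one of the two identifiers must be set.
    """
    has_school = school_identifier is not None
    has_group = group_school_identifier is not None

    # Equal flags mean either both identifiers or neither was provided.
    if has_school == has_group:
        raise ValueError(
            "Provide exactly one of school_identifier or group_school_identifier."
        )
    return "school" if has_school else "group_school"
```

Raising an exception is one way to make the task fail, so that the downstream tasks do not run with invalid input.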
2. get_auth_token
- Retrieves an authentication token from the Webserver, which is needed to access protected API endpoints, and stores it in XCom for use by the subsequent tasks.
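As a rough sketch of this step, the callable below extracts a token from a login response and returns it. In Airflow, the value returned by a PythonOperator callable is pushed to XCom by default, so returning the token is one way to make it available downstream. The request_token argument, the access_token field name, and the stand-in response are all assumptions made for illustration. The real Webserver endpoint and response format are not described on this page.

```python
from typing import Callable, Mapping


def get_auth_token(request_token: Callable[[], Mapping[str, str]]) -> str:
    """Fetch a token via the supplied callable and return it.

    request_token is a hypothetical stand-in for the HTTP call to the Webserver.
    """
    payload = request_token()
    token = payload.get("access_token")  # assumed field name
    if not token:
        raise ValueError("Webserver response did not contain an access token.")
    # When used as a PythonOperator callable, this return value is stored in XCom.
    return token


def fake_request_token() -> Mapping[str, str]:
    """Stand-in for the real login request, used only in this example."""
    return {"access_token": "example-token"}


token = get_auth_token(fake_request_token)
```

A downstream task could then read the value with ti.xcom_pull(task_ids="get_auth_token").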
3. retrieve_domains
- Fetches domains from the Webserver and stores them in the domain MongoDB collection. It uses the access token obtained from the get_auth_token task to send a request to the get-domains-for-crawlserver API with the appropriate parameter (school or group school identifier). The task also checks whether each retrieved domain already exists in the collection and stores only those that do not.
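The "store only if missing" behaviour can be sketched as follows. This is an assumption-laden illustration, not the task's real implementation: the document field name domain, the helper store_new_domains, and the InMemoryCollection stand-in are hypothetical. The stand-in mimics only the find_one and insert_one methods of a pymongo collection so that the example runs without a database.

```python
from typing import Any, Dict, Iterable, List, Optional


def store_new_domains(collection: Any, domains: Iterable[str]) -> List[str]:
    """Insert each domain that is not yet in the collection; return those inserted."""
    inserted = []
    for name in domains:
        # Skip domains that already have a document in the collection.
        if collection.find_one({"domain": name}) is None:
            collection.insert_one({"domain": name})
            inserted.append(name)
    return inserted


class InMemoryCollection:
    """Tiny stand-in for a MongoDB collection, for demonstration only."""

    def __init__(self, docs: Optional[List[Dict[str, Any]]] = None) -> None:
        self.docs = list(docs or [])

    def find_one(self, query: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        # Return the first document whose fields match every key in the query.
        for doc in self.docs:
            if all(doc.get(key) == value for key, value in query.items()):
                return doc
        return None

    def insert_one(self, doc: Dict[str, Any]) -> None:
        self.docs.append(dict(doc))


collection = InMemoryCollection([{"domain": "example.edu"}])
inserted = store_new_domains(collection, ["example.edu", "school.example.org"])
print(inserted)  # ['school.example.org']
```

With a real pymongo collection, a unique index on the domain field combined with an upsert would be an alternative that avoids the separate lookup.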
Cleaning XCom
- When the DAG run completes successfully, all XCom entries related to that run are deleted.