Flask Meetup Data Scraper Project Documentation

Intro

A full-text search engine for Meetup.com: it downloads and indexes every Meetup group in a region, together with the group's events, every 28 days.

With the full-text Meetup search, you can search every description and every other field, so you can find out at which meetups you have given a talk.

Problem

Problem: At which meetups has my team given a talk?

The official Meetup API does not support full-text search requests, so one way to solve the problem is to download the relevant Meetup groups and index them into Elasticsearch.

Solution

Solution: Download every relevant group from Meetup.com and index it into Elasticsearch!

The data flow works as follows: the API server, written in Python with the Flask web framework, downloads all relevant Meetup groups with their events every 28 days and indexes them into Elasticsearch. Users search through an Angular app that communicates with the API server. For easy deployment, the Angular app has its own NGINX-based Docker container, and the Traefik container routes all HTTP and HTTPS traffic to the Angular container except PUT requests, which are routed to the API server. Traefik also secures the communication with the end user via SSL and handles Basic Auth for the frontend and backend.
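As an illustration only, the routing rule can be modelled as a small Python function. This is not Traefik configuration; the service names "flask" and "angular" are hypothetical labels for the two containers.

```python
def route(method: str) -> str:
    """Return the (hypothetical) service name that receives a request.

    Models the rule described above: only PUT requests reach the Flask
    API server; everything else goes to the NGINX-based Angular container.
    """
    # HTTP methods are case-sensitive on the wire, but normalise for the model.
    if method.upper() == "PUT":
        return "flask"
    return "angular"
```

Keeping the API on PUT lets frontend and backend share one domain without path prefixes, at the cost of every API call having to use PUT.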

DataFlow

Getting started

Note

These instructions assume familiarity with Docker and Docker Compose.

Development & Production Version

The project comes with two Docker Compose files: local.yml for development and production.yml for production.

The development version starts the website in debug mode and binds the local path ./ to the path /app in the Flask Docker container.

For the production version, the code is built into the Docker image instead. The production version also uses Redis as its caching backend.

Quick install (Development Version)

Build the docker container.

$ docker-compose -f local.yml build

Migrate all models into Elasticsearch

$ docker-compose -f local.yml run flask flask migrate_models

Load the Meetup Sandbox Group with all events.

$ docker-compose -f local.yml run flask flask get_group --sandbox True

Start the website.

$ docker-compose -f local.yml up

The server now listens on http://localhost:5000 for REST API requests.

Quick install (Production Version)

Settings

First, create the directory ./.envs/.production:

$ mkdir -p ./.envs/.production

For the Flask container, create a file ./.envs/.production/.flask, which should look like:

To fill in the section # Meetup.com OAuth, you need an API account from Meetup.com. When setting up the Meetup OAuth account, add https://you-domain.com/callback as your callback URL; afterwards, you can log in with your Meetup account at https://you-domain.com/login.

For how to determine the bounding box for the section # Meetup.com groups region, see load_zip_codes.

For the Elasticsearch container, create a file ./.envs/.production/.elasticsearch, which should look like below. For further information on how to configure Elasticsearch with environment variables, see https://www.elastic.co/guide/en/elasticsearch/reference/current/settings.html

Add Users

Frontend and backend share the same user authentication: both use Basic Auth from Traefik. To add a user, use htpasswd and store the user data in compose/production/traefik/basic-auth-usersfile. Example on Linux (Debian/Ubuntu); note that -c creates the file and overwrites an existing one, so leave out -c when adding further users:

$ sudo apt install apache2-utils # install htpasswd
$ htpasswd -c compose/production/traefik/basic-auth-usersfile username
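If apache2-utils is not available (for example on Windows), a line for the users file can be sketched with the Python standard library. This sketch uses the unsalted {SHA} scheme of htpasswd -s and assumes your Traefik version accepts SHA1 hashes (Traefik's Basic Auth documentation lists MD5, SHA1 and bcrypt). SHA1 is much weaker than bcrypt, so prefer htpasswd -B whenever you can; htpasswd_sha1_line is a hypothetical helper, not part of the project.

```python
import base64
import hashlib


def htpasswd_sha1_line(username: str, password: str) -> str:
    """Build one htpasswd line in the unsalted {SHA} format (like htpasswd -s)."""
    # {SHA} entries store base64(SHA1(password)); there is no salt.
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    # The doubled braces produce a literal "{SHA}" in the f-string.
    return f"{username}:{{SHA}}{encoded}"
```

Append the returned line to compose/production/traefik/basic-auth-usersfile, one user per line.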

Setup

Build the docker container.

$ docker-compose -f production.yml build

Create the elasticsearch indexes.

$ docker-compose -f production.yml run flask flask migrate_models

Load the Meetup zip codes for each country (here Germany, Switzerland and Austria):

$ docker-compose -f production.yml run flask flask load_zip_codes 47.2701114 55.099161 5.8663153 15.0418087 # germany
$ docker-compose -f production.yml run flask flask load_zip_codes 45.817995 47.8084648 5.9559113 10.4922941 # switzerland
$ docker-compose -f production.yml run flask flask load_zip_codes 46.3722761 49.0205305 9.5307487 17.160776 # austria

Start the website.

$ docker-compose -f production.yml up -d

Cronjob

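Add a cron job that runs every week; with --reset_periode 4, the Elasticsearch index is then reset every 4 weeks. For a different period, replace 4 with your period in weeks, but never use a period longer than 30 days (that is, more than 4 weeks), because Meetup.com forbids it. The entry below runs every Saturday at 03:00. Because production.yml is given as a relative path, the job has to run from the project directory: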

0   3       *       *       6       docker-compose -f production.yml run flask flask reset_index --reset_periode 4

A description of what this command does is under reset_index.

Usage Guide

Note

The usage guide is written for the development instance. To use the commands in production, replace -f local.yml with -f production.yml in every command.

CLI

Help

To show general help for all commands, run:

$ docker-compose -f local.yml run flask flask

To display the help text for a single command, append --help to it, for example:

$ docker-compose -f local.yml run flask flask shell --help

migrate_models

To use Elasticsearch search, the indexes for the groups and zip codes must be created first. To do this, run:

$ docker-compose -f local.yml run flask flask migrate_models

Load single group from Meetup.com

To download a single group from Meetup.com and index it into Elasticsearch, use get_group with the group's urlname. On a Meetup group page such as https://www.meetup.com/de-DE/Meetup-API-Testing/, the urlname is the part of the URL that follows the language code.

In the example of the Meetup-API-Testing group, the language code is de-DE and the urlname is Meetup-API-Testing.

Meetup group urlname screenshot
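A minimal sketch of this rule in Python, assuming language codes always look like de-DE or en; urlname_from_url is a hypothetical helper, not part of the project:

```python
import re
from urllib.parse import urlparse

# Assumed format of the language-code path segment, e.g. "de-DE" or "en".
LANGUAGE_CODE = re.compile(r"^[a-z]{2}(-[A-Z]{2})?$")


def urlname_from_url(group_url: str) -> str:
    """Return the group urlname from a Meetup group page URL."""
    # Non-empty path segments, e.g. ["de-DE", "Meetup-API-Testing"].
    segments = [s for s in urlparse(group_url).path.split("/") if s]
    # Skip a leading language-code segment, if there is one.
    if segments and LANGUAGE_CODE.match(segments[0]):
        segments = segments[1:]
    if not segments:
        raise ValueError(f"no urlname in {group_url!r}")
    return segments[0]
```

Note that this heuristic would mistake a two-letter lowercase urlname for a language code.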

When you search for groups via the Meetup.com REST API, each group in the response also has its own urlname field, which you can use to request the group endpoint.

So to download & index the Meetup-API-Testing group use:

$ docker-compose -f local.yml run flask flask get_group Meetup-API-Testing

The output should look like:

Starting flask-meetup-data-scraper_elasticsearch_1 ... done
Elasticsearch is available
Group Meetup API Testing Sandbox was updatet with 761 events

For testing purposes, you can load the sandbox group with the parameter --sandbox True; a urlname is then not required:

$ docker-compose -f local.yml run flask flask get_group --sandbox True

get_groups

Warning

The command get_groups is designed for development use only; in production, please use load_groups!

get_groups loads multiple groups from JSON files. Each entry in a JSON file must have a key and a urlname. To download such a file, use the REST API directly or the Meetup website at https://secure.meetup.com/meetup_api/console/?path=/find/groups

An example REST API request for the first 200 German groups: https://api.meetup.com/find/groups?&sign=true&photo-host=public&country=DE&page=200&offset=0&only=urlname

After downloading the JSON files, put them into ./meetup_data_scraper. If you store them in another directory, set the path with --json_path /app/your-dir/. When you run the command in Docker, the path must be the path inside the Docker container.

$ docker-compose -f local.yml run flask flask get_groups

Example JSON file in ./compose/local/flask/meetup_groups/test-groups.json

{
    "0": {
        "urlname": "Meetup-API-Testing"
    },
    "1": {
        "urlname": "None"
    },
    "2": {
        "urlname": "connectedawareness-berlin"
    }
}
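The find/groups response (requested with only=urlname) is assumed here to be a plain JSON list of {"urlname": ...} objects. If your download has that shape, a sketch like the following converts it into the keyed layout of the example file; to_get_groups_json is a hypothetical helper, not part of the project.

```python
import json
from typing import Dict, Iterable, List


def to_get_groups_json(groups: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Key each {"urlname": ...} entry by its position, as in test-groups.json."""
    return {str(index): {"urlname": group["urlname"]} for index, group in enumerate(groups)}


# Assumed shape of a find/groups response requested with only=urlname.
response: List[Dict[str, str]] = [
    {"urlname": "Meetup-API-Testing"},
    {"urlname": "connectedawareness-berlin"},
]
print(json.dumps(to_get_groups_json(response), indent=4))
```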

load_groups

To load all groups for every Meetup.com zip code stored in Elasticsearch, use load_groups:

$ docker-compose -f local.yml run flask flask load_groups

load_groups accepts two parameters: --load_events False skips loading the past events of each group (by default, all past events are loaded), and --country DE restricts loading to groups from one country, given as a country code. For the Meetup country codes, see meetup.com zip code.

$ docker-compose -f local.yml run flask flask load_groups --load_events False --country DE

load_zip_codes

To download all Meetup.com zip codes, use load_zip_codes with the coordinates of a bounding box as arguments. A bounding box is a rectangular geographic area. To find the bounding box of a location such as Germany, use the REST API of https://nominatim.openstreetmap.org/: append your query after /search/, then add ?format=json to get JSON output. The full search request for Germany is https://nominatim.openstreetmap.org/search/germany?format=json

Output:

[
    {
        "place_id": 235495355,
        "licence": "Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright",
        "osm_type": "relation",
        "osm_id": 51477,
        "boundingbox": [
            "47.2701114",
            "55.099161",
            "5.8663153",
            "15.0418087"
        ],
        "lat": "51.0834196",
        "lon": "10.4234469",
        "display_name": "Deutschland",
        "class": "boundary",
        "type": "administrative",
        "importance": 0.8896814891722549,
        "icon": "https://nominatim.openstreetmap.org/images/mapicons/poi_boundary_administrative.p.20.png"
    },
    ...
]

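Now pass the four boundingbox values to load_zip_codes in the same order (minimum latitude, maximum latitude, minimum longitude, maximum longitude):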

$ docker-compose -f local.yml run flask flask load_zip_codes 47.2701114 55.099161 5.8663153 15.0418087 # germany
$ docker-compose -f local.yml run flask flask load_zip_codes 45.817995 47.8084648 5.9559113 10.4922941 # switzerland
$ docker-compose -f local.yml run flask flask load_zip_codes 46.3722761 49.0205305 9.5307487 17.160776 # austria
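Instead of copying the values by hand, the command can be assembled from a Nominatim result. A sketch that works on a trimmed copy of the output above, so no network request is made; load_zip_codes_command is a hypothetical helper:

```python
from typing import Any, Dict


def load_zip_codes_command(place: Dict[str, Any], compose_file: str = "local.yml") -> str:
    """Build the load_zip_codes call from one Nominatim search result."""
    # Nominatim orders the box as [min lat, max lat, min lon, max lon],
    # which is the argument order used in the examples above.
    south, north, west, east = place["boundingbox"]
    return (
        f"docker-compose -f {compose_file} run flask flask load_zip_codes "
        f"{south} {north} {west} {east}"
    )


# Trimmed copy of the first result from the Nominatim output above.
sample = {
    "display_name": "Deutschland",
    "boundingbox": ["47.2701114", "55.099161", "5.8663153", "15.0418087"],
}
print(load_zip_codes_command(sample))
```

Pass "production.yml" as the second argument to get the production variant of the command.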

update_groups

To fetch all new past events of every group stored in Elasticsearch, use:

$ docker-compose -f local.yml run flask flask update_groups

reset_index

Use the reset_index command only when you want to delete your complete Elasticsearch index and reload all groups from Meetup.com. It should run as a cron job at least once every 30 days.

$ docker-compose -f local.yml run flask flask reset_index

Use the parameter --waring_time to set the warning time in seconds; the default is 30 seconds:

$ docker-compose -f local.yml run flask flask reset_index --waring_time 30

You can also set a period in weeks with --reset_periode. If this parameter is used, the command checks whether the current week number is divisible by the given value.

For example, to execute the command only every 4 weeks, use:

$ docker-compose -f local.yml run flask flask reset_index --reset_periode 4

The command calculates how many weeks have passed since 1970-01-01 (the Unix epoch) and takes that number modulo 4 (the value of --reset_periode). For example, on 2020-01-30 the week count since 1970-01-01 is 2613:

2613 % 4 = 1

Since the remainder is not 0, the command exits without resetting the index. The command is only executed when the remainder is 0.
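The check can be reproduced in a few lines of Python. This is a sketch of the rule described above, not the project's reset_index code; it assumes the week count is taken from the UTC Unix timestamp, and weeks_since_epoch and reset_due are hypothetical names.

```python
from datetime import datetime, timezone

SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def weeks_since_epoch(moment: datetime) -> int:
    """Number of full weeks between 1970-01-01 00:00 UTC and `moment`."""
    return int(moment.timestamp()) // SECONDS_PER_WEEK


def reset_due(moment: datetime, reset_periode: int) -> bool:
    """True when the week count is divisible by reset_periode (remainder 0)."""
    return weeks_since_epoch(moment) % reset_periode == 0


day = datetime(2020, 1, 30, tzinfo=timezone.utc)
print(weeks_since_epoch(day), weeks_since_epoch(day) % 4, reset_due(day, 4))
```

For 2020-01-30 this prints 2613 1 False; one week earlier, on 2020-01-23 (week 2612), reset_due returns True.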

Advanced topics

Changing Models

The Elasticsearch models are stored in meetup_search/models and their tests are in tests/models. To edit the models, read the Elasticsearch-DSL docs.

Documentation

The docs are stored in ./docs and written with Sphinx. The recommended way to host Sphinx docs is readthedocs.org.

To compile the docs as HTML use:

$ docker-compose -f local.yml run flask make --directory docs html

The HTML output is written to docs/_build/html.

IDE

This project was created with Visual Studio Code, and this section helps you set up your VS Code installation for it.

Recommended Extensions for VS Code

Install Python

Please use the same Python version as the Flask Dockerfiles. Currently this is Python 3.8.

Windows 10

Note

Replace the command python with py when following these instructions.

To set up Python 3 in Windows 10 PowerShell, follow the article on DigitalOcean.com.

Mac

On macOS you can use Homebrew:

$ brew install python3

Linux

Most Linux systems ship with Python installed and maintained out of the box; you just need to check that it is the same version as in the Dockerfiles.
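To check whether the interpreter you are using matches the Docker image, a short script like the following can help. EXPECTED mirrors the Python 3.8 stated above and is an assumption that must be updated when the Dockerfiles change; matches_docker_python is a hypothetical name.

```python
import sys

# Python version of the Flask Dockerfiles according to this guide.
EXPECTED = (3, 8)


def matches_docker_python(version_info=sys.version_info) -> bool:
    """True when major.minor of `version_info` equals EXPECTED."""
    return tuple(version_info[:2]) == EXPECTED


print(
    f"Python {sys.version_info.major}.{sys.version_info.minor}:",
    "OK" if matches_docker_python() else f"expected {EXPECTED[0]}.{EXPECTED[1]}",
)
```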

Install Python dependencies

Virtualenv

If you like, you can install all dependencies into a project-specific folder with virtualenv.

If Python 3 is your default Python interpreter, run:

$ virtualenv venv

To select a different Python version, use the parameter -p:

$ virtualenv venv -p python3.8

To activate the virtualenv, use the source command:

$ source venv/bin/activate

Install development packages

Windows:

$ python -m pip install -r .\requirements\local.txt

Mac / Linux:

$ pip install -r requirements/local.txt

Code Format

This project uses Black to format its code. To have VS Code format your code with Black on every save, add the following settings to your settings.json:

{
    "editor.formatOnSave": true,
    "python.formatting.provider": "black",
    "editor.codeActionsOnSave": {
        "source.organizeImports": true
    },
}

To install Black, use pip.

For Windows:

$ python -m pip install black

For Mac / Linux:

$ pip install black

To format the code from the terminal, use the Black CLI. For example, to format the whole project, run:

$ black ./

Rest API Documentation

About

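The REST API is based on Flask-RESTful. To add or remove endpoints, modify the method create_app in app.py. Example of how the search and suggestion endpoints are added:

from flask_restful import Api
from meetup_search.rest_api.api import MeetupSearchApi, MeetupSearchSuggestApi

# inside create_app(), after the Flask object `app` has been created:
# init flask api
api: Api = Api(app)
# add api endpoints
api.add_resource(MeetupSearchApi, "/")
api.add_resource(MeetupSearchSuggestApi, "/suggest/")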

The code for the REST API is in meetup_search/rest_api/api.py and the tests are in tests/rest_api/test_api.py.

tests/rest_api/utily.py also contains helper methods for testing the REST API.

Suggestion

PUT /suggest/

Returns up to 5 suggestions based on group names.

For example, when you send PUT /suggest/ with the following data:

{
    "query": "jam"
}

The output looks like this:

{"suggestions": [
    "Jam-Session Berlin",
    "Jam Time Amsterdam",
    "Jammy @ ROMA"
]}
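A sketch of calling this endpoint with the Python standard library. It assumes the endpoint accepts a JSON body with a query field, as in the example above, and that the development server runs on http://localhost:5000; the optional Basic Auth header is meant for the production setup behind Traefik. build_suggest_request is a hypothetical helper, and the request is only built here, not sent.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_suggest_request(
    query: str,
    base_url: str = "http://localhost:5000",
    user: Optional[str] = None,
    password: Optional[str] = None,
) -> urllib.request.Request:
    """Prepare a PUT /suggest/ request with a JSON body (not sent yet)."""
    request = urllib.request.Request(
        f"{base_url}/suggest/",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    # Production is protected by Traefik Basic Auth.
    if user is not None and password is not None:
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


# To send it against a running instance:
# with urllib.request.urlopen(build_suggest_request("jam")) as response:
#     print(json.load(response)["suggestions"])
```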

C/I

About

Flask Meetup Data Scraper uses GitHub Marketplace apps to maintain the project. All of these apps are free for open source projects.

Code Style

Black

Black is not integrated into the C/I; it is only a Python code autoformatter for the project. If you would like to contribute code, please format it with Black first:

$ black ./

Tests

Travis

This project uses Travis to run the unit tests, test the Flask commands and check the Docker Compose builds.

Travis config is .travis.yml

Documentation

Readthedocs.org

The documentation is written with Sphinx in the .rst file format. The source code of the docs is in docs/

The Readthedocs config is .readthedocs.yml

Code Review

Codacy.com

Codacy.com is an automated code analysis and quality tool. For this project, Codacy analyzes only Python; the test coverage is also uploaded to Codacy.com via Travis.

DeepSource.io

DeepSource.io is similar to Codacy.com, but it also analyzes Dockerfiles.

DeepSource config is .deepsource.toml

Dependencies

Pyup.io

Pyup.io updates the Python packages once a week. It pushes every update to a separate branch and creates a pull request.

Pyup config is .pyup.yml

Dependabot.com

Dependabot.com updates the Dockerfiles once a week. It pushes every update to a separate branch and creates a pull request.

Dependabot config is .dependabot/config.yml

Elasticsearch Queries

The main Elasticsearch query is written in meetup_search/rest_api/api.py and its tests are in tests/rest_api/test_api.py. This project uses https://github.com/elastic/elasticsearch-dsl-py to handle Elasticsearch; if you want to modify the query, see https://elasticsearch-dsl.readthedocs.io/en/latest/ for help.

To run the API tests, run:

$ docker-compose -f local.yml run flask coverage run -m pytest -s tests/rest_api/test_api.py

To run a single test use:

$ docker-compose -f local.yml run flask coverage run -m pytest -s tests/rest_api/test_api.py::test_search_query

Frontend

The frontend is an Angular app. Its source code is in a separate Git repository: https://github.com/linuxluigi/meetup-data-scraper-angular

For frontend development, it is best to run the Flask app in the background. In its development settings, the app sends PUT requests to http://localhost:5000; the production site is designed to run on the same domain as the backend.

To serve the frontend and backend on the same domain, Traefik handles every HTTP and HTTPS request. By default, all traffic goes to the Angular app (NGINX server), and only HTTP PUT requests go to the Flask backend app.

Frontend - Landing page

The landing page of the Angular frontend

Testing

Warning

Do not run tests on production systems! The tests will destroy your Elasticsearch index!

Pytest

The unit tests are stored in the tests folder and are written with the pytest framework.

Conftest

When you execute the tests, for example with:

$ docker-compose -f local.yml run flask coverage run -m pytest

Pytest first loads the file conftest.py. The hook pytest_runtest_setup defined there runs before each test, and pytest_runtest_teardown runs after each test.

Every function decorated with @pytest.fixture can be used in any test function as a parameter. For a detailed explanation, see https://docs.pytest.org/en/latest/fixture.html

Create new test

To create a new test, create a Python file with the prefix test_ in the folder /tests. Tests can also be bundled in a Python package: create a folder in /tests and add an empty file __init__.py to it, so that Python recognizes the folder as a package.

In the new test file, whose name also starts with test_ (example: /tests/test_user.py), add functions whose names start with test_. For example, /tests/test_user.py could look like this:

def test_user():
    # get_user is a placeholder for the code under test;
    # import it from the module you are testing
    assert get_user(username="Hugo").username == "Hugo"

Execute tests

To execute all tests run:

$ docker-compose -f local.yml run flask coverage run -m pytest

To run all tests in a specific package, add its path. For example, to run all tests in tests/api_client:

$ docker-compose -f local.yml run flask coverage run -m pytest tests/api_client

To run all tests in a file, add the full path of the file:

$ docker-compose -f local.yml run flask coverage run -m pytest tests/api_client/test_json_parser.py

To run a single test function, append two colons and the function name to the file path:

$ docker-compose -f local.yml run flask coverage run -m pytest tests/api_client/test_json_parser.py::test_get_group_from_response

Note

If you add -s when executing tests, you can see the terminal output of the tests:

$ docker-compose -f local.yml run flask coverage run -m pytest -s

Troubleshooting

This page contains some advice about errors and problems commonly encountered during the development of Meetup Data Scraper.

max virtual memory areas vm.max_map_count [65530] likely too low, increase to at least [262144]

On some machines, Elasticsearch in Docker needs a higher limit for virtual memory areas (vm.max_map_count), which has to be set manually on the host. On CentOS and Ubuntu, use:

$ sudo sysctl -w vm.max_map_count=262144

To make the setting permanent, use:

$ echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
$ sudo reboot

For more details, see https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html
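On a Linux host you can check the current value before starting Elasticsearch. A small sketch; the threshold is the one from the error message above, and max_map_count_ok is a hypothetical helper:

```python
from pathlib import Path

REQUIRED = 262144  # minimum value named in the Elasticsearch error message
PROC_FILE = Path("/proc/sys/vm/max_map_count")  # only present on Linux


def max_map_count_ok(current: int, required: int = REQUIRED) -> bool:
    """True when the current limit meets the Elasticsearch minimum."""
    return current >= required


if PROC_FILE.exists():
    current = int(PROC_FILE.read_text().strip())
    print(
        f"vm.max_map_count = {current}:",
        "OK" if max_map_count_ok(current) else f"raise it to at least {REQUIRED}",
    )
else:
    print("No /proc/sys/vm/max_map_count found (not a Linux host?)")
```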

Build failed: out of memory

Building needs quite a lot of RAM. If containers such as Elasticsearch are running in the background, you can run out of memory, so stop all containers first:

$ docker-compose -f production.yml stop

Error when starting container under Windows

ERROR: for flask-meetup-data-scraper_flask_1 Cannot start service flask: error while creating mount source path /host_mnt/c/Users/.../dev/flask-meetup-data-scraper: mkdir /host_mnt/c: file exists

If this error occurs, deleting all unused containers is usually enough. For example, stopped containers can be removed with:

$ docker container prune

Delete unused Docker containers

FAQ

What are the minimum hardware requirements?

To host it with Docker, you need at least a vServer with 2 GB RAM, 10 GB disk space and 1 CPU.

How to set the domain for a production site?

In compose/production/traefik/traefik.yml, replace the entry meetup-data-scraper.de with your target domain.
