# medved

alpha/beta quality, whatever. partly works

## roadmap

- refactor README and clean up `lib/plugin/plugins/*`

- Listener should probably be part of Core, in order to supervise everything

- tasks don't store results currently; some validation of performed tasks needs to be implemented (at least in the Executor thread, before spreading new tasks), see http://python-rq.org/docs/results/ and the sketch below
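
A minimal sketch of what such validation could look like on top of python-rq job results (the task path is taken from the pipeline config below; `validate` and the success criterion are assumptions, not project code):

```python
from redis import Redis
from rq import Queue

def validate(result):
    # hypothetical criterion: a result counts as valid if the task reported success
    return result is not None and result.get("status") == "success"

q = Queue("worker", connection=Redis(host="127.0.0.1"))
job = q.enqueue("lib.plugin.iscan.tasks.common.scan", {"ip": "192.0.2.1"})

# later, e.g. in the Executor thread, before spreading new tasks:
if job.is_finished and validate(job.result):
    print("task done, safe to spread the next pipeline step")
```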

## configuration

The configuration lives in `data/config.yaml`:

```yaml
# lists top-level services (like listeners, executors, data-manager) and pipelines enabled
core:
  services: # should point to correct service
  - data_manager
  - rq_executor
  pipelines:
  - ftp

# describes top-level services and their configurations
services:
  data_manager:
    # REQUIRED: package name, i.e. a dotted path to the module file
    package: lib.data.Manager
    # REQUIRED class inherited from Service (lib.Service)
    service: DataManager
    # REQUIRED: selects one of the storages
    data:
      id: pool
    # there can be more service-specific configuration fields
    # for now they can be found in code :)
    sources:
    - random_ip
    feeds: 
    - test_telegram
  rq_executor:
    package: lib.exeq.Executor
    service: RQExecutor
    data:
      id: pool
    redis:
      host: "127.0.0.1"

# describes datasources for data_manager
sources:
  random_ip:
    package: lib.plugin.base.lib.IP
    service: RandomIP
    data:
      id: random_ip

# describes datafeeds for data_manager
feeds:
  test_telegram: # doesn't work yet, eh
    package: lib.plugin.base.lib.Telegram
    service: TelegramFeed
    data:
      id: pool
    token: 
    chats: 
      - id: good_evening
        pipelines: [ftp, gopher]
        filter:
          clause: any-of
          equal: 
          - ftp_list_files_status: success
          - gopher_collect_status: success

# describes various storages, e.g. data pool for pipelines or queues for datasources 
storage:
  pool:
    # REQUIRED
    package: lib.plugin.base.lib.Mongo
    service: MongoStorage
    size: 40960
    # service-specific
    db: "medved"
    coll: 'pool'
  random_ip:
    package: lib.plugin.base.lib.Mongo
    service: MongoStorage
    size: 500
    db: "medved"
    coll: 'randomipsource'

# describes available pipelines
pipelines:
  ftp:
    # list of steps with dependencies
    steps: 
    # will pass 10 items to lib.plugin.iscan.tasks.common.scan
    - name: scan
      package: lib.plugin.iscan.tasks.common
      service: scan
      multiple: 10 # default: False
      requires: []
    # will pass 1 item marked with ftp_scan to lib.plugin.iscan.tasks.ftp.connect
    - name: connect
      package: lib.plugin.iscan.tasks.ftp
      service: connect
      requires:
      - ftp_scan
    - name: list_files
      package: lib.plugin.iscan.tasks.ftp
      service: list_files
      requires:
      - ftp_connect

# various configurations for tasks
tasks:
  ftp_scan:
    ports:
    - 21
  ftp_connect:
    logins: data/ftp/logins.txt
    passwords: data/ftp/passwords.txt
    bruteforce: true
    timeout: 15
  ftp_list_files:

logging: 
  Storage: INFO
```
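
The `package`/`service` pairs above suggest that plugins are resolved by their dotted module path. A minimal sketch of what such a loader could look like (assumed mechanics; the real code lives under `lib`, and the constructor signature is a guess):

```python
import importlib

def load_service(conf):
    # conf is one entry of the `services` section, e.g. services.data_manager
    module = importlib.import_module(conf["package"])  # e.g. lib.data.Manager
    cls = getattr(module, conf["service"])             # e.g. DataManager
    return cls(conf)  # assumed: Service subclasses accept their config section

# e.g. load_service({"package": "lib.data.Manager", "service": "DataManager", "data": {"id": "pool"}})
```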

It can probably be launched with Docker, though I haven't tested it yet:

```sh
make base && docker-compose up --build --scale worker=5
```

or simply:

```sh
python medved.py
```

You'll need working Redis and MongoDB instances for the default configuration.

## top-level services

### `lib.data.Manager.DataManager`

Orchestrates datasources and datafeeds: starts and stops them, and checks the pool size. If it is too low, it pulls data from the datasources.
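
Not the actual implementation, just a sketch of that refill logic (the threshold and the `count`/`get`/`put` methods on the storage and source objects are assumed names):

```python
POOL_MIN_SIZE = 100  # assumed threshold; in reality it would come from the storage config

def refill(pool, sources):
    # if the pool runs low, pull fresh items from every datasource
    if pool.count() < POOL_MIN_SIZE:
        for source in sources:
            pool.put(source.get())
```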

### `lib.exeq.Executor.RQExecutor`

Supposed to run the pipelines described in the configuration. Works via RQ (Redis Queue), so it needs a Redis instance up and running. Basically it takes data from the pool and submits it to workers. RQ workers must be launched separately (`rqworker worker` from the code root).
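
The submission side presumably boils down to something like this (a sketch assuming python-rq; the task path comes from the `pipelines` section and the Redis host from `services.rq_executor.redis`; the batch contents are made up):

```python
from redis import Redis
from rq import Queue

q = Queue("worker", connection=Redis(host="127.0.0.1"))

batch = [{"ip": "192.0.2.1"}, {"ip": "192.0.2.2"}]  # stand-in for items taken from the pool
for item in batch:
    q.enqueue("lib.plugin.iscan.tasks.common.scan", item)
```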