Adler Neves 13a2a8fce8 two sonar analysis 2018-05-03 19:41:34 -03:00


Corpus Slayer

This is a modular and multilingual corpus processing tool built on top of Django and Python 3.

This tool doesn't aim to serve every purpose out of the box. Instead, it aims to be extensible enough to accept a plug-in that satisfies your cravings.

You could say that this is a collection of ad-hoc command-line tools glued together with Python and JSON and arranged in an event-based architecture that produces web pages as its output.
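
The pattern described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Corpus Slayer's plug-in API. The on/emit registry, the corpus_uploaded event and the count_tokens handler are invented names. A Python one-liner stands in for an external command-line tool such as a tagger.

```python
import json
import subprocess
import sys
from typing import Callable, Dict, List

# Hypothetical event registry: event name -> handlers that receive and return JSON-like dicts.
HANDLERS: Dict[str, List[Callable[[dict], dict]]] = {}


def on(event: str):
    """Decorator that registers a handler for `event` (illustrative only)."""
    def register(func):
        HANDLERS.setdefault(event, []).append(func)
        return func
    return register


def emit(event: str, payload: dict) -> List[dict]:
    """Call every handler registered for `event` and collect their results."""
    return [handler(payload) for handler in HANDLERS.get(event, [])]


def run_tool(argv: List[str], text: str) -> str:
    """Feed `text` to a command-line tool on stdin and return its stdout."""
    result = subprocess.run(argv, input=text, capture_output=True, text=True, check=True)
    return result.stdout


@on("corpus_uploaded")
def count_tokens(payload: dict) -> dict:
    # Stand-in for an external tool: a Python one-liner that counts whitespace-separated tokens.
    tool = [sys.executable, "-c", "import sys; print(len(sys.stdin.read().split()))"]
    output = run_tool(tool, payload["text"])
    return {"plugin": "count_tokens", "tokens": int(output.strip())}


if __name__ == "__main__":
    print(json.dumps(emit("corpus_uploaded", {"text": "O rato roeu a roupa do rei"})))
```

Each handler returns a JSON-serialisable dict. Results from several plug-ins can therefore be combined and handed to whatever renders the page.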

One usage scenario would be a university offering the platform to its researchers. They could use it to investigate construction patterns across many authors in the literature, to build voice-command devices that recognize the user's intent better than past iterations, or to build speech-to-text converters for smartphones that add punctuation automatically. These are just a few of the many possibilities that a better understanding of the language we use can bring.

How to run

First, run sudo make apt-deps to install dependencies from your distribution's repository.

Then run sudo make depends to install the required Python modules from PyPI.

Then run make all to run the database migrations and download extra data for the plug-ins.

Finally, run make serve. The application should now be reachable on port 14548.
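
If you prefer to script the four steps, a minimal Python sketch could run them in order and stop at the first failure. The function name run_setup and its dry_run option are illustrative, not part of the project. Because make serve keeps running in the foreground, it has to be the last step.

```python
import subprocess
from typing import List

# The make targets from the steps above, in order. The first two need root.
SETUP_STEPS: List[List[str]] = [
    ["sudo", "make", "apt-deps"],  # system packages from the distribution's repository
    ["sudo", "make", "depends"],   # Python modules from PyPI
    ["make", "all"],               # database migrations and extra plug-in data
    ["make", "serve"],             # web server on port 14548; runs until stopped
]


def run_setup(steps: List[List[str]], dry_run: bool = False) -> List[str]:
    """Run each command in order, raising CalledProcessError at the first failure.

    With dry_run=True the commands are only listed, not executed.
    """
    executed = []
    for argv in steps:
        executed.append(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return executed


if __name__ == "__main__":
    for command in run_setup(SETUP_STEPS, dry_run=True):
        print(command)
```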

License

The MIT license was chosen to give people as much freedom as possible to do whatever they want with the code.

Note that the MIT license applies only to the base platform and to the plug-ins as shipped. Running make all, make build or make deploy-cd is expected to download MXPOST (proprietary license), TreeTagger (proprietary license), Unitex/GramLab (GNU LGPL v2.1) and Mac-Morpho (CC-BY-4.0). Some of these licenses are incompatible with the MIT license and may restrict how you use or redistribute the platform. You are welcome to contribute a plug-in that replaces the proprietary parts.

Translations

The software comes preloaded with two translations: Brazilian Portuguese and American English.

Adding a language

Run python3 manage.py makemessages -l LL_CC, where LL_CC is your locale name as described in Django's documentation (for example, pt_BR).
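
A small helper can turn a language tag like pt-br into the LL_CC form and build the matching makemessages command. It can build the command for one locale or, with -a, for all of them. This is a simplified sketch under stated assumptions. The helper names are hypothetical, and only "ll" and "ll-cc" tags are handled. Django's own django.utils.translation.to_locale() covers more cases.

```python
from typing import List


def to_locale_name(language: str) -> str:
    """Turn a tag such as 'pt-br' or 'EN_us' into the LL_CC form ('pt_BR', 'en_US').

    Simplified: handles only 'll' and 'll-cc' shapes.
    """
    lang, _, country = language.replace("_", "-").partition("-")
    if not country:
        return lang.lower()
    return lang.lower() + "_" + country.upper()


def makemessages_argv(language: str = "") -> List[str]:
    """Build the makemessages command for one locale, or for all locales (-a) if none is given."""
    base = ["python3", "manage.py", "makemessages"]
    if language:
        return base + ["-l", to_locale_name(language)]
    return base + ["-a"]


if __name__ == "__main__":
    print(" ".join(makemessages_argv("es-ar")))
```

Inside a configured Django project, django.core.management.call_command("makemessages", locale=["pt_BR"]) or call_command("makemessages", all=True) should do the same without spawning a subprocess.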

Editing language strings

  • Visit the /rosetta endpoint in your browser
  • Click a language
  • Start translating

PS: This is how you edit the content of the pages “Help”, “Privacy” and “Terms”.

Syncing with the whole project

After you edit a template, you need to re-sync the language strings extracted from the templates:

  • Run python3 manage.py makemessages -a
  • Visit the /rosetta endpoint in your browser
  • Translate new strings
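
After makemessages -a, you may want to check from the command line which strings still need translating. A simplified .po reader can list the entries whose translation is still empty. This is a sketch, not a full gettext parser (polib is a complete alternative). Comment lines, including obsolete "#~" entries, are skipped. Fuzzy flags are ignored, so a fuzzy entry with a non-empty msgstr counts as translated. The locale/*/LC_MESSAGES/django.po layout used at the end is Django's default and is assumed here.

```python
import re
from pathlib import Path
from typing import Dict, List

# A keyword line, e.g.  msgid "Help"  or  msgstr[0] ""
KEYWORD = re.compile(r'^(msgctxt|msgid|msgid_plural|msgstr(?:\[\d+\])?)\s+"(.*)"$')
# A continuation line of a multi-line string, e.g.  "more text"
CONTINUATION = re.compile(r'^"(.*)"$')


def parse_po(text: str) -> List[Dict[str, str]]:
    """Split .po text into entries (keyword -> joined string). Comments are skipped."""
    entries: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    field = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:  # a blank line ends the current entry
            if current:
                entries.append(current)
            current, field = {}, None
            continue
        if line.startswith("#"):  # references, flags, obsolete entries
            continue
        keyword = KEYWORD.match(line)
        if keyword:
            field, value = keyword.groups()
            current[field] = value
            continue
        continuation = CONTINUATION.match(line)
        if continuation and field:
            current[field] += continuation.group(1)
    if current:
        entries.append(current)
    return entries


def untranslated(text: str) -> List[str]:
    """Return the msgids whose msgstr (or every msgstr[n]) is empty; the header is skipped."""
    missing = []
    for entry in parse_po(text):
        msgid = entry.get("msgid", "")
        translations = [v for k, v in entry.items() if k.startswith("msgstr")]
        if msgid and not any(translations):
            missing.append(msgid)
    return missing


if __name__ == "__main__":
    for po_file in Path("locale").glob("*/LC_MESSAGES/django.po"):
        print(po_file, len(untranslated(po_file.read_text(encoding="utf-8"))), "untranslated")
```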

Deploying to a server

The recommended configuration is NGINX acting as a reverse proxy for a uWSGI server running Python 3, with uWSGI kept alive by systemd.

systemd

Just copy the file server_deploy_config/corpusslayer.service into /etc/systemd/system and adapt it to suit your needs.

Points worth your attention:

  • platform absolute path (default: /var/www/corpusslayer)

uWSGI

Just run make serve and the web server will be available on port 14548. See the topic immediately above for how to start this command automatically at server startup.
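
To confirm that the server is up, a small check can try to open a TCP connection to port 14548. The function name is hypothetical. The default host 127.0.0.1 assumes you run the check on the server itself.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 14548, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable or timed out
        return False


if __name__ == "__main__":
    print("Corpus Slayer is up" if is_listening() else "nothing is listening on port 14548")
```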

NGINX

Just copy the file server_deploy_config/corpusslayer-com-http.conf into /etc/nginx/sites-available and adapt it to suit your needs.

Points worth your attention:

  • ACME snippet for acquiring X.509 certificates from Certbot (default: /etc/nginx/snippets/acme.conf)
  • TLS and GZIP snippet (default: /etc/nginx/snippets/tlsgzip.conf)
  • Proxied server location (default: the.corpusslayer.com:14548)
  • Static files location (default: /var/www/corpusslayer/static)
  • Media files location (default: /var/www/corpusslayer/media)
  • Server name (default: the.corpusslayer.com)

Apache

Apache Web Server (2.4.18) with mod-wsgi-py3 (4.3.0) is known to handle only ASCII in its default configuration, so uploads with non-ASCII file names fail. Django's documentation describes a fix ("Fixing UnicodeEncodeError for file uploads"), but we chose uWSGI because it works out of the box without any additional configuration.
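
The failure is an encoding mismatch. When the server process runs with an ASCII locale, Python cannot encode a non-ASCII upload name such as coração.txt into a file path. The sketch below reproduces this by encoding the name explicitly, and the helper name is hypothetical. Since Python 3.7, PEP 538 may coerce the C locale to UTF-8, so the reported filesystem encoding depends on your environment.

```python
import sys


def fits_encoding(name: str, encoding: str) -> bool:
    """Return True if `name` can be encoded with `encoding` without errors."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    upload_name = "coração.txt"
    print("filesystem encoding:", sys.getfilesystemencoding())
    print("ascii:", fits_encoding(upload_name, "ascii"))  # False: 'ç' and 'ã' are not ASCII
    print("utf-8:", fits_encoding(upload_name, "utf-8"))  # True
```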

The server configuration is up to you.

Python 2.x

TL;DR: Won't run.

Python 2 was not targeted during development, because the official Python wiki describes the Python 3 series as the present and future of the language.
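
If you want Python 2 to fail with a clear message instead of an obscure error, a guard like the one below could sit at the very top of an entry point such as manage.py. This is a hypothetical addition, not something the project is stated to ship. It avoids annotations and f-strings so the file still parses under Python 2. A guard placed in a file containing Python 3-only syntax would never run, because Python 2 rejects the whole file with a SyntaxError first.

```python
import sys


def python_version_problem(version_info=sys.version_info):
    """Return an error message for interpreters older than Python 3, else None."""
    # Plain %-formatting and no annotations: this must also parse under Python 2.
    if version_info[0] < 3:
        return "Corpus Slayer requires Python 3; found Python %d.%d" % (
            version_info[0], version_info[1])
    return None


problem = python_version_problem()
if problem:
    sys.exit(problem)
```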

Some actions

Changing site name

Edit the file secrets/SITE.txt and restart the WSGI server.
Default: Corpus Slayer

Changing site domain

Edit the file secrets/SITE.tld and restart the WSGI server.
Default: the.corpusslayer.com
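
Both settings are plain files, so they can also be changed from a script, for example during provisioning. The following is a minimal sketch under stated assumptions. The helper name is hypothetical, paths are assumed to be relative to the project root, and UTF-8 is assumed. No trailing newline is written, because the README does not say whether one is tolerated. The WSGI server still has to be restarted afterwards.

```python
from pathlib import Path

SECRETS_DIR = Path("secrets")  # assumed to be relative to the project root


def set_site_setting(filename: str, value: str, secrets_dir: Path = SECRETS_DIR) -> Path:
    """Overwrite secrets/<filename> with `value` and return the file's path.

    The new value only takes effect after the WSGI server is restarted.
    """
    target = secrets_dir / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(value, encoding="utf-8")
    return target
```

For example, set_site_setting("SITE.txt", "Corpus Slayer") restores the default name, and set_site_setting("SITE.tld", "the.corpusslayer.com") restores the default domain.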

Logging everyone out

Delete the file secrets/SECRET_KEY.bin and restart the WSGI server.
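
Django's documentation lists sessions among the things that depend on SECRET_KEY. The hash that ties a logged-in session to its user is an HMAC keyed with it, so a new key makes existing sessions fail verification. The sketch assumes the settings regenerate the key when SECRET_KEY.bin is missing, which the instruction above implies. The function names, the 64-byte key size and the hex encoding are illustrative assumptions, not the project's actual settings code.

```python
import secrets
from pathlib import Path


def load_or_create_secret_key(path: Path, nbytes: int = 64) -> str:
    """Return the key stored at `path`, first creating a random one if the file is missing."""
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(secrets.token_bytes(nbytes))
    # Hex-encode the raw bytes so they can serve as Django's string SECRET_KEY.
    return path.read_bytes().hex()


def log_everyone_out(path: Path) -> None:
    """Delete the key file; the next start generates a new key, invalidating every session."""
    try:
        path.unlink()
    except FileNotFoundError:
        pass  # already deleted: nothing to do
```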

Deleting all users and all associated data

Delete the file db.sqlite3, run make init and then restart the WSGI server.
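
The first two steps can be scripted. This minimal sketch assumes it runs against the project root, where db.sqlite3 and the Makefile live. The function name and dry_run flag are hypothetical. Deleting the database is irreversible, so dry_run only reports what would happen. Restarting the WSGI server is left to you, because it depends on how the server is run.

```python
import subprocess
from pathlib import Path
from typing import List


def reset_database(project_root: Path, dry_run: bool = False) -> List[str]:
    """Delete db.sqlite3 (if present) and rebuild it with `make init`.

    Irreversible: all users and associated data are lost. Returns the actions taken
    (or, with dry_run=True, the actions that would be taken).
    """
    actions = []
    database = project_root / "db.sqlite3"
    if database.exists():
        actions.append(f"delete {database}")
        if not dry_run:
            database.unlink()
    actions.append("make init")
    if not dry_run:
        subprocess.run(["make", "init"], cwd=project_root, check=True)
    return actions
```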