Corpus Slayer

This is a modular and multilingual corpus processing tool built on top of Django and Python 3.

This tool doesn’t aim to be good for all purposes right out of the box, but to be extensible enough to receive a plug-in that satisfies your cravings.

You could say that this is a collection of ad-hoc command-line tools glued together with Python and JSON, arranged in an event-based architecture that produces web pages as its output.

A usage scenario would be a university offering the platform to its researchers, who could use it to investigate literature for construction patterns across many authors, to build voice command devices that recognize user intent better than past iterations, or to build speech-to-text converters for smartphones that add punctuation automatically, among the many other possibilities that a better understanding of the language we use can bring.

How to run

First run sudo make apt-deps to install dependencies from your distribution’s repository into your system.

Then run sudo make depends to install the required Python modules from PyPI into your system.

Then run make all to make migrations to the database and download extra data for the plug-ins.

Finally run make serve. You should now be able to access the application on port 14548.
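
To confirm that the server started by make serve is actually listening, a plain TCP connection test is enough. The sketch below is only an illustration: the helper name port_is_open is hypothetical, and the host 127.0.0.1 assumes you are checking from the same machine that runs the server.

```python
import socket


def port_is_open(host: str = "127.0.0.1", port: int = 14548, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    The default host is an assumption (same machine as the server);
    14548 is the port named in this README.
    """
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_is_open() after make serve should return True. If it returns False, check the output of make serve for startup errors.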

License

To give people as much freedom as possible to do whatever they want with the code, the MIT license was chosen.

Notice that the MIT license only applies to the base platform and to the plug-ins as shipped. When running make all, make build or make deploy-cd, it’s expected that you will download MXPOST (proprietary license), TreeTagger (proprietary license), Unitex/GramLab (GNU LGPL v2.1) and Mac-Morpho (CC-BY-4.0). Some of those are incompatible with the MIT license and may impose restrictions on how you use or redistribute the platform. You are welcome to contribute by writing a plug-in to replace the proprietary parts.

Translations

The software comes preloaded with two translations: Brazilian Portuguese and American English.

Adding a language

Run python3 manage.py makemessages -l LL_CC, where LL_CC is your locale name according to Django’s documentation.
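
Django locale names use a lowercase language part and an uppercase country part joined by an underscore (for example pt_BR), while language codes are often written with a dash (pt-br). The sketch below shows that conversion and builds the command from the step above. Both helper names are hypothetical, and the conversion is deliberately simplified: it only handles plain ll and ll-CC codes, not script subtags.

```python
def to_locale_name(language_code: str) -> str:
    """Convert a language code such as 'pt-br' into a locale name such as 'pt_BR'.

    Simplified illustration: only 'll' and 'll-CC' style codes are handled.
    """
    lang, _, country = language_code.replace("_", "-").partition("-")
    lang = lang.lower()
    if not country:
        return lang  # language-only locale, e.g. 'es'
    return f"{lang}_{country.upper()}"


def makemessages_command(language_code: str) -> list:
    """Build the argument list for the makemessages step described above."""
    return ["python3", "manage.py", "makemessages", "-l", to_locale_name(language_code)]
```

For instance, makemessages_command("pt-br") yields the arguments for python3 manage.py makemessages -l pt_BR, which could be passed to subprocess.run from the project root.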

Editing language strings

  • Visit the /rosetta endpoint in your browser
  • Click a language
  • Start translating

PS: This is also how you edit the content of the “Help”, “Privacy” and “Terms” pages.

Syncing with the whole project

After you edit a template, you need to re-sync the language strings from the templates:

  • Run python3 manage.py makemessages -a
  • Visit the /rosetta endpoint in your browser
  • Translate new strings

Server

The recommended configuration is NGINX reverse-proxying a uWSGI server powered by Python 3, with the uWSGI server kept alive by systemd.

systemd

Copy the file server_deploy_config/corpusslayer.service into /etc/systemd/system and adapt it to suit your needs.

Points worth your attention:

  • platform absolute path (default: /var/www/corpusslayer)

uWSGI

Run make serve and the web server will be available on port 14548. The systemd section above explains how to run this command automatically at server startup.

NGINX

Copy the file server_deploy_config/corpusslayer-com-http.conf into /etc/nginx/sites-available and adapt it to suit your needs.

Points worth your attention:

  • ACME snippet for successfully acquiring X.509 certificates through Certbot (default: /etc/nginx/snippets/acme.conf)
  • TLS and GZIP snippet (default: /etc/nginx/snippets/tlsgzip.conf)
  • Proxied server location (default: the.corpusslayer.com:14548)
  • Static files location (default: /var/www/corpusslayer/static)
  • Media files location (default: /var/www/corpusslayer/media)
  • Server name (default: the.corpusslayer.com)

Apache

It’s known that Apache Web Server (2.4.18) with mod-wsgi-py3 (4.3.0) in its default configuration only handles ASCII. There’s a fix in Django’s documentation (“Fixing UnicodeEncodeError for file uploads”), but we chose uWSGI because it works out of the box without any additional configuration.

The server configuration is up to you.

Python 2.x

TL;DR: Won’t run.

Python 2 wasn’t targeted during development, because the Python 3 series is described by the official Python wiki as the present and future of the language.

Some actions

Changing site name

Edit the file secrets/SITE.txt and restart the WSGI server.
Default: Corpus Slayer

Changing site domain

Edit the file secrets/SITE.tld and restart the WSGI server.
Default: the.corpusslayer.com
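
The restart is needed presumably because these files are read once, when the application starts. How the platform actually loads them is not shown in this README, so the sketch below is only an illustration of the pattern, with a hypothetical helper name: read a one-line text file and fall back to the documented default when the file is missing or empty.

```python
from pathlib import Path


def read_text_setting(path: str, default: str) -> str:
    """Return the stripped content of a small text file, or the default.

    Hypothetical helper: the platform's real loading code may differ.
    """
    try:
        value = Path(path).read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return default
    # An empty file is treated the same as a missing one
    return value or default
```

With this pattern, read_text_setting("secrets/SITE.txt", "Corpus Slayer") would give the site name and read_text_setting("secrets/SITE.tld", "the.corpusslayer.com") the site domain.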

Logging everyone out

Delete the file secrets/SECRET_KEY.bin and restart the WSGI server.
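
This works because Django ties session validity to its secret key, so a new key invalidates every existing session. The platform presumably generates a fresh key when the file is missing. That behaviour is an assumption, not something this README confirms, and the helper below is a hypothetical sketch of such a load-or-create pattern rather than the project’s actual code.

```python
import secrets
from pathlib import Path


def load_or_create_secret_key(path: str, size: int = 64) -> bytes:
    """Return the key stored at path, creating a random one if the file is absent.

    Hypothetical helper: the key size and file handling are assumptions.
    """
    key_file = Path(path)
    if key_file.exists():
        return key_file.read_bytes()
    # No key yet (or it was deleted): generate a new random key and persist it
    key = secrets.token_bytes(size)
    key_file.parent.mkdir(parents=True, exist_ok=True)
    key_file.write_bytes(key)
    return key
```

Under this pattern the key stays stable across restarts until the file is deleted, and the next start produces a different key, which is what logs everyone out.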

Deleting all users and all associated data

Delete the file db.sqlite3, run make init and then restart the WSGI server.