Database Installation & Description
PostgreSQL 9.5 is used as the RDBMS of the system. Below you will find the description of how to create the Database for the application along with the description of the most important relations of the database.
Deployment
Install PostgreSQL
Follow any guide, for example the official one for Ubuntu, or follow these steps:
- Create the file
/etc/apt/sources.list.d/pgdg.list
, and add a line for the repositorydeb http://apt.postgresql.org/pub/repos/apt/ trusty-pgdg main
- Import the repository running
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | \ sudo apt-key add -
sudo apt-get update
- Install postgresql
sudo apt-get install postgresql-9.4
Now, make the database accesible:
- Change to postgres user
sudo su - postgres
- Change these config files using
vi
or any other editorvi /etc/postgresql/9.5/main/postgresql.conf
Change this linelisten_addresses = '*'
vi /etc/postgresql/9.5/main/pg_hba.conf
Change this linehost all all 0.0.0.0/0 md5
- Reload configuration and restart
psql
SELECT pg_reload_conf();
\q
service postgresql restart
Create a user and the database:
psql
create user <user> with password '<password>';
create database <database> with owner <user> encoding 'UTF8';
Database Schema Definition
Within the GitHub repository, the necessary scriptsthat defines the different relations and views of the applicaiton can be found on DbSchema\DB_Schema_and_Tables.sql
.
Description
Below you will find the description of the main relations in the database
Schema Training
This schema is used to store the information of the training and testing sets used for calculating for checking the accuracy of the application:
training.articles_authors
: Contains the information of the signatures used for the training and testing sets. It links to complete information of the articles and signatures located in thesource
tables.training.v_authors_distance
: View that agregates the information needed for the training/testing sets of the application.training.v_training_focus_names
: View that contains the focus names used for the training set.training.v_testing_focus_names
: View that contains the focus names used for the testing set.training.v_authors_distance_training
: View that contains the aggregated information for the training set, using the viewstraining.v_authors_distance
andtraining.v_training_focus_names
.training.v_authors_distance_testing
: View that contains the aggregated information for the testing set, using the viewstraining.v_authors_distance
andtraining.v_testing_focus_names
.
Schema Source
This schema contains the information of the source data of the articles, preprocessed and organized according the schema of the application:
source.articles
: Table that contains the information of the articles present in the system. This is an important table as it contains theid
of the articles (referenced by most of the other relations) and theprocessid
for processing information from the web application.source.signatures
: Table that contains the information of the signature of an author present in an specific article. A signature is identified by the combination of theid
of the article and the position of the signature in the articled
.source.keywords
: Table that contains the different keywords for every article.source.subjects
: Table that contains the different subjects which each article belongs to.source.references
: Table that contains the different journals referenced by each article.
Schema Distances
This schema contains the tables to temporaly store the calculated distances of the different features of the application:
distances.title
: Table that contains the distances between the titles of a pair of articles.distances.subject
: Table that contains the distances between the subjects of a pair of articles.distances.refs
: Table that contains the distances between the subjects referenced by a pair of articles.distances.keywords
: Table that contains the distances between the keywords of a pair of articles.distances.coauthor
: Table that contains the distances between the coauthors of a pair of articles.distances.ethnicity
: Table that contains the distances between the posible ethnicities of a pair of signatures.
Schema Main
This schema contains the main information of the application:
main.articles_authors
: Table that contains the information of the signatures and the link to the articles. This also includesthe focus_name
of the signature.main.lda_topic
: Table that contains the LDA Topics of each article in the application.main.last_name_ethnicities
: Table that contains the possible ethnicities for each last name in the application.main.info_for_distances
: Table that aggregates the information needed to calculate the distances between two signatures in the application.main.v_articles_distance
: View that aggregates the different distances between a pair of articles.main.v_authors_distance
: View that aggregates the different distances between a pair of signatures.main.same_authors
: Table that contains the result of calculating if a pair of authors are the same or not.main.authors_disambiguated
: Table that contains the result of disambiguating and clustering the authors of the application.