Fix buttons and labels #50

Merged
merged 7 commits into from
Aug 16, 2018
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
config_centillion.py
config_flask.py
vp
credentials.json
96 changes: 85 additions & 11 deletions Readme.md
@@ -6,10 +6,10 @@

one centillion is 3.03 log-times better than a googol.

![Screen shot of centillion](docs/images/ss.png)
![Screen shot: centillion search](docs/images/search.png)


## what is it
## What Is It

Centillion (https://github.com/dcppc/centillion) is a search engine that can index
three kinds of collections: Google Documents (.docx files), Github issues, and Markdown files in
@@ -25,14 +25,43 @@ defined in `centillion.py`.

The centillion keeps it simple.

## authentication layer
## Authentication Layer

Centillion lives behind a Github authentication layer, implemented with
[flask-dance](https://github.com/singingwolfboy/flask-dance). When you first
visit the site it will ask you to authenticate with Github so that it can
verify you have permission to access the site.

## technologies
![Screen shot: centillion authentication](docs/images/auth.png)

## Master List

There is a master list of all content indexed by centillion at the master list page,
<https://search.nihdatacommons.us/master_list>.

A master list for each type of document indexed by the search engine is displayed
in a table:

![Screen shot: centillion master list](docs/images/master_list.png)

The metadata shown in these tables can be filtered and sorted:

![Screen shot: centillion master list with sorting](docs/images/master_list2.png)

## Control Panel

There's also a control panel at <https://search.nihdatacommons.us/control_panel>
that allows you to rebuild the search index from scratch. The search index
stores versions/contents of files locally, so re-indexing involves going out and
asking each API for new versions of a file/document/web page. When you re-index
the main search index, it will ask every API for new versions of every document.
You can also update only specific types of documents in the search index.

![Screen shot: centillion control panel](docs/images/control_panel.png)
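
The difference between rebuilding everything and updating a single document type can be sketched as below. This is only an illustration: `DOC_TYPES`, `reindex`, and the `fetchers` mapping are hypothetical names, not centillion's actual API, and the type names are borrowed from the master list tables.

```python
# Hypothetical sketch: none of these names come from centillion itself.
DOC_TYPES = ["gdocs", "issues", "ghfiles", "markdown", "emailthreads"]


def reindex(fetchers, only=None):
    """Ask each API for new versions of its documents.

    fetchers: dict mapping a document type to a zero-argument callable
              that returns the freshly fetched documents of that type.
    only:     optional iterable of document types; None means every type.
    """
    selected = DOC_TYPES if only is None else list(only)
    unknown = [t for t in selected if t not in fetchers]
    if unknown:
        raise ValueError("no fetcher for: " + ", ".join(unknown))
    return {doctype: fetchers[doctype]() for doctype in selected}


# Stand-in fetchers that return a placeholder instead of calling a real API.
fetchers = {t: (lambda t=t: ["<%s document>" % t]) for t in DOC_TYPES}

everything = reindex(fetchers)                    # full rebuild
issues_only = reindex(fetchers, only=["issues"])  # one document type
```

A full rebuild contacts every API, which is why updating a single type is usually the cheaper option.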



## Technologies

Centillion is a Python program built using whoosh (search engine library). It
indexes the full text of docx files in Google Documents, just the filenames for
@@ -41,16 +70,61 @@ results are grouped by issue. Centillion requires Google Drive and Github OAuth
apps. Once you provide credentials to Flask you're all set to go.


## control panel
## Configuration

There's also a control panel at <https://search.nihdatacommons.us/control_panel>
that allows you to rebuild the search index from scratch (the Google Drive indexing
takes a while).
You will need to configure both the centillion search index and the flask app.

The centillion search index is configured with `config_centillion.py`; this file
sets the names of repositories to crawl when indexing issues and files.

The flask app is configured with `config_flask.py`. This file contains API credentials
for Github and Groups.io, so it is listed in `.gitignore`.

Examples are provided in `config_centillion.example.py` and `config_flask.example.py`.
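
Before crawling, it can help to sanity-check the repository names in the config. The sketch below assumes repositories are written as `owner/name` strings, as in the entries of `config_centillion.example.py`. The `REPO_RE` pattern and the `invalid_repos` helper are hypothetical and not part of centillion.

```python
import re

# Assumption: each repository is an "owner/name" string, like the entries
# in config_centillion.example.py. The pattern is an approximation and
# does not reproduce Github's exact naming rules.
REPO_RE = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def invalid_repos(repos):
    """Return the entries that do not look like "owner/name"."""
    return [r for r in repos if not REPO_RE.match(r)]


repos = [
    "dcppc/design-guidelines",
    "dcppc/2018-may-workshop",
    "dcppc/centillion",
]

# "centillion" lacks an owner, so it is the only entry reported.
bad = invalid_repos(repos + ["centillion"])
```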


## Authentication

The search engine will need to connect to several APIs when it re-indexes the
search index:

* Github
* Groups.io
* Google Drive

### Github

Github API credentials (both an OAuth token for the centillion app's Github
authentication mechanism, and a personal access token for accessing repositories
during the re-indexing process) are provided in `config_flask.py`.

### Groups.io

The Groups.io API token is used to index email threads. This token is provided in
`config_flask.py`.

### Google Drive

The Google Drive API credentials are provided in a file, `credentials.json`. This is
the file that is generated when the OAuth process is complete.

When you enable the Google Drive API in the Google Cloud Console, you will be provided
with a file `client_secrets.json`. To authenticate centillion with Google Drive, you should
download this file, and run the Google Drive utility directly:

```
python gdrive_util.py
```

![Screen shot of centillion control panel](docs/images/cp.png)
This will initiate the authentication procedure. Sign in as a user that has access to
the documents you want to index, and _only_ the documents you want to index (it is useful
to set up a bot account for this purpose).

Once you log in as that user, it will create `credentials.json`, and the Google Drive
re-indexing procedure should not have any problems authenticating using that file.
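
A quick check that `credentials.json` is present and readable before starting a Google Drive re-index might look like the sketch below. `load_credentials` is a hypothetical helper, not part of `gdrive_util.py`, and it assumes only that the file is JSON; it does not validate the credentials themselves.

```python
import json
import os


def load_credentials(path="credentials.json"):
    """Return the parsed credentials, or None if the file is missing
    or does not contain valid JSON. (Hypothetical helper.)"""
    if not os.path.isfile(path):
        return None
    try:
        with open(path) as f:
            return json.load(f)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


if load_credentials() is None:
    print("credentials.json not found or unreadable; run gdrive_util.py first")
```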

## quickstart (with Github auth)
## Quickstart (With Github Auth)

Start by creating a Github OAuth application.
Get the public and private application key
@@ -85,7 +159,7 @@ This will start a Flask server, and you can view the minimal search engine
interface in your browser at `http://<ip>:5000`.


## troubleshooting
## Troubleshooting

If you are having problems with your callback URL being treated
as HTTP by Github, even though there is an HTTPS address, and
7 changes: 0 additions & 7 deletions config_centillion.py → config_centillion.example.py
@@ -23,13 +23,6 @@
"dcppc/design-guidelines",
"dcppc/2018-may-workshop",
"dcppc/centillion"
],
"github_ignore_files_re" : [
'^\.*',
'^_*'
],
"github_ignore_dirs_re" : [
'^_*'
]
}
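
A note on the `github_ignore_files_re` and `github_ignore_dirs_re` patterns that this change deletes: because `*` allows zero repetitions, `'^\.*'` and `'^_*'` match every string. The sketch below assumes the patterns were applied to file and directory names with `re.match`; how centillion actually used them is not shown in this diff. The stricter `+` variants are a hypothetical alternative, not something this change introduces.

```python
import re

names = ["readme.md", ".gitignore", "_drafts", "docs"]

# Patterns as written in the deleted lines: "*" permits zero repetitions,
# so re.match succeeds (with an empty match) on every name.
loose = [n for n in names if re.match(r'^\.*', n) or re.match(r'^_*', n)]

# Hypothetical stricter variants: "+" requires at least one leading "." or "_".
strict = [n for n in names if re.match(r'^\.+', n) or re.match(r'^_+', n)]

print(loose)   # every name is "ignored"
print(strict)  # only names that start with "." or "_"
```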

Binary file added docs/images/auth.png
Binary file added docs/images/control_panel.png
Binary file removed docs/images/cp.png
Binary file not shown.
Binary file added docs/images/master_list.png
Binary file added docs/images/master_list2.png
Binary file added docs/images/search.png
Binary file removed docs/images/ss.png
Binary file not shown.
112 changes: 99 additions & 13 deletions static/centillion_master_list.js
@@ -110,12 +110,29 @@ function load_gdoc_table(){
r[++j] = '</td></tr>';
}
r[++j] = '</tbody>'
$('#gdocs-master-list').html(r.join(''));
$('#gdocs-master-list').DataTable({

// Construct names of id tags
var doctype = 'gdocs';
var idlabel = '#' + doctype + '-master-list';
var filtlabel = idlabel + '_filter';

// Initialize the DataTable
$(idlabel).html(r.join(''));
$(idlabel).DataTable({
responsive: true,
lengthMenu: [50,100,250,500]
});
initGdocTable = true;

// Get the search filter section and search box
var searchsec = $(filtlabel).find('label');
var searchbox = searchsec.find('input');

// Replace search filter section text,
// then re-add the removed search box
searchsec.text('Search Metadata: ');
searchsec.append(searchbox);

initGdocTable = true
});
console.log('Finished loading Google Drive master list');
}
@@ -160,11 +177,28 @@ function load_issue_table(){
r[++j] = '</td></tr>';
}
r[++j] = '</tbody>'
$('#issues-master-list').html(r.join(''));
$('#issues-master-list').DataTable({

// Construct names of id tags
var doctype = 'issues';
var idlabel = '#' + doctype + '-master-list';
var filtlabel = idlabel + '_filter';

// Initialize the DataTable
$(idlabel).html(r.join(''));
$(idlabel).DataTable({
responsive: true,
lengthMenu: [50,100,250,500]
});

// Get the search filter section and search box
var searchsec = $(filtlabel).find('label');
var searchbox = searchsec.find('input');

// Replace search filter section text,
// then re-add the removed search box
searchsec.text('Search Metadata: ');
searchsec.append(searchbox);

initIssuesTable = true;
});
console.log('Finished loading Github issues master list');
@@ -206,11 +240,28 @@ function load_ghfile_table(){
r[++j] = '</td></tr>';
}
r[++j] = '</tbody>'
$('#ghfiles-master-list').html(r.join(''));
$('#ghfiles-master-list').DataTable({

// Construct names of id tags
var doctype = 'ghfiles';
var idlabel = '#' + doctype + '-master-list';
var filtlabel = idlabel + '_filter';

// Initialize the DataTable
$(idlabel).html(r.join(''));
$(idlabel).DataTable({
responsive: true,
lengthMenu: [50,100,250,500]
});

// Get the search filter section and search box
var searchsec = $(filtlabel).find('label');
var searchbox = searchsec.find('input');

// Replace search filter section text,
// then re-add the removed search box
searchsec.text('Search Metadata: ');
searchsec.append(searchbox);

initGhfilesTable = true;
});
console.log('Finished loading Github file list');
@@ -234,7 +285,7 @@ function load_markdown_table(){
r[++j] = '<thead>'
r[++j] = '<tr class="header-row">';
r[++j] = '<th width="70%">Markdown File Name</th>';
r[++j] = '<th width="30%">Repo</th>';
r[++j] = '<th width="30%">Repository</th>';
r[++j] = '</tr>';
r[++j] = '</thead>'
r[++j] = '<tbody>'
@@ -250,11 +301,28 @@ function load_markdown_table(){
r[++j] = '</td></tr>';
}
r[++j] = '</tbody>'
$('#markdown-master-list').html(r.join(''));
$('#markdown-master-list').DataTable({

// Construct names of id tags
var doctype = 'markdown';
var idlabel = '#' + doctype + '-master-list';
var filtlabel = idlabel + '_filter';

// Initialize the DataTable
$(idlabel).html(r.join(''));
$(idlabel).DataTable({
responsive: true,
lengthMenu: [50,100,250,500]
});

// Get the search filter section and search box
var searchsec = $(filtlabel).find('label');
var searchbox = searchsec.find('input');

// Replace search filter section text,
// then re-add the removed search box
searchsec.text('Search Metadata: ');
searchsec.append(searchbox);

initMarkdownTable = true;
});
console.log('Finished loading Markdown list');
@@ -293,14 +361,32 @@ function load_emailthreads_table(){
r[++j] = '</td></tr>';
}
r[++j] = '</tbody>'
$('#emailthreads-master-list').html(r.join(''));
$('#emailthreads-master-list').DataTable({

// Construct names of id tags
var doctype = 'emailthreads';
var idlabel = '#' + doctype + '-master-list';
var filtlabel = idlabel + '_filter';

// Initialize the DataTable
$(idlabel).html(r.join(''));
$(idlabel).DataTable({
responsive: true,
lengthMenu: [50,100,250,500]
});
initEmailthreadsTable = true

// Get the search filter section and search box
var searchsec = $(filtlabel).find('label');
var searchbox = searchsec.find('input');

// Replace search filter section text,
// then re-add the removed search box
searchsec.text('Search Metadata: ');
searchsec.append(searchbox);

initEmailthreadsTable = true;
});
console.log('Finished loading Groups.io email threads list');
}
}
}

4 changes: 4 additions & 0 deletions static/style.css
@@ -1,3 +1,7 @@
.btn-reindex-type, .btn-reindex-all {
width: 350px;
}

#github-button {
display:inline-block;
font-size: 20px;