Stop indexing from DataSpace (#746)
* Stop indexing from DataSpace

* Fix typo

* Increase solr_writer thread pool
As suggested by an error message on the server

* Remove change to solr_writer.thread_pool
If we need it, it should be in a separate PR
bess authored Jan 29, 2025
1 parent 1fbe77c commit 390d761
Showing 2 changed files with 3 additions and 20 deletions.
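
The control flow left in `index:research_data` after this commit can be sketched roughly as follows. This is a minimal, Rails-free illustration, not the project's code: `FakeSolrCloudHelper`, `run_research_data_index`, and the collection names are hypothetical stand-ins for `Indexing::SolrCloudHelper` and the rake task body, and the alias swap is simplified to a single assignment.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for Indexing::SolrCloudHelper (assumption: the real
# helper repoints a Solr alias at the freshly written collection).
class FakeSolrCloudHelper
  attr_reader :alias_target

  def initialize
    @alias_target = 'collection-old'
    @writer = 'collection-new'
  end

  # Point the read alias at the collection that was just written.
  def update_solr_alias!
    @alias_target = @writer
  end
end

# Runs each harvest step in order, then swaps the alias once all have finished.
def run_research_data_index(harvesters:, solr:, logger:)
  harvesters.each do |name, harvest|
    logger.info "Indexing: Fetching #{name} records"
    harvest.call
  end
  logger.info 'Indexing: Fetching completed'
  solr.update_solr_alias!
  logger.info "Indexing: Updated Solr to read from #{solr.alias_target}"
  solr.alias_target
end

log_output = StringIO.new
harvested = []
solr = FakeSolrCloudHelper.new

# After this commit only the PDC Describe step remains in the list.
result = run_research_data_index(
  harvesters: { 'PDC Describe' => -> { harvested << :pdc_describe } },
  solr: solr,
  logger: Logger.new(log_output)
)
puts result
```

Keeping the alias update after every harvest step means readers keep seeing the old collection until the new one is fully written; dropping the DataSpace step does not change that ordering.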
6 changes: 3 additions & 3 deletions README.md
@@ -1,7 +1,7 @@
# pdc_discovery


-A discovery portal for Princeton research data. Initially it will provide a better browsing experience for the research data contained in [DataSpace](https://dataspace.princeton.edu).
+A discovery portal for Princeton research data.

Please note: While this is open-source software, we would discourage anyone from trying to just check it out and run it. Princeton specifics, from styling to authentication and authorization, are hard coded, and we have not invested any time in the kind of configurability that would be needed for use at another institution. Instead it should be taken as an example of breaking a monolithic project into separate components, and developing iteratively in response to local user feedback.

@@ -59,9 +59,9 @@ We utilize Rubocop for our Ruby code and Prettier for our JavaScript

To create a tagged release use the [steps in the RDSS handbook](https://github.com/pulibrary/rdss-handbook/blob/main/release_process.md)

-## Indexing research data from DataSpace and PDC Describe
+## Indexing research data from PDC Describe

-PDC Discovery indexes data from both DataSpace and from PDC Describe via the following rake task:
+PDC Discovery indexes data from PDC Describe via the following rake task:

```ruby
rake index:research_data
17 changes: 0 additions & 17 deletions lib/tasks/index.rake
@@ -9,22 +9,12 @@ namespace :index do

Rails.logger.info "Indexing: Fetching PDC Describe records"
Rake::Task['index:pdc_describe_research_data'].invoke
-Rails.logger.info "Indexing: Fetching DataSpace records"
-Rake::Task['index:dspace_research_data'].invoke
Rails.logger.info "Indexing: Fetching completed"

Indexing::SolrCloudHelper.update_solr_alias!
Rails.logger.info "Indexing: Updated Solr to read from the new collection: #{Indexing::SolrCloudHelper.alias_url} -> #{Indexing::SolrCloudHelper.collection_reader_url}"
end

-desc 'Index all DSpace research data collections'
-task dspace_research_data: :environment do
-Rails.logger.info "Indexing: Harvesting and indexing DataSpace research data collections started"
-DspaceResearchDataHarvester.harvest(false)
-Indexing::SolrCloudHelper.collection_writer_commit!
-Rails.logger.info "Indexing: Harvesting and indexing DataSpace research data collections completed"
-end

desc 'Index all PDC Describe data'
task pdc_describe_research_data: :environment do
Rails.logger.info "Indexing: Harvesting and indexing PDC Describe data started"
@@ -40,13 +30,6 @@ namespace :index do
Blacklight.default_index.connection.commit
end

-desc 'Fetches the most recent community information from DataSpace and saves it to a file.'
-task cache_dataspace_communities: :environment do
-cache_file = ENV['COMMUNITIES_FILE'] || './spec/fixtures/files/dataspace_communities.json'
-communities = DataspaceCommunities.new
-File.write(cache_file, JSON.pretty_generate(communities.tree))
-end

desc 'Prints to console the current Solr URLs and how they are configured'
task print_solr_urls: :environment do
puts "Solr alias.: #{Indexing::SolrCloudHelper.alias_url}"
