| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +PyOrcid is a Python library and API client for interacting with the ORCID API. ORCID (Open Researcher and Contributor ID) provides unique identifiers to researchers. This library enables developers to access and manage ORCID profile data, including publications, employment, education, and other research activities. |
| 8 | + |
| 9 | +## Development Commands |
| 10 | + |
| 11 | +### Package Management |
| 12 | +The project uses Poetry for dependency management: |
| 13 | +```bash |
| 14 | +# Install dependencies |
| 15 | +poetry install |
| 16 | + |
| 17 | +# Add a dependency |
| 18 | +poetry add <package-name> |
| 19 | + |
| 20 | +# Add a dev dependency |
| 21 | +poetry add --group dev <package-name> |
| 22 | +``` |
| 23 | + |
| 24 | +### Testing |
| 25 | +```bash |
| 26 | +# Run all tests |
| 27 | +python -m unittest tests/test_orcid.py |
| 28 | + |
| 29 | +# Run tests with pytest (if installed) |
| 30 | +pytest tests/ |
| 31 | + |
| 32 | +# Run a single test |
| 33 | +python -m unittest tests.test_orcid.TestOrcid.test_access_token_valid |
| 34 | +``` |
| 35 | + |
| 36 | +### Linting |
| 37 | +```bash |
| 38 | +# Check for Python syntax errors and undefined names |
| 39 | +flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics |
| 40 | + |
| 41 | +# Full linting with warnings |
| 42 | +flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics |
| 43 | +``` |
| 44 | + |
| 45 | +### Building |
| 46 | +```bash |
| 47 | +# Build the package using Poetry |
| 48 | +poetry build |
| 49 | +``` |
| 50 | + |
| 51 | +## Architecture |
| 52 | + |
| 53 | +### Core Classes |
| 54 | + |
| 55 | +The library is organized into four main classes in `src/pyorcid/`: |
| 56 | + |
| 57 | +1. **`Orcid` (orcid.py)** - The main API wrapper class |
| 58 | + - Handles both Public and Member API access |
| 59 | + - Requires an ORCID ID and access token |
| 60 | + - Supports sandbox mode for testing |
| 61 | + - Methods map to ORCID API v3.0 endpoints (e.g., `/works`, `/person`, `/educations`) |
| 62 | + - Returns tuples: `(processed_data, raw_api_response)` for most section methods |
| 63 | + - Key methods: `record()`, `works()`, `educations()`, `employments()`, `fundings()`, `person()` |
| 64 | + - Helper: `generate_markdown_file()` creates formatted reports |
| 65 | + |
| 66 | +2. **`OrcidAuthentication` (orcid_authentication.py)** - Handles OAuth 2.0 authentication |
| 67 | + - `get_public_access_token()` - For reading public data (/read-public scope), no user auth required |
| 68 | + - `get_private_access_token()` - For Member API or limited-access data, requires user authorization |
| 69 | + - Supports both production and sandbox environments |
| 70 | + - Manages redirect URIs and authorization codes |
| 71 | + |
| 72 | +3. **`OrcidScrapper` (orcid_scrapper.py)** - Alternative data access via web scraping |
| 73 | + - Inherits from `Orcid` class |
| 74 | + - Scrapes public ORCID profiles without authentication |
| 75 | + - Converts XML responses to JSON and reformats to match API structure |
| 76 | + - Only works with public profiles |
| 77 | + - Overrides `__read_section()` to use web scraping instead of API calls |
| 78 | + |
| 79 | +4. **`OrcidSearch` (orcid_search.py)** - Wrapper for ORCID Search API |
| 80 | + - `search(query, start, rows, search_mode, columns)` - Searches ORCID registry |
| 81 | + - Supports three search modes: "expanded-search", "search", "csv-search" |
| 82 | + - Handles query encoding and pagination |
| 83 | + - Requires access token for authentication |
| 84 | + |
| 85 | +### API Modes |
| 86 | + |
| 87 | +The library supports two ORCID API modes: |
| 88 | +- **Public API** (`state="public"`): Read-only access to public profiles, uses `pub.orcid.org` |
| 89 | +- **Member API** (`state="member"`): Read/write access for ORCID members, uses `api.orcid.org` |
| 90 | + |
| 91 | +Both modes support sandbox environments for testing (`sandbox=True` parameter). |
| 92 | + |
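As a rough illustration of how the two modes and the sandbox flag could map to a base URL, here is a minimal sketch. The helper name `build_base_url` is hypothetical and is not part of the library. The production hosts are the ones listed above, the sandbox hosts are assumed to be ORCID's `pub.sandbox.orcid.org` and `api.sandbox.orcid.org`, and the `v3.0` path segment follows the API version mentioned earlier.

```python
def build_base_url(state: str = "public", sandbox: bool = False) -> str:
    """Hypothetical helper: choose an ORCID API v3.0 base URL for a mode."""
    # "public" maps to the pub.* host, "member" to the api.* host.
    prefixes = {"public": "pub", "member": "api"}
    if state not in prefixes:
        raise ValueError(f"state must be 'public' or 'member', got {state!r}")
    # Assumption: the sandbox lives under the sandbox.orcid.org domain.
    domain = "sandbox.orcid.org" if sandbox else "orcid.org"
    return f"https://{prefixes[state]}.{domain}/v3.0"


if __name__ == "__main__":
    print(build_base_url())
    print(build_base_url("member", sandbox=True))
```

Rejecting unknown `state` values up front mirrors the early-failure style the document describes for token validation.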
| 93 | +### Data Processing Pipeline |
| 94 | + |
| 95 | +1. **Token validation**: All classes except `OrcidScrapper` validate tokens on initialization |
| 96 | +2. **API requests**: Made via `__read_section(section)` private method |
| 97 | +3. **Data extraction**: Helper methods like `__get_value_from_keys()` navigate nested JSON safely |
| 98 | +4. **Formatting**: Methods like `get_formatted_date()` convert API data to user-friendly formats |
| 99 | +5. **Unicode handling**: `__deunicode_string()` removes non-ASCII characters for compatibility |
| 100 | + |
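Steps 3 and 5 of the pipeline can be sketched with two small standalone functions. These are illustrative stand-ins, not the library's private `__get_value_from_keys()` or `__deunicode_string()` methods: the names, signatures, and the sample record below are assumptions made for this example.

```python
from typing import Any, Sequence


def get_value_from_keys(data: Any, keys: Sequence, default: Any = None) -> Any:
    """Hypothetical helper: walk nested dicts/lists without raising on gaps."""
    current = data
    for key in keys:
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif (isinstance(current, list) and isinstance(key, int)
              and -len(current) <= key < len(current)):
            current = current[key]
        else:
            # Missing key, out-of-range index, or a None in the middle of the path.
            return default
    return current


def deunicode_string(text: str) -> str:
    """Hypothetical helper: drop every character that is not ASCII."""
    return text.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    # Illustrative record shaped loosely like a nested API response.
    work = {
        "title": {"title": {"value": "Caf\u00e9 study"}},
        "publication-date": {"year": {"value": "2020"}, "month": None},
    }
    title = get_value_from_keys(work, ["title", "title", "value"], default="")
    month = get_value_from_keys(work, ["publication-date", "month", "value"])
    print(deunicode_string(title), month)
```

Returning a default instead of raising keeps callers free of repeated `try`/`except KeyError` blocks, at the cost of hiding genuinely malformed responses.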
| 101 | +### Testing Approach |
| 102 | + |
| 103 | +Tests use mocked HTTP requests (`unittest.mock`) to avoid live API calls. The main `Orcid` class includes special `__test_*` methods that pull credentials from environment variables (`ORCID_ACCESS_TOKEN`) for CI/CD integration with GitHub Actions. |
| 104 | + |
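The mocking approach described above can be sketched as follows. To keep the example runnable with only the standard library, it patches `urllib.request.urlopen` rather than `requests`, which is what the library itself depends on. The function `fetch_section`, the fake token, and the response payload are all assumptions made for illustration.

```python
import json
import urllib.request
from unittest import mock


def fetch_section(orcid_id: str, section: str, token: str) -> dict:
    """Hypothetical stand-in for an API read: GET one section as JSON."""
    request = urllib.request.Request(
        f"https://pub.orcid.org/v3.0/{orcid_id}/{section}",
        headers={"Accept": "application/json",
                 "Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_mocked_fetch():
    """Call fetch_section with urlopen replaced, so no network is touched."""
    fake_response = mock.MagicMock()
    fake_response.read.return_value = json.dumps({"group": []}).encode("utf-8")
    # urlopen is used as a context manager, so __enter__ must yield the fake.
    fake_response.__enter__.return_value = fake_response
    with mock.patch("urllib.request.urlopen",
                    return_value=fake_response) as fake_urlopen:
        data = fetch_section("0000-0002-1825-0097", "works", "fake-token")
    return data, fake_urlopen.call_count


if __name__ == "__main__":
    print(run_mocked_fetch())
```

In a test that targets code built on `requests`, the same idea would apply with the patch target changed to the name the code under test actually looks up.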
| 105 | +## Important Patterns |
| 106 | + |
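| 107 | +- **Private methods**: Methods prefixed with `__` (double underscore) are internal-only; Python name-mangles them per class (e.g. `_Orcid__read_section`), so they are not meant to be called from outside the class |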
| 108 | +- **Error handling**: Token validation occurs in `__init__()` for early failure detection |
| 109 | +- **Return tuples**: Section methods typically return `(simplified_data, raw_data)` to provide both convenience and full access |
| 110 | +- **Safe navigation**: `__get_value_from_keys()` prevents KeyError on missing nested keys |
| 111 | +- **Inheritance**: `OrcidScrapper` extends `Orcid` to reuse data processing logic while changing the data source |
| 112 | + |
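One consequence of the double-underscore convention is worth keeping in mind when combining it with inheritance: Python rewrites `__name` to `_ClassName__name` at compile time, separately for each class. The sketch below uses hypothetical classes (`Base`, `Child`, `ChildExplicit`) to show the general language behavior. It says nothing about how `OrcidScrapper` is actually implemented.

```python
class Base:
    def __read(self):
        return "api"

    def load(self):
        # Inside Base this call is compiled as self._Base__read().
        return self.__read()


class Child(Base):
    def __read(self):
        # Stored as _Child__read, so Base.load() never sees it.
        return "scrape"


class ChildExplicit(Base):
    def _Base__read(self):
        # Uses the mangled name directly, so it does replace Base's method.
        return "scrape"


if __name__ == "__main__":
    print(Child().load())
    print(ChildExplicit().load())
```

A single leading underscore (`_read`) avoids the mangling entirely and is the more common choice when subclasses are expected to override a hook.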
| 113 | +## Dependencies |
| 114 | + |
| 115 | +Core dependencies: |
| 116 | +- `requests` - HTTP client for API calls |
| 117 | +- `python-dotenv` - Environment variable management |
| 118 | +- `urllib3` - URL handling and encoding |
| 119 | +- `xmltojson` - XML to JSON conversion (for scraping) |
| 120 | + |
| 121 | +Development dependencies: |
| 122 | +- `pytest` - Testing framework |
| 123 | +- `flake8` - Linting (used in CI/CD) |