Add support for componentjs #4138

Open: wants to merge 4 commits into base: develop
3 changes: 2 additions & 1 deletion src/packagedcode/__init__.py
Original file line number Diff line number Diff line change
@@ -15,6 +15,7 @@
from packagedcode import build_gradle
from packagedcode import cargo
from packagedcode import chef
from packagedcode import componentjs
from packagedcode import debian
from packagedcode import debian_copyright
from packagedcode import distro
@@ -81,7 +82,7 @@
conan.ConanDataHandler,

cran.CranDescriptionFileHandler,

componentjs.ComponentJSONMetadataHandler,
debian_copyright.DebianCopyrightFileInPackageHandler,
debian_copyright.StandaloneDebianCopyrightFileHandler,
debian.DebianDscFileHandler,
154 changes: 154 additions & 0 deletions src/packagedcode/componentjs.py
@@ -0,0 +1,154 @@
#
# Copyright (c) nexB Inc. and others. All rights reserved.
# ScanCode is a trademark of nexB Inc.
# SPDX-License-Identifier: Apache-2.0
# See http://www.apache.org/licenses/LICENSE-2.0 for the license text.
# See https://github.com/nexB/scancode-toolkit for support or download.
# See https://aboutcode.org for more information about nexB OSS projects.
#

import json
from packagedcode import models
from packageurl import PackageURL
import yaml

class ComponentJSONMetadataHandler(models.NonAssemblableDatafileHandler):
    """
    Handle component JSON metadata files for package analysis.
    """
    datasource_id = "component_json_metadata"
    path_patterns = ("*component.json",)
    default_package_type = "library"
Review comment (Member):

Suggested change
-    default_package_type = "library"
+    default_package_type = "generic"

    description = "component JSON package metadata file"

    @classmethod
    def parse(cls, location, package_only=False):
        """
        Parse the JSON metadata file at `location` and yield PackageData.
        """
        with open(location, "r", encoding="utf-8") as f:
            data = json.load(f)

        name = data.get('name') or data.get('repo', '').split('/')[-1]
Review comment (Member):

Isn't name a required attribute here? And is repo always formatted like the following: "repo": "chaijs/chai"?
According to https://github.com/componentjs/spec/blob/master/component.json/specifications.md#name, name seems to be required. The same comment applies to the namespace processing.
Please go through the full spec carefully.
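
As a rough illustration of the question above, here is a minimal sketch that treats `name` as authoritative and uses `repo` only for the namespace (and as a fallback for the name). It assumes `repo` uses the two-part `user/project` form seen in the chai fixture; `split_repo` and `get_name_and_namespace` are hypothetical helper names, not part of the PR or of ScanCode.

```python
def split_repo(repo):
    """Return (namespace, name) for a "user/project" string, else (None, None)."""
    if not isinstance(repo, str):
        return None, None
    parts = repo.strip().split("/")
    # Assumption: anything other than exactly two non-empty segments is unusable.
    if len(parts) != 2 or not all(parts):
        return None, None
    return parts[0], parts[1]


def get_name_and_namespace(data):
    """Return (namespace, name) for a parsed component.json mapping."""
    repo_namespace, repo_name = split_repo(data.get("repo"))
    # The declared name wins; the repo only fills in a missing name.
    name = data.get("name") or repo_name
    return repo_namespace, name
```

With this shape, a manifest that has a `name` but no `repo` still yields a package, and a malformed `repo` never overwrites the declared name.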

        if not name:
            return

        namespace = None
        if 'repo' in data and '/' in data['repo']:
            namespace, name = data['repo'].split('/', 1)

        package_data = dict(
            datasource_id=cls.datasource_id,
            type=cls.default_package_type,
            name=name,
            namespace=namespace,
            version=data.get('version'),
            description=data.get('description', ''),
Review comment (Member):

Suggested change
-            description=data.get('description', ''),
+            description=data.get('description'),

When we don't have a value, the default should always be None (or whatever default the model defines for this attribute).
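
A small sketch of the point above, using only plain dictionaries: reading optional fields with `dict.get` and no fallback leaves missing values as None, so whatever consumes the mapping can apply its own defaults. `collect_optional_fields` is a hypothetical helper used for illustration only.

```python
def collect_optional_fields(data, fields):
    """Map each field to its value in `data`, or None when the field is absent."""
    return {field: data.get(field) for field in fields}


manifest = {"name": "chai", "version": "2.1.2"}
optional = collect_optional_fields(manifest, ("version", "description", "keywords"))
```

Here `optional["description"]` and `optional["keywords"]` are None rather than `''` or `[]`.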

            homepage_url=cls._extract_homepage(data),
            keywords=data.get('keywords', []),
Review comment (Member):

Suggested change
-            keywords=data.get('keywords', []),
+            keywords=data.get('keywords'),

            dependencies=cls._process_dependencies(data),
            extracted_license_statement=cls._extract_license_statement(data),
            extra_data=cls._extract_extra_data(data)
        )

        if namespace and name:
            package_data['purl'] = PackageURL(
Review comment (Member):

You do not need to populate the purl field explicitly. It is populated in a more general way, from the other values, when the PackageData object is created; see https://github.com/aboutcode-org/scancode-toolkit/blob/develop/src/packagedcode/models.py#L302.

                type='generic',
                namespace=namespace,
                name=name,
                version=package_data.get('version')
            ).to_string()


        yield models.PackageData.from_data(package_data, package_only)

    @staticmethod
    def _extract_homepage(data):
        """
        Extract homepage URL from various possible sources.
        """
        if data.get('homepage'):
            return data['homepage']

        if data.get('repo'):
            return f'https://github.com/{data["repo"]}'

        desc = data.get('description', '')
Review comment (Member):

I don't think we want to do this for this manifest: the description can contain many URLs that are not the homepage. It's better to return nothing than to return false information.
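
One way to read the comment above is sketched below: keep the explicit `homepage` field and the `repo` fallback that the PR already has, and drop the description scan entirely. `extract_homepage` is a standalone, hypothetical version of the PR's `_extract_homepage`; whether the `repo` fallback to a GitHub URL should stay is not settled by the review and is assumed here.

```python
def extract_homepage(data):
    """Return a homepage URL from explicit fields only, or None."""
    homepage = data.get("homepage")
    if isinstance(homepage, str) and homepage.strip():
        return homepage.strip()

    # Assumption carried over from the PR: "repo" is a GitHub "user/project" slug.
    repo = data.get("repo")
    if isinstance(repo, str) and repo.strip():
        return f"https://github.com/{repo.strip()}"

    # Deliberately no guessing from free text such as the description.
    return None
```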

        if 'http' in desc:
            urls = [word for word in desc.split() if word.startswith('http')]
            return urls[0] if urls else None

        return None

    @staticmethod
    def _process_dependencies(data):
        """
        Process dependencies into DependentPackage objects.
        """
        dependencies = []

        for dep_name, dep_version in data.get('dependencies', {}).items():
Review comment (Member):

You are not processing devDependencies. They should be processed with scope='devDependencies', with the flags set accordingly.
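
The request above could look roughly like the sketch below, which loops over both manifest sections and records the section name as the scope. `Dependency` is a hypothetical stand-in dataclass, not ScanCode's `DependentPackage`, and the flag values chosen for `devDependencies` (not runtime, optional) are an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dependency:
    """Hypothetical stand-in for a dependency record."""
    namespace: Optional[str]
    name: str
    requirement: Optional[str]
    scope: str
    is_runtime: bool
    is_optional: bool


# Manifest section -> (is_runtime, is_optional). The devDependencies flags are assumed.
SCOPES = {
    "dependencies": (True, False),
    "devDependencies": (False, True),
}


def collect_dependencies(data):
    """Return Dependency records for every known section of a parsed manifest."""
    collected = []
    for scope, (is_runtime, is_optional) in SCOPES.items():
        for dep_name, requirement in (data.get(scope) or {}).items():
            if "/" in dep_name:
                namespace, name = dep_name.split("/", 1)
            else:
                namespace, name = None, dep_name
            collected.append(
                Dependency(namespace, name, requirement, scope, is_runtime, is_optional)
            )
    return collected
```

Keeping the section-to-flags mapping in one table means a further section can be added without touching the loop.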

            try:
                if '/' in dep_name:
                    namespace, name = dep_name.split('/', 1)
                else:
                    namespace, name = None, dep_name

                purl = PackageURL(
                    type='generic',
                    namespace=namespace,
                    name=name,
                    version=dep_version
                ).to_string()

                dependencies.append(
                    models.DependentPackage(
                        purl=purl,
                        scope='runtime',
Review comment (Member):

Suggested change
-                        scope='runtime',
+                        scope='dependencies',

The scope here is specific to the manifest type.

                        is_runtime=True,
                        is_optional=False
                    )
                )
            except Exception:
                continue

        return dependencies

    @classmethod
    def _extract_license_statement(cls, data):
Review comment (Member):

We don't need to process or normalize this separately. It is handled generally when PackageData is created, since this is common across ecosystems; see https://github.com/aboutcode-org/scancode-toolkit/blob/develop/src/packagedcode/models.py#L782

        """
        Extract license statement.

        """
        license_field = data.get('license')
        if not license_field:
            return None

        if isinstance(license_field, str):
            return yaml.dump({"type": license_field.strip()}).strip()

        if isinstance(license_field, list):
            license_statements = [
                yaml.dump({"type": lic.strip()}).strip()
                for lic in license_field
                if lic.strip()
            ]
            return "\n".join(license_statements) if license_statements else None

        return None

    @staticmethod
    def _extract_extra_data(data):
        """
        Extract additional metadata not in core package data.
        """
        extra_fields = [
            'main', 'scripts', 'styles', 'bin',
            'repository', 'private', 'dev', 'development'
        ]

        return {
            field: data[field]
            for field in extra_fields
            if field in data
        }
@@ -0,0 +1,25 @@
{
"name": "angular-ui-sortable",
"version": "0.0.1",
"description": "This directive allows you to jQueryUI Sortable.",
"author": "https://github.com/angular-ui/ui-sortable/graphs/contributors",
"license": "MIT",
"homepage": "http://angular-ui.github.com",
"main": "./src/sortable.js",
"ignore": [
Review comment (Member):

We want to have ignore in extra data too, as this is useful in the assembly step.


"**/.*",
"node_modules",
"components",
"test*",
"demo*",
"gruntFile.js",
"package.json"
],
"dependencies": {
"angular": "~1.x",
"jquery-ui": ">= 1.9"
},
"devDependencies": {
"angular-mocks": "~1.x"
}
}
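
The review comment on the `ignore` field in this fixture asks for it to be kept in extra data. A minimal sketch of that change, written as a standalone function rather than the PR's static method, could be as follows; the field list is copied from the PR with `ignore` appended, and `extract_extra_data` is a hypothetical name.

```python
# Fields copied from the PR's _extract_extra_data, plus "ignore" as requested in review.
EXTRA_FIELDS = (
    "main", "scripts", "styles", "bin",
    "repository", "private", "dev", "development",
    "ignore",
)


def extract_extra_data(data):
    """Return the subset of `data` limited to the known extra fields that are present."""
    return {field: data[field] for field in EXTRA_FIELDS if field in data}
```

For the fixture above, this would return both `main` and the full `ignore` list.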
@@ -0,0 +1,106 @@
{
"packages": [],
"dependencies": [],
"files": [
{
"path": "component.json",
"type": "file",
"package_data": [
{
"type": "generic",
"namespace": null,
"name": "angular-ui-sortable",
"version": "0.0.1",
"qualifiers": {},
"subpath": null,
"primary_language": null,
"description": "This directive allows you to jQueryUI Sortable.",
"release_date": null,
"parties": [],
"keywords": [],
"homepage_url": "http://angular-ui.github.com",
"download_url": null,
"size": null,
"sha1": null,
"md5": null,
"sha256": null,
"sha512": null,
"bug_tracking_url": null,
"code_view_url": null,
"vcs_url": null,
"copyright": null,
"holder": null,
"declared_license_expression": "mit",
"declared_license_expression_spdx": "MIT",
"license_detections": [
{
"license_expression": "mit",
"license_expression_spdx": "MIT",
"matches": [
{
"license_expression": "mit",
"license_expression_spdx": "MIT",
"from_file": "component.json",
"start_line": 1,
"end_line": 1,
"matcher": "1-hash",
"score": 16.0,
"matched_length": 3,
"match_coverage": 100.0,
"rule_relevance": 16,
"rule_identifier": "mit_1301.RULE",
"rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/mit_1301.RULE",
"matched_text": "license type: MIT"
}
],
"identifier": "mit-1c9cba21-81d2-7522-ac3e-dfde6630f8d1"
}
],
"other_license_expression": null,
"other_license_expression_spdx": null,
"other_license_detections": [],
"extracted_license_statement": "type: MIT",
"notice_text": null,
"source_packages": [],
"file_references": [],
"is_private": false,
"is_virtual": false,
"extra_data": {
"main": "./src/sortable.js"
},
"dependencies": [
{
"purl": "pkg:generic/angular@~1.x",
"extracted_requirement": null,
"scope": "runtime",
"is_runtime": true,
"is_optional": false,
"is_pinned": false,
"is_direct": true,
"resolved_package": {},
"extra_data": {}
},
{
"purl": "pkg:generic/jquery-ui@%3E%3D%201.9",
"extracted_requirement": null,
"scope": "runtime",
"is_runtime": true,
"is_optional": false,
"is_pinned": false,
"is_direct": true,
"resolved_package": {},
"extra_data": {}
}
],
"repository_homepage_url": null,
"repository_download_url": null,
"api_data_url": null,
"datasource_id": "component_json_metadata",
"purl": "pkg:generic/angular-ui-sortable@0.0.1"
}
],
"for_packages": [],
"scan_errors": []
}
]
}
51 changes: 51 additions & 0 deletions tests/packagedcode/data/componentjs/chai/component.json
@@ -0,0 +1,51 @@
{
"name": "chai"
, "repo": "chaijs/chai"
, "version": "2.1.2"
, "description": "BDD/TDD assertion library for node.js and the browser. Test framework agnostic."
, "license": "MIT"
, "keywords": [
"test"
, "assertion"
, "assert"
, "testing"
, "chai"
]
, "main": "index.js"
Review comment (Member):

You need to add an assemble function and assign_packages_to_resources_* functions to process these properly. They are used to create top-level packages and to assign files to those package objects (that is, to resolve which files are part of a package). By default this is handled by the base implementations at https://github.com/aboutcode-org/scancode-toolkit/blob/develop/src/packagedcode/models.py#L1137, but whenever the data is specific to a manifest type we need to override them with explicit functions. This populates the for_packages attribute of resources and does a couple of other things.

Then also add tests with directories and files to cover this.

See examples of this in other datafile handlers, such as the simple assembly in https://github.com/aboutcode-org/scancode-toolkit/blob/develop/src/packagedcode/conda.py#L42
Please ask questions if you need help with this, as it can be more complex.

, "scripts": [
"index.js"
, "lib/chai.js"
, "lib/chai/assertion.js"
, "lib/chai/config.js"
, "lib/chai/core/assertions.js"
, "lib/chai/interface/assert.js"
, "lib/chai/interface/expect.js"
, "lib/chai/interface/should.js"
, "lib/chai/utils/addChainableMethod.js"
, "lib/chai/utils/addMethod.js"
, "lib/chai/utils/addProperty.js"
, "lib/chai/utils/flag.js"
, "lib/chai/utils/getActual.js"
, "lib/chai/utils/getEnumerableProperties.js"
, "lib/chai/utils/getMessage.js"
, "lib/chai/utils/getName.js"
, "lib/chai/utils/getPathValue.js"
, "lib/chai/utils/getPathInfo.js"
, "lib/chai/utils/hasProperty.js"
, "lib/chai/utils/getProperties.js"
, "lib/chai/utils/index.js"
, "lib/chai/utils/inspect.js"
, "lib/chai/utils/objDisplay.js"
, "lib/chai/utils/overwriteMethod.js"
, "lib/chai/utils/overwriteProperty.js"
, "lib/chai/utils/overwriteChainableMethod.js"
, "lib/chai/utils/test.js"
, "lib/chai/utils/transferFlags.js"
, "lib/chai/utils/type.js"
]
, "dependencies": {
"chaijs/assertion-error": "1.0.0"
, "chaijs/deep-eql": "0.1.3"
}
, "development": {}
}
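
To make the file-assignment idea from the review comment on this fixture more concrete, here is a standalone sketch. It does not use ScanCode's `assemble` or `assign_packages_to_resources_*` APIs; `assign_files_to_package` is a hypothetical helper. It assumes that a package owns every file under the directory holding its component.json, and that `ignore` entries are glob patterns matched against the first path segment or the whole relative path.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def assign_files_to_package(datafile_path, resource_paths, ignore_patterns=()):
    """
    Return the paths from `resource_paths` located under the directory that
    contains `datafile_path`, skipping paths matched by an ignore pattern.
    """
    root = PurePosixPath(datafile_path).parent
    assigned = []
    for path in resource_paths:
        try:
            relative = PurePosixPath(path).relative_to(root)
        except ValueError:
            continue  # the path is outside the package directory
        first_segment = relative.parts[0] if relative.parts else ""
        ignored = any(
            fnmatch(first_segment, pattern) or fnmatch(str(relative), pattern)
            for pattern in ignore_patterns
        )
        if not ignored:
            assigned.append(path)
    return assigned
```

In a real handler the result would feed the for_packages attribute of each resource; how the base implementation does that is described in the models.py code linked in the review comment.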