Merge pull request #737 from microlinkhq/profiling
feat: add profiling support
Kikobeats authored Jan 12, 2025
2 parents 522b1bc + e88d2b1 commit 4a3c5c9
Showing 88 changed files with 428 additions and 161 deletions.
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,16 @@
All notable changes to this project will be documented in this file.
See [Conventional Commits](https://conventionalcommits.org) for commit guidelines.

# [5.46.0-beta.0](https://github.com/microlinkhq/metascraper/compare/v5.45.29...v5.46.0-beta.0) (2025-01-10)

### Bug Fixes

* load dependency ([6344788](https://github.com/microlinkhq/metascraper/commit/6344788ddbfc27a03f3ce12b2a842cd438574cc5))

### Features

* add profiling support ([9370e3c](https://github.com/microlinkhq/metascraper/commit/9370e3cdde056e86dcc2d189b3b22dd01a310372))

## [5.45.29](https://github.com/microlinkhq/metascraper/compare/v5.45.28...v5.45.29) (2025-01-07)

### Bug Fixes
34 changes: 26 additions & 8 deletions CONTRIBUTING.md
@@ -42,7 +42,7 @@ A set of rules under the same namespace runs in series and only the value return
You can associate a `test` function with your rule bundle:

```js
rules.test = ({ url }) => getVideoInfo(url).service === 'youtube'))
rules.test = ({ url }) => getVideoInfo(url).service === 'youtube'
```

The `test` function will receive the same arguments as a rule. This is useful for skipping all rules that don't target a specific URL.
@@ -52,12 +52,31 @@ A good practice is to use a memoize function to prevent unnecessary CPU cycles f
```js
const { memoizeOne } = require('@metascraper/helpers')

const test = memoizeOne(url => getVideoInfo(url).service === 'youtube'))
const test = memoizeOne(url => getVideoInfo(url).service === 'youtube')

const rules = []
rules.test ({ url }) => test(url)
rules.test = ({ url }) => test(url)
```

### Defining `pkgName` property

Additionally, you can define a `pkgName` property associated with your rules:

```js
const { memoizeOne } = require('@metascraper/helpers')

const rules = []
rules.pkgName = 'metascraper-module'
```

This is used for printing debug logs; see the debugging section to learn how to use it.
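
As a rough sketch of how `pkgName` travels with a rule bundle, the snippet below builds a bundle and separates its metadata (`test`, `pkgName`) from the actual rule properties. `createRules`, `describeBundle`, the `title` rule and the `'metascraper-example'` name are illustrative assumptions, not part of metascraper's API.

```js
'use strict'

// Hypothetical rule bundle: only `test` and `pkgName` come from the guide above.
const createRules = () => {
  const rules = {
    title: [({ htmlDom: $ }) => $('title').text()]
  }
  // Skip the whole bundle for URLs it does not target.
  rules.test = ({ url }) => url.startsWith('https://example.com')
  // Label used to identify this bundle in debug output.
  rules.pkgName = 'metascraper-example'
  return rules
}

// Illustrative helper: split bundle metadata from the rule properties.
const describeBundle = ({ test, pkgName = 'unknown', ...props }) => ({
  pkgName,
  hasTest: typeof test === 'function',
  props: Object.keys(props)
})

console.log(describeBundle(createRules()))
```

Attaching `pkgName` to the returned object, rather than passing it separately, keeps the label next to the rules it describes, which is the pattern the packages in this commit follow.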

## Debugging your Rules

In case you need to see what's happening under the hood, you can set `DEBUG='metascraper*'`.

This is useful for verifying rule precedence and detecting slow rules.
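
To illustrate the kind of information such debug logs can surface, here is a minimal sketch of a profiler that runs each rule in series, stops at the first truthy value, and reports the time spent per rule tagged with `pkgName`. `profileRules`, its report shape and the sample bundle are hypothetical; this is not how metascraper implements profiling internally.

```js
'use strict'

const { performance } = require('perf_hooks')

// Hypothetical profiler: times every rule of a bundle and tags it with `pkgName`.
const profileRules = async (rules, args, log = console.error) => {
  const { pkgName = 'unknown', test, ...props } = rules
  const report = []

  for (const [propName, propRules] of Object.entries(props)) {
    // A property may hold a single rule or an array of rules.
    for (const [index, rule] of [].concat(propRules).entries()) {
      const start = performance.now()
      const value = await rule(args)
      const duration = performance.now() - start

      report.push({ pkgName, propName, index, duration, matched: Boolean(value) })
      log(`${pkgName} ${propName}#${index} ${duration.toFixed(2)}ms`)

      // Rules run in series; the first truthy value wins.
      if (value) break
    }
  }

  return report
}

// Usage: the second rule resolves, so the third one never runs.
const exampleRules = {
  title: [() => undefined, () => 'hello', () => 'never reached']
}
exampleRules.pkgName = 'metascraper-example'

profileRules(exampleRules, {}).then(report => console.log(report.length))
```

A report like this makes both concerns visible at once: the index of the matching rule shows precedence, and the duration shows which rules are slow.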

## Testing your Rules

Since the order of the rules is important, you should also test it to make sure that more popular rules are executed before less popular ones.
@@ -74,7 +93,6 @@ const metascraper = require('metascraper')([
require('metascraper-logo')()
])


describe('metascraper-logo', () => {
it('creates an absolute favicon url if the logo is not present', async () => {
const html = `
@@ -92,8 +110,8 @@ describe('metascraper-logo', () => {
</body>
</html>
`
const meta = await metascraper({ html, url }))
should(meta.log).be.equal("open graph value")
const meta = await metascraper({ html, url })
should(meta.log).be.equal('open graph value')
})
})
```
@@ -129,8 +147,8 @@ const metascraper = require('metascraper')([
describe('metascraper-logo', () => {
it('it resolves logo value', async () => {
const html = fs.readFileSync('index.html', 'utf-8')
const meta = await metascraper({ html, url }))
should(meta.logo).be.equal("https://metascraper.js.org/static/logo.png")
const meta = await metascraper({ html, url })
should(meta.logo).be.equal('https://metascraper.js.org/static/logo.png')
})
})
```
2 changes: 1 addition & 1 deletion lerna.json
@@ -2,7 +2,7 @@
"packages": [
"packages/*"
],
"version": "5.45.29",
"version": "5.46.0-beta.2",
"command": {
"bootstrap": {
"npmClientArgs": [
4 changes: 4 additions & 0 deletions packages/metascraper-amazon/CHANGELOG.md
@@ -3,6 +3,10 @@
All notable changes to this project will be documented in this file.
See [Conventional Commits](https://conventionalcommits.org) for commit guidelines.

# [5.46.0-beta.0](https://github.com/microlinkhq/metascraper/compare/v5.45.29...v5.46.0-beta.0) (2025-01-10)

**Note:** Version bump only for package metascraper-amazon

## [5.45.28](https://github.com/microlinkhq/metascraper/compare/v5.45.27...v5.45.28) (2025-01-01)

**Note:** Version bump only for package metascraper-amazon
2 changes: 1 addition & 1 deletion packages/metascraper-amazon/package.json
@@ -2,7 +2,7 @@
"name": "metascraper-amazon",
"description": "Metascraper integration with Amazon",
"homepage": "https://github.com/microlinkhq/metascraper/packages/metascraper-amazon",
"version": "5.45.28",
"version": "5.46.0-beta.2",
"types": "src/index.d.ts",
"main": "src/index.js",
"author": {
2 changes: 2 additions & 0 deletions packages/metascraper-amazon/src/index.js
@@ -60,5 +60,7 @@ module.exports = () => {

rules.test = ({ url }) => test(url)

rules.pkgName = 'metascraper-amazon'

return rules
}
4 changes: 4 additions & 0 deletions packages/metascraper-audio/CHANGELOG.md
@@ -3,6 +3,10 @@
All notable changes to this project will be documented in this file.
See [Conventional Commits](https://conventionalcommits.org) for commit guidelines.

# [5.46.0-beta.0](https://github.com/microlinkhq/metascraper/compare/v5.45.29...v5.46.0-beta.0) (2025-01-10)

**Note:** Version bump only for package metascraper-audio

## [5.45.28](https://github.com/microlinkhq/metascraper/compare/v5.45.27...v5.45.28) (2025-01-01)

**Note:** Version bump only for package metascraper-audio
2 changes: 1 addition & 1 deletion packages/metascraper-audio/package.json
@@ -2,7 +2,7 @@
"name": "metascraper-audio",
"description": "Get audio property from HTML markup",
"homepage": "https://github.com/microlinkhq/metascraper/packages/metascraper-audio",
"version": "5.45.28",
"version": "5.46.0-beta.2",
"types": "src/index.d.ts",
"main": "src/index.js",
"author": {
6 changes: 5 additions & 1 deletion packages/metascraper-audio/src/index.js
@@ -78,7 +78,7 @@ const _getIframe = (url, $, { src }) =>
loadIframe(url, $.load(`<iframe src="${src}"></iframe>`))

module.exports = ({ getIframe = _getIframe } = {}) => {
return {
const rules = {
audio: audioRules.concat(
async ({ htmlDom: $, url }) => {
const srcs = [
@@ -110,4 +110,8 @@ module.exports = ({ getIframe = _getIframe } = {}) => {
}
)
}

rules.pkgName = 'metascraper-audio'

return rules
}
4 changes: 4 additions & 0 deletions packages/metascraper-author/CHANGELOG.md
@@ -3,6 +3,10 @@
All notable changes to this project will be documented in this file.
See [Conventional Commits](https://conventionalcommits.org) for commit guidelines.

# [5.46.0-beta.0](https://github.com/microlinkhq/metascraper/packages/metascraper-author/compare/v5.45.29...v5.46.0-beta.0) (2025-01-10)

**Note:** Version bump only for package metascraper-author

## [5.45.28](https://github.com/microlinkhq/metascraper/packages/metascraper-author/compare/v5.45.27...v5.45.28) (2025-01-01)

**Note:** Version bump only for package metascraper-author
2 changes: 1 addition & 1 deletion packages/metascraper-author/package.json
@@ -2,7 +2,7 @@
"name": "metascraper-author",
"description": "Get author property from HTML markup",
"homepage": "https://metascraper.js.org",
"version": "5.45.28",
"version": "5.46.0-beta.2",
"types": "src/index.d.ts",
"main": "src/index.js",
"author": {
52 changes: 29 additions & 23 deletions packages/metascraper-author/src/index.js
@@ -24,27 +24,33 @@ const strict = rule => $ => {
return REGEX_STRICT.test(value) && value
}

module.exports = () => ({
author: [
toAuthor($jsonld('author.name')),
toAuthor($jsonld('brand.name')),
toAuthor($ => $('meta[name="author"]').attr('content')),
toAuthor($ => $('meta[property="article:author"]').attr('content')),
toAuthor($ => $filter($, $('[itemprop*="author" i] [itemprop="name"]'))),
toAuthor($ => $filter($, $('[itemprop*="author" i]'))),
toAuthor($ => $filter($, $('[rel="author"]'))),
strict(toAuthor($ => $filter($, $('a[class*="author" i]')))),
strict(toAuthor($ => $filter($, $('[class*="author" i] a')))),
strict(toAuthor($ => $filter($, $('a[href*="/author/" i]')))),
toAuthor($ => $filter($, $('a[class*="screenname" i]'))),
strict(toAuthor($ => $filter($, $('[class*="author" i]')))),
strict(
toAuthor($ =>
$filter($, $('[class*="byline" i]'), el => {
const value = $filter.fn(el)
return !date(value) && value
})
module.exports = () => {
const rules = {
author: [
toAuthor($jsonld('author.name')),
toAuthor($jsonld('brand.name')),
toAuthor($ => $('meta[name="author"]').attr('content')),
toAuthor($ => $('meta[property="article:author"]').attr('content')),
toAuthor($ => $filter($, $('[itemprop*="author" i] [itemprop="name"]'))),
toAuthor($ => $filter($, $('[itemprop*="author" i]'))),
toAuthor($ => $filter($, $('[rel="author"]'))),
strict(toAuthor($ => $filter($, $('a[class*="author" i]')))),
strict(toAuthor($ => $filter($, $('[class*="author" i] a')))),
strict(toAuthor($ => $filter($, $('a[href*="/author/" i]')))),
toAuthor($ => $filter($, $('a[class*="screenname" i]'))),
strict(toAuthor($ => $filter($, $('[class*="author" i]')))),
strict(
toAuthor($ =>
$filter($, $('[class*="byline" i]'), el => {
const value = $filter.fn(el)
return !date(value) && value
})
)
)
)
]
})
]
}

rules.pkgName = 'metascraper-author'

return rules
}
4 changes: 4 additions & 0 deletions packages/metascraper-clearbit/CHANGELOG.md
@@ -3,6 +3,10 @@
All notable changes to this project will be documented in this file.
See [Conventional Commits](https://conventionalcommits.org) for commit guidelines.

# [5.46.0-beta.0](https://github.com/microlinkhq/metascraper/compare/v5.45.29...v5.46.0-beta.0) (2025-01-10)

**Note:** Version bump only for package metascraper-clearbit

## [5.45.28](https://github.com/microlinkhq/metascraper/compare/v5.45.27...v5.45.28) (2025-01-01)

**Note:** Version bump only for package metascraper-clearbit
2 changes: 1 addition & 1 deletion packages/metascraper-clearbit/package.json
@@ -2,7 +2,7 @@
"name": "metascraper-clearbit",
"description": "Metascraper integration with Clearbit Logo API",
"homepage": "https://github.com/microlinkhq/metascraper/packages/metascraper-clearbit",
"version": "5.45.28",
"version": "5.46.0-beta.2",
"types": "src/index.d.ts",
"main": "src/index.js",
"author": {
6 changes: 5 additions & 1 deletion packages/metascraper-clearbit/src/index.js
@@ -43,8 +43,12 @@ module.exports = opts => {
const clearbit = createClearbit(opts)
const getClearbit = composeRule(($, url) => clearbit(parseUrl(url).domain))

return {
const rules = {
logo: getClearbit({ from: 'logo' }),
publisher: getClearbit({ from: 'name', to: 'publisher' })
}

rules.pkgName = 'metascraper-clearbit'

return rules
}
4 changes: 4 additions & 0 deletions packages/metascraper-date/CHANGELOG.md
@@ -3,6 +3,10 @@
All notable changes to this project will be documented in this file.
See [Conventional Commits](https://conventionalcommits.org) for commit guidelines.

# [5.46.0-beta.0](https://github.com/microlinkhq/metascraper/compare/v5.45.29...v5.46.0-beta.0) (2025-01-10)

**Note:** Version bump only for package metascraper-date

## [5.45.28](https://github.com/microlinkhq/metascraper/compare/v5.45.27...v5.45.28) (2025-01-01)

**Note:** Version bump only for package metascraper-date
2 changes: 1 addition & 1 deletion packages/metascraper-date/package.json
@@ -2,7 +2,7 @@
"name": "metascraper-date",
"description": "Get date property from HTML markup",
"homepage": "https://github.com/microlinkhq/metascraper/packages/metascraper-date",
"version": "5.45.28",
"version": "5.46.0-beta.2",
"types": "src/index.d.ts",
"main": "src/index.js",
"author": {
10 changes: 6 additions & 4 deletions packages/metascraper-date/src/index.js
@@ -43,17 +43,19 @@ module.exports = (
dateModified: false
}
) => {
const result = {
const rules = {
date: dateModifiedRules().concat(datePublishedRules(), dateRules())
}

if (datePublished) {
result.datePublished = datePublishedRules()
rules.datePublished = datePublishedRules()
}

if (dateModified) {
result.dateModified = dateModifiedRules()
rules.dateModified = dateModifiedRules()
}

return result
rules.pkgName = 'metascraper-date'

return rules
}
4 changes: 4 additions & 0 deletions packages/metascraper-description/CHANGELOG.md
@@ -3,6 +3,10 @@
All notable changes to this project will be documented in this file.
See [Conventional Commits](https://conventionalcommits.org) for commit guidelines.

# [5.46.0-beta.0](https://github.com/microlinkhq/metascraper/compare/v5.45.29...v5.46.0-beta.0) (2025-01-10)

**Note:** Version bump only for package metascraper-description

## [5.45.28](https://github.com/microlinkhq/metascraper/compare/v5.45.27...v5.45.28) (2025-01-01)

**Note:** Version bump only for package metascraper-description
2 changes: 1 addition & 1 deletion packages/metascraper-description/package.json
@@ -2,7 +2,7 @@
"name": "metascraper-description",
"description": "Get description property from HTML markup",
"homepage": "https://github.com/microlinkhq/metascraper/packages/metascraper-description",
"version": "5.45.28",
"version": "5.46.0-beta.2",
"types": "src/index.d.ts",
"main": "src/index.js",
"author": {
6 changes: 5 additions & 1 deletion packages/metascraper-description/src/index.js
@@ -5,7 +5,7 @@ const { $jsonld, toRule, description } = require('@metascraper/helpers')
module.exports = opts => {
const toDescription = toRule(description, opts)

return {
const rules = {
description: [
toDescription($ => $('meta[property="og:description"]').attr('content')),
toDescription($ => $('meta[name="twitter:description"]').attr('content')),
@@ -18,4 +18,8 @@ module.exports = opts => {
toDescription($jsonld('description'))
]
}

rules.pkgName = 'metascraper-description'

return rules
}
4 changes: 4 additions & 0 deletions packages/metascraper-feed/CHANGELOG.md
@@ -3,6 +3,10 @@
All notable changes to this project will be documented in this file.
See [Conventional Commits](https://conventionalcommits.org) for commit guidelines.

# [5.46.0-beta.0](https://github.com/microlinkhq/metascraper/compare/v5.45.29...v5.46.0-beta.0) (2025-01-10)

**Note:** Version bump only for package metascraper-feed

## [5.45.28](https://github.com/microlinkhq/metascraper/compare/v5.45.27...v5.45.28) (2025-01-01)

**Note:** Version bump only for package metascraper-feed
2 changes: 1 addition & 1 deletion packages/metascraper-feed/package.json
@@ -2,7 +2,7 @@
"name": "metascraper-feed",
"description": "Get RSS/Atom feed URL from HTML markup",
"homepage": "https://github.com/microlinkhq/metascraper/packages/metascraper-description",
"version": "5.45.28",
"version": "5.46.0-beta.2",
"types": "src/index.d.ts",
"main": "src/index.js",
"author": {
6 changes: 5 additions & 1 deletion packages/metascraper-feed/src/index.js
@@ -5,11 +5,15 @@ const { toRule, url } = require('@metascraper/helpers')
const toUrl = toRule(url)

module.exports = () => {
return {
const rules = {
feed: [
toUrl($ => $('link[type="application/rss+xml"]').attr('href')),
toUrl($ => $('link[type="application/feed+json"]').attr('href')),
toUrl($ => $('link[type="application/atom+xml"]').attr('href'))
]
}

rules.pkgName = 'metascraper-feed'

return rules
}