Skip to content

修改GetTitle函数的注释,增加了CefSharp的使用 #9

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -26,4 +26,6 @@ obj/
_ReSharper*/
[Tt]est[Rr]esult*
*.vssscc
$tf*/
$tf*/
.vs
packages
8 changes: 8 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,13 @@
# Html2Article

## Html2Article特色
本项目在 https://github.com/stanzhai/Html2Article 的基础上进行了部分修改,深深感谢原作者的开源精神及为此做出的努力和付出。

* 增加试用CefSharp来加载url
* 增加首先从Meta标签中获取文章发布日期,目前兼容新浪、腾讯、网易及符合《国务院办公厅关于印发政府网站发展指引的通知》标准的政府网站(参考:http://www.gov.cn/zhengce/content/2017-06/08/content_5200760.htm)
* 增加首先从Meta标签中获取文章标题,目前兼容符合《国务院办公厅关于印发政府网站发展指引的通知》标准的政府网站

以下为原作者描述:
.NET平台下,一个高效的从Html中提取正文的工具。
正文提取采用了基于文本密度的提取算法,支持从压缩的Html文档中提取正文,每个页面平均提取时间为30ms,正确率在95%以上。
![Html2Article](http://stanzhai.github.io/images/project/Html2Article.png)
Expand Down
37 changes: 37 additions & 0 deletions src/Demo/ControlExtensions.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
// Copyright © 2010-2015 The CefSharp Authors. All rights reserved.
//
// Use of this source code is governed by a BSD-style license that can be found in the LICENSE file.

using System;
using System.Windows.Forms;

namespace Demo
{
public static class ControlExtensions
{
/// <summary>
/// Executes the Action asynchronously on the UI thread, does not block execution on the calling thread.
/// </summary>
/// <param name="control">the control for which the update is required</param>
/// <param name="action">action to be performed on the control</param>
public static void InvokeOnUiThreadIfRequired(this Control control, Action action)
{
//If you are planning on using a similar function in your own code then please be sure to
//have a quick read over https://stackoverflow.com/questions/1874728/avoid-calling-invoke-when-the-control-is-disposed
//No action
if (control.Disposing || control.IsDisposed || !control.IsHandleCreated)
{
return;
}

if (control.InvokeRequired)
{
control.BeginInvoke(action);
}
else
{
action.Invoke();
}
}
}
}
215 changes: 122 additions & 93 deletions src/Demo/Demo.csproj
Original file line number Diff line number Diff line change
@@ -1,100 +1,129 @@
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PropertyGroup>
<Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
<Platform Condition=" '$(Platform)' == '' ">x86</Platform>
<ProductVersion>8.0.30703</ProductVersion>
<SchemaVersion>2.0</SchemaVersion>
<ProjectGuid>{7F199520-F681-4BB7-B64F-79F68B1DFA43}</ProjectGuid>
<OutputType>WinExe</OutputType>
<AppDesignerFolder>Properties</AppDesignerFolder>
<RootNamespace>Demo</RootNamespace>
<AssemblyName>Demo</AssemblyName>
<TargetFrameworkVersion>v4.0</TargetFrameworkVersion>
<TargetFrameworkProfile>
</TargetFrameworkProfile>
<FileAlignment>512</FileAlignment>
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|x86' ">
<PlatformTarget>x86</PlatformTarget>
<DebugSymbols>true</DebugSymbols>
<DebugType>full</DebugType>
<Optimize>false</Optimize>
<OutputPath>..\..\bin\Debug\</OutputPath>
<DefineConstants>DEBUG;TRACE</DefineConstants>
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|x86' ">
<PlatformTarget>x86</PlatformTarget>
<DebugType>pdbonly</DebugType>
<Optimize>true</Optimize>
<OutputPath>..\..\bin\Release\</OutputPath>
<DefineConstants>TRACE</DefineConstants>
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
</PropertyGroup>
<ItemGroup>
<Reference Include="System" />
<Reference Include="System.Core" />
<Reference Include="System.Web" />
<Reference Include="System.Xml.Linq" />
<Reference Include="System.Data.DataSetExtensions" />
<Reference Include="Microsoft.CSharp" />
<Reference Include="System.Data" />
<Reference Include="System.Deployment" />
<Reference Include="System.Drawing" />
<Reference Include="System.Windows.Forms" />
<Reference Include="System.Xml" />
</ItemGroup>
<ItemGroup>
<Compile Include="FrmMain.cs">
<SubType>Form</SubType>
</Compile>
<Compile Include="FrmMain.Designer.cs">
<DependentUpon>FrmMain.cs</DependentUpon>
</Compile>
<Compile Include="Program.cs" />
<Compile Include="Properties\AssemblyInfo.cs" />
<EmbeddedResource Include="FrmMain.resx">
<DependentUpon>FrmMain.cs</DependentUpon>
</EmbeddedResource>
<EmbeddedResource Include="Properties\Resources.resx">
<Generator>ResXFileCodeGenerator</Generator>
<LastGenOutput>Resources.Designer.cs</LastGenOutput>
<SubType>Designer</SubType>
</EmbeddedResource>
<Compile Include="Properties\Resources.Designer.cs">
<AutoGen>True</AutoGen>
<DependentUpon>Resources.resx</DependentUpon>
<DesignTime>True</DesignTime>
</Compile>
<None Include="app.config">
<SubType>Designer</SubType>
</None>
<None Include="Properties\DataSources\Article.datasource" />
<None Include="Properties\Settings.settings">
<Generator>SettingsSingleFileGenerator</Generator>
<LastGenOutput>Settings.Designer.cs</LastGenOutput>
</None>
<Compile Include="Properties\Settings.Designer.cs">
<AutoGen>True</AutoGen>
<DependentUpon>Settings.settings</DependentUpon>
<DesignTimeSharedInput>True</DesignTimeSharedInput>
</Compile>
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\Html2Article\Html2Article.csproj">
<Project>{0EDCFCDB-08FC-43AE-B6BB-BAF8D6EDF61E}</Project>
<Name>Html2Article</Name>
</ProjectReference>
</ItemGroup>
<Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="12.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<Import Project="..\..\packages\CefSharp.Common.100.0.230\build\CefSharp.Common.props" Condition="Exists('..\..\packages\CefSharp.Common.100.0.230\build\CefSharp.Common.props')" />
<Import Project="..\..\packages\cef.redist.x86.100.0.23\build\cef.redist.x86.props" Condition="Exists('..\..\packages\cef.redist.x86.100.0.23\build\cef.redist.x86.props')" />
<Import Project="..\..\packages\cef.redist.x64.100.0.23\build\cef.redist.x64.props" Condition="Exists('..\..\packages\cef.redist.x64.100.0.23\build\cef.redist.x64.props')" />
<PropertyGroup>
<Configuration Condition=" '$(Configuration)' == '' ">Debug</Configuration>
<Platform Condition=" '$(Platform)' == '' ">x86</Platform>
<ProductVersion>8.0.30703</ProductVersion>
<SchemaVersion>2.0</SchemaVersion>
<ProjectGuid>{7F199520-F681-4BB7-B64F-79F68B1DFA43}</ProjectGuid>
<OutputType>WinExe</OutputType>
<AppDesignerFolder>Properties</AppDesignerFolder>
<RootNamespace>Demo</RootNamespace>
<AssemblyName>Demo</AssemblyName>
<TargetFrameworkVersion>v4.8</TargetFrameworkVersion>
<TargetFrameworkProfile>
</TargetFrameworkProfile>
<FileAlignment>512</FileAlignment>
<NuGetPackageImportStamp>
</NuGetPackageImportStamp>
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Debug|x86' ">
<PlatformTarget>x86</PlatformTarget>
<DebugSymbols>true</DebugSymbols>
<DebugType>full</DebugType>
<Optimize>false</Optimize>
<OutputPath>..\..\bin\Debug\</OutputPath>
<DefineConstants>DEBUG;TRACE</DefineConstants>
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
<Prefer32Bit>false</Prefer32Bit>
</PropertyGroup>
<PropertyGroup Condition=" '$(Configuration)|$(Platform)' == 'Release|x86' ">
<PlatformTarget>x86</PlatformTarget>
<DebugType>pdbonly</DebugType>
<Optimize>true</Optimize>
<OutputPath>..\..\bin\Release\</OutputPath>
<DefineConstants>TRACE</DefineConstants>
<ErrorReport>prompt</ErrorReport>
<WarningLevel>4</WarningLevel>
<Prefer32Bit>false</Prefer32Bit>
</PropertyGroup>
<ItemGroup>
<Reference Include="CefSharp, Version=100.0.230.0, Culture=neutral, PublicKeyToken=40c4b6fc221f4138, processorArchitecture=MSIL">
<HintPath>..\..\packages\CefSharp.Common.100.0.230\lib\net452\CefSharp.dll</HintPath>
</Reference>
<Reference Include="CefSharp.Core, Version=100.0.230.0, Culture=neutral, PublicKeyToken=40c4b6fc221f4138, processorArchitecture=MSIL">
<HintPath>..\..\packages\CefSharp.Common.100.0.230\lib\net452\CefSharp.Core.dll</HintPath>
</Reference>
<Reference Include="CefSharp.WinForms, Version=100.0.230.0, Culture=neutral, PublicKeyToken=40c4b6fc221f4138, processorArchitecture=MSIL">
<HintPath>..\..\packages\CefSharp.WinForms.100.0.230\lib\net462\CefSharp.WinForms.dll</HintPath>
</Reference>
<Reference Include="System" />
<Reference Include="System.Core" />
<Reference Include="System.Web" />
<Reference Include="System.Xml.Linq" />
<Reference Include="System.Data.DataSetExtensions" />
<Reference Include="Microsoft.CSharp" />
<Reference Include="System.Data" />
<Reference Include="System.Deployment" />
<Reference Include="System.Drawing" />
<Reference Include="System.Windows.Forms" />
<Reference Include="System.Xml" />
</ItemGroup>
<ItemGroup>
<Compile Include="ControlExtensions.cs" />
<Compile Include="FrmMain.cs">
<SubType>Form</SubType>
</Compile>
<Compile Include="FrmMain.Designer.cs">
<DependentUpon>FrmMain.cs</DependentUpon>
</Compile>
<Compile Include="Program.cs" />
<Compile Include="Properties\AssemblyInfo.cs" />
<EmbeddedResource Include="FrmMain.resx">
<DependentUpon>FrmMain.cs</DependentUpon>
<SubType>Designer</SubType>
</EmbeddedResource>
<EmbeddedResource Include="Properties\Resources.resx">
<Generator>ResXFileCodeGenerator</Generator>
<LastGenOutput>Resources.Designer.cs</LastGenOutput>
<SubType>Designer</SubType>
</EmbeddedResource>
<Compile Include="Properties\Resources.Designer.cs">
<AutoGen>True</AutoGen>
<DependentUpon>Resources.resx</DependentUpon>
<DesignTime>True</DesignTime>
</Compile>
<None Include="app.config">
<SubType>Designer</SubType>
</None>
<None Include="packages.config" />
<None Include="Properties\DataSources\Article.datasource" />
<None Include="Properties\Settings.settings">
<Generator>SettingsSingleFileGenerator</Generator>
<LastGenOutput>Settings.Designer.cs</LastGenOutput>
</None>
<Compile Include="Properties\Settings.Designer.cs">
<AutoGen>True</AutoGen>
<DependentUpon>Settings.settings</DependentUpon>
<DesignTimeSharedInput>True</DesignTimeSharedInput>
</Compile>
</ItemGroup>
<ItemGroup>
<ProjectReference Include="..\Html2Article\Html2Article.csproj">
<Project>{0EDCFCDB-08FC-43AE-B6BB-BAF8D6EDF61E}</Project>
<Name>Html2Article</Name>
</ProjectReference>
</ItemGroup>
<Import Project="$(MSBuildToolsPath)\Microsoft.CSharp.targets" />
<Target Name="EnsureNuGetPackageBuildImports" BeforeTargets="PrepareForBuild">
<PropertyGroup>
<ErrorText>这台计算机上缺少此项目引用的 NuGet 程序包。使用“NuGet 程序包还原”可下载这些程序包。有关更多信息,请参见 http://go.microsoft.com/fwlink/?LinkID=322105。缺少的文件是 {0}。</ErrorText>
</PropertyGroup>
<Error Condition="!Exists('..\..\packages\cef.redist.x64.100.0.23\build\cef.redist.x64.props')" Text="$([System.String]::Format('$(ErrorText)', '..\..\packages\cef.redist.x64.100.0.23\build\cef.redist.x64.props'))" />
<Error Condition="!Exists('..\..\packages\cef.redist.x86.100.0.23\build\cef.redist.x86.props')" Text="$([System.String]::Format('$(ErrorText)', '..\..\packages\cef.redist.x86.100.0.23\build\cef.redist.x86.props'))" />
<Error Condition="!Exists('..\..\packages\CefSharp.Common.100.0.230\build\CefSharp.Common.props')" Text="$([System.String]::Format('$(ErrorText)', '..\..\packages\CefSharp.Common.100.0.230\build\CefSharp.Common.props'))" />
<Error Condition="!Exists('..\..\packages\CefSharp.Common.100.0.230\build\CefSharp.Common.targets')" Text="$([System.String]::Format('$(ErrorText)', '..\..\packages\CefSharp.Common.100.0.230\build\CefSharp.Common.targets'))" />
</Target>
<Import Project="..\..\packages\CefSharp.Common.100.0.230\build\CefSharp.Common.targets" Condition="Exists('..\..\packages\CefSharp.Common.100.0.230\build\CefSharp.Common.targets')" />
<!-- To modify your build process, add your task inside one of the targets below and uncomment it.
Other similar extension points exist, see Microsoft.Common.targets.
<Target Name="BeforeBuild">
</Target>
<Target Name="AfterBuild">
</Target>
-->
-->
</Project>
Loading