<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://wiki.huihoo.com/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="zh-cn">
		<id>http://wiki.huihoo.com/wiki/?action=history&amp;feed=atom&amp;title=Web-Harvest</id>
		<title>Web-Harvest - Revision history</title>
		<link rel="self" type="application/atom+xml" href="http://wiki.huihoo.com/wiki/?action=history&amp;feed=atom&amp;title=Web-Harvest"/>
		<link rel="alternate" type="text/html" href="http://wiki.huihoo.com/wiki/?title=Web-Harvest&amp;action=history"/>
		<updated>2026-04-09T19:15:41Z</updated>
		<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.19.2</generator>

	<entry>
		<id>http://wiki.huihoo.com/wiki/?title=Web-Harvest&amp;diff=124271&amp;oldid=prev</id>
		<title>Allen: /* Links */</title>
		<link rel="alternate" type="text/html" href="http://wiki.huihoo.com/wiki/?title=Web-Harvest&amp;diff=124271&amp;oldid=prev"/>
				<updated>2013-01-31T04:43:54Z</updated>
		
		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Links&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 04:43, 31 January 2013&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 20:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 20:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;==Links==&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;==Links==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;*http://web-harvest.sourceforge.net&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;*http://web-harvest.sourceforge.net&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;[[category:XML]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;[[category:java]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Allen</name></author>	</entry>

	<entry>
		<id>http://wiki.huihoo.com/wiki/?title=Web-Harvest&amp;diff=38534&amp;oldid=prev</id>
		<title>Allen at 00:49, 27 September 2010</title>
		<link rel="alternate" type="text/html" href="http://wiki.huihoo.com/wiki/?title=Web-Harvest&amp;diff=38534&amp;oldid=prev"/>
				<updated>2010-09-27T00:49:49Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class='diff diff-contentalign-left'&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
				&lt;col class='diff-marker' /&gt;
				&lt;col class='diff-content' /&gt;
			&lt;tr valign='top'&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;← Older revision&lt;/td&gt;
			&lt;td colspan='2' style=&quot;background-color: white; color:black;&quot;&gt;Revision as of 00:49, 27 September 2010&lt;/td&gt;
			&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;{{top news}}&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;background: #cfc; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;color: red; font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Web-Harvest is an open source Web data extraction tool written in Java. It offers a way to collect desired Web pages and extract useful data from them. To do that, it leverages well-established techniques and technologies for text/XML manipulation such as XSLT, XQuery and regular expressions. Web-Harvest mainly focuses on HTML/XML-based web sites, which still make up the vast majority of Web content. It can also be easily supplemented by custom Java libraries to augment its extraction capabilities.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;div&gt;Web-Harvest is an open source Web data extraction tool written in Java. It offers a way to collect desired Web pages and extract useful data from them. To do that, it leverages well-established techniques and technologies for text/XML manipulation such as XSLT, XQuery and regular expressions. Web-Harvest mainly focuses on HTML/XML-based web sites, which still make up the vast majority of Web content. It can also be easily supplemented by custom Java libraries to augment its extraction capabilities.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background: #eee; color:black; font-size: smaller;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Allen</name></author>	</entry>

	<entry>
		<id>http://wiki.huihoo.com/wiki/?title=Web-Harvest&amp;diff=17553&amp;oldid=prev</id>
		<title>Allen at 12:39, 17 January 2008</title>
		<link rel="alternate" type="text/html" href="http://wiki.huihoo.com/wiki/?title=Web-Harvest&amp;diff=17553&amp;oldid=prev"/>
				<updated>2008-01-17T12:39:36Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;Web-Harvest is an open source Web data extraction tool written in Java. It offers a way to collect desired Web pages and extract useful data from them. To do that, it leverages well-established techniques and technologies for text/XML manipulation such as XSLT, XQuery and regular expressions. Web-Harvest mainly focuses on HTML/XML-based web sites, which still make up the vast majority of Web content. It can also be easily supplemented by custom Java libraries to augment its extraction capabilities.&lt;br /&gt;
&lt;br /&gt;
==Features==&lt;br /&gt;
* A graphical user interface is introduced, providing an environment for easier configuration development and testing.&lt;br /&gt;
* The html-to-xml processor, which is based on HtmlCleaner, now exposes attributes for controlling the cleaner's behaviour.&lt;br /&gt;
* Besides the BeanShell scripting engine, two others are added: Groovy and JavaScript. It is now possible to choose a preferred scripting engine or even mix them in a single Web-Harvest configuration. This is supported by new attributes on the config, script and template processors.&lt;br /&gt;
* Access to the HTTP client is supported through the implicit context variable http. It is now possible to check important HTTP response values, such as http.mimeType, http.headers and http.statusCode, or even to obtain the instance of the org.apache.commons.httpclient.HttpClient class with http.client and manipulate it at runtime.&lt;br /&gt;
* A new attribute, cookie-policy, is added to the http processor, specifying how HttpClient manages cookies.&lt;br /&gt;
* Command-line use is improved by adding several new parameters.&lt;br /&gt;
* For more comfortable use of Web-Harvest context variables in the script engines' runtime scopes, several handy methods are added to the class org.webharvest.runtime.variables.Variable (interface IVariable in previous versions of Web-Harvest).&lt;br /&gt;
* Several useful methods are added to the implicit Web-Harvest context variable sys, such as sys.xpath(expression, xml), sys.isVariableDefined(varname) and sys.defineVariable(varName, varValue, [overwrite]).&lt;br /&gt;
* An overwrite attribute is added to the var-def processor, making it possible to specify whether an existing variable with the specified name will be overwritten.&lt;br /&gt;
* A new processor, &amp;lt;exit condition=... message=.../&amp;gt;, is introduced to support conditionally breaking execution.&lt;br /&gt;
* Encoding selection in the http processor is changed: if the encoding is not explicitly specified with the charset attribute, the one given in the HTTP response is used to read the downloaded text content.&lt;br /&gt;
* NTLM proxy authentication scheme is supported.&lt;br /&gt;
* Performance improvements and bug fixes.&lt;br /&gt;
&lt;br /&gt;
==Links==&lt;br /&gt;
*http://web-harvest.sourceforge.net&lt;/div&gt;</summary>
		<author><name>Allen</name></author>	</entry>

	</feed>