www.Inmagic.com    Inmagic Forums    Inmagic Forums  Hop To Forum Categories  Gatherer    Errors in Logfile
GTB
Posted
Hello,
Whenever I run the spider manually, it seems to run as normal, but after around 5 minutes it stops. When I look in the log file for an indication as to why, it gives me this error for each .htm file it tries to import:

Spider error: Host not responding.


Any ideas why?

Thanks
 
Posts: 19 | Location: Manchester, UK | Registered: Fri October 13 2000
Posted
The spider hasn't even gotten to the import stage yet--Host not responding means that the spider can't even get to the page. Check that your initial URL and your domain list are correct (no typos, etc., and the domain list should just be www.domain.com--no http://, but the initial URL should have http://).

This could also be a permissions problem if there is a proxy server between the machine with the spider and the machine(s) you're trying to spider. You might need to set the logon setting under NT services, so that it's using an account with correct permissions instead of the system account.
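The formatting rule above (bare host names in the domain list, a full http:// address for the initial URL) can be checked mechanically before starting a cycle. The following is only a sketch: `check_spider_settings` is a hypothetical helper, not part of the spider, and both the acceptance of https and the rule that the initial URL's host must appear in the domain list are assumptions.

```python
from urllib.parse import urlsplit


def check_spider_settings(initial_url, domains):
    """Return a list of readable problems; an empty list means the entries look sane."""
    problems = []

    # The initial URL should carry its scheme, e.g. http://www.domain.com/
    # (accepting https here is an assumption; the thread only mentions http://).
    parts = urlsplit(initial_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append(f"initial URL should start with http://: {initial_url!r}")

    # Domain entries should be bare host names, e.g. www.domain.com
    for domain in domains:
        if "://" in domain or "/" in domain or domain != domain.strip():
            problems.append(f"domain entry should be a bare host name: {domain!r}")

    # Assumption: the host of the initial URL is expected to be in the domain list.
    host = (parts.hostname or "").lower()
    if host and host not in [d.strip().lower() for d in domains]:
        problems.append(f"initial URL host {host!r} is not in the domain list")

    return problems


for problem in check_spider_settings("www.domain.com", ["http://www.domain.com"]):
    print(problem)
```

With the two entries swapped as in the usage lines, the helper reports both mistakes; with `http://www.domain.com/` and `www.domain.com` it reports nothing.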
 
Posts: 181 | Location: Boston, MA | Registered: Thu July 13 2000
GTB
Posted
What I don't understand then is, if the spider can't reach the page, how is it managing to import the new documents into the textbase I specify? Seemingly, for every document I get the Host not responding message, but even so it is still imported.

** Sorry, what I say above is not entirely true. I set the spider to check all documents, rather than ones it had already done, so maybe this is why it was coming up with these errors, as there could be a fair few dead links on our Intranet. I was wondering though, do successful imports of documents get a mention in the log?

quote:
Originally posted by rachel:
The spider hasn't even gotten to the import stage yet--Host not responding means that the spider can't even get to the page. Check that your initial URL and your domain list are correct (no typos, etc., and the domain list should just be www.domain.com--no http://, but the initial URL should have http://).

This could also be a permissions problem if there is a proxy server between the machine with the spider and the machine(s) you're trying to spider. You might need to set the logon setting under NT services, so that it's using an account with correct permissions instead of the system account.




[This message has been edited by GTB (edited 02 February 2001).]
 
Posts: 19 | Location: Manchester, UK | Registered: Fri October 13 2000
Posted
quote:
Originally posted by GTB:
I set the spider to check all documents, rather than ones it had already done so maybe this is why it was coming up with these errors, as there could be a fair few dead links on our Intranet.


Technically there's no setting to get it to only re-check the ones you have already spidered. You can get it to skip the ones you've already spidered if unchanged, but it will still get new stuff in the specified locations, or it will redo everything, every time.

If you have a theory about the broken links being the cause, you could use the spider monitor and check the broken links button to see. I have a feeling, though, that you'd get a different message. Something like: Spider error - HTTP error 404 File not found, URL: .

In my spider log I have both Host Not Responding and 404 File not found. The latter for broken links and the former for links that are invalid. Since it does write the URL into the log, you could copy and paste a few of them into a web browser to see what happens. If it's a dead site, you should get a Host Not Responding message in a web browser too. If you get a site, then it may have just been down when you spidered. If there are a lot of them though, and they all work through the web browser, then it may be that the spider didn't have the right access to spider those sites (ie, like my example in the first reply).
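The browser check described above can also be done in bulk. This is a rough sketch, assuming Python's standard `urllib` is an acceptable stand-in for a browser; `classify_url` is a hypothetical helper, and its wording only approximates the spider's log messages. Run it from the machine (and ideally under the account) the spider uses, since the result depends on network access and permissions.

```python
import urllib.error
import urllib.request


def classify_url(url, opener=urllib.request.urlopen, timeout=10):
    """Fetch url once and describe the outcome in roughly the log's terms."""
    try:
        with opener(url, timeout=timeout) as response:
            return f"OK ({response.status})"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 401.
        return f"HTTP error {err.code}"
    except urllib.error.URLError as err:
        # No usable answer at all: DNS failure, refused connection, and so on.
        return f"host not reachable ({err.reason})"
    except OSError as err:
        # Timeouts and other socket-level failures not wrapped by urllib.
        return f"host not reachable ({err})"
```

The `opener` parameter exists only so the function can be exercised without a network; in normal use, call `classify_url(url)` for each URL copied from the log.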

quote:
I was wondering though, do successful imports of documents get a mention in the log?


Not which documents, no. I guess it's sort of assumed that if a page was successfully spidered, then it would also have been imported. An exception might be if the time the spider is supposed to be running ends and it hasn't finished importing everything. But the spider would resume where it left off the next time it is set to run. (Though if you're running it manually, you might be discarding these changes the next time you start a cycle. This is probably only likely if it takes a long time to spider everything and you don't realise it is taking hours/days/whatever.)

Another way to compare is with the spider monitor--it will say how many documents were spidered, and how many were imported. It will update itself while the spider is running and give you a tally of each. It can spider/crawl faster than it can convert and import, so expect the numbers not to match up until the end.

OR, you could turn the textbase log on (in the textbase, Maintain>Edit textbase structure) to see this kind of information. I usually only use the monitor, but the textbase log is more handy for individual records.

------------------
Rachel
 
Posts: 181 | Location: Boston, MA | Registered: Thu July 13 2000
GTB
Posted
That's great, thanks for the help.

Just one more question. I've noticed that on some of the links that have been imported, the links have been imported as, for example:

D:\Intranet\wwwroot\Working\tis\wr\0Wren974.htm

Rather than the correct way:
http://planet.bdp.co.uk/it/antivirus/0viruses.htm

The same spider cycle has imported these differing links, when they all should be in the second format detailed.

I guess (I hope) that this is probably some error I've made when configuring the spider cycle, but could it be something more sinister?
 
Posts: 19 | Location: Manchester, UK | Registered: Fri October 13 2000
Posted
"Something sinister" *laugh*.

How the spider makes URLs, File URLs, or Logical File paths, is based on how you entered the information in the spider dialog. I'll give a few examples in case I'm not explaining this well.

I have a machine I want to spider and people get to it via the web using "www.inmagic.com", so I enter that into the "domains to spider" section and give an initial URL of http://www.inmagic.com/
All the records will have http://www.inmagic.com/blah.htm in the URL field.

This same machine can also be known as secure.inmagic.com, meaning that www.inmagic.com and secure.inmagic.com can be used interchangeably to get to the same files. The spider, though, doesn't necessarily know that they're the same machine with the same content. So if I also add secure.inmagic.com to my "domains to spider" list, I will actually get all the same pages spidered again, but this time with http://secure.inmagic.com/blah.htm in the URL field.

This same machine might also have an internal name, say "webserver3". Internal users can get to it by typing http://webserver3/. I can put this in my domains to spider list too, but then I'll get the same files in triplicate.

Since this is also an internal machine, it means the spider has file access to it. If I've checked off "spider files in specified file system directories" on the first dialog of the wizard, then the spider will do a file crawl as well as go through what is web accessible. If the spider is on the same machine as "webserver3", then the files would be in c:\inetpub\wwwroot\ and I'd probably put that in the directories dialog. This means that I'll then get the same files a 4th time, but this time in the URL field it will say "C:\inetpub\wwwroot\blah.htm".

That file path is only useful if you're on the webserver console, because every computer has a c:\. So--as a slight tangent--to make this file path useful for anyone, I should instead enter the directory name as a UNC: \\webserver3\c\inetpub\wwwroot\
(where "c" is the share name for that drive).
Then the URL will look like this: \\webserver3\c\inetpub\wwwroot\blah.htm and will actually work from any internal computer when clicked on.
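To make the drive-letter versus UNC distinction concrete, here is a minimal sketch of the rewrite described above. `to_unc` is a hypothetical helper, and it assumes (as in the example) that the drive is shared under its own letter unless a share name is given.

```python
import ntpath


def to_unc(local_path, host, share=None):
    """Rewrite a drive-letter path as a UNC path that other machines can resolve."""
    drive, rest = ntpath.splitdrive(local_path)
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError(f"expected a drive-letter path, got {local_path!r}")
    # Assumption: unless told otherwise, the share is named after the drive
    # letter, like the "c" share in the example above.
    share = share or drive[0].lower()
    return "\\\\" + host + "\\" + share + rest


print(to_unc(r"C:\inetpub\wwwroot\blah.htm", "webserver3"))
# \\webserver3\c\inetpub\wwwroot\blah.htm
```

The spider itself only needs the directory entered in UNC form; the helper is just a way to see what the resulting record paths would look like.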

Now, while all of the above work, they are more than redundant. Once I've decided what information I want to spider, I only need to give the Spider one set of criteria to get there. Which one I decide on will depend a bit on who my audience will be (eg, internal only, etc.).


So it sounds like a few things might be going on--you've specified redundant sets of criteria for the same information, and--if you actually wanted file paths in the first place--you've not made the file paths universal so that any machine can understand them, no matter their drive mappings.

A note here: if you change your spider settings, the spider won't go back and undo what it has already done, so some of the mistaken or duplicate records may still be in the textbase. It might be fastest to delete all the records, or to start with a clean textbase if you want to compare before and after (even a batch delete can take a while if the textbase is big).
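If the URL field can be exported from the textbase as a plain list, duplicates created by redundant host names can be spotted with a few lines of code. This is a sketch only: the alias table is hypothetical and has to be filled in by hand, the case-insensitive path comparison is an assumption about the web server, and file paths (drive-letter or UNC) are not handled.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical alias table: each name on the left is the same machine as the
# canonical name on the right.
ALIASES = {
    "secure.inmagic.com": "www.inmagic.com",
    "webserver3": "www.inmagic.com",
}


def canonical_key(url):
    """Reduce a record URL to (canonical host, lower-cased path)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    host = ALIASES.get(host, host)
    # Assumption: paths are case-insensitive, as on a typical Windows web server.
    return host, parts.path.lower()


def find_duplicates(urls):
    """Group URLs that point at the same page under different host names."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_key(url)].append(url)
    return [group for group in groups.values() if len(group) > 1]
```

Each returned group lists the records that describe one page, so all but one entry per group are candidates for deletion.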


Hope I've sufficiently answered your question.


------------------
Rachel
 
Posts: 181 | Location: Boston, MA | Registered: Thu July 13 2000
GTB
Posted
Thanks for the help. Much appreciated. Had another look at what I'd set and did some tinkering. Although you dispelled my fears of anything sinister at work previously, I've a new even more terrible error message in the log file:

2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:


Can you tell me what's going wrong here?

Thanks
 
Posts: 19 | Location: Manchester, UK | Registered: Fri October 13 2000
Posted
Weird. I've never seen that message anywhere before. I'll have to do a little research to see if I can come up with anything.. I'll keep you posted.
 
Posts: 181 | Location: Boston, MA | Registered: Thu July 13 2000
Posted
I'm sorry, but I haven't been able to find a reference for that specific error message, even in places that detail error codes and their more user friendly versions.

Just for yucks I threw the error into some major search engine. (Hmm.. now I'm wondering if they use that expression in England. Don't remember.) Anyway, while the results didn't give me any written explanation of the error, they did produce some pages where the error _occurred_ and was recorded by the search engine spider. Here are some of the words around the error message. They give a little more context.

... for the information to be retrieved. InternetCrackUrl parses a URL string into ... a server.
AfxThrowInternetException throws a memory exception, such as a call to ...


... ANSI and Unicode Strings. Error Checking. Exception Handling and Page Faults. Conclusion.
2 ... a Session. Cracking the URL InternetCrackUrl. Connecting to a Server ...


Not that either of those are any less scary.

Have you tried taking the URL listed after the error message and pasting it into a web browser to see what you get? It might come up fine, or it might display this message again (perhaps even with 'instructions' as are sometimes given). If it doesn't display, then you can at least go poking about to see why this particular page won't display. My guess is that there's a connection problem--either intermittent, like a random blip, or a permanent connection problem that actually requires intervention before it will work again.

I can see being concerned if your log is full of these errors, but a few might not be an issue if the site is big. If there is more than one you might look for a pattern.

In general, if you get an odd error in the log, just take the URL and try it in a web browser. The spider is almost like a robot clicking through a website. You want to put the human element back in when you're troubleshooting.

Also, you could search for that URL in the database to see if it was actually spidered. That might be a scary message generated by the webserver, even though things seem ok to the 'user'.

HTH
 
Posts: 181 | Location: Boston, MA | Registered: Thu July 13 2000
GTB
Posted
Just for Yucks...can't say I've heard it over here before, but I can guess what you mean! Suppose I'd say for the fun of it....

Thanks for the help. I thought it best to include a section of my logfile, so you can see how bad things are. As detailed below, your suggestion of checking whether the URL logged after the Spider exception error works isn't really applicable:

2/6/2001 10:37:23 AM: Master 3.0 started
2/6/2001 10:37:34 AM: Master(1) Started cycle 15
2/6/2001 10:37:35 AM: HTTP Spider 3.0 started
2/6/2001 10:37:35 AM: File Crawler 3.0 started
2/6/2001 10:37:35 AM: Importer 3.0 started
2/6/2001 10:37:37 AM: Document Converter 3.0 started
2/6/2001 10:37:39 AM: Importer(1) Start processing textbase 'D:\webpub\textbase\spider2\respider'
2/6/2001 10:37:40 AM: File Crawler user stopped processing
2/6/2001 10:37:40 AM: File Crawler stopped
2/6/2001 10:37:43 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/project/m&s/ms.htm
2/6/2001 10:37:44 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/Help/planet.htm
2/6/2001 10:37:45 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/dbtw-wpd/faq.htm
2/6/2001 10:39:00 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/_vti_bin/header.htm
2/6/2001 10:39:40 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/_vti_bin/Header.htm
2/6/2001 10:49:30 AM: Master(1) stopped
2/6/2001 10:49:30 AM: Importer(1) Stop processing textbase 'D:\webpub\textbase\spider2\respider'
2/6/2001 10:49:30 AM: Importer user stopped processing
2/6/2001 10:49:30 AM: Importer stopped
2/6/2001 10:49:30 AM: Document Converter user stopped processing
2/6/2001 10:49:30 AM: Document Converter stopped
2/6/2001 10:49:35 AM: Cycle 15 did not complete by specified stop time. It will resume processing at next start time.
2/6/2001 10:49:36 AM: Master user stopped processing
2/6/2001 10:49:36 AM: Master stopped
2/6/2001 10:49:44 AM: Master 3.0 started
2/6/2001 10:49:44 AM: HTTP Spider 3.0 started
2/6/2001 10:49:44 AM: Importer 3.0 started
2/6/2001 10:49:45 AM: File Crawler 3.0 started
2/6/2001 10:49:45 AM: Document Converter 3.0 started
2/6/2001 10:49:46 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/_vti_bin/manc/aa/announce.htm
2/6/2001 10:49:47 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/_vti_bin/lond/aa/Announc2.htm
2/6/2001 10:49:47 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/_vti_bin/project/m&s/ms.htm
2/6/2001 10:49:48 AM: Importer(1) Start processing textbase 'D:\webpub\textbase\spider2\respider'
2/6/2001 10:49:49 AM: File Crawler user stopped processing
2/6/2001 10:49:49 AM: File Crawler stopped
2/6/2001 10:50:06 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/_vti_bin/Header.htm
2/6/2001 10:52:33 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/_vti_bin/Pep/Business/BPWin/PEPBusiness6.7.htm
2/6/2001 10:53:11 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/project/m&s/ms.htm
2/6/2001 10:53:48 AM: Spider error - HTTP error 401 Authorization failed, URL: http://planet.bdp.co.uk/Pep/Business/Confidential/PEPBusiness4.htm
2/6/2001 10:53:49 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/Pep/Business/Confidential/PEPBusiness5.htm
2/6/2001 10:53:51 AM: Spider error - HTTP error 401 Authorization failed, URL: http://planet.bdp.co.uk/Pep/Business/Confidential/PEPBusiness4.htm
2/6/2001 10:53:51 AM: Spider error - HTTP error 401 Authorization failed, URL: http://planet.bdp.co.uk/Pep/Business/Confidential/PEPBusiness3.htm
2/6/2001 10:54:24 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/working/proc/qp/qpindex.htm
2/6/2001 10:54:27 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/handbook/S&W/pt.htm
2/6/2001 10:54:28 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/handbook/S&W/ql.htm
2/6/2001 10:54:33 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/handbook/S&W/pt.htm
2/6/2001 10:54:33 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/handbook/S&W/ql.htm
2/6/2001 10:55:01 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/glas/Envelope/envelope.htm
2/6/2001 10:55:53 AM: Spider error - HTTP error 401 Authorization failed, URL: http://planet.bdp.co.uk/Pep/Business/Confidential/PEPBusiness3.htm
2/6/2001 10:55:53 AM: Spider error - HTTP error 401 Authorization failed, URL: http://planet.bdp.co.uk/Pep/Business/Confidential/PEPBusiness2.htm
2/6/2001 10:55:53 AM: Spider error - HTTP error 401 Authorization failed, URL: http://planet.bdp.co.uk/Pep/Business/Confidential/PEPBusiness1.htm
2/6/2001 10:55:56 AM: Spider error - HTTP error 401 Authorization failed, URL: http://planet.bdp.co.uk/Pep/Business/Confidential/PEPBusiness2.htm
2/6/2001 10:55:56 AM: Spider error - HTTP error 401 Authorization failed, URL: http://planet.bdp.co.uk/Pep/Business/Confidential/PEPBusiness1.htm
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:
2/6/2001 10:56:04 AM: Spider error - Exception in InternetCrackUrl(), URL:


Time to panic?!?
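With a log this repetitive, a tally per error message is easier to read than the raw lines. The sketch below assumes the line layout shown in the excerpt above ("date time AM/PM: Spider error - message, URL: address"); `summarize_errors` is a hypothetical helper, and the earlier "Spider error: Host not responding." wording would need its own pattern.

```python
import re
from collections import Counter

# Matches lines such as:
# 2/6/2001 10:37:43 AM: Spider error - HTTP error 404 File not found, URL: http://...
ERROR_LINE = re.compile(
    r"^(?P<when>\d+/\d+/\d+ \d+:\d+:\d+ [AP]M): "
    r"Spider error - (?P<message>.*?), URL:\s*(?P<url>.*)$"
)


def summarize_errors(lines):
    """Count error messages and collect the distinct URLs seen for each one."""
    counts = Counter()
    urls = {}
    for line in lines:
        match = ERROR_LINE.match(line.strip())
        if not match:
            continue  # start/stop notices and other non-error lines
        message = match["message"]
        counts[message] += 1
        urls.setdefault(message, set()).add(match["url"])
    return counts, urls
```

Calling `summarize_errors(open(path))` on a log file (the path is whatever your installation uses) gives one count per message; an empty string in the URL set marks entries logged without a readable address.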
 
Posts: 19 | Location: Manchester, UK | Registered: Fri October 13 2000
Posted
I'm not sure it's time to panic, but I think the little square is a big clue. That should be where the URL is. Instead, there's some binary character. Lisa thinks it might be a carriage return in front of the URL. Lisa's verified that the InternetCrackUrl() error is the one you get when there's a bad problem with the URL that would mean the spider couldn't process it. Other than hunting by hand, I can't figure out at the moment how to find the page where the problem starts and check the URLs on that page to see what's weird about them.
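One way to shorten the hunt by hand, assuming the carriage-return theory is right and the pages are available as files, is to scan each page's links for control characters. This is a sketch using Python's standard HTML parser; `suspicious_links` is a hypothetical helper, and it only looks at href attributes of anchor tags.

```python
from html.parser import HTMLParser


def control_chars(text):
    """Return (position, code point) for each control character in text."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if ord(ch) < 32 or ord(ch) == 127]


class LinkCollector(HTMLParser):
    """Collect raw href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def suspicious_links(html):
    """List hrefs that contain control characters such as a carriage return."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [(href, control_chars(href)) for href in collector.hrefs if control_chars(href)]
```

Running `suspicious_links` over each .htm file under the web root and printing the file name for any non-empty result would point at the pages whose links need fixing. Code point 13 is a carriage return and 10 a line feed.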
 
Posts: 181 | Location: Boston, MA | Registered: Thu July 13 2000
GTB
Posted
Apart from the InternetCrack problem, I wonder if you could tell me why I get the following message:
2/6/2001 10:54:28 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/handbook/S&W/ql.htm
2/6/2001 10:54:33 AM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/handbook/S&W/pt.htm

for these two URLs, when in fact they both work fine?
 
Posts: 19 | Location: Manchester, UK | Registered: Fri October 13 2000
Posted
They don't work fine for me. I get "cannot find server" instantly... no half-minute lookup or anything. That's because a DNS lookup says "planet.bdp.co.uk: Non-existent host/domain".

You may have some sort of access to that machine that the rest of us don't. Is 'planet' an internal-only name, for instance?
 
Posts: 181 | Location: Boston, MA | Registered: Thu July 13 2000
Posted
Consider the above still, but I looked closer at your log and saw there were actual authorisation failures on that host, but some, like the two you listed, were file not found. Perhaps the ampersand is throwing it off. I'm not entirely certain that using an ampersand is a valid web naming convention--just like a space isn't. Most browsers will concede and give you the page anyway, but usually the ampersand is used in a search string. Did the entry near the top of the posted log with the directory name m&s go into the textbase ok?
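For reference, the generic URI syntax (RFC 3986) does allow "&" in a path segment, while a literal space is not allowed and has to be written as %20. Software does not always handle either case gracefully, so one experiment is to link to the percent-encoded form and see whether the spider copes better. The sketch below shows that encoding with Python's standard library; `encode_path` is a hypothetical helper, and whether the spider or the web server accepts the encoded form is untested here.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_path(url):
    """Percent-encode the path of url, leaving scheme, host and query untouched."""
    parts = urlsplit(url)
    # safe="/" keeps the path separators; "&" becomes %26 and a space %20.
    # Caveat: a path that is already percent-encoded would be encoded twice.
    encoded = quote(parts.path, safe="/")
    return urlunsplit((parts.scheme, parts.netloc, encoded, parts.query, parts.fragment))


print(encode_path("http://planet.bdp.co.uk/handbook/S&W/ql.htm"))
# http://planet.bdp.co.uk/handbook/S%26W/ql.htm
```

Renaming the directory to avoid the ampersand altogether is the other obvious experiment, and it does not depend on how any particular client encodes the link.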
 
Posts: 181 | Location: Boston, MA | Registered: Thu July 13 2000
GTB
Posted
I'm glad you can't access planet.bdp.co.uk as it's our Intranet site!

As for the ampersand, would even an ampersand in the name of a directory throw it off-course? I've seen it before where it would cause problems in an htm file name, as well as a space as you mention. I guess it could cause problems though.

Just looked and it appears that it isn't in the textbase, although the page does actually exist. Is this the fault of the ampersand? Would something like this though prevent the Spider from running completely do you think?
 
Posts: 19 | Location: Manchester, UK | Registered: Fri October 13 2000
Posted
quote:
Originally posted by GTB:
I'm glad you can't access planet.bdp.co.uk as it's our Intranet site!


Oh good, then. :)

quote:
As for the ampersand, would even an ampersand in the name of a directory throw it off-course?


Unknown. That's the trouble when there are no definitive standards in place saying 'you can't do that'.

quote:
Is this the fault of the ampersand? Would something like this though prevent the Spider from running completely do you think?


The ampersand seems to be a commonality, at least. The spider, however, should probably just skip over the URL and move on. The long string of URLs that's producing the InternetCrackUrl() error might be causing the spider to stop--because it runs out of places to go if it can't get to those pages.
 
Posts: 181 | Location: Boston, MA | Registered: Thu July 13 2000
GTB
Posted
Thanks for the help, that gives me something else to look at and mess around with.

Another thing though. I've just tried running the spider again now. In the past, after around 5 minutes or so, it would crash with an access violation error and so the service would be terminated. Now though, it still thinks that it is running, but as I look at what's going on using the Spider monitor, nothing is being imported or rejected or anything. In other words, it's come to a grinding halt but without any indication that something has gone wrong...

After what you have said concerning the bad URLs, I shall see what I can remedy here before moaning some more.

Thanks again

quote:
Originally posted by rachel:
The ampersand seems to be a commonality, at least. The spider however, should probably just skip over the URL and move on. The long string of URLs that's producing the InternetCrackUrl() error might be causing the spider to stop--because it runs out of places to go if it can't get to those pages.
 
Posts: 19 | Location: Manchester, UK | Registered: Fri October 13 2000
Posted
quote:
Originally posted by GTB:
Now though, it still thinks that it is running, but as I look at what's going on using the Spider monitor, nothing is being imported or rejected or anything


So the service dialog says it's running but you see no activity in the monitor? I'm not surprised it's not importing anything, because it's not successfully spidering any of those pages (cause it can't get to them). It sort of has a list of links that it can follow, so it may still be trying all those links before it gives up or determines that it's finished. In the monitor, the number of files spidered should jump up real quick, and then the number converted and then imported, should follow at a slower pace. So if you have 0 files spidered you should have 0 files imported. At least, when the spider finishes those numbers should be even or all the numbers should match up somehow.

You don't have a fun situation, but at least it's not an utter black box. Good luck.
 
Posts: 181 | Location: Boston, MA | Registered: Thu July 13 2000
GTB
Posted
It has been in the state below now for around 50 minutes with no change:

Cycle started: 2/14/2001 16:11

Cycle status: in progress

Services running: yes

Unchanged Documents (skipped): 0

HTML Documents spidered: 179

Non-HTML Documents spidered: 0

Inaccessible HTTP links: 0

Broken HTTP links: 4

Documents imported: 179

Documents rejected: 0
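The figures above can be checked against the rule of thumb that the spidered and imported counts should line up once conversion and import have caught up. The sketch below parses "Label: value" lines like those in the status block; `parse_status` and `backlog` are hypothetical helpers, and the idea that every spidered document ends up either imported or rejected is an assumption, not documented behaviour.

```python
def parse_status(text):
    """Turn 'Label: value' lines from the monitor into a dict, with counts as ints."""
    status = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # blank or unlabeled line
        value = value.strip()
        status[label.strip()] = int(value) if value.isdigit() else value
    return status


def backlog(status):
    """Documents spidered but not yet imported or rejected."""
    spidered = status["HTML Documents spidered"] + status["Non-HTML Documents spidered"]
    # Assumption: every spidered document ends up either imported or rejected.
    return spidered - status["Documents imported"] - status["Documents rejected"]
```

For the numbers shown here the backlog comes out as zero (179 spidered, 179 imported, 0 rejected), which would suggest the importer has caught up and that whatever is holding the cycle "in progress" lies on the spidering side.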


I guess I'll leave it to it then, see what happens, and in the meantime sort out the possible bad URLs.

** In fact, when I look into the actual logfile, this is the last URL it mentions having a problem with:

2/14/2001 4:23:50 PM: Spider error - HTTP error 404 File not found, URL: http://planet.bdp.co.uk/Year2K/Windows NT Server 4-0.htm

And as we've talked about, the spaces in the .htm filename could be at fault.



[This message has been edited by GTB (edited 14 February 2001).]
 
Posts: 19 | Location: Manchester, UK | Registered: Fri October 13 2000