Finding bad links with Xenu’s Link Sleuth

Transcript

If you’re like me, you’ve had websites for a while. One of the things that happens is the links start to stagnate. Some of your links don’t work. Maybe you changed servers or did things to your server. I just recently had that happen and it reminded me of this problem. So I shot this little video that’s going to show you an easy way to find the links that are not working. Then you can either change them, remove them or forward them to a working link. You can check both internal and external links, because your external links are important as well. In my case, I transferred a blog to a new host, and the server name was different on the two servers. Then the problems began, but I found a great tool: Xenu’s Link Sleuth.

When you move servers

The new server on the host was mayuli.com, and all the links started with mayuli.com/jongriffin/ instead of jongriffin.com. The plug-in that was supposed to transfer the website kind of screwed that up. So all of these links had some funny-looking names. I haven’t fixed them all yet, so I’ll show you an example of that in the video.

Bad external links

You may also have external links where you’re linking to another site. Let’s say the original post is five years old. Maybe the website you’re linking to is out of business, or the site is gone, or the page changed. What should you do? Well, the first step is to find those links, and this program works for both internal and external links. I’m using a program called Xenu’s Link Sleuth and below this post, I’ll have a link to where you can download it. It’s free.

So to run the program, you just open it up. It’s usually at the bottom down here, Xenu Link Sleuth. Hit Xenu. You’ll get a basic window.

Go to File, then Check URL, and click on that. All you need to do is type in the URL of your website. It does keep track of ones you had before, so you can see some of the sites I checked, but we’ll just do this one. There’s an option to check external links. This will slow the program down but I’ll show it to you anyway so you can see it.
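
If you ever want to script the internal versus external split yourself, here is a rough Python sketch. It is not how Xenu decides it internally; the function names are made up for illustration, and it simply assumes that a link is internal when its host matches the site’s host, ignoring a leading "www.".

```python
from urllib.parse import urljoin, urlparse


def _bare_host(url):
    """Return the lowercased host of url without a leading 'www.' (or '')."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_internal(link, site_url):
    """Return True when link points at the same host as site_url.

    Relative links such as "/about/" are resolved against site_url first,
    so they count as internal.
    """
    absolute = urljoin(site_url, link)
    return _bare_host(absolute) == _bare_host(site_url)


if __name__ == "__main__":
    site = "http://jongriffin.com/"
    for link in ["/about/", "http://mayuli.com/jongriffin/", "http://en.wikipedia.org/"]:
        print(link, "internal" if is_internal(link, site) else "external")
```

Under that assumption, a link that still points at mayuli.com counts as external, which is one more way the leftover links from the move stand out.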

You can also include or exclude URLs. This can come in handy if you have a certain URL that you don’t want to check or a group, maybe admin or something. I generally leave everything alone unless I’m searching for something very specific. There are also some advanced options. You can decide how many threads are running. You can adjust how your reports are going to show up. Like I say, I leave everything pretty much the way it is.
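
The include and exclude idea can be sketched in a few lines of Python. This is only an illustration that assumes simple "URL begins with" matching; the function name, the parameter names and the example admin path are all hypothetical and are not taken from Xenu.

```python
def should_check(url, include_prefixes=(), exclude_prefixes=()):
    """Decide whether url should be checked.

    A URL that starts with any exclude prefix is skipped. If include
    prefixes are given, the URL must start with at least one of them.
    """
    if any(url.startswith(prefix) for prefix in exclude_prefixes):
        return False
    if not include_prefixes:
        return True
    return any(url.startswith(prefix) for prefix in include_prefixes)


if __name__ == "__main__":
    # Hypothetical admin area that we do not want to crawl.
    skip = ["http://jongriffin.com/wp-admin/"]
    print(should_check("http://jongriffin.com/articles/", exclude_prefixes=skip))
    print(should_check("http://jongriffin.com/wp-admin/options.php", exclude_prefixes=skip))
```

Checking the exclude list first means an excluded group stays excluded even if it also matches an include prefix.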

All right, so let’s go ahead and run this, and hit OK. This is going to take a while to run so I’ll come back when it’s done. You’ll notice that the threads are down at the bottom. You can also see how many URLs there are. That will change as it discovers more. It said we were about 33% done, and now it says 29%, because the total keeps growing as it finds more URLs. So don’t worry about that. It’s going to take, I know, on this site, about 3 ½ minutes but I just want you to notice you can look down there and see how things are running. If you want to do stuff in the background, you can do that as well.

Get the results

All right, we’re done now. You can see that it pops up with a box that says Link Sleuth finished and asks whether you want a report. I’ll click that in a second but I want to bring your attention down here. You’ll notice it says 1,272 URLs. That’s a far cry from what it started with, and it took two minutes and 30 seconds. So depending on the speed of your web server and how many threads you’ve got, it can take a while.

I don’t bother with the reporting because it does an HTML report. I don’t really want to open my browser. I’d just as soon look at it here.

Okay, assuming you hit No, you’re just going to be brought back right here, and this is your basic screen. What you’re going to see is the address, which tells you what webpage it was checking, whether it’s an internal or an external page. Then there’s the status, and we’ll deal with that in a minute. Then the type, whether it’s HTML, CSS or some kind of application, the size, and the title if it’s a webpage and the program can extract the title out of it. The date doesn’t really matter. I believe that’s when it was checked. That’s what it looks like.

Level is how deep the page is from the homepage. So you can see, here’s the homepage. It’s level zero. I don’t have a deep structure. I pretty much have one or two levels. It tells you how many outbound links are on the page and how many inbound links point to it. Don’t think this is a program to find out how many incoming links there are. You can’t use it like that. It tells you what server there is. This line tells you it’s coming from Amazon, so that’s obviously some kind of a theme or a download or something.
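
To make the level and link-count columns concrete, here is a small Python sketch. It assumes the level is the fewest clicks from the homepage, which a breadth-first walk gives you; that is a reasonable reading of the column, not something confirmed from Xenu itself. The tiny link map and the function names are invented for the example.

```python
from collections import deque


def link_levels(links, home):
    """Return {page: level}, where level is the fewest clicks from home."""
    levels = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in levels:  # first visit is the shortest path
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels


def link_counts(links):
    """Return (out_links, in_links) counted over the crawled pages only."""
    out_links = {page: len(set(targets)) for page, targets in links.items()}
    in_links = {}
    for targets in links.values():
        for target in set(targets):
            in_links[target] = in_links.get(target, 0) + 1
    return out_links, in_links


if __name__ == "__main__":
    # Made-up site map: each page maps to the pages it links to.
    site = {
        "/": ["/about", "/articles"],
        "/articles": ["/articles/a", "/about"],
        "/about": [],
    }
    print(link_levels(site, "/"))
    print(link_counts(site))
```

The inbound count here only knows about pages that were crawled, which is the same reason you can’t use the program to measure incoming links from the rest of the web.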

The duration is how long it took to get the page, so that can give you a little bit of a head start on seeing how your server is responding. Then there’s the character set. Most of the time, you won’t care about that but it could come in handy if you find a page that’s displaying weird. A description is listed, if there is one. So you can get a little bit of information without looking at your analytics account all the time.

Sort and fix the errors

The main thing you’re going to want to use it for is to check the status. You’ll notice these column headers all highlight. That’s because they’re sortable. So let’s hit Status, and that’s now sorting in reverse, so the errors come first, not the OK ones. Don’t worry about the Wikipedia one. Most of the forbidden requests just mean that these sites don’t accept the robot that went out and checked them.
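
If you had the results in a script instead of on screen, sorting the errors to the top could look something like this Python sketch. The status strings are written the way they are spoken in the video and may not match the program’s exact wording, and the rows are made-up examples.

```python
def errors_first(rows):
    """Sort (url, status) pairs so every status other than 'ok' comes first.

    Within each group, rows are ordered by status text and then by URL.
    """
    return sorted(rows, key=lambda row: (row[1] == "ok", row[1], row[0]))


if __name__ == "__main__":
    results = [
        ("http://jongriffin.com/", "ok"),
        ("http://en.wikipedia.org/", "forbidden request"),
        ("http://lexycle.com/", "no connection"),
    ]
    for url, status in errors_first(results):
        print(status, url)
```

The sort key puts `False` before `True`, so anything that is not "ok" lands at the top of the list.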

Lexycle.com shows no connection, which you might want to check. There’s obviously a link to it somewhere. No object data is obviously a problem somewhere. Then there’s no such host. That’s a bad link, and I can see already that the address is jongriffin, not jongriffin.com. So let’s start with that one.

What I usually do is just go on here, right-click on the link and it says URL properties, and it shows you the title. No links are on the page, but one page links to this and that’s where your error is. It is jongriffin.com/articles/action-enforcer-supercharge.
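
The "which page links to this" lookup in URL properties boils down to a reverse search over the link map. Here is a minimal Python sketch of that idea. The function name is hypothetical and the broken address in the example is invented, since the video doesn’t spell out the full broken URL.

```python
def pages_linking_to(links, broken_url):
    """Return the sorted list of pages whose links include broken_url."""
    return sorted(page for page, targets in links.items() if broken_url in targets)


if __name__ == "__main__":
    # Made-up link map: each page maps to the URLs it links to.
    site = {
        "http://jongriffin.com/articles/action-enforcer-supercharge": [
            "http://jongriffin/some-page/",  # invented broken link (missing .com)
        ],
        "http://jongriffin.com/about/": ["http://jongriffin.com/"],
    }
    print(pages_linking_to(site, "http://jongriffin/some-page/"))
```

Each page it returns is a place you need to open and edit.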

So I can go to that page and fix the HTML, either through the visual editor if I’m on WordPress or in the straight HTML page. Whatever you use, I’ll leave that up to you to figure out. But that will be an easy one. All you have to do is basically add .com to jongriffin, and that would fix it.

So let’s look at another one here. So here’s one where it’s linking to my university site. For some reason, they changed the name. So on my About page, I am pointing to my department site for the hotel college and that got changed. So I know I can change that one back out to the new page.

These Amazon content things, you can look at a little later. These are mainly pictures that are, for some reason, not there anymore. So we can look and see what URL it’s pointing to. There is a gallery plug-in that’s missing some stuff, worth checking on.

The archives also seem to be missing some things. But here’s what’s really interesting in what I was telling you about.

You can see mayuli.com/jongriffin/attachment or Jon Griffin. That’s an image, so we won’t bother with that but it’s worth fixing. Let’s go to this article. This article is obviously linking back so we’ll hit the properties. So marketing calendar is linked to this Bye, Bye Google article and you can see mayuli.com needs to be taken out and jongriffin.com put in.
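
If there are a lot of these, the swap could be scripted instead of done by hand. Here is a hedged Python sketch that rewrites a URL on the old host to the new one. The host names and the /jongriffin prefix come from the video; the function name and the sample file path are made up, and you would still need your own way to apply the result to your posts.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "mayuli.com"
OLD_PREFIX = "/jongriffin"
NEW_HOST = "jongriffin.com"


def fix_migrated_url(url):
    """Rewrite mayuli.com/jongriffin/... to jongriffin.com/...

    URLs on any other host, or outside the old prefix, are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return url
    path = parts.path
    if path == OLD_PREFIX or path.startswith(OLD_PREFIX + "/"):
        new_path = path[len(OLD_PREFIX):] or "/"
        return urlunsplit((parts.scheme, NEW_HOST, new_path, parts.query, parts.fragment))
    return url


if __name__ == "__main__":
    # The file name below is an invented example.
    print(fix_migrated_url("http://mayuli.com/jongriffin/attachment/pic.jpg"))
```

Only paths under the old prefix are touched, so anything else on mayuli.com is left alone rather than guessed at.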

That’s the easy way to fix that, and that’s the main reason I use this. You can go through and fix all of these. It takes a little time, but you can have somebody else do it if you have a staff. That will help you because Google will then start indexing the pages that had bad links again. You may also be linking to interior pages like I am. These are links to other articles on your site, and you want to make sure they’re not a bunch of 404 errors, so that searchers can find them. Interior links are very important for search engine discovery.

Resources

http://home.snafu.de/tilman/xenulink.html