unc0nnected
Joined: 17 Dec 2010 Posts: 27
Posted: Fri Mar 11, 2011 2:23 am Post subject: BA crawling feed but not picking up 100% of the posts
I have BA set to crawl several RSS feeds from one of my WordPress blogs. Each feed has exactly 400 posts in it, but when I get BA to crawl them it never picks up 400. It always picks up some seemingly random number, ranging from 50 up to 258.
If I go back and recrawl the same feed, it comes up with nothing new.
The interesting thing is that it finds the exact same (wrong) number of items when I empty the feed and recrawl it. So I opened the feed in Firefox, copied everything into Quanta, and stripped everything except the titles, and sure enough I was left with 400 lines. I went even further and removed all items with identical titles, just in case BA was filtering out what it thought were duplicates, and I was still left with 308.
So for some reason BA is only seeing 50 in this instance when there are 400, or 263, or whatever the number is. For a given feed it's always the same number, and it's always wrong!
Any ideas?
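The manual count described above (strip everything but the titles, then drop repeats) can also be scripted. This is only a minimal sketch using Python's standard library; the three-item `SAMPLE_FEED` and the `title_counts` helper are made up for illustration and stand in for the real 400-item feed.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical three-item feed standing in for the real 400-item one.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Restaurants</title>
    <item><title>McDonalds</title><link>http://example.com/?p=1</link></item>
    <item><title>McDonalds</title><link>http://example.com/?p=2</link></item>
    <item><title>Pizza Place</title><link>http://example.com/?p=3</link></item>
  </channel>
</rss>"""


def title_counts(feed_xml):
    """Return (total items, Counter of item titles) for an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    # Only <item> titles are counted; the channel's own <title> is ignored.
    titles = [(item.findtext("title") or "").strip() for item in root.iter("item")]
    return len(titles), Counter(titles)


total, counts = title_counts(SAMPLE_FEED)
distinct = len(counts)
repeated = {title: n for title, n in counts.items() if n > 1}
print(total, distinct, repeated)
```

Comparing `total` with `distinct` shows how many items share a title with another item, and `repeated` lists which titles are affected.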
unc0nnected
Joined: 17 Dec 2010 Posts: 27
Posted: Fri Mar 11, 2011 4:41 am
OK, I just changed the permalink syntax from /%monthnum%/%postname%/ to wordpress/?p=123, and now I am getting a parser error instead:
Parser Error: Undeclared entity error at line 9, column 57
I then pointed BA to the RSS feed (not the RSS2 feed) and now it is crawling again, but still with the wrong counts: 107 posts pulled when there are 400, and when I crawl the next feed it finds 68 items out of 400.
Thanks
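The thread never establishes what triggered the "Undeclared entity" error. One common cause of that class of error is an HTML entity such as `&nbsp;` appearing in a feed, since XML predefines only five entities. The sketch below reproduces that situation with Python's standard XML parser, not with BA's parser; the `BROKEN` and `FIXED` fragments and the `try_parse` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: &eacute; and &nbsp; are HTML entities that XML does not predefine.
BROKEN = "<item><title>Caf&eacute; &nbsp;Roma</title></item>"
# The same text using numeric character references, which any XML parser accepts.
FIXED = "<item><title>Caf&#233; &#160;Roma</title></item>"


def try_parse(xml_text):
    """Return the parsed title, or the parser's error message on failure."""
    try:
        return ET.fromstring(xml_text).findtext("title")
    except ET.ParseError as err:
        return "parse error: " + str(err)


print(try_parse(BROKEN))  # reports a parse error with a line and column
print(try_parse(FIXED))   # parses and returns the decoded title
```

If a feed fails this way, the reported line and column point at the first entity the parser could not resolve.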
Atanasis Owner
Joined: 22 May 2004 Posts: 4284 Location: The Net
Posted: Fri Mar 11, 2011 10:16 am
Do your posts have unique titles? BA skips posts within the same feed that have the same title.
_________________
Thanks,
Kaktusan
unc0nnected
Joined: 17 Dec 2010 Posts: 27
Posted: Fri Mar 11, 2011 10:20 pm
Yep, as I mentioned above, I even went and deleted all duplicate titles to count how many uniquely titled posts there were, and it was still way higher than the number of entries BA was pulling in (308 uniquely titled posts versus fewer than 100 entries reported by BA).
I'll PM you some more info.
unc0nnected
Joined: 17 Dec 2010 Posts: 27
Posted: Fri Mar 11, 2011 11:17 pm
Is there any way to disable the feature where BA only grabs uniquely titled posts? I have one post per restaurant, and there might be 20 McDonalds in a city, but all of those posts are titled McDonalds, and the address and info for each location is inside the post. Like a phone book listing.
Atanasis Owner
Joined: 22 May 2004 Posts: 4284 Location: The Net
Posted: Sat Mar 12, 2011 11:58 am
Replied to your PM.
No, there is no way to turn that off, because the title is how BA determines whether a post has already been downloaded from a feed.
_________________
Thanks,
Kaktusan
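To make the behaviour described here concrete, the following is a hypothetical model of title-keyed deduplication. It is not BA's actual code; the `crawl` function, the `seen_titles` set, and the sample items are assumptions for illustration only.

```python
def crawl(items, seen_titles):
    """Keep only items whose title has not been seen; record new titles."""
    kept = []
    for item in items:
        title = item["title"]
        if title in seen_titles:
            continue  # treated as already downloaded, even if the body differs
        seen_titles.add(title)
        kept.append(item)
    return kept


# Hypothetical feed: two different restaurants share one title.
feed = [
    {"title": "McDonalds", "address": "1 Main St"},
    {"title": "McDonalds", "address": "9 Oak Ave"},
    {"title": "Pizza Place", "address": "4 Elm Rd"},
]

seen = set()
first = crawl(feed, seen)   # the second McDonalds is skipped
second = crawl(feed, seen)  # a recrawl finds nothing new
print(len(first), len(second))
```

Under this model, a recrawl returns nothing new and the count per feed is stable, which matches the symptoms reported earlier in the thread.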
unc0nnected
Joined: 17 Dec 2010 Posts: 27
Posted: Sun Mar 13, 2011 1:08 pm
Thank you for your help. It definitely looks to be caused by duplicate post titles. It sucks that uniq was reporting incorrect values. I've changed the title structure of my posts now and that seems to have resolved it.
Thanks again
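The thread does not say how uniq was run, so the following is only one possible explanation for the misleading count: plain `uniq` collapses adjacent repeated lines only, so on unsorted input it can report far more "unique" lines than there are distinct values. The sketch mimics that behaviour in Python; the function names and the sample titles are made up.

```python
from itertools import groupby


def uniq_count(lines):
    """Count lines the way plain `uniq` does: collapse adjacent repeats only."""
    return sum(1 for _ in groupby(lines))


def distinct_count(lines):
    """Count distinct values regardless of where they appear."""
    return len(set(lines))


# Hypothetical title list in feed order, with repeats that are not adjacent.
titles = ["McDonalds", "Pizza Place", "McDonalds", "Subway", "McDonalds"]

unsorted_uniq = uniq_count(titles)        # nothing is adjacent, nothing collapses
sorted_uniq = uniq_count(sorted(titles))  # sorting first makes repeats adjacent
distinct = distinct_count(titles)
print(unsorted_uniq, sorted_uniq, distinct)
```

Sorting before deduplicating, or counting with a set as in `distinct_count`, gives a number that can be compared against what BA reports.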