Hello All,

Let me introduce myself to the list first: my name is Matt, and I'm fairly experienced with HTTP and content relays, especially headers and the like. It's great to be taking part in such a large real-time collaboration over email in an ops setting! I'd like to get more involved and, if the foundation accepts and can use my offers, provide some IP blocks, dedicated nodes, HAProxy layers, and storage if it's needed. I'll post a separate thread for this.

Anyway, regarding this issue:

I don't think my original approach had any fans. 

What was your original approach when you went about this? Do you have a diagram or process flow you could walk me through, or a screenshot or YouTube video? :)

I always felt Varnish was bloated, but I haven't used it in about 2.5 years, since around the time it first got noticed in the dev community. Does Varnish have a UI now?


Purges are now sent to both varnish instances per host,
What is a purge? Over what protocol, or from which application? Or is it just a static 300-second clearing of the cache memory for this URI?
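For context, one common convention is that a purge is an HTTP request using the non-standard PURGE method, sent to the cache so it drops its stored copy of one URL; Varnish only acts on it if its VCL is written to handle PURGE. A minimal Python sketch of "purges are now sent to both varnish instances per host", assuming purges arrive as HTTP PURGE requests and that the two instances listen on the hypothetical ports below (the real ports, hostnames, and purge transport in this setup may differ):

```python
import http.client

# Hypothetical ports for the two Varnish instances on one cache host
# (e.g. a frontend and a backend instance). Real values will differ.
VARNISH_PORTS = (80, 3128)


def send_purge(host, port, path, vhost, timeout=5.0):
    """Send a single HTTP PURGE for `path` (as virtual host `vhost`); return the status code."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # PURGE is not a standard HTTP method: Varnish acts on it only if the
        # VCL handles it explicitly (usually restricted by an ACL).
        conn.request("PURGE", path, headers={"Host": vhost})
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection can be closed cleanly
        return resp.status
    finally:
        conn.close()


def purge_all_instances(host, path, vhost, ports=VARNISH_PORTS):
    """Purge the same object on every Varnish instance of one host; map port -> status."""
    return {port: send_purge(host, port, path, vhost) for port in ports}
```

Which status each instance returns, and whether it reflects a real removal, depends entirely on the VCL; a success code does not by itself prove a matching object existed.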

and more recently, the 300s ttl override was removed from the frontends.

Not sure what that means, but this mix-up is probably why the headers, Last-Modified times, and HTTP output aren't matching or syncing up. I feel it's going through too many layers.
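One plausible reading of "the 300s ttl override", offered as an assumption rather than a description of the actual config: the frontend replaced every object's TTL with 300 seconds, whatever the backend sent. If so, a purge that silently missed only left stale content around for up to five minutes; with the override gone, the same no-op purge leaves content stale until the object's own TTL runs out. A toy Python model (not Varnish's implementation) showing the difference:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Entry:
    body: str
    stored_at: float  # seconds, from an injected clock
    ttl: float


@dataclass
class ToyCache:
    """Toy caching frontend; only illustrates TTL behaviour, not Varnish internals."""
    ttl_override: Optional[float] = 300.0  # assumed frontend override, in seconds
    entries: Dict[str, Entry] = field(default_factory=dict)

    def store(self, key, body, origin_ttl, now):
        # With an override, the origin's TTL is ignored and replaced outright.
        ttl = origin_ttl if self.ttl_override is None else self.ttl_override
        self.entries[key] = Entry(body, now, ttl)

    def lookup(self, key, now):
        entry = self.entries.get(key)
        if entry is None or now - entry.stored_at >= entry.ttl:
            return None  # miss: a real frontend would refetch from the backend
        return entry.body
```

For example, an object stored at t=0 with a 30-day origin TTL has expired by t=300 with the override, but is still served at t=86400 without it.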

But all of the purges are no-ops. 
Right, because they aren't clearing correctly or in the right order. It's probably more of a kernel stack issue, or a switch/TCP transport issue. Maybe use ettercap or tcpdump -i [interface] -vv to see what's happening more granularly?

There are multiple ways to approach making the purges sent to the frontends actually work

I'd still like to understand what sending a purge to "the frontend" does: what a purge is, and where and why the frontend receives a command like this. It sounds like you're talking about BGP/ARP stuff, but you're not, right? (Or at the very least, DNS.) I'm assuming you simply mean the cache layer's functions?

such as rewriting the purges in varnish,
Don't do that in Varnish? Whatever it is, it's already having problems at the moment. Reproduce the issue and track how often it recurs over time, as quickly as possible, perhaps by:

rewriting them before they're sent to varnish

Bridge a Squid and ICAP layer right in front of Varnish if you want to do that. The Squid is just there to transport the ICAP traffic. You can do REQMOD and RESPMOD (request and response modification) very quickly, with low latency.
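Whichever layer does the rewriting (Varnish VCL, an ICAP REQMOD service behind Squid, or the code that sends the purges), the rewrite itself is small: map each purge URL onto every URL the frontend may have cached the page under. A minimal Python sketch, assuming purely for illustration that a mobile frontend caches pages under a separate hostname; the HOST_VARIANTS mapping is hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping: purge hostname -> extra hostnames the same page
# may be cached under on a frontend.
HOST_VARIANTS = {
    "en.wikipedia.org": ["en.m.wikipedia.org"],
}


def expand_purge(url, variants=HOST_VARIANTS):
    """Return the original purge URL plus one rewritten copy per variant host."""
    parts = urlsplit(url)
    urls = [url]
    for alt_host in variants.get(parts.hostname or "", []):
        # Replacing netloc drops any explicit port or userinfo; fine for a sketch.
        urls.append(urlunsplit(parts._replace(netloc=alt_host)))
    return urls
```

expand_purge("http://en.wikipedia.org/wiki/Main_Page") returns the original URL plus http://en.m.wikipedia.org/wiki/Main_Page; each result would then be sent to the caches as its own purge.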

depending on where they're being sent, or perhaps changing how cached objects are stored in the frontend.
What layer are cached objects stored in? Is it something like memcached, or just a plain HTTP cache?
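For what it's worth, Varnish is a plain HTTP cache with its own object storage (in memory or file-backed), not memcached. It looks objects up by a hash that, in the default vcl_hash, covers the request URL plus the Host header (or the server IP when there is no Host). A purge only removes an object whose hash matches exactly, so if a frontend adds anything else to the hash, a purge built from URL and Host alone matches nothing. A toy Python model of that failure; the extra hash input (standing in for something like a device class) is hypothetical:

```python
import hashlib


def cache_key(url, host, extra=()):
    """Toy analogue of Varnish's default hash (URL + Host), plus optional extra inputs."""
    h = hashlib.sha256()
    for part in (url, host, *extra):
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") hash differently
    return h.hexdigest()


class ToyStore:
    """Toy object store keyed by cache_key; not how Varnish stores objects internally."""

    def __init__(self):
        self.objects = {}

    def put(self, url, host, body, extra=()):
        self.objects[cache_key(url, host, extra)] = body

    def purge(self, url, host, extra=()):
        """Remove the object with exactly this key; True if something was removed."""
        return self.objects.pop(cache_key(url, host, extra), None) is not None
```

Storing with extra=("mobile",) and then purging with only URL and Host returns False and leaves the stale object in place: a no-op purge.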

  I personally think it's all an unnecessary waste of resources and prefer my original approach.

Hahaha! Wow, I just read the line above. What is your original approach, Asher?



Can someone clarify the bug and the intent here for me, and walk me through a demo of it?

I'll help out right away.


----
+Matt Kaufman
[ops@mi2.com] [matt@mi2.com]
703-677-8901, 202-407-7998 | skype: mi2com | gchat: mkfmncom@gmail.com



-Asher

On Fri, May 3, 2013 at 2:23 PM, Arthur Richards <arichards@wikimedia.org> wrote:
+wikitech-l

I've confirmed the issue on my end; ?action=purge seems to have no effect and the 'last modified' notification on the mobile main page looks correct (though the content itself is out of date and not in sync with the 'last modified' notification). What's doubly weird to me is that the 'Last-Modified' HTTP response header says:

Last-Modified: Tue, 30 Apr 2013 00:17:32 GMT

Which appears to be newer than when the content I'm seeing on the main page was updated... Anyone from ops have an idea what might be going on?

Yes, this is fairly normal in the HTTP world: Last-Modified often doesn't line up with what a cache is actually serving, so mismatches like this happen all the time.
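To make the comparison Arthur describes mechanical, the header can be parsed and checked against the revision time of the content actually being shown. A minimal Python sketch using the standard library's HTTP-date parser; the revision timestamp is a hypothetical input you would take from the page history:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def header_newer_than_content(last_modified, content_revised_at):
    """True if the Last-Modified header claims a later time than the content's revision.

    content_revised_at must be timezone-aware: an HTTP-date ending in "GMT"
    parses to an aware UTC datetime, and aware/naive datetimes can't be compared.
    """
    return parsedate_to_datetime(last_modified) > content_revised_at


lm = "Tue, 30 Apr 2013 00:17:32 GMT"
# Hypothetical revision time of the content being served.
shown_revision = datetime(2013, 4, 29, 12, 0, tzinfo=timezone.utc)
print(header_newer_than_content(lm, shown_revision))  # True
```

A True result for the page being served is exactly the mismatch reported above: the header is newer than the body it came with.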



On 5/3/2013 6:19 PM, Asher Feldman wrote:


I don't think my original approach had any fans.  Purges are now sent to both varnish instances per host, and more recently, the 300s ttl override was removed from the frontends.  But all of the purges are no-ops. 

There are multiple ways to approach making the purges sent to the frontends actually work such as rewriting the purges in varnish, rewriting them before they're sent to varnish depending on where they're being sent, or perhaps changing how cached objects are stored in the frontend.  I personally think it's all an unnecessary waste of resources and prefer my original approach.

-Asher
