• jaschen@lemm.ee
    link
    fedilink
    arrow-up
    1
    ·
    6 months ago

    Real question. Doesn’t the computer actually slow down when you have that much memory? Doesn’t the CPU need to seek into a bigger vast vs a smaller memory set?

    Or is this an old school way of thinking?

    • IHawkMike@lemmy.world
      link
      fedilink
      arrow-up
      5
      ·
      6 months ago

      No that’s not how it works. Handling a larger address space (e.g., 32-bit vs 64-bit) maybe could affect speed between same sized modules on a very old CPU but I’m not sure that’s even the case by any noticeable margin.

      The RA in RAM stands for random access; there is no seeking necessary.

      Technically at a very low level size probably affects speed, but not to any degree you’d notice. RAM speed is actually positively correlated with size, but that’s more because newer memory modules are both generally both bigger and faster.

      • The_Decryptor@aussie.zone
        link
        fedilink
        English
        arrow-up
        5
        ·
        6 months ago

        The RA in RAM stands for random access; there is no seeking necessary.

        Well there is, CPUs need to map virtual addresses to physical ones. And the more RAM you have the more management of that memory you need to do (e.g. modern Intel and AMD CPUs have 5 levels of indirection between a virtual and physical address)

        But it also caches those address mappings, as long as your TLB is happy, you’re happy. An alternative is to use larger page sizes (A page being the smallest amount of RAM you can address), the larger the page the less you need recurse into the page tables to actually find said page, but you also can end up wasting RAM if you’re not careful.

        • IHawkMike@lemmy.world
          link
          fedilink
          arrow-up
          2
          ·
          6 months ago

          You clearly know more than me, but wouldn’t everything from 4GB to 1TB have the same number of walks? And one more walk gets you up to 256TB?

          • The_Decryptor@aussie.zone
            link
            fedilink
            English
            arrow-up
            1
            ·
            6 months ago

            So one of the problems is the size of a “physical page”, on a stock x86 system that’s only 4KiB. If you allocate just 1MiB of RAM you need to back that with 256 “page table entries”, and to then load a virtual address within that allocation you need to walk that list of 256 entries to find the physical address in RAM that the CPU needs to request.

            Of course these days an app is more likely to use 1 GiB of RAM, that’s a mere 262,144 page table entries to scan through, on each memory load.

            Oh but then we’re also not running a single process, there’s multiple processes on the system, so there will be several million of these entries, each one indexed by address (Which can be duplicated, each process has its own private view of the address space), and then by process ID to disambiguate which entry belongs to each process.

            That’s where the TLB comes in handy, to avoid the million or so indexing operations on each and every memory load.

            But caching alone can’t solve everything, you need a smarter way to perform bookkeeping than simply using a flat list for when you don’t have a cached result. So the OS breaks down those mappings into smaller chunks and then provides a table that maps address ranges to those chunks. An OS might cap a list of PTEs at 4096 and have another table index that, so to resolve an address the CPU checks which block of PTEs to load from the first table and then only has to scan the list it points to.

            Like this, this is a 2 level scheme that Intel CPUs used before the Pentium Pro (iirc), the top 10 bits of an address selected an entry in the “page directory”, the CPU loads that and uses the next 10 bits to select the group of PTEs from that list, following that link that it finds the actual PTEs that describe the mappings and then it can scan that list to find the specific matching entry that describes the physical address to load (And it then promptly caches the result to avoid doing that again)

            So yes, for a given page size and CPU you have a fixed number of walks regardless of where the address lives in memory, but we also have more memory now. And much like a hoarder, the more space we have to store things, the more things we do store, and the more disorganised it gets. And even if you do clear a spot, the next thing you want to store might not fit there and you end up storing it someplace else. If you end up bouncing around looking for things you end up thrashing the TLB, throwing out cached entries you still need so now need to perform the entire table walk again (Just to invariably throw that result away soon after).

            Basically, you need to defrag your RAM periodically so that the mappings don’t get too complex and slow things down (Same is true for SSDs btw, you still need to defrag them to clean up the filesystem metadata itself, just less often than HDDs). Meta have been working on improvements to how Linux handles all this stuff (page table layout and memory compaction) for a while because they were seeing some of their long-lived servers ending up spending about 20% of CPU time simply wasted on doing repetitive walks due to a highly fragmented address space.

    • suicidaleggroll@lemm.ee
      link
      fedilink
      English
      arrow-up
      2
      ·
      6 months ago

      That’s a complicated question. Bigger memory can split it between more banks, which can mean more precharge penalties if the memory you need to access is spread out between them.

      But big memory systems generally use workstation or server processors, which means more memory channels, which means the system can access multiple regions of memory simultaneously. Mini-PCs and laptops generally only have one memory controller, higher end laptops and desktops usually have two, workstations often have 4, and big servers can have 8+. That’s huge for parallel workflows and virtualization.