Transcript of Soft SPMVisor
Overview
- What SPMVisor does: its mechanisms and high-level features
- Research directions for scratchpad memory: allocation policy and management mechanism
- Soft SPMVisor vs. hard SPMVisor (cf. the Cell processor architecture)

What SPMVisor currently has
- Transparency
- Security: access-list enforcement, or a secure DMA that locks regions of main memory
- Performance, due to the reduced number of flush/load operations
- A lightweight C-like API
- Static and dynamic allocation

What SPMVisor currently lacks (features to be added)
- Predictability
- Low overhead
- A specific form of compiler support
- Targeting of temporal locality

Introduction: SRAM vs. DRAM
"The speed of SRAM is increasing by 60% a year versus only 7% a year for DRAM."
- John Hennessy, Computer Architecture: A Quantitative Approach

Cache vs. scratchpad memory (both are SRAM; scratchpad advantages)
- Smaller chip area
- Predictable performance
- Low energy consumption
- The priority of applications should be considered

SPMVisor: The Software Stack and APIs in Sequence Diagrams
Every call enters through the spm_visor interface (App), is handled by spmvisor_core, and bottoms out in spmvisor_low, which drives the physical SPM, the PEM, the page table, the DMA engine, and the MMU. The sequence-diagram lanes are: programmer side, SPMVisor side, and inside of SPMVisor.

Initialize / finish
- v_spm_init() -> spm_init(), with check_avail_spm(size) and update_avail_spm(size) maintaining the bitmap
- v_spm_deinit()

Register a process
- v_spm_create(PID, priority, IPA, ACL) -> register_PID(PID, priority, ACL), build_vSPM_info_for_task(PID, priority, ACL): builds the basic vSPM information for the task of the PID. IPA is 0 for the soft SPMVisor, or the address of the hard SPMVisor.

Allocation and deallocation
- v_spm_malloc(PID, IPA, Size, Priority, Type, ACL) ->
  check_available_spm(size): checks the available scratchpad memory through the bitmap;
  best_fit_with_size(size): the fitting mechanism that gives the programmer a proper layout for the requested SPM size;
  entry = mapping_create(PID, IPA, Size, Priority, ACL): creates an entry relating va (the logical address of the vSPM) to pa (its physical address);
  page_table_update(entry): fills the page table with the entry for the process of the PID;
  update_available_spm(size): updates the available SPM size kept in the bitmap.
- v_spm_delete(PID, va) -> delete_pt_entry(PID, va), update_available_spm(size)
- v_spm_delete_all(PID): frees all the vSPM of a process, looping over every SPM block of the PID.

Promote, demote, and data transfer
- v_spm_promote(PID, IPA, Size, Priority, Type, ACL) -> check_available_spm(size), best_fit_with_size(size), entry = mapping_create(PID, IPA, Size, Priority, ACL), page_table_update(entry), load_to_spm(PID, pa, size) (moves data from DRAM to SRAM), update_available_spm(size)
- v_spm_demote(PID, va_vspm) -> mapping_update(PID, va, pa), check_available_spm(size), update_available_spm(size), page_table_update()
- v_spm_transfer(PID, SrcAddr, DstAddr, TxType) -> data_transfer(PID, SrcAddr, DstAddr, TxType)

spmvisor_low building blocks
- Physical SPM / PEM: pem_init(), create_new_pem(), append_pem_pool(), pem_destroy(), zeroing(pa) (fills the allocated space with 0 for initialization)
- Bitmap: init_bitmap(), best_fit_with_size(size), first_fit_with_size(size), last_fit_with_size(size)
- Access list: generate_acl(PID), update_acl(pa, acl_value)
- Page table: pt_write(PID, va, pa), pt_read(PID, va), mapping_create(PID, IPA, Size, Priority, ACL), mapping_update(PID, va, pa), page_table_update(entry)
- Data movement: load_to_spm(PID, SrcPa, Dst, size)
- Priority list: init_priority_list(), set_app_priority(PID, Priority), append_app_priority(PID, Priority), set_block_priority(PID, va, BlkPriority)
- Process info: register_PID(PID, priority, ACL), spm_size_used_per_task(PID)

Requirements
- MMU (Memory Management Unit): handles the mapping between VA and PA (SRAM or DRAM)
- Scratchpad memory, or tightly coupled memory: a software-controlled memory
- Secure DMA: secure data transfer between the SRAM and the DRAM

High-level features
- Virtualized scratchpad memories (vSPM) with lightweight C-like APIs
- Guaranteed secure usage of vSPM
- Easy-to-use allocate, delete, promote, and demote operations on vSPMs
- Transparent management of vSPM with a priority-based eviction policy
- Transparent allocation across SPM and PEM
- Transparent and secure transfer of vSPM data between SPM (SRAM) and PEM (DRAM)
- Multitasking environment: priorities for an app or a vSPM block are determined prior to the allocation request
- Per-allocation metadata: PID, priority, acl, va, pa, size

Problem definition
- When a program requests vSPM and not enough SPM is left, the lower-priority task yields its physical SPM.
- vSPM (in SPM or PEM) needs to be protected.
- Access to scratchpad memory should go through a logical address, with transparent management.

Without SPMVisor:
1. Programmers must manage the scratchpad memory directly and must know its size (programming overhead).
2. Programmers must know the physically addressable range of the scratchpad memory (restriction).
3. Programmers must know how (where, and how much) other programmers use the SPM (restriction).
4. Supporting code has to be inserted into programs by hand (code-insertion overhead).
5. Other programmers can still access a used scratchpad memory area (security problem); access control (an ACL) should be provided in hardware or software, at the cost of overhead and programming-model changes.
These group into programming overhead & restriction, the sharing problem, and the security problem. For the sharing problem, SPMVisor resolves SPM sharing among processes and tasks in a multitasking environment.
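The bitmap fitting mechanism named in the sequence diagrams (best_fit_with_size) can be sketched as follows. This is a hedged illustration: the representation (one byte per SPM block, 1 = used) and the exact return convention are my assumptions, not SPMVisor's actual code.

```c
#include <stddef.h>

/* Best-fit search over a block bitmap: returns the start index of the
 * smallest free run that still holds `size` blocks, or -1 if none fits. */
int best_fit_with_size(const unsigned char *bitmap, size_t nblocks, size_t size) {
    int best_start = -1;
    size_t best_len = (size_t)-1;
    size_t i = 0;
    while (i < nblocks) {
        if (bitmap[i]) { i++; continue; }       /* skip used blocks */
        size_t start = i;
        while (i < nblocks && !bitmap[i]) i++;  /* extend the free run */
        size_t len = i - start;
        if (len >= size && len < best_len) {    /* smallest sufficient run */
            best_len = len;
            best_start = (int)start;
        }
    }
    return best_start;
}
```

A first-fit variant (first_fit_with_size) would simply return the first sufficient run instead of the smallest one.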
Our approach

For the programming overhead & restriction:
1. SPMVisor automatically handles management of the scratchpad memory.
2. SPMVisor virtualizes the scratchpad memory and exposes a virtual address to programmers.
3. SPMVisor provides a virtually unlimited scratchpad memory: vSPM = SPM (limited size) + PEM (unlimited size).
4. Programmers no longer insert data-replacement code for evicted SPM data by hand; knowing where eviction occurs would otherwise require system-level simulation of when, and how much, is evicted.
5. SPMVisor prevents unauthorized access to vSPM (SPM and PEM) through ACLs.

For the security problem:
a. vSPM allocation and access control. The MMU is used for access control to the scratchpad memory; without it, an area of the SPM can be redundantly used, and overwritten, by multiple programs. Address translation goes through virtual addresses (MMU and page-table control).
For the sharing problem:
b. Priority-based evict policy. The data of the lower-priority task in the conflicting area is flushed out automatically by SPMVisor.
e. Evicted-data management. SPMVisor dynamically adjusts the size of the PEM area holding evicted data.

Basic management
- Virtual-address and physical-address management (MMU and page-table control)
- Dynamic allocation, deallocation, retrieve, and transfer
- Evict-policy management based on priorities
- Bitmap-based total management of the scratchpad memory
- Access control with the MMU (ACL with MMU)
- Easy sharing of the scratchpad memory among programs
- Targets global and stack data through dynamic allocation

The use cases
1. Allocate: x -> SRAM
2. Promote: main memory -> vSPM
3. Demote: vSPM -> main memory
4. Delete: vSPM -> x
When no available SPM is left for the size requested by the programmer, an eviction priority comparison takes place.

vSPM Free (target: all the vSPM of a process)
1. Return the vSPM to the allocator of SPMVisor.
2. Update the amount of available SPM.
3. The returned vSPM can be used again.

vSPM Promote (normally allocated data -> vSPM, in order to exploit temporal locality; targets heap data)
1. Replace data in DRAM into the vSPM.
2. Update the amount of available SPM.

vSPM Demote (vSPM -> normally allocated data, in order to release vSPM; targets global and heap data previously promoted to vSPM)
1. Replace data in the vSPM into DRAM.
2. Update the amount of available SPM.

Priority comparison in allocation
- Application-level priority comparison first, then vSPM-block-level priority comparison.
- Eviction and data transfer occur if a lower-priority task occupies physical SPM; if no lower-priority task occupies physical SPM, the lower-priority block within the same application priority level is selected as the victim.
- Benefits: efficient scratchpad operations for high-priority tasks, scratchpad memory sharing, an unlimited vSPM size, and a reduced programmer burden.

Assumptions: data transfer is a DMA operation, plus a page-table update.
Issues: how to collaborate with the software infrastructure of the OS (e.g., mm_struct in Linux).

vSPM allocation (bitmap based; vSPM guarantees that page faults do not occur)
Two states: available SPM left (allocate in SPM) or no available SPM left (eviction priority comparison).
1. Bitmap checking (size, fit)
2. Allocating
3. Mapping (page table)
vSPM access: the MMU translates the virtual address to the physical address through the page table.
Considerations: internal/external fragmentation.
- Priority reflection: every allocation carries its meta information.
- Allocation and free both operate on the requested size, with the page table maintaining the va - pa mapping.

Implementation steps
1. Pure functionality of SPMVisor.
2. SPMVisor with OS support: replace the allocation manager, the page-table management, and the data transfer with DMA with their OS-supported counterparts.
3. Integrated with OS support (as a module).

Software stack
- spmvisor_interface: the APIs exposed to programmers to use vSPM (register/deregister a process to SPMVisor, allocate/deallocate, promote/demote, initialize/finish SPMVisor).
- spmvisor_core: the functions that operate vSPM (the bitmap-based allocator, data management).
- spmvisor_low: the low-level functions that abstract the hardware (transfer between SPM and DRAM, access control).
Each process registers a connection to SPMVisor.
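The page-table management layer can be illustrated with a flat stand-in table. The names mirror pt_write / pt_read / delete_pt_entry from the sequence diagrams, but real code would program MMU page tables, so everything below (table size, return conventions) is a sketch under my own assumptions.

```c
#include <stdint.h>

#define PT_ENTRIES 64  /* assumed table size for the sketch */

typedef struct {
    int       used;
    int       pid;
    uintptr_t va;  /* logical address of the vSPM */
    uintptr_t pa;  /* physical address (SPM or PEM) */
} pt_entry;

static pt_entry page_table[PT_ENTRIES];

/* Record a va -> pa mapping for a process; returns -1 if the table is full. */
int pt_write(int pid, uintptr_t va, uintptr_t pa) {
    for (int i = 0; i < PT_ENTRIES; i++) {
        if (!page_table[i].used) {
            page_table[i] = (pt_entry){1, pid, va, pa};
            return 0;
        }
    }
    return -1;
}

/* Return the physical address mapped for (pid, va), or 0 if unmapped. */
uintptr_t pt_read(int pid, uintptr_t va) {
    for (int i = 0; i < PT_ENTRIES; i++)
        if (page_table[i].used && page_table[i].pid == pid && page_table[i].va == va)
            return page_table[i].pa;
    return 0;
}

/* Drop the mapping, as v_spm_delete's delete_pt_entry step does. */
int delete_pt_entry(int pid, uintptr_t va) {
    for (int i = 0; i < PT_ENTRIES; i++)
        if (page_table[i].used && page_table[i].pid == pid && page_table[i].va == va) {
            page_table[i].used = 0;
            return 0;
        }
    return -1;
}
```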
APIs (Programmer)

Register a connection to SPMVisor (task info; the process priority used in allocation is set at registration):
int v_spm_create(PID pid, int priority, ADDR IPA, enum ACL acl);
int v_spm_finish(PID pid);   /* frees all the vSPM held by a process */

Allocation & free (meta information is built for every vSPM allocation; the page table keeps the va <-> pa mapping):
int v_spm_malloc(PID pid, ADDR IPA, int Size, int priority, enum AllocType type, enum ACL acl);
int v_spm_free(PID pid, ADDR va);

Data replacement with vSPM:
int v_spm_promote(PID pid, ADDR va_normal_memory);
int v_spm_demote(PID pid, ADDR va_vspm);

System architecture
- Application level: app1, app2, app3.
- System level: the soft SPMVisor, holding the vSPM info and the spm_interface, plus the page-table manager with one page table per app.
- H/W level: SPM, main memory (PEM), and the MMU.

The vSPM Manager covers allocation, retrieve, eviction, transfer, and priority & access control:
- vSPM management: vSPM allocation, vSPM access, and vSPM status checks by size and priority.
- Page-table mapping: a vSPM maps to scratchpad memory or DRAM; the MMU refers to the page-table mapping.
- vSPM retrieve: the vSPM status is updated with the retrieved vSPM (SPM or DRAM).
- vSPM evict: occurs when no available SPM is left and the priority comparison selects a victim.
- Transfer: data moves between scratchpad memory and DRAM by DMA transfer (DRAM -> SPM and SPM -> DRAM, in the cases of promote, demote, and evict).

Sharing strategy (based on priority)
- Higher-priority tasks have a higher probability of residing in SPM.
- When a vSPM allocation is requested, SPMVisor decides where it is loaded: SPM or DRAM.

OS integration assumptions
- DMA transfer and page-table access are available; vSPM is treated as system memory; SPM management is separated from allocation decisions. This enables dynamic management of the SPM.

Programming model: target memory types, by dynamic allocation
- Global pointer (create)
- Stack pointer (create)
- Global normal (create or promote)
- Stack normal (create or promote)
Mapping creation happens at vspm_create; mapping updates happen on transfer (promote, demote).

OS integration notes
- vSPM allocation in DRAM is integrated with the system memory management; OSes generally have APIs for DMA transfer.
- Some functions can only be done by the OS; the exact Linux APIs for these purposes (i.e., demote and evict: avail_spm += 1, avail_dram -= 1) remain to be identified.
- SPMVisor arbitrates the use of a scratchpad memory of limited size: vSPM create registers a process to SPMVisor for vSPM, enabling SPM sharing in a multitasking environment.

Motivation: performance limitation of the uniprocessor
- Speeding up the CPU clock brings larger heat and larger power consumption, so multicore is getting popular: the same performance improvement with less heat and less power consumption.
- But caches bring a coherency problem into the memory hierarchy as core counts scale, since caches require a coherence protocol (figure a). A cache is like a buffer: the buffer and the original data differ because the buffer holds its own modified value, so synchronization is required across many caches.
- Hence the need for scratchpad memory: there is no coherency problem, because data in scratchpad memory is always synchronized among the cores (figure).
- The benefits of scratchpad memory over cache also include no tag checking, because scratchpad memory is software-controlled. It is a scalable memory.
Definitions
- Static placement: the layout of data doesn't change during execution.
- Dynamic placement: the layout of data changes during execution, to exploit locality.

In an embedded system
- Cache: transparent to software, but longer latency due to cache misses and off-chip accesses.
- SPM: software-controlled memory with single-cycle access latency.

How to place data into scratchpad memory, and which data?
Decision at compile time (before execution). Limitations:
- The source code must be available in advance, so this is not suitable for an open platform.
- It only tracks static data flow and does not fit dynamic behavior.
- When sharing the scratchpad memory in a multitasking environment, the compiler can't know when a context switch occurs.
Decision from dynamically profiled information (during execution). Overhead: profiling. Pros:
- Fits an open platform.
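As an illustration of the placement decision discussed above (my own sketch, not a scheme from the slides): given profiled access counts, a greedy pass places the most frequently accessed objects into the SPM until its capacity is exhausted. Real compile-time schemes typically formulate this as a knapsack problem; the greedy pass only illustrates the idea.

```c
#include <stddef.h>

typedef struct {
    size_t size;      /* bytes */
    long   accesses;  /* profiled access count */
    int    in_spm;    /* output: 1 if placed in SPM */
} data_obj;

/* Greedy placement: `objs` must be sorted by descending access count
 * for the pass to make sense. Returns the number of objects placed. */
int place_greedy(data_obj *objs, size_t n, size_t spm_capacity) {
    int placed = 0;
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        objs[i].in_spm = 0;
        if (used + objs[i].size <= spm_capacity) {
            objs[i].in_spm = 1;        /* hot object fits: keep it in SPM */
            used += objs[i].size;
            placed++;
        }
    }
    return placed;
}
```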
- Reflects dynamic behavior.

Related works
P. Francesco, P. Marchal, D. Atienza, L. Benini, F. Catthoor, and J. M. Mendias, "An integrated hardware/software approach for run-time scratchpad management," Proceedings of the 41st Annual Design Automation Conference, pp. 238-243, 2004. Dynamic management in a multitasking environment:
- Focuses on reducing the cost of data transfer between SPM and DRAM.
- Provides high-level APIs for DMA data transfer; compared against a cache-based architecture and against explicit memory copies by the CPU.

B. Egger, J. Lee, and H. Shin, "Scratchpad memory management in a multitasking environment," Proceedings of the 8th ACM International Conference on Embedded Software, pp. 265-274, 2008. Dynamic management by an SPM Manager (SPMM) in a multitasking environment:
- Working-set-based allocation by a post-optimizer.
- Binary modification by the post-optimizer.
- Modifications in the OS (loader, scheduler, page-fault handler).
- On-demand loading of data into the SPM on page faults (multitasking environment).

The final goal:
OS-level dynamic scratchpad management.

Issues in using scratchpad memory (assumed architecture)
- Problem 1: optimal data placement based on priorities.
- Problem 2: virtualization with access control is required, for trusted application execution.
- Problem 3: high-level APIs are required to use the SPM, with transparent management for the upper layers of the software stack. When is transparent management needed? (1) when a task requests more SPM than is currently available; (2) when a wise allocation strategy is required, considering fragmentation. The triggers are a requested size larger than the physical SPM size of the platform, or larger than the physical SPM size left, plus performance factors.
- Problem 4: allocation. An allocation strategy needs a fitting mechanism, the vSPM information (size), and DRAM (main memory) as the backing store.

Virtualizing scratchpad memory
- Virtual address <-> physical address mapping, backed by DRAM (main memory).
- Management of vSPM: retrieve, data transfer, and evict; the system-wide overhead is the data transfer.
- The trade-off: + faster, fixed access latency; - data transfer cost. Using vSPM wins when
  #accesses * (cache access time - spm access time) > data transfer cost,
  and the total benefit is
  #accesses * (cache access time - spm access time) - data transfer cost.
- Security.

SPMVisor
- Transparent dynamic management of scratchpad memory, considering the multitasking environment.
- Fast and lightweight C-like APIs.
- Virtualized scratchpad memory; management of vSPM (allocation, security, etc.).
- Data copy with DMA and page-table entry updates.
- Priority based: the priority type can be performance, security, or real-time, applied when no SPM is left to allocate.

Comparison: SPMVisor vs. SPMM
- Allocation complexity and speed: SPMVisor is simple and fast; SPMM waits until available SPM is left.
- Reflection of the applications' priorities: considered by SPMVisor, not considered by SPMM.
- Use of frequency information: yes for both.
- Kernel modification: SPMVisor only needs the part that updates DRAM status changes caused by evicted memory. SPMM modifies the loader (additional information produced by the post-optimizer is inserted into the binary), the scheduler (which notifies the SPMM to redistribute the SPM in the five cases: create, finish, schedule, and the ready->run status change), and the page-fault handler (on-demand load).
- SPMM additionally relies on working-set-based allocation by a post-optimizer, binary modification by the post-optimizer, and clustering the data into the page units of the MMU.
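The benefit inequality above (#accesses * (cache access time - spm access time) > data transfer cost) can be written as a one-line helper; the cycle numbers used in any example are illustrative assumptions, not measurements from the slides.

```c
/* Net benefit (in cycles) of keeping a block in vSPM rather than cache:
 * positive means the transfer cost is amortized by faster accesses. */
long vspm_benefit(long n_accesses, long t_cache, long t_spm, long t_transfer) {
    return n_accesses * (t_cache - t_spm) - t_transfer;
}
```

For instance, with an assumed 4-cycle cache access, 1-cycle SPM access, and a 2000-cycle DMA transfer, a block touched 1000 times nets 1000 cycles, while one touched 500 times loses 500 cycles, so promotion only pays off for sufficiently hot data.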