HSET kontra użycie pamięci SET?

Przeczytałem this article, który wzmiankuje przechowywanie 1Million kluczy w trybie redis będzie korzystać z 17 GB pamięci. Jednak po przełączeniu na hasze masując je po 1k każdy (np. HSET "mediabucket:1155" "1155315" "939"), można przechowywać 1M w 5GB, co jest dość dużym oszczędności.HSET kontra użycie pamięci SET?

Czytam redis memory-optimization, ale nie całkiem rozumiem różnicę. Mówi on, że HGET nie są w pełni O (1), ale wystarczająco blisko i wspomina o większym wykorzystaniu procesora podczas używania hsetów. Nie rozumiem, dlaczego byłoby więcej użycia procesora (pewny czas wymiany miejsca, ale jak/co?). Wymienia kodowanie, ale nie kodowanie.

Wymienia także tylko ciąg znaków, ale nie mam pojęcia, co oznacza tylko ciąg. Czy to jest pole mieszania? Czy to oznacza hash? Nie widzę nic na ten temat w HSET. Co dokładnie byłoby zakodowane i dlaczego kodowanie byłoby bardziej wydajne niż użycie SET?

Jak to jest możliwe HSET "mediabucket:1155" "1155315" "939" jest bardziej wydajny niż SET "mediabucket:1155315" "939"? Mniej danych w SET (1155315 i d 1155 zamiast 1155315). Osobiście próbowałbym używać kluczy binarnych, ale nie sądzę, że ma to związek z tym, dlaczego HSET są bardziej wydajne.

EDIT:

Krzyż wysłana na listę mailingową Redis-db, a także: https://groups.google.com/d/topic/redis-db/90K3UqciAx0/discussion

Źródło

2012-10-08 Anonymous

Małe obiekty hash są kodowane jako ziplists zależności od wartości hash-max-ziplist-wpisów i hash-maksymalnej ilosci Parametry wartości ziplist. To jest prosta serializacja danych.

ziplist jest zdefiniowany następująco (wyciąg z kodem źródłowym Redis):

/* The ziplist is a specially encoded dually linked list that is designed 
* to be very memory efficient. It stores both strings and integer values, 
* where integers are encoded as actual integers instead of a series of 
* characters. It allows push and pop operations on either side of the list 
* in O(1) time. However, because every operation requires a reallocation of 
* the memory used by the ziplist, the actual complexity is related to the 
* amount of memory used by the ziplist. 
* 
* ---------------------------------------------------------------------------- 
* 
* ZIPLIST OVERALL LAYOUT: 
* The general layout of the ziplist is as follows: 
* <zlbytes><zltail><zllen><entry><entry><zlend> 
* 
* <zlbytes> is an unsigned integer to hold the number of bytes that the 
* ziplist occupies. This value needs to be stored to be able to resize the 
* entire structure without the need to traverse it first. 
* 
* <zltail> is the offset to the last entry in the list. This allows a pop 
* operation on the far side of the list without the need for full traversal. 
* 
* <zllen> is the number of entries.When this value is larger than 2**16-2, 
* we need to traverse the entire list to know how many items it holds. 
* 
* <zlend> is a single byte special value, equal to 255, which indicates the 
* end of the list. 
* 
* ZIPLIST ENTRIES: 
* Every entry in the ziplist is prefixed by a header that contains two pieces 
* of information. First, the length of the previous entry is stored to be 
* able to traverse the list from back to front. Second, the encoding with an 
* optional string length of the entry itself is stored. 
* 
* The length of the previous entry is encoded in the following way: 
* If this length is smaller than 254 bytes, it will only consume a single 
* byte that takes the length as value. When the length is greater than or 
* equal to 254, it will consume 5 bytes. The first byte is set to 254 to 
* indicate a larger value is following. The remaining 4 bytes take the 
* length of the previous entry as value. 
* 
* The other header field of the entry itself depends on the contents of the 
* entry. When the entry is a string, the first 2 bits of this header will hold 
* the type of encoding used to store the length of the string, followed by the 
* actual length of the string. When the entry is an integer the first 2 bits 
* are both set to 1. The following 2 bits are used to specify what kind of 
* integer will be stored after this header. An overview of the different 
* types and encodings is as follows: 
* 
* |00pppppp| - 1 byte 
*  String value with length less than or equal to 63 bytes (6 bits). 
* |01pppppp|qqqqqqqq| - 2 bytes 
*  String value with length less than or equal to 16383 bytes (14 bits). 
* |10______|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes 
*  String value with length greater than or equal to 16384 bytes. 
* |11000000| - 1 byte 
*  Integer encoded as int16_t (2 bytes). 
* |11010000| - 1 byte 
*  Integer encoded as int32_t (4 bytes). 
* |11100000| - 1 byte 
*  Integer encoded as int64_t (8 bytes). 
* |11110000| - 1 byte 
*  Integer encoded as 24 bit signed (3 bytes). 
* |11111110| - 1 byte 
*  Integer encoded as 8 bit signed (1 byte). 
* |1111xxxx| - (with xxxx between 0000 and 1101) immediate 4 bit integer. 
*  Unsigned integer from 0 to 12. The encoded value is actually from 
*  1 to 13 because 0000 and 1111 can not be used, so 1 should be 
*  subtracted from the encoded 4 bit value to obtain the right value. 
* |11111111| - End of ziplist. 
* 
* All the integers are represented in little endian byte order. 
*/

Każdy element z obiektu hash jest reprezentowany jako para klucz/wartość w ziplist (2 kolejnych wpisów). Zarówno klucz, jak i wartości mogą być przechowywane jako prosty ciąg lub liczba całkowita. Ten format jest bardziej zwarty w pamięci, ponieważ oszczędza wiele wskaźników (8 bajtów każdy), które są wymagane do zaimplementowania dynamicznej struktury danych (np. Prawdziwego hashtable).

Minusem jest to, że operacje HSET/HGET są w rzeczywistości O (N), gdy są stosowane na liście zip. Dlatego ziplista musi być mała. Gdy dane zip listy mieszczą się w pamięci podręcznej procesora L1, odpowiednie algorytmy są wystarczająco szybkie pomimo ich liniowej złożoności.

Możesz odnieść się do następujących linków, aby uzyskać więcej informacji:

Redis 10x more memory usage than data

Redis Data Structure Space Requirements

Odpowiedzi te odnoszą się do innych struktur danych (takich jak zestawy, listy lub posortowanych zestawów), ale to jest dokładnie ta sama koncepcja.

Źródło

2012-10-08 12:13:13

Niezłe wyjaśnienie. –

HSET kontra użycie pamięci SET?

Odpowiedz

Powiązane problemy