Sunday, July 14, 2024

Notes for JEDEC Standard No. 84-B51A - CH1~CH5


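eMMC is a managed memory designed for storing code and data. Its low power consumption makes it well suited to mobile devices.
eMMC communication bus: eMMC communicates over a bus of 11 signals: Clock, Data Strobe, a 1-bit Command line, and an 8-bit Data Bus.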

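Clock: the supported clock frequency ranges from 0 to 200 MHz.
Data Bus: eMMC supports three data bus width modes: 1-bit (default), 4-bit, and 8-bit. The width can be chosen to trade transfer speed against the number of data lines in use.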

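Advantages of eMMC:
Low power: eMMC is designed for low power consumption, which suits battery-powered mobile devices.
Indirect memory access: eMMC accesses the memory indirectly through a separate controller. The device can therefore handle background memory management tasks without relying on host software, which simplifies the flash management layer on the host.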


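In an eMMC device, memory addressing and capacity are determined by the addressing mode and the sector size. There are two main modes: byte addressing and sector addressing.
byte addressing: a 32-bit address selects an individual byte. This mode supports devices up to 2 GB; it is impractical for larger capacities.
sector addressing: devices larger than 2 GB switch to sector addressing, in which an address refers to a sector rather than a single byte. The sector size is typically 512 bytes or 4 KB.
512 B sectors: in theory up to 2 TiB can be addressed: 2^32 sectors × 512 bytes/sector = 2^32 × 2^9 bytes = 2^41 bytes = 2 TiB. In practice the maximum capacity is usually considered to be 256 GB. The gap comes from overheads such as controller complexity, logical-to-physical address mapping, wear leveling, and bad block management, all of which consume additional resources.
4 KB sectors: in theory up to 2^32 sectors × 4 KB/sector = 2^32 × 2^12 bytes = 2^44 bytes = 16 TiB can be addressed.
Controller complexity: as the sector size shrinks and the capacity grows, the complexity and overhead of logical-to-physical mapping, wear leveling, and bad block management grow as well.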
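The sector-addressing arithmetic above can be checked with a few lines of C. This is only a sketch: the function names are made up for illustration, and it covers the theoretical limit of a 32-bit sector address, not the practical capacity of any real device.

```c
#include <assert.h>
#include <stdint.h>

/* Theoretical capacity reachable with a 32-bit sector address, in bytes.
 * sector_bytes is the sector size discussed above: 512 or 4096.
 * (Hypothetical helper, not part of any eMMC driver API.) */
static uint64_t emmc_sector_addressable_bytes(uint32_t sector_bytes)
{
    assert(sector_bytes == 512 || sector_bytes == 4096);
    return (UINT64_C(1) << 32) * sector_bytes;
}

/* Convert a byte count to whole TiB (1 TiB = 2^40 bytes). */
static uint64_t bytes_to_tib(uint64_t bytes)
{
    return bytes >> 40;
}
```

Calling bytes_to_tib(emmc_sector_addressable_bytes(512)) gives 2 and bytes_to_tib(emmc_sector_addressable_bytes(4096)) gives 16, matching the 2 TiB and 16 TiB figures derived above.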

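eMMC signal lines
CLK (clock signal): controls the timing of data transfers. Each clock cycle transfers one bit on the command line and one bit (1x) or two bits (2x) on each data line, depending on the operating mode. The clock frequency can vary from zero up to the maximum the device supports (0 to 200 MHz).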

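Data Strobe: generated by the eMMC device and used for data synchronization in HS400 mode. Its frequency matches that of the clock signal (CLK). For data output, two bits are transferred per cycle on each data line (one on the rising edge, one on the falling edge). For CRC status responses and command responses (the latter only in HS400 enhanced strobe mode), the value is latched on the rising edge only; the falling edge is ignored.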

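CMD (command signal): CMD is a bidirectional channel used to send commands from the eMMC host to the eMMC device and to receive the device's responses. It has two operating modes:
Open-Drain Mode: used during initialization to minimize noise and preserve signal integrity. This mode is more tolerant of signal-integrity problems.
Push-Pull Mode: used for fast command transfer. The signal is actively driven high or low, giving fast and reliable transitions between logic states.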

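DAT0-DAT7: bidirectional data lines that carry data between the host and the eMMC device. They operate in push-pull mode, which allows fast transfers because only the device or the host drives these signals at any given time.
After power-up or reset, only DAT0 is used for data transfer. The bus can then be widened to DAT0-DAT3 (4-bit mode) or DAT0-DAT7 (8-bit mode) to increase throughput. The eMMC device provides internal pull-up resistors on DAT1-DAT7; they are disconnected when the device enters the corresponding wide bus mode (4-bit or 8-bit).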





bus protocol
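After a power-on reset, the host must initialize the device through a specific message sequence. Each message is made up of the following tokens:
Command: a token sent from the host to the device. Commands are transferred on the CMD line.
Response: a token sent from the device to the host in reply to a previously received command. Responses are also transferred serially on the CMD line.
Data: data can flow from the device to the host or from the host to the device. It is transferred on the data bus, which uses 1 (DAT0), 4 (DAT0-DAT3), or 8 (DAT0-DAT7) data lines. On each data line the transfer is either single data rate (one bit per clock cycle) or dual data rate (two bits per clock cycle).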
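Bus width, clock frequency, and single/dual data rate together set the peak raw transfer rate. The sketch below only multiplies those three factors; the function name is hypothetical, and the result ignores start/end bits, CRC, and busy time, so real throughput is lower.

```c
#include <assert.h>
#include <stdint.h>

/* Peak raw data-bus rate in bytes per second.
 * bus_width_bits: 1, 4 or 8 data lines
 * clock_hz:       clock frequency in Hz (0 to 200 MHz per the notes above)
 * ddr:            0 = single data rate, non-zero = dual data rate
 * (Illustrative helper, not taken from the standard.) */
static uint64_t emmc_peak_bytes_per_sec(unsigned bus_width_bits,
                                        uint64_t clock_hz, int ddr)
{
    assert(bus_width_bits == 1 || bus_width_bits == 4 || bus_width_bits == 8);
    uint64_t bits_per_clock = (uint64_t)bus_width_bits * (ddr ? 2u : 1u);
    return bits_per_clock * clock_hz / 8;
}
```

For an 8-bit bus at 200 MHz this yields 200,000,000 bytes/s at single data rate and 400,000,000 bytes/s at dual data rate.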


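eMMC commands and data transfer
Block-oriented commands send a data block followed by CRC bits. Both read and write operations allow single-block or multiple-block transfers. A multiple-block transfer ends when a stop command is issued on the CMD line, similar to a sequential read.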




Write operation: a block write uses simple busy signaling on DAT0.




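Command token: each command token is preceded by a start bit ('0') and followed by an end bit ('1'), for a total length of 48 bits. Each token is protected by CRC bits so that transmission errors can be detected and the operation repeated if an error occurs.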
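To make the 48-bit layout concrete, here is a minimal sketch that packs a command token into six bytes. The field layout (start bit, transmission bit, 6-bit command index, 32-bit argument, CRC7, end bit) and the CRC7 generator x^7 + x^3 + 1 follow the standard; the function names are my own and purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* CRC7 with generator x^7 + x^3 + 1, MSB first, initial value 0. */
static uint8_t emmc_crc7(const uint8_t *data, int len)
{
    uint8_t crc = 0;
    for (int i = 0; i < len; i++) {
        uint8_t byte = data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (uint8_t)(crc << 1);
            if ((byte ^ crc) & 0x80)   /* incoming bit XOR bit shifted out of the CRC */
                crc ^= 0x09;
            byte = (uint8_t)(byte << 1);
        }
    }
    return crc & 0x7F;
}

/* Build a 48-bit command token (hypothetical helper):
 * start bit 0, transmission bit 1 (host), 6-bit index,
 * 32-bit argument, 7-bit CRC, end bit 1. */
static void emmc_build_command(uint8_t index, uint32_t arg, uint8_t frame[6])
{
    assert(index < 64);
    frame[0] = (uint8_t)(0x40 | index);
    frame[1] = (uint8_t)(arg >> 24);
    frame[2] = (uint8_t)(arg >> 16);
    frame[3] = (uint8_t)(arg >> 8);
    frame[4] = (uint8_t)arg;
    frame[5] = (uint8_t)((emmc_crc7(frame, 5) << 1) | 0x01);
}
```

For CMD0 with a zero argument this produces the bytes 40 00 00 00 00 95.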


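Response token: there are five coding schemes depending on the content, and the token length is either 48 or 136 bits. Block data is protected by a CRC based on the 16-bit CCITT polynomial. Together these rules keep data and command transfers between the eMMC device and the host reliable and intact.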
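The 16-bit CCITT CRC mentioned above uses the generator x^16 + x^12 + x^5 + 1. A minimal bitwise sketch follows, assuming an initial value of 0 and MSB-first processing over a plain byte buffer; the function name is illustrative, and a table-driven version would be faster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC16-CCITT: generator 0x1021, MSB first, initial value 0.
 * (Illustrative helper; operates on a byte buffer.) */
static uint16_t emmc_crc16(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

With these parameters the standard check string "123456789" gives 0x31C3.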




    References:
  • JESD84-B51A (Revision of JESD84-B51, February 2015)



Friday, July 5, 2024

Configuration of a minimal systemd setup in QEMU


This post records how to set up a minimal systemd init in a QEMU environment. Essentially two targets are required, sysinit.target and basic.target, plus getty@.service for login.
SRCROOT=/opt/armv7vet2hf/sysroots/
DSTROOT=initrd
SRCLIB=$SRCROOT/lib
DSTLIB=$DSTROOT/lib
SRCSYSDLIB=$SRCROOT/lib/systemd
DSTSYSDLIB=$DSTROOT/lib/systemd
SRCUSRLIB=$SRCROOT/usr/lib
DSTUSRLIB=$DSTROOT/usr/lib

rm -rf $DSTROOT
mkdir -p $DSTROOT/bin $DSTROOT/usr/lib $DSTROOT/lib/systemd/system $DSTROOT/etc/systemd/system

# copy libraries
cp -a $SRCLIB/libselinux.so* $DSTLIB
cp -a $SRCLIB/libmount.so* $DSTLIB
cp -a $SRCLIB/libaudit.so* $DSTLIB
cp -a $SRCLIB/libc.so* $DSTLIB
cp -a $SRCLIB/ld-linux-armhf.so* $DSTLIB
cp -a $SRCLIB/libblkid.so* $DSTLIB
cp -a $SRCLIB/libcap.so* $DSTLIB
cp -a $SRCLIB/libm.so* $DSTLIB
cp -a $SRCLIB/libpcre.so* $DSTLIB
cp -a $SRCLIB/libcap-ng.so* $DSTLIB

cp -a $SRCUSRLIB/libacl.so* $DSTUSRLIB
cp -a $SRCUSRLIB/libcrypt.so* $DSTUSRLIB
cp -a $SRCUSRLIB/liblzma.so* $DSTUSRLIB
cp -a $SRCUSRLIB/libattr.so* $DSTUSRLIB

cp -a $SRCSYSDLIB/systemd $DSTSYSDLIB
cp -a $SRCSYSDLIB/libsystemd-shared* $DSTSYSDLIB
install -m 555 busybox-build/busybox $DSTROOT/bin/


# Create basic.target
cat << EOF > $DSTSYSDLIB/system/basic.target
[Unit]
Description=Basic System
EOF

# Create sysinit.target
cat << EOF > $DSTSYSDLIB/system/sysinit.target
[Unit]
Description=System Initialization
DefaultDependencies=no
EOF

# Create the getty@.service template
cat << EOF > $DSTSYSDLIB/system/getty@.service
[Unit]
Description=Getty on %I
ConditionPathExists=/dev/%I

[Service]
ExecStart=-/sbin/getty 115200 %I
Restart=always

[Install]
WantedBy=basic.target rescue.target
EOF


chmod 644 -R $DSTSYSDLIB/system/*

cat << EOF > $DSTROOT/init
#!/bin/busybox sh
## Mount essential filesystems
/bin/busybox mkdir -p /proc /sys /dev /home /run/systemd/journal /tmp /var /mnt /sbin /usr/bin /usr/sbin /etc/systemd/system/basic.target.wants /etc/systemd/system/rescue.target.wants /etc/systemd/system/default.target.wants
/bin/busybox --install -s
ln -sf /lib/systemd/systemd /sbin/init

## Set the path for BusyBox applets if using BusyBox
export PATH=/sbin:/bin:/usr/sbin:/usr/bin
mount -t proc none /proc
mount -t sysfs none /sys
mount -t devtmpfs none /dev
mkdir -p /dev/pts
mount -t devpts none /dev/pts
## Ensure the getty@ttyAMA0.service file is linked in the correct place
ln -sf /lib/systemd/system/getty@.service /etc/systemd/system/basic.target.wants/getty@ttyAMA0.service
# create default password
echo "root:vnpTT1BZdW1/s:0:0:root:/root:/bin/sh" > /etc/passwd
echo "root:x:0:" > /etc/group

# Ensure systemd can find its units
ln -sf /lib/systemd/system/basic.target /lib/systemd/system/default.target

# Start systemd
exec /sbin/init
EOF
chmod +x $DSTROOT/init

fakeroot bash -c "cd linux && ./usr/gen_initramfs.sh ../$DSTROOT -o ../initrd-arm.img"



Sunday, May 5, 2024

A pattern for state machine III - SM framework


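Technology always comes from human nature. I recently came across a state machine (SM) someone else had written that was rather poor, which reminded me that the one I wrote earlier was not very intuitive either, so I rewrote it. The main concepts still follow the definition of an SM.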
    An abstract state machine is a software component that defines a finite set of states:
  • One state is defined as the initial state. When a machine starts to execute, it automatically enters this state.
  • Each state can define actions that occur when a machine enters or exits that state.
  • Each state can define events that trigger a transition.
  • A transition defines how a machine would react to the event, by exiting one state and entering another state.
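So I defined the API from the requirements first and then implemented it. The first thing needed is an API to initialize the SM, hence sm_alloc(), which returns the opaque handle sm; sm_free() releases (destroys) the SM.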
typedef void * sm;
sm sm_alloc(char *name, void *data);
int sm_free(sm s);

From the sentence "Each state can define actions that occur when a machine enters or exits that state." comes the API int sm_state_add(sm s, int state, sm_fp enter, sm_fp exit), which creates a "state" in the sm and binds its enter action and exit action.
typedef int(*sm_fp)(void *data);
int sm_state_add(sm s, int state, sm_fp enter, sm_fp exit);
int sm_state_del(sm s, int state);

From the sentences "Each state can define events that trigger a transition." and "A transition defines how a machine would react to the event, by exiting one state and entering another state." comes the API int sm_event_add(sm s, int state, int event, int new_state, sm_fp action), which creates an "event" in a state and specifies the "new state" and the "action for that event". The API int sm_run(sm s, int new_event) then makes the sm act on a received "event".
int sm_event_add(sm s, int state, int event, int new_state, sm_fp action);
int sm_event_del(sm s, int state, int event);
int sm_run(sm s, int new_event);

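Based on "One state is defined as the initial state. When a machine starts to execute, it automatically enters this state.", the API int sm_init_state_set(sm s, int state) sets the "initial state", and int sm_current_state(sm s) returns the current state.
int sm_init_state_set(sm s, int state);
int sm_current_state(sm s);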

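That completes the API. Below is sm.h, where I use macros so that the API also records some extra information (the names of states, events, and functions).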
#ifndef _SM_H_
#define _SM_H_

typedef void * sm; /**< State machine handle */
typedef int(*sm_fp)(void *data); /**< State machine function pointer */

sm sm_alloc(char *name, void *data); /**< Allocate a state machine */
int sm_free(sm s); /**< Free a state machine */

#define sm_state_add(s, state, enter, exit) _sm_state_add(s, state, #state, (sm_fp)enter, #enter, (sm_fp)exit, #exit)
int _sm_state_add(sm s, int state, const char *st_name, sm_fp enter, const char * ent_fname, sm_fp exit, const char * exit_fname); /**< Add a state */

int sm_state_del(sm s, int state); /**< Delete a state */
int sm_run(sm s, int new_event); /**< Run the state machine */

#define sm_event_add(s, state, event, new_state, action) _sm_event_add(s, state, event, #event, new_state, (sm_fp)action, #action)
int _sm_event_add(sm s, int state, int event, const char *ev_name, int new_state, sm_fp action, const char *action_fname); /**< Add an event */
int sm_event_del(sm s, int state, int event); /**< Delete an event */
int sm_init_state_set(sm s, int state); /**< Set the initial state */
int sm_current_state(sm s); /**< Get the current state */
int sm_dump_state(sm s); /**< Dump the state machine */
#endif
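Next, the implementation of each API. First is sm sm_alloc(char *name, void *data). I did not want to limit the number of "states", so a linked list chains every state in the SM, and each state chains the "events" that start from it. So that deleting a "state" can also delete the "events" pointing to it, each "state" additionally keeps a list of the events that point to it (pointed_event_ll).
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sm.h"
#include "list.h"

/* Error logging helper. Its definition is not shown in this post,
 * so this is an assumed minimal version. */
#ifndef sm_pr_err
#define sm_pr_err(...) fprintf(stderr, __VA_ARGS__)
#endif

struct sm_state;

/* struct sm_event is not shown in this post either; the fields below
 * are inferred from how the code uses them. */
struct sm_event {
    struct list_head event_ll;          /* node in the source state's event_ll */
    struct list_head pointed_event_ll;  /* node in the target state's pointed_event_ll */
    struct sm_state *sm_state;          /* state the event starts from */
    int event;
    const char *ev_name;
    struct sm_state *new_sm_state;      /* state the event leads to */
    sm_fp action;
    const char *action_fname;
};
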
struct sm_state {
    struct list_head state_ll;
    int state;
    const char *st_name;
    sm_fp enter;
    const char *ent_fname;
    sm_fp exit;
    const char *exit_fname;
    struct list_head event_ll;
    struct list_head pointed_event_ll;
};

struct _sm {
    void *v;
    char *name;
    struct list_head state_ll;
    struct sm_state *cur_sm_st;
    pthread_mutex_t mutex;
};

sm sm_alloc(char *name, void *data)
{
    struct _sm *sm;
    sm = (struct _sm *) malloc(sizeof(struct _sm));
    if (!sm) {
        sm_pr_err("malloc failed\n");
        return NULL;
    }
    sm->v = data;
    sm->name = strdup(name);
    sm->cur_sm_st = NULL;
    if (!sm->name) {
        sm_pr_err("malloc name failed\n");
        free(sm);
        return NULL;
    }
    INIT_LIST_HEAD(&sm->state_ll);
    pthread_mutex_init(&sm->mutex, NULL);
    return sm;
}

int sm_free(sm s)
{
    struct _sm *sm = (struct _sm *) s;
    struct sm_state *sm_st, *tmp_sm_st;
    struct sm_event *sm_ev, *tmp_sm_ev;
    if (!sm) {
        return -1;
    }
    list_for_each_entry_safe(sm_st, tmp_sm_st, &sm->state_ll, state_ll) {
        list_for_each_entry_safe(sm_ev, tmp_sm_ev, &sm_st->event_ll, event_ll) {
            list_del(&sm_ev->event_ll);
            free(sm_ev);
        }
        list_del(&sm_st->state_ll);
        free(sm_st);
    }
    pthread_mutex_destroy(&sm->mutex);
    free(sm->name);
    free(sm);
    return 0;
}

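Next is int sm_state_add(sm s, int state, sm_fp enter, sm_fp exit). It only has to check that the state to be created does not already exist; the rest is just storing the information in a struct sm_state. sm_state_del() deletes all the related events and then releases the state's resources.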
static struct sm_state *sm_get_sm_state(sm s, int state)
{
    struct _sm *sm = (struct _sm *) s;
    struct sm_state *sm_st;
    list_for_each_entry(sm_st, &sm->state_ll, state_ll) {
        if (sm_st->state == state) {
            return sm_st;
        }
    }
    return NULL;
}

int _sm_state_add(sm s, int state, const char *st_name, sm_fp enter, const char * ent_fname, sm_fp exit, const char *exit_fname)
{
    struct _sm *sm = (struct _sm *) s;
    struct sm_state *sm_st;
    if (!sm) {
        sm_pr_err("invalid argument\n");
        return -1;
    }
    // if state is already exist, return -1
    sm_st = sm_get_sm_state(s, state);
    if (sm_st) {
        sm_pr_err("state exist\n");
        return -1;
    }

    sm_st = (struct sm_state *) malloc(sizeof(struct sm_state));
    if (!sm_st) {
        sm_pr_err("malloc failed\n");
        return -1;
    }
    sm_st->state = state;
    sm_st->st_name = st_name;
    sm_st->enter = enter;
    sm_st->ent_fname = ent_fname;
    sm_st->exit = exit;
    sm_st->exit_fname = exit_fname;
    INIT_LIST_HEAD(&sm_st->event_ll);
    INIT_LIST_HEAD(&sm_st->pointed_event_ll);
    list_add_tail(&sm_st->state_ll, &sm->state_ll);
    return 0;
}

int sm_state_del(sm s, int state)
{
    struct _sm *sm = (struct _sm *) s;
    struct sm_state *sm_st;
    struct sm_event *sm_ev, *tmp_sm_ev;
    if (!sm) {
        sm_pr_err("invalid argument\n");
        return -1;
    }

    sm_st = sm_get_sm_state(s, state);
    if (!sm_st) {
        sm_pr_err("state is not exist\n");
        return -1;
    }

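    /* events starting from this state: unlink from both lists, then free */
    list_for_each_entry_safe(sm_ev, tmp_sm_ev, &sm_st->event_ll, event_ll) {
        list_del(&sm_ev->event_ll);
        list_del(&sm_ev->pointed_event_ll);
        free(sm_ev);
    }
    /* events from other states that point to this state */
    list_for_each_entry_safe(sm_ev, tmp_sm_ev, &sm_st->pointed_event_ll, pointed_event_ll) {
        list_del(&sm_ev->event_ll);
        list_del(&sm_ev->pointed_event_ll);
        free(sm_ev);
    }
    /* do not leave the machine pointing at a freed state */
    if (sm->cur_sm_st == sm_st) {
        sm->cur_sm_st = NULL;
    }
    list_del(&sm_st->state_ll);
    free(sm_st);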
    return 0;
}

int sm_event_add(sm s, int state, int event, int new_state, sm_fp action) first checks that both the "state" and the "new state" exist and that the "event" does not, then links the "event" onto that "state".
static struct sm_event *sm_get_sm_event(struct sm_state *st, int event)
{
    struct sm_event *sm_ev;
    list_for_each_entry(sm_ev, &st->event_ll, event_ll) {
        if (sm_ev->event == event) {
            return sm_ev;
        }
    }
    return NULL;
}

int _sm_event_add(sm s, int state, int event, const char *ev_name, int new_state, sm_fp action, const char *action_fname)
{
    struct _sm *sm = (struct _sm *) s;
    struct sm_state *sm_st, *new_sm_st;
    struct sm_event *sm_ev;
    if (!sm) {
        sm_pr_err("invalid argument\n");
        return -1;
    }
    sm_st = sm_get_sm_state(s, state);
    if (!sm_st) {
        sm_pr_err("state is not exist\n");
        return -1;
    }

    sm_ev = sm_get_sm_event(sm_st, event);
    if (sm_ev) {
        sm_pr_err("event is already exist\n");
        return -1;
    }

    new_sm_st = sm_get_sm_state(s, new_state);
    if (!new_sm_st) {
        sm_pr_err("new state is not exist\n");
        return -1;
    }

    sm_ev = (struct sm_event *) malloc(sizeof(struct sm_event));
    if (!sm_ev) {
        sm_pr_err("malloc failed\n");
        return -1;
    }
    sm_ev->sm_state = sm_st;
    sm_ev->event = event;
    sm_ev->ev_name = ev_name;
    sm_ev->new_sm_state = new_sm_st;
    sm_ev->action = action;
    sm_ev->action_fname = action_fname;
    list_add_tail(&sm_ev->event_ll, &sm_st->event_ll);
    list_add_tail(&sm_ev->pointed_event_ll, &new_sm_st->pointed_event_ll);
    return 0;
}

int sm_event_del(sm s, int state, int event)
{
    struct _sm *sm = (struct _sm *) s;
    struct sm_state *sm_st;
    struct sm_event *sm_ev;
    if (!sm) {
        sm_pr_err("invalid argument\n");
        return -1;
    }
    sm_st = sm_get_sm_state(s, state);
    if (!sm_st) {
        sm_pr_err("state is not exist\n");
        return -1;
    }
    sm_ev = sm_get_sm_event(sm_st, event);
    if (!sm_ev) {
        sm_pr_err("event is not exist\n");
        return -1;
    }
    list_del(&sm_ev->event_ll);
    /* also unlink from the target state's pointed_event_ll */
    list_del(&sm_ev->pointed_event_ll);
    free(sm_ev);
    return 0;
}

int sm_init_state_set(sm s, int state) simply finds the "state" and points the SM's cur_sm_st at it.
int sm_init_state_set(sm s, int state)
{
    struct _sm *sm = (struct _sm *) s;
    struct sm_state *sm_st;
    if (!sm) {
        sm_pr_err("invalid argument\n");
        return -1;
    }
    sm_st = sm_get_sm_state(s, state);
    if (!sm_st) {
        sm_pr_err("state is not exist\n");
        return -1;
    }
    sm->cur_sm_st = sm_st;
    return 0;
}

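Finally, int sm_run(sm s, int event) looks up the "event" starting from cur_sm_st. If it is found, it runs the exit "action" of the current "state", then the "action" of the "event", then sets cur_sm_st to the "new state" and runs the new state's enter "action".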
int sm_run(sm s, int event)
{
    struct _sm *sm = (struct _sm *) s;
    struct sm_state *sm_st;
    struct sm_event *sm_ev;
    if (!sm) {
        sm_pr_err("invalid argument\n");
        return -1;
    }
    pthread_mutex_lock(&sm->mutex);
    if (!sm->cur_sm_st) {
        sm_pr_err("invalid stats\n");
        pthread_mutex_unlock(&sm->mutex);
        return -1;
    }

    sm_st = sm->cur_sm_st;
    sm_ev = sm_get_sm_event(sm_st, event);
    if (!sm_ev) {
        sm_pr_err("event is not exist\n");
        pthread_mutex_unlock(&sm->mutex);
        return -1;
    }
    if (sm->cur_sm_st->exit) {
        sm->cur_sm_st->exit(sm->v);
    }
    if (sm_ev->action) {
        sm_ev->action(sm->v);
    }
    sm->cur_sm_st = sm_ev->new_sm_state;
    if (sm->cur_sm_st->enter) {
        sm->cur_sm_st->enter(sm->v);
    }
    
    pthread_mutex_unlock(&sm->mutex);
    return 0;
}
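The order in which sm_run() calls the callbacks (exit, event action, enter) can be shown with a tiny self-contained sketch. It does not use the sm API above: the demo_* names, the fixed tables, and the 64-byte trace buffer are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*demo_fp)(void *data);

enum demo_state_id { ST_IDLE, ST_RUNNING };
enum demo_event_id { EV_START, EV_STOP };

struct demo_state {
    demo_fp enter;
    demo_fp exit;
};

struct demo_transition {
    int state;      /* state the event is attached to */
    int event;
    int new_state;
    demo_fp action;
};

/* Append a callback name to the trace buffer passed as user data
 * (the buffer is assumed to hold 64 bytes). */
static int demo_trace(void *data, const char *name)
{
    char *buf = data;
    assert(strlen(buf) + strlen(name) + 2 <= 64);
    strcat(buf, name);
    strcat(buf, " ");
    return 0;
}

static int idle_exit(void *d)     { return demo_trace(d, "idle_exit"); }
static int start_action(void *d)  { return demo_trace(d, "start_action"); }
static int running_enter(void *d) { return demo_trace(d, "running_enter"); }

static const struct demo_state demo_states[] = {
    [ST_IDLE]    = { NULL, idle_exit },
    [ST_RUNNING] = { running_enter, NULL },
};

static const struct demo_transition demo_transitions[] = {
    { ST_IDLE,    EV_START, ST_RUNNING, start_action },
    { ST_RUNNING, EV_STOP,  ST_IDLE,    NULL },
};

/* Handle one event: exit action, event action, state switch, enter action. */
static int demo_run(int *cur, int event, void *data)
{
    size_t n = sizeof(demo_transitions) / sizeof(demo_transitions[0]);
    for (size_t i = 0; i < n; i++) {
        const struct demo_transition *t = &demo_transitions[i];
        if (t->state != *cur || t->event != event)
            continue;
        if (demo_states[*cur].exit)
            demo_states[*cur].exit(data);
        if (t->action)
            t->action(data);
        *cur = t->new_state;
        if (demo_states[*cur].enter)
            demo_states[*cur].enter(data);
        return 0;
    }
    return -1; /* the current state has no such event */
}
```

Starting in ST_IDLE with an empty trace buffer, demo_run(&cur, EV_START, trace) leaves cur at ST_RUNNING and the buffer holding "idle_exit start_action running_enter ", while an event the current state does not define returns -1 and changes nothing.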

The rest are just helper APIs.
int sm_current_state(sm s)
{
    struct _sm *sm = (struct _sm *) s;
    /* cur_sm_st stays NULL until sm_init_state_set() has been called */
    if (!sm || !sm->cur_sm_st) {
        sm_pr_err("invalid argument\n");
        return -1;
    }
    return sm->cur_sm_st->state;
}

int sm_dump_state(sm s)
{
    struct _sm *sm = (struct _sm *) s;
    struct sm_state *sm_st, *ori_sm_st;
    struct sm_event *sm_ev;
    if (!sm) {
        sm_pr_err("invalid argument\n");
        return -1;
    }
    list_for_each_entry(sm_st, &sm->state_ll, state_ll) {
        printf("state: %d/%p/%s\n", sm_st->state, sm_st, sm_st->st_name);
        printf("\tenter_fp: %p, enter_fname: %s\n", sm_st->enter, sm_st->enter?sm_st->ent_fname:"");
        printf("\texit_fp: %p, exit_fname: %s\n", sm_st->exit, sm_st->exit?sm_st->exit_fname:"");
        list_for_each_entry(sm_ev, &sm_st->event_ll, event_ll) {
            printf("\tevent: %d/%p/%s, new_state: %d/%p/%s, ev_fp:%p/%s\n", sm_ev->event, sm_ev, sm_ev->ev_name,
                            sm_ev->new_sm_state->state, sm_ev->new_sm_state, sm_ev->new_sm_state->st_name,
                            sm_ev->action, sm_ev->action?sm_ev->action_fname:"");
        }
        // pointed event is the event that point to this state
        list_for_each_entry(sm_ev, &sm_st->pointed_event_ll, pointed_event_ll) {
            ori_sm_st = sm_ev->sm_state;
            printf("\tpointed event: %d/%p/%s, from state: %d/%p/%s, ev_fp:%p/%s\n", sm_ev->event, sm_ev, sm_ev->ev_name,
                            ori_sm_st->state, ori_sm_st, ori_sm_st->st_name,
                            sm_ev->action, sm_ev->action?sm_ev->action_fname:"");
        }
    }
    return 0;
}



Sunday, September 24, 2023

Linux Kernel(25.1)- Gadget Configfs


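These are my notes on gadget_configfs.txt. With Dummy HCD emulation, a gadget can be verified without actually connecting to a USB host.
First, build Dummy HCD and "USB Gadget functions configurable through configfs" into the kernel (built-in) to make the later verification easier. The rest is explained by example.
/ # lsusb    # list the current USB devices; ID <Vendor ID>:<Product ID>
Bus 001 Device 001: ID 1d6b:0002    <-- only one entry for now; 1d6b is the Linux Foundation and 0002 is the 2.0 root hub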

/ # mount -t configfs none /sys/kernel/config/    # configfs must be mounted before the gadget can be configured
/ # mount
rootfs on / type rootfs (rw,size=40392k,nr_inodes=10098)
tmpfs on /dev type tmpfs (rw,relatime,size=64k,mode=755)
devpts on /dev/pts type devpts (rw,relatime,mode=600,ptmxmode=000)
proc on /proc type proc (rw,relatime)
sysfs on /sys type sysfs (rw,relatime)
none on /sys/kernel/config type configfs (rw,relatime)

Step 1 (Creating the gadgets): each gadget needs its own directory, and its settings are made in the subdirectories below it.
By convention we will name it g1.
Format:
	$ mkdir /sys/kernel/config/usb_gadget/<gadget name>
/ # mkdir /sys/kernel/config/usb_gadget/g1 
/ # cd /sys/kernel/config/usb_gadget/g1

Each gadget needs its own VID and PID.
Format:
	$ echo <VID> > idVendor
	$ echo <PID> > idProduct
/sys/kernel/config/usb_gadget/g1 # echo 0x1d6b > idVendor
/sys/kernel/config/usb_gadget/g1 # echo 0x0104 > idProduct

Next, set the serial number, manufacturer, and product strings for the gadget.
These settings live in a per-language directory under strings; here 0x409 is English (US).
Format:
	$ echo <serial number> > strings/0x409/serialnumber
	$ echo <manufacturer> > strings/0x409/manufacturer
	$ echo <product> > strings/0x409/product
/sys/kernel/config/usb_gadget/g1 # mkdir strings/0x409
/sys/kernel/config/usb_gadget/g1 # echo "Brook Technologies" > strings/0x409/manufacturer
/sys/kernel/config/usb_gadget/g1 # echo "Brook's Dummy Storage Gadget" > strings/0x409/product
/sys/kernel/config/usb_gadget/g1 # echo "12345678" > strings/0x409/serialnumber

Step 2 (Creating the configurations): each gadget consists of a number of configurations,
and a directory must be created for each of them.
Format:
	$ mkdir configs/<name>.<number>
Each configuration also needs its own strings directory for each language, for example:
	$ mkdir configs/c.1/strings/0x409
	$ echo <configuration> > configs/c.1/strings/0x409/configuration
Some attributes, such as MaxPower, can also be set for the configuration.
/sys/kernel/config/usb_gadget/g1 # mkdir configs/c.1
/sys/kernel/config/usb_gadget/g1 # mkdir configs/c.1/strings/0x409
/sys/kernel/config/usb_gadget/g1 # echo 'Brook_Gadget' > configs/c.1/strings/0x409/configuration
/sys/kernel/config/usb_gadget/g1 # echo 250 > configs/c.1/MaxPower

Step 3 (Creating the functions): each gadget provides some functions, and each function needs its own directory.
Format:
	$ mkdir functions/<name>.<instance name>
The valid function names and their attributes are described in Documentation/ABI/*/configfs-usb-gadget*.
The following uses mass_storage as the example; see ABI/testing/configfs-usb-gadget-mass-storage.
/sys/kernel/config/usb_gadget/g1 # mkdir functions/mass_storage.usb0
Mass Storage Function, version: 2009/09/11
LUN: removable file: (no medium)
/sys/kernel/config/usb_gadget/g1 # ls functions/mass_storage.usb0/
lun.0  stall
/sys/kernel/config/usb_gadget/g1 # ls functions/mass_storage.usb0/lun.0/
cdrom           inquiry_string  removable
file            nofua           ro

Parameter stall: allows the function to halt bulk endpoints; it should be set to true.
/sys/kernel/config/usb_gadget/g1 # echo 1 > functions/mass_storage.usb0/stall

Parameter lun.0/removable: whether the LUN is removable.
/sys/kernel/config/usb_gadget/g1 # echo 0 > functions/mass_storage.usb0/lun.0/removable

Parameter lun.0/ro: whether the LUN is read-only.
/sys/kernel/config/usb_gadget/g1 # echo 0 > functions/mass_storage.usb0/lun.0/ro

Parameter lun.0/file: the path to the backing file for the LUN; it is required when lun.0/removable=0.
/sys/kernel/config/usb_gadget/g1 # dd if=/dev/zero of=/mass.vfat bs=1M count=8
8+0 records in
8+0 records out
8388608 bytes (8.0MB) copied, 0.133141 seconds, 60.1MB/s
/sys/kernel/config/usb_gadget/g1 # mkfs.vfat /mass.vfat
/sys/kernel/config/usb_gadget/g1 # echo '/mass.vfat' > functions/mass_storage.usb0/lun.0/file

步驟四 (Associating the functions with their configurations) : 
Format:
	$ ln -s functions/<name>.<instance name> configs/<name>.<number>
    
/sys/kernel/config/usb_gadget/g1 # ln -s functions/mass_storage.usb0 configs/c.1/

步驟五 (Enabling the gadget) : 
Enabling a gadget essentially means binding it to a UDC (USB Device Controller). The available UDCs are listed under /sys/class/udc/.
On my system it is "dummy_udc.0"; echoing that name into UDC is all it takes.
/sys/kernel/config/usb_gadget/g1 # ls /sys/class/udc/
dummy_udc.0
/sys/kernel/config/usb_gadget/g1 # echo dummy_udc.0 > UDC
Below is the USB disk that shows up:
usb 1-1: new high-speed USB device number 2 using dummy_hcd
usb 1-1: New USB device found, idVendor=1d6b, idProduct=0104, bcdDevice= 5.15
usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 1-1: Product: Brook's Dummy Storage Gadget
usb 1-1: Manufacturer: Brook Technologies
usb 1-1: SerialNumber: 12345678
usb-storage 1-1:1.0: USB Mass Storage device detected
scsi host0: usb-storage 1-1:1.0
scsi 0:0:0:0: Direct-Access     Linux    File-Stor Gadget 0515 PQ: 0 ANSI: 2
sd 0:0:0:0: Power-on or device reset occurred
sd 0:0:0:0: [sda] 16384 512-byte logical blocks: (8.39 MB/8.00 MiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda:
sd 0:0:0:0: [sda] Attached SCSI disk

/sys/kernel/config/usb_gadget/g1 # lsusb
Bus 001 Device 001: ID 1d6b:0002
Bus 001 Device 002: ID 1d6b:0104
Next time, if I get the chance, I will cover a few more gadgets.

    References:
  • Documentation/usb/gadget_configfs.txt



Saturday, September 16, 2023

Linux Kernel(11.2)- mdev.conf


busybox implements mdev to update /dev dynamically. This post is mainly my notes on mdev.txt.
The following examples show the initialization done in an init script.
Here's a typical code snippet from the init script:
[0] mount -t proc proc /proc
[1] mount -t sysfs sysfs /sys
[2] echo /sbin/mdev > /proc/sys/kernel/hotplug
[3] mdev -s

Alternatively, without procfs the above becomes:
[1] mount -t sysfs sysfs /sys
[2] sysctl -w kernel.hotplug=/sbin/mdev
[3] mdev -s
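Either way, /sys must be mounted before running mdev -s, because mdev builds the device nodes mainly from the information under /sys/dev. The kernel config needs CONFIG_UEVENT_HELPER enabled for /proc/sys/kernel/hotplug (kernel.hotplug) to exist; without it, you have to run mdev -s by hand whenever devices change. The final step, mdev -s, creates the device nodes.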

Next, /etc/mdev.conf.
The file has the format:
[-][envmatch]<device regex>     <uid>:<gid> <permissions>
[envmatch]@<maj[,min1[-min2]]>  <uid>:<gid> <permissions>
        $envvar=<regex>         <uid>:<gid> <permissions>
        
You can rename/move device nodes by using the next optional field.
 <device regex> <uid>:<gid> <permissions> [=path]
 
For example:
        hd[a-z][0-9]* 0:3 660

Mdev has an optional config file for controlling ownership/permissions of
device nodes if your system needs something more than the default root/root
660 permissions.

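mdev uses first match: if a rule matches, it is applied and matching stops; otherwise mdev moves on to the next rule. A rule that starts with "-" does not stop the search, however: whether or not it matches, mdev continues on to the following rules. Also, when the fields are not all given, the default "0:0 660" is used, matching the root/root 660 default quoted above.
The example below contrasts rules with and without "-"; you can see that after a "-" rule matches, matching continues.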
/ # cat /etc/mdev.conf
-tty0 1:1 0660 @/x.sh    <-- even when tty0 matches this rule, the next rule is still tried
tty0 1:1 0660 @/y.sh
tty1 1:1 0660 @/x.sh    <-- once tty1 matches this rule, matching stops and the next rule is not run
tty1 1:1 0660 @/y.sh

/ # cat /x.sh
#!/bin/sh -x

# Redirect standard output to /dev/kmsg
exec 1>/dev/kmsg

echo "start $0"
env

# Your script commands go here
echo "end $0"

/ # cat /y.sh
#!/bin/sh -x

# Redirect standard output to /dev/kmsg
exec 1>/dev/kmsg

echo "start $0"
env

# Your script commands go here
echo "end $0"

/ # mdev -s
+ exec
+ echo 'start /x.sh'
start /x.sh
+ env
USER=root
ACTION=add
SHLVL=3
HOME=/
OLDPWD=/dev
MDEV=tty0    <-- first invocation
TERM=vt102
SUBSYSTEM=tty
PATH=/sbin:/usr/sbin:/bin:/usr/bin
SHELL=/bin/sh
PWD=/dev
+ echo 'end /x.sh'
end /x.sh
+ exec
+ echo 'start /y.sh'
start /y.sh
+ env
USER=root
ACTION=add
SHLVL=3
HOME=/
OLDPWD=/dev
MDEV=tty0    <-- second invocation
TERM=vt102
SUBSYSTEM=tty
PATH=/sbin:/usr/sbin:/bin:/usr/bin
SHELL=/bin/sh
PWD=/dev
+ echo 'end /y.sh'
end /y.sh
+ exec
+ echo 'start /x.sh'
start /x.sh
+ env
USER=root
ACTION=add
SHLVL=3
HOME=/
OLDPWD=/dev
MDEV=tty1    <-- invoked only once
TERM=vt102
SUBSYSTEM=tty
PATH=/sbin:/usr/sbin:/bin:/usr/bin
SHELL=/bin/sh
PWD=/dev
+ echo 'end /x.sh'
end /x.sh
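
The first-match behaviour with and without "-" seen in the run above can be modelled in a few lines. This is only a sketch of the matching logic, not BusyBox code: struct mdev_rule and the function names are made up, and the whole-name regex match is an assumption about how mdev compares a rule against a device name.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* One simplified mdev.conf rule (hypothetical structure). */
struct mdev_rule {
    int keep_going;       /* 1 if the rule line starts with '-' */
    const char *regex;    /* device regex */
    const char *command;  /* command to run on a match */
};

/* Return 1 if pattern matches the entire device name. */
static int whole_match(const char *pattern, const char *name)
{
    regex_t re;
    regmatch_t m;
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    ok = regexec(&re, name, 1, &m, 0) == 0 &&
         m.rm_so == 0 && (size_t)m.rm_eo == strlen(name);
    regfree(&re);
    return ok;
}

/* Collect the commands that would run for dev, in rule order.
 * Returns how many entries were written to out. */
static size_t mdev_collect(const struct mdev_rule *rules, size_t nrules,
                           const char *dev, const char **out, size_t max_out)
{
    size_t n = 0;

    assert(rules != NULL && dev != NULL && out != NULL);
    for (size_t i = 0; i < nrules && n < max_out; i++) {
        if (!whole_match(rules[i].regex, dev))
            continue;
        out[n++] = rules[i].command;
        if (!rules[i].keep_going)
            break;  /* first match wins unless the rule started with '-' */
    }
    return n;
}
```

With the four rules from the mdev.conf above, tty0 collects /x.sh and then /y.sh, while tty1 collects only /x.sh.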

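The next example renames a node or places it in a subdirectory under /dev. The syntax is <device regex> <uid>:<gid> <permissions> [=path]; to create the node inside a subdirectory, end the path with "/".
/ # rm /dev/tty*    # delete all tty nodes, then recreate them with mdev -s
/ # ls /dev/    # confirm the tty nodes are gone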
console          mmcblk0p2        ptyp5            urandom
cpu_dma_latency  mtd0             ptyp6            usbmon0
dri              mtd0ro           ptyp7            vcs
fb0              mtd1             ptyp8            vcs1
full             mtd1ro           ptyp9            vcs2
gpiochip0        mtdblock0        ptypa            vcsa
gpiochip1        mtdblock1        ptypb            vcsa1
gpiochip2        null             ptypc            vcsa2
gpiochip3        ptmx             ptypd            vcsu
hwrng            pts              ptype            vcsu1
input            ptyp0            ptypf            vcsu2
kmsg             ptyp1            random           zero
mem              ptyp2            rtc0
mmcblk0          ptyp3            snd
mmcblk0p1        ptyp4            ubi_ctrl
/ # cat /etc/mdev.conf
tty0 1:1 0660 =ttyBrook0    <-- renames tty0 to ttyBrook0
tty.* 1:1 0660 =ttyBrook/    <-- creates all other tty nodes under /dev/ttyBrook

/ # mdev -s
/ # ls /dev/
console          mmcblk0p2        ptyp5            ttyBrook0
cpu_dma_latency  mtd0             ptyp6            ubi_ctrl
dri              mtd0ro           ptyp7            urandom
fb0              mtd1             ptyp8            usbmon0
full             mtd1ro           ptyp9            vcs
gpiochip0        mtdblock0        ptypa            vcs1
gpiochip1        mtdblock1        ptypb            vcs2
gpiochip2        null             ptypc            vcsa
gpiochip3        ptmx             ptypd            vcsa1
hwrng            pts              ptype            vcsa2
input            ptyp0            ptypf            vcsu
kmsg             ptyp1            random           vcsu1
mem              ptyp2            rtc0             vcsu2
mmcblk0          ptyp3            snd              zero
mmcblk0p1        ptyp4            ttyBrook

/ # ls /dev/ttyBrook    # every tty except tty0 is in this subdirectory
tty      tty19    tty29    tty39    tty49    tty59    ttyAMA2  ttyp9
tty1     tty2     tty3     tty4     tty5     tty6     ttyAMA3  ttypa
tty10    tty20    tty30    tty40    tty50    tty60    ttyp0    ttypb
tty11    tty21    tty31    tty41    tty51    tty61    ttyp1    ttypc
tty12    tty22    tty32    tty42    tty52    tty62    ttyp2    ttypd
tty13    tty23    tty33    tty43    tty53    tty63    ttyp3    ttype
tty14    tty24    tty34    tty44    tty54    tty7     ttyp4    ttypf
tty15    tty25    tty35    tty45    tty55    tty8     ttyp5
tty16    tty26    tty36    tty46    tty56    tty9     ttyp6
tty17    tty27    tty37    tty47    tty57    ttyAMA0  ttyp7
tty18    tty28    tty38    tty48    tty58    ttyAMA1  ttyp8

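In this example, "!" in the last field means the node is not created. The syntax is <device regex> <uid>:<gid> <permissions> [!] [@|$|*<command>]. Below, only tty0 is created (as ttyBrook0); none of the other tty nodes are.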
/ # cat /etc/mdev.conf
tty0 1:1 0660 =ttyBrook0    <-- renames tty0 to ttyBrook0
tty1 1:1 0660 ! @/x.sh    <-- tty1 is not created, but /x.sh still runs
tty.* 1:1 0660 !    <-- none of the remaining tty nodes are created

/ # cat /x.sh
#!/bin/sh -x

# Redirect standard output to /dev/kmsg
exec 1>/dev/kmsg

echo "start $0"
env

# Your script commands go here
echo "end $0"

/ # rm -rf /dev/tty*
/ # ls /dev/
console          mmcblk0p2        ptyp5            urandom
cpu_dma_latency  mtd0             ptyp6            usbmon0
dri              mtd0ro           ptyp7            vcs
fb0              mtd1             ptyp8            vcs1
full             mtd1ro           ptyp9            vcs2
gpiochip0        mtdblock0        ptypa            vcsa
gpiochip1        mtdblock1        ptypb            vcsa1
gpiochip2        null             ptypc            vcsa2
gpiochip3        ptmx             ptypd            vcsu
hwrng            pts              ptype            vcsu1
input            ptyp0            ptypf            vcsu2
kmsg             ptyp1            random           zero
mem              ptyp2            rtc0
mmcblk0          ptyp3            snd
mmcblk0p1        ptyp4            ubi_ctrl
/ # mdev -s
+ exec
+ echo 'start /x.sh'
start /x.sh
+ env
USER=root
ACTION=add
SHLVL=3
HOME=/
OLDPWD=/dev
MDEV=tty1    <-- tty1 is not created, but /x.sh still runs
TERM=vt102
SUBSYSTEM=tty
PATH=/sbin:/usr/sbin:/bin:/usr/bin
SHELL=/bin/sh
PWD=/dev
+ echo 'end /x.sh'
end /x.sh
/ # ls /dev/
console          mmcblk0p2        ptyp5            ubi_ctrl
cpu_dma_latency  mtd0             ptyp6            urandom
dri              mtd0ro           ptyp7            usbmon0
fb0              mtd1             ptyp8            vcs
full             mtd1ro           ptyp9            vcs1
gpiochip0        mtdblock0        ptypa            vcs2
gpiochip1        mtdblock1        ptypb            vcsa
gpiochip2        null             ptypc            vcsa1
gpiochip3        ptmx             ptypd            vcsa2
hwrng            pts              ptype            vcsu
input            ptyp0            ptypf            vcsu1
kmsg             ptyp1            random           vcsu2
mem              ptyp2            rtc0             zero
mmcblk0          ptyp3            snd
mmcblk0p1        ptyp4            ttyBrook0    <-- only ttyBrook0

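This example shows when the command runs: '@' runs after the node is created, '$' runs before the node is removed, and '*' is equivalent to '@' plus '$'. I use CONFIG_UEVENT_HELPER (hotplug) for this demo because it shows the effect more clearly.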
The special characters have the meaning:
        @ Run after creating the device.
        $ Run before removing the device.
        * Run both after creating and before removing the device.
/ # sysctl -a| grep hotplug    # confirm hotplug is set
kernel.hotplug = /sbin/mdev

/ # cat /etc/mdev.conf 
mmc.* 1:1 0660 @/x.sh    <-- run after the node is created
/ # cat /x.sh
#!/bin/sh -x

# Redirect standard output to /dev/kmsg
exec 1>/dev/kmsg

echo "start $0"
sleep 1
env
ls /dev/mmcblk*

# Your script commands go here
echo "end $0"

/ # ls /dev/mmcblk*
/dev/mmcblk0
/ # echo -e 'n\np\n1\n\n\nw' | fdisk /dev/mmcblk0    # create a new partition

The number of cylinders for this disk is set to 32768.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): Partition type
   p   primary partition (1-4)
   e   extended
Partition number (1-4): First sector (16-2097151, default 16): Using default value 16
Last sector or +size{,K,M,G,T} (16-2097151, default 2097151): Using default value 2097151

Command (m for help): The partition table has been altered.
Calling ioctl() to re-read partition table
 mmcblk0: p1
/ # 
start /x.sh    <-- the script runs after the node is created
DEVNAME=mmcblk0p1
ACTION=add
SHLVL=2
HOME=/
SEQNUM=762
MAJOR=179
MDEV=mmcblk0p1
DEVPATH=/devices/platform/bus@40000000/bus@40000000:motherboard-bus@40000000/bus@40000000:motherboard-bus@40000000:iofpga@7,00000000/10005000.mmci/mmc_host/mmc0/mmc0:4567/block/mmcblk0/mmcblk0p1
SUBSYSTEM=block
PATH=/sbin:/bin:/usr/sbin:/usr/bin
DISKSEQ=3
MINOR=1
PARTN=1
PWD=/dev
DEVTYPE=partition
/dev/mmcblk0
/dev/mmcblk0p1
end /x.sh

/ # echo -e 'd\nw' | fdisk /dev/mmcblk0    # delete the partition
# mdev is only set to run the command after a node is added, so removing the partition does not run /x.sh

The number of cylinders for this disk is set to 32768.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): Selected partition 1

Command (m for help): The partition table has been altered.
Calling ioctl() to re-read partition table
 mmcblk0:
/ # cat /etc/mdev.conf   <-- changed to run /x.sh before a node is removed
mmc.* 1:1 0660 $/x.sh
/ # ls /dev/mmcblk*   <-- no partitions yet
/dev/mmcblk0
/ # echo -e 'n\np\n1\n\n\nw' | fdisk /dev/mmcblk0   <-- create a new partition
    (mdev is now configured to run /x.sh only on removal, so /x.sh does not run here)

The number of cylinders for this disk is set to 32768.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): Partition type
   p   primary partition (1-4)
   e   extended
Partition number (1-4): First sector (16-2097151, default 16): Using default value 16
Last sector or +size{,K,M,G,T} (16-2097151, default 2097151): Using default value 2097151

Command (m for help): The partition table has been altered.
Calling ioctl() to re-read partition table
 mmcblk0: p1
/ # ls /dev/mmcblk*
/dev/mmcblk0    /dev/mmcblk0p1
/ # echo -e 'd\nw' | fdisk /dev/mmcblk0   <-- remove the partition

The number of cylinders for this disk is set to 32768.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): Selected partition 1

Command (m for help): The partition table has been altered.
Calling ioctl() to re-read partition table
 mmcblk0:
/ # 
start /x.sh   <-- mdev.conf now says to run /x.sh on node removal, so it runs here
DEVNAME=mmcblk0p1
ACTION=remove
SHLVL=2
HOME=/
SEQNUM=767
MAJOR=179
MDEV=mmcblk0p1
DEVPATH=/devices/platform/bus@40000000/bus@40000000:motherboard-bus@40000000/bus@40000000:motherboard-bus@40000000:iofpga@7,00000000/10005000.mmci/mmc_host/mmc0/mmc0:4567/block/mmcblk0/mmcblk0p1
SUBSYSTEM=block
PATH=/sbin:/bin:/usr/sbin:/usr/bin
DISKSEQ=3
MINOR=1
PARTN=1
PWD=/dev
DEVTYPE=partition
/dev/mmcblk0
/dev/mmcblk0p1
end /x.sh


  • busybox docs/mdev.txt
  • http://kernel.org/doc/pending/hotplug.txt
  • https://www.cnblogs.com/sky-heaven/p/5688092.html




Friday, September 15, 2023

Linux Kernel(24.1)- fdisk Multimedia Card


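This chapter uses fdisk to look at the related information in the underlying kernel. To make this easier to follow, QEMU is used to emulate an attached eMMC.
First, create a 1 GB eMMC image with qemu-img create and attach it through QEMU: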
[brook@:~/Projects/qemu]$ qemu-img create -f qcow2 emmc_image.qcow2 1G
Formatting 'emmc_image.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib
[brook@:~/Projects/qemu]$ qemu-img info emmc_image.qcow2
image: emmc_image.qcow2
file format: qcow2
virtual size: 1 GiB (1073741824 bytes)
disk size: 196 KiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false
[brook@:~/Projects/qemu]$ qemu-system-arm -smp cpus=4 -s \
    -M vexpress-a9 -m 128M -kernel ./linux/arch/arm/boot/zImage \
    -dtb ./linux/arch/arm/boot/dts/vexpress-v2p-ca9.dtb \
    -initrd ./initrd-arm.img -nographic -append "console=ttyAMA0" \
    -drive id=mysdcard,if=none,format=qcow2,file=emmc_image.qcow2 \
    -device sd-card,drive=mysdcard


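Once inside QEMU, /dev/mmcblk0 appears, and it is the 1 GB image created earlier. Both fdisk and /proc/partitions show that this is an unpartitioned eMMC. fdisk is then used to split it into two partitions of about 512 MB each. After writing the table and leaving fdisk, /proc/partitions lists two more partitions and the matching mmcblk0p1 and mmcblk0p2 nodes appear under /dev; if they do not show up, run /sbin/mdev -s to rescan.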
/ # fdisk -l /dev/mmcblk0
Disk /dev/mmcblk0: 1024 MB, 1073741824 bytes, 2097152 sectors
32768 cylinders, 4 heads, 16 sectors/track
Units: sectors of 1 * 512 = 512 bytes

/ # cat /proc/partitions 

major minor  #blocks  name

  31        0     131072 mtdblock0
  31        1      32768 mtdblock1
 179        0    1048576 mmcblk0
/ # fdisk /dev/mmcblk0
Device contains neither a valid DOS partition table, nor Sun, SGI, OSF or GPT disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that the previous content
won't be recoverable.


The number of cylinders for this disk is set to 32768.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
   
Command (m for help): m
Command Action
a       toggle a bootable flag
b       edit bsd disklabel
c       toggle the dos compatibility flag
d       delete a partition
l       list known partition types
n       add a new partition
o       create a new empty DOS partition table
p       print the partition table
q       quit without saving changes
s       create a new empty Sun disklabel
t       change a partition's system id
u       change display/entry units
v       verify the partition table
w       write table to disk and exit
x       extra functionality (experts only)

Command (m for help): n
Partition type
   p   primary partition (1-4)
   e   extended
p
Partition number (1-4): 1
First sector (16-2097151, default 16):   <-- use the default, just press Enter
Using default value 16
Last sector or +size{,K,M,G,T} (16-2097151, default 2097151): +512M
Command (m for help): n
Partition type
   p   primary partition (1-4)
   e   extended
p
Partition number (1-4): 2
First sector (1048592-2097151, default 1048592):   <-- use the default, just press Enter
Using default value 1048592
Last sector or +size{,K,M,G,T} (1048592-2097151, default 2097151):   <-- use the default, just press Enter
Using default value 2097151

Command (m for help): p
Disk /dev/mmcblk0: 1024 MB, 1073741824 bytes, 2097152 sectors
32768 cylinders, 4 heads, 16 sectors/track
Units: sectors of 1 * 512 = 512 bytes
Device       Boot StartCHS    EndCHS        StartLBA     EndLBA    Sectors  Size Id Type
/dev/mmcblk0p1    0,1,1       1023,3,16           16    1048591    1048576  512M 83 Linux
/dev/mmcblk0p2    1023,3,16   1023,3,16      1048592    2097151    1048560  511M 83 Linux

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table
 mmcblk0: p1 p2
/ # cat /proc/partitions
major minor  #blocks  name

  31        0     131072 mtdblock0
  31        1      32768 mtdblock1
 179        0    1048576 mmcblk0
 179        1     524288 mmcblk0p1
 179        2     524280 mmcblk0p2

/ # ls -al /dev/mmcblk0p1
brw-rw----    1 0        0         179,   1 Sep  2 06:13 /dev/mmcblk0p1
/ # ls -al /dev/mmcblk0p2
brw-rw----    1 0        0         179,   2 Sep  2 06:13 /dev/mmcblk0p2

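Cylinder-head-sector (CHS) is the addressing scheme used by early hard disks. A head is a read/write head, and each platter has one head per side. Each concentric circle on a platter is a track, each track is divided into a number of sectors (traditionally 512 bytes each), and a cylinder is the set of tracks at the same radius across all platters. See the Wikipedia article for pictures and a full explanation.
CHS was later replaced by LBA (Logical Block Addressing). The conversion formulas are as follows (÷ is integer division):
LBA = (Cylinder × HPC + Head) × SPT + (Sector − 1)
Cylinder = LBA ÷ (HPC × SPT)
Head = (LBA ÷ SPT) mod HPC
Sector = (LBA mod SPT) + 1

HPC (Heads Per Cylinder = 4) and SPT (Sectors Per Track = 16) are both reported by the driver. For /dev/mmcblk0p1 above, StartLBA = 16 simply because the partition starts at offset 16, which is CHS 0,1,1. EndLBA = 1048591 = (Cylinder=16384 × HPC=4 + Head=0) × SPT=16 + (Sector=16 − 1); because cylinder 16384 exceeds the CHS limit of 1023, fdisk displays the clamped value 1023,3,16. These conversions no longer matter much in practice; a rough understanding is enough.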
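The formulas above translate almost directly into C. The following is a minimal sketch, assuming the geometry from this post (HPC = 4, SPT = 16) is passed in by the caller; struct chs, chs_to_lba() and lba_to_chs() are names made up for this illustration and do no clamping to the 1023-cylinder limit.

```c
#include <assert.h>

/* Illustrative CHS triple; the sector number is 1-based, as in the formulas. */
struct chs {
    unsigned long c, h, s;
};

/* LBA = (Cylinder x HPC + Head) x SPT + (Sector - 1) */
static unsigned long chs_to_lba(struct chs a, unsigned long hpc, unsigned long spt)
{
    assert(a.s >= 1 && a.s <= spt && a.h < hpc);
    return (a.c * hpc + a.h) * spt + (a.s - 1);
}

/* Inverse conversion, using integer division and modulo. */
static struct chs lba_to_chs(unsigned long lba, unsigned long hpc, unsigned long spt)
{
    assert(hpc > 0 && spt > 0);
    struct chs a;
    a.c = lba / (hpc * spt);
    a.h = (lba / spt) % hpc;
    a.s = (lba % spt) + 1;
    return a;
}
```

With HPC = 4 and SPT = 16, lba_to_chs(16, 4, 16) gives 0,1,1 (the StartCHS fdisk printed for mmcblk0p1), and lba_to_chs(1048591, 4, 16) gives cylinder 16384, head 0, sector 16, which is beyond what the 10-bit cylinder field can hold.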

After the fdisk partitioning above, fdisk has written the partition table into LBA 0 at the start of /dev/mmcblk0, which is what we call the Master Boot Record (MBR). The MBR format is as follows:
Mnemonic                 Byte Offset   Byte Length   Description
BootCode                 0             440           x86 code used on a non-UEFI system to select an MBR partition record and load the first logical block of that partition. This code shall not be executed on UEFI systems.
UniqueMBRDiskSignature   440           4             Unique Disk Signature. This may be used by the OS to identify the disk from other disks in the system. This value is always written by the OS and is never written by EFI firmware.
Unknown                  444           2             Unknown. This field shall not be used by UEFI firmware.
PartitionRecord          446           16*4          Array of four legacy MBR partition records.
Signature                510           2             Set to 0xAA55 (i.e., byte 510 contains 0x55 and byte 511 contains 0xAA).

The format of each Partition Record is as follows:
Mnemonic        Byte Offset   Byte Length   Description
BootIndicator   0             1             0x80 indicates that this is the bootable legacy partition. Other values indicate that this is not a bootable legacy partition. This field shall not be used by UEFI firmware.
StartingCHS     1             3             Start of partition in CHS address format. This field shall not be used by UEFI firmware.
OSType          4             1             Type of partition.
EndingCHS       5             3             End of partition in CHS address format. This field shall not be used by UEFI firmware.
StartingLBA     8             4             Starting LBA of the partition on the disk. This field is used by UEFI firmware to determine the start of the partition.
SizeInLBA       12            4             Size of the partition in LBA units of logical blocks. This field is used by UEFI firmware to determine the size of the partition.
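The offsets in the two tables are enough to decode a raw 512-byte sector by hand. Here is a small sketch that reads the fields byte by byte, so it does not depend on struct packing or on the host being little-endian; struct part_info, le32(), mbr_has_signature() and mbr_partition() are names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset of a partition record (CHS fields left out). */
struct part_info {
    uint8_t  boot_indicator;
    uint8_t  os_type;
    uint32_t starting_lba;
    uint32_t size_in_lba;
};

/* Assemble a 32-bit little-endian value from four bytes. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Byte 510 must hold 0x55 and byte 511 must hold 0xAA. */
static int mbr_has_signature(const uint8_t *sector)
{
    assert(sector != NULL);
    return sector[510] == 0x55 && sector[511] == 0xAA;
}

/* Decode partition record idx (0..3); records start at byte 446, 16 bytes each. */
static struct part_info mbr_partition(const uint8_t *sector, int idx)
{
    assert(sector != NULL && idx >= 0 && idx < 4);
    const uint8_t *rec = sector + 446 + 16 * idx;
    struct part_info pi;

    pi.boot_indicator = rec[0];
    pi.os_type = rec[4];
    pi.starting_lba = le32(rec + 8);
    pi.size_in_lba = le32(rec + 12);
    return pi;
}
```

Reading the fields one byte at a time is a deliberate choice here: it gives the same result on any host, at the cost of a few more lines than casting the buffer to a packed struct.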

Common OS types are listed below; see "Partition types" for the rest.
ID     Name
0x00   Empty (Unused)
0x01   FAT12 (DOS)
0x05   DOS 3.3+ Extended Partition
0x07   NTFS (Windows)
0x82   Linux SWAP
0x83   Linux
0x8E   Linux LVM
0xEE   GPT Protective Partition

The sample code below has two parts, dump_geometry() and dump_mbr(). dump_geometry() uses ioctls to obtain the geometry of the MMC, that is, its Cylinder-head-sector (CHS) values, and dump_mbr() reads the partition table.
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <stdint.h>
#include <linux/fs.h>		// Include the header file for ioctl and BLKSSZGET
#include <linux/hdreg.h>

#define MAX_SECTOR_SIZE 512
#pragma pack(push, 1)
static unsigned char MBRbuffer[MAX_SECTOR_SIZE];
struct PartitionEntry {
    uint8_t boot_ind;
    uint8_t chs_start[3];
    uint8_t type;
    uint8_t chs_end[3];
    uint32_t lba_start;
    uint32_t sectors;
};

// Structure to represent the Master Boot Record
struct MBR {
    uint8_t bootstrap_code[446];
    struct PartitionEntry partitions[4];
    uint16_t signature;
};
#pragma pack(pop)		// Restore default packing

static int sec_size = 0;
static struct hd_geometry geo;
static int dump_geometry(int fd)
{
    unsigned long long sz;

    /* Logical sector size in bytes */
    if (ioctl(fd, BLKSSZGET, &sec_size) < 0) {
        perror("BLKSSZGET");
        return 1;
    }

    /* Legacy CHS geometry reported by the driver */
    if (ioctl(fd, HDIO_GETGEO, &geo) < 0) {
        perror("HDIO_GETGEO");
        return 1;
    }

    /* Widen before multiplying so that larger disks do not overflow int */
    sz = (unsigned long long) geo.cylinders * geo.heads * geo.sectors * sec_size;
    printf("%llu MB, %llu bytes, %llu sectors\n", sz / (1024 * 1024), sz, sz / sec_size);
    printf("%d cylinders, %d heads, %d sectors/track\n", geo.cylinders, geo.heads, geo.sectors);
    printf("Units: sectors %d bytes\n\n", sec_size);
    return 0;
}

static void set_hsc(unsigned long lba, unsigned long *h, unsigned long *s, unsigned long *c)
{
    if ((lba / (geo.sectors * geo.heads) > 1023))
	lba = geo.heads * geo.sectors * 1024 - 1;
    *s = (lba % geo.sectors) + 1;
    *h = (lba / geo.sectors) % geo.heads;
    *c = (lba / geo.sectors) / geo.heads;
}


static int dump_mbr(int fd, char const * const prefix)
{
    struct MBR *mbr;
    struct PartitionEntry *p;
    ssize_t bytes_read;
    unsigned long lba, sc, sh, ss, ec, eh, es;

    /* The MBR lives in LBA 0, i.e. the first 512 bytes of the device */
    bytes_read = read(fd, MBRbuffer, sizeof(MBRbuffer));
    if (bytes_read != 512) {
        printf("read failed: %zd\n", bytes_read);
        return 1;
    }

    mbr = (struct MBR *) MBRbuffer;
    /* Bytes 0x55, 0xAA on disk read back as 0xAA55 on a little-endian host */
    if (mbr->signature != 0xAA55) {
        printf("invalid MBR. signature:%04x/ M[510]:%02x, M[511]:%02x\n", mbr->signature, MBRbuffer[510], MBRbuffer[511]);
        return 1;
    }

    for (int i = 0; i < 4; i++) {
        p = &(mbr->partitions[i]);
        if (p->sectors == 0) {
            continue;           /* unused slot */
        }
        lba = p->lba_start;
        set_hsc(lba, &sh, &ss, &sc);

        lba = p->lba_start + p->sectors - 1;
        set_hsc(lba, &eh, &es, &ec);

        printf("Device\t\t Boot\t StartCHS\t EndCHS\t StartLBA\t EndLBA\t Sectors\n");
        /* Partition numbers are 1-based: slot 0 is mmcblk0p1 */
        printf("%sp%d \t %c \t %lu,%lu,%lu \t\t %lu,%lu,%lu \t %u\t %u \t %u \t %lluM \t %x\n",
               prefix, i + 1, (p->boot_ind & 0x80) ? '*' : ' ', sc, sh, ss, ec, eh, es,
               p->lba_start, p->lba_start + p->sectors - 1, p->sectors,
               ((unsigned long long) p->sectors * sec_size) / (1024 * 1024), p->type);
    }
    return 0;
}

static int sys_check(int argc, char *argv[])
{
    if (argc < 2) {
        printf("Usage: %s <dev_name>\n", argv[0]);
        exit(-1);
    }

    /* The packed structs must match the on-disk layout exactly */
    if (sizeof(struct MBR) != 512) {
        printf("invalid mbr size:%zu\n", sizeof(struct MBR));
        exit(-1);
    }

    if (sizeof(struct PartitionEntry) != 16) {
        printf("invalid partition entry size:%zu\n", sizeof(struct PartitionEntry));
        exit(-1);
    }
    return 0;
}


int main(int argc, char *argv[])
{
    const char *dev;
    int fd;

    sys_check(argc, argv);	/* exits if no device name was given */
    dev = argv[1];
    fd = open(dev, O_RDONLY);
    if (fd == -1) {
        printf("Error: opening device %s failed\n", dev);
        return 1;
    }

    printf("Disk %s: ", dev);
    /* dump_mbr() divides by the geometry, so stop if it is unavailable */
    if (dump_geometry(fd) != 0 || dump_mbr(fd, dev) != 0) {
        close(fd);
        return 1;
    }

    close(fd);
    return 0;
}

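Output of the sample code as originally captured. It differs from fdisk in two places, both artifacts of the build it was captured from: the first line was printed with printf format specifiers that did not match the argument types (fdisk reports 1024 MB, 1073741824 bytes, 2097152 sectors for this disk), and partitions were numbered from 0, so mmcblk0p0 and mmcblk0p1 here correspond to mmcblk0p1 and mmcblk0p2.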
/ # /list-part /dev/mmcblk0
Disk /dev/mmcblk0: 1073741824 MB, 1024 bytes, 1073741824 sectors
32768 cylinders, 4 heads, 16 sectors/track
Units: sectors 512 bytes

Device          Boot   StartCHS   EndCHS      StartLBA   EndLBA    Sectors
/dev/mmcblk0p0   *     0,1,1      1023,3,16       16      1048591  1048576 512M 83
Device          Boot   StartCHS   EndCHS      StartLBA   EndLBA    Sectors
/dev/mmcblk0p1         1023,3,16  1023,3,16   1048592     2097151  1048560 511M 83



References:
  • https://en.wikipedia.org/wiki/Cylinder-head-sector, Cylinder-head-sector
  • Documentation/ioctl/hdio.txt, Summary of HDIO_ ioctl calls
  • busybox/blob/master/util-linux/fdisk.c, busybox fdisk
  • https://uefi.org/sites/default/files/resources/UEFI_Spec_2_10_Aug29.pdf, CH5 - GUID PARTITION TABLE (GPT) DISK LAYOUT




Sunday, August 6, 2023

Linux Kernel(22.1)- My Socket Domain and Protocol


This chapter mainly follows "Add a new protocol to Linux Kernel" to write a small example of a custom socket protocol family. The main work is to fill in the operations of "struct proto" (/include/net/sock.h) and "struct net_proto_family" (/include/linux/net.h), register them with the system through proto_register() and sock_register() respectively, and assign a struct proto_ops to the socket so that each system call finds the operation it should run.

First, call proto_register() to register the protocol handler with the system.
struct my_sock {
  /* struct sock must be the first member of my_sock */
  struct sock sk;
  int channel;
};

static struct proto my_proto = {
  .name = "MYSOCK",
  .owner = THIS_MODULE,
  .obj_size = sizeof(struct my_sock),
};

static int __init myproto_init(void)
{
  int ret = -1;

  ret = proto_register(&my_proto, 0);
  if (ret) {
    mypr_err("Failed to register myprotocol\n");
    return ret;
  }
  ...
}

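This registration only adds the custom proto to proto_list. Skipping it did not affect this example when I tried, and I will look into the details when I have time. After a successful registration, the protocol shows up in /proc/net/protocols.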
/ # cat /proc/net/protocols | grep MY
/ # insmod /lib/modules/5.15.0/extra/socket_demo.ko
socket_demo: loading out-of-tree module taints kernel.
NET: Registered PF_MCTP protocol family
myproto_init(#182)myprotocol module loaded
/ # cat /proc/net/protocols | grep MY
MYSOCK     504      0      -1   NI       0   no   socket_demo  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n

Next, register the socket-layer handler: sock_register() stores it in net_families[NPROTO] (NPROTO = AF_MAX). When user space calls socket(), the create() hooked up through sock_register() is called to create the corresponding socket.
socket() /* userspace */
|-> SYSCALL_DEFINE3(socket, int, family, int, type, int, protocol) /* kernel */
  |-> __sys_socket(family, type, protocol);
    |-> sock_create(family, type, protocol, &sock);
      |-> __sock_create(current->nsproxy->net_ns, family, type, protocol, res, 0);
        |-> pf = rcu_dereference(net_families[family]);
        |-> err = pf->create(net, sock, protocol, kern);
    |-> sock_map_fd(sock, flags & (O_CLOEXEC | O_NONBLOCK));
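The lookup in the trace above boils down to "index a table by family, then call its create()". The following user-space toy model illustrates just that idea; it is not kernel code. Every toy_* name is made up, TOY_NPROTO = 46 is an assumption mirroring AF_MAX on a 5.15 kernel, and locking, RCU and module reference counting are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NPROTO 46           /* assumption: AF_MAX on a 5.15 kernel */

struct toy_socket {
    int family;
    int channel;
};

struct toy_proto_family {
    int family;
    int (*create)(struct toy_socket *sock, int protocol);
};

/* Stand-in for net_families[]: one slot per address family. */
static const struct toy_proto_family *toy_families[TOY_NPROTO];

/* Stand-in for sock_register(): claim the slot for pf->family. */
static int toy_sock_register(const struct toy_proto_family *pf)
{
    assert(pf != NULL && pf->create != NULL);
    if (pf->family < 0 || pf->family >= TOY_NPROTO)
        return -ENOBUFS;
    if (toy_families[pf->family])
        return -EEXIST;
    toy_families[pf->family] = pf;
    return 0;
}

/* Stand-in for __sock_create(): look the family up and call its create(). */
static int toy_socket_create(int family, int protocol, struct toy_socket *sock)
{
    assert(sock != NULL);
    if (family < 0 || family >= TOY_NPROTO || !toy_families[family])
        return -EAFNOSUPPORT;
    sock->family = family;
    return toy_families[family]->create(sock, protocol);
}

/* Toy counterpart of myproto_create(): just sets the default channel. */
static int toy_my_create(struct toy_socket *sock, int protocol)
{
    (void) protocol;
    sock->channel = 999;
    return 0;
}

static const struct toy_proto_family toy_my_family = {
    .family = 45,
    .create = toy_my_create,
};
```

Before toy_sock_register(&toy_my_family) is called, toy_socket_create(45, ...) fails with -EAFNOSUPPORT, which is also what socket() reports in user space when no module has registered the family.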

The corresponding sock_register() code:
#define PF_MYPROTO 45         // (AF_MAX - 1); an arbitrary family number I am not otherwise using (on 5.15 it is AF_MCTP's number, hence "PF_MCTP" in the kernel log)
#define AF_MYPROTO PF_MYPROTO

#define mypr_info(fmt, ...)  pr_info("%s(#%d)" fmt, __func__, __LINE__, ##__VA_ARGS__)
#define mypr_err(fmt, ...)  pr_err("%s(#%d)" fmt, __func__, __LINE__, ##__VA_ARGS__)

/* for user space */
struct sockaddr_my {
  int channel;
};

static const struct proto_ops my_proto_ops = {
  .family = PF_MYPROTO,
  .owner = THIS_MODULE,
  .bind = my_bind,
  .listen = my_listen,
  .accept = my_accept,
  .connect = my_connect,
  .release = my_release,
  .sendmsg = my_sendmsg,
  .recvmsg = my_recvmsg,
};

static int myproto_create(struct net *net, struct socket *sock, int protocol, int kern)
{
  struct sock *sk;
  struct my_sock *my_sock;
  // Passing my_proto to sk_alloc() makes it allocate obj_size bytes, i.e. a whole "struct my_sock"
  // The embedded "struct sock sk" works with the kernel's sk helpers; cast to "my_sock" for the custom fields
  sk = sk_alloc(net, PF_MYPROTO, GFP_KERNEL, &my_proto, kern);
  if (!sk) {
    mypr_err("sk_alloc failed\n");
    return -ENOMEM;
  }
  // Attach the socket operations so that the matching system calls reach them
  sock->ops = &my_proto_ops;
  // sk was just allocated: sock_init_data() initializes it and links sock and sk
  // (sk->sk_socket = sock;)
  sock_init_data(sock, sk);
  // With sk initialized, cast it to my_sock for the custom fields
  my_sock = (struct my_sock *) sk;
  my_sock->channel = 999; // just an example value, no special meaning
  mypr_info("default channel:%d\n", my_sock->channel);

  return 0;
}

static struct net_proto_family myproto_family = {
  .family = PF_MYPROTO,
  .create = myproto_create,
  .owner = THIS_MODULE,
};

static int __init myproto_init(void)
{
  ret = sock_register(&myproto_family);
  if (ret) {
    mypr_err("Failed to register myprotocol family\n");
    proto_unregister(&my_proto);
    return ret;
  }

  mypr_info("myprotocol module loaded\n");
  return 0;
}
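Why the cast from struct sock * to struct my_sock * in myproto_create() is safe can be shown in plain user-space C. The sketch below uses stand-in types (toy_sock, toy_my_sock) and a simplified toy_container_of() macro, all invented for this illustration; the kernel's own container_of() adds type checking that is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's container_of(). */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Stands in for struct sock. */
struct toy_sock {
    int refcnt;
};

/* Stands in for struct my_sock: the generic part must stay first. */
struct toy_my_sock {
    struct toy_sock sk;
    int channel;
};

/* Same shape as the my_sock_sk() helper used by the socket operations. */
static struct toy_my_sock *toy_my_sock_sk(struct toy_sock *sk)
{
    assert(sk != NULL);
    return toy_container_of(sk, struct toy_my_sock, sk);
}
```

Because sk sits at offset 0, a plain cast and toy_my_sock_sk() return the same address. The helper would keep working if the member were moved, but the allocation trick in sk_alloc() relies on struct sock being the first member, as the comment in struct my_sock says.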

Below are the paths of a few socket operations from user space into the kernel:
bind() /* userspace */
|-> SYSCALL_DEFINE3(bind, int, fd, struct sockaddr __user *, umyaddr, int, addrlen) // kernel space
  |-> __sys_bind(fd, umyaddr, addrlen);
    |-> sock = sockfd_lookup_light(fd, &err, &fput_needed);
    |-> sock->ops->bind(sock,(struct sockaddr *)&address, addrlen);
    
listen() // userspace
|-> SYSCALL_DEFINE2(listen, int, fd, int, backlog) // kernel space
  |-> __sys_listen(fd, backlog);
    |-> sock = sockfd_lookup_light(fd, &err, &fput_needed);
    |-> sock->ops->listen(sock, backlog);
From the traces above it is easy to see that the system call (__sys_xx()) calls the matching socket operation directly. However, anyone who has used sockets from user space knows that read()/write() also end up in recvmsg()/sendmsg(). This works because __sys_socket() uses sock_map_fd() to attach file operations to the socket, and their read and write handlers map to recvmsg() and sendmsg().
int sock_map_fd(struct socket *sock, int flags)
|-> sock_alloc_file(sock, flags, NULL);
  |-> alloc_file_pseudo(&socket_file_ops);
    |-> file = alloc_file(&path, flags, fops);
      |-> file->f_op = fop;
      
static const struct file_operations socket_file_ops = {
  .read_iter =    sock_read_iter,
  .write_iter =   sock_write_iter,
};

sock_read_iter(struct kiocb *iocb, struct iov_iter *to)
|-> sock_recvmsg(sock, &msg, msg.msg_flags);

sock_write_iter(struct kiocb *iocb, struct iov_iter *from)
|-> res = sock_sendmsg(sock, &msg);
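The same mapping can be observed from user space without any custom module. As a sketch, the helper below (socket_write_read(), a name made up here) uses an AF_UNIX socket pair instead of the demo family, sends with write() on one end and receives with read() on the other, relying on the file operations shown above to route both calls to the socket layer.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends msg with write() on one end of an AF_UNIX socket pair and receives
 * it with read() on the other end. Returns the number of bytes read, or -1. */
static ssize_t socket_write_read(const char *msg, char *out, size_t outlen)
{
    assert(msg != NULL && out != NULL && outlen > 0);
    int sv[2];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    n = write(sv[0], msg, strlen(msg));     /* handled by the socket's sendmsg */
    if (n >= 0)
        n = read(sv[1], out, outlen - 1);   /* handled by the socket's recvmsg */
    if (n >= 0)
        out[n] = '\0';

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

AF_UNIX is used only because it needs no setup; the user program later in this post does the same write()/read() on the AF_MYPROTO socket.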

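This post only gives a brief introduction to the relevant APIs, so the socket operations below just print a message: sendmsg() prints the user's data, and recvmsg() always returns "My test". For socket operations you do not support, use the sock_no_xxx helpers.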
/* Bind socket to specified sockaddr. */
static int my_bind(struct socket *sock, struct sockaddr *saddr, int len)
{
  DECLARE_SOCKADDR(struct sockaddr_my *, addr, saddr);
  struct my_sock *my_sock = my_sock_sk(sock->sk);
  struct sock *sk = sock->sk;
  mypr_info("sock->channel %d\n", my_sock->channel);
  if (len < sizeof(*addr)) {
    mypr_err("len of addr is small\n");
    return -EINVAL;
  }
  my_sock->channel = addr->channel;
  return 0;
}

static int my_listen(struct socket *sock, int len)
{
  struct my_sock *my_sock = my_sock_sk(sock->sk);
  mypr_info("sock->channel %d\n", my_sock->channel);
  return sock_no_listen(sock, len);
}

static int my_accept(struct socket *sock, struct socket *newsock, int flags, bool kern)
{
  struct my_sock *my_sock = my_sock_sk(sock->sk);
  mypr_info("sock->channel %d\n", my_sock->channel);
  return sock_no_accept(sock, newsock, flags, kern);
}

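static int my_release(struct socket *sock)
{
  struct sock *sk = sock->sk;
  if (!sk)
    return 0;
  mypr_info("sock->channel %d\n", my_sock_sk(sk)->channel);
  // Detach sk from the socket and drop the reference held since sk_alloc()
  sock_orphan(sk);
  sock->sk = NULL;
  sock_put(sk);
  return 0;
}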

static int my_connect(struct socket *sock, struct sockaddr *saddr, int len, int flags)
{
  DECLARE_SOCKADDR(struct sockaddr_my *, addr, saddr);
  struct my_sock *my_sock = my_sock_sk(sock->sk);
  struct sock *sk = sock->sk;

  if (len < sizeof(*addr)) {
    return -EINVAL;
  }
  return 0;
}

static int my_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, int flags)
{
  unsigned char buf[] = "My test";
  // Never copy more than the caller's buffer can hold
  size_t copied = min(len, sizeof(buf));
  int err;

  err = memcpy_to_msg(msg, buf, copied);
  if (err)
    return err;

  return copied;
}

static int my_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
  struct my_sock *my_sock = my_sock_sk(sock->sk);
  int err;
  char *buf;
  mypr_info("len:%zu, channel:%d\n", len, my_sock->channel);

  // kzalloc() zero-fills, so the copied data is always NUL-terminated
  buf = kzalloc(len + 1, GFP_KERNEL);
  if (!buf) {
    return -ENOMEM;
  }
  // Safely copy data from user space to kernel space
  err = memcpy_from_msg(buf, msg, len);
  mypr_info("data: err:%d, msg:%s\n", err, buf);
  kfree(buf);

  return err ? err : len;
}

Complete module code
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/socket.h>
#include <linux/net.h>
#include <linux/sockios.h>
#include <linux/netdevice.h>
#include <linux/errno.h>
#include <linux/proc_fs.h>
#include <linux/file.h>
#include <linux/fs.h>
#include <net/protocol.h>

#define PF_MYPROTO 45		// (AF_MAX - 1)
#define AF_MYPROTO PF_MYPROTO

#define mypr_info(fmt, ...)  pr_info("%s(#%d)" fmt, __func__, __LINE__, ##__VA_ARGS__)
#define mypr_err(fmt, ...)  pr_err("%s(#%d)" fmt, __func__, __LINE__, ##__VA_ARGS__)

#include <net/sock.h>
struct my_sock {
  /* struct sock must be the first member of my_sock */
  struct sock sk;
  int channel;
};

static inline struct my_sock *my_sock_sk(struct sock *sk)
{
  return container_of(sk, struct my_sock, sk);
}

/* for user space */
struct sockaddr_my {
  int channel;
};

static struct proto my_proto = {
  .name = "MYSOCK",
  .owner = THIS_MODULE,
  .obj_size = sizeof(struct my_sock),
};

/* Bind socket to specified sockaddr. */
static int my_bind(struct socket *sock, struct sockaddr *saddr, int len)
{
  DECLARE_SOCKADDR(struct sockaddr_my *, addr, saddr);
  struct my_sock *my_sock = my_sock_sk(sock->sk);
  struct sock *sk = sock->sk;
  mypr_info("sock->channel %d\n", my_sock->channel);
  if (len < sizeof(*addr)) {
    mypr_err("len of addr is small\n");
    return -EINVAL;
  }
  my_sock->channel = addr->channel;
  return 0;
}

static int my_listen(struct socket *sock, int len)
{
  struct my_sock *my_sock = my_sock_sk(sock->sk);
  mypr_info("sock->channel %d\n", my_sock->channel);
  return sock_no_listen(sock, len);
}

static int my_accept(struct socket *sock, struct socket *newsock, int flags, bool kern)
{
  struct my_sock *my_sock = my_sock_sk(sock->sk);
  mypr_info("sock->channel %d\n", my_sock->channel);
  return sock_no_accept(sock, newsock, flags, kern);
}

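static int my_release(struct socket *sock)
{
  struct sock *sk = sock->sk;
  if (!sk)
    return 0;
  mypr_info("sock->channel %d\n", my_sock_sk(sk)->channel);
  // Detach sk from the socket and drop the reference held since sk_alloc()
  sock_orphan(sk);
  sock->sk = NULL;
  sock_put(sk);
  return 0;
}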

static int my_connect(struct socket *sock, struct sockaddr *saddr, int len, int flags)
{
  DECLARE_SOCKADDR(struct sockaddr_my *, addr, saddr);
  struct my_sock *my_sock = my_sock_sk(sock->sk);
  struct sock *sk = sock->sk;

  if (len < sizeof(*addr)) {
    return -EINVAL;
  }
  return 0;
}

static int my_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, int flags)
{
  unsigned char buf[] = "My test";
  // Never copy more than the caller's buffer can hold
  size_t copied = min(len, sizeof(buf));
  int err;

  err = memcpy_to_msg(msg, buf, copied);
  if (err)
    return err;

  return copied;
}

static int my_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
  struct my_sock *my_sock = my_sock_sk(sock->sk);
  int err;
  char *buf;
  mypr_info("len:%zu, channel:%d\n", len, my_sock->channel);

  // kzalloc() zero-fills, so the copied data is always NUL-terminated
  buf = kzalloc(len + 1, GFP_KERNEL);
  if (!buf) {
    return -ENOMEM;
  }
  // Safely copy data from user space to kernel space
  err = memcpy_from_msg(buf, msg, len);
  mypr_info("data: err:%d, msg:%s\n", err, buf);
  kfree(buf);

  return err ? err : len;
}

static const struct proto_ops my_proto_ops = {
  .family = PF_MYPROTO,
  .owner = THIS_MODULE,
  .bind = my_bind,
  .listen = my_listen,
  .accept = my_accept,
  .connect = my_connect,
  .release = my_release,
  .sendmsg = my_sendmsg,
  .recvmsg = my_recvmsg,
};

static int myproto_create(struct net *net, struct socket *sock, int protocol, int kern)
{
  struct sock *sk;
  struct my_sock *my_sock;
  sk = sk_alloc(net, PF_MYPROTO, GFP_KERNEL, &my_proto, kern);
  if (!sk) {
    mypr_err("sk_alloc failed\n");
    return -ENOMEM;
  }
  sock->ops = &my_proto_ops;
  sock_init_data(sock, sk);
  my_sock = (struct my_sock *) sk;
  my_sock->channel = 999;
  mypr_info("default channel:%d\n", my_sock->channel);

  return 0;
}

static struct net_proto_family myproto_family = {
  .family = PF_MYPROTO,
  .create = myproto_create,
  .owner = THIS_MODULE,
};

static int __init myproto_init(void)
{
  int ret = -1;

  ret = proto_register(&my_proto, 0);
  if (ret) {
    mypr_err("Failed to register myprotocol\n");
    return ret;
  }

  ret = sock_register(&myproto_family);
  if (ret) {
    mypr_err("Failed to register myprotocol family\n");
    proto_unregister(&my_proto);
    return ret;
  }

  mypr_info("myprotocol module loaded\n");
  return 0;
}

static void __exit myproto_exit(void)
{
  sock_unregister(PF_MYPROTO);
  proto_unregister(&my_proto);
  mypr_info("myprotocol module unloaded\n");
}

module_init(myproto_init);
module_exit(myproto_exit);

MODULE_LICENSE("GPL");

Complete user-space code
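#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>

#define AF_MYPROTO 45
#define PF_MYPROTO AF_MYPROTO

struct sockaddr_my {
  int channel;
};

int main(int argc, char *argv[]) {
    int sfd, ret;
    struct sockaddr_my saddr;
    char buf[128];

    // The message to send is taken from the first argument
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <message>\n", argv[0]);
        exit(EXIT_FAILURE);
    }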

    // Create a socket
    printf("%s(#%d): socket\n", __FUNCTION__, __LINE__);
    sfd = socket(AF_MYPROTO, SOCK_STREAM, 0);
    if (sfd == -1) {
        perror("Socket creation failed");
        exit(EXIT_FAILURE);
    }

    // Set up the server address structure
    saddr.channel = 123;

    printf("%s(#%d): bind\n", __FUNCTION__, __LINE__);
    // Bind the socket to the specified port
    if (bind(sfd, (struct sockaddr *)&saddr, sizeof(saddr)) == -1) {
        perror("Bind failed");
    }

    printf("%s(#%d): listen\n", __FUNCTION__, __LINE__);
    // Listen for incoming connections
    if (listen(sfd, 1) == -1) {
        perror("Listen failed");
    }

    ret = write(sfd, argv[1], strlen(argv[1]));
    if (ret < 0) {
        perror("write");
        close(sfd);
        exit(EXIT_FAILURE);
    }
    printf("write: %d\n", ret);

    memset(buf, 0, sizeof(buf));
    ret = read(sfd, buf, sizeof(buf));
    printf("read: %d/%s\n", ret, buf);

    // Close the server socket
    close(sfd);

    return 0;
}

Execution result
/ # insmod /lib/modules/5.15.0/extra/socket_demo.ko
socket_demo: loading out-of-tree module taints kernel.
NET: Registered PF_MCTP protocol family
myproto_init(#178)myprotocol module loaded
/ # /my_sock abc
main(#23): socket
myproto_create(#150)default channel:999
main(#33): bind
my_bind(#49)sock->channel 999
main(#39): listen
my_listen(#61)sock->channel 123
Listen failed: Operation not supported
my_sendmsg(#110)len:3, channel:123
my_sendmsg(#119)data: err:0, msg:abc
write: 3
read: 8/My test
my_release(#75)sock->channel 123


References:
  • Add a new protocol to Linux Kernel, https://linuxwarrior.wordpress.com/2008/12/02/add-a-new-protocol-to-linux-kernel/
  • https://lishiwen4.github.io/network/socket-interface-and-network-protocol
  • https://www.cnblogs.com/hellokitty2/p/10188376.html
  • https://liuhangbin.netlify.app/post/linux-socket/
  • https://hackmd.io/@rickywu0421/linux_networking_1




Friday, August 4, 2023

Linux Kernel(21.1)- ID Allocation


As the Overview of the kernel's ID Allocation document says, the kernel provides APIs to generate and maintain identifiers (IDs), such as file descriptors, process IDs, and device instance numbers. IDR additionally maps an ID to a pointer, while IDA only allocates IDs. This chapter uses a short piece of code to show how to use IDA.

The example is a simple kernel module. DEFINE_IDA(my_ida) declares a variable named my_ida. The read file operation obtains a new ID (ida_simple_get), the write file operation releases the ID that was written to it (ida_simple_remove), and when the module is removed, ida_destroy(struct ida *ida) frees all IDA resources; otherwise memory would leak.
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/idr.h>
#include <linux/moduleparam.h>
#include <linux/fs.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/uaccess.h> // Required for copy_from_user

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Brook");
MODULE_DESCRIPTION("Kernel module to demo IDA");
MODULE_VERSION("0.1");

static DEFINE_IDA(my_ida);

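static ssize_t ida_demo_read(struct file *file, char __user *buf, size_t count, loff_t *ppos)
{
    int id, len;
    char tmp_buf[12];	/* enough for any int plus the trailing NUL */

    /* end == 0 means "no upper limit" */
    id = ida_simple_get(&my_ida, 0, 0, GFP_KERNEL);
    if (id >= 0) {
        printk(KERN_INFO "IDR Demo: Successfully allocated ID: %d\n", id);
    } else {
        printk(KERN_ERR "IDR Demo: Failed to allocate ID\n");
        return id;
    }

    len = scnprintf(tmp_buf, sizeof(tmp_buf), "%d", id);
    if ((size_t) len > count)
        len = count;

    if (copy_to_user(buf, tmp_buf, len)) {
        ida_simple_remove(&my_ida, id);	/* do not leak the ID on failure */
        return -EFAULT;
    }

    /* Returning 0 (EOF) makes `cat` stop after a single read, so each `cat`
     * allocates exactly one ID; the ID is reported through the kernel log. */
    return 0;
}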

static ssize_t
ida_demo_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos)
{
    char tmp_buf[20];
    int id;

    if (count >= sizeof(tmp_buf))
        return -EINVAL;

    if (copy_from_user(tmp_buf, buf, count))
        return -EFAULT;

    tmp_buf[count] = '\0';
    // Convert the input string to an integer
    if (kstrtoint(tmp_buf, 10, &id)) {
        printk(KERN_ERR "invalid ID: %s\n", tmp_buf);
        return -EINVAL;
    }

    printk(KERN_INFO "remove ID %d\n", id);
    ida_simple_remove(&my_ida, id);

    return count;
}

// Define a file operation structure for IDA access
static struct file_operations ida_fops = {
    .open = simple_open,
    .read = ida_demo_read,
    .write = ida_demo_write,
    .llseek = default_llseek,
};

static int __init ida_demo_init(void)
{
    struct proc_dir_entry *proc_entry;

    printk(KERN_INFO "IDR Demo: Initializing module\n");

    // Create a file entry to invoke 'ida'
    proc_entry = proc_create("ida", S_IRUGO, NULL, &ida_fops);
    if (!proc_entry) {
        printk(KERN_ERR "Failed to create proc entry for 'ida'\n");
        return -ENOMEM;
    }
    return 0;
}

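static void __exit ida_demo_exit(void)
{
    printk(KERN_INFO "IDR Demo: Exiting module\n");
    /* Remove /proc/ida first so nothing can call into the module afterwards */
    remove_proc_entry("ida", NULL);
    ida_destroy(&my_ida);
}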

module_init(ida_demo_init);
module_exit(ida_demo_exit);


A simple Makefile
KDIR ?= /build/brook/Projects/qemu/linux/
# Modules which are included in the kernel are installed in the
# directory:
#       /lib/modules/$(KERNELRELEASE)/kernel/
# And external modules are installed in:
#       /lib/modules/$(KERNELRELEASE)/extra/
#
# INSTALL_MOD_PATH
# A prefix can be added to the
#       installation path using the variable INSTALL_MOD_PATH:
#
#       $ make INSTALL_MOD_PATH=/frodo modules_install
#       => Install dir: /frodo/lib/modules/$(KERNELRELEASE)/kernel/
export INSTALL_MOD_PATH=/build/brook/Projects/qemu/initrd-arm
obj-m := ida_demo.o
ida_demo-y := ida_main.o

modules modules_install clean:
	$(MAKE) -C $(KDIR) M=$$PWD $@


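This is the result under QEMU. IDs 0, 1 and 2 are generated first, then 1 is removed; the next allocation hands out 1 again, followed by 3. In other words, IDA manages IDs (unique numbers) for the user.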
/ # uname -a
Linux (none) 5.4.0+ #6 SMP Tue Jan 3 08:39:24 CST 2023 armv7l GNU/Linux
/ # insmod /lib/modules/5.4.0+/extra/ida_demo.ko
ida_demo: loading out-of-tree module taints kernel.
IDR Demo: Initializing module
/ # cat /proc/ida
IDR Demo: Successfully allocated ID: 0
/ # cat /proc/ida
IDR Demo: Successfully allocated ID: 1
/ # cat /proc/ida
IDR Demo: Successfully allocated ID: 2
/ # echo 1 > /proc/ida
remove ID 1
/ # cat /proc/ida
IDR Demo: Successfully allocated ID: 1
/ # cat /proc/ida
IDR Demo: Successfully allocated ID: 3
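The sequence in this log (0, 1, 2, remove 1, then 1, 3) is what a "smallest free ID first" allocator produces. The toy model below reproduces only that observable behavior in user space. It is not how the kernel implements IDA; the toy_ida names and the fixed limit of 64 IDs are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_IDS 64          /* arbitrary limit for this sketch */

/* Toy allocator state: one flag per ID. */
struct toy_ida {
    bool used[TOY_MAX_IDS];
};

/* Returns the smallest free ID and marks it used, or -1 if none is left. */
static int toy_ida_get(struct toy_ida *ida)
{
    assert(ida != NULL);
    for (int id = 0; id < TOY_MAX_IDS; id++) {
        if (!ida->used[id]) {
            ida->used[id] = true;
            return id;
        }
    }
    return -1;
}

/* Marks an ID as free so that it can be handed out again. */
static void toy_ida_remove(struct toy_ida *ida, int id)
{
    assert(ida != NULL && id >= 0 && id < TOY_MAX_IDS);
    ida->used[id] = false;
}
```

Calling toy_ida_get() three times, then toy_ida_remove(&ida, 1), then toy_ida_get() twice more yields 0, 1, 2, 1, 3, the same order as the /proc/ida session above.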



  • https://www.kernel.org/doc/html/v5.4/core-api/idr.html, ID Allocation



