Nano雞排: Linux

Sunday, December 27, 2009

Linux Modules（7.2）- tasklet

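A tasklet is similar to a timer (both are built on softirqs), but whereas a timer runs at a specified time, a tasklet cannot be given a time: once scheduled, it runs at a later point chosen by the kernel, typically the next time softirqs are processed after an interrupt. Tasklets are implemented on two softirqs, TASKLET_SOFTIRQ and HI_SOFTIRQ; the only difference is that HI_SOFTIRQ tasklets run before TASKLET_SOFTIRQ ones. A tasklet runs on the CPU that scheduled it, and a given tasklet is never executed on more than one CPU at the same time.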

A tasklet can be created statically (the first two macros) or dynamically (tasklet_init()):
DECLARE_TASKLET(task, func, data);
DECLARE_TASKLET_DISABLED(task, func, data);

tasklet_init(task, func, data);

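After declaring a tasklet you must still call tasklet_schedule(task) for it to run. If it was declared in the disabled state with DECLARE_TASKLET_DISABLED(), you must also call tasklet_enable() before it can run. You can disable a tasklet with tasklet_disable(). tasklet_kill() guarantees that the tasklet will not be scheduled again; if it is already running, tasklet_kill() waits for it to finish.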

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/interrupt.h>
#include <linux/slab.h>

MODULE_LICENSE("GPL");

static void f(unsigned long name);

// create tasklet statically
static DECLARE_TASKLET(t1, f, (unsigned long)"t1");
static DECLARE_TASKLET_DISABLED(t2, f, (unsigned long)"t2");

static struct tasklet_struct *t3;

static void f(unsigned long name)
{
    printk("%s(): on cpu %d\n", (char*)name, smp_processor_id());
}

static void f3(unsigned long name)
{
    static u32 c = 0;
    tasklet_schedule(t3);
    if (!(c++ % 2000000)) { // print a message once every 2,000,000 calls
        printk("%s(): on cpu %d\n", (char*)name, smp_processor_id());
    }
}

static int __init init_modules(void)
{
    // create tasklet dynamically
    t3 = kzalloc(sizeof(struct tasklet_struct), GFP_KERNEL);
    if (!t3)
        return -ENOMEM;
    tasklet_init(t3, f3, (unsigned long)"t3");

    tasklet_schedule(&t1);
    tasklet_schedule(&t2);
    tasklet_schedule(t3);
    tasklet_enable(&t2); // t2 will not run until it is enabled
    return 0;
}

static void __exit exit_modules(void)
{
    // make sure every tasklet is stopped before the module is removed
    tasklet_kill(&t1);
    tasklet_kill(&t2);
    tasklet_kill(t3);
    kfree(t3); // release the dynamically allocated tasklet
}

module_init(init_modules);
module_exit(exit_modules);

Based on Kernel Version：2.6.35

References:
Linux Kernel Development, 3rd edition.
Linux Device Drivers, 3rd edition, http://www.makelinux.net/ldd3/chp-7-sect-5.shtml

Linux Kernel（7.1）- timer

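Sometimes we want an action to run at a particular point in time, and a kernel timer does exactly that. Timer callbacks must follow some rules. Because they are not invoked on behalf of a user-space process, they must not access user space, and current is meaningless. They must not sleep, so calling schedule() or anything else that might sleep is forbidden.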

struct timer_list {
 struct list_head entry;
 unsigned long expires;

 void (*function)(unsigned long);
 unsigned long data;

 struct tvec_base *base;
#ifdef CONFIG_TIMER_STATS
 void *start_site;
 char start_comm[16];
 int start_pid;
#endif
#ifdef CONFIG_LOCKDEP
 struct lockdep_map lockdep_map;
#endif
};

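A timer_list must be initialized before use, with either init_timer() or TIMER_INITIALIZER(). After that, set expires, the callback function, and data (the callback's argument), and call add_timer() to activate the timer. del_timer() removes a pending timer, and mod_timer() changes the expiry of a timer or re-arms it.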

#include <linux/init.h>
#include <linux/module.h>
#include <linux/timer.h>

MODULE_LICENSE("GPL");

struct timer_list brook_timer;
static void callback(unsigned long);
struct data {
    int count;
};
static struct data data;

static void callback(unsigned long data)
{
    struct data *dp = (struct data*) data;
    printk("%s(): %d\n", __FUNCTION__, dp->count++);
    mod_timer(&brook_timer, jiffies + 5 * HZ);
}

static int __init init_modules(void)
{
    init_timer(&brook_timer);
    brook_timer.expires = jiffies + 5 * HZ;
    brook_timer.function = &callback;
    brook_timer.data = (unsigned long) &data;
    add_timer(&brook_timer);
    return 0;
}

static void __exit exit_modules(void)
{
    del_timer_sync(&brook_timer); // also waits for a callback that is still running
}

module_init(init_modules);
module_exit(exit_modules);

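The shortest interval a kernel timer can express is one jiffy, and expiry is further delayed by hardware interrupts and other asynchronous events, so kernel timers are not suitable for high-precision work.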

Linux Kernel（7）- timing

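The kernel generates timer interrupts periodically. HZ is the number of timer interrupts per second; it is defined through linux/param.h and ranges from about 50 to 1200 depending on the platform.
jiffies is incremented once per timer interrupt and is declared in linux/jiffies.h, so one jiffy lasts 1/HZ seconds. On both 32-bit and 64-bit machines the kernel maintains the 64-bit counter jiffies_64; on 32-bit machines jiffies is its low 32 bits, and on 64-bit machines the two are the same value. Code should only read jiffies/jiffies_64 and never modify them directly.
The kernel provides a set of macros for comparing times: time_after()/time_before()/time_after_eq()/time_before_eq().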

/*
 * These inlines deal with timer wrapping correctly. You are 
 * strongly encouraged to use them
 * 1. Because people otherwise forget
 * 2. Because if the timer wrap changes in future you won't have to
 *    alter your driver code.
 *
 * time_after(a,b) returns true if the time a is after time b.
 *
 * Do this with "<0" and ">=0" to only test the sign of the result. A
 * good compiler would generate better code (and a really good compiler
 * wouldn't care). Gcc is currently neither.
 */
#define time_after(a,b)  \
 (typecheck(unsigned long, a) && \
  typecheck(unsigned long, b) && \
  ((long)(b) - (long)(a) < 0))
#define time_before(a,b) time_after(b,a)

#define time_after_eq(a,b) \
 (typecheck(unsigned long, a) && \
  typecheck(unsigned long, b) && \
  ((long)(a) - (long)(b) >= 0))
#define time_before_eq(a,b) time_after_eq(b,a)

In addition, the kernel has two time structures, struct timeval and struct timespec.

#ifndef _STRUCT_TIMESPEC
#define _STRUCT_TIMESPEC
struct timespec {
 __kernel_time_t tv_sec;   /* seconds */
 long  tv_nsec;  /* nanoseconds */
};
#endif

struct timeval {
 __kernel_time_t  tv_sec;  /* seconds */
 __kernel_suseconds_t tv_usec; /* microseconds */
};

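timeval was the main structure in the early days; timespec was introduced later because more precision was needed. The kernel also provides functions for converting both to and from jiffies. More conversions can be found in linux/jiffies.h.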

unsigned long timespec_to_jiffies(const struct timespec *value);
void jiffies_to_timespec(const unsigned long jiffies,
    struct timespec *value);
unsigned long timeval_to_jiffies(const struct timeval *value);
void jiffies_to_timeval(const unsigned long jiffies,
          struct timeval *value);

Linux Kernel（3.1）- procfs: writing a vector of values

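Many people have probably read or written /proc/sys/kernel/printk to control the printk level. I modeled the code below on do_proc_dointvec() in kernel/sysctl.c; my write_proc_t does what the write path of do_proc_dointvec() does. The kernel has no rich library to lean on, so this takes a small detour.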

#include <linux/init.h>
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/uaccess.h>
#include <linux/ctype.h>

MODULE_LICENSE("GPL");

static int int_vec[] = {1, 1, 1};

static int write_proc(struct file *file, const char __user *buf,
                       unsigned long count, void *data)
{
    int *i, vleft, neg, left = count;
    const char __user *s = buf; // keep the const qualifier of buf
    char tmpbuf[128], *p;
    size_t len;
    unsigned long ulval;

    i = int_vec;
    vleft = sizeof(int_vec)/sizeof(int_vec[0]);

    for(;left && vleft--; i++) {
        while(left) {
            char c;
            if (get_user(c, s)) {
                return -EFAULT;
            }
            if (!isspace(c)) {
                break;
            }
            left--;
            s++;
        }
        if (!left) {
            break;
        }
        neg = 0;
        len = left;
        if (len > sizeof(tmpbuf) - 1) {
            len = sizeof(tmpbuf) - 1;
        }
        if (copy_from_user(tmpbuf, s, len)) {
            return -EFAULT;
        }
        tmpbuf[len] = 0;
        p = tmpbuf;
        if (*p == '-' && left > 1) {
            neg = 1;
            p++;
        }
        if (*p < '0' || *p > '9') {
            break;
        }
        ulval = simple_strtoul(p, &p, 0);
        len = p - tmpbuf;
        if ((len < left) && *p && !isspace(*p)) {
            break;
        }
        *i = neg ? -ulval : ulval;
        s += len;
        left -= len;
    }
    return count;
}

static int read_proc(char *page, char **start, off_t off,
                       int count, int *eof, void *data)
{
    int *i, vleft;
    char *p;

    i = (int *) int_vec;
    vleft = sizeof(int_vec)/sizeof(int_vec[0]);

    for (p = page, i = int_vec; vleft--; i++) {
        p += sprintf(p, "%d\t", *i);
    }
    *(p++) = '\n';
    *eof = 1;
    return (p - page);
}

static int __init init_modules(void)
{

    struct proc_dir_entry *ent;

    ent = create_proc_entry("brook_vec", S_IFREG | S_IRWXU, NULL);
    if (!ent) {
        printk("create proc child failed\n");
        return -ENOMEM; // fail the load so exit_modules() never runs without the entry
    }
    ent->write_proc = write_proc;
    ent->read_proc = read_proc;
    return 0;
}

static void __exit exit_modules(void)
{
    remove_proc_entry("brook_vec", NULL);
}

module_init(init_modules);
module_exit(exit_modules);

Saturday, December 26, 2009

Exploring the kernel: linux/typecheck.h

#ifndef TYPECHECK_H_INCLUDED
#define TYPECHECK_H_INCLUDED

/*
 * Check at compile time that something is of a particular type.
 * Always evaluates to 1 so you may use it easily in comparisons.
 */
#define typecheck(type,x) \
({ type __dummy; \
 typeof(x) __dummy2; \
 (void)(&__dummy == &__dummy2); \
 1; \
})

/*
 * Check at compile time that 'function' is a certain type, or is a pointer
 * to that type (needs to use typedef for the function type.)
 */
#define typecheck_fn(type,function) \
({ typeof(type) __tmp = function; \
 (void)__tmp; \
})

#endif  /* TYPECHECK_H_INCLUDED */

typecheck(type, x) checks whether x has the data type "type". If it does not, the compiler alerts the programmer with "warning: comparison of distinct pointer types lacks a cast".

typecheck_fn(type, function) checks whether "function" has the same data type as "type". If it does not, the compiler alerts the programmer with "warning: initialization from incompatible pointer type".

Thursday, December 24, 2009

Linux Kernel（6）- miscdev

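Sometimes we need to write a "small driver". Early UNIX/Linux required registering a major/minor number, and a driver that needed only one minor number often ended up holding every minor number under its major number. Linux 2.0 therefore introduced miscellaneous character drivers: misc drivers share major number 10, and a "small driver" only has to specify its minor number.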

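To use a misc device you must include <linux/miscdevice.h>. It lists the officially assigned minor numbers, from which you can pick a suitable one, and declares two APIs, misc_register() and misc_deregister().
Usually you just fill in a struct miscdevice, register it with misc_register(), and remove it with misc_deregister().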

struct miscdevice  {
 int minor;
 const char *name;
 const struct file_operations *fops;
 struct list_head list;
 struct device *parent;
 struct device *this_device;
 const char *devnode;
};

　minor: the minor number you want to register.
　name: the name of the device.
　fops: the file operations.
　The remaining fields can be ignored.

#include <linux/init.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/miscdevice.h>
MODULE_LICENSE("GPL");

#define DEV_BUFSIZE 1024

static int dev_open(struct inode*, struct file*);
static int dev_release(struct inode*, struct file*);
static ssize_t dev_read(struct file*, char __user*, size_t, loff_t*);
static ssize_t dev_write(struct file*, const char __user *, size_t, loff_t*);
static void __exit exit_modules(void);

static struct file_operations dev_fops = {
    .owner = THIS_MODULE,
    .open = dev_open,
    .release = dev_release,
    .read = dev_read,
    .write = dev_write,
};

static struct miscdevice brook_miscdev = {
    .minor      = 11,
    .name       = "brook_dev",
    .fops       = &dev_fops,
};

static int
dev_open(struct inode *inode, struct file *filp)
{
    printk("%s():\n", __FUNCTION__);
    return 0;
}

static int
dev_release(struct inode *inode, struct file *filp)
{
    printk("%s():\n", __FUNCTION__);
    return 0;
}

static ssize_t
dev_read(struct file *filp, char __user *buf, size_t count, loff_t *pos)
{
    printk("%s():\n", __FUNCTION__);
    *pos = 0;
    return 0;
}

static ssize_t
dev_write(struct file *filp, const char __user *buf,
        size_t count, loff_t *pos)
{
    printk("%s():\n", __FUNCTION__);
    return count;
}

static int __init init_modules(void)
{
    int ret;

    ret = misc_register(&brook_miscdev);
    if (ret != 0) {
        printk("cannot register miscdev on minor=11 (err=%d)\n",ret);
    }

    return ret; // propagate the error so a failed registration aborts the load
}

static void __exit exit_modules(void)
{
    misc_deregister(&brook_miscdev);
}

module_init(init_modules);
module_exit(exit_modules);

Sunday, December 20, 2009

Linux Kernel（4.2）- seq_file: single page

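For output that fits in a single page, seq_file's start()/next()/stop() are superfluous and usually only show() is needed, so seq_file also provides a single-page variant, single_open(). The following example is fs/proc/cmdline.c: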

#include <linux/fs.h>
#include <linux/init.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

static int cmdline_proc_show(struct seq_file *m, void *v)
{
 seq_printf(m, "%s\n", saved_command_line);
 return 0;
}

static int cmdline_proc_open(struct inode *inode, struct file *file)
{
 return single_open(file, cmdline_proc_show, NULL);
}

static const struct file_operations cmdline_proc_fops = {
 .open  = cmdline_proc_open,
 .read  = seq_read,
 .llseek  = seq_lseek,
 .release = single_release,
};

static int __init proc_cmdline_init(void)
{
 proc_create("cmdline", 0, NULL, &cmdline_proc_fops);
 return 0;
}
module_init(proc_cmdline_init);

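The difference from a full seq_file is that, with only one page, start()/next()/stop() are not needed and only show() remains. In addition, single_open() replaces seq_open(), and release must use single_release() instead of seq_release().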

Wednesday, December 16, 2009

Linux Kernel（5）- ioctl

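Part (V) introduces ioctl in the file operations. The prototype of ioctl is:
int (*ioctl) (struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg);
ioctl uses cmd to decide how to interpret the argument that follows. Early ioctl numbers followed no rules and therefore collided easily; to avoid collisions, cmd is now encoded as several fields:
type:
　　the magic number; pick one according to Documentation/ioctl/ioctl-number.txt.
number:
　　the sequential number (also called the ordinal number), defined by the driver author; it only has to be unique within the driver.
direction:
　　the direction of the data transfer: NONE, READ, WRITE, and so on.
size:
　　the size of the argument.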

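Because ioctl uses cmd to determine which command the user wants and what argument comes with it, a switch/case is unavoidable; you could call it the signature of ioctl.
To show how ioctl commands are defined and decoded, I will go straight to ioctl.h.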

#define _IOC(dir,type,nr,size) \
         (((dir)  << _IOC_DIRSHIFT) | \
         ((type) << _IOC_TYPESHIFT) | \
         ((nr)   << _IOC_NRSHIFT) | \
         ((size) << _IOC_SIZESHIFT))

/* used to create numbers */
#define _IO(type,nr)            _IOC(_IOC_NONE,(type),(nr),0)
#define _IOR(type,nr,size)      _IOC(_IOC_READ,(type),(nr),(_IOC_TYPECHECK(size)))
#define _IOW(type,nr,size)      _IOC(_IOC_WRITE,(type),(nr),(_IOC_TYPECHECK(size)))
#define _IOWR(type,nr,size)     _IOC(_IOC_READ|_IOC_WRITE,(type),(nr),(_IOC_TYPECHECK(size)))
#define _IOR_BAD(type,nr,size)  _IOC(_IOC_READ,(type),(nr),sizeof(size))
#define _IOW_BAD(type,nr,size)  _IOC(_IOC_WRITE,(type),(nr),sizeof(size))
#define _IOWR_BAD(type,nr,size) _IOC(_IOC_READ|_IOC_WRITE,(type),(nr),sizeof(size))

/* used to decode ioctl numbers.. */
#define _IOC_DIR(nr)            (((nr) >> _IOC_DIRSHIFT) & _IOC_DIRMASK)
#define _IOC_TYPE(nr)           (((nr) >> _IOC_TYPESHIFT) & _IOC_TYPEMASK)
#define _IOC_NR(nr)             (((nr) >> _IOC_NRSHIFT) & _IOC_NRMASK)
#define _IOC_SIZE(nr)           (((nr) >> _IOC_SIZESHIFT) & _IOC_SIZEMASK)

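When defining an ioctl command, choose _IO (no data transfer), _IOR (read), _IOW (write), or _IOWR (read and write) according to the transfer direction. type is the magic number we picked, nr is the sequential number, and size is the type of the argument, from which the macro derives the argument size. Below is our example header, ioc_brook.h: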

#ifndef IOC_BROOK_H
#define IOC_BROOK_H

#define BROOK_IOC_MAGIC     'k'
#define BROOK_IOCSETNUM     _IOW(BROOK_IOC_MAGIC,  1, int)
#define BROOK_IOCGETNUM     _IOR(BROOK_IOC_MAGIC,  2, int)
#define BROOK_IOCXNUM       _IOWR(BROOK_IOC_MAGIC, 3, int)
#define BROOK_IOC_MAXNR     3

#endif

This defines three ioctl commands: set the value (BROOK_IOCSETNUM), get the value (BROOK_IOCGETNUM), and exchange the value (BROOK_IOCXNUM).

Here is my module:

#include <linux/init.h>
#include <linux/module.h>

#include <linux/fs.h> // chrdev
#include <linux/cdev.h> // cdev_add()/cdev_del()
#include <linux/semaphore.h> // up()/down_interruptible()
#include <linux/slab.h> // kmalloc()/kfree()
#include <asm/uaccess.h> // copy_*_user()

#include "ioc_brook.h"

MODULE_LICENSE("GPL");

#define DEV_BUFSIZE         1024


static int dev_major;
static int dev_minor;
struct cdev *dev_cdevp = NULL;

static int 
  dev_open(struct inode*, struct file*);
static int 
  dev_release(struct inode*, struct file*);
static int
  dev_ioctl(struct inode*, struct file*, unsigned int, unsigned long);

static void __exit exit_modules(void);

struct file_operations dev_fops = {
    .owner   = THIS_MODULE,
    .open    = dev_open,
    .release = dev_release,
    .ioctl   = dev_ioctl
};

static int dev_open(struct inode *inode, struct file *filp)
{
    printk("%s():\n", __FUNCTION__);
    return 0;
}

static int dev_release(struct inode *inode, struct file *filp)
{
    printk("%s():\n", __FUNCTION__);
    return 0;
}

static int brook_num = 0;
static int 
dev_ioctl(struct inode *inode, struct file *filp,
          unsigned int cmd, unsigned long args)
{
    int tmp, err = 0, ret = 0;

    if (_IOC_TYPE(cmd) != BROOK_IOC_MAGIC)
        return -ENOTTY;
    if (_IOC_NR(cmd) > BROOK_IOC_MAXNR)
        return -ENOTTY;

    if (_IOC_DIR(cmd) & _IOC_READ) {
        err = !access_ok(VERIFY_WRITE, (void __user*)args, _IOC_SIZE(cmd));
    } else if (_IOC_DIR(cmd) & _IOC_WRITE) {
        err = !access_ok(VERIFY_READ, (void __user *)args, _IOC_SIZE(cmd));
    }
    if (err)
        return -EFAULT;

    switch (cmd) {
        case BROOK_IOCSETNUM:
            // don't need call access_ok() again. using __get_user().
            ret = __get_user(brook_num, (int __user *)args); 
            printk("%s(): get val = %d\n", __FUNCTION__, brook_num);
            break;
        case BROOK_IOCGETNUM:
            ret = __put_user(brook_num, (int __user *)args);
            printk("%s(): set val to %d\n", __FUNCTION__, brook_num);
            break;
        case BROOK_IOCXNUM:
            tmp = brook_num;
            ret = __get_user(brook_num, (int __user *)args);
            if (!ret) {
                ret = __put_user(tmp, (int __user *)args);
            }
            printk("%s(): change val from %d to %d\n",
                       __FUNCTION__, tmp, brook_num);
            break;
        default: /* redundant, as cmd was checked against MAXNR */
            return -ENOTTY;
    }
    return ret; // report a failed __get_user()/__put_user() to the caller
}

static int __init init_modules(void)
{
    dev_t dev;
    int ret;

    ret = alloc_chrdev_region(&dev, 0, 1, "brook");
    if (ret < 0) {
        printk("can't alloc chrdev\n");
        return ret;
    }
    dev_major = MAJOR(dev);
    dev_minor = MINOR(dev);
    printk("register chrdev(%d,%d)\n", dev_major, dev_minor);

    dev_cdevp = kmalloc(sizeof(struct cdev), GFP_KERNEL);
    if (dev_cdevp == NULL) {
        printk("kmalloc failed\n");
        ret = -ENOMEM;
        goto failed;
    }
    cdev_init(dev_cdevp, &dev_fops);
    dev_cdevp->owner = THIS_MODULE;
    ret = cdev_add(dev_cdevp, MKDEV(dev_major, dev_minor), 1);
    if (ret < 0) {
        printk("add chr dev failed\n");
        goto failed;
    }

    return 0;

failed:
    // undo everything done so far and report the error to the loader
    if (dev_cdevp) {
        kfree(dev_cdevp);
        dev_cdevp = NULL;
    }
    unregister_chrdev_region(dev, 1);
    return ret;
}

static void __exit exit_modules(void)
{
    dev_t dev;

    dev = MKDEV(dev_major, dev_minor);
    if (dev_cdevp) {
        cdev_del(dev_cdevp);
        kfree(dev_cdevp);
    }
    unregister_chrdev_region(dev, 1);
    printk("unregister chrdev\n");
}

module_init(init_modules);
module_exit(exit_modules);

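dev_ioctl() first checks that the command's type (magic number) and number (sequential number) are valid, then uses access_ok() to verify the user address according to the command's read/write direction. After that comes the usual ioctl switch/case, which runs the requested command and interprets the argument that comes with it.

Here is my application: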

#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/types.h>

#include <sys/stat.h>
#include <fcntl.h>

#include "ioc_brook.h"

int main(int argc, char *argv[])
{
    int fd, ret;

    if (argc < 2) {
        printf("Usage: %s <device>\n", argv[0]);
        return -1;
    }

    fd = open(argv[1], O_RDWR);
    if (fd < 0) {
        printf("open %s failed\n", argv[1]);
        return -1;
    }

    ret = 10;
    if (ioctl(fd, BROOK_IOCSETNUM, &ret) < 0) {
        printf("set num failed\n");
        return -1;
    }

    if (ioctl(fd, BROOK_IOCGETNUM, &ret) < 0) {
        printf("get num failed\n");
        return -1;
    }
    printf("get value = %d\n", ret);

    ret = 100;
    if (ioctl(fd, BROOK_IOCXNUM, &ret) < 0) {
        printf("exchange num failed\n");
        return -1;
    }
    printf("get value = %d\n", ret);

    return 0;
}

app.c opens the device registered above, then sets the value (BROOK_IOCSETNUM), reads it back (BROOK_IOCGETNUM), and exchanges it (BROOK_IOCXNUM).

Sunday, December 13, 2009

Linux Kernel（4.1）- seq_file example (fs/proc/devices.c)

Part (IV.1) walks through a real seq_file example: fs/proc/devices.c from the Linux source.

#include <linux/fs.h>
#include <linux/init.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

static int devinfo_show(struct seq_file *f, void *v)
{
    int i = *(loff_t *) v;

    if (i < CHRDEV_MAJOR_HASH_SIZE) {
        if (i == 0)
            seq_printf(f, "Character devices:\n");
        chrdev_show(f, i);
    }
#ifdef CONFIG_BLOCK
    else {
        i -= CHRDEV_MAJOR_HASH_SIZE;
        if (i == 0)
            seq_printf(f, "\nBlock devices:\n");
        blkdev_show(f, i);
    }
#endif
    return 0;
}


static void *devinfo_start(struct seq_file *f, loff_t *pos)
{
    if (*pos < (BLKDEV_MAJOR_HASH_SIZE + CHRDEV_MAJOR_HASH_SIZE))
        return pos;
    return NULL;
}

static void *devinfo_next(struct seq_file *f, void *v, loff_t *pos)
{
    (*pos)++;
    if (*pos >= (BLKDEV_MAJOR_HASH_SIZE + CHRDEV_MAJOR_HASH_SIZE))
        return NULL;
    return pos;
}

static void devinfo_stop(struct seq_file *f, void *v)
{
    /* Nothing to do */
}

static const struct seq_operations devinfo_ops = {
    .start = devinfo_start,
    .next  = devinfo_next,
    .stop  = devinfo_stop,
    .show  = devinfo_show
};

static int devinfo_open(struct inode *inode, struct file *filp)
{
    return seq_open(filp, &devinfo_ops);
}

static const struct file_operations proc_devinfo_operations = {
    .open  = devinfo_open,
    .read  = seq_read,
    .llseek  = seq_lseek,
    .release = seq_release,
};

static int __init proc_devices_init(void)
{
    proc_create("devices", 0, NULL, &proc_devinfo_operations);
    return 0;
}
module_init(proc_devices_init);

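First, there is only a module_init() here, so this code can be loaded but never unloaded. On load it creates the proc file "devices" and registers its file operations, "proc_devinfo_operations". Those file operations follow the seq_file pattern, so the next step is to look at what start()/next()/stop()/show() are each responsible for.
start() is "devinfo_start()". It shows that pos identifies which device slot is being read, and that pos is bounded by the sum of the block and char table sizes.
"devinfo_next()" is responsible for advancing pos, and indeed it only does (*pos)++.
Because "devinfo_start()" acquires no resources from the system, "devinfo_stop()" has nothing to clean up.
"devinfo_show()" calls "chrdev_show()" or "blkdev_show()" to print the char devices and the block devices respectively.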

Saturday, December 5, 2009

Linux Kernel（4）- seq_file

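Maintaining the position in procfs is awkward once the output exceeds one page, so seq_file was introduced. It offers an iterator interface for reading data, and its biggest advantage is that the programmer decides what a position means, for example "position N means line N". We rewrite the first example from Linux Modules（III）- procfs as our seq_file example.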

#include <linux/init.h>
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/uaccess.h>
#include <linux/slab.h> // kzalloc()/kfree()

MODULE_LICENSE("GPL");

#define MAX_LINE        1000
static uint32_t *lines;

/**
 * seq_start() takes a position as an argument and returns an iterator which
 * will start reading at that position.
 */
static void* seq_start(struct seq_file *s, loff_t *pos)
{
    uint32_t *lines;

    if (*pos >= MAX_LINE) {
        return NULL; // no more data to read
    }

    lines = kzalloc(sizeof(uint32_t), GFP_KERNEL);
    if (!lines) {
        return NULL;
    }

    *lines = *pos + 1;

    return lines;
}

/**
 * move the iterator forward to the next position in the sequence
 */
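static void* seq_next(struct seq_file *s, void *v, loff_t *pos)
{
    uint32_t *lines = v;

    (*pos)++;
    if (*pos >= MAX_LINE) {
        kfree(lines); // stop() will be called with NULL, so free the iterator here
        return NULL;  // no more data to read
    }
    *lines = *pos + 1; // same mapping as seq_start(): position N is line N + 1
    return lines;
}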

/**
 * stop() is called when iteration is complete (clean up)
 */
static void seq_stop(struct seq_file *s, void *v)
{
    kfree(v);
}

/**
 * success return 0, otherwise return error code
 */
static int seq_show(struct seq_file *s, void *v)
{
    seq_printf(s, "Line #%d: This is Brook's demo\n", *((uint32_t*)v));
    return 0;
}

static struct seq_operations seq_ops = {
    .start = seq_start,
    .next  = seq_next,
    .stop  = seq_stop,
    .show  = seq_show
};

static int proc_open(struct inode *inode, struct file *file)
{
    return seq_open(file, &seq_ops);
}

static struct file_operations proc_ops = {
    .owner   = THIS_MODULE, // system
    .open    = proc_open,
    .read    = seq_read,    // system
    .llseek  = seq_lseek,   // system
    .release = seq_release  // system
};

static int __init init_modules(void)
{
    struct proc_dir_entry *ent;

    ent = create_proc_entry("brook", 0, NULL);
    if (ent) {
        ent->proc_fops = &proc_ops;
    }
    return 0;
}

static void __exit exit_modules(void)
{
    if (lines) {
        kfree(lines);
    }
    remove_proc_entry("brook", NULL);
}

module_init(init_modules);
module_exit(exit_modules);

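First we implement start(). Its job is to return an iterator that points at the position to read. In our example the position starts at 0 while we want printing to start at line 1, so the iterator holds *pos + 1.
void * (*start) (struct seq_file *m, loff_t *pos);

Next we implement next(), which moves the iterator to the next position. Our iterator is a line number, so moving to the next iterator means adding one to the line number.
void * (*next) (struct seq_file *m, void *v, loff_t *pos);

Then we implement stop(), which is called when reading is finished; in short, it does the cleanup. In our example it releases the resource that start() allocated.
void (*stop) (struct seq_file *m, void *v);

The callback that produces the output is show(). Note that it must print with seq_printf(), which takes the seq_file pointer followed by the same arguments as printk(). That completes all the functions; what remains is tying them to procfs.
int (*show) (struct seq_file *m, void *v);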

First we declare a struct seq_operations and fill in the seq operations we just implemented.

static struct seq_operations seq_ops = {
    .start = seq_start,
    .next  = seq_next,
    .stop  = seq_stop,
    .show  = seq_show
};

Then we call seq_open() to bind the file to the seq_operations.

static int proc_open(struct inode *inode, struct file *file)
{
    return seq_open(file, &seq_ops);
}

Finally we fill in the file_operations. Because we are using seq_file, read()/llseek()/release() can simply use the operations that seq_file provides.

static struct file_operations proc_ops = {
    .owner   = THIS_MODULE, // system
    .open    = proc_open,
    .read    = seq_read,    // system
    .llseek  = seq_lseek,   // system
    .release = seq_release  // system
};

Thursday, November 26, 2009

Linux Kernel（3）- procfs

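Part (III) introduces procfs. procfs has since grown out of control and is no longer recommended for new interfaces, but it still has its uses, so read on if you are interested.

/proc is a special file system: when a file under /proc is read, the kernel generates its content on the fly, and many applications access files under /proc. We use two examples: the first covers a read-only /proc file, the second a /proc file that also supports writes.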

Read-Only

#include <linux/init.h>
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/slab.h> // kzalloc()/kfree()

MODULE_LICENSE("GPL");

#define MAX_LINE        1000

/* read file operations */
static int read_proc(char *page, char **start, off_t off,
                       int count, int *eof, void *data)
{
    uint32_t curline;
    char *p, *np;

    *start = p = np = page;
    curline = *(uint32_t*) data;

    printk("count=%d, off=%ld\n", count, off);
    for (;curline < MAX_LINE && (p - page) < count; curline++) {
        np += sprintf(p, "Line #%d: This is Brook's demo\n", curline);
        if ((count - (np - page)) < (np - p)) {
            break;
        }
        p = np;
    }

    if (curline >= MAX_LINE) { // every line has been emitted
        *eof = 1;
    }
    *(uint32_t*)data = curline;
    return (p - page);
}

/* private data */
static uint32_t *lines;

static int __init init_modules(void)
{
    struct proc_dir_entry *ent;

    lines = kzalloc(sizeof(uint32_t), GFP_KERNEL);
    if (!lines) {
        printk("no mem\n");
        return -ENOMEM;
    }
    *lines = 0;

    /* create a procfs entry for read-only */
    ent = create_proc_read_entry ("brook", S_IRUGO, NULL, read_proc, lines);
    if (!ent) {
        printk("create proc failed\n");
        kfree(lines);
        lines = NULL;
        return -ENOMEM; // fail the load instead of leaving a dangling pointer
    }
    return 0;
}

static void __exit exit_modules(void)
{
    if (lines) {
        kfree(lines);
    }
    /* remove procfs entry */
    remove_proc_entry("brook", NULL);
}

module_init(init_modules);
module_exit(exit_modules);

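This example uses a global variable "lines" that records how far the previous read got; MAX_LINE is the upper bound. When the module is loaded it first allocates memory for lines, then calls create_proc_read_entry() to create a read-only procfs file.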

static inline struct proc_dir_entry *
    create_proc_read_entry(const char *name,
        mode_t mode, struct proc_dir_entry *base,
        read_proc_t *read_proc, void * data);

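name is the file name under /proc. mode is the file mode and can be written directly in octal, such as 0666; OR in S_IFDIR to create a directory. If base is NULL the entry is created directly under /proc; otherwise base points to another proc_dir_entry and the entry is created beneath it. read_proc is the callback invoked when a user reads the file. data holds extra information: allocate it up front and it is handed to every read call.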

typedef int (read_proc_t)(char *page, char **start, off_t off,
    int count, int *eof, void *data);
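page is the buffer supplied by the system and the place we write to; whatever is written there becomes the content of the proc file. start can be ignored if everything can be read in a single call; otherwise just set it to page. count is the amount of data user space wants to read. off is the current file position. eof reports whether the end has been reached. data holds the extra information.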

Writable

#include <linux/init.h>
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/uaccess.h>
#include <linux/slab.h> // kzalloc()/kfree()

MODULE_LICENSE("GPL");

#define MAX_LINE        1000

/* write file operations */
static int write_proc(struct file *file, const char __user *buf,
                       unsigned long count, void *data)
{
    char num[10], *p;
    int nr, len;

    /* no data be written */
    if (!count) {
        printk("count is 0\n");
        return 0;
    }

    /* Input size is too large to write our buffer(num) */
    if (count > (sizeof(num) - 1)) {
        printk("input is too large\n");
        return -EINVAL;
    }

    if (copy_from_user(num, buf, count)) {
        printk("copy from user failed\n");
        return -EFAULT;
    }

    /* atoi() */
    p = num;
    len = count;
    nr = 0;
    do {
        unsigned int c = *p - '0';
        if (*p == '\n') {
            break;
        }
        if (c > 9) {
            printk("%c is not digital\n", *p);
            return -EINVAL;
        }
        nr = nr * 10 + c;
        p++;
    } while (--len);
    *(uint32_t*) data = nr;

    return count;
}

static int read_proc(char *page, char **start, off_t off,
                       int count, int *eof, void *data)
{
    uint32_t curline;
    char *p, *np;

    *start = p = np = page;
    curline = *(uint32_t*) data;

    printk("count=%d, off=%ld\n", count, off);
    for (;curline < MAX_LINE && (p - page) < count; curline++) {
        np += sprintf(p, "Line #%d: This is Brook's demo\n", curline);
        if ((count - (np - page)) < (np - p)) {
            break;
        }
        p = np;
    }

    if (curline >= MAX_LINE) { // every line has been emitted
        *eof = 1;
    }
    *(uint32_t*)data = curline;
    return (p - page);
}

static uint32_t *lines;

static int __init init_modules(void)
{

    struct proc_dir_entry *ent;
    lines = kzalloc(sizeof(uint32_t), GFP_KERNEL);
    if (!lines) {
        printk("no mem\n");
        return -ENOMEM;
    }
    *lines = 0;

    ent = create_proc_entry("brook", S_IFREG | S_IRWXU, NULL);
    if (!ent) {
        printk("create proc failed\n");
        kfree(lines);
        lines = NULL;
        return -ENOMEM; // fail the load instead of leaving a dangling pointer
    }
    ent->write_proc = write_proc;
    ent->read_proc = read_proc;
    ent->data = lines;
    return 0;
}

static void __exit exit_modules(void)
{
    if (lines) {
        kfree(lines);
    }
    remove_proc_entry("brook", NULL);
}

module_init(init_modules);
module_exit(exit_modules);

The second example supports writes, so it cannot use create_proc_read_entry(); instead it calls create_proc_entry() and then sets the read/write callbacks and "data". In this example a write sets the current line.

typedef int (write_proc_t)(struct file *file, const char __user *buffer,
unsigned long count, void *data);

Wednesday, November 25, 2009

Linux Kernel（2）- register char device

Part (II) shows how to register a char device with Linux and how to read from and write to it.

#include <linux/init.h>
#include <linux/slab.h> // kzalloc()/kfree()
#include <linux/module.h>

#include <linux/fs.h> // chrdev
#include <linux/cdev.h> // cdev_add()/cdev_del()
#include <asm/uaccess.h> // copy_*_user()

MODULE_LICENSE("GPL");

#define DEV_BUFSIZE 1024

int dev_major;
int dev_minor;
struct cdev *dev_cdevp = NULL;

int dev_open(struct inode*, struct file*);
int dev_release(struct inode*, struct file*);
ssize_t dev_read(struct file*, char __user*, size_t, loff_t*);
ssize_t dev_write(struct file*, const char __user *, size_t, loff_t*);
static void __exit exit_modules(void);

struct file_operations dev_fops = {
    .owner = THIS_MODULE,
    .open = dev_open,
    .release = dev_release,
    .read = dev_read,
    .write = dev_write,
};

int dev_open(struct inode *inode, struct file *filp)
{
    printk("%s():\n", __FUNCTION__);
    return 0;
}

int dev_release(struct inode *inode, struct file *filp)
{
    printk("%s():\n", __FUNCTION__);
    return 0;
}

ssize_t 
dev_read(struct file *filp, char __user *buf, size_t count, loff_t *f_pos)
{
    char data[] = "brook";
    ssize_t ret = 0;

    printk("%s():\n", __FUNCTION__);

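    if (*f_pos >= sizeof(data)) {
        goto out;
    }

    // never read past the end of data
    if (count > sizeof(data) - *f_pos) {
        count = sizeof(data) - *f_pos;
    }

    // copy_to_user() returns the number of bytes it could not copy
    if (copy_to_user(buf, data + *f_pos, count)) {
        ret = -EFAULT;
        goto out;
    }
    *f_pos += count;
    ret = count;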

out:
    return ret;
}

ssize_t 
dev_write(struct file *filp, const char __user *buf, size_t count,
          loff_t *f_pos)
{
    char *data;
    ssize_t ret = 0;

    printk("%s():\n", __FUNCTION__);

    data = kzalloc(sizeof(char) * DEV_BUFSIZE, GFP_KERNEL);
    if (!data) {
        return -ENOMEM;
    }

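    // leave room for the terminating NUL that printk("%s") relies on
    if (count > DEV_BUFSIZE - 1) {
        count = DEV_BUFSIZE - 1;
    }

    // copy_from_user() returns the number of bytes NOT copied, never a negative value
    if (copy_from_user(data, buf, count)) {
        ret = -EFAULT;
        goto out;
    }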
    printk("%s(): %s\n", __FUNCTION__, data);
    *f_pos += count;
    ret = count;
out:
    kfree(data);
    return ret;
}

static int __init init_modules(void)
{
    dev_t dev;
    int ret;

    ret = alloc_chrdev_region(&dev, 0, 1, "brook");
    if (ret) {
        printk("can't alloc chrdev\n");
        return ret;
    }

    dev_major = MAJOR(dev);
    dev_minor = MINOR(dev);
    printk("register chrdev(%d,%d)\n", dev_major, dev_minor);

    dev_cdevp = kzalloc(sizeof(struct cdev), GFP_KERNEL);
    if (dev_cdevp == NULL) {
        printk("kzalloc failed\n");
        ret = -ENOMEM;
        goto failed;
    }
    cdev_init(dev_cdevp, &dev_fops);
    dev_cdevp->owner = THIS_MODULE;
    ret = cdev_add(dev_cdevp, MKDEV(dev_major, dev_minor), 1);
    if (ret < 0) {
        printk("add chr dev failed\n");
        goto failed;
    }

    return 0;

failed:
    if (dev_cdevp) {
        kfree(dev_cdevp);
        dev_cdevp = NULL;
    }
    // give the device numbers back and report the error to the caller
    unregister_chrdev_region(dev, 1);
    return ret;
}

static void __exit exit_modules(void)
{
    dev_t dev;

    dev = MKDEV(dev_major, dev_minor);
    if (dev_cdevp) {
        cdev_del(dev_cdevp);
        kfree(dev_cdevp);
    }
    unregister_chrdev_region(dev, 1);
    printk("unregister chrdev\n");
}

module_init(init_modules);
module_exit(exit_modules);

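When the module is loaded, init_modules() runs. It first asks Linux to allocate char device numbers with alloc_chrdev_region(dev_t *dev, unsigned baseminor, unsigned count, const char *name). dev is an output parameter that receives the first allocated dev_t, baseminor is the first minor number of the requested range (0 here), count is the number of consecutive minor numbers, and name is the name the region is registered under. The major number is chosen dynamically by the kernel.
It then initializes the struct cdev with cdev_init(), which also sets the device's file operations, and registers it with the system using cdev_add().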
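To make the major/minor split of a dev_t concrete, here is a small user-space replica of the kernel-internal encoding (20 minor bits below the major bits, as in <linux/kdev_t.h>). The DEMO_ names and demo_region_dev() are made up for this sketch; inside a module you would use the real MAJOR(), MINOR(), and MKDEV() macros.

```c
#include <assert.h>

/* User-space replica of the kernel-internal dev_t layout: the minor number
 * occupies the low 20 bits, the major number sits above it. */
#define DEMO_MINORBITS 20
#define DEMO_MINORMASK ((1U << DEMO_MINORBITS) - 1)
#define DEMO_MAJOR(dev) ((unsigned int)((dev) >> DEMO_MINORBITS))
#define DEMO_MINOR(dev) ((unsigned int)((dev) & DEMO_MINORMASK))
#define DEMO_MKDEV(ma, mi) (((unsigned int)(ma) << DEMO_MINORBITS) | (mi))

typedef unsigned int demo_dev_t;

/* Device number of the idx-th device in a region that starts at `first`
 * and spans `count` consecutive minor numbers. */
demo_dev_t demo_region_dev(demo_dev_t first, unsigned int count,
                           unsigned int idx)
{
    assert(idx < count);
    return DEMO_MKDEV(DEMO_MAJOR(first), DEMO_MINOR(first) + idx);
}
```

With count = 1, as in the module above, the region holds a single device whose minor number equals baseminor.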

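struct file_operations holds the device's file operations. Put simply, everything in UNIX can be treated as a file, devices included. A file supports operations such as open, close/release, read, and write, and each of these operations triggers the corresponding function in the driver. That is what file operations are for.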
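The dispatch idea can be sketched in plain user-space C as a table of function pointers. struct demo_fops is a hypothetical, heavily reduced analogue of struct file_operations, and demo_vfs_read()/demo_vfs_write() are invented names that play the role of the kernel code sitting between the system call and the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-reduced analogue of struct file_operations. */
struct demo_fops {
    int  (*open)(void);
    long (*read)(char *buf, size_t count);
    long (*write)(const char *buf, size_t count);
};

static int demo_open(void) { return 0; }

static long demo_read(char *buf, size_t count)
{
    static const char data[] = "brook";

    if (count > sizeof(data))
        count = sizeof(data);
    memcpy(buf, data, count);
    return (long)count;
}

/* Only open and read are provided; write is left NULL. */
static const struct demo_fops demo_dev_fops = {
    .open = demo_open,
    .read = demo_read,
};

/* A read request boils down to: look up the handler and call it,
 * or fail if the driver did not supply one. */
long demo_vfs_read(const struct demo_fops *fops, char *buf, size_t count)
{
    assert(fops != NULL);
    if (!fops->read)
        return -EINVAL;
    return fops->read(buf, count);
}

long demo_vfs_write(const struct demo_fops *fops, const char *buf,
                    size_t count)
{
    assert(fops != NULL);
    if (!fops->write)
        return -EINVAL;
    return fops->write(buf, count);
}
```

Leaving a member NULL is how a driver says "not supported": here demo_vfs_write() fails because demo_dev_fops has no write handler.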

Our read returns "brook", write prints the received data with printk(), and open and close/release simply report success.
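A program such as cat keeps calling read() until it gets 0 back, so a read handler is expected to honor the file position. The user-space model below shows that contract with the same "brook" data. demo_pos_read() is a hypothetical name, a plain long stands in for loff_t, and memcpy() stands in for copy_to_user().

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h> /* ssize_t */

static const char demo_data[] = "brook"; /* 6 bytes including the NUL */

/* Copy at most `count` bytes starting at *pos, advance *pos, and return
 * the number of bytes copied. Returning 0 signals end of file. */
ssize_t demo_pos_read(char *buf, size_t count, long *pos)
{
    size_t avail;

    assert(buf != NULL && pos != NULL);
    if (*pos < 0 || (size_t)*pos >= sizeof(demo_data))
        return 0;

    avail = sizeof(demo_data) - (size_t)*pos;
    if (count > avail)
        count = avail;

    memcpy(buf, demo_data + *pos, count);
    *pos += (long)count;
    return (ssize_t)count;
}
```

Reading with a 4-byte buffer returns 4 bytes, then the remaining 2, then 0, which is the point where the caller stops.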

exit_modules() is responsible for returning the resources obtained from the OS: it frees the cdev memory with kfree() and releases the previously registered char device numbers with unregister_chrdev_region().
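The same rule applies when initialization fails halfway: whatever was already acquired has to be released in reverse order. Below is a user-space sketch of that goto-based unwinding. struct demo_dev, demo_init(), demo_exit(), and the fail_at switch are all invented for illustration; the comments name the kernel calls each step stands in for.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_dev {
    int   region_held; /* stands in for the chrdev region */
    char *cdev;        /* stands in for the kzalloc()ed struct cdev */
    int   added;       /* stands in for a successful cdev_add() */
};

/* fail_at: 0 = succeed, 1 = region step fails, 2 = allocation fails,
 * 3 = add step fails. On failure nothing is left acquired. */
int demo_init(struct demo_dev *d, int fail_at)
{
    int ret;

    assert(d != NULL);
    d->region_held = 0;
    d->cdev = NULL;
    d->added = 0;

    if (fail_at == 1)
        return -1;
    d->region_held = 1;                      /* alloc_chrdev_region() */

    d->cdev = (fail_at == 2) ? NULL : malloc(16); /* kzalloc() */
    if (!d->cdev) {
        ret = -2;
        goto err_region;
    }

    if (fail_at == 3) {                      /* cdev_add() */
        ret = -3;
        goto err_free;
    }
    d->added = 1;
    return 0;

err_free:
    free(d->cdev);                           /* kfree() */
    d->cdev = NULL;
err_region:
    d->region_held = 0;                      /* unregister_chrdev_region() */
    return ret;
}

/* Release everything in the reverse order of acquisition. */
void demo_exit(struct demo_dev *d)
{
    assert(d != NULL);
    d->added = 0;                            /* cdev_del() */
    free(d->cdev);                           /* kfree() */
    d->cdev = NULL;
    d->region_held = 0;                      /* unregister_chrdev_region() */
}
```

Each error label undoes exactly the steps that succeeded before the jump, so the failure path and demo_exit() never disagree about what is still held.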

Sunday, November 22, 2009

Linux Kernel（1）- Introduction to Linux Modules

Part I of these Linux module practice notes records how to write a simple module, build it, and load and unload it.

write a module

#include <linux/init.h>
#include <linux/module.h>
MODULE_LICENSE("GPL");

static int __init init_modules(void)
{
    printk("hello world\n");
    return 0;
}

static void __exit exit_modules(void)
{
    printk("goodbye\n");
}

module_init(init_modules);
module_exit(exit_modules);

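<linux/init.h> and <linux/module.h> are header files that any Linux module uses. init.h mainly defines the module's init and cleanup macros, such as module_init() and module_exit(), while module.h defines the data structures and macros a module needs.
init.h itself explains __init very well: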
The kernel can take this as hint that the function is used only during the initialization phase and free up used memory resources after.
Put simply, the memory holding this function is freed once initialization has finished running.

__exit is explained as:
__exit is used to declare a function which is only required on exit: the function will be dropped if this file is not compiled as a module.

module_init() is explained as:
The module_init() macro defines which function is to be called at module insertion time (if the file is compiled as a module), or at boot time: if the file is not compiled as a module the module_init() macro becomes equivalent to __initcall(), which through linker magic ensures that the function is called on boot.
Its main purpose is to set the entry point that is executed once the module has been inserted.
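The "linker magic" mentioned for the built-in case can be imitated in user space: each registration drops a function pointer into a named section, and startup code walks that section. This is only a sketch of the idea, not the kernel's actual macros. It assumes a GNU-style toolchain (GCC or clang with GNU ld or lld on ELF), where the linker provides __start_ and __stop_ symbols for any section whose name is a valid C identifier. demo_initcall() and run_demo_initcalls() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*initcall_t)(void);

/* Hypothetical stand-in for module_init() in built-in code: instead of
 * calling fn, store a pointer to it in the "demo_initcalls" section. */
#define demo_initcall(fn) \
    static initcall_t demo_initcall_##fn \
    __attribute__((used, section("demo_initcalls"))) = fn

/* Provided automatically by GNU ld / lld for this section name. */
extern initcall_t __start_demo_initcalls[];
extern initcall_t __stop_demo_initcalls[];

static int calls;

static int init_a(void) { calls += 1;  return 0; }
static int init_b(void) { calls += 10; return 0; }

demo_initcall(init_a);
demo_initcall(init_b);

/* Walk the table the way boot code would. Returns the number of
 * initcalls that ran, or -1 as soon as one of them fails. */
int run_demo_initcalls(void)
{
    int n = 0;
    initcall_t *p;

    for (p = __start_demo_initcalls; p < __stop_demo_initcalls; p++) {
        assert(*p != NULL);
        if ((*p)() != 0)
            return -1;
        n++;
    }
    return n;
}
```

Nothing in run_demo_initcalls() names init_a or init_b: the table is assembled by the linker, which is why adding one more demo_initcall() line is enough to register another function.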

module_exit() is explained as:
This macro defines the function to be called at module removal time (or never, in the case of the file compiled into the kernel). It will only be called if the module usage count has reached zero. This function can also sleep, but cannot fail: everything must be cleaned up by the time it returns.
Note that this macro is optional: if it is not present, your module will not be removable (except for 'rmmod -f').
In short, it is the function that runs when the user executes rmmod. Without module_exit(), the module cannot be removed with rmmod (except with 'rmmod -f').

write a Makefile to manage the module

mname := brook_modules
$(mname)-objs := main.o
obj-m := $(mname).o

KERNELDIR := /lib/modules/`uname -r`/build

all:
	$(MAKE) -C $(KERNELDIR) M=`pwd` modules

clean:
	$(MAKE) -C $(KERNELDIR) M=`pwd` clean

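$(mname)-objs tells make which object files make up the module.
obj-m tells make the name of the module to build.
KERNELDIR tells make where the kernel build tree for the module is located.
The two targets that follow (all/clean) build the module and clean up the build output.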

load/unload a module

Friday, September 25, 2009

linux - Memory Barrier

static inline void barrier(void)
{
    asm volatile("": : : "memory");
}

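Format: __asm__(assembly template : output operands : input operands : clobbers)
__volatile__ tells GCC not to delete this statement (the assembly) or optimize it together with the preceding instructions.
"memory" tells GCC that the assembly may change any data in RAM.
Because the assembly template is empty while GCC is told that all of RAM may have changed, the effect of this memory barrier is that every value GCC had cached in registers before this line is written back to RAM, and any later access has to load the data from RAM again. It constrains the compiler only: no CPU barrier instruction is emitted.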

Source:
http://october388.blogspot.com/2008/12/memory-barrier.html
